I do not just write code and build models: I think adaptively, curate, and implement solutions to your problems, both those you know about and those you do not yet know you have!
My data processing pipelines improved the efficiency of document verification for the Jiinue Youth Program, a Mastercard program run by the Kenya Private Sector Alliance (KEPSA) to finance MSMEs. In this project, I used advanced Excel querying techniques to improve the ETL process, which cut the verification period for more than 60,000 records by over 70%, saving time and costs.
At ShopOkoa, I demonstrated how Generative Adversarial Networks (GANs) can be used to train reliable AI models for credit scoring and customer classification. This discovery has helped the fintech startup roll out an AI-powered digital lending platform that has now onboarded more than 500 merchants in Kenya, enabling customers to shop on credit and pay later.
As a freelance data analyst, I have created numerous well-documented end-to-end projects for various clients, as highlighted below.
I am open to corporate and freelance work opportunities.
The project used synthetic data to develop several predictive models that shed light on potential customer trends and credit repayment behaviors.
- Previous loan repayment history was a more reliable predictor of future repayment than demographic variables.
- Segmenting customers into demographic clusters before model training produced better credit scores than generalized linear models.
This project aimed to track and predict ROI for a farm tractor used for various activities, including transport services and ploughing. The farmer wanted to understand which of the two services was more profitable and at what times of the year each service was most sought after.
- We noted that high-income seasons repeatedly preceded the two rainy seasons for this region, an indication that in the period before the rains, the tractor owner should expect higher demand for ploughing services than for transport and towing services.
This capstone project aimed to assess the best-performing classification models on a generic dataset. I tested Random Forests, logistic regression, K-Nearest Neighbours, and Decision Trees.
- Random Forest classifiers outperformed all the other base models in both accuracy and runtime.
- On fine-tuning, Random Forest classifiers were less likely to overfit than the other control models.
- There was a general improvement in accuracy across all classifiers when cross-validation data folds were used for model training and testing.
Code: Linear & Tree ML Models
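To illustrate what training and testing on cross-validation folds looks like, here is a minimal, dependency-free sketch. It is not the project's code: the helper names (`k_fold_indices`, `knn_predict`, `cross_val_accuracy`), the toy data, and the choice of a simple nearest-neighbour classifier are all assumptions made for illustration.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride of k spreads any remainder across the folds.
    return [indices[i::k] for i in range(k)]


def knn_predict(train_X, train_y, point, n_neighbors=3):
    """Predict a label by majority vote among the nearest training points."""
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], point)),
    )
    votes = Counter(train_y[i] for i in by_distance[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k=5, n_neighbors=3, seed=0):
    """Return the accuracy measured on each of the k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held_out_set = set(held_out)
        # Train on every sample that is not in the current held-out fold.
        train_idx = [i for i in range(len(X)) if i not in held_out_set]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return scores


# Toy, made-up data: two well-separated groups of 2-D points.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2),
     (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.3, 5.2), (5.1, 5.0)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = cross_val_accuracy(X, y, k=5)
mean_accuracy = sum(scores) / len(scores)
print(scores, mean_accuracy)
```

Every sample is used for testing exactly once and for training in the remaining folds, so the averaged score is less sensitive to one lucky or unlucky train/test split. In practice a library routine such as scikit-learn's `cross_val_score` would likely replace these hand-written helpers.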
Traffic snarls are a major challenge for modern cities. This project utilized deep learning neural networks to predict traffic flow at different road junctions over a period of years. The intention was to create a predictive AI system that helps road users plan better and save time and resources.
- Combined with Time Series Analysis, Gated Recurrent Units (GRUs) demonstrated high efficiency and accuracy in predicting traffic flow for different road sections.
- The efficiency and reliability of GRU Neural Networks for predicting traffic flow rely heavily on the size of the dataset (the period of observation).
Code: GRU Neural Networks
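Recurrent models such as GRUs are usually trained on fixed-length windows of past observations paired with the value that follows each window. The sketch below shows that framing step only. It is an assumption about a typical preprocessing approach, not the project's actual pipeline, and the function name `make_windows` and the vehicle counts are hypothetical.

```python
def make_windows(series, window_size):
    """Split a 1-D series into (input window, next value) training pairs."""
    if window_size < 1 or window_size >= len(series):
        raise ValueError("window_size must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        # Each input is window_size consecutive readings...
        inputs.append(series[start:start + window_size])
        # ...and its target is the reading that immediately follows.
        targets.append(series[start + window_size])
    return inputs, targets


# Made-up hourly vehicle counts for one junction (illustrative only).
counts = [12, 15, 21, 30, 28, 19, 14]
inputs, targets = make_windows(counts, window_size=3)

for window, target in zip(inputs, targets):
    print(window, "->", target)
```

A series of length n yields n - window_size training pairs, which is one concrete reason a longer observation period gives the network more examples to learn from.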
This project explored the robustness of R in data analysis and data visualization on a generic dataset.
- I discovered that, besides Python, R is an effective and robust language for Exploratory Data Analysis (EDA) and data visualization.
- R comes with additional libraries, packages, and methods that allow for tweaks, better visuals, and better insights compared with Python.



