hhashifa-port/README.md

Hello!

I'm Hasna Hashifa, a Mathematics graduate with a deep passion for Data Science. I truly enjoy delving into data: exploring, learning, and developing my skills in building data pipelines and predictive models. I'm highly motivated to apply my knowledge to real-world cases, where I can gain hands-on experience and uncover valuable insights that drive better decision-making. Let's connect and explore the world of data together. I'm always happy to learn and improve! :D

Here are a few things about me:

  • 🌱 Areas I'm currently exploring are:
    Data Wrangling (and Cleaning), Exploratory Data Analysis, Statistical Analysis, Data Visualization, and Machine Learning Modeling (training with hyperparameter tuning and prediction, focused on supervised and unsupervised learning). I've recently been studying machine learning applied to unstructured data, such as NLP.

  • 🔭 Tools I often work with are:
    Python, SQL (PostgreSQL), Tableau (or Power BI), and Microsoft Office products (including Microsoft Excel).

  • 📫 Feel free to reach me via mail: hhashifa@gmail.com

  • 👯 Here's a summary of my hands-on experience: the projects I worked on while learning data science. There are still many things that could be improved, so I've added some notes for future improvements.

Project | Goals | Area | Libraries Used | Project Description | Notes
Machine Failure Diagnostic Dashboard
  • Goals: Predict and diagnose industrial machine failure risks using a two-stage classification approach. (Diagnostic Dashboard)
  • Area: Machine Learning, Predictive Maintenance, Feature Engineering, SMOTE, Web Deployment.
  • Libraries Used: Python: Pandas, NumPy, Scikit-Learn, XGBoost, Imbalanced-learn (SMOTE), Streamlit, Joblib.
  • Project Description: This project develops an end-to-end diagnostic system using a sequential pipeline: a Binary Classifier first filters normal operations from failures, then a Multiclass Specialist Model (trained exclusively on failure events) identifies the specific root cause (HDF, PWF, OSF, or TWF). The system includes an interactive dashboard that provides real-time risk levels and prescribed maintenance actions.
  • Notes: The multiclass model is designed as a specialist that learns only from failure instances, which removes the noise from the majority 'Normal' class. This architecture keeps the system sensitive to specific failure modes once the first stage has detected an anomaly.
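The two-stage idea described above can be sketched roughly as follows. This is only an illustration on synthetic data, not the project's actual code: it swaps in scikit-learn's `RandomForestClassifier` for XGBoost, skips SMOTE, and the `NORMAL` label, the helper names `fit_two_stage` and `predict_two_stage`, and the synthetic features are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

NORMAL = "Normal"  # hypothetical label for non-failure rows


def fit_two_stage(X, y, seed=0):
    """Stage 1: failure vs. normal. Stage 2: failure type, trained on failures only."""
    y = np.asarray(y)
    is_failure = y != NORMAL
    detector = RandomForestClassifier(n_estimators=50, random_state=seed)
    detector.fit(X, is_failure)
    specialist = RandomForestClassifier(n_estimators=50, random_state=seed)
    specialist.fit(X[is_failure], y[is_failure])  # never sees the 'Normal' class
    return detector, specialist


def predict_two_stage(detector, specialist, X):
    """Ask the specialist for a root cause only where the detector flags a failure."""
    flagged = detector.predict(X).astype(bool)
    out = np.full(len(X), NORMAL, dtype=object)
    if flagged.any():
        out[flagged] = specialist.predict(X[flagged])
    return out


# Synthetic stand-in data: two made-up failure modes driven by two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = np.where(X[:, 0] > 1.5, "HDF", np.where(X[:, 1] > 1.5, "PWF", NORMAL))

detector, specialist = fit_two_stage(X, y)
pred = predict_two_stage(detector, specialist, X)
```

Because the specialist is fit only on failure rows, `NORMAL` never appears in `specialist.classes_`; every "Normal" prediction comes from the first stage.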
Data Analysis for a Crisis Recovery to an Online Food Delivery Start Up
  • Goals: Data analysis and visualization. This whole project is based on a Codebasics challenge (here).
  • Area: Dataset preparation, visualization, and building a dashboard from the data analysis (report).
  • Libraries Used: SQL using PostgreSQL; visualization using Tableau.
  • Project Description: We analyze a recovery strategy for an online food delivery service after a crisis incident that caused declines in the number of orders, total revenue, customer satisfaction, and the performance rate of delivery and restaurant partners. As a result, we segment customers by profile (RFM score, satisfaction, and reviews) and propose general ideas for handling the aftermath in each segment. You can see the dashboard here.
  • Notes: We can still dig deeper into each segment's profile to personalize the strategies. In addition, the cost of the recovery agenda needs to be considered: the solution has to work for both the cost strategy and the projected revenue gains.
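For readers unfamiliar with RFM (recency, frequency, monetary) scoring, here is a minimal pandas sketch of the idea. The project itself used SQL and Tableau; the `orders` table, its column names, the three-level scoring, and all values below are invented for illustration.

```python
import pandas as pd

# Hypothetical order log: one row per order.
orders = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "C", "C", "D", "E", "E", "E", "E", "F", "F"],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-03-20", "2024-01-10",
        "2024-02-15", "2024-03-25", "2024-02-01",
        "2024-03-15", "2024-03-28", "2024-03-30", "2024-03-31",
        "2024-01-20", "2024-02-20",
    ]),
    "amount": [20, 35, 15, 50, 10, 12, 80, 6, 5, 7, 9, 30, 30],
})

# Measure recency relative to the day after the last order in the log.
snapshot = orders["order_date"].max() + pd.Timedelta(days=1)

rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)


def tercile_score(series, higher_is_better=True):
    """Score 1-3 by rank terciles; ranking first avoids duplicate bin edges."""
    labels = [1, 2, 3] if higher_is_better else [3, 2, 1]
    return pd.qcut(series.rank(method="first"), 3, labels=labels).astype(int)


rfm["r_score"] = tercile_score(rfm["recency"], higher_is_better=False)
rfm["f_score"] = tercile_score(rfm["frequency"])
rfm["m_score"] = tercile_score(rfm["monetary"])
```

A low recency (a recent order) gets the highest `r_score`, so customers with high scores on all three axes are the recent, frequent, high-spend segment.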
Fraud Detection Analysis
  • Goals: Built a model to predict whether a transaction is fraudulent or not.
  • Area: Supervised Learning - Classification
  • Libraries Used: os, json, numpy, pandas, matplotlib, seaborn, scikit-learn, scipy, garbage collector (since the dataset is large)
  • Project Description:
    - Data Collection and Preparation: Combined data from multiple CSV and JSON files into a single, unified dataset containing all relevant information from the original files.
    - Data Splitting: Divided the dataset into training, validation, and test sets.
    - Exploratory Data Analysis (EDA) on Training Data: Performed statistical and technical analysis, and investigated the relationship between each feature and the target variable, is_fraudulent.
    - Data Preprocessing: Conducted feature engineering, outlier detection using Isolation Forest, standardization, encoding of categorical features, and feature correlation and selection.
    - Data Modeling: Implemented machine learning models including Logistic Regression, Decision Tree, Random Forest, and XGBoost, with hyperparameter tuning focused on the AUC metric.
  • Notes: Fraud detection can be challenging because fraudulent transactions are rare and may appear as outliers. Given the small proportion of fraud cases, evaluating with the F1 score, which balances precision and recall, is advisable so that fraudulent transactions wrongly labeled as non-fraudulent are penalized.
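A small made-up example shows why the choice of metric matters when fraud is rare. The counts (10 fraud cases in 1,000 transactions) and the two sets of predictions are invented for illustration and do not come from the project.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# 10 fraudulent transactions (label 1) among 1,000.
y_true = np.array([1] * 10 + [0] * 990)

# Baseline that never flags fraud.
always_legit = np.zeros_like(y_true)

# Hypothetical model: misses 2 of the 10 frauds and raises 12 false alarms.
model_pred = y_true.copy()
model_pred[:2] = 0
model_pred[10:22] = 1

baseline_acc = accuracy_score(y_true, always_legit)
baseline_f1 = f1_score(y_true, always_legit, zero_division=0)
model_acc = accuracy_score(y_true, model_pred)
model_f1 = f1_score(y_true, model_pred)
model_recall = recall_score(y_true, model_pred)

print(f"baseline: accuracy={baseline_acc:.3f}, F1={baseline_f1:.3f}")
print(f"model:    accuracy={model_acc:.3f}, F1={model_f1:.3f}, recall={model_recall:.3f}")
```

The do-nothing baseline wins on accuracy while catching no fraud at all; F1 and recall rank the two the other way around.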
Online Shoppers Purchasing Intention
  • Goals: This project aims to predict whether a user visiting an online store during a given session will make a purchase, using data collected from user interactions and session behavior.
  • Area: Supervised Learning - Classification
  • Libraries Used: pandas, numpy, matplotlib, seaborn, scikit-learn, shap
  • Project Description:
    - Exploratory Data Analysis (EDA): Explored and visualized the data to understand its statistics and uncover insights useful for both modeling and business interpretation.
    - Data Preprocessing: This step covered data cleaning, feature encoding, feature engineering, feature selection, and feature transformation, making sure the dataset was fully numeric before training.
    - Machine Learning Modeling and Evaluation: Trained and compared several models, including Logistic Regression, KNN, Decision Tree, Random Forest, AdaBoost, and XGBoost, while analyzing learning curves to fine-tune the hyperparameters.
  • Notes:
    - Make sure to understand every feature so the analysis results aren't misinterpreted.
    - To avoid data leakage, do the exploratory data analysis (EDA) only on the training data.
    - Be careful when doing feature encoding, especially for nominal and other categorical features.
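The leakage and encoding notes above can be illustrated with a scikit-learn pipeline: split first, then let the encoder and scaler learn their parameters from the training rows only. This is a sketch on synthetic data; the column names `visitor_type`, `page_value`, and `purchased` are hypothetical stand-ins, and Logistic Regression is just one of the models the project lists.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic sessions: one nominal feature and one numeric feature.
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "visitor_type": rng.choice(["new", "returning", "other"], size=n),
    "page_value": rng.exponential(scale=10.0, size=n),
})
df["purchased"] = (df["page_value"] + rng.normal(0, 5, n) > 12).astype(int)

# Split BEFORE any fitting so test rows cannot influence preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    df[["visitor_type", "page_value"]], df["purchased"],
    test_size=0.25, stratify=df["purchased"], random_state=42,
)

preprocess = ColumnTransformer([
    # One-hot encoding avoids imposing a fake order on nominal categories.
    ("nominal", OneHotEncoder(handle_unknown="ignore"), ["visitor_type"]),
    ("numeric", StandardScaler(), ["page_value"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(X_train, y_train)  # encoder and scaler are fit on training rows only
test_accuracy = model.score(X_test, y_test)
scaler = model.named_steps["preprocess"].named_transformers_["numeric"]
```

Keeping the preprocessing inside the pipeline means the scaler's mean comes from `X_train` alone, and the same fitted transformation is reused on the test set.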
Human Resources Analytics
  • Goals: The goal of this project is to analyze and visualize the characteristics of employees who decide to leave the company and those who stay, and to explore what factors might influence their decisions based on the given dataset.
  • Area:
    - Python for analysis and visualization
    - Canva for making reports (non-interactive dashboard)
  • Libraries Used: pandas, numpy, matplotlib, seaborn, scipy
  • Project Description: Focused on analyzing employee attrition data and creating visualizations in Python using Matplotlib and Seaborn. The analysis was performed on a dataset that had been cleaned and aggregated to make it ready for exploration.
  • Notes: Reports created in a non-interactive format are less useful for up-to-date decision making. It would be more effective to build an interactive multi-page dashboard, where each page focuses on a specific set of KPIs.
Telecom Customer Churn
  • Goals: Built a machine learning model to predict whether a customer will churn or not.
  • Area: Supervised Learning - Classification
  • Libraries Used: pandas, numpy, matplotlib, seaborn, scikit-learn, scipy
  • Project Description:
    - Data Validation: Ensured that each feature in the dataset accurately represents its intended meaning and has the correct data type.
    - Exploratory Data Analysis (EDA) on Training Data: Focused on understanding the characteristics of the data in preparation for modeling.
    - Data Preprocessing and Cleaning on Training Data: Applied the insights gained from the EDA to prepare the data for modeling. Steps included imputation, feature selection, encoding, creating new features, and handling class imbalance in the dataset.
    - Modeling and Evaluation: Applied Logistic Regression, K-Nearest Neighbors (KNN) Classifier, Decision Tree, Random Forest, and XGBoost. Evaluation focused on Recall and ROC-AUC metrics.
  • Notes: The next step could involve customer segmentation to identify specific strategies to reduce churn within each segment.
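As a rough illustration of evaluating a churn-style classifier on Recall and ROC-AUC, the sketch below cross-validates a class-weighted Logistic Regression on synthetic imbalanced data. The data from `make_classification`, the 80/20 class split, and the use of `class_weight="balanced"` as the imbalance handling are assumptions; the project may have handled imbalance differently.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: roughly 20% of rows belong to the positive (churn) class.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# class_weight="balanced" reweights errors so the minority class is not ignored.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Stratified folds keep the churn rate similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv, scoring=["recall", "roc_auc"])

mean_recall = scores["test_recall"].mean()
mean_auc = scores["test_roc_auc"].mean()
print(f"recall={mean_recall:.3f}, ROC-AUC={mean_auc:.3f}")
```

Recall tracks how many churners are caught at the default threshold, while ROC-AUC summarizes ranking quality across all thresholds, so the two are worth reporting together.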
Airline Customer Value Analysis
  • Goals: The project aimed to analyze customer value in an airline company and create actionable clusters that could inform the next steps in marketing campaigns.
  • Area: Unsupervised Learning - Clustering
  • Libraries Used: pandas, numpy, matplotlib, seaborn, scikit-learn, scipy
  • Project Description: Conducted customer segmentation using K-Means clustering combined with Principal Component Analysis (PCA), resulting in four distinct clusters. The insights from each cluster can guide the next marketing campaigns for more precise targeting.
  • Notes:
    - Ensure a clear understanding of the meaning of each feature in the dataset, so that data preprocessing and interpretation are accurate.
    - Verify that the data meets the requirements for modeling using K-Means and PCA.
    - Conduct further analysis on the clustering results to identify the most effective clusters, which can then be used to design data-driven marketing campaigns.
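One possible way to combine PCA with K-Means is sketched below on synthetic data. The order of steps (scale, reduce with PCA, then cluster), the two principal components, and the blobs from `make_blobs` are assumptions for illustration; only the choice of four clusters comes from the project description.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features.
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)

# Both PCA and K-Means are distance/variance based, so scale features first.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to two components for clustering and easy plotting.
X_2d = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

# Silhouette score (between -1 and 1) is one way to sanity-check the clusters.
score = silhouette_score(X_2d, labels)
print(f"silhouette={score:.3f}")
```

Scaling before PCA and K-Means is one of the data requirements worth verifying, since unscaled features with large ranges would otherwise dominate both steps.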
Raw: Customer Insights for E-Commerce
  • Goals: This project analyzes the characteristics of customers of an e-commerce company.
  • Area: SQL and Tableau
  • Project Description: The dataset has no missing values or duplicates. SQL was used for data validation (data types and inconsistent formatting) and for merging information across databases.

Popular repositories

  1. Online-Shoppers-Purchasing-Intention (Public)

    Machine learning model to predict whether a user visiting a particular session in an online store will make a purchase or not. Exploratory data analysis, data processing and machine learning modeli…

    Jupyter Notebook

  2. Human-Resources-Analytics (Public)

    Analyze and visualize the characteristics of employees who choose to leave the company or not, and what influences employees to leave the company based on the available dataset.

    Jupyter Notebook

  3. Airline-Customer-Clustering (Public)

    Unsupervised machine learning model to analyze customers value in an airline company. Exploratory data analysis, data preparation, data processing, and modeling were carried out using K-Means clust…

    Jupyter Notebook

  4. Telecom-Customer-Churn (Public)

    Explored customer behavior characteristics, performed data preprocessing, and built a machine learning model to predict customer churn.

    Jupyter Notebook

  5. Customer-Insights-for-E-Commerce (Public)

    Uses PostgreSQL queries for data preprocessing and Tableau for dashboard visualization.

  6. hhashifa-port (Public)