I'm a committed researcher currently pursuing my MSc in Data Science at BRAC University. I'm passionate about Machine Learning, Natural Language Processing, Computer Vision, and Large Language Models. I find the intersection of these areas particularly compelling and aim to make meaningful contributions to data science and the broader technology landscape.
Institution: BRAC University | Duration: Summer 2023 - Present
Key Courses and Learning Outcomes:
Symbolic Machine Learning I: Learned classical machine learning algorithms such as random forests, regression trees, AdaBoost, gradient boosting, k-nearest neighbors, and support vector machines. Developed proficiency in clustering techniques and dimensionality reduction using Principal Component Analysis (PCA). Completed additional DataCamp courses in supervised and unsupervised learning, reinforcing practical implementation skills.
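For a flavor of this coursework, here is a minimal scikit-learn sketch combining PCA with a random forest; the Iris dataset and hyperparameters are illustrative stand-ins, not coursework data.

```python
# Minimal sketch: dimensionality reduction with PCA feeding a random forest classifier.
# The Iris dataset and parameter choices are illustrative only.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Project onto two principal components, then fit the ensemble on the reduced features.
clf = make_pipeline(PCA(n_components=2), RandomForestClassifier(n_estimators=100, random_state=42))
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```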
Symbolic Machine Learning II: Focused on advanced Natural Language Processing (NLP) techniques, including Chomsky Normal Form (CNF) algorithms, structured prediction, and representation learning. Studied sequence learning models and transformer architectures in depth. Explored critical aspects of AI ethics, interpretability in machine learning models, and foundations of dataset construction for NLP tasks.
Advanced Artificial Intelligence: Developed expertise in genetic algorithms, TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution), fuzzy TOPSIS, and Markov chains. Gained proficiency in probabilistic reasoning and decision-making using graphical models and Bayesian networks. Implemented AI algorithms for complex problem-solving, including reinforcement learning and recommender systems.
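As a concrete illustration of the decision-making side, below is a small NumPy sketch of TOPSIS; the decision matrix and weights are made-up numbers, not data from the course.

```python
# TOPSIS sketch: rank alternatives by closeness to an ideal solution.
# The decision matrix and weights are invented for illustration.
import numpy as np

# Rows = alternatives, columns = criteria (all treated as benefit criteria here).
X = np.array([[250.0, 16.0, 12.0],
              [200.0, 16.0,  8.0],
              [300.0, 32.0, 16.0]])
weights = np.array([0.4, 0.3, 0.3])

# 1) Vector-normalize each criterion column, then apply the weights.
V = weights * X / np.linalg.norm(X, axis=0)

# 2) Ideal best and ideal worst values per criterion.
best, worst = V.max(axis=0), V.min(axis=0)

# 3) Euclidean distances to the ideals and the closeness coefficient.
d_best = np.linalg.norm(V - best, axis=1)
d_worst = np.linalg.norm(V - worst, axis=1)
closeness = d_worst / (d_best + d_worst)

print("Ranking (best first):", np.argsort(-closeness))
```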
Neural Network and Fuzzy Systems: Acquired in-depth knowledge of various neural network architectures including Multilayer Perceptrons (MLP), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Graph Neural Networks (GNN). Studied the foundations of neural networks, including backpropagation and optimization techniques. Explored fuzzy systems, fuzzy control, and genetic algorithms.
Distributed Computing Systems: Acquired proficiency in shell scripting and version control with Git. Studied fundamental concepts of distributed systems and cloud computing, with a focus on ACID properties in distributed databases and the CAP theorem.
Graph Theory: Learned about algorithms for directed and undirected graphs, including pathfinding, cycle detection, and tree structures. Studied network flow problems, graph coloring, and planarity, along with theoretical concepts such as Eulerian cycles and Menger's Theorem.
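A small, self-contained example in the spirit of the pathfinding material: breadth-first search for a shortest path in an unweighted graph (the toy graph is invented for illustration).

```python
# Breadth-first search returning a shortest path in an unweighted graph.
from collections import deque

def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

# Toy undirected graph given as an adjacency list.
g = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C", "E"], "E": ["D"]}
print(shortest_path(g, "A", "E"))  # e.g. ['A', 'B', 'D', 'E']
```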
Institution: BRAC University | Duration: Summer 2018 - Fall 2022 | CGPA: 3.45 | Thesis: Yoga Posture Detection Using Deep Learning and Ensemble Modeling
My academic focus spans natural language processing (NLP) and computer vision. My thesis explored image classification in computer vision, while my broader interests extend to NLP and transformer-based large language models.
- Programming Languages: Python, Java
- Tools & Frameworks: TensorFlow, PyTorch, scikit-learn, spaCy
- Data Manipulation & Analysis: Pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Mathematical Skills for Machine Learning: Sound understanding of Linear Algebra, Calculus, and Probability & Statistics
- Soft Skills: Communication, Problem-solving, Team Collaboration
- Developed an end-to-end system for creating a semi-synthetic Bengali corpus using advanced NLP techniques.
- Used an LLM served through Groq (LangChain's ChatGroq interface) to generate high-quality Bengali summaries from source texts.
- Implemented a robust Quality Control (QC) process that scores each summary on ROUGE-L together with brevity, clarity, and accuracy checks (a rough sketch follows after this list).
- Utilized tools such as LangChain, Groq API, and Hugging Face Transformers for efficient summarization and evaluation.
- Created a scalable pipeline for expanding the corpus to include other datasets like Paraphrase, Question Answering, and Dialogues.
- Link to Project Repository
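A condensed sketch of the summarize-then-QC loop described above. It assumes a `GROQ_API_KEY` in the environment; the model id, prompt, and QC threshold are illustrative placeholders rather than the project's actual settings, and ROUGE-L is computed here with a simple whitespace-tokenized LCS so it works on Bengali text.

```python
# Sketch of the summarize-then-quality-check step; model id and threshold are assumptions.
from langchain_groq import ChatGroq  # requires GROQ_API_KEY in the environment

def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_f1(reference, candidate):
    """Whitespace-tokenized ROUGE-L F1, usable for Bengali text."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0.3)  # placeholder model id

def summarize_with_qc(source_text, min_rouge=0.2):
    """Generate a Bengali summary and keep it only if it passes simple QC gates."""
    summary = llm.invoke(
        f"Summarize the following Bengali text in Bengali:\n\n{source_text}"
    ).content
    if rouge_l_f1(source_text, summary) >= min_rouge and len(summary) < len(source_text):
        return summary
    return None  # rejected by QC; would be regenerated or flagged in the full pipeline
```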
- Developed NER models for multilingual (12 languages) and Bengali monolingual datasets.
- Fine-tuned BERT-based models, utilizing bert-base-multilingual-cased for multilingual data and bangla-bert-base for Bengali.
- Achieved an F1 score of 0.62 and accuracy of 0.92 on the Bengali dataset using bangla-bert-base, outperforming the multilingual models.
- Employed tokenization adjustments, data augmentation, and optimized hyperparameters for improved performance.
- Link to Project Repository
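Roughly, the fine-tuning setup looks like the sketch below (using Hugging Face Transformers). The Hub id for bangla-bert-base, the tag set, the toy example, and the hyperparameters are assumptions for illustration, not the project's exact configuration.

```python
# Token-classification fine-tuning sketch; model id, labels, and data are illustrative.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification, TrainingArguments, Trainer)

model_name = "sagorsarker/bangla-bert-base"        # assumed Hub id for bangla-bert-base
labels = ["O", "B-PER", "B-LOC", "B-ORG"]          # illustrative tag set

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(labels))

def encode(words, tags):
    # Align word-level tags to sub-word tokens; special tokens get -100 (ignored in the loss).
    enc = tokenizer(words, is_split_into_words=True, truncation=True)
    aligned = [-100 if wid is None else tags[wid] for wid in enc.word_ids()]
    return {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"], "labels": aligned}

# Tiny toy example standing in for the real NER corpus.
train_dataset = Dataset.from_list([encode(["রহিম", "ঢাকায়", "থাকে"], [1, 2, 0])])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bangla-ner", num_train_epochs=3, learning_rate=3e-5),
    train_dataset=train_dataset,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```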
- Fine-tuned a large language model for Bengali using distributed computing and quantization techniques.
- Optimized the model using QLoRA and layer-wise relevance adjustments, reducing the model size by over 70% for efficient local deployment (see the configuration sketch after this list).
- Created a custom deployment pipeline using Ollama, LangChain, Streamlit, and Docker for seamless local execution.
- Employed distributed training on multiple GPUs, enhancing fine-tuning efficiency for large models.
- Link to Project Repository
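A rough sketch of what a QLoRA setup like this can look like (a 4-bit quantized base model plus LoRA adapters via bitsandbytes and PEFT). The base checkpoint, target modules, and hyperparameters are placeholders, not the project's actual configuration.

```python
# QLoRA-style setup: 4-bit quantized frozen base model + trainable low-rank adapters.
# Base checkpoint and hyperparameters are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # small stand-in base model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # store the frozen base weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Only the small low-rank adapter matrices are trained on top of the quantized model.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # typically well under 1% of the full parameter count
```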
- Implemented a YOLO-based model for automated detection of diseases in cucumber leaves.
- Explored object detection techniques and fine-tuned YOLO architecture for plant pathology.
- Conducted data augmentation and preprocessing for improved model generalization.
- Optimized model parameters for real-time disease detection in agricultural settings.
- Link to Project Repository
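As a sketch of the detection setup, the snippet below fine-tunes a pretrained Ultralytics YOLO model; the specific YOLO variant, the dataset YAML, and the sample image path are assumptions for illustration.

```python
# Fine-tune a pretrained YOLO detector on a leaf-disease dataset (paths are hypothetical).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # small pretrained detector as a starting point (assumed variant)

# cucumber_leaf_disease.yaml would point to the annotated images and disease class names.
model.train(data="cucumber_leaf_disease.yaml", epochs=100, imgsz=640)

# Run inference on a new leaf image and visualize the detected disease regions.
results = model("leaf_sample.jpg")
results[0].show()
```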
- Developed an algorithm for efficiently classifying and recommending music based on emotions.
- Applied natural language processing (NLP) techniques for sentiment analysis of song lyrics.
- Utilized machine learning models for emotion classification and to drive the recommendation engine.
- Conducted feature engineering to capture emotional nuances in music data.
- Link to Project Repository
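In spirit, the lyric-based classification step can be as simple as the sketch below; the tiny inline lyrics and emotion labels are purely illustrative.

```python
# Toy lyric-emotion classifier: TF-IDF features + logistic regression.
# The inline lyrics and labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

lyrics = [
    "tears falling in the silent rain",
    "dancing all night under bright lights",
    "I miss you every lonely evening",
    "jump up, feel the beat, celebrate",
]
emotions = ["sad", "happy", "sad", "happy"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(lyrics, emotions)

# A recommender can then surface songs whose predicted emotion matches the listener's mood.
print(clf.predict(["crying alone in the rain tonight"]))   # -> ['sad']
```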
- Implemented a deep learning-based system for accurate detection of yoga poses.
- Utilized computer vision techniques, including image preprocessing and feature extraction.
- Explored convolutional neural networks (CNNs) for image classification and pose estimation.
- Fine-tuned models for improved accuracy and robustness against diverse yoga poses.
- Link to Project Repository
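A rough transfer-learning sketch for pose classification; the ResNet-50 backbone, the number of pose classes, and the dummy batch are assumptions, not the exact thesis setup.

```python
# Transfer learning for yoga-pose classification: pretrained CNN backbone + new head.
# Backbone choice, class count, and the dummy batch are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_POSES = 5   # e.g. downdog, goddess, plank, tree, warrior (illustrative)

# Start from an ImageNet-pretrained CNN and replace the classification head.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_POSES)

# Freeze the pretrained layers and train only the new head at first.
for name, param in backbone.named_parameters():
    param.requires_grad = name.startswith("fc")

optimizer = torch.optim.Adam((p for p in backbone.parameters() if p.requires_grad), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a random dummy batch, just to show the loop structure.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, NUM_POSES, (4,))
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
```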
- Developed a T5-based model for generating Bengali poetry, trained on the complete works of Kazi Nazrul Islam.
- Applied advanced tokenization and attention mechanisms from the Transformer architecture.
- Conducted thorough data preprocessing to handle linguistic nuances and optimize model training.
- Evaluated model performance using metrics such as BLEU score and semantic coherence.
- Link to Project Repository
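The generation step might look roughly like the snippet below; the mT5 checkpoint stands in for the project's fine-tuned Bengali model, and the prompt format and decoding settings are assumptions.

```python
# Poetry generation sketch with a T5-style model; checkpoint and prompt are placeholders
# for the project's own fine-tuned Bengali model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "google/mt5-small"   # multilingual T5 variant as a stand-in base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

prompt = "কবিতা লেখো: বিদ্রোহ"   # "Write a poem: rebellion"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling-based decoding tends to give more varied poetic output than greedy search.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True,
                         top_p=0.9, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```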
- Fine-tuned GPT-2 on Bengali literature for contextual text generation.
- Implemented tokenization techniques and utilized transfer learning on a pre-trained GPT-2 model.
- Explored hyperparameter tuning for optimal model performance.
- Conducted in-depth analysis of generated text to ensure linguistic coherence.
- Link to Project Repository
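The fine-tuning loop is broadly similar to the sketch below; the base GPT-2 checkpoint, the single toy sentence, and the hyperparameters are placeholders, not the project's actual data or settings.

```python
# Causal-LM fine-tuning sketch; checkpoint, toy corpus, and hyperparameters are placeholders.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          DataCollatorForLanguageModeling, TrainingArguments, Trainer)

checkpoint = "gpt2"   # stand-in; a Bengali-capable checkpoint/tokenizer would be used in practice
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(checkpoint)

texts = ["আমার সোনার বাংলা, আমি তোমায় ভালোবাসি।"]   # toy sample standing in for the literature corpus
dataset = Dataset.from_dict({"text": texts}).map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-bangla", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),   # causal LM labels
)
trainer.train()

# Generate a short continuation from a Bengali prefix to eyeball coherence.
enc = tokenizer("আমার", return_tensors="pt")
print(tokenizer.decode(model.generate(**enc, max_new_tokens=30)[0], skip_special_tokens=True))
```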
- Email: [shah.imran.1599@gmail.com](mailto:shah.imran.1599@gmail.com)