Fake News Detection & Classification

Accenture AI Studio Challenge Project

Investigated how well machine learning models can identify fake news articles compared to human review, applying Python, NLP, and deep learning methods within Break Through Tech AI's AI Studio accelerator program.

Team Members

| Name | GitHub | Contribution |
| --- | --- | --- |
| Lin Zhang | @lin-zhang88 | Exploratory Data Analysis, Feature Engineering, BERT |
| Kashvi Vijay | @kv772 | Feature Engineering, Logistic Regression |
| Nancy Huang | @naanci | Exploratory Data Analysis, Feature Engineering, BERT |
| Adriena Jiang | @adrienajiang | Exploratory Data Analysis, Visualizations, Feature Engineering |
| Ousman Bah | @Ousmanbah10 | Exploratory Data Analysis, Feature Engineering, CNN |
| Sanskriti Khadka | @Sanskritik7 | Exploratory Data Analysis, Feature Engineering, CNN |
| Harshika Agrawal | @HarshikaAgr | Exploratory Data Analysis, Logistic Regression |

Project Highlights

  • Developed machine learning models (Logistic Regression, BERT, and neural networks) to identify fake news articles as a complement to human review.
  • Achieved 74% accuracy with Logistic Regression, 94% with BERT, 93.5% with the average-embedding model, and 96% with the CNN model.
  • Technologies Used: Python, TensorFlow, Keras, PyTorch, Transformers, scikit-learn, pandas, NumPy, matplotlib, seaborn, BERT, LSTM Networks, Google Colab and Jupyter Notebook.

Setup and Installation

Running the Notebook

Clone the Repository

git clone https://github.com/kv772/Accenture1D_AIStudio.git
cd Accenture1D_AIStudio

Create the Virtual Environment

python3 -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows

Install Dependencies

Ensure you are in the project folder and that the virtual environment is active.

pip install -r requirements.txt

Download the Datasets

This project uses the Kaggle Fake News Dataset.

After downloading, update notebook paths accordingly.
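
A minimal loading sketch is shown below, assuming the dataset's usual True.csv and Fake.csv file names under a local data/ folder; adjust the paths to wherever you placed the download.

```python
# Minimal loading sketch; file names and the data/ folder are assumptions.
import pandas as pd

real_df = pd.read_csv("data/True.csv")
fake_df = pd.read_csv("data/Fake.csv")

real_df["label"] = 1  # assumed encoding: 1 = real
fake_df["label"] = 0  # assumed encoding: 0 = fake

df = pd.concat([real_df, fake_df], ignore_index=True)
print(df.shape, df.columns.tolist())
```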

Run the Notebook

jupyter notebook

Open Accenture_1D_Model.ipynb and run all cells.


Running the Web Application

Backend Server (Flask API)

  1. Navigate to the backend directory:
     cd backend
  2. Install backend dependencies:
     pip3 install Flask flask-cors joblib scikit-learn numpy scipy
  3. Start the Flask server:
     python3 app.py

The backend API will be running at http://localhost:5001
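
For reference, a hypothetical minimal app.py is sketched below, assuming the trained vectorizer and classifier were exported with joblib; the artifact names and endpoint path are illustrative, not the repository's actual code.

```python
# Hypothetical sketch of a minimal prediction API; artifact names and the
# /predict endpoint are assumptions, not the repository's actual code.
from flask import Flask, request, jsonify
from flask_cors import CORS
import joblib

app = Flask(__name__)
CORS(app)  # allow the React frontend on another port to call this API

vectorizer = joblib.load("tfidf_vectorizer.joblib")  # assumed artifact name
classifier = joblib.load("logreg_model.joblib")      # assumed artifact name

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json().get("text", "")
    features = vectorizer.transform([text])
    label = int(classifier.predict(features)[0])
    return jsonify({"prediction": "real" if label == 1 else "fake"})

if __name__ == "__main__":
    app.run(port=5001, debug=True)
```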

Frontend (React Web App)

  1. Navigate to the WEB directory:
     cd WEB
  2. Install frontend dependencies:
     npm install
  3. Start the development server:
     npm run dev

The web application will be running at http://localhost:3000

Note: If port 3000 is already in use, Vite will automatically run on the next available port (e.g., 3001, 3002).

Running Both Servers

Open two terminal windows and run:

  • Terminal 1: cd backend && python3 app.py
  • Terminal 2: cd WEB && npm run dev

Then open http://localhost:3000 in your browser to use the Fake News Detector!

Note: If your ports are occupied, the servers will run on different ports; check the terminal output for the actual URLs.


Project Overview

  • Trust in digital media and content moderation are critical challenges in today's information ecosystem. Social media platforms, publishers, and advertisers face financial and reputational risk when their services propagate false information. Manual review of news articles is infeasible at scale. With the exponential growth of online content, there is a growing need for automated tools that can support content moderators and improve detection consistency.
  • This project with Accenture aims to use deep learning techniques and NLP models to accurately classify real and fake news. Understanding the project's strengths and weaknesses aligns with Accenture's responsible AI initiatives, strengthens digital trust offerings for clients, and supports automated content verification and risk detection.

Data Exploration

  • Used the Kaggle Fake News Dataset, which includes two CSV files: one containing real news articles and one containing fake news articles.
  • The true news file included ~21,000 unique entries while the fake news file included ~18,000 unique entries.
  • Each dataset contains fields such as title, text, subject, and date, providing multiple features for analysis.
  • Conducted extensive EDA to identify potential data leakage, performed feature engineering, and applied text preprocessing steps (tokenization, stop word removal) to prepare the dataset for model development; a preprocessing sketch follows this list.
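
A hedged sketch of the preprocessing step, using a simple regex tokenizer and scikit-learn's built-in English stop-word list; the notebook's exact steps may differ.

```python
# Sketch of tokenization and stop-word removal; the notebook's exact steps may differ.
import re
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on alphabetic runs, and drop English stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in ENGLISH_STOP_WORDS]

# preprocess("The president said THIS about the economy!") keeps content words
# such as 'president' and 'economy' while dropping 'the', 'this', and 'about'.
```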

Model Development

1. Logistic Regression

  • Logistic Regression was selected because it is lightweight, interpretable, and a strong baseline for text classification.
  • Paired with TF-IDF, it effectively captures key linguistic and stylistic cues that differentiate real and fake news.
  • Used HalvingGridSearchCV to tune hyperparameters; the best configuration was C = 1 with an L2 penalty (a minimal sketch of this pipeline follows the list).
  • Training included 5-fold cross validation to ensure consistent performance across splits.
  • Performance: 74% accuracy, True F1-score: 97%, Fake F1-score: 96%.
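
A minimal sketch of the TF-IDF + Logistic Regression pipeline with HalvingGridSearchCV; the vectorizer settings and parameter grid are illustrative, not the team's exact configuration.

```python
# Sketch of the baseline pipeline; vectorizer settings and grid are assumptions.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401, enables HalvingGridSearchCV
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", max_features=50_000)),
    ("clf", LogisticRegression(max_iter=1000)),
])

param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l2"]}

search = HalvingGridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
# search.fit(train_texts, train_labels)  # train_texts: raw article strings
# print(search.best_params_)             # reportedly C=1 with an L2 penalty
```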

2. BERT

  • BERT was chosen because it understands contextual meaning by reading text bidirectionally, allowing it to detect tone, writing style, and subtle misleading cues.
  • Kept stop words since BERT performs better on full sentence structure, which also reduced overfitting (a fine-tuning sketch follows this list).
  • Performance: 94% accuracy, F1-score: 96%.
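
A hedged sketch of fine-tuning BERT with Hugging Face Transformers; the checkpoint, sequence length, and training settings are illustrative rather than the team's exact configuration.

```python
# Illustrative BERT fine-tuning sketch; hyperparameters are assumptions.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Keep full sentences (no stop-word removal), as noted above.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

# train_ds / eval_ds are assumed Hugging Face Dataset objects with "text" and "label" columns.
# train_ds = train_ds.map(tokenize, batched=True)
# eval_ds = eval_ds.map(tokenize, batched=True)

args = TrainingArguments(output_dir="bert-fakenews",
                         num_train_epochs=2,
                         per_device_train_batch_size=16)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```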

3. Neural Networks

  • Neural Networks allowed us to capture more complex linguistic patterns through deep learning architectures built with TensorFlow/Keras.

Model A: Global Average Pooling

  • Uses word embeddings and averages them to learn the overall meaning of the article.
  • Serves as a simple and fast baseline deep learning model (sketched below).
  • Performance: 93.5% accuracy.
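
A minimal Keras sketch of Model A, assuming integer-encoded token sequences as input; vocabulary size, sequence length, and embedding dimension are illustrative.

```python
# Sketch of the average-embedding model; sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 20_000, 300, 64

model_a = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),         # integer-encoded token ids
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),  # learn word embeddings
    layers.GlobalAveragePooling1D(),          # average embeddings over the article
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # real vs. fake
])
model_a.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```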

Model B: 1D CNN

  • Uses embeddings combined with a convolutional layer to learn phrase-level patterns (n-grams).
  • Better at capturing tone and structural signals within the text (sketched below).
  • Performance: 96% accuracy.
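
A minimal Keras sketch of Model B, swapping the pooling-only architecture for a Conv1D layer so the model can pick up phrase-level (n-gram) patterns; the filter count, kernel size, and dropout rate are illustrative.

```python
# Sketch of the 1D CNN model; filter count, kernel size, and dropout are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 20_000, 300, 64

model_b = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Conv1D(128, kernel_size=5, activation="relu"),  # phrase-level (~5-gram) features
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                                    # helps curb overfitting
    layers.Dense(1, activation="sigmoid"),
])
model_b.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```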

Code Highlights

Accenture_1D_Model.ipynb

This notebook contains the full workflow/pipeline for building and evaluating our models for fake news detection.

Results and Key Findings

Successfully trained and evaluated three different models for fake news classification. Each model achieved strong performance, demonstrating that both traditional ML and deep learning architectures can effectively support misinformation detection tasks. The models show strong potential as screening or triage tools to assist human content moderators by flagging potentially misleading content for further review.

| Model | Accuracy |
| --- | --- |
| Logistic Regression | 74% |
| BERT | 94% |
| Average Embedding | 93.5% |
| CNN | 96% |

Discussion and Reflection

Throughout this project, our team found that different modeling approaches excelled for different reasons. Traditional machine learning models like Logistic Regression performed surprisingly well, especially when paired with TF-IDF, because they captured strong stylistic signals in the text. Deep learning models such as the neural networks and BERT also performed well: they captured phrase-level patterns effectively and leveraged contextual understanding to handle subtle word differences. Our deep learning models still show signs of overfitting, which underscores the importance of careful data exploration before model development.

Next Steps

  • Although our models achieved high accuracy, this may indicate remaining sources of data leakage. Our next step is to perform deeper cleaning and feature analysis to identify and remove any remaining unintended signals.
  • Re-train all models under stricter preprocessing conditions with a target accuracy of 70–75%, which likely reflects the dataset’s true difficulty once leakage is fully mitigated.

Acknowledgments

Special thanks to Accenture and Break Through Tech AI for making this project possible. We also express our deep appreciation to our coach, Jenna Hunte, and our challenge advisor, Abdul Wahab, for their expert guidance and mentorship throughout the project.
