Fake news and disinformation have become pervasive threats to societies, shaping public opinion, influencing political discourse, and eroding trust in credible information sources. The rapid evolution of misinformation tactics necessitates adaptive and robust detection mechanisms that go beyond traditional machine learning approaches.
FakeLenseV2 introduces an AI-powered fake news detection framework that integrates Natural Language Processing (NLP) and Deep Reinforcement Learning (DRL) to enhance classification accuracy, adaptability, and robustness. Unlike static classifiers, FakeLenseV2 iteratively refines its decision-making process, ensuring superior resilience against evolving misinformation strategies.
FakeLenseV1 was an LLM-driven fake news detection model that leveraged BERT for deep text comprehension and GPT for generative insights.
π FakeLenseV1 GitHub Repository
Building on this foundation, FakeLenseV2 introduces:
- β LLM-based embeddings for robust text representation
- β Deep Q-Networks (DQN) with residual learning for dynamic classification strategies
- β Adaptive reward mechanism to improve long-term learning efficiency
- β Multi-modal feature integration (text + source credibility + social signals)
- Leverages state-of-the-art LLMs (BERT, RoBERTa) for contextual embeddings
- Captures semantic nuances and linguistic patterns in deceptive content
- Surpasses traditional bag-of-words and TF-IDF methods
- Deep Q-Network (DQN) with residual connections for improved stability
- Double DQN (DDQN) to mitigate Q-value overestimation bias
- Target network smoothing (Ο = 0.005) for reduced training volatility
- Adaptive reward shaping that incentivizes correct classifications
- Source Credibility Scoring: Assigns reliability scores based on news source trustworthiness
- Social Engagement Metrics: Incorporates shares, likes, and public reception data
- Temporal Patterns: Analyzes propagation speed and viral characteristics
- GPU-accelerated for real-time detection
- Batch inference support for large-scale monitoring
- FastAPI REST API with automatic documentation
- Confidence scoring for all predictions
- Structured JSON logging with request tracking
- Rate limiting (100 req/min per IP)
- CORS support for cross-origin requests
- Docker support for easy deployment
- CI/CD pipeline with automated testing
- Integration-ready for social media platforms and fact-checking systems
FakeLenseV2/
βββ code/
β βββ models/ # Neural network architectures
β β βββ dqn.py # DQN and DQNResidual models
β β βββ vectorizer.py # BERT/RoBERTa text embedding
β βββ agents/ # Reinforcement learning agents
β β βββ fake_news_agent.py
β βββ utils/ # Utility modules
β β βββ config.py # Configuration management
β β βββ feature_extraction.py
β β βββ validators.py # Data validation
β β βββ exceptions.py # Custom exceptions
β βββ train.py # Training pipeline
β βββ inference.py # Inference engine
β βββ main.py # CLI interface
β βββ api_server.py # FastAPI REST API
βββ tests/ # Unit tests
βββ data/ # Training and test data
βββ models/ # Saved model checkpoints
βββ Dockerfile # Docker configuration
βββ docker-compose.yml # Docker Compose setup
βββ requirements.txt # Python dependencies
βββ setup.py # Package installation
βββ README_USAGE.md # Detailed usage guide
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Input Layer β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β News Article β β Source Info β β Social Data β β
β ββββββββ¬ββββββββ ββββββββ¬ββββββββ ββββββββ¬ββββββββ β
βββββββββββΌββββββββββββββββββΌββββββββββββββββββΌββββββββββββββββ
β β β
βΌ βΌ βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Feature Extraction β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β BERT/RoBERTa Embeddings (768-dim vectors) β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Source Credibility Encoding (trust scores) β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Social Reaction Normalization (engagement metrics) β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Deep Q-Network (DQN) Agent β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β State: [Text Embedding + Meta Features] β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Q-Network: 3-Layer ResNet with Layer Normalization β β
β β - Hidden Layer 1: 512 units + Residual β β
β β - Hidden Layer 2: 256 units + Residual β β
β β - Output Layer: 3 actions (Real/Suspicious/Fake) β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Target Network: Soft updates (Ο = 0.005) β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββ¬ββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Output Layer β
β Classification: {0: Fake, 1: Suspicious, 2: Real} β
β Confidence Score: [0.0 - 1.0] β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Text Tokenization & Embedding
- Transformer: BERT-base-uncased / RoBERTa-base
- Embedding Dimension: 768
- Pooling Strategy: [CLS] token or mean pooling
- Maximum Sequence Length: 512 tokensSource Credibility Encoding
- Credibility Database: Media bias ratings (MBFC, Ad Fontes)
- Score Range: [0.0, 1.0]
- High-trust sources: Reuters, BBC, AP News (0.9-1.0)
- Low-trust sources: Sensationalist outlets (0.0-0.3)Social Reaction Normalization
- Metrics: Shares, likes, comments, retweets
- Normalization: Min-Max scaling
- Feature: Social engagement velocityState Space
State = [Text_Embedding (768-dim), Source_Score (1-dim), Social_Score (1-dim)]
Total Dimension: 770Action Space
Actions = {
0: Classify as "Fake News",
1: Classify as "Suspicious",
2: Classify as "Real News"
}Reward Function
Reward(s, a, s') = {
+1.0 if correct classification
-0.5 if incorrect with low confidence
-1.0 if incorrect with high confidence (overconfidence penalty)
+0.2 bonus for correct "Suspicious" on ambiguous cases
}DQN Architecture
class ResidualDQN(nn.Module):
Input Layer: 770 β 512 (ReLU + LayerNorm)
Residual Block 1: 512 β 512 + skip connection
Residual Block 2: 512 β 256 + skip connection
Output Layer: 256 β 3 (Q-values for each action)- Experience Replay Buffer: 10,000 transitions
- Batch Size: 64
- Learning Rate: 1e-4 (Adam optimizer)
- Discount Factor (Ξ³): 0.99
- Exploration (Ξ΅-greedy): Ξ΅ starts at 1.0, decays to 0.01
- Target Network Update: Soft update with Ο = 0.005
- Early Stopping: Patience = 15 episodes
- Python 3.8 or higher
- CUDA 11.0+ (for GPU acceleration)
- 8GB+ RAM recommended
git clone https://github.com/Navy10021/FakeLenseV2.git
cd FakeLenseV2# Using venv
python -m venv fakelense_env
source fakelense_env/bin/activate # On Windows: fakelense_env\Scripts\activate
# Or using conda
conda create -n fakelense python=3.8
conda activate fakelense# Install dependencies
pip install -r requirements.txt
# Or install as a package (recommended)
pip install -e .Core Dependencies:
torch>=1.12.0
transformers>=4.20.0
numpy>=1.21.0
pandas>=1.3.0
scikit-learn>=1.0.0
matplotlib>=3.4.0
seaborn>=0.11.0
tqdm>=4.62.0
fastapi>=0.100.0
uvicorn>=0.23.0
pydantic>=2.0.0
slowapi>=0.1.9 # For rate limiting# Using Docker Compose (recommended)
docker-compose up fakelense-api
# Or build manually
docker build -t fakelensev2:latest .
docker run -p 8000:8000 fakelensev2:latestπ For detailed usage examples, see README_USAGE.md
# Training
python -m code.main train --config config.example.json
# Evaluation
python -m code.main evaluate --model models/best_model.pth
# Single Prediction
python -m code.main infer \
--model models/best_model.pth \
--text "Scientists discover new planet..." \
--source "Reuters" \
--reactions 5000from code.inference import InferenceEngine
# Initialize engine (loads model once)
engine = InferenceEngine("models/best_model.pth")
# Make prediction
prediction = engine.predict(
text="Scientists at NASA announce discovery...",
source="Reuters",
social_reactions=5000
)
# 0=Fake, 1=Suspicious, 2=Real
print(f"Prediction: {prediction}")# Start server
python -m code.api_server
# Or using Docker
docker-compose up fakelense-apiMake API requests:
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{
"text": "Breaking news article...",
"source": "Reuters",
"social_reactions": 5000
}'Response:
{
"prediction": 2,
"label": "Real News",
"confidence": 0.95,
"all_probabilities": {
"fake": 0.02,
"suspicious": 0.03,
"real": 0.95
},
"request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}API Documentation:
- Swagger UI: http://localhost:8000/docs
- Detailed Examples: See API_EXAMPLES.md
# Start API server
docker-compose up fakelense-api
# Run training (one-time)
docker-compose run --rm fakelense-train
# View logs
docker-compose logs -f fakelense-apiFakeLenseV2 achieved 97.2% accuracy on the benchmark dataset, demonstrating state-of-the-art performance in fake news detection.
| Metric | Score |
|---|---|
| Accuracy | 97.2% |
| Precision | 96.8% |
| Recall | 97.5% |
| F1-Score | 97.1% |
Predicted
Fake Sus Real
Actual Fake [485 12 3]
Sus [ 8 290 15]
Real [ 2 11 487]
Performance impact of key components:
| Configuration | Accuracy |
|---|---|
| Full Model (FakeLenseV2) | 97.2% |
| Without Reinforcement Learning | 93.5% |
| Without Source Credibility | 94.8% |
| Without Social Metrics | 95.2% |
| Without BERT Embeddings | 89.1% |
| Baseline (Traditional ML) | 85.3% |
| Model | Accuracy | F1-Score |
|---|---|---|
| FakeLenseV2 (Ours) | 97.2% | 97.1% |
| FakeLenseV1 | 95.8% | 95.4% |
| BERT-Only Classifier | 94.2% | 93.8% |
| LSTM + Attention | 91.5% | 90.9% |
| Random Forest | 87.3% | 86.5% |
| Logistic Regression | 82.1% | 81.3% |
FakeLenseV2 has been trained and evaluated on multiple benchmark datasets:
-
LIAR Dataset (Wang, 2017)
- 12,836 labeled statements
- 6 fine-grained labels for truthfulness
-
FakeNewsNet (Shu et al., 2018)
- PolitiFact: 314 news articles
- GossipCop: 5,464 news articles
- Includes social context features
-
ISOT Fake News Dataset
- 44,898 articles (21,417 real + 23,481 fake)
- Collected from Reuters and unreliable sources
# Example preprocessing pipeline
from preprocessing import prepare_dataset
train_data, test_data = prepare_dataset(
dataset_path='./data/raw/',
train_split=0.8,
max_length=512,
include_metadata=True
)- β Deep Q-Learning integration
- β Multi-modal feature fusion
- β Source credibility scoring
- β Social engagement analysis
- π Multi-language support (Korean, Spanish, French)
- π Explainable AI (XAI) module with attention visualization
- π Active learning for continuous model improvement
- π Graph Neural Networks for propagation pattern analysis
- π Adversarial robustness testing and defense mechanisms
- π Cross-platform integration (Twitter, Facebook, YouTube APIs)
- π Real-time monitoring dashboard
- π Federated learning for privacy-preserving training
- README_USAGE.md - Detailed usage guide with examples
- IMPROVEMENTS.md - Complete changelog and improvements
- CONTRIBUTING.md - Contribution guidelines
- API Documentation - Interactive API docs (when server is running)
We welcome contributions from the community!
π See CONTRIBUTING.md for detailed guidelines
# 1. Fork and clone
git clone https://github.com/YOUR_USERNAME/FakeLenseV2.git
cd FakeLenseV2
# 2. Install development dependencies
pip install -e ".[dev]"
# 3. Run tests
pytest tests/ -v
# 4. Create a branch and make changes
git checkout -b feature/your-feature-name
# 5. Submit pull request- π Report bugs via Issues
- π‘ Suggest features
- π Improve documentation
- π§ Submit pull requests
- β Star the repository!
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 Navy Lee, Seoul National University
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
- Navy Lee - Principal Investigator
- Seoul National University Graduate School of Data Science (SNU GSDS)
- Hugging Face team for the Transformers library
- PyTorch team for the deep learning framework
- The open-source community for valuable feedback
If you use FakeLenseV2 in your research, please cite:
@software{fakelensev2_2024,
author = {Lee, Navy},
title = {FakeLenseV2: An AI-Powered Fake News Detection System Integrating LLMs and Deep Reinforcement Learning},
year = {2024},
publisher = {GitHub},
url = {https://github.com/Navy10021/FakeLenseV2}
}For questions, suggestions, or collaborations:
- GitHub Issues: Create an issue
- Email: [iyunseob4@gmail.com]
Made with β€οΈ by the SNU GSDS Research Team