CrisisWatch-NLP is a social media monitoring tool that uses NLP, sentiment analysis, and geospatial mapping to detect and visualize crisis-related discussions on Reddit, including mental health distress, suicidality, and substance use.

Precioux/CrisisWatch-NLP

🧠 CrisisWatch-NLP

Crisis Detection using NLP, Sentiment Analysis & Geospatial Mapping

CrisisWatch-NLP is a social media analysis tool. The goal is to build a pipeline that detects, analyzes, and visualizes crisis-related discussions (e.g., suicidality, mental health distress, substance use) from Reddit using advanced NLP techniques and geospatial tools.


📌 Objectives

  • 🔍 Extract crisis-related posts from Reddit based on keyword filters and location cues
  • 💬 Analyze sentiment using VADER
  • 🧠 Assess crisis risk levels using BERT-based semantic similarity
  • 🗺️ Visualize regional distress patterns on an interactive map

✅ Tasks Completed

📥 Task 1: Social Media Data Extraction & Preprocessing

  • Collected posts from mental health subreddits through Reddit's API, using the praw client library
  • Filtered posts using a keyword list (e.g., "i want to die", "relapse")
  • Cleaned text using nltk (stopwords, punctuation, emojis removed)
  • Output: raw_reddit_posts.csv, cleaned_reddit_posts.csv
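The filtering and cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the keyword list contains only the two phrases quoted above, `STOPWORDS` is a tiny hand-written stand-in for nltk's English stopword list, and the emoji pattern and function names are assumptions.

```python
import re
import string

# Hypothetical keyword list; the README only quotes these two example phrases.
CRISIS_KEYWORDS = ["i want to die", "relapse"]

# Tiny stand-in for nltk's English stopword list (assumption, avoids a download).
STOPWORDS = {"i", "to", "the", "a", "an", "and", "is", "am", "my", "of", "in", "it"}

# Rough emoji matcher covering the main emoji blocks (an approximation).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def matches_keywords(text, keywords=CRISIS_KEYWORDS):
    """Return True if any keyword phrase occurs in the lowercased text."""
    lowered = text.lower()
    return any(kw in lowered for kw in keywords)


def clean_text(text):
    """Lowercase, strip emojis and ASCII punctuation, then drop stopwords."""
    text = EMOJI_RE.sub(" ", text.lower())
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


# Made-up sample posts, for illustration only.
posts = [
    "I want to die, nothing helps 😞",
    "Great hike today!",
    "Afraid I will relapse again.",
]
kept = [p for p in posts if matches_keywords(p)]   # filter on the raw text
cleaned = [clean_text(p) for p in kept]            # then clean what remains
```

Filtering runs before cleaning in this sketch because stopword removal would break multi-word phrases such as "i want to die".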

💬 Task 2: Sentiment & Crisis Risk Classification

  • Applied VADER sentiment scoring (Positive, Neutral, Negative)
  • Embedded post text and high-risk phrases using BERT (sentence-transformers)
  • Calculated semantic similarity to label risk levels:
    • 🟥 High-Risk
    • 🟧 Moderate Concern
    • 🟩 Low Concern
  • Output: reddit_bert_risk_labeled.csv, distribution plots
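One way the two labelling steps might fit together is sketched below. The README does not state its thresholds, so the similarity cutoffs (`high=0.6`, `moderate=0.4`) are hypothetical; the ±0.05 cutoffs are the ones commonly suggested for VADER's compound score. The toy 2-D vectors stand in for real sentence-transformers embeddings.

```python
import math


def sentiment_label(compound):
    """Bucket a VADER compound score using the commonly suggested +/-0.05 cutoffs."""
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def risk_label(post_vec, phrase_vecs, high=0.6, moderate=0.4):
    """Label a post by its best similarity to any high-risk phrase embedding.

    The `high` and `moderate` thresholds are illustrative assumptions.
    """
    best = max(cosine_similarity(post_vec, p) for p in phrase_vecs)
    if best >= high:
        return "High-Risk"
    if best >= moderate:
        return "Moderate Concern"
    return "Low Concern"


# Toy embeddings standing in for encoded high-risk phrases and posts.
phrase_vecs = [[1.0, 0.0]]
labels = [risk_label(v, phrase_vecs) for v in ([1.0, 0.1], [0.5, 1.0], [0.0, 1.0])]
```

In practice the thresholds would need tuning against labelled examples, since similarity scores vary with the embedding model.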

🌍 Task 3: Crisis Geolocation & Mapping

  • Extracted city/state names using spaCy NER (GPE labels)
  • Geocoded locations with geopy.Nominatim
  • Created interactive Folium heatmap of distress discussions
  • Displayed Top 5 crisis-prone locations
  • Output: crisis_heatmap.html
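The aggregation behind the Top 5 list and the heatmap could look something like this sketch. The place names, mention list, approximate coordinates, and function names are all made up for illustration; in the actual pipeline the mentions would come from spaCy GPE entities and the coordinates from geopy.

```python
from collections import Counter

# Hypothetical geocoding results: place name -> approximate (latitude, longitude).
geocoded = {
    "Austin": (30.2672, -97.7431),
    "Seattle": (47.6062, -122.3321),
    "Denver": (39.7392, -104.9903),
}

# Hypothetical location mentions, one entry per post that names a place.
mentions = ["Austin", "Seattle", "Austin", "Denver", "Austin", "Seattle"]


def top_locations(mentions, n=5):
    """Return the n most frequently mentioned places with their counts."""
    return Counter(mentions).most_common(n)


def heatmap_rows(mentions, coords):
    """Build [lat, lon, weight] rows, skipping places that were not geocoded."""
    counts = Counter(m for m in mentions if m in coords)
    return [[coords[name][0], coords[name][1], count] for name, count in counts.items()]


top5 = top_locations(mentions)
rows = heatmap_rows(mentions, geocoded)
```

Rows in `[lat, lon, weight]` form are the shape that `folium.plugins.HeatMap` accepts as its data argument, so `rows` could be passed to it directly when building the map.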

🛠️ Tech Stack

| Component     | Tool / Library                         |
|---------------|----------------------------------------|
| Data Source   | Reddit via praw                        |
| NLP           | nltk, spaCy, sentence-transformers     |
| Sentiment     | vaderSentiment                         |
| Embeddings    | BERT (all-MiniLM-L6-v2)                |
| Geocoding     | geopy                                  |
| Mapping       | folium                                 |
| Visualization | matplotlib, seaborn                    |
| Environment   | Google Colab                           |

▶️ How to Run

  1. Open the notebook in Google Colab
  2. Run code cells step-by-step:
    • Extract posts
    • Clean text
    • Run sentiment & BERT classification
    • Extract and map locations
  3. Download outputs:
    • CSV files
    • crisis_heatmap.html

📁 Project Structure

/crisiswatch-nlp/
├── raw_reddit_posts.csv
├── cleaned_reddit_posts.csv
├── reddit_bert_risk_labeled.csv
├── crisis_heatmap.html
├── README.md
└── [Colab Notebook].ipynb
