Wat’sTheStory Data Analysis Project
Description
This project explores and analyzes the Wat’sTheStory dataset to uncover trends in cybersecurity, technology, stock markets, marketing, and Irish news. By transforming a messy dataset into a structured format, the analysis extracts meaningful insights, identifies correlations between topics, and highlights emerging trends using text-analysis techniques such as keyword extraction and sentiment analysis.
Key Features
1. Data Cleaning and Preprocessing:
   • Cleaned the dataset using Python and Pandas to handle missing values, remove inconsistencies, and standardize text.
   • Added meaningful features such as word count and cleaned summaries for deeper analysis.
2. Keyword Analysis:
   • Extracted dominant keywords using Scikit-learn’s CountVectorizer to uncover thematic patterns across topics.
3. Sentiment Analysis:
   • Classified sentiment as positive, neutral, or negative using TextBlob, providing insight into the tone of coverage across categories.
4. Visualizations:
   • Generated word clouds, topic-wise trends, and sentiment distributions using Matplotlib, Plotly, and WordCloud.
5. Emerging Trends:
   • Explored startup growth in FinTech and AI, revealing a significant spike in September 2024 tied to industry events.
6. Topic Correlations:
   • Analyzed overlaps between topics such as tech news and cybersecurity, uncovering how closely they are interconnected.
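The cleaning and feature-engineering step described above could look roughly like the sketch below. It is not the project's actual code: the function name clean_articles and the column names "topic" and "summary" are hypothetical placeholders for whatever the real dataset uses. It drops rows with no summary, normalizes topic labels, strips leftover HTML and extra whitespace, adds a word_count feature, and removes duplicates.

```python
import pandas as pd


def clean_articles(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw article rows and add simple text features.

    Hypothetical schema: "topic" and "summary" columns are assumed.
    """
    # Drop rows with no summary text; they cannot be analysed.
    out = df.dropna(subset=["summary"]).copy()
    # Standardise topic labels, e.g. " Tech News " -> "tech news".
    out["topic"] = out["topic"].astype(str).str.strip().str.lower()
    # Strip leftover HTML tags, collapse runs of whitespace, lowercase.
    out["clean_summary"] = (
        out["summary"]
        .astype(str)
        .str.replace(r"<[^>]+>", " ", regex=True)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
        .str.lower()
    )
    # Word count as a simple engineered feature.
    out["word_count"] = out["clean_summary"].str.split().str.len()
    # Keep non-empty summaries and drop exact duplicates within a topic.
    out = out[out["word_count"] > 0].drop_duplicates(subset=["topic", "clean_summary"])
    return out.reset_index(drop=True)


if __name__ == "__main__":
    raw = pd.DataFrame(
        {
            "topic": [" Cybersecurity", "Tech News ", "Tech News ", "Irish News"],
            "summary": [
                "New <b>ransomware</b> strain   targets hospitals",
                "AI startup raises funding",
                "AI startup raises funding",  # exact duplicate, removed
                None,  # missing summary, dropped
            ],
        }
    )
    print(clean_articles(raw))
```

Working on a copy keeps the raw export untouched, so cleaning rules can be adjusted and re-run without reloading the data.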
Technologies Used
• Python: Core language for data manipulation and analysis.
• Pandas: For data cleaning and feature engineering.
• Scikit-learn: For keyword extraction and thematic analysis.
• TextBlob: For sentiment analysis.
• Matplotlib, Plotly, WordCloud: For creating insightful visualizations.
Key Insights
1. Emerging Startups:
   • Identified FinTech and AI startups as the leading sectors for growth, with a significant spike in September 2024.
2. Sentiment Trends:
   • Positive sentiment dominated tech news, reflecting optimism about innovation, while cybersecurity leaned negative, highlighting risk concerns.
3. Word Cloud Analysis:
   • Highlighted recurring themes such as data, security, privacy, and AI across categories.
4. Topic Overlaps:
   • Strong connections between tech news and cybersecurity demonstrated the ripple effects of technological advancements on security.
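A per-topic sentiment comparison like the one behind the sentiment trends above could be computed along these lines. TextBlob reports polarity in the range -1 to 1; the ±0.05 cut-off for "neutral", the function names, and the "topic" and "clean_summary" columns are all assumptions for illustration, not values taken from the project. The scorer is passed in as a parameter so the labelling logic can be checked without TextBlob installed.

```python
from collections.abc import Callable

import pandas as pd


def label_polarity(polarity: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a coarse label (threshold assumed)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def textblob_polarity(text: str) -> float:
    # Imported lazily so the rest of the module works without TextBlob.
    from textblob import TextBlob

    return TextBlob(text).sentiment.polarity


def sentiment_share_by_topic(
    df: pd.DataFrame, score: Callable[[str], float] = textblob_polarity
) -> pd.DataFrame:
    """Fraction of positive/neutral/negative summaries per topic (rows sum to 1)."""
    labels = df["clean_summary"].map(score).map(label_polarity).rename("sentiment")
    return pd.crosstab(df["topic"], labels, normalize="index")
```

With the default scorer, sentiment_share_by_topic(cleaned_df) returns one row per topic, which can be passed straight to a Plotly or Matplotlib stacked bar chart to show how tone differs between categories.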
Future Work
• Incorporate real-time data for dynamic insights.
• Leverage advanced NLP models such as BERT for more robust sentiment and thematic analysis.
• Expand to additional topics such as healthcare or climate change for broader insights.
Call to Action
Explore the insights and discover how structured data analysis turns raw news into insight:
• Visit the Wat’sTheStory platform: https://www.watsthestory.ie/