
Exploring and analyzing the Wat’sTheStory dataset to uncover trends in cybersecurity, technology, startups, and more using Python


Mugeshgithub/Wat-sTheStory-DataAnalysis.



Wat’sTheStory Data Analysis Project

Description

This project focuses on exploring and analyzing the Wat’sTheStory dataset to uncover trends in cybersecurity, technology, stock markets, marketing, and Irish news. By transforming a messy dataset into a structured format, the analysis aims to extract meaningful insights, identify correlations, and highlight emerging trends using advanced data analysis techniques.

Key Features

1. Data Cleaning and Preprocessing:
   • Cleaned the dataset using Python and Pandas to handle missing values, remove inconsistencies, and standardize text.
   • Added meaningful features like word count and cleaned summaries for deeper analysis.
2. Keyword Analysis:
   • Extracted dominant keywords using Scikit-learn’s CountVectorizer to uncover thematic patterns across topics.
3. Sentiment Analysis:
   • Classified sentiments as positive, neutral, or negative using TextBlob, providing insights into public mood across categories.
4. Visualizations:
   • Generated word clouds, topic-wise trends, and sentiment distributions using Matplotlib, Plotly, and WordCloud.
5. Emerging Trends:
   • Explored startup growth in FinTech and AI, revealing a significant spike in September 2024 tied to industry events.
6. Topic Correlations:
   • Analyzed overlaps between topics like tech news and cybersecurity, uncovering their interconnected nature.
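The cleaning and feature steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names `topic` and `summary`, the `clean_articles` helper, and the sample rows are all assumptions made for the example.

```python
import pandas as pd


def clean_articles(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy with `clean_summary` and `word_count` columns.

    Assumes hypothetical columns `topic` and `summary`.
    """
    # Rows without any summary text cannot be analyzed, so drop them.
    out = df.dropna(subset=["summary"]).copy()

    # Fill missing topics with a placeholder and standardize their casing.
    out["topic"] = out["topic"].fillna("unknown").str.strip().str.lower()

    # Standardize text: lowercase, replace punctuation with spaces,
    # collapse repeated whitespace, and trim the ends.
    out["clean_summary"] = (
        out["summary"]
        .str.lower()
        .str.replace(r"[^a-z0-9\s]", " ", regex=True)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )

    # Feature: number of whitespace-separated words in the cleaned summary.
    out["word_count"] = out["clean_summary"].str.split().str.len()

    # Drop rows that became empty after cleaning, plus exact duplicates.
    out = out[out["word_count"] > 0].drop_duplicates(subset=["clean_summary"])
    return out.reset_index(drop=True)


# Made-up sample rows, only to exercise the helper.
sample = pd.DataFrame(
    {
        "topic": [" Cybersecurity", "Tech News", None, "Tech News"],
        "summary": [
            "Data breach hits  retailer!",
            None,
            "AI startup raises funds.",
            "   ",
        ],
    }
)

cleaned = clean_articles(sample)
print(cleaned[["topic", "clean_summary", "word_count"]])
```

Keeping the raw `summary` column next to `clean_summary` makes it easy to spot-check what the normalization removed.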

Technologies Used

• Python: Core language for data manipulation and analysis.
• Pandas: For data cleaning and feature engineering.
• Scikit-learn: For keyword extraction and thematic analysis.
• TextBlob: For sentiment analysis.
• Matplotlib, Plotly, WordCloud: For creating insightful visualizations.

Key Insights

1. Emerging Startups:
   • Identified FinTech and AI startups as the leading sectors for growth, with a significant spike in September 2024.
2. Sentiment Trends:
   • Positive sentiment dominated tech news, reflecting optimism for innovation, while cybersecurity leaned negative, highlighting risk concerns.
3. Word Cloud Analysis:
   • Highlighted recurring themes like data, security, privacy, and AI across categories.
4. Topic Overlaps:
   • Strong connections between tech news and cybersecurity demonstrated the ripple effects of technological advancements on security.
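The positive, neutral, and negative labels behind the sentiment trends can be derived from TextBlob's polarity score, which ranges from -1 to 1. The sketch below is one plausible way to do that. The 0.05 threshold and both helper names are assumptions, not the project's documented settings.

```python
def label_from_polarity(polarity: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label.

    The threshold is an assumed cut-off; scores within
    [-threshold, threshold] are treated as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def classify(text: str) -> str:
    """Label a text using TextBlob's polarity score."""
    # Imported here so the thresholding helper works without TextBlob installed.
    from textblob import TextBlob

    return label_from_polarity(TextBlob(text).sentiment.polarity)


# Thresholding on hand-picked polarity values (no TextBlob needed).
for score in (0.4, 0.0, -0.3):
    print(score, label_from_polarity(score))
```

With TextBlob installed, `classify(summary)` can be applied to each cleaned summary, and the resulting labels grouped by topic to compare categories such as tech news and cybersecurity.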

Future Work

• Incorporate real-time data for dynamic insights.
• Leverage advanced NLP models like BERT for more robust sentiment and thematic analysis.
• Expand to additional topics like healthcare or climate change for broader insights.

Call to Action

Explore the insights and discover how structured data analysis transforms news:

• Visit the Wat’sTheStory platform: https://www.watsthestory.ie/
