A powerful Spring Boot application with a professional Swing GUI for scraping news websites and analyzing articles with AI-powered sentiment analysis.
- Smart Link Extraction: Scrapes and lists latest news articles from websites
- Content Preview: Click any link to view full article content
- Image Display: Shows article images with proper loading and scaling
- News Focus: Filters out navigation/footer links, shows only articles
- Performance Optimized: Background loading prevents UI freezing
- Professional UI: Modern blue color scheme with numbered article list
- Detailed Article Parsing: Extract headline, author, publish date, and content
- Sentiment Analysis: AI-powered emotion detection (Positive/Negative/Neutral)
- Word Count: Automatic article statistics
- Image Extraction: Finds and displays article images
- Keyword Analysis: Shows positive/negative sentiment keywords
- Professional UI: Modern green color scheme with structured report format
- CSV Export: Export individual article analysis to CSV format
- PDF Export: Generate professional PDF reports with complete analysis
- Batch Analysis: Analyze multiple articles and store them in memory
- Batch Export: Export all analyzed articles at once to CSV or PDF
- Flexible Workflow: Add articles to batch, export when ready, or clear batch
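The "News Focus" filtering listed above can be pictured as a simple URL and link-text heuristic. The sketch below is a hypothetical illustration in plain Java, not the application's actual filter: the class name `ArticleLinkFilter`, the blocked path segments, and the minimum link-text length are all assumptions. Only the cap of 25 links comes from this README.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical heuristic for separating article links from navigation/footer links. */
final class ArticleLinkFilter {

    // Assumed path segments that usually indicate non-article pages.
    private static final Set<String> BLOCKED_SEGMENTS = Set.of(
            "about", "contact", "privacy", "terms", "login", "signup", "help", "cookies");

    // Mirrors the documented limit of 25 article links.
    private static final int MAX_LINKS = 25;

    /** Returns true when the link looks like an article rather than site chrome. */
    static boolean looksLikeArticle(String href, String linkText) {
        if (href == null || linkText == null) {
            return false;
        }
        // Assumption: real headlines are rarely shorter than 20 characters.
        if (linkText.trim().length() < 20) {
            return false;
        }
        String path;
        try {
            path = URI.create(href).getPath();
        } catch (IllegalArgumentException e) {
            return false; // malformed URL
        }
        if (path == null || path.length() <= 1) {
            return false; // home page, or a non-hierarchical URI such as mailto:
        }
        for (String segment : path.toLowerCase(Locale.ROOT).split("/")) {
            if (BLOCKED_SEGMENTS.contains(segment)) {
                return false;
            }
        }
        return true;
    }

    /** Keeps article-like hrefs in iteration order, up to MAX_LINKS. */
    static List<String> filter(Map<String, String> linkTextByHref) {
        return linkTextByHref.entrySet().stream()
                .filter(e -> looksLikeArticle(e.getKey(), e.getValue()))
                .map(Map.Entry::getKey)
                .limit(MAX_LINKS)
                .collect(Collectors.toList());
    }
}
```

Passing a `LinkedHashMap` of href to link text keeps the page order, so the numbered list in the UI would follow the order in which links appear on the site.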
The application features a professional, industry-standard UI with:
- Modern Color Scheme: Blue and green themes with professional grays
- Intuitive Icons: Emoji icons for better visual recognition
- Clear Feedback: Loading states, success/error messages with helpful suggestions
- Responsive Layout: Split panes with adjustable dividers
- Enhanced Typography: Clear fonts and proper spacing throughout
- Hover Tooltips: Full information on hover for truncated text
- Numbered Lists: Easy-to-follow article enumeration
- Status Bar: Real-time application status with tips
- Spring Boot 3.4.8 - Application framework
- Java Swing - Desktop GUI with custom styling
- JSoup 1.18.1 - HTML parsing and web scraping
- Apache HTTP Client - HTTP connections
- Apache Commons CSV 1.10.0 - CSV export functionality
- iText7 7.2.5 - PDF generation and export
- Java 17 - Runtime environment
- Java 17 or higher
- Maven 3.6+
- Internet connection for web scraping
mvn spring-boot:run
mvn clean package
java -jar target/web-scraper-app-1.0.0.jar
Run the WebScraperApplication.java main class
- BBC News (https://www.bbc.com/)
- CNN (https://www.cnn.com/)
- Reuters (https://www.reuters.com/)
- NBC News (https://www.nbcnews.com/)
- The Guardian (https://www.theguardian.com/)
- Telegraph India
- Many paywalled news sites
- Sites with heavy JavaScript content loading
- Enter a news website URL (e.g., https://www.bbc.com/)
- Click the "Get Latest Articles" button to scrape
- Browse the numbered list of articles on the left
- Select any article to view its content and images
- Images load automatically in the background
- Clear success/error messages guide you throughout
- Paste a specific article URL in the input field
- Click the "Analyze with AI" button
- View the comprehensive analysis in the left panel:
  - Headline with hover for full text
  - Author information
  - Publication Date
  - Sentiment Analysis with emoji indicator and color coding
- Read the structured report in the main area:
  - Formatted headline and metadata
  - Sentiment analysis with score
  - Sentiment keywords (positive/negative)
  - Full article content
- View extracted images below the content
- All processing happens in the background for smooth experience
Exporting Single Articles:
- After analyzing an article, use the export buttons in the left panel
- Click "CSV" to export to CSV format
- Click "PDF" to export to PDF format
- Choose the save location in the file dialog
- Get confirmation when export is successful
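As a rough picture of what the CSV export writes, the sketch below builds one header row and one data row. The application itself uses Apache Commons CSV; this version uses only the JDK so that it stays self-contained. The class name `CsvSketch` and the column order are assumptions, since the real export columns are not listed here.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical CSV row writer; the real application uses Apache Commons CSV. */
final class CsvSketch {

    /** Quotes a field when it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        // Embedded double quotes are doubled, as in RFC 4180.
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Joins already-ordered fields into one CSV line. */
    static String toRow(List<String> fields) {
        return fields.stream().map(CsvSketch::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Assumed column order, for illustration only.
        System.out.println(toRow(List.of(
                "Headline", "Author", "Published", "Sentiment", "Score", "Words")));
        System.out.println(toRow(List.of(
                "Breaking: Major Economic Policy Changes Announced",
                "John Smith", "2024-08-07 10:30:00", "Negative", "-0.23", "847")));
    }
}
```

Quoting matters most for the headline and content columns, which often contain commas and quotation marks.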
Batch Analysis Workflow:
1. Analyze an article as usual
2. Click "Add to Batch" to store it for batch processing
3. Repeat steps 1-2 for multiple articles
4. Click "Export Batch" when ready
5. Choose CSV or PDF format
6. All articles are exported to a single file
7. Use "Clear Batch" to start fresh
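The batch workflow above amounts to an in-memory list with three operations: add, export, and clear. The following is a minimal sketch under that assumption. `ArticleBatch`, the `Analysis` record, and its fields are hypothetical names, not the application's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory batch of analyzed articles. */
final class ArticleBatch {

    /** Minimal stand-in for an analysis result; the field names are assumptions. */
    record Analysis(String headline, double sentimentScore, int wordCount) {}

    private final List<Analysis> items = new ArrayList<>();

    /** "Add to Batch": stores the current analysis in memory. */
    void add(Analysis analysis) {
        items.add(analysis);
    }

    /** "Export Batch": returns an immutable copy so later changes cannot affect an export in progress. */
    List<Analysis> snapshot() {
        return List.copyOf(items);
    }

    /** "Clear Batch": empties the batch so a new one can be started. */
    void clear() {
        items.clear();
    }

    /** Number of articles currently held in the batch. */
    int size() {
        return items.size();
    }
}
```

Because the batch lives only in memory, anything not exported is lost when the application closes.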
Benefits of Batch Analysis:
- Compare multiple articles at once
- Generate consolidated reports
- Save time with bulk exports
- Perfect for research and analysis tasks
The built-in sentiment analyzer provides:
- Word-based analysis of each article's emotional tone
- Scores from -1.0 (most negative) to +1.0 (most positive)
- Color coding: 🟢 Positive (Green), 🔴 Negative (Red), 🔵 Neutral (Blue)
- Emoji indicators for positive, negative, and neutral sentiment
- Keyword detection shows sentiment-bearing words found in the article
- Statistical analysis with word count and sentiment score metrics
- Structured report format with clear sections and formatting
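To make the word-based scoring concrete, here is a minimal sketch of one way to produce a score between -1.0 and +1.0 from keyword hits. It is not the application's actual algorithm. The word lists, the normalization formula, and the neutral band of 0.1 are all assumptions chosen for illustration.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical word-list sentiment scorer producing a score in [-1.0, +1.0]. */
final class SentimentSketch {

    // Tiny illustrative word lists; the application's real lists are not shown in this README.
    private static final Set<String> POSITIVE = Set.of("progress", "improve", "success", "growth");
    private static final Set<String> NEGATIVE = Set.of("crisis", "problem", "decline", "concern");

    // Assumed cut-off: scores within 0.1 of zero count as neutral.
    private static final double NEUTRAL_BAND = 0.1;

    /** Score = (positive hits - negative hits) / (positive hits + negative hits), or 0.0 with no hits. */
    static double score(String text) {
        int positive = 0;
        int negative = 0;
        // Split on anything that is not a lowercase ASCII letter.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(word)) {
                positive++;
            } else if (NEGATIVE.contains(word)) {
                negative++;
            }
        }
        int hits = positive + negative;
        return hits == 0 ? 0.0 : (double) (positive - negative) / hits;
    }

    /** Maps a score to the three labels used in the UI. */
    static String label(double score) {
        if (score > NEUTRAL_BAND) {
            return "Positive";
        }
        if (score < -NEUTRAL_BAND) {
            return "Negative";
        }
        return "Neutral";
    }
}
```

Dividing by the number of sentiment-bearing words keeps the score inside the documented range regardless of article length. Other normalizations, such as dividing by the total word count, would give smaller magnitudes.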
- Connection timeout: 5-8 seconds
- Read timeout: 8-10 seconds
- Max article links: 25 (for performance)
- Max images per article: 3-5
- Max image size: 300x200px (scaled automatically)
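The automatic scaling to 300x200px can be sketched as an aspect-ratio-preserving fit. The class name `ImageFitSketch` is hypothetical, and the choice not to enlarge images that are already smaller than the limit is an assumption.

```java
import java.awt.Dimension;

/** Hypothetical helper that fits an image inside the documented 300x200 limit. */
final class ImageFitSketch {

    static final int MAX_WIDTH = 300;
    static final int MAX_HEIGHT = 200;

    /** Returns the scaled size, keeping the aspect ratio and never upscaling. */
    static Dimension fit(int width, int height) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("Image dimensions must be positive");
        }
        // Use the tighter of the two constraints; cap at 1.0 so small images keep their size.
        double scale = Math.min(1.0,
                Math.min((double) MAX_WIDTH / width, (double) MAX_HEIGHT / height));
        int scaledWidth = Math.max(1, (int) Math.round(width * scale));
        int scaledHeight = Math.max(1, (int) Math.round(height * scale));
        return new Dimension(scaledWidth, scaledHeight);
    }
}
```

For example, a 1200x800 image fits exactly at 300x200, while a square 600x600 image is limited by the height and becomes 200x200.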
The application uses proper browser headers to avoid blocking:
- Modern Chrome User-Agent
- Accept headers for HTML/images
- Referer headers for legitimacy
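A request carrying these headers might look like the sketch below. It uses the JDK's built-in `java.net.http` client as a stand-in, whereas the application itself relies on JSoup and Apache HTTP Client. The User-Agent string, the Accept value, and the specific timeouts (picked from the documented 5-8 and 8-10 second ranges) are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical builder for browser-like page requests. */
final class BrowserLikeRequest {

    // Example Chrome-style User-Agent; the exact string the application sends is an assumption.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36";

    /** Client with a connection timeout at the top of the documented 5-8 second range. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(8))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** GET request with User-Agent, Accept, and Referer headers set. */
    static HttpRequest pageRequest(String url, String referer) {
        return HttpRequest.newBuilder(URI.create(url))
                // Request timeout: the closest JDK equivalent to the documented read timeout.
                .timeout(Duration.ofSeconds(10))
                .header("User-Agent", USER_AGENT)
                .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
                .header("Referer", referer)
                .GET()
                .build();
    }
}
```

A caller would send it with `newClient().send(request, HttpResponse.BodyHandlers.ofString())` and then check the status code before parsing the body.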
Add to your pom.xml:

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.16.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents.client5</groupId>
            <artifactId>httpclient5</artifactId>
        </dependency>
    </dependencies>

The application handles common issues:
- 403 Forbidden: Website blocks automated requests
- Connection timeouts: Network or server issues
- SSL errors: Certificate problems with HTTPS sites
- Image loading failures: Graceful fallbacks with error messages
- Content extraction failures: Clear user feedback
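One way to turn these failure cases into clear user feedback is a small mapping from status codes and exceptions to messages. The sketch below is illustrative only: the class name `ScrapeErrorMessages` and the message wording are assumptions, not the application's actual text.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import javax.net.ssl.SSLException;

/** Hypothetical mapping from scraping failures to user-facing messages. */
final class ScrapeErrorMessages {

    /** Message for an HTTP status code returned by the site. */
    static String forStatus(int statusCode) {
        if (statusCode == 403) {
            return "403 Forbidden: the website blocks automated requests. Try a different site.";
        }
        if (statusCode >= 500) {
            return "Server error (" + statusCode + "): the site may be down. Try again later.";
        }
        return "Unexpected HTTP status " + statusCode + ".";
    }

    /** Message for an I/O failure raised while connecting or reading. */
    static String forException(IOException e) {
        if (e instanceof SocketTimeoutException) {
            return "Connection timed out: check your network or try again later.";
        }
        if (e instanceof SSLException) {
            return "SSL error: the site's certificate could not be verified.";
        }
        return "Could not load the page: " + e.getMessage();
    }
}
```

Keeping the mapping in one place lets the link scraper and the article analyzer show the same wording for the same failure.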
═══════════════════════════════════════════════════════════
ARTICLE ANALYSIS REPORT
═══════════════════════════════════════════════════════════

HEADLINE
─────────────────────────────────────────────────────────────
Breaking: Major Economic Policy Changes Announced

METADATA
─────────────────────────────────────────────────────────────
Author: John Smith
Published: 2024-08-07 10:30:00
Words: 847 words

SENTIMENT ANALYSIS
─────────────────────────────────────────────────────────────
Overall Sentiment: Negative (Score: -0.23)

SENTIMENT KEYWORDS
─────────────────────────────────────────────────────────────
Positive: progress, improve, success
Negative: crisis, problem, decline, concern

ARTICLE CONTENT
═══════════════════════════════════════════════════════════
[Full article text here...]
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- JavaScript-heavy sites: JSoup cannot execute JavaScript, so dynamic content may not be captured
- Anti-bot protection: Some sites actively block automated requests
- Image loading: Some images may fail due to CORS or authentication requirements
- Use major news sites: BBC, CNN, Reuters work best
- Check robots.txt: Respect website scraping policies
- Don't overwhelm servers: Built-in delays prevent server overload
- Try different URLs: If one site blocks, try alternatives
- Export analysis results to PDF/CSV (COMPLETED)
- Batch article analysis (COMPLETED)
- Advanced sentiment analysis with machine learning
- Support for RSS feeds
- Custom keyword tracking
- Article comparison features
Built with ❤️ using Spring Boot and Java Swing
For questions or issues, please open a GitHub issue or contact the maintainer.