tonytan1/WebCrawler: A standalone program that iterates over all links from a root URL using BFS/DFS

Repository files:
	.idea/
	gradle/wrapper/
	src/main/java/webcrawler/
	ReadMe.txt
	WebCrawler.iml
	build.gradle
	gradlew
	gradlew.bat
	settings.gradle

/*
*Project: WebCrawler
*
*@author TonyTan
**/
Applies a BFS (or DFS) strategy to visit all links reachable from a root URL, and checks whether each linked page contains errors or unsafe data.
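A minimal sketch of the BFS/DFS traversal, assuming each page's outgoing links are already available as an in-memory map (in the project they are collected from live pages with Selenium). The class `LinkTraversal` and its `crawl` method are hypothetical names. One `ArrayDeque` serves as a FIFO queue for BFS and as a LIFO stack for DFS, and a visited set stops pages that link back to each other from being processed twice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: BFS/DFS over a link graph held in memory
// (page URL -> outgoing links), standing in for links scraped at runtime.
class LinkTraversal {

    // Returns pages in visit order. bfs = true treats the deque as a FIFO queue;
    // bfs = false treats it as a LIFO stack (iterative depth-first search).
    static List<String> crawl(String root, Map<String, List<String>> links, boolean bfs) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.addLast(root);

        while (!frontier.isEmpty()) {
            String url = bfs ? frontier.pollFirst() : frontier.pollLast();
            if (!visited.add(url)) {
                continue; // already reached through another link
            }
            order.add(url);
            List<String> out = links.getOrDefault(url, List.of());
            if (bfs) {
                for (String next : out) {
                    if (!visited.contains(next)) frontier.addLast(next);
                }
            } else {
                // Push in reverse so the page's first link is explored first.
                for (int i = out.size() - 1; i >= 0; i--) {
                    String next = out.get(i);
                    if (!visited.contains(next)) frontier.addLast(next);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "root", List.of("a", "b"),
                "a", List.of("c"),
                "b", List.of("c", "d"),
                "c", List.of("root"));
        System.out.println(crawl("root", links, true));  // prints [root, a, b, c, d]
        System.out.println(crawl("root", links, false)); // prints [root, a, c, b, d]
    }
}
```

Flipping `bfs` is the only change between the two strategies: BFS visits pages level by level from the root, while DFS follows one chain of links as deep as it goes before backtracking.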
This project uses several techniques:
	1. A breadth-first search (or depth-first search) strategy;
	2. Selenium, to interact with HTML pages, e.g.:
		parse an HTML page to get all of its links;
		click a link to open a new window;
		judge whether a page contains bugs (e.g., errors or unsafe data);
		take screenshots of problematic pages.
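The link-collection and page-check steps listed above can be approximated without a browser. This sketch swaps in a plain regular expression over raw HTML instead of Selenium, so it neither runs JavaScript nor clicks links; with Selenium WebDriver one would typically collect anchors via `driver.findElements(By.tagName("a"))` and capture a screenshot via `((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE)`. The class `PageInspector` and its error markers are hypothetical, since the README does not say how the project decides that a page contains errors or unsafe data.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the Selenium steps: a regex replaces the browser,
// so links added by JavaScript or opened by clicks are not seen.
class PageInspector {

    // Matches double-quoted href attributes only (a deliberate simplification).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical error markers; the project's real criteria are not documented.
    private static final List<String> ERROR_MARKERS =
            List.of("404 Not Found", "500 Internal Server Error", "Exception");

    // Collects http(s) links, resolving relative ones against the page URL.
    static List<String> extractLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI target = base.resolve(m.group(1).trim());
                String scheme = target.getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(target.toString());
                }
            } catch (IllegalArgumentException e) {
                // Skip malformed hrefs rather than aborting the crawl.
            }
        }
        return links;
    }

    // Flags a page whose text contains any of the markers.
    static boolean looksProblematic(String pageText) {
        for (String marker : ERROR_MARKERS) {
            if (pageText.contains(marker)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> <a href=\"mailto:me@example.com\">Mail</a>"
                + " <a href=\"page2.html\">Next</a>";
        System.out.println(extractLinks("https://example.com/docs/index.html", html));
        System.out.println(looksProblematic("HTTP Status 404 Not Found"));
    }
}
```

Non-web schemes such as `mailto:` are filtered out, so only pages a crawler could actually visit are returned.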
	
The project also includes a concurrent version of WebCrawler that aims to reduce running time.
It provides multiple modes for multithreaded execution (the best choice depends on your computer).
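A minimal sketch of such a concurrent checker, assuming the links come from an external file with one URL per line and that each link can be checked independently. `ConcurrentLinkChecker`, `checkFile`, and the `isProblematic` predicate are hypothetical; the `threads` parameter stands in for the README's "modes", with `availableProcessors()` as one machine-dependent default.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch: links are read from a file (one URL per line) and
// checked in parallel instead of being discovered by BFS/DFS at runtime.
class ConcurrentLinkChecker {

    // One possible machine-dependent "mode": one worker per available CPU core.
    static int defaultThreads() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Returns the links flagged by isProblematic, in the same order as the file.
    static List<String> checkFile(Path linkFile, int threads, Predicate<String> isProblematic)
            throws IOException, InterruptedException, ExecutionException {
        List<String> links = new ArrayList<>();
        for (String line : Files.readAllLines(linkFile)) {
            if (!line.isBlank()) links.add(line.trim()); // skip empty lines
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String url : links) {
                tasks.add(() -> isProblematic.test(url));
            }
            // invokeAll waits for every task; futures come back in task order.
            List<Future<Boolean>> results = pool.invokeAll(tasks);
            List<String> problems = new ArrayList<>();
            for (int i = 0; i < links.size(); i++) {
                if (results.get(i).get()) problems.add(links.get(i));
            }
            return problems;
        } finally {
            pool.shutdown(); // release worker threads
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("links", ".txt");
        Files.write(file, List.of("https://a.example/ok", "https://b.example/broken"));
        System.out.println(checkFile(file, defaultThreads(), url -> url.contains("broken")));
        Files.deleteIfExists(file);
    }
}
```

A fixed pool caps how many pages are checked at once, which matters when each check drives a real browser; more threads only help while the machine has spare cores and memory.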
However, this concurrent class differs from WebCrawler: it loads link data from external files, while WebCrawler collects link data at runtime (via BFS/DFS). I