This project is a Cyber Threat Intelligence (CTI) reconnaissance tool developed in Go (Golang).
It automatically fetches the raw HTML source and captures a full-page screenshot of a target website, organizing the collected artifacts into structured directories.
In CTI analysis, gathering data from a target often requires a hybrid approach. This tool implements a dual-method strategy:
- Raw Data Collection (net/http): Fetches the static HTML source directly from the server. This is crucial for analyzing hidden comments, meta tags, or malicious scripts that might not be rendered in a browser.
- Visual Evidence (chromedp): Uses a headless Chrome browser to render the page and capture a full-page screenshot, providing visual proof of how the site appears to an end-user.
- CLI Support: Simple command-line interface driven by flags (for example, -url).
- Smart Sanitization: Automatically cleans URLs to create OS-compatible directory names.
- Hybrid Extraction: Combines standard HTTP requests with headless browser automation.
- Full-Page Screenshots: Captures the entire scrollable area of the webpage.
- Organized Output: Saves artifacts in a structured output/ directory.
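To illustrate the sanitization feature, here is one possible way to turn a URL into a directory name that is safe on common operating systems. The exact rules the tool applies are not documented here, so `sanitizeURL`, the character whitelist, and the `unknown` fallback are all assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// unsafeChars matches runs of characters outside a conservative whitelist.
// The whitelist is an assumption, not the tool's documented rule set.
var unsafeChars = regexp.MustCompile(`[^A-Za-z0-9._-]+`)

// sanitizeURL (hypothetical helper) strips the scheme and replaces every run
// of unsafe characters (slashes, colons, query separators) with an underscore.
func sanitizeURL(raw string) string {
	name := strings.TrimSpace(raw)
	name = strings.TrimPrefix(name, "https://")
	name = strings.TrimPrefix(name, "http://")
	name = unsafeChars.ReplaceAllString(name, "_")
	// Trim leading and trailing dots and underscores so names such as ".."
	// can never be produced.
	name = strings.Trim(name, "_.")
	if name == "" {
		return "unknown" // assumed fallback for empty results
	}
	return name
}

func main() {
	fmt.Println(sanitizeURL("https://example.com"))             // example.com
	fmt.Println(sanitizeURL("http://example.com/login?user=a")) // example.com_login_user_a
}
```

A whitelist is used rather than a blacklist because the set of characters forbidden in file names differs between Windows, macOS, and Linux.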
Ensure you have Go (1.20+) and Google Chrome installed on your machine.
- Clone the repository:
  git clone https://github.com/MESLEKDAA/go-web-scraper.git
  cd go-web-scraper
- Install dependencies:
  go mod tidy
Run the program from the terminal using the -url flag.
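For readers curious how a -url flag like this is typically handled in Go, the following sketch uses the standard `flag` package. It is an assumption about the approach rather than an excerpt from main.go; `parseArgs` and the flag set name are hypothetical.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// parseArgs (hypothetical helper) reads the -url flag from args and rejects
// an empty value, since the tool has nothing to scan without a target.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("go-web-scraper", flag.ContinueOnError)
	target := fs.String("url", "", "target URL to collect, e.g. https://example.com")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *target == "" {
		return "", errors.New("missing required -url flag")
	}
	return *target, nil
}

func main() {
	target, err := parseArgs(os.Args[1:])
	if err != nil {
		// A real tool would likely exit with a non-zero status here.
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Println("target:", target)
}
```

Using a dedicated `flag.FlagSet` instead of the package-level flags keeps the parsing logic callable from tests with arbitrary argument slices.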
Basic Usage:
go run main.go -url https://example.com
After scanning, an output directory is created in the project root. Example structure for https://example.com:
go-web-scraper/
│
├── main.go
├── go.mod
├── ...
└── output/
    └── example.com/ <-- Auto-generated folder
        ├── 2025-12-20_11-56-35_source_code.html <-- Raw HTML data
        └── 2025-12-20_11-56-35_screenshot.png <-- Timestamped visual proof
- Core: Go (Golang) 1.20+
- Browser Automation: Chromedp
- Networking: net/http
This tool is developed for educational purposes and authorized security testing only. The developer is not responsible for any misuse or damage caused by this tool.