A CTI reconnaissance tool, developed with Go, that captures HTML source code and full-page screenshots.

MESLEKDAA/go-web-scraper

Go Web Scraper & Screenshot Tool (CTI)

This project is a Cyber Threat Intelligence (CTI) reconnaissance tool developed in Go (Golang).

It fetches the raw HTML source of a target website, captures a full-page screenshot, and saves both in a structured, per-site output directory.

Purpose & Methodology

In CTI analysis, gathering data from a target often requires a hybrid approach. This tool implements a dual-method strategy:

  1. Raw Data Collection (net/http): Fetches the static HTML source directly from the server. This is crucial for analyzing hidden comments, meta tags, or malicious scripts that are not visible in the rendered page.
  2. Visual Evidence (chromedp): Uses a headless Chrome browser to render the page and capture a full-page screenshot, providing visual proof of how the site appears to an end-user.
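
The raw-collection step can be sketched with the standard library alone. This is a minimal illustration rather than the tool's actual code: `fetchHTML`, `demoServer`, and `demoPage` are hypothetical names, and a local test server stands in for the target so the example runs without network access.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// demoPage stands in for a target page. The HTML comment inside it is the kind
// of detail that a rendered view or screenshot would not show.
const demoPage = "<html><!-- staging build --><body>Hello</body></html>"

// fetchHTML (hypothetical helper) returns the response body exactly as served.
func fetchHTML(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading body failed: %w", err)
	}
	return string(body), nil
}

// demoServer starts a local server that serves demoPage for every request.
func demoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, demoPage)
	}))
}

func main() {
	srv := demoServer()
	defer srv.Close()

	// A timeout keeps a slow or unresponsive target from hanging the scan.
	client := &http.Client{Timeout: 10 * time.Second}
	html, err := fetchHTML(client, srv.URL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(html)
}
```

Because the body is returned unmodified, the HTML comment survives in the saved source even though a browser would never display it.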

Features

  • CLI Support: Easy-to-use command line interface with flags.
  • Smart Sanitization: Automatically cleans URLs to create OS-compatible directory names.
  • Hybrid Extraction: Combines standard HTTP requests with headless browser automation.
  • Full-Page Screenshots: Captures the entire scrollable area of the webpage.
  • Organized Output: Saves artifacts in a structured output/ directory.
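
As an illustration of the sanitization feature, the sketch below turns a URL into a directory name. The rules are assumptions, not the repository's actual logic: `sanitizeURL` is a hypothetical helper that strips the scheme and any trailing slash, then replaces every character outside letters, digits, `.`, `-`, and `_` with an underscore.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeURL (hypothetical helper) converts a URL into a name that is safe to
// use as a directory on common operating systems.
func sanitizeURL(raw string) string {
	s := strings.TrimSpace(raw)
	s = strings.TrimPrefix(s, "https://")
	s = strings.TrimPrefix(s, "http://")
	s = strings.TrimRight(s, "/")

	// Keep a conservative allow-list; everything else becomes '_'.
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9',
			r == '.', r == '-', r == '_':
			return r
		}
		return '_'
	}, s)
}

func main() {
	fmt.Println(sanitizeURL("https://example.com"))          // example.com
	fmt.Println(sanitizeURL("https://example.com:8080/a?b=1")) // example.com_8080_a_b_1
}
```

An allow-list is used here rather than a deny-list because characters such as `:`, `?`, and `/` are invalid or awkward in file names on at least one major platform.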

Installation

Ensure you have Go (1.20+) and Google Chrome installed on your machine.

  1. Clone the repository:

    git clone https://github.com/MESLEKDAA/go-web-scraper.git
    cd go-web-scraper
  2. Install dependencies:

    go mod tidy

Usage

Run the program from the terminal using the -url flag.

Basic Usage:

go run main.go -url https://example.com
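
For readers curious how a `-url` flag like this can be handled, here is a minimal sketch using Go's standard `flag` package. It is not taken from the repository: `parseArgs`, the flag-set name, and the help text are assumptions, and the arguments are hard-coded so the example is self-contained.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// parseArgs (hypothetical helper) extracts the target from a -url flag and
// rejects invocations that omit it.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("go-web-scraper", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides how to report errors
	target := fs.String("url", "", "target URL to scan")

	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *target == "" {
		return "", errors.New("missing required -url flag")
	}
	return *target, nil
}

func main() {
	// Fixed arguments keep the sketch self-contained; a real entry point
	// would pass os.Args[1:] instead.
	target, err := parseArgs([]string{"-url", "https://example.com"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("scanning", target)
}
```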

Output Structure

After scanning, an output directory is created in the project root. Example structure for https://example.com:

go-web-scraper/
│
├── main.go
├── go.mod
├── ...
└── output/
    └── example.com/                                    <-- Auto-generated folder
        ├── 2025-12-20_11-56-35_source_code.html        <-- Raw HTML data
        └── 2025-12-20_11-56-35_screenshot.png          <-- Timestamped visual proof
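
The file names in the tree above match the Go time layout `2006-01-02_15-04-05`. The sketch below shows one way such paths could be built; `artifactPaths`, `stampLayout`, and `exampleTime` are hypothetical names, and the use of UTC is an assumption since the README does not state which time zone the tool uses.

```go
package main

import (
	"fmt"
	"path/filepath"
	"time"
)

// stampLayout reproduces the timestamp format seen in the example file names.
const stampLayout = "2006-01-02_15-04-05"

// exampleTime matches the timestamp shown in the example tree (time zone assumed).
var exampleTime = time.Date(2025, time.December, 20, 11, 56, 35, 0, time.UTC)

// artifactPaths (hypothetical helper) returns the HTML and screenshot paths for
// one scan of a site, both sharing the same timestamp prefix.
func artifactPaths(root, site string, at time.Time) (htmlPath, pngPath string) {
	stamp := at.Format(stampLayout)
	dir := filepath.Join(root, site)
	htmlPath = filepath.Join(dir, stamp+"_source_code.html")
	pngPath = filepath.Join(dir, stamp+"_screenshot.png")
	return htmlPath, pngPath
}

func main() {
	htmlPath, pngPath := artifactPaths("output", "example.com", exampleTime)
	fmt.Println(htmlPath)
	fmt.Println(pngPath)
}
```

Sharing one timestamp between both files keeps the source and the screenshot from a single scan paired, and repeated scans of the same site do not overwrite each other. The directory itself would still need to be created, for example with `os.MkdirAll`, before writing the files.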

Tech Stack

  • Core: Go (Golang) 1.20+
  • Browser Automation: Chromedp
  • Networking: net/http

Disclaimer

This tool is developed for educational purposes and authorized security testing only. The developer is not responsible for any misuse or damage caused by this tool.
