web-capture

A CLI and microservice to fetch URLs and render them as:

HTML: Rendered page content
Markdown: Converted from HTML
PNG screenshot: Full page capture

Language Implementations

This repository contains two implementations with compatible APIs:

Implementation	Directory	Package	Status
JavaScript/Node.js	`./js`	@link-assistant/web-capture	Production
Rust	`./rust`	web-capture	Production

Both implementations provide the same CLI interface and HTTP API endpoints, allowing you to choose based on your deployment preferences.

Quick Start

JavaScript

cd js
npm install
npm run dev

Rust

cd rust
cargo run -- --serve

API Endpoints

Both implementations expose the same API:

Endpoint	Description
`GET /html?url=<URL>`	Get rendered HTML content
`GET /markdown?url=<URL>`	Get Markdown conversion
`GET /image?url=<URL>`	Get PNG screenshot
`GET /fetch?url=<URL>`	Proxy fetch content
`GET /stream?url=<URL>`	Stream content

CLI Usage

# Capture a URL as HTML (output to stdout)
web-capture https://example.com

# Capture as Markdown and save to file
web-capture https://example.com --format markdown --output page.md

# Take a screenshot
web-capture https://example.com --format png --output screenshot.png

# Start as API server
web-capture --serve

# Start server on custom port
web-capture --serve --port 8080

CLI Options

Option	Short	Description	Default
`--serve`	`-s`	Start as HTTP API server	-
`--port`	`-p`	Port to listen on	3000
`--format`	`-f`	Output format: `html`, `markdown`/`md`, `image`/`png`	`html`
`--output`	`-o`	Output file path	stdout (text) or auto-generated (images)
`--engine`	`-e`	Browser engine (JS only): `puppeteer`, `playwright`	`puppeteer`

Docker

JavaScript

cd js
docker build -t web-capture-js .
docker run -p 3000:3000 web-capture-js

Rust

cd rust
docker build -t web-capture-rust .
docker run -p 3000:3000 web-capture-rust

Project Structure

web-capture/
├── js/                          # JavaScript/Node.js implementation
│   ├── src/                     # Source code
│   ├── bin/                     # CLI entry point
│   ├── tests/                   # Test files
│   ├── examples/                # Usage examples
│   ├── package.json             # npm package manifest
│   ├── Dockerfile               # Docker build file
│   └── README.md                # JavaScript-specific docs
│
├── rust/                        # Rust implementation
│   ├── src/                     # Source code
│   │   ├── lib.rs               # Library exports
│   │   ├── main.rs              # CLI/server entry point
│   │   ├── browser.rs           # Browser automation
│   │   ├── html.rs              # HTML processing
│   │   └── markdown.rs          # Markdown conversion
│   ├── tests/                   # Test files
│   ├── examples/                # Usage examples
│   ├── Cargo.toml               # Cargo package manifest
│   ├── Dockerfile               # Docker build file
│   └── README.md                # Rust-specific docs
│
├── scripts/                     # Shared build/release scripts
│   ├── *.mjs                    # JavaScript-specific scripts
│   └── rust-*.mjs               # Rust-specific scripts
│
├── .github/workflows/
│   ├── js.yml                   # JavaScript CI/CD
│   └── rust.yml                 # Rust CI/CD
│
└── README.md                    # This file

Development

JavaScript

cd js
npm install
npm run dev          # Start dev server
npm test             # Run tests
npm run lint         # Run linter

Rust

cd rust
cargo build          # Build
cargo test           # Run tests
cargo clippy         # Run linter
cargo fmt            # Format code

Features

HTML Rendering: Fetch and render HTML with JavaScript support via headless browsers
Markdown Conversion: Clean HTML-to-Markdown conversion with proper formatting
Screenshots: Capture PNG screenshots of web pages
URL Normalization: Convert relative URLs to absolute
Encoding Detection: Automatic charset detection and UTF-8 conversion
Proxy Support: Fetch and stream content through the service

Browser Engines

JavaScript Version

The JavaScript implementation supports two browser engines:

Puppeteer (default): Mature, well-tested Chrome automation
Playwright: Cross-browser automation with similar capabilities

Rust Version

The Rust implementation uses:

browser-commander: A Rust crate for browser automation using chromiumoxide

License

UNLICENSED (JavaScript) / Unlicense (Rust)

Related Projects

browser-commander - Browser automation library used in Rust implementation
turndown - HTML to Markdown converter used in JS implementation
html2md - HTML to Markdown converter used in Rust implementation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

web-capture

Language Implementations

Quick Start

JavaScript

Rust

API Endpoints

CLI Usage

CLI Options

Docker

JavaScript

Rust

Project Structure

Development

JavaScript

Rust

Features

Browser Engines

JavaScript Version

Rust Version

License

Related Projects

About

Uh oh!

Releases 4

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 84 Commits
.github/workflows		.github/workflows
docs		docs
js		js
rust		rust
scripts		scripts
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
apply.sh		apply.sh

License

link-assistant/web-capture

Folders and files

Latest commit

History

Repository files navigation

web-capture

Language Implementations

Quick Start

JavaScript

Rust

API Endpoints

CLI Usage

CLI Options

Docker

JavaScript

Rust

Project Structure

Development

JavaScript

Rust

Features

Browser Engines

JavaScript Version

Rust Version

License

Related Projects

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages