A smarter alternative to the standard `strings` command that uses binary analysis to extract meaningful strings from executables, focusing on data structures rather than arbitrary byte runs.
The standard `strings` command dumps every printable byte sequence it finds, which means you get:
- Padding bytes and table data
- Interleaved garbage in UTF-16 strings
- No context about where strings come from
- No prioritization of what's actually useful
Stringy solves this by being data-structure aware, section-aware, and semantically intelligent.
- Data-structure aware: extracts only strings that are part of the binary's actual data structures, not arbitrary byte runs.
- Section-aware: prioritizes `.rodata`/`.rdata`/`__cstring`, resources, and version info; de-emphasizes writable `.data`; avoids `.bss`.
- Encoding-aware: supports ASCII/UTF-8, UTF-16LE (common in PE), and UTF-16BE; detects null-interleaved text (see the sketch after this list).
- Semantically intelligent: identifies URLs, domains, IPs, file paths, registry keys, GUIDs, user agents, format strings, Base64 runs, crypto constants, and cloud metadata.
- Symbol- and metadata-aware: handles import/export names, demangled Rust symbols, section names, Go build info, .NET metadata, and PE resources.
- Ranked: presents the most relevant strings first using a scoring algorithm.
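For the UTF-16 case, here is a minimal sketch of what null-interleaved extraction looks like. This is a hypothetical helper for illustration, not Stringy's actual extractor:

```rust
/// Minimal sketch: pull printable UTF-16LE runs out of a byte slice.
/// Hypothetical helper for illustration; a real extractor would also
/// handle UTF-16BE, confidence scoring, and section offsets.
fn extract_utf16le(data: &[u8], min_len: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut run: Vec<u16> = Vec::new();
    for pair in data.chunks_exact(2) {
        let unit = u16::from_le_bytes([pair[0], pair[1]]);
        // Printable ASCII stored as UTF-16LE looks like `H\0e\0l\0l\0o\0`:
        // the high byte is zero and the low byte is printable.
        if (0x20..0x7f).contains(&unit) {
            run.push(unit);
        } else {
            if run.len() >= min_len {
                out.push(String::from_utf16_lossy(&run));
            }
            run.clear();
        }
    }
    if run.len() >= min_len {
        out.push(String::from_utf16_lossy(&run));
    }
    out
}
```

A plain byte-oriented scanner sees those interleaved nulls as run terminators and either drops the string or emits one-character fragments; decoding two bytes at a time surfaces it cleanly.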
- Format-aware parsing via `goblin`: ELF, PE, Mach-O
- Section targeting: `.rodata`, `.rdata`, `__cstring`, resources, manifests
- Encoding support: ASCII, UTF-8, UTF-16LE/BE with confidence scoring
- Smart classification:
  - URLs, domains, IPv4/IPv6 addresses (implemented)
  - File paths & registry keys
  - GUIDs & user agents
  - Format strings (`%s`, `%d`, etc.)
  - Base64 & crypto constants
- Rust symbol demangling (`rustc-demangle`)
- JSON output for pipelines
- YARA-friendly output for rule generation
- Ranking & scoring: high-signal strings first (see the sketch below)
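To make the ranking idea concrete, here is a minimal scoring sketch under assumed section weights and tag bonuses. The names and numbers are illustrative, not Stringy's actual algorithm:

```rust
/// Illustrative scoring: combine an assumed section weight with tag
/// bonuses. All weights here are invented for the sketch; Stringy's
/// real ranking system is still under development.
fn score(section: &str, tags: &[&str], len: usize) -> u32 {
    let section_weight = match section {
        ".rodata" | ".rdata" | "__cstring" => 40,
        ".rsrc" => 35,
        ".data" => 15, // writable data is de-emphasized
        _ => 5,
    };
    let tag_bonus: u32 = tags
        .iter()
        .map(|t| match *t {
            "url" | "registry" => 30,
            "guid" | "filepath" | "fmt" => 20,
            _ => 5,
        })
        .sum();
    // Slightly favor longer strings, capped so length never dominates.
    section_weight + tag_bonus + (len as u32).min(20)
}
```

Under this scheme a URL in `.rdata` outranks a bare word in writable `.data`, which is the behavior the example output below reflects.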
Note: Stringy is currently in development and not yet published to crates.io.
```bash
git clone https://github.com/EvilBit-Labs/Stringy
cd Stringy
cargo build --release
```

Run the built binary:

```bash
./target/release/stringy --help
```

Or run it through Cargo:

```bash
cargo run -- --help
```

Current Status: The CLI interface is under development. Currently available:
```bash
stringy target_binary
```

Basic functionality is implemented, with the full interface in development:
```bash
# Current: Basic analysis
stringy target_binary

# In Development: Advanced features
stringy --only url,filepath target_binary
stringy --min-len 8 --enc ascii,utf16 target_binary
stringy --top 50 --json target_binary

# Planned: Format-specific features
stringy --pe-version --pe-manifest target.exe
stringy --utf16-only target.exe

# Planned: Pipeline integration
stringy --json target_binary | jq '.[] | select(.tags[] | contains("url"))'
stringy --yara candidates.txt target_binary
```

Human-readable mode:
```text
Score  Offset  Section    Tags       String
-----  ------  -------    ----       ------
95     0x1000  .rdata     url,https  https://api.example.com/v1/
87     0x2000  .rdata     guid       {12345678-1234-1234-1234-123456789abc}
82     0x3000  __cstring  filepath   /usr/local/bin/stringy
78     0x4000  .rdata     fmt        Error: %s at line %d
```
JSON mode:
```json
{
  "text": "https://api.example.com/v1/",
  "offset": 4096,
  "rva": 4096,
  "section": ".rdata",
  "encoding": "utf-8",
  "length": 28,
  "tags": [
    "url"
  ],
  "score": 95,
  "source": "SectionData"
}
```

- Eliminates noise: Stops dumping padding, tables, and interleaved garbage
- UTF-16 support: Surfaces UTF-16 (crucial for PE) cleanly
- Actionable buckets: Provides categorized results (URLs, keys, UAs, registry paths) first
- Provenance tracking: Keeps offset/section info for pivoting to other tools
- YARA integration: Feeds only high-signal candidates
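For pipeline consumers, each JSON record maps naturally onto a small typed struct. A minimal sketch, assuming `serde` (with the `derive` feature) and `serde_json` as dependencies; the struct and filtering logic are illustrative, not part of Stringy's public API:

```rust
use serde::Deserialize;

/// Sketch of a deserializable record matching the JSON example above.
/// Field names come from the sample output; the struct itself is
/// hypothetical, not a type Stringy exports.
#[derive(Debug, Deserialize)]
struct FoundString {
    text: String,
    offset: u64,
    rva: u64,
    section: String,
    encoding: String,
    length: usize,
    tags: Vec<String>,
    score: u32,
    source: String,
}

fn main() -> Result<(), serde_json::Error> {
    let line = r#"{"text":"https://api.example.com/v1/","offset":4096,
        "rva":4096,"section":".rdata","encoding":"utf-8","length":28,
        "tags":["url"],"score":95,"source":"SectionData"}"#;
    let rec: FoundString = serde_json::from_str(line)?;
    // Keep only high-signal URL hits, mirroring the jq pipeline above.
    if rec.score >= 80 && rec.tags.iter().any(|t| t == "url") {
        println!("{} ({} @ {:#x})", rec.text, rec.section, rec.offset);
    }
    Ok(())
}
```

Keeping `offset`, `rva`, and `section` in every record is what makes pivoting to a disassembler or hex editor a one-step lookup.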
This project is in active development. Current implementation status:
- ✅ Core Infrastructure: Complete project structure, comprehensive data types, robust error handling
- ✅ Format Detection: ELF, PE, Mach-O binary format detection via `goblin`
- ✅ Container Parsers: Full section classification with weight-based prioritization
- ✅ Import/Export Extraction: Symbol extraction from all supported formats
- ✅ Section Analysis: Smart classification of string-rich sections
- ✅ PE Resource Enumeration: VERSIONINFO, STRINGTABLE, and MANIFEST resource detection (Phase 1 complete)
- 🚧 String Extraction: ASCII/UTF-8 and UTF-16 extraction engines (framework ready)
- 🚧 Semantic Classification: IPv4/IPv6 detection implemented (see the sketch after this list); URL, domain, path, GUID pattern matching in progress (types defined)
- 🚧 Ranking System: Section-aware scoring algorithm (framework in place)
- 🚧 Output Formats: JSONL, human-readable, and YARA-friendly output (types ready)
- 🚧 CLI Interface: Basic argument parsing implemented, main pipeline in progress
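As a flavor of the implemented IP detection, here is a minimal sketch that tags candidate strings using the standard library's address parsers. It is illustrative only; the in-progress classifier may work differently:

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Illustrative tagger: classify a candidate string as an IP address.
/// Leans on std's parsers rather than hand-rolled regexes; this is a
/// sketch, not Stringy's actual classification code.
fn ip_tag(candidate: &str) -> Option<&'static str> {
    if candidate.parse::<Ipv4Addr>().is_ok() {
        Some("ipv4")
    } else if candidate.parse::<Ipv6Addr>().is_ok() {
        Some("ipv6")
    } else {
        None
    }
}

fn main() {
    assert_eq!(ip_tag("10.0.0.1"), Some("ipv4"));
    assert_eq!(ip_tag("::1"), Some("ipv6"));
    assert_eq!(ip_tag("not-an-ip"), None);
}
```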
The foundation is robust with fully implemented binary format parsers that can:
- Format Detection: Automatically detect ELF, PE, and Mach-O formats using `goblin`
- Section Classification: Intelligently classify sections by string likelihood with weighted scoring:
  - ELF: `.rodata` (10.0), `.comment` (9.0), `.data.rel.ro` (7.0)
  - PE: `.rdata` (10.0), `.rsrc` (9.0), read-only `.data` (7.0)
  - Mach-O: `__TEXT,__cstring` (10.0), `__TEXT,__const` (9.0), `__DATA_CONST` (7.0)
- Symbol Processing: Extract and classify import/export names from symbol tables
- PE Resource Extraction (Phase 1 complete):
- VERSIONINFO resource detection
- STRINGTABLE resource detection
- MANIFEST resource detection
- Metadata extraction (type, language, size)
- Cross-Platform Support: Handle platform-specific section characteristics and naming
- Comprehensive Metadata: Track section offsets, sizes, RVAs, and permissions
- Trait-Based Design: `ContainerParser` trait enables easy format extension (see the sketch after this list)
- Type Safety: Comprehensive error handling with a `StringyError` enum
- Performance Ready: Section weighting system prioritizes high-value areas
- Extensible Classification: `Tag` enum supports semantic string categorization
- Multiple Sources: Handles strings from section data, imports, exports, and resources
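To show the shape of that design, here is a minimal sketch of a `ContainerParser`-style trait and a trimmed `Tag` enum. Everything beyond those two names is invented for illustration and may differ from the actual trait surface:

```rust
/// Illustrative semantic categories; the real `Tag` enum is broader.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    Url,
    Filepath,
    Guid,
    Fmt,
}

/// A section candidate with the metadata needed for provenance tracking.
/// The field set is assumed from this README, not copied from the code.
struct SectionInfo {
    name: String,
    offset: u64,
    size: u64,
    weight: f32, // string likelihood, e.g. .rodata = 10.0
}

/// Sketch of a `ContainerParser`-style trait: each binary format
/// (ELF, PE, Mach-O) supplies its own section and symbol extraction.
trait ContainerParser {
    fn format_name(&self) -> &'static str;
    fn sections(&self, data: &[u8]) -> Vec<SectionInfo>;
    fn import_names(&self, data: &[u8]) -> Vec<String>;
}

struct ElfParser;

impl ContainerParser for ElfParser {
    fn format_name(&self) -> &'static str {
        "ELF"
    }
    fn sections(&self, _data: &[u8]) -> Vec<SectionInfo> {
        // A real implementation would parse headers via `goblin`;
        // hard-coded here to keep the sketch self-contained.
        vec![SectionInfo {
            name: ".rodata".into(),
            offset: 0x1000,
            size: 0x400,
            weight: 10.0,
        }]
    }
    fn import_names(&self, _data: &[u8]) -> Vec<String> {
        Vec::new()
    }
}
```

Dispatching on a trait object keeps the extraction pipeline format-agnostic: adding a new container format means implementing the trait, not touching the scoring or output stages.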
See the implementation plan for detailed progress tracking.
Licensed under Apache 2.0.
- Inspired by `strings(1)` and the need for better binary analysis tools
- Built with Rust ecosystem crates: `goblin`, `bstr`, `regex`, `rustc-demangle`
- My coworkers, for their excellent input on the original name selection
