A smarter alternative to the standard `strings` command that uses binary analysis to extract meaningful strings from executables, focusing on data structures rather than arbitrary byte runs.
The standard `strings` command dumps every printable byte sequence it finds, which means you get:
- Padding bytes and table data
- Interleaved garbage in UTF-16 strings
- No context about where strings come from
- No prioritization of what's actually useful
Stringy solves this by being data-structure aware, section-aware, and semantically intelligent.
- Data-structure aware: extracts only strings that are part of the binary's actual data structures, not arbitrary byte runs.
- Section-aware: prioritizes `.rodata`/`.rdata`/`__cstring`, resources, and version info; de-emphasizes writable `.data`; avoids `.bss`.
- Encoding-aware: supports ASCII/UTF-8, UTF-16LE (common in PE), and UTF-16BE; detects null-interleaved text (see the sketch after this list).
- Semantically intelligent: identifies URLs, domains, IPs, file paths, registry keys, GUIDs, user agents, format strings, Base64 runs, crypto constants, and cloud metadata.
- Symbol- and metadata-aware: handles import/export names, demangled Rust symbols, section names, Go build info, .NET metadata, and PE resources.
- Ranked: presents the most relevant strings first using a scoring algorithm.
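For the UTF-16 case, here is a minimal sketch of what null-interleaved extraction looks like. This is a hypothetical helper for illustration, not Stringy's actual extractor:

```rust
/// Minimal sketch: pull printable UTF-16LE runs out of a byte slice.
/// Hypothetical helper for illustration; a real extractor would also
/// handle UTF-16BE, confidence scoring, and section offsets.
fn extract_utf16le(data: &[u8], min_len: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut run: Vec<u16> = Vec::new();
    for pair in data.chunks_exact(2) {
        let unit = u16::from_le_bytes([pair[0], pair[1]]);
        // Printable ASCII stored as UTF-16LE looks like `H\0e\0l\0l\0o\0`:
        // the high byte is zero and the low byte is printable.
        if (0x20..0x7f).contains(&unit) {
            run.push(unit);
        } else {
            if run.len() >= min_len {
                out.push(String::from_utf16_lossy(&run));
            }
            run.clear();
        }
    }
    if run.len() >= min_len {
        out.push(String::from_utf16_lossy(&run));
    }
    out
}
```

A plain byte-oriented scanner sees those interleaved nulls as run terminators and either drops the string or emits one-character fragments; decoding two bytes at a time surfaces it cleanly.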
- Format-aware parsing via `goblin`: ELF, PE, Mach-O
- Section targeting: `.rodata`, `.rdata`, `__cstring`, resources, manifests
- Encoding support: ASCII, UTF-8, UTF-16LE/BE with confidence scoring
- Smart classification:
  - URLs, domains, IPv4/IPv6 addresses (implemented)
  - File paths & registry keys
  - GUIDs & user agents
  - Format strings (`%s`, `%d`, etc.)
  - Base64 & crypto constants
- Rust symbol demangling (`rustc-demangle`)
- JSON output for pipelines
- YARA-friendly output for rule generation
- Ranking & scoring: high-signal strings first (see the sketch below)
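To make the ranking idea concrete, here is a minimal scoring sketch under assumed section weights and tag bonuses. The names and numbers are illustrative, not Stringy's actual algorithm:

```rust
/// Illustrative scoring: combine an assumed section weight with tag
/// bonuses. All weights here are invented for the sketch; Stringy's
/// real ranking system is still under development.
fn score(section: &str, tags: &[&str], len: usize) -> u32 {
    let section_weight = match section {
        ".rodata" | ".rdata" | "__cstring" => 40,
        ".rsrc" => 35,
        ".data" => 15, // writable data is de-emphasized
        _ => 5,
    };
    let tag_bonus: u32 = tags
        .iter()
        .map(|t| match *t {
            "url" | "registry" => 30,
            "guid" | "filepath" | "fmt" => 20,
            _ => 5,
        })
        .sum();
    // Slightly favor longer strings, capped so length never dominates.
    section_weight + tag_bonus + (len as u32).min(20)
}
```

Under this scheme a URL in `.rdata` outranks a bare word in writable `.data`, which is the behavior the example output below reflects.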
Note: Stringy is currently in development and not yet published to crates.io.
```bash
git clone https://github.com/EvilBit-Labs/Stringy
cd Stringy
cargo build --release
```

Run the built binary:

```bash
./target/release/stringy --help
```

Or run it through Cargo:

```bash
cargo run -- --help
```

Current Status: The CLI interface is under development. Currently available:
```bash
stringy target_binary
```

Basic functionality is implemented, with the full interface in development:
```bash
# Current: Basic analysis
stringy target_binary

# In Development: Advanced features
stringy --only url,filepath target_binary
stringy --min-len 8 --enc ascii,utf16 target_binary
stringy --top 50 --json target_binary

# Planned: Format-specific features
stringy --pe-version --pe-manifest target.exe
stringy --utf16-only target.exe

# Planned: Pipeline integration
stringy --json target_binary | jq '.[] | select(.tags[] | contains("url"))'
stringy --yara candidates.txt target_binary
```

Human-readable mode:
```text
Score  Offset  Section    Tags       String
-----  ------  -------    ----       ------
95     0x1000  .rdata     url,https  https://api.example.com/v1/
87     0x2000  .rdata     guid       {12345678-1234-1234-1234-123456789abc}
82     0x3000  __cstring  filepath   /usr/local/bin/stringy
78     0x4000  .rdata     fmt        Error: %s at line %d
```
JSON mode:
```json
{
  "text": "https://api.example.com/v1/",
  "offset": 4096,
  "rva": 4096,
  "section": ".rdata",
  "encoding": "utf-8",
  "length": 28,
  "tags": [
    "url"
  ],
  "score": 95,
  "source": "SectionData"
}
```

- Eliminates noise: Stops dumping padding, tables, and interleaved garbage
- UTF-16 support: Surfaces UTF-16 (crucial for PE) cleanly
- Actionable buckets: Provides categorized results (URLs, keys, UAs, registry paths) first
- Provenance tracking: Keeps offset/section info for pivoting to other tools
- YARA integration: Feeds only high-signal candidates
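For pipeline consumers, each JSON record maps naturally onto a small typed struct. A minimal sketch, assuming `serde` (with the `derive` feature) and `serde_json` as dependencies; the struct and filtering logic are illustrative, not part of Stringy's public API:

```rust
use serde::Deserialize;

/// Sketch of a deserializable record matching the JSON example above.
/// Field names come from the sample output; the struct itself is
/// hypothetical, not a type Stringy exports.
#[derive(Debug, Deserialize)]
struct FoundString {
    text: String,
    offset: u64,
    rva: u64,
    section: String,
    encoding: String,
    length: usize,
    tags: Vec<String>,
    score: u32,
    source: String,
}

fn main() -> Result<(), serde_json::Error> {
    let line = r#"{"text":"https://api.example.com/v1/","offset":4096,
        "rva":4096,"section":".rdata","encoding":"utf-8","length":28,
        "tags":["url"],"score":95,"source":"SectionData"}"#;
    let rec: FoundString = serde_json::from_str(line)?;
    // Keep only high-signal URL hits, mirroring the jq pipeline above.
    if rec.score >= 80 && rec.tags.iter().any(|t| t == "url") {
        println!("{} ({} @ {:#x})", rec.text, rec.section, rec.offset);
    }
    Ok(())
}
```

Keeping `offset`, `rva`, and `section` in every record is what makes pivoting to a disassembler or hex editor a one-step lookup.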
This project is in active development. Current implementation status:
- ✅ Core Infrastructure: Complete project structure, comprehensive data types, robust error handling
- ✅ Format Detection: ELF, PE, Mach-O binary format detection via `goblin`
- ✅ Container Parsers: Full section classification with weight-based prioritization
- ✅ Import/Export Extraction: Symbol extraction from all supported formats
- ✅ Section Analysis: Smart classification of string-rich sections
- ✅ PE Resource Enumeration: VERSIONINFO, STRINGTABLE, and MANIFEST resource detection (Phase 1 complete)
- 🚧 String Extraction: ASCII/UTF-8 and UTF-16 extraction engines (framework ready)
- 🚧 Semantic Classification: IPv4/IPv6 detection implemented (see the sketch after this list); URL, domain, path, GUID pattern matching in progress (types defined)
- 🚧 Ranking System: Section-aware scoring algorithm (framework in place)
- 🚧 Output Formats: JSONL, human-readable, and YARA-friendly output (types ready)
- 🚧 CLI Interface: Basic argument parsing implemented, main pipeline in progress
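As a flavor of the implemented IP detection, here is a minimal sketch that tags candidate strings using the standard library's address parsers. It is illustrative only; the in-progress classifier may work differently:

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Illustrative tagger: classify a candidate string as an IP address.
/// Leans on std's parsers rather than hand-rolled regexes; this is a
/// sketch, not Stringy's actual classification code.
fn ip_tag(candidate: &str) -> Option<&'static str> {
    if candidate.parse::<Ipv4Addr>().is_ok() {
        Some("ipv4")
    } else if candidate.parse::<Ipv6Addr>().is_ok() {
        Some("ipv6")
    } else {
        None
    }
}

fn main() {
    assert_eq!(ip_tag("10.0.0.1"), Some("ipv4"));
    assert_eq!(ip_tag("::1"), Some("ipv6"));
    assert_eq!(ip_tag("not-an-ip"), None);
}
```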
The foundation is robust with fully implemented binary format parsers that can:
- Format Detection: Automatically detect ELF, PE, and Mach-O formats using `goblin`
- Section Classification: Intelligently classify sections by string likelihood with weighted scoring:
  - ELF: `.rodata` (10.0), `.comment` (9.0), `.data.rel.ro` (7.0)
  - PE: `.rdata` (10.0), `.rsrc` (9.0), read-only `.data` (7.0)
  - Mach-O: `__TEXT,__cstring` (10.0), `__TEXT,__const` (9.0), `__DATA_CONST` (7.0)
- Symbol Processing: Extract and classify import/export names from symbol tables
- PE Resource Extraction (Phase 1 complete):
- VERSIONINFO resource detection
- STRINGTABLE resource detection
- MANIFEST resource detection
- Metadata extraction (type, language, size)
- Cross-Platform Support: Handle platform-specific section characteristics and naming
- Comprehensive Metadata: Track section offsets, sizes, RVAs, and permissions
- Trait-Based Design: `ContainerParser` trait enables easy format extension (see the sketch after this list)
- Type Safety: Comprehensive error handling with a `StringyError` enum
- Performance Ready: Section weighting system prioritizes high-value areas
- Extensible Classification: `Tag` enum supports semantic string categorization
- Multiple Sources: Handles strings from section data, imports, exports, and resources
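To show the shape of that design, here is a minimal sketch of a `ContainerParser`-style trait and a trimmed `Tag` enum. Everything beyond those two names is invented for illustration and may differ from the actual trait surface:

```rust
/// Illustrative semantic categories; the real `Tag` enum is broader.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    Url,
    Filepath,
    Guid,
    Fmt,
}

/// A section candidate with the metadata needed for provenance tracking.
/// The field set is assumed from this README, not copied from the code.
struct SectionInfo {
    name: String,
    offset: u64,
    size: u64,
    weight: f32, // string likelihood, e.g. .rodata = 10.0
}

/// Sketch of a `ContainerParser`-style trait: each binary format
/// (ELF, PE, Mach-O) supplies its own section and symbol extraction.
trait ContainerParser {
    fn format_name(&self) -> &'static str;
    fn sections(&self, data: &[u8]) -> Vec<SectionInfo>;
    fn import_names(&self, data: &[u8]) -> Vec<String>;
}

struct ElfParser;

impl ContainerParser for ElfParser {
    fn format_name(&self) -> &'static str {
        "ELF"
    }
    fn sections(&self, _data: &[u8]) -> Vec<SectionInfo> {
        // A real implementation would parse headers via `goblin`;
        // hard-coded here to keep the sketch self-contained.
        vec![SectionInfo {
            name: ".rodata".into(),
            offset: 0x1000,
            size: 0x400,
            weight: 10.0,
        }]
    }
    fn import_names(&self, _data: &[u8]) -> Vec<String> {
        Vec::new()
    }
}
```

Dispatching on a trait object keeps the extraction pipeline format-agnostic: adding a new container format means implementing the trait, not touching the scoring or output stages.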
See the implementation plan for detailed progress tracking.
Licensed under Apache 2.0.
- Inspired by `strings(1)` and the need for better binary analysis tools
- Built with Rust ecosystem crates: `goblin`, `bstr`, `regex`, `rustc-demangle`
- My coworkers, for their excellent input on the original name selection
