
Format-aware alternative to strings. Uses binary format intelligence to find useful strings where others see noise.


Stringy

A smarter alternative to the standard strings command that uses binary analysis to extract meaningful strings from executables, focusing on data structures rather than arbitrary byte runs.


The Problem with strings

The standard strings command dumps every printable byte sequence it finds, which means you get:

  • Padding bytes and table data
  • Interleaved garbage in UTF-16 strings
  • No context about where strings come from
  • No prioritization of what's actually useful

Stringy solves this by being data-structure aware, section-aware, and semantically intelligent.


What Makes Stringy Different

Data-Structure Aware

Only extracts strings that are part of the binary's actual data structures, not arbitrary byte runs.

Section-Aware

Prioritizes .rodata/.rdata/__cstring, resources, and version info; de-emphasizes writable .data; avoids .bss.

Encoding-Aware

Supports ASCII/UTF-8, UTF-16LE (PE), and UTF-16BE; detects null-interleaved text.
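
As an illustration, null-interleaved UTF-16LE text has a telltale byte pattern: printable ASCII in even positions, NUL bytes in odd positions. A minimal sketch of that heuristic (not Stringy's actual detector, which also handles UTF-16BE and non-ASCII code units):

```rust
/// Heuristic: printable ASCII bytes in even positions, NUL bytes in odd
/// positions. A simplified sketch of null-interleaved UTF-16LE detection.
fn looks_like_utf16le(bytes: &[u8]) -> bool {
    if bytes.len() < 8 || bytes.len() % 2 != 0 {
        return false; // too short to call with any confidence
    }
    bytes
        .chunks_exact(2)
        .all(|p| (p[0].is_ascii_graphic() || p[0] == b' ') && p[1] == 0)
}

/// Decode a null-interleaved run into a readable string.
fn decode_utf16le(bytes: &[u8]) -> String {
    bytes
        .chunks_exact(2)
        .map(|p| u16::from_le_bytes([p[0], p[1]]))
        .map(|u| char::from_u32(u as u32).unwrap_or('\u{FFFD}'))
        .collect()
}

fn main() {
    let raw = b"k\0e\0r\0n\0e\0l\03\02\0.\0d\0l\0l\0";
    assert!(looks_like_utf16le(raw));
    assert_eq!(decode_utf16le(raw), "kernel32.dll");
}
```

This is why standard strings output for PE binaries looks like `k e r n e l 3 2 . d l l` or misses the run entirely: the interleaved NULs break the printable-byte runs.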

Semantically Tagged

Identifies URLs, domains, IPs, file paths, registry keys, GUIDs, user agents, format strings, Base64 runs, crypto constants, and cloud metadata.
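
To make the idea concrete, here is a hypothetical classifier over a few of those categories. The `Tag` variants and matching rules below are illustrative only; Stringy's real classification covers many more categories with proper pattern matching:

```rust
/// Illustrative tag set -- a subset of the categories listed above.
#[derive(Debug, PartialEq)]
enum Tag { Url, Guid, FormatString, FilePath, RegistryKey }

/// Sketch of first-match-wins classification over a handful of patterns.
fn classify(s: &str) -> Option<Tag> {
    if s.starts_with("http://") || s.starts_with("https://") {
        return Some(Tag::Url);
    }
    if s.starts_with("HKEY_") {
        return Some(Tag::RegistryKey);
    }
    if is_guid(s) {
        return Some(Tag::Guid);
    }
    if s.contains("%s") || s.contains("%d") || s.contains("%x") {
        return Some(Tag::FormatString);
    }
    if s.starts_with('/') || s.get(1..3) == Some(":\\") {
        return Some(Tag::FilePath);
    }
    None
}

/// Check for the {8-4-4-4-12} hex-digit GUID shape.
fn is_guid(s: &str) -> bool {
    let inner = match s.strip_prefix('{').and_then(|t| t.strip_suffix('}')) {
        Some(i) => i,
        None => return false,
    };
    let groups: Vec<&str> = inner.split('-').collect();
    let lens = [8, 4, 4, 4, 12];
    groups.len() == 5
        && groups.iter().zip(lens).all(|(g, n)| {
            g.len() == n && g.chars().all(|c| c.is_ascii_hexdigit())
        })
}

fn main() {
    assert_eq!(classify("https://api.example.com/v1/"), Some(Tag::Url));
    assert_eq!(classify("Error: %s at line %d"), Some(Tag::FormatString));
}
```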

Runtime-Specific

Handles import/export names, demangled Rust symbols, section names, Go build info, .NET metadata, and PE resources.

Ranked

Presents the most relevant strings first using a scoring algorithm.
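
A toy version of such a scoring function might combine a section weight with per-tag bonuses and a capped length bonus. Every constant below is invented for illustration and is not Stringy's actual algorithm:

```rust
/// Illustrative scoring: section weight times a factor, plus tag bonuses,
/// plus a length bonus capped at 64 bytes. All constants are made up.
fn score(section_weight: f64, tags: &[&str], len: usize) -> u32 {
    let tag_bonus: f64 = tags
        .iter()
        .map(|t| match *t {
            "url" | "registry" => 30.0,
            "guid" | "filepath" => 20.0,
            "fmt" => 15.0,
            _ => 5.0,
        })
        .sum();
    let len_bonus = len.min(64) as f64 / 4.0;
    (section_weight * 5.0 + tag_bonus + len_bonus).round() as u32
}

fn main() {
    // A tagged URL in a read-only section outranks an untagged string
    // from a lower-weight writable section.
    assert!(score(10.0, &["url"], 28) > score(7.0, &[], 28));
}
```

The key design point is that provenance (which section), semantics (which tags), and shape (length) all feed one comparable number, so output can simply be sorted descending.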


Features

  • Format-aware parsing via goblin: ELF, PE, Mach-O
  • Section targeting: .rodata, .rdata, __cstring, resources, manifests
  • Encoding support: ASCII, UTF-8, UTF-16LE/BE with confidence scoring
  • Smart classification:
    • URLs, domains, IPv4/IPv6 addresses (implemented)
    • Filepaths & registry keys
    • GUIDs & user agents
    • Format strings (%s, %d, etc.)
    • Base64 & crypto constants
  • Rust symbol demangling (rustc-demangle)
  • JSON output for pipelines
  • YARA-friendly output for rule generation
  • Ranking & scoring: high-signal strings first

Installation

Note: Stringy is currently in development and not yet published to crates.io.

From Source

git clone https://github.com/EvilBit-Labs/Stringy
cd Stringy
cargo build --release
./target/release/stringy --help

Development Build

cargo run -- --help

Usage

Current Status: The CLI interface is under development. Currently available:

stringy target_binary

Current CLI Interface

Basic functionality is implemented; the full interface is still in development:

# Current: Basic analysis
stringy target_binary

# In Development: Advanced features
stringy --only url,filepath target_binary
stringy --min-len 8 --enc ascii,utf16 target_binary
stringy --top 50 --json target_binary

# Planned: Format-specific features
stringy --pe-version --pe-manifest target.exe
stringy --utf16-only target.exe

# Planned: Pipeline integration
stringy --json target_binary | jq '.[] | select(.tags[] | contains("url"))'
stringy --yara candidates.txt target_binary

Example Output

Human-readable mode:

Score  Offset    Section    Tags           String
-----  ------    -------    ----           ------
  95   0x1000    .rdata     url,https      https://api.example.com/v1/
  87   0x2000    .rdata     guid           {12345678-1234-1234-1234-123456789abc}
  82   0x3000    __cstring  filepath       /usr/local/bin/stringy
  78   0x4000    .rdata     fmt            Error: %s at line %d

JSON mode:

{
  "text": "https://api.example.com/v1/",
  "offset": 4096,
  "rva": 4096,
  "section": ".rdata",
  "encoding": "utf-8",
  "length": 28,
  "tags": [
    "url"
  ],
  "score": 95,
  "source": "SectionData"
}

Advantages Over Standard strings

  • Eliminates noise: Stops dumping padding, tables, and interleaved garbage
  • UTF-16 support: Surfaces UTF-16 (crucial for PE) cleanly
  • Actionable buckets: Provides categorized results (URLs, keys, UAs, registry paths) first
  • Provenance tracking: Keeps offset/section info for pivoting to other tools
  • YARA integration: Feeds only high-signal candidates

Development Status

This project is in active development. Current implementation status:

  • ✅ Core Infrastructure: Complete project structure, comprehensive data types, robust error handling
  • ✅ Format Detection: ELF, PE, Mach-O binary format detection via goblin
  • ✅ Container Parsers: Full section classification with weight-based prioritization
  • ✅ Import/Export Extraction: Symbol extraction from all supported formats
  • ✅ Section Analysis: Smart classification of string-rich sections
  • ✅ PE Resource Enumeration: VERSIONINFO, STRINGTABLE, and MANIFEST resource detection (Phase 1 complete)
  • 🚧 String Extraction: ASCII/UTF-8 and UTF-16 extraction engines (framework ready)
  • 🚧 Semantic Classification: IPv4/IPv6 detection implemented; URL, domain, path, GUID pattern matching in progress (types defined)
  • 🚧 Ranking System: Section-aware scoring algorithm (framework in place)
  • 🚧 Output Formats: JSONL, human-readable, and YARA-friendly output (types ready)
  • 🚧 CLI Interface: Basic argument parsing implemented, main pipeline in progress

Current Capabilities

The foundation is robust with fully implemented binary format parsers that can:

  • Format Detection: Automatically detect ELF, PE, and Mach-O formats using goblin
  • Section Classification: Intelligently classify sections by string likelihood with weighted scoring:
    • ELF: .rodata (10.0), .comment (9.0), .data.rel.ro (7.0)
    • PE: .rdata (10.0), .rsrc (9.0), read-only .data (7.0)
    • Mach-O: __TEXT,__cstring (10.0), __TEXT,__const (9.0), __DATA_CONST (7.0)
  • Symbol Processing: Extract and classify import/export names from symbol tables
  • PE Resource Extraction (Phase 1 complete):
    • VERSIONINFO resource detection
    • STRINGTABLE resource detection
    • MANIFEST resource detection
    • Metadata extraction (type, language, size)
  • Cross-Platform Support: Handle platform-specific section characteristics and naming
  • Comprehensive Metadata: Track section offsets, sizes, RVAs, and permissions
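
The weighting above can be pictured as a simple lookup. This sketch hard-codes the listed values; the real parsers compute weights per format, and the function name here is an assumption:

```rust
/// Map (format, section) to the string-likelihood weights listed above.
/// A hard-coded sketch, not the crate's API.
fn section_weight(format: &str, section: &str) -> f64 {
    match (format, section) {
        ("elf", ".rodata") => 10.0,
        ("elf", ".comment") => 9.0,
        ("elf", ".data.rel.ro") => 7.0,
        ("pe", ".rdata") => 10.0,
        ("pe", ".rsrc") => 9.0,
        ("pe", ".data") => 7.0, // only when mapped read-only
        ("macho", "__TEXT,__cstring") => 10.0,
        ("macho", "__TEXT,__const") => 9.0,
        ("macho", "__DATA_CONST") => 7.0,
        _ => 1.0, // unknown sections: low weight, but never silently dropped
    }
}

fn main() {
    assert_eq!(section_weight("elf", ".rodata"), 10.0);
    assert_eq!(section_weight("pe", ".text"), 1.0);
}
```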

Architecture Highlights

  • Trait-Based Design: ContainerParser trait enables easy format extension
  • Type Safety: Comprehensive error handling with StringyError enum
  • Performance Ready: Section weighting system prioritizes high-value areas
  • Extensible Classification: Tag enum supports semantic string categorization
  • Multiple Sources: Handles strings from section data, imports, exports, and resources
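
A trait-based design of this shape might look roughly as follows. `ContainerParser` is named in the description above, but the method signature and the `SectionInfo` type are guesses for illustration, not Stringy's actual code:

```rust
/// Illustrative section record; field names are assumptions.
#[derive(Debug)]
struct SectionInfo {
    name: String,
    offset: u64,
    size: u64,
    weight: f64,
}

/// Sketch of the trait named in the design; this signature is a guess.
trait ContainerParser {
    fn classify_sections(&self, data: &[u8]) -> Vec<SectionInfo>;
}

struct ElfParser;

impl ContainerParser for ElfParser {
    fn classify_sections(&self, _data: &[u8]) -> Vec<SectionInfo> {
        // A real implementation would walk section headers via goblin;
        // this stub returns a fixed example.
        vec![SectionInfo {
            name: ".rodata".to_string(),
            offset: 0x1000,
            size: 0x200,
            weight: 10.0,
        }]
    }
}

fn main() {
    // Adding a new format means implementing the trait for a new type.
    let parser: Box<dyn ContainerParser> = Box::new(ElfParser);
    let sections = parser.classify_sections(&[]);
    assert_eq!(sections[0].name, ".rodata");
}
```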

See the implementation plan for detailed progress tracking.


License

Licensed under Apache 2.0.


Acknowledgements

  • Inspired by strings(1) and the need for better binary analysis tools
  • Built with Rust ecosystem crates: goblin, bstr, regex, rustc-demangle
  • My coworkers, for their excellent input on the original name selection
