High-performance regex engine for Go. Drop-in replacement for regexp with 3-3000x speedup.
Go's stdlib regexp is intentionally simple — single NFA engine, no optimizations. This guarantees O(n) time but leaves performance on the table.
coregex brings Rust regex-crate architecture to Go:
- Multi-engine: Lazy DFA, PikeVM, OnePass, BoundedBacktracker
- SIMD prefilters: AVX2/SSSE3 for fast candidate rejection
- Reverse search: Suffix/inner literal patterns run 1000x+ faster
- O(n) guarantee: No backtracking, no ReDoS vulnerabilities
```bash
go get github.com/coregx/coregex
```

Requires Go 1.25+. Zero external dependencies.
```go
package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\w+@\w+\.\w+`)
	text := []byte("Contact support@example.com for help")

	// Find first match
	fmt.Printf("Found: %s\n", re.Find(text))

	// Check if matches (zero allocation)
	if re.MatchString("test@email.com") {
		fmt.Println("Valid email format")
	}
}
```

Cross-language benchmarks on 6MB input (source):
| Pattern | Go stdlib | coregex | Rust regex | vs stdlib |
|---|---|---|---|---|
| Email validation | 259 ms | 1.5 ms | 1.5 ms | 172x |
| URL extraction | 257 ms | 1.3 ms | 0.8 ms | 192x |
| Suffix `.*\.txt` | 240 ms | 1.5 ms | 1.3 ms | 166x |
| Inner `.*keyword.*` | 232 ms | 1.5 ms | 0.6 ms | 153x |
| Char class `[\w]+` | 550 ms | 26 ms | 52 ms | 21x |
| Alternation `a\|b\|c` | 473 ms | 31 ms | 0.8 ms | 15x |
Where coregex excels:
- Suffix patterns (`.*\.log`, `.*\.txt`): reverse search optimization
- Inner literals (`.*error.*`, `.*@example\.com`): bidirectional DFA
- Character classes (`[\w]+`, `\d+`): 256-byte lookup table
- Multi-pattern (`foo|bar|baz`): Teddy SIMD algorithm
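
The 256-byte lookup table idea can be sketched in plain Go. This is an illustrative sketch, not coregex's actual CharClassSearcher: the type `byteTable` and its methods are hypothetical names, and it assumes an ASCII-only class such as Go's `\w`.

```go
package main

import "fmt"

// byteTable marks which byte values belong to a character class.
// Membership becomes a single array lookup per input byte.
type byteTable [256]bool

// newWordTable builds the table for ASCII word characters, which is what
// \w means in Go's regexp syntax: [0-9A-Za-z_].
func newWordTable() *byteTable {
	var t byteTable
	for b := 0; b < 256; b++ {
		c := byte(b)
		t[b] = c == '_' ||
			(c >= '0' && c <= '9') ||
			(c >= 'A' && c <= 'Z') ||
			(c >= 'a' && c <= 'z')
	}
	return &t
}

// findAll returns the [start, end) offsets of every maximal run of class
// bytes, the same spans a pattern like [\w]+ would report.
func (t *byteTable) findAll(text []byte) [][2]int {
	var spans [][2]int
	i := 0
	for i < len(text) {
		if !t[text[i]] {
			i++
			continue
		}
		start := i
		for i < len(text) && t[text[i]] {
			i++
		}
		spans = append(spans, [2]int{start, i})
	}
	return spans
}

func main() {
	text := []byte("user_42 logged in, id=7")
	for _, s := range newWordTable().findAll(text) {
		fmt.Printf("%q at [%d,%d)\n", text[s[0]:s[1]], s[0], s[1])
	}
}
```

The scan needs no automaton state at all, which is one plausible reason a dedicated searcher can beat a general engine on patterns of this shape.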
Known gaps vs Rust:
- `literal_alt`: Rust uses Aho-Corasick (planned for coregex)
- Complex alternations: architectural differences
coregex automatically selects the optimal engine:
| Strategy | Pattern Type | Speedup |
|---|---|---|
| ReverseInner | `.*keyword.*` | 1000-3000x |
| ReverseSuffix | `.*\.txt` | 100-400x |
| CharClassSearcher | `[\w]+`, `\d+` | 20-25x |
| Teddy | `foo\|bar\|baz` | 15-240x |
| LazyDFA | Complex with literals | 10-50x |
| OnePass | Anchored captures | 10x |
| BoundedBacktracker | Small patterns | 2-5x |
Drop-in replacement for regexp.Regexp:
```go
// stdlib
re := regexp.MustCompile(pattern)

// coregex: same API
re := coregex.MustCompile(pattern)
```

Supported methods:

- `Match`, `MatchString`, `MatchReader`
- `Find`, `FindString`, `FindAll`, `FindAllString`
- `FindIndex`, `FindStringIndex`, `FindAllIndex`
- `FindSubmatch`, `FindStringSubmatch`, `FindAllSubmatch`
- `ReplaceAll`, `ReplaceAllString`, `ReplaceAllFunc`
- `Split`, `SubexpNames`, `NumSubexp`
- `Longest`, `Copy`, `String`
```go
// Zero allocations: returns bool
matched := re.IsMatch(text)

// Zero allocations: returns (start, end, found)
start, end, found := re.FindIndices(text)
```

```go
config := coregex.DefaultConfig()
config.DFAMaxStates = 10000   // Limit DFA cache
config.EnablePrefilter = true // SIMD acceleration

re, err := coregex.CompileWithConfig(pattern, config)
```

Uses Go's regexp/syntax parser:
| Feature | Support |
|---|---|
| Character classes | `[a-z]`, `\d`, `\w`, `\s` |
| Quantifiers | `*`, `+`, `?`, `{n,m}` |
| Anchors | `^`, `$`, `\b`, `\B` |
| Groups | `(...)`, `(?:...)`, `(?P<name>...)` |
| Unicode | `\p{L}`, `\P{N}` |
| Flags | `(?i)`, `(?m)`, `(?s)` |
| Backreferences | Not supported (O(n) guarantee) |
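
Because the syntax comes from Go's `regexp/syntax` package, that parser can be used directly to check whether a pattern falls inside the supported set. The sketch below uses only the standard library. The helper name `supported` is hypothetical, and it assumes the same Perl-style flags that `regexp.Compile` uses.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// supported returns nil if Go's regexp/syntax parser accepts the pattern
// under Perl-style flags, and the parse error otherwise.
func supported(pattern string) error {
	_, err := syntax.Parse(pattern, syntax.Perl)
	return err
}

func main() {
	patterns := []string{
		`(?P<year>\d{4})-\p{L}+`, // named group, Unicode class
		`(?i)^\bfoo\B$`,          // flag and anchors
		`(\w+) \1`,               // backreference: rejected by the parser
	}
	for _, p := range patterns {
		if err := supported(p); err != nil {
			fmt.Printf("%-26s rejected: %v\n", p, err)
			continue
		}
		fmt.Printf("%-26s ok\n", p)
	}
}
```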
```
Pattern → Parse → NFA → Literal Extract → Strategy Select
                          ↓
         ┌─────────────────────────────────┐
         │ Engines:                        │
         │ LazyDFA, PikeVM, OnePass,       │
         │ BoundedBacktracker,             │
         │ ReverseInner, ReverseSuffix,    │
         │ CharClassSearcher, Teddy        │
         └─────────────────────────────────┘
                          ↓
Input → Prefilter (SIMD) → Engine → Match Result
```
SIMD Primitives (AMD64):
- `memchr`: single byte search (AVX2)
- `memmem`: substring search (SSSE3)
- `teddy`: multi-pattern search (SSSE3)
Pure Go fallback on other architectures.
coregex is integrated in GoAWK by Ben Hoyt. This real-world testing uncovered 15+ edge cases that synthetic benchmarks missed.
We need more testers! If you have a project using regexp, try coregex and report issues.
| | coregex | stdlib | regexp2 |
|---|---|---|---|
| Performance | 3-3000x faster | Baseline | Slower |
| SIMD | AVX2/SSSE3 | No | No |
| O(n) guarantee | Yes | Yes | No |
| Backreferences | No | No | Yes |
| API | Drop-in | — | Different |
Use coregex for performance-critical code with O(n) guarantee. Use stdlib for simple cases where performance doesn't matter. Use regexp2 if you need backreferences (accept exponential worst-case).
- golang/go#26623 — Go regexp performance discussion
- golang/go#76818 — Upstream path proposal
- kolkov/regex-bench — Cross-language benchmarks
Inspired by:
- Rust regex — Architecture
- RE2 — O(n) guarantees
- Hyperscan — SIMD algorithms
MIT — see LICENSE.
Status: Pre-1.0 (API may change). Ready for testing and feedback.