Version 9.0.0 - A powerful, context-aware search tool designed specifically for source code.
CGrep extends the capabilities of traditional grep by understanding the structure and semantics of source code across multiple programming languages. It allows developers to search within specific contexts like code, comments, or string literals, and provides advanced pattern matching with semantic awareness.
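As a rough illustration of what searching "within specific contexts" means, the Python sketch below splits C-like source into comments, string literals, and code, then reports a match only when it falls in a requested context. It is a deliberately simplified model and not CGrep's actual tokenizer; the names `TOKEN_RE`, `find_in_context`, and `SAMPLE` are hypothetical.

```python
import re

# Hypothetical, simplified lexer for C-like source (NOT CGrep's tokenizer).
# Alternatives are tried in order: comment, string literal, then plain code.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)      # line comment or block comment
  | (?P<literal>"(?:\\.|[^"\\])*")       # double-quoted string literal
  | (?P<code>[A-Za-z_]\w*|\d+|\S)        # identifier, number, or single symbol
    """,
    re.VERBOSE | re.DOTALL,
)


def find_in_context(source, needle, contexts):
    """Return (line, context) pairs where `needle` occurs in an allowed context."""
    hits = []
    for m in TOKEN_RE.finditer(source):
        context = m.lastgroup  # "comment", "literal", or "code"
        if context in contexts and needle in m.group():
            line = source.count("\n", 0, m.start()) + 1
            hits.append((line, context))
    return hits


SAMPLE = '''\
// TODO: free the buffer
char *buf = malloc(16);   /* malloc may fail */
puts("malloc failed");
'''

print(find_in_context(SAMPLE, "malloc", {"code"}))     # [(2, 'code')]
print(find_in_context(SAMPLE, "malloc", {"comment"}))  # [(2, 'comment')]
print(find_in_context(SAMPLE, "malloc", {"literal"}))  # [(3, 'literal')]
```

A plain text search would report all three occurrences of `malloc`; restricting the context set is what separates the call on line 2 from the comment beside it and from the message string on line 3.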
- 75% faster plain token search - Dramatically improved performance for standard token-based searches
- 39% faster semantic search - Significant speed boost for advanced semantic pattern matching
- Full UTF-8 support - Switched from bytestream to text processing, enabling proper handling of UTF-8 character sets and international characters
- Semantic Test Filtering - New capability to filter out test code from search results across 27+ programming languages and their respective testing frameworks (see Test Framework Support below)
- Improved Text Processing - Native support for UTF-8 character encoding with accurate column positioning
cabal update
cabal install cgrep
git clone https://github.com/awgn/cgrep.git
cd cgrep
cabal install
or using stack:
stack build
stack install
cgrep 9.0.0 - Usage: cgrep [OPTION] [PATTERN] files...
Usage: cgrep [--file FILE] [-w|--word] [-p|--prefix] [-s|--suffix] [-e|--edit]
[-G|--regex] [-i|--ignore-case] [-c|--code] [-m|--comment]
[-l|--literal] [--identifier|--name] [--native|--type] [--keyword]
[--number] [--string] [--op] [--type TYPE] [--kind KIND]
[--code-only] [--hdr-only] [-T|--tests ARG] [--prune-dir DIR]
[-r|--recursive] [-L|--follow] [-S|--semantic] [--strict]
[--max-count INT] [--force-type TYPE] [--type-list]
[-v|--invert-match] [-j|--threads INT] [--show-match] [--color]
[--no-color] [-h|--no-filename] [--no-numbers] [--no-column]
[--count] [--filename-only] [--json] [--vim] [--editor]
[--fileline] [--verbose] [--stats] [--null-output] [--palette]
[PATTERN FILES...] [--version]
Context-aware grep for source codes
Available options:
--file FILE Read PATTERNs from file (one per line)
-w,--word Force word matching
-p,--prefix Force prefix matching
-s,--suffix Force suffix matching
-e,--edit Use edit distance
-G,--regex Use regex matching (posix)
-i,--ignore-case Ignore case distinctions
-c,--code Enable search in source code
-m,--comment Enable search in comments
-l,--literal Enable search in string literals
--identifier,--name Identifiers
--native,--type Native Types
--keyword Keywords
--number Literal numbers
--string Literal strings
--op Operators
--type TYPE Specify file types. ie: Cpp, +Haskell, -Makefile
--kind KIND Specify file kinds. Text, Config, Language, Data,
Markup or Script
--code-only Parse code modules only (skip headers/interfaces)
--hdr-only Parse headers/interfaces only (skip modules)
-T,--tests ARG Filter tests: 'True' tests only, 'False' code only,
omitted (search all)
--prune-dir DIR Do not descend into dir
-r,--recursive Enable recursive search (don't follow symlinks)
-L,--follow Follow symlinks
-S,--semantic "code" pattern: _, _1, _2... (identifiers), ANY, KEY,
STR, LIT, NUM, HEX, OCT
--strict Enable strict semantic for operators
--max-count INT Stop search in files after INT matches
--force-type TYPE Force the type of file
--type-list List the supported file types
-v,--invert-match Select non-matching lines
-j,--threads INT Approximate number of threads to run search
--show-match Show list of matching tokens
--color Use colors to highlight the match strings
--no-color Do not use colors (override config file)
-h,--no-filename Suppress the file name prefix on output
--no-numbers Suppress both line and column numbers on output
--no-column Suppress the column number on output
--count Print only a count of matching lines per file
--filename-only Print only the name of files containing matches
--json Format output as json object
--vim Run vim editor passing the files that match
--editor Run the editor specified by EDITOR var., passing the
files that match
--fileline When edit option is specified, pass the list of
matching files in file:line format (e.g. vim
'file-line' plugin)
--verbose Enable verbose mode
--stats Print statistics about the search
--null-output Disable output for performance evaluation
--palette Show color palette
-h,--help Show this help text
--version Show version information and exit
Search for a simple pattern in source files:
cgrep "main" *.c
Search recursively in a directory:
cgrep -r "TODO" src/
Case-insensitive search:
cgrep -i "buffer" *.cpp
Search only in code (exclude comments and strings):
cgrep -c "malloc" *.c
Search only in comments:
cgrep -m "TODO" -r src/
Search only in string literals:
cgrep -l "hello" *.cpp
Search in both code and comments, but not in strings:
cgrep -c -m "config" *.js
Search for identifiers only:
cgrep --identifier "main" *.c
Search for native types:
cgrep --native "int" *.c
Search for string literals containing specific text:
cgrep --string "hello" *.cpp
Search only in C++ files (recursively):
cgrep --type=Cpp -r "char" test/
Search in Haskell files, but exclude test code:
cgrep --type=Haskell -T False "function" -r .
Search by file kind (configuration files):
cgrep --kind=Config "database" -r /etc/
Search only in production code, excluding all tests:
cgrep -T False "function" -r src/
Search only in test code:
cgrep -T True "mock" -r tests/
This feature automatically detects and filters test code based on language-specific conventions (see Test Framework Support).
Semantic search allows you to match code patterns using wildcards:
- _, _1, _2, ...: Match any identifier
- ANY: Match any token
- KEY: Match any keyword
- STR: Match any string literal
- LIT: Match any literal
- NUM: Match any number
- HEX: Match any hexadecimal number
- OCT: Match any octal number
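To make the wildcard list concrete, here is a minimal Python sketch of how such placeholders could be matched against a token stream. It is an illustration under stated assumptions, not CGrep's matcher: the tokenizer, the tiny keyword set, and the names `classify`, `wildcard_matches`, and `semantic_search` are all hypothetical. Two behaviours are assumptions: `NUM` is taken to include hexadecimal and octal numbers, and numbered placeholders such as `_1` and `_2` are treated the same as `_`.

```python
import re

# Tiny illustrative keyword subset (assumption; real languages have many more).
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char"}

# Hypothetical tokenizer: strings, hex numbers, decimal/octal numbers,
# identifiers, then any single non-space symbol as an operator.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|0[xX][0-9a-fA-F]+|\d+|[A-Za-z_]\w*|[^\s\w]')


def classify(text):
    """Assign a coarse kind to one source token."""
    if text.startswith('"'):
        return "STR"
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", text):
        return "HEX"
    if re.fullmatch(r"0[0-7]+", text):
        return "OCT"
    if text.isdigit():
        return "NUM"
    if text in KEYWORDS:
        return "KEY"
    if re.fullmatch(r"[A-Za-z_]\w*", text):
        return "ID"
    return "OP"


def wildcard_matches(wild, token):
    """True if one pattern element accepts one source token."""
    kind = classify(token)
    if re.fullmatch(r"_\d*", wild):          # _, _1, _2, ...: any identifier
        return kind == "ID"
    if wild == "ANY":
        return True
    if wild in ("KEY", "STR", "HEX", "OCT"):
        return kind == wild
    if wild == "NUM":                        # assumption: NUM also covers HEX and OCT
        return kind in {"NUM", "HEX", "OCT"}
    if wild == "LIT":
        return kind in {"STR", "NUM", "HEX", "OCT"}
    return wild == token                     # anything else must match literally


def semantic_search(pattern, source):
    """Slide the pattern over each line's tokens; return (line, matched text)."""
    wanted = pattern.split()
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        tokens = TOKEN_RE.findall(line)
        for i in range(len(tokens) - len(wanted) + 1):
            if all(wildcard_matches(w, t) for w, t in zip(wanted, tokens[i:])):
                hits.append((lineno, " ".join(tokens[i:i + len(wanted)])))
    return hits


SAMPLE = '''int count = 42;
mask = 0xFF;
name = "cgrep";
total = count;
'''

print(semantic_search("_ = NUM", SAMPLE))  # [(1, 'count = 42'), (2, 'mask = 0xFF')]
print(semantic_search("_ = STR", SAMPLE))  # [(3, 'name = "cgrep"')]
```

Because matching happens on tokens rather than characters, `total = count` is not reported for `_ = NUM`, whatever the spacing around `=`.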
Find variable assignments with numeric literals:
cgrep -S "_ = NUM" *.c
Use regular expressions (POSIX):
cgrep -G "main|return" *.c
Use word boundaries:
cgrep -w "read" *.c # Matches "read" but not "thread" or "reader"
Prefix matching:
cgrep -p "ma" *.c # Matches "main", etc.
Suffix matching:
cgrep -s "rn" *.c # Matches "return", etc.
Case-insensitive search:
cgrep -i "SED" test.cpp # Matches "Sed", "sed", etc.
Show only filenames of files containing matches:
cgrep --filename-only "char" *.cpp
Count matches per file:
cgrep --count "char" test.cpp
JSON output (useful for scripting):
cgrep --json "error" *.log
Suppress filename prefix:
cgrep -h "pattern" file.c
Show matching tokens:
cgrep --show-match "std::" *.cpp
Open matching files in vim:
cgrep --vim "FIXME" -r src/
Open with your default editor (set via EDITOR environment variable):
cgrep --editor "bug" -r src/
Use vim with file:line format (works with vim-file-line plugin):
cgrep --vim --fileline "error" -r src/
Limit number of threads:
cgrep -j 4 "pattern" -r large_codebase/
Stop after finding N matches in each file:
cgrep --max-count=5 "TODO" -r src/
Show search statistics:
cgrep --stats "function" -r src/
Version 9's improved UTF-8 support allows searching in files with international characters:
cgrep "Hello" test.utf8
Version 9 introduces intelligent test code filtering across 27+ programming languages. When using the -T flag, cgrep can automatically detect and filter test code based on language-specific conventions and testing frameworks.
| Language | Testing Frameworks Detected | Detection Patterns |
|---|---|---|
| Rust | Built-in, cargo test | #[test], #[cfg(test)] modules |
| Go | Built-in testing | func Test*, func Benchmark* |
| Java | JUnit | @Test annotations |
| Kotlin | JUnit | @Test annotations |
| C | Google Test, Catch2 | TEST(), TEST_F(), TEST_CASE(), test_* functions |
| C++ | Google Test, Catch2, Boost.Test | TEST(), TEST_F(), TEST_CASE(), BOOST_AUTO_TEST, Test* functions |
| Python | pytest, unittest | test_* functions, Test* classes, @pytest, @unittest decorators |
| Zig | Built-in | test "..." blocks |
| JavaScript | Mocha, Jasmine, Jest | describe(), it(), test(), context() |
| TypeScript | Mocha, Jasmine, Jest | describe(), it(), test(), context() |
| Scala | ScalaTest, MUnit | test(), it(), describe(), scenario(), feature() |
| Haskell | HSpec, Tasty, QuickCheck, HUnit | describe, it, context, testCase, testGroup, testProperty, prop_* functions |
| C# | NUnit, xUnit, MSTest | [Test], [Fact], [Theory], [TestMethod], [TestFixture], [TestClass] |
| F# | NUnit, xUnit, Expecto | [<Test>], [<Fact>], [<Theory>], testCase, testList, test |
| Dart | Built-in test package, Flutter | test(), group(), testWidgets() |
| Elixir | ExUnit | test "...", describe "...", defmodule *Test |
| Ruby | RSpec, Minitest | describe, context, it, test_*, def test_* |
| PHP | PHPUnit | @test annotations, test* methods, *Test classes |
| Swift | XCTest | XCTestCase classes, func test*() |
| Objective-C | XCTest | XCTestCase classes, test methods |
| R | testthat | test_that(), describe(), context() |
| Julia | Built-in Test | @testset, @test |
| Perl | Test::More, Test::Simple | subtest, *.t files |
| OCaml | OUnit, Alcotest | let test_*, test_case |
| Erlang | EUnit | *_test(), *_test_() functions |
| Nim | unittest | unittest module, test "..." |
| Clojure | clojure.test | (deftest ...), (testing ...) |
| D | Built-in | unittest { ... } blocks |
# Search for "config" only in production code, skipping all tests
cgrep -T False "config" -r src/
# Search for mock usage only in test files
cgrep -T True "mock" -r .
CGrep supports a wide range of programming languages and file formats:
Programming Languages: C, C++, C#, Java, Kotlin, Scala, JavaScript, TypeScript, CoffeeScript, Python, Ruby, Perl, PHP, Go, Rust, Haskell, OCaml, F#, Erlang, Elixir, Clojure, Lisp, Scheme, Lua, R, Julia, Dart, Nim, Zig, D, Swift, Objective-C, Chapel, Awk, Shell scripts (Bash, Fish)
Markup & Config: HTML, XML, LaTeX, Markdown, YAML, JSON, TOML, INI, Dhall, CMake, Makefile, Cabal
To see the complete list of supported file types:
cgrep --type-list
CGrep can be configured using a configuration file located at ~/.cgreprc (or $XDG_CONFIG_HOME/cgrep/cgreprc).
Example configuration:
colors: true
file-types:
- +Cpp
- +Haskell
- -Test
jobs: 8
- Use file type filters to reduce the number of files processed:
  cgrep --type=Cpp "pattern" -r .
- Limit the search scope with --prune-dir to exclude directories:
  cgrep --prune-dir=node_modules --prune-dir=.git "pattern" -r .
- Use context filters when you know where to search:
  cgrep -c "pattern" -r . # Search only in code
- Use test filtering to focus on production code:
  cgrep -T False "pattern" -r . # Exclude all test code
- Adjust thread count for optimal performance on your system:
  cgrep -j 16 "pattern" -r .
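When these tips are applied from a script, it can help to assemble the command line in one place. The Python helper below is a small sketch that combines only the flags shown in the tips above (`--type`, `--prune-dir`, `-c`, `-T`, `-j`, `-r`); the function name `build_cgrep_command` and its parameters are hypothetical, and it builds the argument list without running anything.

```python
def build_cgrep_command(pattern, root=".", file_type=None, prune_dirs=(),
                        code_context=False, tests=None, threads=None):
    """Build an argv list for a recursive cgrep search using the documented flags."""
    cmd = ["cgrep"]
    if file_type:
        cmd.append(f"--type={file_type}")      # restrict to one file type
    for directory in prune_dirs:
        cmd.append(f"--prune-dir={directory}")  # do not descend into these dirs
    if code_context:
        cmd.append("-c")                        # search only in code
    if tests is not None:
        cmd += ["-T", "True" if tests else "False"]
    if threads:
        cmd += ["-j", str(threads)]
    cmd += [pattern, "-r", root]
    return cmd


print(build_cgrep_command(
    "pattern",
    file_type="Cpp",
    prune_dirs=["node_modules", ".git"],
    tests=False,
    threads=16,
))
```

The resulting list can be handed to `subprocess.run` as is; passing a list instead of a shell string avoids quoting problems when the pattern contains spaces or shell metacharacters.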
| Search Type | v8 | v9 | Improvement |
|---|---|---|---|
| Plain token search | 1.0x | 1.75x | +75% |
| Semantic search | 1.0x | 1.39x | +39% |
Benchmarks performed on a typical codebase with ~100k lines of code.
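To relate the table to wall-clock time, a speedup factor can be converted into the fraction of the old runtime that remains. The short Python calculation below assumes the "x" values denote relative throughput (work done per unit of time), which the table does not state explicitly; `runtime_fraction` is a hypothetical helper name.

```python
def runtime_fraction(speedup):
    """Fraction of the baseline runtime left at a given speedup factor."""
    return 1.0 / speedup


# Speedup factors taken from the table above (v9 relative to v8).
for name, speedup in [("Plain token search", 1.75), ("Semantic search", 1.39)]:
    remaining = runtime_fraction(speedup)
    print(f"{name}: {remaining:.0%} of the v8 runtime ({1 - remaining:.0%} less time)")
```

Under that assumption, a search that took 10 seconds in v8 would take roughly 5.7 seconds as a plain token search and roughly 7.2 seconds as a semantic search in v9.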
Contributions are welcome! Please feel free to submit issues or pull requests on GitHub.
CGrep is released under the GPL-2.0-or-later license. See the LICENSE file for details.
Nicola Bonelli nicola@larthia.com
- Homepage: http://awgn.github.io/cgrep/
- Hackage: https://hackage.haskell.org/package/cgrep
- GitHub: https://github.com/awgn/cgrep
- Gitter Chat: https://gitter.im/awgn/cgrep