f1monkey/spellchecker


Spellchecker

Yet another spellchecker written in Go.

Features:

  • very compact database: ~1 MB for 30,000 unique words
  • average time to fix a single word: ~35 µs
  • achieves about 71–74% accuracy on Peter Norvig’s test sets (see benchmarks)
  • no built-in dictionary: you provide your own words, and the spellchecker recognizes only those

Installation

go get -v github.com/f1monkey/spellchecker/v3

Usage

Quick start

  1. Initialize the spellchecker. You need to pass an alphabet: the set of allowed characters used for indexing and primary word checks. All other characters are ignored by these operations.
	// Create a new instance
	sc, err := spellchecker.New(
		"abcdefghijklmnopqrstuvwxyz1234567890", // allowed symbols; other symbols will be ignored
	)
	if err != nil {
		panic(err)
	}
  2. Add some words to the dictionary:

    1. From any io.Reader:
    	in, err := os.Open("data/sample.txt")
    	if err != nil {
    		panic(err)
    	}
    	defer in.Close()
    	sc.AddFrom(in)
    2. Or add words manually:
    	sc.AddMany([]string{"lock", "stock", "and", "two", "smoking"})
    	sc.Add("barrels")
  3. Use the spellchecker:

    1. Check whether a word is correct:
    	result := sc.IsCorrect("stock")
    	fmt.Println(result) // true
    2. Suggest corrections:
    	// Find up to 10 suggestions for a word
    	matches := sc.Suggest("rang", 10)
    	fmt.Println(matches) // [range, orange]
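The alphabet passed to New decides which characters take part in indexing. The sketch below is one way to picture that step. It assumes that "ignored" simply means such characters are skipped, and keepAlphabet is a hypothetical helper, not part of the spellchecker API. Note that with the lowercase alphabet used above, upper-case letters are dropped as well.

```go
package main

import (
	"fmt"
	"strings"
)

// keepAlphabet is a hypothetical helper (not part of the spellchecker API).
// It drops every rune of word that does not appear in alphabet.
func keepAlphabet(word, alphabet string) string {
	var b strings.Builder
	for _, r := range word {
		if strings.ContainsRune(alphabet, r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	const alphabet = "abcdefghijklmnopqrstuvwxyz1234567890"
	fmt.Println(keepAlphabet("stock!", alphabet))      // stock
	fmt.Println(keepAlphabet("two-smoking", alphabet)) // twosmoking
	fmt.Println(keepAlphabet("Barrels", alphabet))     // arrels ('B' is not in the alphabet)
}
```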

Options

The spellchecker supports customizable options for both searching/suggesting corrections and adding words to the dictionary.

Search/Suggestion Options

These options are created with the SuggestWith... helpers and passed as trailing arguments to the Suggest method.

  • SuggestWithMaxErrors(maxErrors int)
    Sets the maximum allowed edit distance (in "bits") between the input word and dictionary candidates.

    • Deletion: 1 bit (e.g., "proble" → "problem")
    • Insertion: 1 bit (e.g., "problemm" → "problem")
    • Substitution: 2 bits (e.g., "problam" → "problem")
    • Transposition: 0 bits (e.g., "problme" → "problem")

    Default: 2. Increasing this value beyond 2 is not recommended as it can significantly degrade performance.

  • SuggestWithFilterFunc(f FilterFunc)
    Replaces the default scoring/filtering function with a custom one.
    The function receives:

    • src: runes of the input word
    • candidate: runes of the dictionary word
    • count: frequency count of the candidate in the dictionary

    It must return:

    • a float64 score (higher = better suggestion)
    • a bool indicating whether the candidate should be kept

    The default filter uses an edit distance that also counts adjacent transpositions (Damerau–Levenshtein style; costs: insert/delete=1, substitute=1, transpose=1), filters out candidates exceeding maxErrors, and boosts the score based on word frequency and shared prefix/suffix length.
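An edit distance in which an adjacent transposition costs 1 can be computed as the optimal string alignment (restricted Damerau–Levenshtein) distance. Below is a minimal sketch of that distance with unit costs for every operation; it illustrates the idea and is not the library's code. Unlike the "bits" budget of SuggestWithMaxErrors, each example typo from the list above is at distance 1 here.

```go
package main

import "fmt"

func minOf(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// osaDistance returns the optimal string alignment distance between a and b:
// insertions, deletions, substitutions and adjacent transpositions cost 1 each.
func osaDistance(a, b []rune) int {
	n, m := len(a), len(b)
	// d[i][j] holds the distance between a[:i] and b[:j].
	d := make([][]int, n+1)
	for i := range d {
		d[i] = make([]int, m+1)
		d[i][0] = i // delete all i runes
	}
	for j := 0; j <= m; j++ {
		d[0][j] = j // insert all j runes
	}
	for i := 1; i <= n; i++ {
		for j := 1; j <= m; j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			d[i][j] = minOf(minOf(
				d[i-1][j]+1, // deletion
				d[i][j-1]+1), // insertion
				d[i-1][j-1]+cost) // substitution or match
			if i > 1 && j > 1 && a[i-1] == b[j-2] && a[i-2] == b[j-1] {
				d[i][j] = minOf(d[i][j], d[i-2][j-2]+1) // adjacent transposition
			}
		}
	}
	return d[n][m]
}

func main() {
	// Each pair prints a distance of 1.
	for _, p := range [][2]string{
		{"proble", "problem"},
		{"problemm", "problem"},
		{"problam", "problem"},
		{"problme", "problem"},
	} {
		fmt.Printf("%s -> %s: %d\n", p[0], p[1], osaDistance([]rune(p[0]), []rune(p[1])))
	}
}
```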

Example usage:

matches := sc.Suggest(
	"rang",
	10,
	spellchecker.SuggestWithMaxErrors(1),
	spellchecker.SuggestWithFilterFunc(myCustomFilter),
)
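myCustomFilter in the example above is a placeholder. One possible implementation is sketched below. The FilterFunc type here is a local stand-in written from the parameter list described earlier; the uint64 type of count is an assumption, so check the package documentation for the exact signature before passing such a function to SuggestWithFilterFunc. The heuristic itself (same first letter, length difference of at most 2, score = log frequency plus a bonus per shared leading rune) is purely illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// FilterFunc is a local stand-in for the library's filter type (assumed signature).
type FilterFunc func(src, candidate []rune, count uint64) (float64, bool)

// myCustomFilter is a hypothetical filter: it rejects candidates with a
// different first rune or a length difference above 2, and scores the rest
// by log(1+count) plus the length of the shared prefix.
func myCustomFilter(src, candidate []rune, count uint64) (float64, bool) {
	if len(src) == 0 || len(candidate) == 0 || src[0] != candidate[0] {
		return 0, false
	}
	if diff := len(src) - len(candidate); diff > 2 || diff < -2 {
		return 0, false
	}
	prefix := 0
	for prefix < len(src) && prefix < len(candidate) && src[prefix] == candidate[prefix] {
		prefix++
	}
	return math.Log1p(float64(count)) + float64(prefix), true
}

func main() {
	var filter FilterFunc = myCustomFilter
	counts := map[string]uint64{"range": 3, "orange": 7, "ring": 1}
	for _, cand := range []string{"range", "orange", "ring"} {
		score, keep := filter([]rune("rang"), []rune(cand), counts[cand])
		fmt.Printf("%-6s keep=%v score=%.2f\n", cand, keep, score)
	}
}
```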

Add Options

These options are passed to Add, AddMany, or AddFrom.

  • AddWithWeight(weight uint)
    Sets the frequency weight for the added word(s). A higher weight increases the chance that the word appears higher in suggestion results. Default: 1.

  • AddWithSplitter(splitter bufio.SplitFunc)
    Customizes how AddFrom(reader) splits the input stream into words.

    The default splitter:

    • Uses bufio.ScanWords as base
    • Converts to lowercase
    • Keeps only sequences matching [-\pL]+ (letters and hyphens)

Example:

sc.AddFrom(
	file,
	spellchecker.AddWithWeight(10),          // these words are very common
	spellchecker.AddWithSplitter(customSplitter),
)

sc.AddMany([]string{"hello", "world"},
	spellchecker.AddWithWeight(5),
)
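To see why a larger weight pushes a word up the suggestion list, here is a toy model, assuming that every added word's weight accumulates into a frequency count and that candidates are ranked by that count alone. The dict type and its methods are hypothetical; the real default filter also takes edit distance and shared prefix/suffix length into account.

```go
package main

import (
	"fmt"
	"sort"
)

// dict is a hypothetical word-frequency store, not the spellchecker's internals.
type dict map[string]uint

// addWithWeight adds each word with the given weight; repeated additions accumulate.
func (d dict) addWithWeight(weight uint, words ...string) {
	for _, w := range words {
		d[w] += weight
	}
}

// rank orders candidates by descending count, breaking ties alphabetically.
func (d dict) rank(candidates []string) []string {
	out := append([]string(nil), candidates...)
	sort.Slice(out, func(i, j int) bool {
		if d[out[i]] != d[out[j]] {
			return d[out[i]] > d[out[j]]
		}
		return out[i] < out[j]
	})
	return out
}

func main() {
	d := dict{}
	d.addWithWeight(1, "range", "orange")
	d.addWithWeight(10, "orange") // a very common word gets a larger weight
	fmt.Println(d.rank([]string{"range", "orange"})) // [orange range]
}
```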

Save/load

	sc, err := spellchecker.New("abc")
	if err != nil {
		panic(err)
	}

	// Save data to any io.Writer
	out, err := os.Create("data/out.bin")
	if err != nil {
		panic(err)
	}
	sc.Save(out)
	out.Close() // close the file before reading it back

	// Load data back from any io.Reader
	in, err := os.Open("data/out.bin")
	if err != nil {
		panic(err)
	}
	defer in.Close()
	sc, err = spellchecker.Load(in)
	if err != nil {
		panic(err)
	}

Benchmarks

Tests are based on data from Peter Norvig's article "How to Write a Spelling Corrector".

Running tool: /usr/bin/go test -benchmem -run=^$ -bench ^Benchmark_Norvig1$ github.com/f1monkey/spellchecker -count=1

goos: linux
goarch: amd64
pkg: github.com/f1monkey/spellchecker
cpu: 13th Gen Intel(R) Core(TM) i9-13980HX
Benchmark_Norvig1-32    	     357	   3305052 ns/op	        74.44 success_percent	       201.0 success_words	       270.0 total_words	  768899 B/op	   13302 allocs/op
PASS
ok  	github.com/f1monkey/spellchecker	3.801s
Running tool: /usr/bin/go test -benchmem -run=^$ -bench ^Benchmark_Norvig2$ github.com/f1monkey/spellchecker -count=1

goos: linux
goarch: amd64
pkg: github.com/f1monkey/spellchecker
cpu: 13th Gen Intel(R) Core(TM) i9-13980HX
Benchmark_Norvig2-32    	     236	   5257185 ns/op	        71.25 success_percent	       285.0 success_words	       400.0 total_words	 1201260 B/op	   19346 allocs/op
PASS
ok  	github.com/f1monkey/spellchecker	4.350s
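The reported metrics can be cross-checked with a little arithmetic: success_percent is success_words / total_words × 100, and dividing ns/op by total_words gives an approximate per-word time. The per-word figure assumes that one benchmark op corrects every word of the test set once, which the output above does not state explicitly.

```go
package main

import "fmt"

// accuracyPercent returns the share of correctly fixed words, in percent.
func accuracyPercent(success, total float64) float64 {
	return 100 * success / total
}

// microsPerWord converts ns/op to µs per word, assuming one op processes
// every word of the test set exactly once.
func microsPerWord(nsPerOp, total float64) float64 {
	return nsPerOp / total / 1000
}

func main() {
	// Figures copied from the benchmark output above.
	runs := []struct {
		name                         string
		nsPerOp, success, totalWords float64
	}{
		{"Norvig1", 3305052, 201, 270},
		{"Norvig2", 5257185, 285, 400},
	}
	for _, r := range runs {
		fmt.Printf("%s: %.2f%% correct, %.1f µs per word\n",
			r.name, accuracyPercent(r.success, r.totalWords), microsPerWord(r.nsPerOp, r.totalWords))
	}
	// Norvig1: 74.44% correct, 12.2 µs per word
	// Norvig2: 71.25% correct, 13.1 µs per word
}
```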
