From 9e63f78f91cfd1f0ddbc5c1889f15d2c31ea0806 Mon Sep 17 00:00:00 2001 From: URRO Date: Sun, 9 Mar 2025 13:26:11 -0400 Subject: [PATCH 1/5] Update README.md --- README.md | 460 +++++++++++++++++++++++++++++------------------------- 1 file changed, 248 insertions(+), 212 deletions(-) diff --git a/README.md b/README.md index 6cc3903..8ba4dea 100644 --- a/README.md +++ b/README.md @@ -9,27 +9,31 @@
-#### **Features** +#### Features - LlamaCPP Python wrapper support ([#116](https://github.com/epfl-dlab/transformers-CFG/pull/116)) -#### **Bug fixes** +#### Bug fixes - `pip show` license ([#117](https://github.com/epfl-dlab/transformers-CFG/pull/117))
### Latest stable -#### **[v0.2.7 Latest](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7)** (2025-03-02) -#### **Features** +#### [V0.2.7 latest](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7) (2025-03-02) + +#### Features - Types and MLX ([#93](https://github.com/epfl-dlab/transformers-CFG/pull/93)) -- Negation, wildcards, repetition brackets ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94), [#95](https://github.com/epfl-dlab/transformers-CFG/pull/95), [#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104)) +- Negation, wildcards, repetition brackets ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94), + [#95](https://github.com/epfl-dlab/transformers-CFG/pull/95), + [#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), + [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104)) - Qwen2 and Qwen2.5 ([#97](https://github.com/epfl-dlab/transformers-CFG/pull/97)) -- Resuable `GrammarConstrainedLogitsProcessor` for efficiency ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)) +- Reusable `GrammarConstrainedLogitsProcessor` for efficiency ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)) - Pytest for testing ([#109](https://github.com/epfl-dlab/transformers-CFG/pull/109)) - GitHub Actions workflow for automation ([#110](https://github.com/epfl-dlab/transformers-CFG/pull/110)) -#### **Bug fixes** +#### Bug fixes - Avoid computing full masks and optimized type additions ([#101](https://github.com/epfl-dlab/transformers-CFG/pull/101)) - Refactored grammar encoding to improve structure ([#99](https://github.com/epfl-dlab/transformers-CFG/pull/99)) @@ -41,17 +45,22 @@ - **[Gemma-2](https://github.com/epfl-dlab/transformers-CFG/pull/75)** — @fillassuncao (2024-08-16) - **[DeepSeek](https://github.com/epfl-dlab/transformers-CFG/pull/73)** (2024-07-24) - **LLaMA-3** (2024-07-08) -- **[JSON Schema](examples/grammars/custom_json_grammars/README.md)** (2024-05-13) +- **[JSON schema](examples/grammars/custom_json_grammars/README.md)** (2024-05-13) - **Mask optimization** (2024-04-25) - **[Phi](https://github.com/epfl-dlab/transformers-CFG/issues/34)** (2024-04-16) - **[Online demo](http://saibo-creator.xyz:7860/)** (2024-04-10) - **Unicode and foreign text** (2024-02-29) - **Text-Generation-WebUI** (2023-12-17) - - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). + - We are pleased to announce that `transformers-cfg` has been integrated into the + [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to + leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). ## 🚀 Introduction -Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parellel for LlamaCPP's GBNF support, but with stricter generation rules. 
+Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) +library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face +Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers +parallel for LlamaCPP's GBNF support, but with stricter generation rules. ## 💻 Installation @@ -72,207 +81,198 @@ pip install git+https://github.com/epfl-dlab/transformers-CFG.git@main ``` ## 🔧 Grammar quickstart -Let's set up a predictable generation method where the model would usually reply with "The animal is a dog." However, we'll force the model to say either "The animal is a cat" or "The animal is a fish," two other common domestic pets that contradict the inital text. + +Let's set up a predictable generation method where the model would usually reply with +"The animal is a dog." Instead, we force the model to say either "The animal is a cat" or +"The animal is a fish" — two other common domestic pets that contradict the initial text. ### Command-line interface (CLI) -The `transformers-cfg-cli` tool enables text generation using a model and a specified grammar. Unicode is supported. +The `transformers-cfg-cli` tool enables text generation using a model and a specified grammar. +Unicode is supported. ```bash +# Run text generation using the CLI tool. +# Note: Use proper quotes so that the inner double quotes are preserved. transformers-cfg-cli generate \ - -m "microsoft/Phi-3-mini-4k-instruct" \ - -g "examples/grammars/json.ebnf" \ - -p "This is a valid JSON string for an HTTP request:" \ - --use_4bit \ - --max_new_tokens 60 \ - --repetition_penalty 1.1 -# {"name":"John","age":30,"car":null} + -m "facebook/opt-125m" \ + -g "examples/grammars/animal.ebnf" \ + -p 'The text says, "The animal is a dog." The answer is obvious.' \ + --max_new_tokens 60 ``` Run `transformers-cfg-cli generate --help` for available options. ### Transformers *Torch* -```py +#### Non-streaming example + +```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor if __name__ == "__main__": - # Detect if GPU is available, otherwise use CPU + # 1. Set device (GPU if available, otherwise CPU) device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - print(f"Using device: {device}") + print("Using device:", device) + # 2. Specify model ID model_id = "facebook/opt-125m" - # Load model and tokenizer + # 3. Load tokenizer and model tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token - model = AutoModelForCausalLM.from_pretrained(model_id).to(device) model.generation_config.pad_token_id = model.generation_config.eos_token_id - # Define grammar string - json_grammar = """ - + # 4. Define the grammar string + grammar_str = r""" root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" - """ - grammar = IncrementalGrammarConstraint(json_grammar, "root", tokenizer) - grammar_processor = GrammarConstrainedLogitsProcessor(grammar) - - # Generate + # 5. Create grammar constraint and logits processor + grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) + logits_processor = GrammarConstrainedLogitsProcessor(grammar) + + # 6. Create prompt(s) and tokenize prompts = [ - 'The text says, "The animal is a dog." 
The answer is obvious. ', 'I\'m going to say "The animal is a dog." Here I go! ' - ] - input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device) - - output = model.generate( - input_ids, + 'The text says, "The animal is a dog." The answer is obvious.', + 'I\'m going to say "The animal is a dog." Here I go!' + ] + inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False)["input_ids"].to(device) + + # 7. Generate outputs + outputs = model.generate( + inputs, max_length=50, - logits_processor=[grammar_processor], + logits_processor=[logits_processor], repetition_penalty=1.1, - num_return_sequences=1, + num_return_sequences=1 ) - # Decode output - generations = tokenizer.batch_decode(output, skip_special_tokens=True) - - # Print all generations in for loop - for generation in generations: - print(generation) - + # 8. Decode and print generated text + results = tokenizer.batch_decode(outputs, skip_special_tokens=True) + for result in results: + print(result) ``` -#### Stream +#### Streaming example
+Streaming example -```py +```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor if __name__ == "__main__": - # Detect if GPU is available, otherwise use CPU + # 1. Set device (GPU if available, otherwise CPU) device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - print(f"Using device: {device}") + print("Using device:", device) + # 2. Specify model ID model_id = "facebook/opt-125m" - # Load model and tokenizer + # 3. Load tokenizer and model tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token - model = AutoModelForCausalLM.from_pretrained(model_id).to(device) model.generation_config.pad_token_id = model.generation_config.eos_token_id - # Define grammar as a string - grammar_str = """ - + # 4. Define the grammar string + grammar_str = r""" root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" - """ + # 5. Create grammar constraint and logits processor grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) - grammar_processor = GrammarConstrainedLogitsProcessor(grammar) - - # Generate - prompts = [ - 'The text says, "The animal is a dog." The answer is obvious. ', #'I\'m going to say "The animal is a dog." Here I go! ' - ] - input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device) - - # Set up streaming + logits_processor = GrammarConstrainedLogitsProcessor(grammar) + + # 6. Create prompt and tokenize + prompts = ['The text says, "The animal is a dog." The answer is obvious.'] + inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False)["input_ids"].to(device) + + # 7. Set up the streamer for output streamer = TextStreamer(tokenizer) - - output = model.generate( - input_ids, + + # 8. Generate outputs using streaming + outputs = model.generate( + inputs, max_length=50, - logits_processor=[grammar_processor], + logits_processor=[logits_processor], repetition_penalty=1.1, num_return_sequences=1, streamer=streamer ) - ```
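The release notes above list a reusable `GrammarConstrainedLogitsProcessor` ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)). As a minimal sketch of what that reuse can look like (continuing from the Torch example above, and assuming its `model`, `tokenizer`, `device`, and `grammar_processor` objects are still in scope and that the installed version includes #100; otherwise build a fresh processor per call):

```python
# Minimal sketch: pass the same GrammarConstrainedLogitsProcessor instance to
# several generate() calls instead of rebuilding it each time (per #100).
# Assumes `model`, `tokenizer`, `device`, and `grammar_processor` from the
# Torch example above; the reuse behaviour itself is the assumption here.
follow_up_prompts = [
    'The text says, "The animal is a dog." The answer is obvious. ',
    'I\'m going to say "The animal is a dog." Here I go! ',
]
for prompt in follow_up_prompts:
    ids = tokenizer([prompt], add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device)
    out = model.generate(
        ids,
        max_length=50,
        logits_processor=[grammar_processor],  # same processor instance reused
        repetition_penalty=1.1,
    )
    print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```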
-### Transformers *Pipeline* +### Transformers pipeline
+Streaming example -```py +```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor -# Load model and tokenizer +# 1. Specify model ID model_id = "facebook/opt-125m" +# 2. Load tokenizer and model tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token - -# Detect if GPU is available, otherwise use CPU device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - model = AutoModelForCausalLM.from_pretrained(model_id).to(device) -# Define grammar string -json_grammar = """ - +# 3. Define the grammar string +grammar_str = r""" root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" - """ -grammar = IncrementalGrammarConstraint(json_grammar, "root", tokenizer) -grammar_processor = GrammarConstrainedLogitsProcessor(grammar) +# 4. Create grammar constraint and logits processor +grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) +logits_processor = GrammarConstrainedLogitsProcessor(grammar) -# Initialize pipeline -pipe = pipeline( +# 5. Initialize the text-generation pipeline +text_pipe = pipeline( "text-generation", model=model, tokenizer=tokenizer, device_map="auto", max_new_tokens=100, - batch_size=2, + batch_size=2 ) -# Generate text -generations = pipe( - [ - 'The text says, "The animal is a dog." The answer is obvious. ', - 'I\'m going to say "The animal is a dog." Here I go! ' - ], - do_sample=False, - logits_processor=[grammar_processor], -) - -# Print results -for generation_group in generations: - for generation in generation_group: - print(generation['generated_text']) +# 6. Define prompts and generate text +prompts = [ + 'The text says, "The animal is a dog." The answer is obvious.', + 'I\'m going to say "The animal is a dog." Here I go!' +] +generations = text_pipe(prompts, do_sample=False, logits_processor=[logits_processor]) +for group in generations: + for gen in group: + print(gen["generated_text"]) ```
-### LlamaCPP Python -Use the `llama-cpp-python` adapter, automatically loadable with the `adapter` parameter. +### LlamaCPP python + +#### Non-streaming example -```py +```python import io -import torch import logging from contextlib import redirect_stderr from llama_cpp import Llama @@ -282,50 +282,91 @@ from transformers import AutoTokenizer logging.basicConfig(level=logging.INFO) -# Define your EBNF grammar (you can replace this with your own) -ebnf_grammar = """ - +# 1. Define the EBNF grammar string +grammar_str = r""" root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" +""" - """ - -# Load the tokenizer matching your model +# 2. Load the tokenizer for the model tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b") -# Redirect stderr and load the model via llama-cpp-python +# 3. Load the model using llama-cpp-python (suppress stderr) f = io.StringIO() with redirect_stderr(f): model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False) -# Create the grammar constraint and the logits processor with the new parameter. -grammar_constraint = IncrementalGrammarConstraint(ebnf_grammar, "root", tokenizer) -grammar_processor = GrammarConstrainedLogitsProcessor(grammar_constraint, adapter="llama-cpp-python") +# 4. Create grammar constraint and logits processor (using the llama-cpp-python adapter) +grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) +logits_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") + +# 5. Define the prompt and generate a completion (non-streaming) +prompt = 'The text says, "The animal is a dog." The answer is obvious.' +response = model.create_completion( + prompt=prompt, + logits_processor=[logits_processor], + max_tokens=100, +) +print(response["choices"][0]["text"]) +``` + +#### Streaming example + +
+Streaming example + +```python +import io +import logging +from contextlib import redirect_stderr +from llama_cpp import Llama +from transformers_cfg.grammar_utils import IncrementalGrammarConstraint +from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor +from transformers import AutoTokenizer + +logging.basicConfig(level=logging.INFO) -# Define a prompt. -prompt = """The text says, "The animal is a dog." The answer is obvious. """ +# 1. Define the EBNF grammar string +grammar_str = r""" + root ::= "The animal is a " animal "." + animal ::= "cat" | "fish" +""" -# Use the text completion API with the logits processor. +# 2. Load the tokenizer for the model +tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b") + +# 3. Load the model using llama-cpp-python (suppress stderr) +f = io.StringIO() +with redirect_stderr(f): + model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False) + +# 4. Create grammar constraint and logits processor (using the llama-cpp-python adapter) +grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) +logits_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") + +# 5. Define the prompt and generate a completion using streaming +prompt = 'The text says, "The animal is a dog." The answer is obvious.' response = model.create_completion( stream=True, prompt=prompt, - logits_processor=[grammar_processor], + logits_processor=[logits_processor], max_tokens=100, ) +# 6. Print tokens as they are received for token in response: - token_text = token["choices"][0]["text"] - print(token_text, end="", flush=True) - + print(token["choices"][0]["text"], end="", flush=True) ``` -## 💡 Why use `transformers-cfg`? +
+ +## 💡 Why use transformers-cfg? -- **EBNF Grammar Support**: Uses Extended Backus-Naur Form (EBNF) for grammar description. -- **Seamless Integration**: Compatible with the llama-cpp project for easy replacement. -- **Broad Model Compatibility**: Works with all models in the 🤗 Transformers library. -- **Multilingual Grammar Support**: Enables grammars in various languages, including Chinese (中文), Japanese (日本語), Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗). +- **EBNF grammar support:** Uses Extended Backus-Naur Form (EBNF) for grammar description. +- **Seamless integration:** Compatible with the llama-cpp project for easy replacement. +- **Broad model compatibility:** Works with all models in the 🤗 Transformers library. +- **Multilingual grammar support:** Enables grammars in various languages, including Chinese (中文), Japanese (日本語), + Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗). ## 🤔 What is a grammar? @@ -359,94 +400,88 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama- ## ✅ Supported models -### Qwen -
-Qwen - -- [Qwen](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) ≤ 2.5 - -
- -### Meta (LLaMa) -
-Meta (LLaMa) - -- [LLaMa](https://huggingface.co/baffo32/decapoda-research-llama-7B-hf) ≤ 3.0 -- [huggyllama/llama-7b](https://huggingface.co/huggyllama/llama-7b) -- [TinyPixel/Llama-2-7B-bf16-sharded](https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded) -- [OpenAssistant/llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319) -- [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) -- [NousResearch/Nous-Hermes-Llama2-13b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b) -- [TheBloke/Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) -- [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) -- [fxmarty/tiny-llama-fast-tokenizer](https://huggingface.co/fxmarty/tiny-llama-fast-tokenizer) -- [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) -- [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) -- [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) -- [togethercomputer/LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) -- [openlm-research/open_llama_7b_v2](https://huggingface.co/openlm-research/open_llama_7b_v2) -- [NousResearch/Nous-Hermes-llama-2-7b](https://huggingface.co/NousResearch/Nous-Hermes-llama-2-7b) -- [TheBloke/Llama-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ) -- [h2oai/h2ogpt-4096-llama2-7b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-7b-chat) -- [h2oai/h2ogpt-4096-llama2-13b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-13b-chat) -- [garage-bAInd/Platypus2-7B](https://huggingface.co/garage-bAInd/Platypus2-7B) - -
- -### GPT -
-GPT - -- [GPT](https://huggingface.co/openai-community/gpt2) ≤ 2 -- [gpt2](https://huggingface.co/gpt2) -- [distilgpt2](https://huggingface.co/distilgpt2) -- [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large) -- [openai-community/gpt2-xl](https://huggingface.co/openai-community/gpt2-xl) -- [openai-community/gpt2-medium](https://huggingface.co/openai-community/gpt2-medium) -- [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) - -
- -### Mistral -
-Mistral - -- [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) ≤ 0.3 -- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) -- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) - -
- -### Falcon -
-Falcon - -- [Falcon](https://huggingface.co/tiiuae/falcon-7b) -- [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) -- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) - -
- -### OPT -
-OPT - -- [OPT](https://huggingface.co/collections/facebook/opt-66ed00e15599f02966818844) -- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) -- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) -- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) -- [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) -- [facebook/opt-13b](https://huggingface.co/facebook/opt-13b) +
+Qwen (≤ 2.5) + +- [Qwen](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) + +
+ +
+Meta (LLaMa) (≤ 3.0) + +- [LLaMa](https://huggingface.co/baffo32/decapoda-research-llama-7B-hf) +- [huggyllama/llama-7b](https://huggingface.co/huggyllama/llama-7b) +- [TinyPixel/Llama-2-7B-bf16-sharded](https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded) +- [OpenAssistant/llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319) +- [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) +- [NousResearch/Nous-Hermes-Llama2-13b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b) +- [TheBloke/Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) +- [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) +- [fxmarty/tiny-llama-fast-tokenizer](https://huggingface.co/fxmarty/tiny-llama-fast-tokenizer) +- [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) +- [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) +- [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) +- [togethercomputer/LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) +- [openlm-research/open_llama_7b_v2](https://huggingface.co/openlm-research/open_llama_7b_v2) +- [NousResearch/Nous-Hermes-llama-2-7b](https://huggingface.co/NousResearch/Nous-Hermes-llama-2-7b) +- [TheBloke/Llama-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ) +- [h2oai/h2ogpt-4096-llama2-7b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-7b-chat) +- [h2oai/h2ogpt-4096-llama2-13b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-13b-chat) +- [garage-bAInd/Platypus2-7B](https://huggingface.co/garage-bAInd/Platypus2-7B) + +
+ +
+GPT (≤ 2) + +- [GPT](https://huggingface.co/openai-community/gpt2) +- [gpt2](https://huggingface.co/gpt2) +- [distilgpt2](https://huggingface.co/distilgpt2) +- [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large) +- [openai-community/gpt2-xl](https://huggingface.co/openai-community/gpt2-xl) +- [openai-community/gpt2-medium](https://huggingface.co/openai-community/gpt2-medium) +- [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) + +
+ +
+Mistral (≤ 0.3) + +- [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) +- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) +- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) + +
+ +
+Falcon (≤ 3) + +- [Falcon](https://huggingface.co/tiiuae/falcon-7b) +- [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) +- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) + +
+ +
+OPT (≤ 125m) + +- [OPT](https://huggingface.co/collections/facebook/opt-66ed00e15599f02966818844) +- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) +- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) +- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) +- [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) +- [facebook/opt-13b](https://huggingface.co/facebook/opt-13b)
-See [supported_models.yaml](docs/supported_models.yaml) for the full list whose extent is constantly being updated. +See [supported_models.yaml](docs/supported_models.yaml) for the full list whose content is constantly being updated. -If you encounter an unsupported model, please open an issue or submit a pull request. +If you encounter an unsupported model, please open an issue or create a pull request. ## 📖 Citation -If you find this work useful, please cite it with the reccomended citation: +If you find this work useful, please cite it with the recommended citation: ```bibtex @inproceedings{geng-etal-2023-grammar, @@ -468,4 +503,5 @@ This project is licensed under the [MIT License](LICENSE). ## 🙌 Acknowledgements -Derived from [torch-grammars](https://github.com/Shopify/torch-grammar), which was based on [llama-cpp](https://github.com/ggerganov/llama.cpp). +Derived from [torch-grammars](https://github.com/Shopify/torch-grammar), which was based on +[llama-cpp](https://github.com/ggerganov/llama.cpp). From 94bd0da80901efa729914b772d888f41dc6bd767 Mon Sep 17 00:00:00 2001 From: URRO Date: Sun, 9 Mar 2025 13:59:42 -0400 Subject: [PATCH 2/5] Update README.md --- README.md | 442 ++++++++++++++++++++++++++++-------------------------- 1 file changed, 226 insertions(+), 216 deletions(-) diff --git a/README.md b/README.md index 8ba4dea..7ab7e71 100644 --- a/README.md +++ b/README.md @@ -7,33 +7,37 @@ ### Latest experimental +#### **Features** +
-#### Features - LlamaCPP Python wrapper support ([#116](https://github.com/epfl-dlab/transformers-CFG/pull/116)) -#### Bug fixes +
+ +#### **Bug fixes** + +
+ - `pip show` license ([#117](https://github.com/epfl-dlab/transformers-CFG/pull/117))
### Latest stable +#### **[v0.2.7](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7)** (2025-03-02) -#### [V0.2.7 latest](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7) (2025-03-02) - -#### Features +#### **Features** - Types and MLX ([#93](https://github.com/epfl-dlab/transformers-CFG/pull/93)) -- Negation, wildcards, repetition brackets ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94), - [#95](https://github.com/epfl-dlab/transformers-CFG/pull/95), - [#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), - [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104)) +- Negation ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94)) +- Wildcards ([#95](https://github.com/epfl-dlab/transformers-CFG/pull/95)) +- Repetition brackets ([#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104)) - Qwen2 and Qwen2.5 ([#97](https://github.com/epfl-dlab/transformers-CFG/pull/97)) -- Reusable `GrammarConstrainedLogitsProcessor` for efficiency ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)) -- Pytest for testing ([#109](https://github.com/epfl-dlab/transformers-CFG/pull/109)) -- GitHub Actions workflow for automation ([#110](https://github.com/epfl-dlab/transformers-CFG/pull/110)) +- Resuable logits processor ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)) +- Pytest ([#109](https://github.com/epfl-dlab/transformers-CFG/pull/109)) +- GitHub Actions workflow ([#110](https://github.com/epfl-dlab/transformers-CFG/pull/110)) -#### Bug fixes +#### **Bug fixes** - Avoid computing full masks and optimized type additions ([#101](https://github.com/epfl-dlab/transformers-CFG/pull/101)) - Refactored grammar encoding to improve structure ([#99](https://github.com/epfl-dlab/transformers-CFG/pull/99)) @@ -45,22 +49,17 @@ - **[Gemma-2](https://github.com/epfl-dlab/transformers-CFG/pull/75)** — @fillassuncao (2024-08-16) - **[DeepSeek](https://github.com/epfl-dlab/transformers-CFG/pull/73)** (2024-07-24) - **LLaMA-3** (2024-07-08) -- **[JSON schema](examples/grammars/custom_json_grammars/README.md)** (2024-05-13) +- **[JSON Schema](examples/grammars/custom_json_grammars/README.md)** (2024-05-13) - **Mask optimization** (2024-04-25) - **[Phi](https://github.com/epfl-dlab/transformers-CFG/issues/34)** (2024-04-16) - **[Online demo](http://saibo-creator.xyz:7860/)** (2024-04-10) - **Unicode and foreign text** (2024-02-29) - **Text-Generation-WebUI** (2023-12-17) - - We are pleased to announce that `transformers-cfg` has been integrated into the - [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to - leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). + - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). 
## 🚀 Introduction -Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) -library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face -Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers -parallel for LlamaCPP's GBNF support, but with stricter generation rules. +Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parellel for LlamaCPP's GBNF support, but with stricter generation rules. ## 💻 Installation @@ -80,198 +79,231 @@ For the latest updates, install directly from GitHub: pip install git+https://github.com/epfl-dlab/transformers-CFG.git@main ``` -## 🔧 Grammar quickstart +## 💡 Why use `transformers-cfg`? + +- **EBNF Grammar Support**: Uses Extended Backus-Naur Form (EBNF) for grammar description. +- **Seamless Integration**: Compatible with the llama-cpp project for easy replacement. +- **Broad Model Compatibility**: Works with all models in the 🤗 Transformers library. +- **Multilingual Grammar Support**: Enables grammars in various languages, including Chinese (中文), Japanese (日本語), Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗). + +## 🤔 What is a grammar? + +Think of it as an enhanced version of regular expressions. + +### Valid JSON object + +```bnf +root ::= object +object ::= "{" pair ("," pair)* "}" +pair ::= string ":" value +string ::= '"' [a-zA-Z0-9]* '"' +value ::= string | object | "true" | "false" | "null" +``` + +For advanced grammar debugging, see our [debugging guide](docs/debugging_custom_grammars.md). -Let's set up a predictable generation method where the model would usually reply with -"The animal is a dog." Instead, we force the model to say either "The animal is a cat" or -"The animal is a fish" — two other common domestic pets that contradict the initial text. +## 🔧 Grammar quickstart +Let's set up a predictable generation method where the model would usually reply with "The animal is a dog." However, we'll force the model to say either "The animal is a cat" or "The animal is a fish," two other common domestic pets that contradict the inital text. ### Command-line interface (CLI) -The `transformers-cfg-cli` tool enables text generation using a model and a specified grammar. -Unicode is supported. +The `transformers-cfg-cli` tool enables text generation using a model and a specified grammar. Unicode is supported. ```bash -# Run text generation using the CLI tool. -# Note: Use proper quotes so that the inner double quotes are preserved. transformers-cfg-cli generate \ -m "facebook/opt-125m" \ -g "examples/grammars/animal.ebnf" \ -p 'The text says, "The animal is a dog." The answer is obvious.' \ - --max_new_tokens 60 + --max_new_tokens 50 \ +# The animal is a cat. ``` Run `transformers-cfg-cli generate --help` for available options. 
### Transformers *Torch* -#### Non-streaming example - -```python +```py import torch from transformers import AutoModelForCausalLM, AutoTokenizer from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor if __name__ == "__main__": - # 1. Set device (GPU if available, otherwise CPU) + # Set device: use GPU if available, else CPU. device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - print("Using device:", device) + print(f"Using device: {device}") - # 2. Specify model ID + # Model identifier model_id = "facebook/opt-125m" - # 3. Load tokenizer and model + # Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token model = AutoModelForCausalLM.from_pretrained(model_id).to(device) model.generation_config.pad_token_id = model.generation_config.eos_token_id - # 4. Define the grammar string - grammar_str = r""" + # Define grammar string + grammar_str = """ root ::= "The animal is a " animal "." animal ::= "cat" | "fish" """ - # 5. Create grammar constraint and logits processor + # Create grammar constraint and logits processor grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) - logits_processor = GrammarConstrainedLogitsProcessor(grammar) - - # 6. Create prompt(s) and tokenize + grammar_processor = GrammarConstrainedLogitsProcessor(grammar) + + # Define prompts prompts = [ 'The text says, "The animal is a dog." The answer is obvious.', 'I\'m going to say "The animal is a dog." Here I go!' ] - inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False)["input_ids"].to(device) + + # Tokenize prompts + input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device) - # 7. Generate outputs - outputs = model.generate( - inputs, + # Generate constrained text + output = model.generate( + input_ids, max_length=50, - logits_processor=[logits_processor], + logits_processor=[grammar_processor], repetition_penalty=1.1, - num_return_sequences=1 + num_return_sequences=1, ) - # 8. Decode and print generated text - results = tokenizer.batch_decode(outputs, skip_special_tokens=True) - for result in results: - print(result) + # Decode and print generated text + generations = tokenizer.batch_decode(output, skip_special_tokens=True) + for generation in generations: + print(generation) + +# The animal is a cat. ``` -#### Streaming example +#### Stream
-Streaming example -```python +```py import torch from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor if __name__ == "__main__": - # 1. Set device (GPU if available, otherwise CPU) + # Set device: use GPU if available, else CPU device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - print("Using device:", device) + print(f"Using device: {device}") - # 2. Specify model ID + # Model identifier model_id = "facebook/opt-125m" - # 3. Load tokenizer and model + # Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token model = AutoModelForCausalLM.from_pretrained(model_id).to(device) model.generation_config.pad_token_id = model.generation_config.eos_token_id - # 4. Define the grammar string - grammar_str = r""" + # Define grammar string + grammar_str = """ root ::= "The animal is a " animal "." animal ::= "cat" | "fish" """ - # 5. Create grammar constraint and logits processor + # Create grammar constraint and logits processor grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) - logits_processor = GrammarConstrainedLogitsProcessor(grammar) - - # 6. Create prompt and tokenize - prompts = ['The text says, "The animal is a dog." The answer is obvious.'] - inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False)["input_ids"].to(device) + grammar_processor = GrammarConstrainedLogitsProcessor(grammar) + + # Define prompt + prompts = [ + 'The text says, "The animal is a dog." The answer is obvious.' + ] - # 7. Set up the streamer for output + # Tokenize prompt + input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device) + + # Set up streaming streamer = TextStreamer(tokenizer) - - # 8. Generate outputs using streaming - outputs = model.generate( - inputs, + + # Generate constrained text with streaming. + model.generate( + input_ids, max_length=50, - logits_processor=[logits_processor], + logits_processor=[grammar_processor], repetition_penalty=1.1, num_return_sequences=1, streamer=streamer ) + +# The animal is a cat. ```
-### Transformers pipeline +### Transformers *Pipeline*
-Streaming example -```python +```py import torch from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor -# 1. Specify model ID +# Model identifier model_id = "facebook/opt-125m" -# 2. Load tokenizer and model +# Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token device = torch.device("cuda" if torch.cuda.is_available() else "cpu") model = AutoModelForCausalLM.from_pretrained(model_id).to(device) -# 3. Define the grammar string -grammar_str = r""" +# Define grammar string +grammar_str = """ root ::= "The animal is a " animal "." animal ::= "cat" | "fish" """ -# 4. Create grammar constraint and logits processor +# Create grammar constraint and logits processor grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) -logits_processor = GrammarConstrainedLogitsProcessor(grammar) +grammar_processor = GrammarConstrainedLogitsProcessor(grammar) -# 5. Initialize the text-generation pipeline -text_pipe = pipeline( +# Initialize text generation pipeline +pipe = pipeline( "text-generation", model=model, tokenizer=tokenizer, device_map="auto", max_new_tokens=100, - batch_size=2 + batch_size=2, ) -# 6. Define prompts and generate text +# Define prompts prompts = [ 'The text says, "The animal is a dog." The answer is obvious.', 'I\'m going to say "The animal is a dog." Here I go!' ] -generations = text_pipe(prompts, do_sample=False, logits_processor=[logits_processor]) -for group in generations: - for gen in group: - print(gen["generated_text"]) + +# Generate constrained text using the pipeline. +generations = pipe( + prompts, + do_sample=False, + logits_processor=[grammar_processor], +) + +# Print generated texts +for generation_group in generations: + for generation in generation_group: + print(generation['generated_text']) + +# The animal is a cat. ```
-### LlamaCPP python +### LlamaCPP Python +Use the `llama-cpp-python` adapter, automatically loadable with the `adapter` parameter. -#### Non-streaming example - -```python +```py import io import logging from contextlib import redirect_stderr @@ -282,40 +314,43 @@ from transformers import AutoTokenizer logging.basicConfig(level=logging.INFO) -# 1. Define the EBNF grammar string -grammar_str = r""" - root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" +# Define grammar string. +grammar_str = """ +root ::= "The animal is a " animal "." +animal ::= "cat" | "fish" """ -# 2. Load the tokenizer for the model +# Load the tokenizer matching the model. tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b") -# 3. Load the model using llama-cpp-python (suppress stderr) -f = io.StringIO() -with redirect_stderr(f): +# Redirect stderr and load the model via llama-cpp-python. +with redirect_stderr(io.StringIO()): model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False) -# 4. Create grammar constraint and logits processor (using the llama-cpp-python adapter) +# Create grammar constraint and logits processor using the adapter. grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) -logits_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") +grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") -# 5. Define the prompt and generate a completion (non-streaming) +# Define prompt. prompt = 'The text says, "The animal is a dog." The answer is obvious.' + +# Generate constrained text (non-streaming). response = model.create_completion( prompt=prompt, - logits_processor=[logits_processor], + logits_processor=[grammar_processor], max_tokens=100, ) + +# Print generated text. print(response["choices"][0]["text"]) -``` -#### Streaming example +# The animal is a cat. +``` +#### Stream
-Streaming example -```python +```py import io import logging from contextlib import redirect_stderr @@ -326,67 +361,42 @@ from transformers import AutoTokenizer logging.basicConfig(level=logging.INFO) -# 1. Define the EBNF grammar string -grammar_str = r""" - root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" +# Define grammar string +grammar_str = """ +root ::= "The animal is a " animal "." +animal ::= "cat" | "fish" """ -# 2. Load the tokenizer for the model +# Load the tokenizer matching the model tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b") -# 3. Load the model using llama-cpp-python (suppress stderr) -f = io.StringIO() -with redirect_stderr(f): +# Redirect stderr and load the model via llama-cpp-python +with redirect_stderr(io.StringIO()): model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False) -# 4. Create grammar constraint and logits processor (using the llama-cpp-python adapter) +# Create grammar constraint and logits processor using the adapter grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) -logits_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") +grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") -# 5. Define the prompt and generate a completion using streaming +# Define prompt. prompt = 'The text says, "The animal is a dog." The answer is obvious.' + +# Generate constrained text with streaming response = model.create_completion( stream=True, prompt=prompt, - logits_processor=[logits_processor], + logits_processor=[grammar_processor], max_tokens=100, ) -# 6. Print tokens as they are received +# Stream and print generated text for token in response: print(token["choices"][0]["text"], end="", flush=True) -``` - -
- -## 💡 Why use transformers-cfg? - -- **EBNF grammar support:** Uses Extended Backus-Naur Form (EBNF) for grammar description. -- **Seamless integration:** Compatible with the llama-cpp project for easy replacement. -- **Broad model compatibility:** Works with all models in the 🤗 Transformers library. -- **Multilingual grammar support:** Enables grammars in various languages, including Chinese (中文), Japanese (日本語), - Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗). -## 🤔 What is a grammar? - -Think of it as an enhanced version of regular expressions. - -### Valid JSON object - -```bnf -root ::= object -object ::= "{" pair ("," pair)* "}" -pair ::= string ":" value -string ::= '"' [a-zA-Z0-9]* '"' -value ::= string | object | "true" | "false" | "null" +# The animal is a cat. ``` -For advanced grammar debugging, see our [debugging guide](docs/debugging_custom_grammars.md). - -## 🛠 JSON schema - -Learn to create grammars for complex JSON objects in our [documentation](examples/grammars/custom_json_grammars/README.md). +
## 📜 Grammar collection @@ -398,90 +408,91 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama- - [chess.ebnf](examples/grammars/chess.ebnf): Valid chess moves. - [arithmetic.ebnf](examples/grammars/arithmetic.ebnf): Valid arithmetic expressions. -## ✅ Supported models +## 🛠 JSON schema -
-Qwen (≤ 2.5) +Learn to create grammars for complex JSON objects in our [documentation](examples/grammars/custom_json_grammars/README.md). -- [Qwen](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) +## ✅ Supported tokenizers -
-
-Meta (LLaMa) (≤ 3.0) - -- [LLaMa](https://huggingface.co/baffo32/decapoda-research-llama-7B-hf) -- [huggyllama/llama-7b](https://huggingface.co/huggyllama/llama-7b) -- [TinyPixel/Llama-2-7B-bf16-sharded](https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded) -- [OpenAssistant/llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319) -- [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) -- [NousResearch/Nous-Hermes-Llama2-13b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b) -- [TheBloke/Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) -- [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) -- [fxmarty/tiny-llama-fast-tokenizer](https://huggingface.co/fxmarty/tiny-llama-fast-tokenizer) -- [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) -- [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) -- [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) -- [togethercomputer/LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) -- [openlm-research/open_llama_7b_v2](https://huggingface.co/openlm-research/open_llama_7b_v2) -- [NousResearch/Nous-Hermes-llama-2-7b](https://huggingface.co/NousResearch/Nous-Hermes-llama-2-7b) -- [TheBloke/Llama-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ) -- [h2oai/h2ogpt-4096-llama2-7b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-7b-chat) -- [h2oai/h2ogpt-4096-llama2-13b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-13b-chat) -- [garage-bAInd/Platypus2-7B](https://huggingface.co/garage-bAInd/Platypus2-7B) +### 🤖 Tested models -
+
+Qwen (≤ 2.5) + +- [Qwen2](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) +- [Qwen2.5]() -
-GPT (≤ 2) +
-- [GPT](https://huggingface.co/openai-community/gpt2) -- [gpt2](https://huggingface.co/gpt2) -- [distilgpt2](https://huggingface.co/distilgpt2) -- [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large) -- [openai-community/gpt2-xl](https://huggingface.co/openai-community/gpt2-xl) -- [openai-community/gpt2-medium](https://huggingface.co/openai-community/gpt2-medium) -- [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) +
+LLaMa (≤ 3.3) -
+- [huggyllama/llama-7b](https://huggingface.co/huggyllama/llama-7b) +- [TinyPixel/Llama-2-7B-bf16-sharded](https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded) +- [OpenAssistant/llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319) +- [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) +- [NousResearch/Nous-Hermes-Llama2-13b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b) +- [TheBloke/Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) +- [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) +- [fxmarty/tiny-llama-fast-tokenizer](https://huggingface.co/fxmarty/tiny-llama-fast-tokenizer) +- [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) +- [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) +- [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) +- [togethercomputer/LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) +- [openlm-research/open_llama_7b_v2](https://huggingface.co/openlm-research/open_llama_7b_v2) +- [NousResearch/Nous-Hermes-llama-2-7b](https://huggingface.co/NousResearch/Nous-Hermes-llama-2-7b) +- [TheBloke/Llama-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ) +- [h2oai/h2ogpt-4096-llama2-7b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-7b-chat) +- [h2oai/h2ogpt-4096-llama2-13b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-13b-chat) +- [garage-bAInd/Platypus2-7B](https://huggingface.co/garage-bAInd/Platypus2-7B) -
-Mistral (≤ 0.3) +
-- [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) -- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) -- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) +
+GPT (≤ 2) -
+- [gpt2](https://huggingface.co/gpt2) +- [distilgpt2](https://huggingface.co/distilgpt2) +- [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large) +- [openai-community/gpt2-xl](https://huggingface.co/openai-community/gpt2-xl) +- [openai-community/gpt2-medium](https://huggingface.co/openai-community/gpt2-medium) +- [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) -
-Falcon (≤ 3) +
-- [Falcon](https://huggingface.co/tiiuae/falcon-7b) -- [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) -- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) +
+Mistral (≤ 0.3) -
+- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) +- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) -
-OPT (≤ 125m) +
-- [OPT](https://huggingface.co/collections/facebook/opt-66ed00e15599f02966818844) -- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) -- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) -- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) -- [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) -- [facebook/opt-13b](https://huggingface.co/facebook/opt-13b) +
+Falcon (≤ 3.0) -
+- [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) +- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) + +
-See [supported_models.yaml](docs/supported_models.yaml) for the full list whose content is constantly being updated. +
+OPT + +- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) +- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) +- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) +- [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) +- [facebook/opt-13b](https://huggingface.co/facebook/opt-13b) + +
-If you encounter an unsupported model, please open an issue or create a pull request. +If you encounter an unsupported model, please open an issue or submit a pull request. ## 📖 Citation -If you find this work useful, please cite it with the recommended citation: +If you find this work useful, please cite it with the reccomended citation: ```bibtex @inproceedings{geng-etal-2023-grammar, @@ -503,5 +514,4 @@ This project is licensed under the [MIT License](LICENSE). ## 🙌 Acknowledgements -Derived from [torch-grammars](https://github.com/Shopify/torch-grammar), which was based on -[llama-cpp](https://github.com/ggerganov/llama.cpp). +Derived from [torch-grammars](https://github.com/Shopify/torch-grammar), which was based on [llama-cpp](https://github.com/ggerganov/llama.cpp). From 28d77bfc04efd11618b20e7d4f3ab373bc34e48e Mon Sep 17 00:00:00 2001 From: URRO Date: Sun, 9 Mar 2025 14:01:01 -0400 Subject: [PATCH 3/5] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 7ab7e71..1436b3e 100644 --- a/README.md +++ b/README.md @@ -55,11 +55,11 @@ - **[Online demo](http://saibo-creator.xyz:7860/)** (2024-04-10) - **Unicode and foreign text** (2024-02-29) - **Text-Generation-WebUI** (2023-12-17) - - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). + - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([PR](https://github.com/oobabooga/text-generation-webui/pull/4953)). ## 🚀 Introduction -Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parellel for LlamaCPP's GBNF support, but with stricter generation rules. +Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([PR](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parellel for LlamaCPP's GBNF support, but with stricter generation rules. ## 💻 Installation From ddd0380c10bfb00705d3934425248739a4b772d5 Mon Sep 17 00:00:00 2001 From: URRO Date: Sun, 9 Mar 2025 14:04:34 -0400 Subject: [PATCH 4/5] Update README.md --- README.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 1436b3e..262836c 100644 --- a/README.md +++ b/README.md @@ -113,7 +113,7 @@ The `transformers-cfg-cli` tool enables text generation using a model and a spec transformers-cfg-cli generate \ -m "facebook/opt-125m" \ -g "examples/grammars/animal.ebnf" \ - -p 'The text says, "The animal is a dog." The answer is obvious.' \ + -p 'The text says, "The animal is a dog." The answer is obvious. ' \ --max_new_tokens 50 \ # The animal is a cat. 
``` @@ -154,8 +154,8 @@ if __name__ == "__main__": # Define prompts prompts = [ - 'The text says, "The animal is a dog." The answer is obvious.', - 'I\'m going to say "The animal is a dog." Here I go!' + 'The text says, "The animal is a dog." The answer is obvious. ', + 'I\'m going to say "The animal is a dog." Here I go! ' ] # Tokenize prompts @@ -214,7 +214,7 @@ if __name__ == "__main__": # Define prompt prompts = [ - 'The text says, "The animal is a dog." The answer is obvious.' + 'The text says, "The animal is a dog." The answer is obvious. ' ] # Tokenize prompt @@ -279,8 +279,8 @@ pipe = pipeline( # Define prompts prompts = [ - 'The text says, "The animal is a dog." The answer is obvious.', - 'I\'m going to say "The animal is a dog." Here I go!' + 'The text says, "The animal is a dog." The answer is obvious. ', + 'I\'m going to say "The animal is a dog." Here I go! ' ] # Generate constrained text using the pipeline. @@ -332,7 +332,7 @@ grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") # Define prompt. -prompt = 'The text says, "The animal is a dog." The answer is obvious.' +prompt = 'The text says, "The animal is a dog." The answer is obvious. ' # Generate constrained text (non-streaming). response = model.create_completion( @@ -379,7 +379,7 @@ grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") # Define prompt. -prompt = 'The text says, "The animal is a dog." The answer is obvious.' +prompt = 'The text says, "The animal is a dog." The answer is obvious. ' # Generate constrained text with streaming response = model.create_completion( From 06b0f83b92bf3a8d659f5fcff83792d25e098e3f Mon Sep 17 00:00:00 2001 From: URRO Date: Sun, 9 Mar 2025 17:46:03 -0400 Subject: [PATCH 5/5] Create animal.ebnf --- examples/grammars/animal.ebnf | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 examples/grammars/animal.ebnf diff --git a/examples/grammars/animal.ebnf b/examples/grammars/animal.ebnf new file mode 100644 index 0000000..a8c1c3a --- /dev/null +++ b/examples/grammars/animal.ebnf @@ -0,0 +1,2 @@ +root ::= "The animal is a " animal "." +animal ::= "cat" | "fish"
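The `examples/grammars/animal.ebnf` file created in PATCH 5 is the grammar that the quickstart CLI example points at. As a minimal sketch of the same check through the Python API shown in the README (an illustration only, assuming `transformers-cfg` from this series is installed and the `facebook/opt-125m` model used throughout is available; the grammar is loaded from the new file rather than defined inline):

```python
# Minimal sketch (assumptions: transformers-cfg from this series is installed
# and facebook/opt-125m is available; illustrative only, not part of the patches).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

# Read the grammar from the file added in PATCH 5 instead of an inline string.
with open("examples/grammars/animal.ebnf", encoding="utf-8") as f:
    grammar_str = f.read()

grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
grammar_processor = GrammarConstrainedLogitsProcessor(grammar)

prompt = 'The text says, "The animal is a dog." The answer is obvious. '
input_ids = tokenizer([prompt], add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device)

output = model.generate(
    input_ids,
    max_length=50,
    logits_processor=[grammar_processor],
    repetition_penalty=1.1,
)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```

If the grammar file and processor are wired up correctly, the completion ends with either "The animal is a cat." or "The animal is a fish.", the only two strings the grammar admits.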