diff --git a/README.md b/README.md
index 6cc3903..262836c 100644
--- a/README.md
+++ b/README.md
@@ -7,27 +7,35 @@
### Latest experimental
+#### **Features**
+
-#### **Features**
- LlamaCPP Python wrapper support ([#116](https://github.com/epfl-dlab/transformers-CFG/pull/116))
+
#### **Bug fixes**
+
- `pip show` license ([#117](https://github.com/epfl-dlab/transformers-CFG/pull/117))
### Latest stable
-#### **[v0.2.7 Latest](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7)** (2025-03-02)
+#### **[v0.2.7](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7)** (2025-03-02)
#### **Features**
- Types and MLX ([#93](https://github.com/epfl-dlab/transformers-CFG/pull/93))
-- Negation, wildcards, repetition brackets ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94), [#95](https://github.com/epfl-dlab/transformers-CFG/pull/95), [#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104))
+- Negation ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94))
+- Wildcards ([#95](https://github.com/epfl-dlab/transformers-CFG/pull/95))
+- Repetition brackets ([#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104))
- Qwen2 and Qwen2.5 ([#97](https://github.com/epfl-dlab/transformers-CFG/pull/97))
-- Resuable `GrammarConstrainedLogitsProcessor` for efficiency ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100))
-- Pytest for testing ([#109](https://github.com/epfl-dlab/transformers-CFG/pull/109))
-- GitHub Actions workflow for automation ([#110](https://github.com/epfl-dlab/transformers-CFG/pull/110))
+- Reusable logits processor ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100))
+- Pytest ([#109](https://github.com/epfl-dlab/transformers-CFG/pull/109))
+- GitHub Actions workflow ([#110](https://github.com/epfl-dlab/transformers-CFG/pull/110))
#### **Bug fixes**
@@ -47,11 +55,11 @@
- **[Online demo](http://saibo-creator.xyz:7860/)** (2024-04-10)
- **Unicode and foreign text** (2024-02-29)
- **Text-Generation-WebUI** (2023-12-17)
- - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)).
+ - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([PR](https://github.com/oobabooga/text-generation-webui/pull/4953)).
## 🚀 Introduction
-Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parellel for LlamaCPP's GBNF support, but with stricter generation rules.
+Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([PR](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends Transformers to support constrained decoding through context-free grammars (CFG), offering a Transformers parallel to LlamaCPP's GBNF support, but with stricter generation rules.
## 💻 Installation
@@ -71,6 +79,29 @@ For the latest updates, install directly from GitHub:
pip install git+https://github.com/epfl-dlab/transformers-CFG.git@main
```
+## 💡 Why use `transformers-cfg`?
+
+- **EBNF Grammar Support**: Uses Extended Backus-Naur Form (EBNF) for grammar description.
+- **Seamless Integration**: Compatible with the llama-cpp project, making it easy to swap in as a replacement.
+- **Broad Model Compatibility**: Works with all models in the 🤗 Transformers library.
+- **Multilingual Grammar Support**: Enables grammars in various languages, including Chinese (中文), Japanese (日本語), Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗), as shown in the sketch below.
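+
+As a rough illustration of the multilingual support (this grammar is made up for this README and mirrors the animal example used later), terminals can be written directly in non-Latin scripts:
+
+```bnf
+root ::= "动物是" animal "。"
+animal ::= "猫" | "鱼"
+```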
+
+## 🤔 What is a grammar?
+
+Think of it as an enhanced version of regular expressions.
+
+### Valid JSON object
+
+```bnf
+root ::= object
+object ::= "{" pair ("," pair)* "}"
+pair ::= string ":" value
+string ::= '"' [a-zA-Z0-9]* '"'
+value ::= string | object | "true" | "false" | "null"
+```
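+
+For example, this grammar accepts `{"name":"John","admin":true}`, while whitespace and unquoted numbers are rejected because `string` only allows alphanumeric characters and no whitespace rule is defined.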
+
+For advanced grammar debugging, see our [debugging guide](docs/debugging_custom_grammars.md).
+
## 🔧 Grammar quickstart
Let's set up a predictable generation method where the model would usually reply with "The animal is a dog." However, we'll force the model to say either "The animal is a cat" or "The animal is a fish," two other common domestic pets that contradict the initial text.
@@ -80,13 +111,11 @@ The `transformers-cfg-cli` tool enables text generation using a model and a spec
```bash
transformers-cfg-cli generate \
- -m "microsoft/Phi-3-mini-4k-instruct" \
- -g "examples/grammars/json.ebnf" \
- -p "This is a valid JSON string for an HTTP request:" \
- --use_4bit \
- --max_new_tokens 60 \
- --repetition_penalty 1.1
-# {"name":"John","age":30,"car":null}
+ -m "facebook/opt-125m" \
+ -g "examples/grammars/animal.ebnf" \
+ -p 'The text says, "The animal is a dog." The answer is obvious. ' \
+ --max_new_tokens 50
+# The animal is a cat.
```
Run `transformers-cfg-cli generate --help` for available options.
@@ -100,37 +129,39 @@ from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor
if __name__ == "__main__":
- # Detect if GPU is available, otherwise use CPU
+ # Set device: use GPU if available, else CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
+ # Model identifier
model_id = "facebook/opt-125m"
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
-
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
model.generation_config.pad_token_id = model.generation_config.eos_token_id
# Define grammar string
- json_grammar = """
-
+ grammar_str = """
root ::= "The animal is a " animal "."
-
animal ::= "cat" | "fish"
-
"""
- grammar = IncrementalGrammarConstraint(json_grammar, "root", tokenizer)
+ # Create grammar constraint and logits processor
+ grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
grammar_processor = GrammarConstrainedLogitsProcessor(grammar)
- # Generate
+ # Define prompts
prompts = [
- 'The text says, "The animal is a dog." The answer is obvious. ', 'I\'m going to say "The animal is a dog." Here I go! '
- ]
+ 'The text says, "The animal is a dog." The answer is obvious. ',
+ 'I\'m going to say "The animal is a dog." Here I go! '
+ ]
+
+ # Tokenize prompts
input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device)
+ # Generate constrained text
output = model.generate(
input_ids,
max_length=50,
@@ -139,13 +170,12 @@ if __name__ == "__main__":
num_return_sequences=1,
)
- # Decode output
+ # Decode and print generated text
generations = tokenizer.batch_decode(output, skip_special_tokens=True)
-
- # Print all generations in for loop
for generation in generations:
print(generation)
+# The animal is a cat.
```
#### Stream
@@ -159,41 +189,42 @@ from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor
if __name__ == "__main__":
- # Detect if GPU is available, otherwise use CPU
+ # Set device: use GPU if available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
+ # Model identifier
model_id = "facebook/opt-125m"
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
-
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
model.generation_config.pad_token_id = model.generation_config.eos_token_id
- # Define grammar as a string
+ # Define grammar string
grammar_str = """
-
root ::= "The animal is a " animal "."
-
animal ::= "cat" | "fish"
-
"""
+ # Create grammar constraint and logits processor
grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
grammar_processor = GrammarConstrainedLogitsProcessor(grammar)
- # Generate
+ # Define prompt
prompts = [
- 'The text says, "The animal is a dog." The answer is obvious. ', #'I\'m going to say "The animal is a dog." Here I go! '
- ]
+ 'The text says, "The animal is a dog." The answer is obvious. '
+ ]
+
+ # Tokenize prompt
input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device)
# Set up streaming
streamer = TextStreamer(tokenizer)
- output = model.generate(
+ # Generate constrained text with streaming.
+ model.generate(
input_ids,
max_length=50,
logits_processor=[grammar_processor],
@@ -202,6 +233,7 @@ if __name__ == "__main__":
streamer=streamer
)
+# The animal is a cat.
```
@@ -216,30 +248,26 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor
-# Load model and tokenizer
+# Model identifier
model_id = "facebook/opt-125m"
+# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
-
-# Detect if GPU is available, otherwise use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
# Define grammar string
-json_grammar = """
-
+grammar_str = """
root ::= "The animal is a " animal "."
-
animal ::= "cat" | "fish"
-
"""
-grammar = IncrementalGrammarConstraint(json_grammar, "root", tokenizer)
+# Create grammar constraint and logits processor
+grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
grammar_processor = GrammarConstrainedLogitsProcessor(grammar)
-# Initialize pipeline
+# Initialize text generation pipeline
pipe = pipeline(
"text-generation",
model=model,
@@ -249,20 +277,25 @@ pipe = pipeline(
batch_size=2,
)
-# Generate text
+# Define prompts
+prompts = [
+ 'The text says, "The animal is a dog." The answer is obvious. ',
+ 'I\'m going to say "The animal is a dog." Here I go! '
+]
+
+# Generate constrained text using the pipeline.
generations = pipe(
- [
- 'The text says, "The animal is a dog." The answer is obvious. ',
- 'I\'m going to say "The animal is a dog." Here I go! '
- ],
+ prompts,
do_sample=False,
logits_processor=[grammar_processor],
)
-# Print results
+# Print generated texts
for generation_group in generations:
for generation in generation_group:
print(generation['generated_text'])
+
+# The animal is a cat.
```
@@ -272,7 +305,6 @@ Use the `llama-cpp-python` adapter, automatically loadable with the `adapter` pa
```py
import io
-import torch
import logging
from contextlib import redirect_stderr
from llama_cpp import Llama
@@ -282,70 +314,89 @@ from transformers import AutoTokenizer
logging.basicConfig(level=logging.INFO)
-# Define your EBNF grammar (you can replace this with your own)
-ebnf_grammar = """
-
- root ::= "The animal is a " animal "."
-
- animal ::= "cat" | "fish"
-
- """
+# Define grammar string.
+grammar_str = """
+root ::= "The animal is a " animal "."
+animal ::= "cat" | "fish"
+"""
-# Load the tokenizer matching your model
+# Load the tokenizer matching the model.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b")
-# Redirect stderr and load the model via llama-cpp-python
-f = io.StringIO()
-with redirect_stderr(f):
+# Redirect stderr and load the model via llama-cpp-python.
+with redirect_stderr(io.StringIO()):
model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False)
-# Create the grammar constraint and the logits processor with the new parameter.
-grammar_constraint = IncrementalGrammarConstraint(ebnf_grammar, "root", tokenizer)
-grammar_processor = GrammarConstrainedLogitsProcessor(grammar_constraint, adapter="llama-cpp-python")
+# Create grammar constraint and logits processor using the adapter.
+grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
+grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python")
-# Define a prompt.
-prompt = """The text says, "The animal is a dog." The answer is obvious. """
+# Define prompt.
+prompt = 'The text says, "The animal is a dog." The answer is obvious. '
-# Use the text completion API with the logits processor.
+# Generate constrained text (non-streaming).
response = model.create_completion(
- stream=True,
prompt=prompt,
logits_processor=[grammar_processor],
max_tokens=100,
)
-for token in response:
- token_text = token["choices"][0]["text"]
- print(token_text, end="", flush=True)
+# Print generated text.
+print(response["choices"][0]["text"])
+# The animal is a cat.
```
-## 💡 Why use `transformers-cfg`?
+#### Stream
+
-- **EBNF Grammar Support**: Uses Extended Backus-Naur Form (EBNF) for grammar description.
-- **Seamless Integration**: Compatible with the llama-cpp project for easy replacement.
-- **Broad Model Compatibility**: Works with all models in the 🤗 Transformers library.
-- **Multilingual Grammar Support**: Enables grammars in various languages, including Chinese (中文), Japanese (日本語), Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗).
+```py
+import io
+import logging
+from contextlib import redirect_stderr
+from llama_cpp import Llama
+from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
+from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor
+from transformers import AutoTokenizer
-## 🤔 What is a grammar?
+logging.basicConfig(level=logging.INFO)
-Think of it as an enhanced version of regular expressions.
+# Define grammar string
+grammar_str = """
+root ::= "The animal is a " animal "."
+animal ::= "cat" | "fish"
+"""
-### Valid JSON object
+# Load the tokenizer matching the model
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b")
-```bnf
-root ::= object
-object ::= "{" pair ("," pair)* "}"
-pair ::= string ":" value
-string ::= '"' [a-zA-Z0-9]* '"'
-value ::= string | object | "true" | "false" | "null"
-```
+# Redirect stderr and load the model via llama-cpp-python
+with redirect_stderr(io.StringIO()):
+ model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False)
-For advanced grammar debugging, see our [debugging guide](docs/debugging_custom_grammars.md).
+# Create grammar constraint and logits processor using the adapter
+grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
+grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python")
-## 🛠 JSON schema
+# Define prompt.
+prompt = 'The text says, "The animal is a dog." The answer is obvious. '
-Learn to create grammars for complex JSON objects in our [documentation](examples/grammars/custom_json_grammars/README.md).
+# Generate constrained text with streaming
+response = model.create_completion(
+ stream=True,
+ prompt=prompt,
+ logits_processor=[grammar_processor],
+ max_tokens=100,
+)
+
+# Stream and print generated text
+for token in response:
+ print(token["choices"][0]["text"], end="", flush=True)
+
+# The animal is a cat.
+```
+
## 📜 Grammar collection
@@ -357,21 +408,26 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama-
- [chess.ebnf](examples/grammars/chess.ebnf): Valid chess moves.
- [arithmetic.ebnf](examples/grammars/arithmetic.ebnf): Valid arithmetic expressions.
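+
+As a minimal sketch (not from the original README), a grammar file from this collection can be read and used like the inline grammar strings above; this assumes the file defines a `root` start rule, as the JSON grammar shown earlier does:
+
+```py
+from transformers import AutoTokenizer
+from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
+
+# Read a grammar from the collection (path is relative to the repository root).
+with open("examples/grammars/json.ebnf", "r", encoding="utf-8") as f:
+    grammar_str = f.read()
+
+tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
+
+# Build the constraint from the file contents; "root" is assumed to be the grammar's start rule.
+grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
+```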
-## ✅ Supported models
+## 🛠 JSON schema
+
+Learn to create grammars for complex JSON objects in our [documentation](examples/grammars/custom_json_grammars/README.md).
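+
+As a small hand-written illustration (the linked documentation covers this properly), the JSON grammar shown earlier could be extended with an `array` rule using the same constructs:
+
+```bnf
+root   ::= object
+object ::= "{" pair ("," pair)* "}"
+pair   ::= string ":" value
+array  ::= "[" value ("," value)* "]"
+string ::= '"' [a-zA-Z0-9]* '"'
+value  ::= string | object | array | "true" | "false" | "null"
+```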
+
+## ✅ Supported tokenizers
+
+### 🤖 Tested models
-### Qwen
-Qwen
+Qwen (≤ 2.5)
-- [Qwen](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) ≤ 2.5
+- [Qwen2](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f)
+- Qwen2.5
-### Meta (LLaMa)
-Meta (LLaMa)
+LLaMa (≤ 3.3)
-- [LLaMa](https://huggingface.co/baffo32/decapoda-research-llama-7B-hf) ≤ 3.0
- [huggyllama/llama-7b](https://huggingface.co/huggyllama/llama-7b)
- [TinyPixel/Llama-2-7B-bf16-sharded](https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded)
- [OpenAssistant/llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319)
@@ -393,11 +449,9 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama-
-### GPT
-GPT
+GPT (≤ 2)
-- [GPT](https://huggingface.co/openai-community/gpt2) ≤ 2
- [gpt2](https://huggingface.co/gpt2)
- [distilgpt2](https://huggingface.co/distilgpt2)
- [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large)
@@ -407,31 +461,25 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama-
-### Mistral
-Mistral
+Mistral (≤ 0.3)
-- [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) ≤ 0.3
- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
-### Falcon
-Falcon
+Falcon (≤ 3.0)
-- [Falcon](https://huggingface.co/tiiuae/falcon-7b)
- [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct)
- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct)
-### OPT
OPT
-- [OPT](https://huggingface.co/collections/facebook/opt-66ed00e15599f02966818844)
- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m)
- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b)
- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m)
@@ -440,8 +488,6 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama-
-See [supported_models.yaml](docs/supported_models.yaml) for the full list whose extent is constantly being updated.
-
If you encounter an unsupported model, please open an issue or submit a pull request.
## 📖 Citation
diff --git a/examples/grammars/animal.ebnf b/examples/grammars/animal.ebnf
new file mode 100644
index 0000000..a8c1c3a
--- /dev/null
+++ b/examples/grammars/animal.ebnf
@@ -0,0 +1,2 @@
+root ::= "The animal is a " animal "."
+animal ::= "cat" | "fish"