From 9e63f78f91cfd1f0ddbc5c1889f15d2c31ea0806 Mon Sep 17 00:00:00 2001 From: URRO Date: Sun, 9 Mar 2025 13:26:11 -0400 Subject: [PATCH 1/5] Update README.md --- README.md | 460 +++++++++++++++++++++++++++++------------------------- 1 file changed, 248 insertions(+), 212 deletions(-) diff --git a/README.md b/README.md index 6cc3903..8ba4dea 100644 --- a/README.md +++ b/README.md @@ -9,27 +9,31 @@
-#### **Features** +#### Features - LlamaCPP Python wrapper support ([#116](https://github.com/epfl-dlab/transformers-CFG/pull/116)) -#### **Bug fixes** +#### Bug fixes - `pip show` license ([#117](https://github.com/epfl-dlab/transformers-CFG/pull/117))
### Latest stable -#### **[v0.2.7 Latest](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7)** (2025-03-02) -#### **Features** +#### [V0.2.7 latest](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7) (2025-03-02) + +#### Features - Types and MLX ([#93](https://github.com/epfl-dlab/transformers-CFG/pull/93)) -- Negation, wildcards, repetition brackets ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94), [#95](https://github.com/epfl-dlab/transformers-CFG/pull/95), [#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104)) +- Negation, wildcards, repetition brackets ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94), + [#95](https://github.com/epfl-dlab/transformers-CFG/pull/95), + [#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), + [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104)) - Qwen2 and Qwen2.5 ([#97](https://github.com/epfl-dlab/transformers-CFG/pull/97)) -- Resuable `GrammarConstrainedLogitsProcessor` for efficiency ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)) +- Reusable `GrammarConstrainedLogitsProcessor` for efficiency ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)) - Pytest for testing ([#109](https://github.com/epfl-dlab/transformers-CFG/pull/109)) - GitHub Actions workflow for automation ([#110](https://github.com/epfl-dlab/transformers-CFG/pull/110)) -#### **Bug fixes** +#### Bug fixes - Avoid computing full masks and optimized type additions ([#101](https://github.com/epfl-dlab/transformers-CFG/pull/101)) - Refactored grammar encoding to improve structure ([#99](https://github.com/epfl-dlab/transformers-CFG/pull/99)) @@ -41,17 +45,22 @@ - **[Gemma-2](https://github.com/epfl-dlab/transformers-CFG/pull/75)** — @fillassuncao (2024-08-16) - **[DeepSeek](https://github.com/epfl-dlab/transformers-CFG/pull/73)** (2024-07-24) - **LLaMA-3** (2024-07-08) -- **[JSON Schema](examples/grammars/custom_json_grammars/README.md)** (2024-05-13) +- **[JSON schema](examples/grammars/custom_json_grammars/README.md)** (2024-05-13) - **Mask optimization** (2024-04-25) - **[Phi](https://github.com/epfl-dlab/transformers-CFG/issues/34)** (2024-04-16) - **[Online demo](http://saibo-creator.xyz:7860/)** (2024-04-10) - **Unicode and foreign text** (2024-02-29) - **Text-Generation-WebUI** (2023-12-17) - - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). + - We are pleased to announce that `transformers-cfg` has been integrated into the + [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to + leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). ## 🚀 Introduction -Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parellel for LlamaCPP's GBNF support, but with stricter generation rules. 
+Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) +library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face +Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers +parallel for LlamaCPP's GBNF support, but with stricter generation rules. ## 💻 Installation @@ -72,207 +81,198 @@ pip install git+https://github.com/epfl-dlab/transformers-CFG.git@main ``` ## 🔧 Grammar quickstart -Let's set up a predictable generation method where the model would usually reply with "The animal is a dog." However, we'll force the model to say either "The animal is a cat" or "The animal is a fish," two other common domestic pets that contradict the inital text. + +Let's set up a predictable generation method where the model would usually reply with +"The animal is a dog." Instead, we force the model to say either "The animal is a cat" or +"The animal is a fish" — two other common domestic pets that contradict the initial text. ### Command-line interface (CLI) -The `transformers-cfg-cli` tool enables text generation using a model and a specified grammar. Unicode is supported. +The `transformers-cfg-cli` tool enables text generation using a model and a specified grammar. +Unicode is supported. ```bash +# Run text generation using the CLI tool. +# Note: Use proper quotes so that the inner double quotes are preserved. transformers-cfg-cli generate \ - -m "microsoft/Phi-3-mini-4k-instruct" \ - -g "examples/grammars/json.ebnf" \ - -p "This is a valid JSON string for an HTTP request:" \ - --use_4bit \ - --max_new_tokens 60 \ - --repetition_penalty 1.1 -# {"name":"John","age":30,"car":null} + -m "facebook/opt-125m" \ + -g "examples/grammars/animal.ebnf" \ + -p 'The text says, "The animal is a dog." The answer is obvious.' \ + --max_new_tokens 60 ``` Run `transformers-cfg-cli generate --help` for available options. ### Transformers *Torch* -```py +#### Non-streaming example + +```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor if __name__ == "__main__": - # Detect if GPU is available, otherwise use CPU + # 1. Set device (GPU if available, otherwise CPU) device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - print(f"Using device: {device}") + print("Using device:", device) + # 2. Specify model ID model_id = "facebook/opt-125m" - # Load model and tokenizer + # 3. Load tokenizer and model tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token - model = AutoModelForCausalLM.from_pretrained(model_id).to(device) model.generation_config.pad_token_id = model.generation_config.eos_token_id - # Define grammar string - json_grammar = """ - + # 4. Define the grammar string + grammar_str = r""" root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" - """ - grammar = IncrementalGrammarConstraint(json_grammar, "root", tokenizer) - grammar_processor = GrammarConstrainedLogitsProcessor(grammar) - - # Generate + # 5. Create grammar constraint and logits processor + grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) + logits_processor = GrammarConstrainedLogitsProcessor(grammar) + + # 6. Create prompt(s) and tokenize prompts = [ - 'The text says, "The animal is a dog." 
The answer is obvious. ', 'I\'m going to say "The animal is a dog." Here I go! ' - ] - input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device) - - output = model.generate( - input_ids, + 'The text says, "The animal is a dog." The answer is obvious.', + 'I\'m going to say "The animal is a dog." Here I go!' + ] + inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False)["input_ids"].to(device) + + # 7. Generate outputs + outputs = model.generate( + inputs, max_length=50, - logits_processor=[grammar_processor], + logits_processor=[logits_processor], repetition_penalty=1.1, - num_return_sequences=1, + num_return_sequences=1 ) - # Decode output - generations = tokenizer.batch_decode(output, skip_special_tokens=True) - - # Print all generations in for loop - for generation in generations: - print(generation) - + # 8. Decode and print generated text + results = tokenizer.batch_decode(outputs, skip_special_tokens=True) + for result in results: + print(result) ``` -#### Stream +#### Streaming example
+Streaming example -```py +```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor if __name__ == "__main__": - # Detect if GPU is available, otherwise use CPU + # 1. Set device (GPU if available, otherwise CPU) device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - print(f"Using device: {device}") + print("Using device:", device) + # 2. Specify model ID model_id = "facebook/opt-125m" - # Load model and tokenizer + # 3. Load tokenizer and model tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token - model = AutoModelForCausalLM.from_pretrained(model_id).to(device) model.generation_config.pad_token_id = model.generation_config.eos_token_id - # Define grammar as a string - grammar_str = """ - + # 4. Define the grammar string + grammar_str = r""" root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" - """ + # 5. Create grammar constraint and logits processor grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) - grammar_processor = GrammarConstrainedLogitsProcessor(grammar) - - # Generate - prompts = [ - 'The text says, "The animal is a dog." The answer is obvious. ', #'I\'m going to say "The animal is a dog." Here I go! ' - ] - input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device) - - # Set up streaming + logits_processor = GrammarConstrainedLogitsProcessor(grammar) + + # 6. Create prompt and tokenize + prompts = ['The text says, "The animal is a dog." The answer is obvious.'] + inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False)["input_ids"].to(device) + + # 7. Set up the streamer for output streamer = TextStreamer(tokenizer) - - output = model.generate( - input_ids, + + # 8. Generate outputs using streaming + outputs = model.generate( + inputs, max_length=50, - logits_processor=[grammar_processor], + logits_processor=[logits_processor], repetition_penalty=1.1, num_return_sequences=1, streamer=streamer ) - ```
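The release notes above list a reusable `GrammarConstrainedLogitsProcessor` ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)). As a minimal sketch of what that reuse can look like (continuing from the Torch example above, and assuming its `model`, `tokenizer`, `device`, and `grammar_processor` objects are still in scope and that the installed version includes #100; otherwise build a fresh processor per call):

```python
# Minimal sketch: pass the same GrammarConstrainedLogitsProcessor instance to
# several generate() calls instead of rebuilding it each time (per #100).
# Assumes `model`, `tokenizer`, `device`, and `grammar_processor` from the
# Torch example above; the reuse behaviour itself is the assumption here.
follow_up_prompts = [
    'The text says, "The animal is a dog." The answer is obvious. ',
    'I\'m going to say "The animal is a dog." Here I go! ',
]
for prompt in follow_up_prompts:
    ids = tokenizer([prompt], add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device)
    out = model.generate(
        ids,
        max_length=50,
        logits_processor=[grammar_processor],  # same processor instance reused
        repetition_penalty=1.1,
    )
    print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```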
-### Transformers *Pipeline* +### Transformers pipeline
+Streaming example -```py +```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor -# Load model and tokenizer +# 1. Specify model ID model_id = "facebook/opt-125m" +# 2. Load tokenizer and model tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token - -# Detect if GPU is available, otherwise use CPU device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - model = AutoModelForCausalLM.from_pretrained(model_id).to(device) -# Define grammar string -json_grammar = """ - +# 3. Define the grammar string +grammar_str = r""" root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" - """ -grammar = IncrementalGrammarConstraint(json_grammar, "root", tokenizer) -grammar_processor = GrammarConstrainedLogitsProcessor(grammar) +# 4. Create grammar constraint and logits processor +grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) +logits_processor = GrammarConstrainedLogitsProcessor(grammar) -# Initialize pipeline -pipe = pipeline( +# 5. Initialize the text-generation pipeline +text_pipe = pipeline( "text-generation", model=model, tokenizer=tokenizer, device_map="auto", max_new_tokens=100, - batch_size=2, + batch_size=2 ) -# Generate text -generations = pipe( - [ - 'The text says, "The animal is a dog." The answer is obvious. ', - 'I\'m going to say "The animal is a dog." Here I go! ' - ], - do_sample=False, - logits_processor=[grammar_processor], -) - -# Print results -for generation_group in generations: - for generation in generation_group: - print(generation['generated_text']) +# 6. Define prompts and generate text +prompts = [ + 'The text says, "The animal is a dog." The answer is obvious.', + 'I\'m going to say "The animal is a dog." Here I go!' +] +generations = text_pipe(prompts, do_sample=False, logits_processor=[logits_processor]) +for group in generations: + for gen in group: + print(gen["generated_text"]) ```
-### LlamaCPP Python -Use the `llama-cpp-python` adapter, automatically loadable with the `adapter` parameter. +### LlamaCPP python + +#### Non-streaming example -```py +```python import io -import torch import logging from contextlib import redirect_stderr from llama_cpp import Llama @@ -282,50 +282,91 @@ from transformers import AutoTokenizer logging.basicConfig(level=logging.INFO) -# Define your EBNF grammar (you can replace this with your own) -ebnf_grammar = """ - +# 1. Define the EBNF grammar string +grammar_str = r""" root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" +""" - """ - -# Load the tokenizer matching your model +# 2. Load the tokenizer for the model tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b") -# Redirect stderr and load the model via llama-cpp-python +# 3. Load the model using llama-cpp-python (suppress stderr) f = io.StringIO() with redirect_stderr(f): model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False) -# Create the grammar constraint and the logits processor with the new parameter. -grammar_constraint = IncrementalGrammarConstraint(ebnf_grammar, "root", tokenizer) -grammar_processor = GrammarConstrainedLogitsProcessor(grammar_constraint, adapter="llama-cpp-python") +# 4. Create grammar constraint and logits processor (using the llama-cpp-python adapter) +grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) +logits_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") + +# 5. Define the prompt and generate a completion (non-streaming) +prompt = 'The text says, "The animal is a dog." The answer is obvious.' +response = model.create_completion( + prompt=prompt, + logits_processor=[logits_processor], + max_tokens=100, +) +print(response["choices"][0]["text"]) +``` + +#### Streaming example + +
+Streaming example + +```python +import io +import logging +from contextlib import redirect_stderr +from llama_cpp import Llama +from transformers_cfg.grammar_utils import IncrementalGrammarConstraint +from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor +from transformers import AutoTokenizer + +logging.basicConfig(level=logging.INFO) -# Define a prompt. -prompt = """The text says, "The animal is a dog." The answer is obvious. """ +# 1. Define the EBNF grammar string +grammar_str = r""" + root ::= "The animal is a " animal "." + animal ::= "cat" | "fish" +""" -# Use the text completion API with the logits processor. +# 2. Load the tokenizer for the model +tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b") + +# 3. Load the model using llama-cpp-python (suppress stderr) +f = io.StringIO() +with redirect_stderr(f): + model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False) + +# 4. Create grammar constraint and logits processor (using the llama-cpp-python adapter) +grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) +logits_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") + +# 5. Define the prompt and generate a completion using streaming +prompt = 'The text says, "The animal is a dog." The answer is obvious.' response = model.create_completion( stream=True, prompt=prompt, - logits_processor=[grammar_processor], + logits_processor=[logits_processor], max_tokens=100, ) +# 6. Print tokens as they are received for token in response: - token_text = token["choices"][0]["text"] - print(token_text, end="", flush=True) - + print(token["choices"][0]["text"], end="", flush=True) ``` -## 💡 Why use `transformers-cfg`? +
+ +## 💡 Why use transformers-cfg? -- **EBNF Grammar Support**: Uses Extended Backus-Naur Form (EBNF) for grammar description. -- **Seamless Integration**: Compatible with the llama-cpp project for easy replacement. -- **Broad Model Compatibility**: Works with all models in the 🤗 Transformers library. -- **Multilingual Grammar Support**: Enables grammars in various languages, including Chinese (中文), Japanese (日本語), Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗). +- **EBNF grammar support:** Uses Extended Backus-Naur Form (EBNF) for grammar description. +- **Seamless integration:** Compatible with the llama-cpp project for easy replacement. +- **Broad model compatibility:** Works with all models in the 🤗 Transformers library. +- **Multilingual grammar support:** Enables grammars in various languages, including Chinese (中文), Japanese (日本語), + Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗). ## 🤔 What is a grammar? @@ -359,94 +400,88 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama- ## ✅ Supported models -### Qwen -
-Qwen - -- [Qwen](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) ≤ 2.5 - -
- -### Meta (LLaMa) -
-Meta (LLaMa) - -- [LLaMa](https://huggingface.co/baffo32/decapoda-research-llama-7B-hf) ≤ 3.0 -- [huggyllama/llama-7b](https://huggingface.co/huggyllama/llama-7b) -- [TinyPixel/Llama-2-7B-bf16-sharded](https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded) -- [OpenAssistant/llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319) -- [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) -- [NousResearch/Nous-Hermes-Llama2-13b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b) -- [TheBloke/Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) -- [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) -- [fxmarty/tiny-llama-fast-tokenizer](https://huggingface.co/fxmarty/tiny-llama-fast-tokenizer) -- [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) -- [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) -- [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) -- [togethercomputer/LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) -- [openlm-research/open_llama_7b_v2](https://huggingface.co/openlm-research/open_llama_7b_v2) -- [NousResearch/Nous-Hermes-llama-2-7b](https://huggingface.co/NousResearch/Nous-Hermes-llama-2-7b) -- [TheBloke/Llama-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ) -- [h2oai/h2ogpt-4096-llama2-7b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-7b-chat) -- [h2oai/h2ogpt-4096-llama2-13b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-13b-chat) -- [garage-bAInd/Platypus2-7B](https://huggingface.co/garage-bAInd/Platypus2-7B) - -
- -### GPT -
-GPT - -- [GPT](https://huggingface.co/openai-community/gpt2) ≤ 2 -- [gpt2](https://huggingface.co/gpt2) -- [distilgpt2](https://huggingface.co/distilgpt2) -- [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large) -- [openai-community/gpt2-xl](https://huggingface.co/openai-community/gpt2-xl) -- [openai-community/gpt2-medium](https://huggingface.co/openai-community/gpt2-medium) -- [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) - -
- -### Mistral -
-Mistral - -- [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) ≤ 0.3 -- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) -- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) - -
- -### Falcon -
-Falcon - -- [Falcon](https://huggingface.co/tiiuae/falcon-7b) -- [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) -- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) - -
- -### OPT -
-OPT - -- [OPT](https://huggingface.co/collections/facebook/opt-66ed00e15599f02966818844) -- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) -- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) -- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) -- [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) -- [facebook/opt-13b](https://huggingface.co/facebook/opt-13b) +
+Qwen (≤ 2.5) + +- [Qwen](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) + +
+ +
+Meta (LLaMa) (≤ 3.0) + +- [LLaMa](https://huggingface.co/baffo32/decapoda-research-llama-7B-hf) +- [huggyllama/llama-7b](https://huggingface.co/huggyllama/llama-7b) +- [TinyPixel/Llama-2-7B-bf16-sharded](https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded) +- [OpenAssistant/llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319) +- [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) +- [NousResearch/Nous-Hermes-Llama2-13b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b) +- [TheBloke/Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) +- [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) +- [fxmarty/tiny-llama-fast-tokenizer](https://huggingface.co/fxmarty/tiny-llama-fast-tokenizer) +- [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) +- [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) +- [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) +- [togethercomputer/LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) +- [openlm-research/open_llama_7b_v2](https://huggingface.co/openlm-research/open_llama_7b_v2) +- [NousResearch/Nous-Hermes-llama-2-7b](https://huggingface.co/NousResearch/Nous-Hermes-llama-2-7b) +- [TheBloke/Llama-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ) +- [h2oai/h2ogpt-4096-llama2-7b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-7b-chat) +- [h2oai/h2ogpt-4096-llama2-13b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-13b-chat) +- [garage-bAInd/Platypus2-7B](https://huggingface.co/garage-bAInd/Platypus2-7B) + +
+ +
+GPT (≤ 2) + +- [GPT](https://huggingface.co/openai-community/gpt2) +- [gpt2](https://huggingface.co/gpt2) +- [distilgpt2](https://huggingface.co/distilgpt2) +- [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large) +- [openai-community/gpt2-xl](https://huggingface.co/openai-community/gpt2-xl) +- [openai-community/gpt2-medium](https://huggingface.co/openai-community/gpt2-medium) +- [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) + +
+ +
+Mistral (≤ 0.3) + +- [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) +- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) +- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) + +
+ +
+Falcon (≤ 3) + +- [Falcon](https://huggingface.co/tiiuae/falcon-7b) +- [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) +- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) + +
+ +
+OPT (≤ 125m) + +- [OPT](https://huggingface.co/collections/facebook/opt-66ed00e15599f02966818844) +- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) +- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) +- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) +- [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) +- [facebook/opt-13b](https://huggingface.co/facebook/opt-13b)
-See [supported_models.yaml](docs/supported_models.yaml) for the full list whose extent is constantly being updated. +See [supported_models.yaml](docs/supported_models.yaml) for the full list whose content is constantly being updated. -If you encounter an unsupported model, please open an issue or submit a pull request. +If you encounter an unsupported model, please open an issue or create a pull request. ## 📖 Citation -If you find this work useful, please cite it with the reccomended citation: +If you find this work useful, please cite it with the recommended citation: ```bibtex @inproceedings{geng-etal-2023-grammar, @@ -468,4 +503,5 @@ This project is licensed under the [MIT License](LICENSE). ## 🙌 Acknowledgements -Derived from [torch-grammars](https://github.com/Shopify/torch-grammar), which was based on [llama-cpp](https://github.com/ggerganov/llama.cpp). +Derived from [torch-grammars](https://github.com/Shopify/torch-grammar), which was based on +[llama-cpp](https://github.com/ggerganov/llama.cpp). From 94bd0da80901efa729914b772d888f41dc6bd767 Mon Sep 17 00:00:00 2001 From: URRO Date: Sun, 9 Mar 2025 13:59:42 -0400 Subject: [PATCH 2/5] Update README.md --- README.md | 442 ++++++++++++++++++++++++++++-------------------------- 1 file changed, 226 insertions(+), 216 deletions(-) diff --git a/README.md b/README.md index 8ba4dea..7ab7e71 100644 --- a/README.md +++ b/README.md @@ -7,33 +7,37 @@ ### Latest experimental +#### **Features** +
-#### Features - LlamaCPP Python wrapper support ([#116](https://github.com/epfl-dlab/transformers-CFG/pull/116)) -#### Bug fixes +
+ +#### **Bug fixes** + +
+ - `pip show` license ([#117](https://github.com/epfl-dlab/transformers-CFG/pull/117))
### Latest stable +#### **[v0.2.7](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7)** (2025-03-02) -#### [V0.2.7 latest](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7) (2025-03-02) - -#### Features +#### **Features** - Types and MLX ([#93](https://github.com/epfl-dlab/transformers-CFG/pull/93)) -- Negation, wildcards, repetition brackets ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94), - [#95](https://github.com/epfl-dlab/transformers-CFG/pull/95), - [#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), - [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104)) +- Negation ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94)) +- Wildcards ([#95](https://github.com/epfl-dlab/transformers-CFG/pull/95)) +- Repetition brackets ([#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104)) - Qwen2 and Qwen2.5 ([#97](https://github.com/epfl-dlab/transformers-CFG/pull/97)) -- Reusable `GrammarConstrainedLogitsProcessor` for efficiency ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)) -- Pytest for testing ([#109](https://github.com/epfl-dlab/transformers-CFG/pull/109)) -- GitHub Actions workflow for automation ([#110](https://github.com/epfl-dlab/transformers-CFG/pull/110)) +- Resuable logits processor ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)) +- Pytest ([#109](https://github.com/epfl-dlab/transformers-CFG/pull/109)) +- GitHub Actions workflow ([#110](https://github.com/epfl-dlab/transformers-CFG/pull/110)) -#### Bug fixes +#### **Bug fixes** - Avoid computing full masks and optimized type additions ([#101](https://github.com/epfl-dlab/transformers-CFG/pull/101)) - Refactored grammar encoding to improve structure ([#99](https://github.com/epfl-dlab/transformers-CFG/pull/99)) @@ -45,22 +49,17 @@ - **[Gemma-2](https://github.com/epfl-dlab/transformers-CFG/pull/75)** — @fillassuncao (2024-08-16) - **[DeepSeek](https://github.com/epfl-dlab/transformers-CFG/pull/73)** (2024-07-24) - **LLaMA-3** (2024-07-08) -- **[JSON schema](examples/grammars/custom_json_grammars/README.md)** (2024-05-13) +- **[JSON Schema](examples/grammars/custom_json_grammars/README.md)** (2024-05-13) - **Mask optimization** (2024-04-25) - **[Phi](https://github.com/epfl-dlab/transformers-CFG/issues/34)** (2024-04-16) - **[Online demo](http://saibo-creator.xyz:7860/)** (2024-04-10) - **Unicode and foreign text** (2024-02-29) - **Text-Generation-WebUI** (2023-12-17) - - We are pleased to announce that `transformers-cfg` has been integrated into the - [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to - leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). + - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). 
## 🚀 Introduction -Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) -library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face -Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers -parallel for LlamaCPP's GBNF support, but with stricter generation rules. +Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parellel for LlamaCPP's GBNF support, but with stricter generation rules. ## 💻 Installation @@ -80,198 +79,231 @@ For the latest updates, install directly from GitHub: pip install git+https://github.com/epfl-dlab/transformers-CFG.git@main ``` -## 🔧 Grammar quickstart +## 💡 Why use `transformers-cfg`? + +- **EBNF Grammar Support**: Uses Extended Backus-Naur Form (EBNF) for grammar description. +- **Seamless Integration**: Compatible with the llama-cpp project for easy replacement. +- **Broad Model Compatibility**: Works with all models in the 🤗 Transformers library. +- **Multilingual Grammar Support**: Enables grammars in various languages, including Chinese (中文), Japanese (日本語), Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗). + +## 🤔 What is a grammar? + +Think of it as an enhanced version of regular expressions. + +### Valid JSON object + +```bnf +root ::= object +object ::= "{" pair ("," pair)* "}" +pair ::= string ":" value +string ::= '"' [a-zA-Z0-9]* '"' +value ::= string | object | "true" | "false" | "null" +``` + +For advanced grammar debugging, see our [debugging guide](docs/debugging_custom_grammars.md). -Let's set up a predictable generation method where the model would usually reply with -"The animal is a dog." Instead, we force the model to say either "The animal is a cat" or -"The animal is a fish" — two other common domestic pets that contradict the initial text. +## 🔧 Grammar quickstart +Let's set up a predictable generation method where the model would usually reply with "The animal is a dog." However, we'll force the model to say either "The animal is a cat" or "The animal is a fish," two other common domestic pets that contradict the inital text. ### Command-line interface (CLI) -The `transformers-cfg-cli` tool enables text generation using a model and a specified grammar. -Unicode is supported. +The `transformers-cfg-cli` tool enables text generation using a model and a specified grammar. Unicode is supported. ```bash -# Run text generation using the CLI tool. -# Note: Use proper quotes so that the inner double quotes are preserved. transformers-cfg-cli generate \ -m "facebook/opt-125m" \ -g "examples/grammars/animal.ebnf" \ -p 'The text says, "The animal is a dog." The answer is obvious.' \ - --max_new_tokens 60 + --max_new_tokens 50 \ +# The animal is a cat. ``` Run `transformers-cfg-cli generate --help` for available options. 
### Transformers *Torch* -#### Non-streaming example - -```python +```py import torch from transformers import AutoModelForCausalLM, AutoTokenizer from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor if __name__ == "__main__": - # 1. Set device (GPU if available, otherwise CPU) + # Set device: use GPU if available, else CPU. device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - print("Using device:", device) + print(f"Using device: {device}") - # 2. Specify model ID + # Model identifier model_id = "facebook/opt-125m" - # 3. Load tokenizer and model + # Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token model = AutoModelForCausalLM.from_pretrained(model_id).to(device) model.generation_config.pad_token_id = model.generation_config.eos_token_id - # 4. Define the grammar string - grammar_str = r""" + # Define grammar string + grammar_str = """ root ::= "The animal is a " animal "." animal ::= "cat" | "fish" """ - # 5. Create grammar constraint and logits processor + # Create grammar constraint and logits processor grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) - logits_processor = GrammarConstrainedLogitsProcessor(grammar) - - # 6. Create prompt(s) and tokenize + grammar_processor = GrammarConstrainedLogitsProcessor(grammar) + + # Define prompts prompts = [ 'The text says, "The animal is a dog." The answer is obvious.', 'I\'m going to say "The animal is a dog." Here I go!' ] - inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False)["input_ids"].to(device) + + # Tokenize prompts + input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device) - # 7. Generate outputs - outputs = model.generate( - inputs, + # Generate constrained text + output = model.generate( + input_ids, max_length=50, - logits_processor=[logits_processor], + logits_processor=[grammar_processor], repetition_penalty=1.1, - num_return_sequences=1 + num_return_sequences=1, ) - # 8. Decode and print generated text - results = tokenizer.batch_decode(outputs, skip_special_tokens=True) - for result in results: - print(result) + # Decode and print generated text + generations = tokenizer.batch_decode(output, skip_special_tokens=True) + for generation in generations: + print(generation) + +# The animal is a cat. ``` -#### Streaming example +#### Stream
-Streaming example -```python +```py import torch from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor if __name__ == "__main__": - # 1. Set device (GPU if available, otherwise CPU) + # Set device: use GPU if available, else CPU device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - print("Using device:", device) + print(f"Using device: {device}") - # 2. Specify model ID + # Model identifier model_id = "facebook/opt-125m" - # 3. Load tokenizer and model + # Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token model = AutoModelForCausalLM.from_pretrained(model_id).to(device) model.generation_config.pad_token_id = model.generation_config.eos_token_id - # 4. Define the grammar string - grammar_str = r""" + # Define grammar string + grammar_str = """ root ::= "The animal is a " animal "." animal ::= "cat" | "fish" """ - # 5. Create grammar constraint and logits processor + # Create grammar constraint and logits processor grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) - logits_processor = GrammarConstrainedLogitsProcessor(grammar) - - # 6. Create prompt and tokenize - prompts = ['The text says, "The animal is a dog." The answer is obvious.'] - inputs = tokenizer(prompts, return_tensors="pt", padding=True, add_special_tokens=False)["input_ids"].to(device) + grammar_processor = GrammarConstrainedLogitsProcessor(grammar) + + # Define prompt + prompts = [ + 'The text says, "The animal is a dog." The answer is obvious.' + ] - # 7. Set up the streamer for output + # Tokenize prompt + input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device) + + # Set up streaming streamer = TextStreamer(tokenizer) - - # 8. Generate outputs using streaming - outputs = model.generate( - inputs, + + # Generate constrained text with streaming. + model.generate( + input_ids, max_length=50, - logits_processor=[logits_processor], + logits_processor=[grammar_processor], repetition_penalty=1.1, num_return_sequences=1, streamer=streamer ) + +# The animal is a cat. ```
-### Transformers pipeline +### Transformers *Pipeline*
-Streaming example -```python +```py import torch from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor -# 1. Specify model ID +# Model identifier model_id = "facebook/opt-125m" -# 2. Load tokenizer and model +# Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token device = torch.device("cuda" if torch.cuda.is_available() else "cpu") model = AutoModelForCausalLM.from_pretrained(model_id).to(device) -# 3. Define the grammar string -grammar_str = r""" +# Define grammar string +grammar_str = """ root ::= "The animal is a " animal "." animal ::= "cat" | "fish" """ -# 4. Create grammar constraint and logits processor +# Create grammar constraint and logits processor grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) -logits_processor = GrammarConstrainedLogitsProcessor(grammar) +grammar_processor = GrammarConstrainedLogitsProcessor(grammar) -# 5. Initialize the text-generation pipeline -text_pipe = pipeline( +# Initialize text generation pipeline +pipe = pipeline( "text-generation", model=model, tokenizer=tokenizer, device_map="auto", max_new_tokens=100, - batch_size=2 + batch_size=2, ) -# 6. Define prompts and generate text +# Define prompts prompts = [ 'The text says, "The animal is a dog." The answer is obvious.', 'I\'m going to say "The animal is a dog." Here I go!' ] -generations = text_pipe(prompts, do_sample=False, logits_processor=[logits_processor]) -for group in generations: - for gen in group: - print(gen["generated_text"]) + +# Generate constrained text using the pipeline. +generations = pipe( + prompts, + do_sample=False, + logits_processor=[grammar_processor], +) + +# Print generated texts +for generation_group in generations: + for generation in generation_group: + print(generation['generated_text']) + +# The animal is a cat. ```
-### LlamaCPP python +### LlamaCPP Python +Use the `llama-cpp-python` adapter, automatically loadable with the `adapter` parameter. -#### Non-streaming example - -```python +```py import io import logging from contextlib import redirect_stderr @@ -282,40 +314,43 @@ from transformers import AutoTokenizer logging.basicConfig(level=logging.INFO) -# 1. Define the EBNF grammar string -grammar_str = r""" - root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" +# Define grammar string. +grammar_str = """ +root ::= "The animal is a " animal "." +animal ::= "cat" | "fish" """ -# 2. Load the tokenizer for the model +# Load the tokenizer matching the model. tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b") -# 3. Load the model using llama-cpp-python (suppress stderr) -f = io.StringIO() -with redirect_stderr(f): +# Redirect stderr and load the model via llama-cpp-python. +with redirect_stderr(io.StringIO()): model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False) -# 4. Create grammar constraint and logits processor (using the llama-cpp-python adapter) +# Create grammar constraint and logits processor using the adapter. grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) -logits_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") +grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") -# 5. Define the prompt and generate a completion (non-streaming) +# Define prompt. prompt = 'The text says, "The animal is a dog." The answer is obvious.' + +# Generate constrained text (non-streaming). response = model.create_completion( prompt=prompt, - logits_processor=[logits_processor], + logits_processor=[grammar_processor], max_tokens=100, ) + +# Print generated text. print(response["choices"][0]["text"]) -``` -#### Streaming example +# The animal is a cat. +``` +#### Stream
-Streaming example -```python +```py import io import logging from contextlib import redirect_stderr @@ -326,67 +361,42 @@ from transformers import AutoTokenizer logging.basicConfig(level=logging.INFO) -# 1. Define the EBNF grammar string -grammar_str = r""" - root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" +# Define grammar string +grammar_str = """ +root ::= "The animal is a " animal "." +animal ::= "cat" | "fish" """ -# 2. Load the tokenizer for the model +# Load the tokenizer matching the model tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b") -# 3. Load the model using llama-cpp-python (suppress stderr) -f = io.StringIO() -with redirect_stderr(f): +# Redirect stderr and load the model via llama-cpp-python +with redirect_stderr(io.StringIO()): model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False) -# 4. Create grammar constraint and logits processor (using the llama-cpp-python adapter) +# Create grammar constraint and logits processor using the adapter grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) -logits_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") +grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") -# 5. Define the prompt and generate a completion using streaming +# Define prompt. prompt = 'The text says, "The animal is a dog." The answer is obvious.' + +# Generate constrained text with streaming response = model.create_completion( stream=True, prompt=prompt, - logits_processor=[logits_processor], + logits_processor=[grammar_processor], max_tokens=100, ) -# 6. Print tokens as they are received +# Stream and print generated text for token in response: print(token["choices"][0]["text"], end="", flush=True) -``` - -
- -## 💡 Why use transformers-cfg? - -- **EBNF grammar support:** Uses Extended Backus-Naur Form (EBNF) for grammar description. -- **Seamless integration:** Compatible with the llama-cpp project for easy replacement. -- **Broad model compatibility:** Works with all models in the 🤗 Transformers library. -- **Multilingual grammar support:** Enables grammars in various languages, including Chinese (中文), Japanese (日本語), - Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗). -## 🤔 What is a grammar? - -Think of it as an enhanced version of regular expressions. - -### Valid JSON object - -```bnf -root ::= object -object ::= "{" pair ("," pair)* "}" -pair ::= string ":" value -string ::= '"' [a-zA-Z0-9]* '"' -value ::= string | object | "true" | "false" | "null" +# The animal is a cat. ``` -For advanced grammar debugging, see our [debugging guide](docs/debugging_custom_grammars.md). - -## 🛠 JSON schema - -Learn to create grammars for complex JSON objects in our [documentation](examples/grammars/custom_json_grammars/README.md). +
## 📜 Grammar collection @@ -398,90 +408,91 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama- - [chess.ebnf](examples/grammars/chess.ebnf): Valid chess moves. - [arithmetic.ebnf](examples/grammars/arithmetic.ebnf): Valid arithmetic expressions. -## ✅ Supported models +## 🛠 JSON schema -
-Qwen (≤ 2.5) +Learn to create grammars for complex JSON objects in our [documentation](examples/grammars/custom_json_grammars/README.md). -- [Qwen](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) +## ✅ Supported tokenizers -
-
-Meta (LLaMa) (≤ 3.0) - -- [LLaMa](https://huggingface.co/baffo32/decapoda-research-llama-7B-hf) -- [huggyllama/llama-7b](https://huggingface.co/huggyllama/llama-7b) -- [TinyPixel/Llama-2-7B-bf16-sharded](https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded) -- [OpenAssistant/llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319) -- [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) -- [NousResearch/Nous-Hermes-Llama2-13b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b) -- [TheBloke/Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) -- [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) -- [fxmarty/tiny-llama-fast-tokenizer](https://huggingface.co/fxmarty/tiny-llama-fast-tokenizer) -- [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) -- [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) -- [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) -- [togethercomputer/LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) -- [openlm-research/open_llama_7b_v2](https://huggingface.co/openlm-research/open_llama_7b_v2) -- [NousResearch/Nous-Hermes-llama-2-7b](https://huggingface.co/NousResearch/Nous-Hermes-llama-2-7b) -- [TheBloke/Llama-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ) -- [h2oai/h2ogpt-4096-llama2-7b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-7b-chat) -- [h2oai/h2ogpt-4096-llama2-13b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-13b-chat) -- [garage-bAInd/Platypus2-7B](https://huggingface.co/garage-bAInd/Platypus2-7B) +### 🤖 Tested models -
+
+Qwen (≤ 2.5) + +- [Qwen2](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) +- [Qwen2.5]() -
-GPT (≤ 2) +
-- [GPT](https://huggingface.co/openai-community/gpt2) -- [gpt2](https://huggingface.co/gpt2) -- [distilgpt2](https://huggingface.co/distilgpt2) -- [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large) -- [openai-community/gpt2-xl](https://huggingface.co/openai-community/gpt2-xl) -- [openai-community/gpt2-medium](https://huggingface.co/openai-community/gpt2-medium) -- [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) +
+LLaMa (≤ 3.3) -
+- [huggyllama/llama-7b](https://huggingface.co/huggyllama/llama-7b) +- [TinyPixel/Llama-2-7B-bf16-sharded](https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded) +- [OpenAssistant/llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319) +- [NousResearch/Llama-2-7b-chat-hf](https://huggingface.co/NousResearch/Llama-2-7b-chat-hf) +- [NousResearch/Nous-Hermes-Llama2-13b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b) +- [TheBloke/Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) +- [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) +- [fxmarty/tiny-llama-fast-tokenizer](https://huggingface.co/fxmarty/tiny-llama-fast-tokenizer) +- [TheBloke/Llama-2-7B-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ) +- [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) +- [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) +- [togethercomputer/LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) +- [openlm-research/open_llama_7b_v2](https://huggingface.co/openlm-research/open_llama_7b_v2) +- [NousResearch/Nous-Hermes-llama-2-7b](https://huggingface.co/NousResearch/Nous-Hermes-llama-2-7b) +- [TheBloke/Llama-2-7B-Chat-AWQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-AWQ) +- [h2oai/h2ogpt-4096-llama2-7b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-7b-chat) +- [h2oai/h2ogpt-4096-llama2-13b-chat](https://huggingface.co/h2oai/h2ogpt-4096-llama2-13b-chat) +- [garage-bAInd/Platypus2-7B](https://huggingface.co/garage-bAInd/Platypus2-7B) -
-Mistral (≤ 0.3) +
-- [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) -- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) -- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) +
+GPT (≤ 2) -
+- [gpt2](https://huggingface.co/gpt2) +- [distilgpt2](https://huggingface.co/distilgpt2) +- [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large) +- [openai-community/gpt2-xl](https://huggingface.co/openai-community/gpt2-xl) +- [openai-community/gpt2-medium](https://huggingface.co/openai-community/gpt2-medium) +- [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) -
-Falcon (≤ 3) +
-- [Falcon](https://huggingface.co/tiiuae/falcon-7b) -- [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) -- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) +
+Mistral (≤ 0.3) -
+- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) +- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) -
-OPT (≤ 125m) +
-- [OPT](https://huggingface.co/collections/facebook/opt-66ed00e15599f02966818844) -- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) -- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) -- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) -- [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) -- [facebook/opt-13b](https://huggingface.co/facebook/opt-13b) +
+Falcon (≤ 3.0) -
+- [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) +- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) + +
-See [supported_models.yaml](docs/supported_models.yaml) for the full list whose content is constantly being updated. +
+OPT + +- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) +- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) +- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) +- [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) +- [facebook/opt-13b](https://huggingface.co/facebook/opt-13b) + +
-If you encounter an unsupported model, please open an issue or create a pull request. +If you encounter an unsupported model, please open an issue or submit a pull request. ## 📖 Citation -If you find this work useful, please cite it with the recommended citation: +If you find this work useful, please cite it with the reccomended citation: ```bibtex @inproceedings{geng-etal-2023-grammar, @@ -503,5 +514,4 @@ This project is licensed under the [MIT License](LICENSE). ## 🙌 Acknowledgements -Derived from [torch-grammars](https://github.com/Shopify/torch-grammar), which was based on -[llama-cpp](https://github.com/ggerganov/llama.cpp). +Derived from [torch-grammars](https://github.com/Shopify/torch-grammar), which was based on [llama-cpp](https://github.com/ggerganov/llama.cpp). From 28d77bfc04efd11618b20e7d4f3ab373bc34e48e Mon Sep 17 00:00:00 2001 From: URRO Date: Sun, 9 Mar 2025 14:01:01 -0400 Subject: [PATCH 3/5] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 7ab7e71..1436b3e 100644 --- a/README.md +++ b/README.md @@ -55,11 +55,11 @@ - **[Online demo](http://saibo-creator.xyz:7860/)** (2024-04-10) - **Unicode and foreign text** (2024-02-29) - **Text-Generation-WebUI** (2023-12-17) - - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). + - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([PR](https://github.com/oobabooga/text-generation-webui/pull/4953)). ## 🚀 Introduction -Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parellel for LlamaCPP's GBNF support, but with stricter generation rules. +Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([PR](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parellel for LlamaCPP's GBNF support, but with stricter generation rules. ## 💻 Installation From ddd0380c10bfb00705d3934425248739a4b772d5 Mon Sep 17 00:00:00 2001 From: URRO Date: Sun, 9 Mar 2025 14:04:34 -0400 Subject: [PATCH 4/5] Update README.md --- README.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 1436b3e..262836c 100644 --- a/README.md +++ b/README.md @@ -113,7 +113,7 @@ The `transformers-cfg-cli` tool enables text generation using a model and a spec transformers-cfg-cli generate \ -m "facebook/opt-125m" \ -g "examples/grammars/animal.ebnf" \ - -p 'The text says, "The animal is a dog." The answer is obvious.' \ + -p 'The text says, "The animal is a dog." The answer is obvious. ' \ --max_new_tokens 50 \ # The animal is a cat. 
``` @@ -154,8 +154,8 @@ if __name__ == "__main__": # Define prompts prompts = [ - 'The text says, "The animal is a dog." The answer is obvious.', - 'I\'m going to say "The animal is a dog." Here I go!' + 'The text says, "The animal is a dog." The answer is obvious. ', + 'I\'m going to say "The animal is a dog." Here I go! ' ] # Tokenize prompts @@ -214,7 +214,7 @@ if __name__ == "__main__": # Define prompt prompts = [ - 'The text says, "The animal is a dog." The answer is obvious.' + 'The text says, "The animal is a dog." The answer is obvious. ' ] # Tokenize prompt @@ -279,8 +279,8 @@ pipe = pipeline( # Define prompts prompts = [ - 'The text says, "The animal is a dog." The answer is obvious.', - 'I\'m going to say "The animal is a dog." Here I go!' + 'The text says, "The animal is a dog." The answer is obvious. ', + 'I\'m going to say "The animal is a dog." Here I go! ' ] # Generate constrained text using the pipeline. @@ -332,7 +332,7 @@ grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") # Define prompt. -prompt = 'The text says, "The animal is a dog." The answer is obvious.' +prompt = 'The text says, "The animal is a dog." The answer is obvious. ' # Generate constrained text (non-streaming). response = model.create_completion( @@ -379,7 +379,7 @@ grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") # Define prompt. -prompt = 'The text says, "The animal is a dog." The answer is obvious.' +prompt = 'The text says, "The animal is a dog." The answer is obvious. ' # Generate constrained text with streaming response = model.create_completion( From 06b0f83b92bf3a8d659f5fcff83792d25e098e3f Mon Sep 17 00:00:00 2001 From: URRO Date: Sun, 9 Mar 2025 17:46:03 -0400 Subject: [PATCH 5/5] Create animal.ebnf --- examples/grammars/animal.ebnf | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 examples/grammars/animal.ebnf diff --git a/examples/grammars/animal.ebnf b/examples/grammars/animal.ebnf new file mode 100644 index 0000000..a8c1c3a --- /dev/null +++ b/examples/grammars/animal.ebnf @@ -0,0 +1,2 @@ +root ::= "The animal is a " animal "." +animal ::= "cat" | "fish"
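The `examples/grammars/animal.ebnf` file created in PATCH 5 is the grammar that the quickstart CLI example points at. As a minimal sketch of the same check through the Python API shown in the README (an illustration only, assuming `transformers-cfg` from this series is installed and the `facebook/opt-125m` model used throughout is available; the grammar is loaded from the new file rather than defined inline):

```python
# Minimal sketch (assumptions: transformers-cfg from this series is installed
# and facebook/opt-125m is available; illustrative only, not part of the patches).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

# Read the grammar from the file added in PATCH 5 instead of an inline string.
with open("examples/grammars/animal.ebnf", encoding="utf-8") as f:
    grammar_str = f.read()

grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
grammar_processor = GrammarConstrainedLogitsProcessor(grammar)

prompt = 'The text says, "The animal is a dog." The answer is obvious. '
input_ids = tokenizer([prompt], add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device)

output = model.generate(
    input_ids,
    max_length=50,
    logits_processor=[grammar_processor],
    repetition_penalty=1.1,
)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```

If the grammar file and processor are wired up correctly, the completion ends with either "The animal is a cat." or "The animal is a fish.", the only two strings the grammar admits.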