diff --git a/README.md b/README.md index 6cc3903..262836c 100644 --- a/README.md +++ b/README.md @@ -7,27 +7,35 @@ ### Latest experimental +#### **Features** +
-#### **Features** - LlamaCPP Python wrapper support ([#116](https://github.com/epfl-dlab/transformers-CFG/pull/116)) +
+ #### **Bug fixes** + +
+ - `pip show` license ([#117](https://github.com/epfl-dlab/transformers-CFG/pull/117))
### Latest stable -#### **[v0.2.7 Latest](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7)** (2025-03-02) +#### **[v0.2.7](https://github.com/epfl-dlab/transformers-CFG/releases/tag/v0.2.7)** (2025-03-02) #### **Features** - Types and MLX ([#93](https://github.com/epfl-dlab/transformers-CFG/pull/93)) -- Negation, wildcards, repetition brackets ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94), [#95](https://github.com/epfl-dlab/transformers-CFG/pull/95), [#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104)) +- Negation ([#94](https://github.com/epfl-dlab/transformers-CFG/pull/94)) +- Wildcards ([#95](https://github.com/epfl-dlab/transformers-CFG/pull/95)) +- Repetition brackets ([#96](https://github.com/epfl-dlab/transformers-CFG/pull/96), [#104](https://github.com/epfl-dlab/transformers-CFG/pull/104)) - Qwen2 and Qwen2.5 ([#97](https://github.com/epfl-dlab/transformers-CFG/pull/97)) -- Resuable `GrammarConstrainedLogitsProcessor` for efficiency ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)) -- Pytest for testing ([#109](https://github.com/epfl-dlab/transformers-CFG/pull/109)) -- GitHub Actions workflow for automation ([#110](https://github.com/epfl-dlab/transformers-CFG/pull/110)) +- Reusable logits processor ([#100](https://github.com/epfl-dlab/transformers-CFG/pull/100)) +- Pytest ([#109](https://github.com/epfl-dlab/transformers-CFG/pull/109)) +- GitHub Actions workflow ([#110](https://github.com/epfl-dlab/transformers-CFG/pull/110)) #### **Bug fixes** @@ -47,11 +55,11 @@ - **[Online demo](http://saibo-creator.xyz:7860/)** (2024-04-10) - **Unicode and foreign text** (2024-02-29) - **Text-Generation-WebUI** (2023-12-17) - - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([Pull](https://github.com/oobabooga/text-generation-webui/pull/4953)). + - We are pleased to announce that `transformers-cfg` has been integrated into the [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui) project, allowing users to leverage CFG capabilities within this widely used text-generation interface ([PR](https://github.com/oobabooga/text-generation-webui/pull/4953)). ## 🚀 Introduction -Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([Pull](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parellel for LlamaCPP's GBNF support, but with stricter generation rules. +Initially developed as a pull request to the [Hugging Face Transformers](https://github.com/huggingface/transformers) library ([PR](https://github.com/huggingface/transformers/pull/27557)), `transformers-cfg` extends the Hugging Face Transformers library to support constrained decoding through context-free grammars (CFG), offering a Transformers parallel to LlamaCPP's GBNF support, but with stricter generation rules. ## 💻 Installation @@ -71,6 +79,29 @@ For the latest updates, install directly from GitHub: pip install git+https://github.com/epfl-dlab/transformers-CFG.git@main ``` +## 💡 Why use `transformers-cfg`? 
+ +- **EBNF Grammar Support**: Uses Extended Backus-Naur Form (EBNF) for grammar description. +- **Seamless Integration**: Compatible with the llama-cpp project for easy replacement. +- **Broad Model Compatibility**: Works with all models in the 🤗 Transformers library. +- **Multilingual Grammar Support**: Enables grammars in various languages, including Chinese (中文), Japanese (日本語), Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗). + +## 🤔 What is a grammar? + +Think of it as an enhanced version of regular expressions. + +### Valid JSON object + +```bnf +root ::= object +object ::= "{" pair ("," pair)* "}" +pair ::= string ":" value +string ::= '"' [a-zA-Z0-9]* '"' +value ::= string | object | "true" | "false" | "null" +``` + +For advanced grammar debugging, see our [debugging guide](docs/debugging_custom_grammars.md). + ## 🔧 Grammar quickstart Let's set up a predictable generation method where the model would usually reply with "The animal is a dog." However, we'll force the model to say either "The animal is a cat" or "The animal is a fish," two other common domestic pets that contradict the initial text. @@ -80,13 +111,11 @@ The `transformers-cfg-cli` tool enables text generation using a model and a spec ```bash transformers-cfg-cli generate \ - -m "microsoft/Phi-3-mini-4k-instruct" \ - -g "examples/grammars/json.ebnf" \ - -p "This is a valid JSON string for an HTTP request:" \ - --use_4bit \ - --max_new_tokens 60 \ - --repetition_penalty 1.1 -# {"name":"John","age":30,"car":null} + -m "facebook/opt-125m" \ + -g "examples/grammars/animal.ebnf" \ + -p 'The text says, "The animal is a dog." The answer is obvious. ' \ + --max_new_tokens 50 +# The animal is a cat. ``` Run `transformers-cfg-cli generate --help` for available options. @@ -100,37 +129,39 @@ from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor if __name__ == "__main__": - # Detect if GPU is available, otherwise use CPU + # Set device: use GPU if available, else CPU. device = torch.device("cuda" if torch.cuda.is_available() else "cpu") print(f"Using device: {device}") + # Model identifier model_id = "facebook/opt-125m" # Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token - model = AutoModelForCausalLM.from_pretrained(model_id).to(device) model.generation_config.pad_token_id = model.generation_config.eos_token_id # Define grammar string - json_grammar = """ - + grammar_str = """ root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" - """ - grammar = IncrementalGrammarConstraint(json_grammar, "root", tokenizer) + # Create grammar constraint and logits processor + grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) grammar_processor = GrammarConstrainedLogitsProcessor(grammar) - # Generate + # Define prompts prompts = [ - 'The text says, "The animal is a dog." The answer is obvious. ', 'I\'m going to say "The animal is a dog." Here I go! ' - ] + 'The text says, "The animal is a dog." The answer is obvious. ', + 'I\'m going to say "The animal is a dog." Here I go! 
' + ] + + # Tokenize prompts input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device) + # Generate constrained text output = model.generate( input_ids, max_length=50, @@ -139,13 +170,12 @@ if __name__ == "__main__": num_return_sequences=1, ) - # Decode output + # Decode and print generated text generations = tokenizer.batch_decode(output, skip_special_tokens=True) - - # Print all generations in for loop for generation in generations: print(generation) +# The animal is a cat. ``` #### Stream @@ -159,41 +189,42 @@ from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor if __name__ == "__main__": - # Detect if GPU is available, otherwise use CPU + # Set device: use GPU if available, else CPU device = torch.device("cuda" if torch.cuda.is_available() else "cpu") print(f"Using device: {device}") + # Model identifier model_id = "facebook/opt-125m" # Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token - model = AutoModelForCausalLM.from_pretrained(model_id).to(device) model.generation_config.pad_token_id = model.generation_config.eos_token_id - # Define grammar as a string + # Define grammar string grammar_str = """ - root ::= "The animal is a " animal "." - animal ::= "cat" | "fish" - """ + # Create grammar constraint and logits processor grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) grammar_processor = GrammarConstrainedLogitsProcessor(grammar) - # Generate + # Define prompt prompts = [ - 'The text says, "The animal is a dog." The answer is obvious. ', #'I\'m going to say "The animal is a dog." Here I go! ' - ] + 'The text says, "The animal is a dog." The answer is obvious. ' + ] + + # Tokenize prompt input_ids = tokenizer(prompts, add_special_tokens=False, return_tensors="pt", padding=True)["input_ids"].to(device) # Set up streaming streamer = TextStreamer(tokenizer) - output = model.generate( + # Generate constrained text with streaming. + model.generate( input_ids, max_length=50, logits_processor=[grammar_processor], @@ -202,6 +233,7 @@ if __name__ == "__main__": streamer=streamer ) +# The animal is a cat. ``` @@ -216,30 +248,26 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline from transformers_cfg.grammar_utils import IncrementalGrammarConstraint from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor -# Load model and tokenizer +# Model identifier model_id = "facebook/opt-125m" +# Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained(model_id) tokenizer.pad_token = tokenizer.eos_token - -# Detect if GPU is available, otherwise use CPU device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - model = AutoModelForCausalLM.from_pretrained(model_id).to(device) # Define grammar string -json_grammar = """ - +grammar_str = """ root ::= "The animal is a " animal "." 
- animal ::= "cat" | "fish" - """ -grammar = IncrementalGrammarConstraint(json_grammar, "root", tokenizer) +# Create grammar constraint and logits processor +grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) grammar_processor = GrammarConstrainedLogitsProcessor(grammar) -# Initialize pipeline +# Initialize text generation pipeline pipe = pipeline( "text-generation", model=model, @@ -249,20 +277,25 @@ pipe = pipeline( batch_size=2, ) -# Generate text +# Define prompts +prompts = [ + 'The text says, "The animal is a dog." The answer is obvious. ', + 'I\'m going to say "The animal is a dog." Here I go! ' +] + +# Generate constrained text using the pipeline. generations = pipe( - [ - 'The text says, "The animal is a dog." The answer is obvious. ', - 'I\'m going to say "The animal is a dog." Here I go! ' - ], + prompts, do_sample=False, logits_processor=[grammar_processor], ) -# Print results +# Print generated texts for generation_group in generations: for generation in generation_group: print(generation['generated_text']) + +# The animal is a cat. ``` @@ -272,7 +305,6 @@ Use the `llama-cpp-python` adapter, automatically loadable with the `adapter` pa ```py import io -import torch import logging from contextlib import redirect_stderr from llama_cpp import Llama @@ -282,70 +314,89 @@ from transformers import AutoTokenizer logging.basicConfig(level=logging.INFO) -# Define your EBNF grammar (you can replace this with your own) -ebnf_grammar = """ - - root ::= "The animal is a " animal "." - - animal ::= "cat" | "fish" - - """ +# Define grammar string. +grammar_str = """ +root ::= "The animal is a " animal "." +animal ::= "cat" | "fish" +""" -# Load the tokenizer matching your model +# Load the tokenizer matching the model. tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b") -# Redirect stderr and load the model via llama-cpp-python -f = io.StringIO() -with redirect_stderr(f): +# Redirect stderr and load the model via llama-cpp-python. +with redirect_stderr(io.StringIO()): model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False) -# Create the grammar constraint and the logits processor with the new parameter. -grammar_constraint = IncrementalGrammarConstraint(ebnf_grammar, "root", tokenizer) -grammar_processor = GrammarConstrainedLogitsProcessor(grammar_constraint, adapter="llama-cpp-python") +# Create grammar constraint and logits processor using the adapter. +grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) +grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") -# Define a prompt. -prompt = """The text says, "The animal is a dog." The answer is obvious. """ +# Define prompt. +prompt = 'The text says, "The animal is a dog." The answer is obvious. ' -# Use the text completion API with the logits processor. +# Generate constrained text (non-streaming). response = model.create_completion( - stream=True, prompt=prompt, logits_processor=[grammar_processor], max_tokens=100, ) -for token in response: - token_text = token["choices"][0]["text"] - print(token_text, end="", flush=True) +# Print generated text. +print(response["choices"][0]["text"]) +# The animal is a cat. ``` -## 💡 Why use `transformers-cfg`? +#### Stream +
-- **EBNF Grammar Support**: Uses Extended Backus-Naur Form (EBNF) for grammar description. -- **Seamless Integration**: Compatible with the llama-cpp project for easy replacement. -- **Broad Model Compatibility**: Works with all models in the 🤗 Transformers library. -- **Multilingual Grammar Support**: Enables grammars in various languages, including Chinese (中文), Japanese (日本語), Korean (한국어), Hindi (हिन्दी), Hebrew (עברית), Arabic (العربية), and emoji (🤗). +```py +import io +import logging +from contextlib import redirect_stderr +from llama_cpp import Llama +from transformers_cfg.grammar_utils import IncrementalGrammarConstraint +from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor +from transformers import AutoTokenizer -## 🤔 What is a grammar? +logging.basicConfig(level=logging.INFO) -Think of it as an enhanced version of regular expressions. +# Define grammar string +grammar_str = """ +root ::= "The animal is a " animal "." +animal ::= "cat" | "fish" +""" -### Valid JSON object +# Load the tokenizer matching the model +tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5b") -```bnf -root ::= object -object ::= "{" pair ("," pair)* "}" -pair ::= string ":" value -string ::= '"' [a-zA-Z0-9]* '"' -value ::= string | object | "true" | "false" | "null" -``` +# Redirect stderr and load the model via llama-cpp-python +with redirect_stderr(io.StringIO()): + model = Llama(model_path="qwen2.5-1.5b-q8_0.gguf", n_ctx=8000, verbose=False) -For advanced grammar debugging, see our [debugging guide](docs/debugging_custom_grammars.md). +# Create grammar constraint and logits processor using the adapter +grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer) +grammar_processor = GrammarConstrainedLogitsProcessor(grammar, adapter="llama-cpp-python") -## 🛠 JSON schema +# Define prompt. +prompt = 'The text says, "The animal is a dog." The answer is obvious. ' -Learn to create grammars for complex JSON objects in our [documentation](examples/grammars/custom_json_grammars/README.md). +# Generate constrained text with streaming +response = model.create_completion( + stream=True, + prompt=prompt, + logits_processor=[grammar_processor], + max_tokens=100, +) + +# Stream and print generated text +for token in response: + print(token["choices"][0]["text"], end="", flush=True) + +# The animal is a cat. +``` + +
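+As a small illustration of the multilingual grammar support listed under "Why use `transformers-cfg`?", grammar terminals can contain non-ASCII text. The sketch below is illustrative only and is not one of the grammars shipped in `examples/grammars`:
+
+```bnf
+root ::= "动物是" animal "。"
+animal ::= "猫" | "鱼"
+```
+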
## 📜 Grammar collection @@ -357,21 +408,26 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama- - [chess.ebnf](examples/grammars/chess.ebnf): Valid chess moves. - [arithmetic.ebnf](examples/grammars/arithmetic.ebnf): Valid arithmetic expressions. -## ✅ Supported models +## 🛠 JSON schema + +Learn to create grammars for complex JSON objects in our [documentation](examples/grammars/custom_json_grammars/README.md). + +## ✅ Supported tokenizers + + +### 🤖 Tested models -### Qwen
-Qwen +Qwen (≤ 2.5) -- [Qwen](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) ≤ 2.5 +- [Qwen2](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f) +- [Qwen2.5]()
-### Meta (LLaMa)
-Meta (LLaMa) +LLaMa (≤ 3.3) -- [LLaMa](https://huggingface.co/baffo32/decapoda-research-llama-7B-hf) ≤ 3.0 - [huggyllama/llama-7b](https://huggingface.co/huggyllama/llama-7b) - [TinyPixel/Llama-2-7B-bf16-sharded](https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded) - [OpenAssistant/llama2-13b-orca-8k-3319](https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319) @@ -393,11 +449,9 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama-
-### GPT
-GPT +GPT (≤ 2) -- [GPT](https://huggingface.co/openai-community/gpt2) ≤ 2 - [gpt2](https://huggingface.co/gpt2) - [distilgpt2](https://huggingface.co/distilgpt2) - [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large) @@ -407,31 +461,25 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama-
-### Mistral
-Mistral +Mistral (≤ 0.3) -- [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) ≤ 0.3 - [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) - [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
-### Falcon
-Falcon +Falcon (≤ 3.0) -- [Falcon](https://huggingface.co/tiiuae/falcon-7b) - [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) - [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct)
-### OPT
OPT -- [OPT](https://huggingface.co/collections/facebook/opt-66ed00e15599f02966818844) - [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) - [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) - [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) @@ -440,8 +488,6 @@ We maintain a collection of grammars in `examples/grammars`, aligned with llama-
-See [supported_models.yaml](docs/supported_models.yaml) for the full list whose extent is constantly being updated. - If you encounter an unsupported model, please open an issue or submit a pull request. ## 📖 Citation diff --git a/examples/grammars/animal.ebnf b/examples/grammars/animal.ebnf new file mode 100644 index 0000000..a8c1c3a --- /dev/null +++ b/examples/grammars/animal.ebnf @@ -0,0 +1,2 @@ +root ::= "The animal is a " animal "." +animal ::= "cat" | "fish"