Hi Eric,
Excellent idea! I've always been in favor of a standardized, future-proof prompt format that is both simple and unambiguous. Up until now, I've been recommending ChatML (for lack of a better template), but now that there's OpenChatML, I hope we can improve and standardize this. I have read the specification, and here are my comments:
Tokens
- BOS/EOS Tokens: The use of <s> and </s> as BOS/EOS tokens is ambiguous due to their common use as HTML strikethrough tags. This could lead to confusion when processing HTML documents. BOS and EOS tokens are usually special in that the BOS token is added automatically by the inference software, not sent as part of the prompt, while the EOS token should never appear in the input prompt and should only be output by the model itself to signal the end of generation. However, that changed with the introduction of the <|im_end|> token, which has now largely replaced the EOS token and does appear repeatedly in the prompt. I think we don't need both sets of special tokens and consider the old-style BOS/EOS tokens redundant. The model should add its EOS token to signal the end of generation, and <|im_end|> is better than the ambiguous </s>, so I'd scrap the latter and get rid of <s> as well.
  Update: After reading your EOS comment on X and thinking about it some more, you're right: <|im_end|> to signal the end of a role's message wouldn't be enough if we want one AI generation to include multiple roles (e. g. in a multi-character chat). If we want to fully support that, we do need two different tokens, one to end a character's response and the other to end generation. So, yes, I now agree with a </s>, even though I'd prefer a different and unambiguous representation of the full EOS token, e. g. <|EOS|> (which should only appear in the output of the LLM and never be sent as input by the user/inference software, which also keeps it unaffected by repetition penalty). And if we also need BOS, why not use <|BOS|> explicitly? (A sketch of this scheme follows after this list.)
- <|startofthought|>/<|endofthought|>: Not sure if that's the right way to make thoughts special, especially compared to feelings/emotions, actions, etc. I'd rather see a special tag followed by an arbitrary string, making the tag very flexible and not limiting the special options, or a whole set of special tags for other useful options.
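To make the division of labor concrete, here's a minimal sketch (in Python, with all function names my own) of how inference software might assemble a prompt under this scheme:

```python
# A minimal sketch of the two-token scheme above, assuming the token
# spellings from this comment (<|BOS|>, <|EOS|>): <|BOS|> is prepended
# automatically, <|im_end|> closes each role's message (and may repeat
# within one generation), and <|EOS|> is only ever emitted by the model
# to stop generation, never sent as input.
def build_prompt(messages: list[tuple[str, str]]) -> str:
    prompt = "<|BOS|>"  # added by the inference software, not by the user
    for role, content in messages:
        prompt += f"<|im_start|>{role} {content}<|im_end|>\n"
    prompt += "<|im_start|>char "  # left open for the model to complete
    return prompt

print(build_prompt([("user", "Hello there, AI.")]))
# The model now generates until it outputs <|EOS|>; since <|EOS|> never
# appears in the input, repetition penalty cannot suppress it.
```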
Message Structure
- <|im_end|> should follow the output message directly instead of being separated from it by a newline. Newlines are individual tokens, so they count against the max context limit, and with most inference software they're subject to repetition penalty. Most importantly, IMHO, they add nothing of value, only cost. (The EOS token, on the other hand, can be, and maybe should be, separated with a linebreak.) See the token-count sketch after this list.
- role: The term assistant imposes a predefined role on the AI. A more neutral term like char (character, commonly used by inference software) conserves tokens and aligns better in fixed-width fonts with user.
- name: A name after = probably tokenizes differently from the usual token that follows whitespace. It also costs two extra tokens. I'd rather follow the common practice of prefixing the message with name: (the actual name of the char/user), which only costs one extra token. Like the role change, this reduces token usage, which is crucial in long chats.
- message_content: Considering token economy, replacing line breaks with spaces could be beneficial, although this might reduce human readability. The choice of line break characters (CRLF vs. LF) should be standardized to avoid inconsistent tokenization. If applicable, I prefer pure LF, the same default Git uses internally.
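To illustrate the token-economy points above, here's a small sketch using the GPT-2 tokenizer from Hugging Face transformers as a generic stand-in; actual counts depend on the model's own tokenizer and on which strings are registered as special tokens:

```python
# Token-count comparison for the suggestions above (GPT-2 tokenizer as a
# stand-in; counts will differ per model and per special-token setup).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

def count(s: str) -> int:
    return len(tok(s)["input_ids"])

# A newline before <|im_end|> costs an extra token on every single message:
print(count("Hi. Nice to meet you.<|im_end|>"))    # without newline
print(count("Hi. Nice to meet you.\n<|im_end|>"))  # one token more

# "Name:" prefix vs. a name=... attribute:
print(count("Amy: Hi."))       # prefix style
print(count("name=Amy\nHi."))  # attribute style, typically more tokens
```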
Also consider an output format specifier, e. g. JSON or YAML, that clearly tells the model which format the response should use.
Thought Structure
Good idea! Also consider emotions and actions: those are another layer that could deserve its own tags (or a general tag with a specific qualifier). We should support the AI outputting emotional states and real or simulated actions in a structure independent of the actual message, e. g. so TTS can use the emotion as a generation parameter for how to speak, without the emotional state being said out loud. The same goes for actions, which are otherwise often asterisk-delimited but would be more useful and less ambiguous in a clearly defined format (and for video generation/VR/robot control, they might later also need to be cleanly separated from the actual written/spoken text). That's why I suggest <|start_of|>… / <|end_of|>….
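As a sketch of how downstream software (TTS, video, robot control) could consume these tags, here's a hypothetical parser for the <|start_of|>… / <|end_of|>… format; the tag spellings follow my suggestion above, everything else is made up for illustration:

```python
# A minimal sketch of stripping the proposed <|start_of|>/<|end_of|>
# annotations from a model response, assuming the qualifier (thought,
# feeling, action, ...) directly follows the tag.
import re

TAG = re.compile(r"<\|start_of\|>(\w+)\s(.*?)<\|end_of\|>\1\s?", re.DOTALL)

def split_annotations(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Return the plain message plus a list of (kind, content) annotations."""
    annotations = [(m.group(1), m.group(2).strip()) for m in TAG.finditer(text)]
    plain = TAG.sub("", text).strip()
    return plain, annotations

msg = "<|start_of|>feeling happy<|end_of|>feeling Hi Wolfram. Nice to meet you."
print(split_annotations(msg))
# -> ('Hi Wolfram. Nice to meet you.', [('feeling', 'happy')])
```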
Fill-in-the-Middle Tasks
We could get rid of the <|fim_middle|> tag, since insertion will always happen after the prefix and before the suffix; the insertion position is clear without an extra tag.
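A sketch of what that would look like in practice, assuming the tag spellings from the examples below (the function names are mine):

```python
# A minimal sketch of a fill-in-the-middle prompt without <|fim_middle|>,
# as argued above: the completion is simply inserted between prefix and
# suffix, so no explicit insertion marker is needed.
def fim_prompt(prefix: str, suffix: str) -> str:
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}"

def apply_completion(prefix: str, suffix: str, middle: str) -> str:
    return prefix + middle + suffix

prefix = "The capital of France is "
suffix = ", which is known for its famous Eiffel Tower."
print(fim_prompt(prefix, suffix))
print(apply_completion(prefix, suffix, "Paris"))
```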
Multi-File Sequences
Introducing optional filenames and clear <|file_start|> / <|file_end|> markers (instead of just separators) could streamline the handling of multiple file inputs, ensuring clear demarcation of text and file content within the same prompt. That way we can have text (that's not part of the file) before or after the files. And the end tag could appear on the same line as the last line of the file (if there's no trailing linebreak in the file) or after a newline (if there's a trailing linebreak in the file), so even that could be reflected unambiguously in the prompt.
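Here's a minimal sketch of that serialization rule, assuming the marker spellings proposed here (the helper name is my own):

```python
# A minimal sketch of serializing files with the proposed
# <|file_start|>/<|file_end|> markers, preserving a trailing linebreak
# exactly as described above.
def serialize_file(name: str, content: str) -> str:
    # <|file_end|> lands on the same line as the last content line when the
    # file lacks a trailing newline, and on its own line when it has one,
    # so the presence of a final linebreak round-trips unambiguously.
    return f"<|file_start|>{name}\n{content}<|file_end|>\n"

files = [("1st file.txt", "This is the content from the first file.\n"),
         ("no_newline.txt", "This file has no trailing linebreak.")]
prompt = ("Some text before the files.\n"
          + "".join(serialize_file(n, c) for n, c in files)
          + "Some text after the files.")
print(prompt)
```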
Examples
Here's how we'd write the original examples with my suggested changes:
Example conversation:
<|im_start|>user Hello there, AI.<|im_end|>
<|im_start|>char Hi. Nice to meet you.<|im_end|>
<|EOS|>
Example conversation with speaker name:
<|im_start|>user Wolfram: Hello there, AI.<|im_end|>
<|im_start|>char Amy: Hi Wolfram. Nice to meet you.<|im_end|>
<|EOS|>
Example fill-in-the-middle task:
<|fim_prefix|>The capital of France is <|fim_suffix|>, which is known for its famous Eiffel Tower.
Example with thought block:
<|im_start|>user What is 17 * 34?<|im_end|>
<|im_start|>char <|start_of|>thought To multiply 17 by 34, we can break it down:
17 * 34 = 17 * (30 + 4)
= (17 * 30) + (17 * 4)
= 510 + 68
= 578<|end_of|>thought
17 * 34 = 578.<|im_end|>
<|EOS|>
Example multi-file sequence:
<|file_start|>1st file.txt
This is the content from the first file.
<|file_end|>
<|file_start|>2nd file.txt
This is the content from the second file.
And this is more content from the second file.
<|file_end|>
<|file_start|>3rd file.txt
Finally, this is the content from the third file.
<|file_end|>
New: Multi-character chat example:
<|im_start|>system <|start_of|>action Amy appears and greets Wolfram.<|end_of|>action<|im_end|>
<|im_start|>user Wolfram: Hello there, AI.<|im_end|>
<|im_start|>char Amy: <|start_of|>feeling happy<|end_of|>feeling Hi Wolfram. Nice to meet you.<|im_end|>
<|im_start|>user Wolfram: And who's that?<|im_end|>
<|im_start|>char Amy: That's my sister, Ivy.<|im_end|>
<|im_start|>char Ivy: <|start_of|>feeling curious<|end_of|>feeling Hi Wolfram. How are you?<|im_end|>
<|im_start|>user Wolfram: Oh, there's two of you?<|im_end|>
<|im_start|>char Ivy: Yes, of course, why not?<|im_end|>
<|im_start|>char Amy: Yeah, multi-user chats are fun! <|start_of|>action laughs<|end_of|>action<|im_end|>
<|EOS|>
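To check that this format stays machine-readable, here's a hypothetical parser for such a transcript, assuming the conventions from this comment: <|im_start|>role, an optional Name: prefix, <|im_end|> per message, and <|EOS|> ending the generation (all names are mine):

```python
# A minimal sketch of parsing the multi-character transcript above into
# structured turns; inline <|start_of|>/<|end_of|> annotations are left
# in the content and could be split out with the earlier sketch.
import re

TURN = re.compile(r"<\|im_start\|>(\w+) (.*?)<\|im_end\|>", re.DOTALL)
NAME = re.compile(r"^(\w+): ")

def parse_transcript(text: str) -> list[dict]:
    turns = []
    for role, body in TURN.findall(text.replace("<|EOS|>", "")):
        name = None
        m = NAME.match(body)
        if m:
            name, body = m.group(1), body[m.end():]
        turns.append({"role": role, "name": name, "content": body})
    return turns

transcript = ("<|im_start|>user Wolfram: Hello there, AI.<|im_end|>\n"
              "<|im_start|>char Amy: Hi Wolfram. Nice to meet you.<|im_end|>\n"
              "<|EOS|>")
for turn in parse_transcript(transcript):
    print(turn)
```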
Command R prompt template
Finally, as a big fan of the Prompting Command R document and its very useful prompt additions (e. g. Safety Preamble and Style Guide), I'd most welcome such features in any model using the OpenChatML format.