What is actually sent to the LLM?
This week, Stability AI released Stable LM 2 1.6B, and along with it shared the instruction format needed to use the model properly.
Two versions of the model were released: a pre-trained base model (https://huggingface.co/stabilityai/stablelm-2-1_6b) and a version fine-tuned for chat (https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b). But before playing with the model, one needs to understand what it expects as input. The fine-tuned version ships with the instruction format required to work with it.
The instruction format, or chat template, is defined during the fine-tuning stage: it teaches the model what a well-formed exchange looks like, so at inference time we must reproduce it exactly. The instruction format is specified in tokenizer_config.json (https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b/blob/main/tokenizer_config.json). StableLM 2 Zephyr 1.6B uses the following format:
<|user|>
{PromptDesign}<|endoftext|>
<|assistant|>
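You don't have to assemble this string by hand. Since the template lives in tokenizer_config.json, the Hugging Face tokenizer can render it for you. A minimal sketch (the example message is made up, and trust_remote_code may be needed depending on your transformers version):

from transformers import AutoTokenizer

# The chat template is read from tokenizer_config.json automatically.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-2-zephyr-1_6b")

messages = [{"role": "user", "content": "What is a chat template?"}]

# Render the conversation into the exact string the model will see;
# add_generation_prompt=True appends <|assistant|> so the model answers next.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expected output, roughly:
# <|user|>
# What is a chat template?<|endoftext|>
# <|assistant|>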
Compare this to what Llama 2 needs:
<s>[INST] <<SYS>>
{PromptDesign - You are a helpful assistant...}
<</SYS>>
{PromptDesign - Task} [/INST]
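For comparison, here is a hand-rolled sketch of the Llama 2 format above (llama2_prompt is a hypothetical helper; in practice you would rely on the tokenizer's built-in chat template instead):

def llama2_prompt(system: str, task: str) -> str:
    # Wrap the system prompt in <<SYS>> tags and the full turn in [INST] tags,
    # matching the Llama 2 chat format shown above.
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{task} [/INST]"

print(llama2_prompt("You are a helpful assistant.", "Summarize this article."))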
As a result, the user's input is converted into the model's input with the help of prompt design and the instruction format. And this is a big deal: according to the paper "Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?", the format of in-context demonstrations impacts the result even more than whether their labels are actually correct.
The paper "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting" explores the prompt design a little bit deeper. By changing spaces, separators, and casing, we can see performance differences of up to 76 accuracy points when evaluated using LLaMA-2-13B.
To sum up: we insert the user's request into the prompt design. Then the prompt design is inserted into the instruction format. The resulting string is passed to the model for inference.
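Putting all three steps together, a minimal end-to-end sketch (the prompt-design wording is invented; the rest uses the standard transformers API):

from transformers import AutoModelForCausalLM, AutoTokenizer

# 1. The user's raw request.
user_request = "Explain chat templates in one sentence."

# 2. Prompt design: embed the request in a task template (illustrative wording).
prompt_design = f"Answer briefly and accurately.\n\nQuestion: {user_request}"

# 3. Instruction format: the tokenizer's chat template wraps the prompt design.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-2-zephyr-1_6b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-2-zephyr-1_6b")

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt_design}],
    add_generation_prompt=True,
    return_tensors="pt",
)

# 4. Inference on the final string; slice off the prompt tokens before decoding.
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))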
Resources:
https://stability.ai/news/introducing-stable-lm-2 - Introducing Stable LM 2 1.6B
https://huggingface.co/stabilityai/stablelm-2-1_6b - Stable LM 2 1.6B
https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b - StableLM 2 Zephyr 1.6B
https://huggingface.co/docs/transformers/main/en/chat_templating - Templates for Chat Models
https://arxiv.org/abs/2310.11324 - Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
https://arxiv.org/abs/2202.12837 - Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
https://cobusgreyling.medium.com/llms-contextual-demonstration-af99de936cf0 - LLMs & Contextual Demonstration

