00322 Inference - Chat with models - Templates: Study Notes

Large Language Models

Published: 2025-04-19

Updated: 2025-04-20

Preface

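The chat pipeline guide introduced TextGenerationPipeline and the concept of a chat prompt, or chat template, for conversing with a model. Behind this high-level pipeline is the apply_chat_template method. A chat template is part of the tokenizer, and it specifies how to convert a conversation into a single tokenizable string in the format the model expects.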

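Mistral-7B-Instruct (shown below) and Zephyr-7B (used later in these notes) are both fine-tuned from the same base model, yet they use different chat formats. Without chat templates, you would have to write formatting code by hand for each model, and even small mistakes can hurt performance. Chat templates offer a universal way to format chat inputs for any model.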

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
chat = [
  {"role": "user", "content": "Hello, how are you?"},
  {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
  {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

tokenizer.apply_chat_template(chat, tokenize=False)

<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]
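To make the difference between the two formats concrete, here is a minimal sketch of what hand-written formatting code might look like without chat templates. The helper names format_mistral and format_zephyr are hypothetical, and the token strings are copied from the example outputs in these notes rather than read from the tokenizers, so treat this as an approximation and not as the models' actual templates.

```python
# Hand-written formatters approximating two chat formats (illustrative only).
# The BOS/EOS strings are assumptions copied from the example outputs in these notes.
BOS, EOS = "<s>", "</s>"


def format_mistral(chat):
    """Approximate the Mistral-7B-Instruct format: [INST] ... [/INST] around user turns."""
    text = BOS
    for message in chat:
        if message["role"] == "user":
            text += "[INST] " + message["content"] + " [/INST]"
        elif message["role"] == "assistant":
            text += message["content"] + EOS + " "
        else:
            raise ValueError(f"unsupported role: {message['role']}")
    return text


def format_zephyr(chat):
    """Approximate the Zephyr format: a <|role|> header line, then the content and EOS."""
    return "".join(f"<|{m['role']}|>\n{m['content']}{EOS}\n" for m in chat)


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

print(format_mistral(chat))
print(format_zephyr(chat))
```

The same conversation yields two unrelated layouts. A stray space or a missing token in either function would silently change what the model sees, which is the kind of mistake apply_chat_template is meant to prevent.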

The rest of this guide takes a closer look at apply_chat_template and chat templates.

src link: https://huggingface.co/docs/transformers/en/chat_templating

Operating System: Ubuntu 22.04.4 LTS

References

Templates

apply_chat_template

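A chat should be structured as a list of dictionaries, each with role and content keys. The role key identifies the speaker (usually user, assistant, or system), and the content key holds the message text. For the system role, the content is a high-level description of how the model should behave and respond during the conversation.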
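This structure is simple enough to check before calling the tokenizer. The following is a small sketch: validate_chat and the ALLOWED_ROLES set are hypothetical helpers of my own, not part of transformers, and some templates accept additional roles (for example a tool role).

```python
# Hypothetical helper: checks that a chat is a list of {"role", "content"} dicts.
# ALLOWED_ROLES is an assumption for this sketch; real templates may accept more roles.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat(messages):
    """Raise ValueError if messages is not a list of dicts with role and content keys."""
    if not isinstance(messages, list):
        raise ValueError("a chat must be a list of messages")
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"message {index} is not a dict")
        missing = {"role", "content"} - set(message)
        if missing:
            raise ValueError(f"message {index} is missing keys: {sorted(missing)}")
        if message["role"] not in ALLOWED_ROLES:
            raise ValueError(f"message {index} has unknown role {message['role']!r}")
    return messages


messages = validate_chat([
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Hello!"},
])
print(len(messages), "messages look well-formed")
```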

Pass your messages to apply_chat_template to format and tokenize them. Set add_generation_prompt to True to append the tokens that mark the start of an assistant response.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta", device_map="auto", torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate",},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
 ]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.decode(tokenized_chat[0]))

<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>

Now pass the tokenized chat to generate() to produce a response.

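# Move the inputs to the model's device, since device_map="auto" may have placed the model on a GPU.
outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=128)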
print(tokenizer.decode(outputs[0]))

<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all.

add_generation_prompt

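The add_generation_prompt parameter adds the tokens that mark the start of a response. This ensures the chat model generates an assistant response instead of continuing the user's message.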

Not all models need a generation prompt. Some models, such as Llama, have no special tokens before the assistant response, in which case add_generation_prompt has no effect.

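# The output below comes from a ChatML-style tokenizer and a three-turn messages list (user, assistant, user).
formatted_chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
print(formatted_chat)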

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
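To see exactly what the flag changes, the sketch below mimics a ChatML-style template in plain Python. render_chatml is a hypothetical stand-in for the tokenizer's Jinja template, not the library's implementation, and the messages are the ones from the output above.

```python
# Plain-Python stand-in for a ChatML-style chat template (an assumption for illustration).
def render_chatml(messages, add_generation_prompt=False):
    """Wrap each message in <|im_start|>/<|im_end|> and optionally open an assistant turn."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # The generation prompt is just the header of a new assistant message.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]

without_prompt = render_chatml(messages, add_generation_prompt=False)
with_prompt = render_chatml(messages, add_generation_prompt=True)

# The only difference between the two strings is the trailing assistant header.
print(repr(with_prompt[len(without_prompt):]))
```

Without that trailing header, nothing tells the model that the user's turn is over, so it may simply keep writing the user's message.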

continue_final_message

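The continue_final_message parameter controls whether the final message in the chat should be continued rather than starting a new one. It removes the end-of-sequence tokens so that the model continues generating from the final message.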

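This is useful for "prefilling" a model response. In the example below, the model generates text that continues the JSON string rather than starting a new message. Prefilling can noticeably improve instruction-following accuracy when you know how the reply should start.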

chat = [
    {"role": "user", "content": "Can you format the answer in JSON?"},
    {"role": "assistant", "content": '{"name": "'},
]

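# return_tensors="pt" is needed so that generate() receives tensors rather than Python lists.
formatted_chat = tokenizer.apply_chat_template(chat, tokenize=True, return_dict=True, return_tensors="pt", continue_final_message=True)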
model.generate(**formatted_chat)

Do not use add_generation_prompt and continue_final_message together. The former adds tokens that start a new message, while the latter removes end-of-sequence tokens. Using both raises an error.
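The interaction between the two options is easier to follow in code. Below is a plain-Python sketch using a ChatML-style layout, where the end-of-turn marker is the token that gets dropped. render_chatml is a hypothetical helper, and the error it raises only imitates the rule described here; it is not the library's own check.

```python
# Plain-Python sketch of prefilling with a ChatML-style format (illustrative assumption).
def render_chatml(messages, add_generation_prompt=False, continue_final_message=False):
    """Format messages; optionally leave the final message open so the model continues it."""
    if add_generation_prompt and continue_final_message:
        # Mirrors the rule in the text: the two options contradict each other.
        raise ValueError("add_generation_prompt and continue_final_message are mutually exclusive")
    parts = []
    for index, m in enumerate(messages):
        is_last = index == len(messages) - 1
        turn = f"<|im_start|>{m['role']}\n{m['content']}"
        if not (is_last and continue_final_message):
            turn += "<|im_end|>\n"  # close every turn except a prefilled final one
        parts.append(turn)
    text = "".join(parts)
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"
    return text


chat = [
    {"role": "user", "content": "Can you format the answer in JSON?"},
    {"role": "assistant", "content": '{"name": "'},
]

prefilled = render_chatml(chat, continue_final_message=True)
print(prefilled)
```

The prompt now ends in the middle of the assistant's JSON string, so the most natural continuation for the model is the rest of that string.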

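TextGenerationPipeline sets add_generation_prompt to True by default to start a new message. However, if the final message in the chat has the "assistant" role, it assumes the message is a prefill and switches to continue_final_message=True, because most models do not support multiple consecutive assistant messages. To override this behavior, pass continue_final_message explicitly to the pipeline.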
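The default described above boils down to a small decision rule. Here is a sketch of it; resolve_generation_flags is a hypothetical name, and the function only restates the behavior described in this paragraph rather than reproducing the pipeline's code.

```python
# Sketch of the default behaviour described above; not the pipeline's actual code.
def resolve_generation_flags(messages, continue_final_message=None):
    """Return (add_generation_prompt, continue_final_message) for a chat."""
    if continue_final_message is None:
        # Default: treat a trailing assistant message as a prefill to be continued.
        continue_final_message = bool(messages) and messages[-1]["role"] == "assistant"
    # A new assistant turn is opened only when the final message is not being continued.
    return (not continue_final_message, continue_final_message)


ends_with_user = [{"role": "user", "content": "Hi"}]
ends_with_assistant = ends_with_user + [{"role": "assistant", "content": "Hello, "}]

print(resolve_generation_flags(ends_with_user))
print(resolve_generation_flags(ends_with_assistant))
print(resolve_generation_flags(ends_with_assistant, continue_final_message=False))
```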

Multiple templates

A model may have several templates for different use cases. For example, a model may have separate templates for regular chat, tool use, and RAG.

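When there are multiple templates, the chat template is a dictionary in which each key is a template name. apply_chat_template selects among them by name. In most cases it looks for a template named "default" and raises an error if it cannot find one.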

For tool calling, if the user passes a tools argument and a tool_use template exists, that template is used instead of the default.

To access a template with any other name, pass the name to the chat_template argument of apply_chat_template. For example, to use a RAG template, set chat_template="rag".
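The selection rules above can be summarized in a few lines. This is a sketch under simplifying assumptions: select_template is a hypothetical helper, the template bodies are placeholders, and chat_template is treated only as a template name here.

```python
# Sketch of the selection rules described above; `select_template` is a hypothetical helper.
def select_template(templates, chat_template=None, tools=None):
    """Pick a template string from a dict of named templates."""
    if chat_template is not None:
        name = chat_template  # an explicitly requested name always wins
    elif tools is not None and "tool_use" in templates:
        name = "tool_use"  # tools were passed and a tool template exists
    else:
        name = "default"
    if name not in templates:
        raise ValueError(f"no template named {name!r}; available: {sorted(templates)}")
    return templates[name]


# Placeholder template bodies, not real Jinja templates.
templates = {
    "default": "<chat template>",
    "tool_use": "<tool template>",
    "rag": "<rag template>",
}

print(select_template(templates))
print(select_template(templates, tools=[{"name": "get_weather"}]))
print(select_template(templates, chat_template="rag"))
```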

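Managing multiple templates can get confusing, though, so it is recommended to use a single template for all use cases. Use Jinja statements such as if tools is defined and {% macro %} definitions to wrap multiple code paths in one template.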

Template selection

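It is important to use a chat template format that matches the one the model was pretrained on, otherwise performance may suffer. Even if you are training the model further, performance is best when the chat tokens stay the same.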

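If you are training a model from scratch or fine-tuning a model for chat, you have more freedom in choosing a template. For example, ChatML is a popular format that is flexible enough to handle many use cases. It even supports generation prompts, but it does not add beginning-of-string (BOS) or end-of-string (EOS) tokens. If your model expects BOS and EOS tokens, set add_special_tokens=True and make sure to add them to your template.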

{%- for message in messages %}
    {{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
{%- endfor %}

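Set the template with the following logic to support generation prompts. The template wraps each message in <|im_start|> and <|im_end|> tokens and writes the role as a plain string, which makes it easy to customize the roles you train with.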

tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"

user, system, and assistant are the standard roles in chat templates. It is best to use them when they fit, especially if you are using the model with TextGenerationPipeline.

<|im_start|>system
You are a helpful chatbot that will do its best not to say anything so stupid that people tweet about it.<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
I'm doing great!<|im_end|>

Model training

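Training a model with a chat template is a good way to make sure the template matches the tokens the model sees during training. Apply the chat template as a preprocessing step on your dataset. Set add_generation_prompt=False, because the extra tokens that prompt an assistant response are not helpful during training.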

Below is an example of preprocessing a dataset with a chat template.

from transformers import AutoTokenizer
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

chat1 = [
    {"role": "user", "content": "Which is bigger, the moon or the sun?"},
    {"role": "assistant", "content": "The sun."}
]
chat2 = [
    {"role": "user", "content": "Which is bigger, a virus or a bacterium?"},
    {"role": "assistant", "content": "A bacterium."}
]

dataset = Dataset.from_dict({"chat": [chat1, chat2]})
dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)})
print(dataset['formatted_chat'][0])

<|user|>
Which is bigger, the moon or the sun?</s>
<|assistant|>
The sun.</s>

After this step, you can continue with the causal language model training tutorial using the formatted_chat column.

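Some tokenizers add special <bos> and <eos> tokens. Chat templates should already include all the necessary special tokens, so adding more is usually incorrect or redundant and hurts model performance. If you format text with apply_chat_template(tokenize=False), set add_special_tokens=False when you tokenize that text afterwards to avoid duplicating them.

formatted_chat = tokenizer.apply_chat_template(messages, tokenize=False)
tokens = tokenizer(formatted_chat, add_special_tokens=False)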

This is not an issue if you use apply_chat_template(tokenize=True).
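The duplication problem can be illustrated without loading a model. The sketch below uses a toy whitespace tokenizer that I made up for this purpose; toy_tokenize and its BOS handling are assumptions meant only to imitate a tokenizer that prepends a BOS token by default.

```python
# Toy tokenizer to illustrate duplicated special tokens; not a real tokenizer API.
BOS = "<s>"


def toy_tokenize(text, add_special_tokens=True):
    """Split on whitespace and, like many tokenizers, optionally prepend BOS."""
    tokens = text.split()
    return [BOS] + tokens if add_special_tokens else tokens


# Text already formatted by a chat template that inserts BOS itself.
formatted_chat = "<s> [INST] Hello [/INST]"

duplicated = toy_tokenize(formatted_chat)
clean = toy_tokenize(formatted_chat, add_special_tokens=False)

print(duplicated)  # BOS appears twice
print(clean)       # BOS appears once
```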

Closing

Blog post number 322 is done, and I'm happy!

Today is another day full of hope.

LuYF-Lemon-love

https://luyf-lemon-love.space/2025/04/19/00322-inference-chat-with-models-templates-xue-xi-bi-ji/

Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit LuYF-Lemon-love when reposting!
