From b0ca8fe634c35073bb156447ff45c5a8eb54aca1 Mon Sep 17 00:00:00 2001
From: Peter Pan
Date: Tue, 22 Aug 2023 01:30:57 -0400
Subject: [PATCH] add rm dataset explanation

Signed-off-by: Peter Pan
---
 data/README.md    | 12 ++++++++++++
 data/README_zh.md | 13 +++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/data/README.md b/data/README.md
index fe0f7e42..45ea7dad 100644
--- a/data/README.md
+++ b/data/README.md
@@ -16,3 +16,15 @@ If you are using a custom dataset, please provide your dataset definition in the
 ```
 
 where the `prompt` and `response` columns should contain non-empty values. The `query` column will be concatenated with the `prompt` column and used as input for the model. The `history` column should contain a list where each element is a string tuple representing a query-response pair.
+
+For a reward modeling (rm) dataset, the first n entries in `output` are the chosen responses and the last n entries are the rejected responses, for example:
+```json
+{
+  "instruction": "Question?",
+  "input": "",
+  "output": [
+    "chosen answer",
+    "rejected answer"
+  ]
+}
+```
diff --git a/data/README_zh.md b/data/README_zh.md
index 3be0c09d..a36b3750 100644
--- a/data/README_zh.md
+++ b/data/README_zh.md
@@ -16,3 +16,16 @@
 ```
 
 where the `prompt` and `response` columns should be non-empty strings. The content of the `query` column will be concatenated with the `prompt` column and used as the model input. The `history` column should be a list in which each element is a pair of strings representing the user request and the model response, respectively.
+
+For a reward model (rm) dataset, the first N outputs are the `chosen` data and the last N outputs are the `rejected` data, for example:
+```json
+{
+  "instruction": "Question?",
+  "input": "",
+  "output": [
+    "chosen answer",
+    "rejected answer"
+  ]
+}
+
+```
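
Note on the documented layout: the "first n chosen, last n rejected" convention can be illustrated with a minimal sketch. The helper name `split_rm_output` is hypothetical and not part of this patch or the repository; the sketch also assumes that `output` always has an even, non-zero length, which the README text implies but does not state.

```python
import json
from typing import List, Tuple


def split_rm_output(output: List[str]) -> Tuple[List[str], List[str]]:
    """Split an rm `output` list into (chosen, rejected) halves.

    Hypothetical helper: assumes the first half of the list holds the
    chosen responses and the second half holds the rejected responses.
    """
    if len(output) == 0 or len(output) % 2 != 0:
        raise ValueError("expected a non-empty, even-length `output` list")
    n = len(output) // 2
    return output[:n], output[n:]


# The example record from the README addition in this patch.
record = json.loads("""
{
  "instruction": "Question?",
  "input": "",
  "output": ["chosen answer", "rejected answer"]
}
""")

chosen, rejected = split_rm_output(record["output"])
print(chosen, rejected)
```

With n = 1, as in the README example, each half holds a single string. Rejecting odd-length lists is a choice made for this sketch so that a malformed record fails loudly instead of being split unevenly.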