add rm dataset explanation

Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2023-08-22 01:30:57 -04:00 · 2023-08-22 01:30:57 -04:00 · b0ca8fe634
parent bc7795655f
commit b0ca8fe634
2 changed files with 25 additions and 0 deletions
--- a/data/README.md
+++ b/data/README.md
@@ -16,3 +16,15 @@ If you are using a custom dataset, please provide your dataset definition in the
 ```

 where the `prompt` and `response` columns should contain non-empty values. The `query` column will be concatenated with the `prompt` column and used as input for the model. The `history` column should contain a list where each element is a string tuple representing a query-response pair.
+
+For a reward modeling (RM) dataset, the `output` column should be a list in which the first n elements are the chosen responses and the last n elements are the rejected responses. For example:
+```json
+{
+    "instruction": "Question?",
+    "input": "",
+    "output": [
+        "chosen answer",
+        "rejected answer"
+    ]
+}
+```
--- a/data/README_zh.md
+++ b/data/README_zh.md
@@ -16,3 +16,16 @@
 ```

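 where the `prompt` and `response` columns should be non-empty strings. The content of the `query` column will be concatenated with the `prompt` column and used as the model input. The `history` column should be a list in which each element is a tuple of two strings, representing a user request and the model's reply respectively.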
+
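+For a reward modeling (RM) dataset, the first N elements of `output` are the `chosen` responses and the last N elements are the `rejected` responses. For example: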
+```json
+{
+    "instruction": "Question?",
+    "input": "",
+    "output": [
+        "chosen answer",
+        "rejected answer"
+    ]
+}
+
+```