From e0bb66c34dd5c37cd78b36f5935f9e98aafbfaae Mon Sep 17 00:00:00 2001
From: "chaoyu@qiyuanlab.com"
Date: Tue, 16 Jul 2024 18:52:23 +0800
Subject: [PATCH] Fix some formatting issues
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 quick_start_clean/readmes/quick_start.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/quick_start_clean/readmes/quick_start.md b/quick_start_clean/readmes/quick_start.md
index 5c7a17a..81378de 100644
--- a/quick_start_clean/readmes/quick_start.md
+++ b/quick_start_clean/readmes/quick_start.md
@@ -159,8 +159,7 @@ python convert_json2index.py \
 
 When the script runs successfully, it displays the following output (Hadoop is not needed, so the `hadoop: not found` warning can be ignored):
 
-
-![alt text](https://www.osredm.com/jiuyuan/CPM-9G-8B/tree/FM_9G/quick_start_clean/readmes/055bf7ce-faab-403b-a7ee-896279bee11f.png)
+![Output shown after the script runs successfully](./055bf7ce-faab-403b-a7ee-896279bee11f.png)
 
 After conversion, four files are generated in the index directory: data.jsonl (the original jsonl data), index, index.h5, and meta.json (which records dataset information, with the fields "language", "nlines", "nbytes", "length_distribute", "avg_token_per_line", "hdfs_path", and "data_sample").
 Here is an example of meta.json: