Fix some formatting issues

chaoyu@qiyuanlab.com 2024-07-16 18:41:27 +08:00
parent d4da91b29c
commit 7ca0eb0bcc
1 changed file with 1 addition and 0 deletions

@@ -148,6 +148,7 @@ cat pretrain.txt | python convert_txt2jsonl.py > pretrain.jsonl
```
2. Convert the jsonl format to index. The script is located at ./quick_start_clean/convert_json2index.py and is used as follows:
```shell
python convert_json2index.py \
--path ../data_process/data \ # directory containing the jsonl files