Fix some formatting issues

chaoyu@qiyuanlab.com 2024-07-16 18:34:11 +08:00
parent 04edd8740d
commit 26bb024cb4
1 changed file with 1 addition and 0 deletions

@@ -137,6 +137,7 @@ for line in sys.stdin:
    temp_json = {"input": "", "output": line.strip()}  # Pretraining loss is computed only on the output part, so the input field is left empty
    print(json.dumps(temp_json, ensure_ascii=False))
```
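For reference, the conversion step can be sketched as a small self-contained script. This is only an illustration under assumptions: the helper names `line_to_record` and `convert` are hypothetical, and skipping blank lines is an added choice that the hunk above does not show.

```python
import json
import sys


def line_to_record(line: str) -> dict:
    # Pretraining loss is computed only on "output", so "input" stays empty.
    return {"input": "", "output": line.strip()}


def convert(lines, out=sys.stdout):
    """Write one JSON object per non-blank input line (JSONL)."""
    for line in lines:
        # Assumption (not in the original script): blank lines are skipped.
        if not line.strip():
            continue
        # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable.
        out.write(json.dumps(line_to_record(line), ensure_ascii=False) + "\n")


if __name__ == "__main__":
    convert(sys.stdin)
```

Passing `ensure_ascii=False` writes Chinese text as-is instead of `\uXXXX` escapes, which keeps the resulting file readable and smaller.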
Use the script as follows, where pretrain.txt is the raw txt data and pretrain.jsonl is the output data in jsonl format:
```shell
cat pretrain.txt | python convert_txt2jsonl.py > pretrain.jsonl