* add distributed acceptance test scripts for the MLU platform
* add fp16 test, fix cast
* fix
* add onnxsim for llama
* add matmul tf32 for mlu
* add submodule: onnxsim_large_model
* fix
* modify bang_launch.py, start_single
* add test for albert/opt
* change file path

---------

Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
* kunlun dist inference fix
* kunlun distributed
* add Kunlunxin distributed scripts and fix issues encountered when running llama
* set -j8
* format
* move run_pytorch.py into cuda/
* update notes

---------

Co-authored-by: weijie01 <weijie01@baidu.com>
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>