* MLU CNCL base
* add FindCNCL.cmake, -lcncl not found
* bangPrintFloat not found
* docker: make successful, test error
* delete net file and onnxtest.py
* init
* fix cncl
* format
* fix
* format
* fix cncl
* run dist gpt2 on mlu
* format
* fix import error on mlu docker
* run llama single card
* run distributed llama2
* add test for slice/reduce on mlu
* fix cncl related test
* fix format
* format
* delete comments
* change GPU to MLU
* modify launch script
* fix name
* fix format
* fix gather
* format python script
---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: Bolun <chamberlain0w0@gmail.com>
Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>