* feat: SqueezeOp: remove the dependency on onnx infershape.
* feat: UnsqueezeOp: remove the dependency on onnx infershape.
* feat: remove the dependency on onnx infershape
* fix: fix Makefile when NCCL is off
* MLU CNCL base
* add FindCNCL.cmake; -lcncl not found yet (see the CMake sketch after this block)
* bangPrintFloat not found
* docker: make succeeds, but tests fail
* delete net file and onnxtest.py
* init
* fix cncl
* format
* fix
* format
* fix cncl
* run dist gpt2 on mlu
* format
* fix import error on mlu docker
* run llama single card
* run distributed llama2
* add test for slice/reduce on mlu
* fix cncl related test
* fix format
* format
* delete comments
* change GPU to MLU
* modify launch script
* fix name
* fix format
* fix gather
* format python script
---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: Bolun <chamberlain0w0@gmail.com>
Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>
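The `FindCNCL.cmake` entry above refers to a CMake find-module for Cambricon's CNCL communication library. A minimal sketch of what such a module can look like, assuming CNCL ships `cncl.h` and `libcncl` under a `NEUWARE_HOME` installation prefix (the hint variables and install layout here are illustrative assumptions, not taken from the repository):

```cmake
# Hypothetical minimal FindCNCL.cmake; the NEUWARE_HOME layout is assumed.
# Locate the CNCL header, e.g. $NEUWARE_HOME/include/cncl.h.
find_path(CNCL_INCLUDE_DIR cncl.h
    HINTS ${CNCL_ROOT} $ENV{NEUWARE_HOME}
    PATH_SUFFIXES include)

# Locate libcncl; a miss here reproduces the "-lcncl not found"
# symptom recorded in the log.
find_library(CNCL_LIBRARY cncl
    HINTS ${CNCL_ROOT} $ENV{NEUWARE_HOME}
    PATH_SUFFIXES lib64 lib)

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(CNCL DEFAULT_MSG
    CNCL_LIBRARY CNCL_INCLUDE_DIR)

if(CNCL_FOUND AND NOT TARGET CNCL::CNCL)
    # Wrap the results in an imported target for clean linking.
    add_library(CNCL::CNCL UNKNOWN IMPORTED)
    set_target_properties(CNCL::CNCL PROPERTIES
        IMPORTED_LOCATION "${CNCL_LIBRARY}"
        INTERFACE_INCLUDE_DIRECTORIES "${CNCL_INCLUDE_DIR}")
endif()
```

With a module like this on `CMAKE_MODULE_PATH`, `find_package(CNCL)` either yields a linkable `CNCL::CNCL` target or fails with a clear diagnostic instead of an opaque linker error.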
* test: support building the einnet unit tests, though not all of them pass yet
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* Fix: locating resource files and skip codegen
- Change the path parameters in `matchExprResult` and `checkExprLogSame` to paths relative to the project home
- Skip NNetMemboundOp tests as they require codegen
---------
Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
* support kunlun xpu and add an operator named Add
* add sub, mul, div, pow, maximum, minimum
* add code
* add xpu code
* add code
* add matmul
* add transpose
* add unary operator
* add unary operator
* add some operator
* add code
* support run resnet18 on xpu
* add code
* add max pool2d
* fix xpu code so it can run
* add XPU operators (#120)
* add floordiv for xpu
* add batchnorm for xpu
* add more cast types for xpu
* add conv_trans for xpu
* add pad for xpu
* add logical ops for xpu
* fix format for xpu src and include
* fix format for xpu test
* fix format for xpu src
---------
Co-authored-by: Bolun <bolunz@u.nus.edu>
* Xpu abs (#121)
* add: unary kernel for xpu
* formatting
* format
* format
* format
* fix: pointer jump
* fix optype comments
* fix bug introduced while resolving conflict
* change cmake option for kunlunxin xpu from 'xpu' to 'kunlun' (see the CMake sketch after this block); fix bug after merging distributed infrastructure
* Add doc support for xpu (#141)
* fix
* fix
* fix pooling test
* format
* format
* fix
* fix
* set cmake version requirement
* fix cmakelists
* rename xpu to kunlun
* fix
* fix format
* fix format
* fix format
* fix: change name to kunlun
* format
* fix format
* clang format
* fix format
---------
Co-authored-by: root <root@localhost.localdomain>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>
Co-authored-by: Bolun <bolunz@u.nus.edu>
Co-authored-by: zhangyue207 <138768300+zhangyue207@users.noreply.github.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: baominghelly <41820386+baominghelly@users.noreply.github.com>
Co-authored-by: Bolun <chamberlain0w0@gmail.com>
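The 'xpu' → 'kunlun' rename above is a build-option change. A minimal sketch of the kind of CMake toggle involved (the option name, version floor, and compile definition are assumptions for illustration, not the repository's actual CMakeLists):

```cmake
# Hypothetical sketch; option name and version floor are assumptions.
cmake_minimum_required(VERSION 3.17)

option(USE_KUNLUN "Build with Kunlunxin XPU support" OFF)

if(USE_KUNLUN)
    # Expose a compile-time switch so sources can guard Kunlun code paths.
    add_compile_definitions(USE_KUNLUN)
endif()
```

Configuring with `cmake -DUSE_KUNLUN=ON ..` would then enable whatever code paths are guarded by that definition.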
- Added the necessary `Dockerfile` for running the project in CPU and CUDA environments.
- Added commands to the `Makefile` for convenient Docker startup.
- Added documentation in `docs/INSTALL_GUIDE_CN.md` explaining how to launch the Docker environment.
Co-authored-by: Xiaonan Song <songxiaonan@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
* impl sqrt on CUDA
fix parser of Gather and ReduceMean
* fix test_gather
* fix test_cuda_gather
* impl sqrt cpu and add sqrt to test_cuda_unary
* cuda_unary supports arbitrary shapes
* fix SplitOp with dim=-1
* fix SplitOp with dim=-1
* update doc of mlu
* delete README_CN.md, because the file was split into INSTALL_GUIDE_CN.md and USER_GUIDE_CN.md on 2023.06.23
* remove the build dependencies of test-cpp to avoid building twice
* fix code
---------
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
fix review
change Device::MKL to Device::INTELCPU
fix mkl linkage
fix errors after merging from master
now can call mkl backend
fix softmax/flatten with axis from onnx.
modify README.md
fix memory double-free
add env_lotus_intelcpu.sh
fix compile
merge from branch cpu_backend
fix something; add gather
fix something
FIX: directory rename from "mkl" to "intelcpu"
ADD: use oneMKL dpcpp interface to implement matmul kernel.
ADD: add dpcpp as compiler for mkl, and fix warnings when compiling with clang.
add dpcpp kernel for pow.
ADD: mkl kernel for pad.
ADD: slice mkl kernel.
ADD: reshape/flatten/identity mkl kernel.
ADD: split mkl kernel.
fix compile error
FIX: fix flattenObj with axis.
ADD: reduce_mean mkl kernel.
ADD: concat mkl kernel.
ADD: batchNorm mkl kernel.
ADD: sigmoid mkl kernel.
ADD: add mkl kernel for pooling
add more tests for softmax
Now the softmax cuda kernel supports arbitrary axes.
mkl kernel for softmax
softmax
add axis to softmax operator
add mkl kernel for abs and tanh
ADD: relu kernel for mkl
fix binary mkl primitives.
add mkl kernel for binary operators
fix compiler error
move stream to runtime
clang format
add MemoryFormat for tensorObj.
use post_ops for fused conv/deconv
Distinguish mkl op_timer from cuda op timer.
add act optype to conv and deconv
add operator timer
add mkl kernel for convTransposed
minor fix for group conv
do not use cblas_sgemm_batch
CpuRuntimeObj->NativeCpuRuntimeObj
add matmul op for mkl