Commit Graph

13 Commits

Author SHA1 Message Date
zhangyunze 67b2bcb7d5
fix mlu some kernel registration & gather op (#210)
* fix: fix bang build/kernel registration | test_onnx

* delete assert float

* fix gather

* fix CMakeLists and Reshape

* fix cncl ops

* add hardsigmoid/hardswish

* fix

* add invalid datatype exception

* fix gather

* fix gather indices type

* fix gather/prelu/hardsigmoid on mlu

* fix format

* fix

---------

Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: Zhang Bolun <Chamberlain0w0@gmail.com>
2024-02-01 15:02:02 +08:00
Hardy 42032356fb
Bang cncl (#163)
* MLU CNCL base

* add FindCNCL.cmake, not find -lcncl

* bangPrintFloat not find

* docker:make sucessful, test error

* delete net file and onnxtest.py

* init

* fix cncl

* format

* fix

* format

* fix cncl

* run dist gpt2 on mlu

* format

* fix import error on mlu docker

* run llama single card

* run distributed llama2

* add test for slice/reduce on mlu

* fix cncl related test

* fix format

* format

* delete comments

* change GPU to MLU

* MLU CNCL base

* add FindCNCL.cmake, not find -lcncl

* bangPrintFloat not find

* docker:make sucessful, test error

* delete net file and onnxtest.py

* init

* fix cncl

* format

* fix

* format

* fix cncl

* run dist gpt2 on mlu

* format

* fix import error on mlu docker

* run llama single card

* run distributed llama2

* add test for slice/reduce on mlu

* fix cncl related test

* fix format

* format

* delete comments

* change GPU to MLU

* modify launch script

* fix name

* fix format

* fix gather

* format python script

---------

Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: Bolun <chamberlain0w0@gmail.com>
Co-authored-by: Bolun Zhang <48948016+Chamberlain0w0@users.noreply.github.com>
2024-01-03 13:28:03 +08:00
Hardy 5ac0ab442f
Fix bang (#198)
* fix bang batchnorm

* fix pooling test bang

* add test batchnorm

* HIGH PRECISION ACTIVATION

* fix pooling

* fix matmul

* fix test

* add layernorm

* fix softmax

* fix

* better code

* fix

* fix worlflow

* fix workflow

* fix

* fix

* fxi matmul

* add LRN

* fix lrn

* fix lrn

---------

Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: Baoming Li <1508269885@qq.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-12-28 13:44:10 +08:00
Haojie Wang 7a9fcd93b2
Pooling ceil mode (#155)
* add ceil mode for pooling

* do not print debug info for allocator by default

* fix test bugs after introducing pooling ceil mode

* fix onnx import bug
2023-10-09 20:51:39 +08:00
zhangyunze 1438f14a25
fix: fix castType mlu (#117) 2023-08-22 14:54:32 +08:00
kilinchange 0dc5347089
memory_allocator (#103)
* - add LazyAllocator class
- calculate memory consumption at present

* - basic function of lazy_allocator, remaining test

* - modify LazyAllocator

* - modify InfiniTensor to fit LazyAllocator

* - add setDataBlob
- modify alignment
- fix GraphObj::dataMalloc

* - modified alignment value(64bytes -> 8bytes)
- fix LazyAllocator::getPtr()
- some dubug codes and commonts
- do alignment by chaning size instead of tailAddr

* - fix some problem

* - translate chinese comments to english

* - format codes

* - fix test

* - code format

* - modify codes as YdrMaser and bitzyz suggested

* - code format

* - modify codes as constroy suggested

* - codes format

* - modify alignment on cuda

* - code format

* - add test_lazy_allocator
- fix tests where not add input tensor into graph.tensors
- fix tests where init tensor's data before calling graph->dataMallocate()

* - code format

* - remove gpu runtime in test_lazy_allocator

* - fix test_lazy_allocator: remove cuda include

* - add test

* - code format

* - add ifdef for test of allocator

* - code format

* - fix test: remove unused ifdef

* - fix bang test

* - code format

* Merge branch 'master' into dcj/memory_allocator

* fix: fix cuda conv_fp16 run fail

* fix bang_runtime.cc and cuda_runtime.cc

* - update mkl code

* - fix codes for mkl

* - code format

* - remove unused commented codes
- add an empty line at the end of the blob.cc

---------

Co-authored-by: zhangyunze <z13785159769@163.com>
2023-08-13 13:39:35 +08:00
Derui Yang 57ac94d893
refactor(core): 添加新的 `OpType` 定义 (#99)
* feat: 添加新的 OpType 定义

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* refactor: 使用新的 OpType 替换原来的,修改整个项目

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: onnx 导入

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: 修正 cuda 和 bang kernel 的问题

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: 过滤 bang test

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: 过滤 bang test

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix bang code.

* fix code on bang

* fmt

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: 删除指定文件

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: 删两个没用的文件,去掉一个不知道为什么的注释

Signed-off-by: YdrMaster <ydrml@hotmail.com>

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
2023-08-07 11:17:05 +08:00
YdrMaster 26f0d13c26
Dev for 202303ddl (#66)
* add activation operatiopn relu, tanh, sigmoid on mlu

* commit for format

* add activation backward operation

* add test for activation_backward

* add test

* add convbpfilter

* fix

* add transpsoe code and test

* add trigon function operation on mlu: sin,cos,tan,asin,sinh,asinh

* add copy operation on mlu

* add ceil operation and floor operation

* add operation clip

* add operation cnnl div, test and test for divdemo bangc kernel

* add divnonan operation and test

* add erf operation

* add exp operation

* add operation fill

* add log operation

* add log1p operation

* add l2loss operation

* add maximum and minimum operation

* add mseloss operation

* add negTensor operation

* add power operation

* add reciprocal operation

* add sqrt and rsqrt operation

* add transform operation

* add addn operation

* add muln operation

* cherrry pick some operation

* add floordiv operation and floordivtrunc operation

* add floormod operation

* add cumsum operation

* add det operation

* add pad operation

* format

* add concat operation

* format

* add split operation

* fix concat and split operation

* add round operation

* add pooling operation

* add square operation

* add squaredDifference operation

* code format fix

* add flip operation

* code format fix

* add hardtanh operation

* add logic operation

* add addcdiv and addcmul operation

* add arange operation

* add bitcompute operation

* add net test

* fmt

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* style: rename

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: 用 NativeCpuRuntime 替换 CpuRuntime

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix code

* fix code

* fix code by review suggestion

* remove operation which is not the onnx operation

* fix format

* clang format

* refactor: tensor 的 print 加一层模板的 dataToString

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: onnx 导出

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 增加计算图优化接口

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* add clip operation

* feat: 支持导入 clip

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* test: 导入导出测试加入 ci

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix batch norm

* feat: 增加 Shape 算子

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 支持导入 unsqueeze

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: 修正 clip 接口

feat: 支持导入 transpose
Signed-off-by: YdrMaster <ydrml@hotmail.com>

* add broadcast operation

* fix elementwise-broadcast

* fix elementwise broadcast

* add broadcast for gpu elementsie

* feat: pad 支持 axes 负数

feat: 不支持的 padding 导出为独立的 pad 算子

feat: 支持导入 onnxsim 过的 inception
Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: 修正池化的测试

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 导出 pads,支持 inception 导入导出,已加入 ci

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 支持 densenet 导入导出,并加入 ci

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 导入 squeeze

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix softmax

* feat: 导出 clip 和 transpose

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 支持 Conv 的 bias

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: bias of conv

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: bias of conv

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 导入 split

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 导出 split

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: conv

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: conv group

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: matmul 的 bias 没有放在输入里,修正

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix exmaple

* fix: 改正 reduce_mean 导出

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* refactor: 修改 slice 实现与 onnx 一致

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* style: 不导出两个 runtime 函数

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* doc: 中文使用指南

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* doc: 补全指南

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: 修复导入数据的问题

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fmt

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 添加 Dropout 基本结构,但不支持两个输出是不同的类型

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 重新导出优化接口

feat: dropout 导入
Signed-off-by: YdrMaster <ydrml@hotmail.com>

* build: BANG 选项加入 Makefile

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fxi code, change of test/kernels/bang/test* is use NativeCpuRuntime.
chaneg of include/bang/bang_runtime is for the cntoolkit upgrade.

* feat: 导出 bang runtime

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* add USE_BANG=1

* fix matmul

* fix reshape

* fix

* fix activation

* fix transpose

* format

* format

* update Makefile

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 支持导入导出 ConvTranspose

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* add prelu on mlu

* fix: ConvTranspose

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* feat: 支持导入导出 PRelu

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* add convtrans on mlu

* fmt

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* docs: 更新 README_CN.md

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix code by review suggestions

* style

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: Softmax 的 axis 可以用默认值?感觉是 onnx 不标准

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix cuda & intelcpu bugs after merging

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: wanghailu <wanghailu0717@163.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: whjthu <haojie0429@gmail.com>
2023-04-18 15:10:33 +08:00
Hardy fe1afe38fa
fix code of bang conv (#76)
* fix code of bang conv

* test: 向 master push 时也执行 ci

Signed-off-by: YdrMaster <ydrml@hotmail.com>

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
2023-03-29 15:47:32 +08:00
Hardy 823e66a9ff
Support perf bang 1115 (#57)
* support matmul

* add matmul

* add matmul

* add code for cnnl matmul operation and test

* add conv

* add code for conv test on mlu

* add code for test cnnl conv on mlu

* add code for perf conv and matmul on mlu

* clang format

* fix convolution operation

* fxi cmaklist

* code format

* fix code

* code format

---------

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: wanghailu <wanghailu0717@163.com>
2023-03-29 13:52:56 +08:00
wendy12022 86ec4036ce
ADD: add mkl runtime for intel cpu , and add mkl kernel for matmul/conv/convtransposed. (#61)
* move memory format transformation to TensorObj

clang format

add MemoryFormat for tensorObj.

use post_ops for fused conv/deconv

Distinguish mkl  op_timer from cuda op timer.

add act optype to conv and deconv

add operator timer

add mkl kernel for convTransposed

minor fix for group conv

do not use cblas_sgemm_batch

CpuRuntimeObj->NativeCpuRuntimeObj

add  matmul op for mkl

* fix: fix bugs when rebasing from master

fix: fix bugs when rebasing from master

* fix: update api after rebasing

* fix: fix format; fix onnx import

* fix: fix clang-format

* [fix] fix conv_transpose test

* [fix] use stronger test case for transposed conv

* [fix] remove tensor memory format; fix mkl transpose conv

* [fix] add FIXME tag for op_timer python api

---------

Co-authored-by: whjthu <haojie0429@gmail.com>
2023-03-27 21:28:49 +08:00
Hardy b0c2a08252
Support bang c kernel wanghailu 0927 (#43)
* fix a little bug which found by new verison CMake

* add code for support BangC language kernel , just like Cuda kernel, not
library

* add bangc kernel

* support BangC kernel

* add code for support BangC kernel

* support bangc kernel

* fix some code from reviewer

* fix code of template fumction

* add code for support bangc kernel

* fix bangc format

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-30 11:01:52 +08:00
Hardy c7c974f07a
Add bangc runtime and element-wise kernels
* add code for cambricon mlu, bang, cnnl

* add code for support cambricon mlu,cnnl,cnrt

* add code for support mlu

* add code for support cambricon cnnl

* add code for support mlu

* add code for mlu

* add code for mlu
`

* Update CMakeLists.txt

Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: zhengly123 <zhengly123@outlook.com>
2022-09-22 16:57:39 +08:00