Commit Graph

8 Commits

Author SHA1 Message Date
Chenjie Duan 51086d2b8d
Modify kernel registration & support fp16 (#205)
* - Remove dataType from the kernel registration.

* - support fp16 for conv

* - cpu kernel: adapt the new registration mechanism

* modified all kernel registrations

* add where fp16

* add layernorm fp16

* add split_concat fp16

* - element_wise support fp16

* feat: support transpose fp16

* feat: support sliceOp fp16

* - unary support fp16

* - feat: support reduceOp fp16

* feat: support matmulOp/expandOp fp16

* feat: support powOp int8

* add cuda cast & support half-precision for gather

* style: fix style

* feat: support int8 for gather

* style: fix style

* modified test_cuda_conv_transposed

* fix: fix dist code to support fp16

* fix(graph.cc): fix topo_sort

* fix: fix recv and send kernel registration

* feat: add field tensors for stub

* refactor(frontend): sort first, then build the graph

Signed-off-by: YdrMaster <ydrml@hotmail.com>

* fix: provide tensor-to-node mapping for intermediate results

* fix (slice): add guard for area out of range

* fix: fix matmul fp16

* fix: fix re-dataMalloc for weight tensor and use of naive allocator

* feat: add dataType filter for cuda kernel

* feat: bang kernel adapt the new registration mechanism

* fix: fix some errors on mlu

* feat: intelcpu kernel adapt the new registration mechanism

* feat: modify kernel registration on kunlun

* fix intelcpu compiler bug

* feat: bang reshape support all dataType

* fix: fix bang reduce

* fix(all_reduce.cc): fix as reviewer suggested

* fix: fix style and restore unary test codes

---------

Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: zhangyunze <z13785159769@163.com>
Co-authored-by: OdinaryWord <sx-hz@163.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
2024-01-15 11:02:13 +08:00
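To make the registration change in #205 above concrete: the data type is dropped from the registration key, so one kernel is registered per (device, op) and the data type is checked or dispatched at lookup/run time instead. A minimal sketch of that idea; all names here (`Device`, `registerKernel`, `dispatch`) are hypothetical and not the repository's actual `Kernel`/`KernelRegistry` API:

```cpp
#include <cstdio>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical types; the real project uses its own Kernel/KernelRegistry classes.
enum class Device { Cpu, Cuda };
enum class DataType { F32, F16, I8 };

using Kernel = std::function<void(DataType)>;
using Key = std::pair<Device, std::string>; // no DataType in the key anymore

static std::map<Key, Kernel> registry;

void registerKernel(Device dev, const std::string &op, Kernel k) {
    registry[{dev, op}] = k;
}

void dispatch(Device dev, const std::string &op, DataType dt) {
    auto it = registry.find({dev, op});
    if (it == registry.end()) throw std::runtime_error("no kernel for op");
    it->second(dt); // the kernel branches on dtype internally
}

int main() {
    registerKernel(Device::Cuda, "Conv", [](DataType dt) {
        // A real kernel would pick an fp16 or fp32 code path here.
        std::printf("conv kernel, dtype=%d\n", static_cast<int>(dt));
    });
    dispatch(Device::Cuda, "Conv", DataType::F16);
}
```

The "add dataType filter for cuda kernel" commit corresponds to the runtime check a real dispatcher would perform before invoking the kernel.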
zhangyunze 3967b437c8
fix Issue 187: wrong infershape for split (#197)
* fix: fix splitOp to support unequal portions

* fix: fix as review comment

---------

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-12-28 21:39:24 +08:00
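For context on the fix in #197 above: when Split is given explicit, possibly unequal section sizes, each output keeps the input shape except along the split axis, and the sections must sum to the axis extent. A small illustrative sketch with a hypothetical `inferSplit` helper, not the repository's actual SplitObj code:

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

using Shape = std::vector<int64_t>;

// Hypothetical helper: infer output shapes of Split when explicit,
// possibly unequal, section sizes are given along `axis`.
std::vector<Shape> inferSplit(const Shape &in, int axis,
                              const std::vector<int64_t> &sections) {
    assert(std::accumulate(sections.begin(), sections.end(), int64_t{0}) ==
           in[axis]); // sections must cover the whole axis
    std::vector<Shape> outs;
    for (int64_t s : sections) {
        Shape o = in;
        o[axis] = s; // only the split axis changes; other dims are copied
        outs.push_back(o);
    }
    return outs;
}

int main() {
    // e.g. splitting a [2, 10, 4] tensor into sections 3 + 7 along axis 1
    auto outs = inferSplit({2, 10, 4}, 1, {3, 7});
    assert(outs[0][1] == 3 && outs[1][1] == 7);
}
```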
xgqdut2016 ec3adf6fa7
support 8D tensor, add test example (#170)
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-10-31 10:47:36 +08:00
PanZezhong1725 62be816f53
Fix wrong split/concat results when dim=0 (#138)
Fix split_concat kernel not supporting dim=0

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2023-09-25 10:25:54 +08:00
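Roughly what the dim=0 case in #138 above means for the kernel's index math (hypothetical helpers, not the actual split_concat kernel): the destination offset combines an outer index over the dims before the concat dim, a position along that dim, and an inner index; when dim = 0 there are no outer dims, so the outer extent degenerates to 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Product of shape[from..to), used for the outer/inner extents.
int64_t product(const std::vector<int64_t> &shape, int from, int to) {
    int64_t p = 1;
    for (int i = from; i < to; ++i) p *= shape[i];
    return p;
}

// Destination offset of an element in the concatenated output.
int64_t destOffset(const std::vector<int64_t> &outShape, int dim,
                   int64_t outer, int64_t posInDim, int64_t inner) {
    int64_t innerSize = product(outShape, dim + 1, (int)outShape.size());
    return (outer * outShape[dim] + posInDim) * innerSize + inner;
}

int main() {
    // Concatenating two [2,3] inputs along dim 0 gives a [4,3] output.
    std::vector<int64_t> out{4, 3};
    assert(product(out, 0, 0) == 1); // dim = 0: outer extent is 1
    // Element (1,2) of the second input starts at dim offset 2, so it lands
    // at row 3, column 2 of the output, i.e. flat offset 11.
    assert(destOffset(out, 0, /*outer=*/0, /*posInDim=*/2 + 1, /*inner=*/2) == 11);
}
```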
kilinchange 0dc5347089
memory_allocator (#103)
* - add LazyAllocator class
- calculate memory consumption at present

* - basic functionality of lazy_allocator, tests remaining

* - modify LazyAllocator

* - modify InfiniTensor to fit LazyAllocator

* - add setDataBlob
- modify alignment
- fix GraphObj::dataMalloc

* - modified alignment value (64 bytes -> 8 bytes)
- fix LazyAllocator::getPtr()
- some debug codes and comments
- do alignment by changing size instead of tailAddr

* - fix some problems

* - translate Chinese comments to English

* - format codes

* - fix test

* - code format

* - modify codes as YdrMaster and bitzyz suggested

* - code format

* - modify codes as constroy suggested

* - codes format

* - modify alignment on cuda

* - code format

* - add test_lazy_allocator
- fix tests that did not add the input tensor into graph.tensors
- fix tests that initialized tensor data before calling graph->dataMallocate()

* - code format

* - remove gpu runtime in test_lazy_allocator

* - fix test_lazy_allocator: remove cuda include

* - add test

* - code format

* - add ifdef for test of allocator

* - code format

* - fix test: remove unused ifdef

* - fix bang test

* - code format

* Merge branch 'master' into dcj/memory_allocator

* fix: fix cuda conv_fp16 run fail

* fix bang_runtime.cc and cuda_runtime.cc

* - update mkl code

* - fix codes for mkl

* - code format

* - remove unused commented-out code
- add an empty line at the end of blob.cc

---------

Co-authored-by: zhangyunze <z13785159769@163.com>
2023-08-13 13:39:35 +08:00
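A compressed sketch of the lazy-allocation idea in #103 above, under the assumption that allocation during graph build only accumulates aligned offsets and the single real device allocation is deferred. `ToyLazyAllocator` is hypothetical; the real LazyAllocator additionally tracks frees and reuses blocks.

```cpp
#include <cassert>
#include <cstddef>

// Minimal sketch: graph construction only computes offsets and a running
// peak; one real allocation of `peak` bytes happens later, and getPtr()
// would resolve offsets into that buffer.
class ToyLazyAllocator {
    static constexpr size_t kAlign = 8; // commit above: 64 bytes -> 8 bytes
    size_t used = 0, peak = 0;

public:
    // Align by rounding the *size* up, rather than adjusting a tail pointer.
    static size_t alignUp(size_t bytes) {
        return (bytes + kAlign - 1) / kAlign * kAlign;
    }

    size_t alloc(size_t bytes) {        // returns an offset, not a pointer
        size_t offset = used;
        used += alignUp(bytes);
        if (used > peak) peak = used;
        return offset;
    }

    size_t peakBytes() const { return peak; } // size of the one real malloc
};

int main() {
    ToyLazyAllocator a;
    assert(a.alloc(5) == 0);   // 5 bytes occupy an aligned 8-byte slot
    assert(a.alloc(16) == 8);  // next offset starts at the aligned boundary
    assert(a.peakBytes() == 24);
}
```

Rounding the requested size up to the alignment, rather than moving a tail address, is what the "do alignment by changing size instead of tailAddr" commit refers to.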
wendy12022 86ec4036ce
ADD: add mkl runtime for Intel CPU, and add mkl kernels for matmul/conv/convtransposed. (#61)
* move memory format transformation to TensorObj

clang format

add MemoryFormat for tensorObj.

use post_ops for fused conv/deconv

Distinguish mkl op_timer from cuda op timer.

add act optype to conv and deconv

add operator timer

add mkl kernel for convTransposed

minor fix for group conv

do not use cblas_sgemm_batch

CpuRuntimeObj->NativeCpuRuntimeObj

add matmul op for mkl

* fix: fix bugs when rebasing from master

* fix: update api after rebasing

* fix: fix format; fix onnx import

* fix: fix clang-format

* [fix] fix conv_transpose test

* [fix] use stronger test case for transposed conv

* [fix] remove tensor memory format; fix mkl transpose conv

* [fix] add FIXME tag for op_timer python api

---------

Co-authored-by: whjthu <haojie0429@gmail.com>
2023-03-27 21:28:49 +08:00
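The mkl kernels in #61 above are built on DNNL/oneDNN primitives. Below is a minimal standalone matmul in that style, assuming the oneDNN 3.x C++ API (the older 2.x releases construct a `matmul::desc` first); it is not the repository's kernel code, just the shape of a primitive-based call:

```cpp
#include <dnnl.hpp>
#include <vector>
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    const memory::dim M = 2, K = 3, N = 4;
    std::vector<float> A(M * K, 1.f), B(K * N, 1.f), C(M * N, 0.f);

    // Plain row-major 2D descriptors for A [M,K], B [K,N], C [M,N].
    memory::desc a_md({M, K}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc b_md({K, N}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc c_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

    memory a_mem(a_md, eng, A.data());
    memory b_mem(b_md, eng, B.data());
    memory c_mem(c_md, eng, C.data());

    // oneDNN 3.x style: primitive_desc built directly from memory descs.
    matmul::primitive_desc pd(eng, a_md, b_md, c_md);
    matmul(pd).execute(strm, {{DNNL_ARG_SRC, a_mem},
                              {DNNL_ARG_WEIGHTS, b_mem},
                              {DNNL_ARG_DST, c_mem}});
    strm.wait(); // C now holds A * B
}
```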
wendy12022 a4d6426589
ADD: batch norm operator and cuda kernel. (#44)
fix numInputs of batchNorm, add newline at end of file.

ADD: batch norm operator and cuda kernel.

add training

remove comments.

fix compile error.

add batch norm operator and cuda kernel.
2022-10-15 16:29:28 +08:00
wendy12022 3c6e208f42
ADD:concat/split operator and cuda kernels (#29)
* ADD:concat/split operator and cuda kernels

refactor

minor change comment

ADD:concat/split operator and cuda kernels

merge split_kernel and concat_kernel into split_concat_kernel.

Revert "fix"

This reverts commit 459926be09a838658ec55f1e0a72b3cf17037d5c.

fix

ADD:concat/split operator and cuda kernels

change whole tensor name to composed tensor

fix some

remove unused header.

rebase

add CudaKernel

add test for split.

ADD split operator and cuda kernel.

modify test.

ADD:concat operator and cuda kernel.

* remove extra comment; typo fix.

Co-authored-by: Haojie Wang <haojie0429@gmail.com>
2022-09-29 11:01:30 +08:00
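One way to read "merge split_kernel and concat_kernel into split_concat_kernel" and the "composed tensor" naming above: both ops share the same mapping between a flat element index of the composed (concatenated) tensor and a (part, local offset) pair, and only the copy direction differs. A hypothetical host-side sketch of that mapping, not the actual CUDA kernel:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Given a flat element index of the composed tensor, find which part it
// belongs to and the element's index within that part. Concat copies
// part -> composed, split copies composed -> part, using the same mapping.
struct PartIndex { int part; int64_t local; };

PartIndex locate(int64_t flat, const std::vector<int64_t> &dimSizes,
                 int64_t innerSize, int64_t composedDim) {
    int64_t outer = flat / (composedDim * innerSize);
    int64_t rem   = flat % (composedDim * innerSize);
    int64_t d     = rem / innerSize;       // position along the concat dim
    int64_t inner = rem % innerSize;
    int part = 0;
    while (d >= dimSizes[part]) d -= dimSizes[part++];
    return {part, (outer * dimSizes[part] + d) * innerSize + inner};
}

int main() {
    // Two parts of sizes 2 and 3 along the concat dim; composedDim = 5.
    // With innerSize = 4, flat index 30 is outer 1, dim 2, inner 2,
    // which falls into part 1 at dim position 0.
    auto p = locate(30, {2, 3}, /*innerSize=*/4, /*composedDim=*/5);
    assert(p.part == 1 && p.local == (1 * 3 + 0) * 4 + 2);
}
```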