Chenjie Duan
51086d2b8d
Modify kernel registration & support fp16 ( #205 )
* - Remove dataType from the kernel registration.
* - support fp16 for conv
* - cpu kernel: adapt the new registration mechanism
* modify all kernel registrations
* add where fp16
* add layernorm fp16
* add split_concat fp16
* - element_wise support fp16
* feat: support transpose fp16
* feat: support sliceOp fp16
* - unary support fp16
* - feat: support reduceOp fp16
* feat: support matmulOp/expandOp fp16
* feat: support powOp int8
* add cuda cast & support half-precision for gather
* style: fix style
* feat: support int8 for gather
* style: fix style
* modify test_cuda_conv_transposed
* fix: fix dist code to support fp16
* fix(graph.cc): fix topo_sort
* fix: fix recv and send kernel registration
* feat: add field tensors for stub
* refactor(frontend): sort first, then build the graph
Signed-off-by: YdrMaster <ydrml@hotmail.com>
* fix: provide a tensor-to-node mapping for intermediate results
* fix (slice): add guard for area out of range
* fix: fix matmul fp16
* fix: fix re-dataMalloc for weight tensor and use of naive allocator
* feat: add dataType filter for cuda kernel
* feat: bang kernel adapt the new registration mechanism
* fix: fix some errors on MLU
* feat: intelcpu kernel adapt the new registration mechanism
* feat: modify kernel registration on kunlun
* fix intelcpu compiler bug
* feat: bang reshape support all dataType
* fix: fix bang reduce
* fix(all_reduce.cc): fix as reviewer suggested
* fix: fix style and restore unary test codes
---------
Signed-off-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: zhangyunze <z13785159769@163.com>
Co-authored-by: OdinaryWord <sx-hz@163.com>
Co-authored-by: YdrMaster <ydrml@hotmail.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
2024-01-15 11:02:13 +08:00
zhangyunze
9b10a74788
Support fp16 dtype ( #96 )
* add conv_half kernel
* Conv Kernel FP16
* dcj:
replace "DataType::Float32" with "op->getDType()" to support more DataType
* feat: support Float16 dtype
* fix: set default clang-format version to 14
* fix: revise according to review feedback
* fix: add data conversion to convfp16 kernel test
* test: add conv_fp16 kernel test
---------
Co-authored-by: zhangyue207 <zhangyue@qiyuanlab.com>
Co-authored-by: kilinchange <kilinchange@163.com>
2023-08-02 16:38:16 +08:00
zhengly123
a1974aabcd
NNET supports TVM backend and kernels ( #78 )
* Add: mutator InfoGAN minimum test
* Add: cache and padding (bugs!!)
* Add: expression reader as a cmake target
* Fix: [Intermediate] NMutator::expressionToGraph
To be fixed: matmul with implicit broadcast
* Add: matmul broadcast
* Fix: GraphObj ctor should use cloneTensor
* Fix: cuBLAS failure when codegen is enabled
* Add: Exception for checkCuError
* Fix: graph OpList ctor
* Add: expr simplification for TVM
* Add: TVM headers and CMake include paths
* Add: CMake config
* Add: PackedFunc (broken)
* Fix: remove cuCtxCreate, which makes TVM fail
* Fix: membound_tvm
* Fix: test_memboundOp
* Add: PRelu Expr and AsTVMVisitor
* Add: Random generator
* Add: support TVM packed function
* Fix: specify runtime
* Add: CMake support of TVM
* Add: detailed output of Matmul
* Add: comments for Matmul
* Chore: format and comments
* Chore: GraphObj::selfCheck without assert control
* Fix: CMAKE_CXX_FLAGS in CMakeLists
* fix merge bug
* update api for mkl batchnorm test
* fix lotus env
* fix header bug
---------
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
Co-authored-by: huangshuhong <huangsh19@mails.tsinghua.edu.cn>
Co-authored-by: whjthu <haojie0429@gmail.com>
2023-04-18 00:26:36 +08:00
wendy12022
a4d6426589
ADD: batch norm operator and cuda kernel. ( #44 )
fix numInputs of batchNorm, add newline at end of file.
ADD: batch norm operator and cuda kernel.
add training
remove comments.
fix compile error.
add batch norm operator and cuda kernel.
2022-10-15 16:29:28 +08:00
zhengly123
1aefc1b27e
Add python interface for CUDA operator evaluation ( #42 )
* Refactor: separate data generator
* Add: python bindings for opTimer
* Fix: test_perfengine
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>
2022-09-27 10:41:12 +08:00