* Add softmax.cu, .cc, .h
* Modify CUDA softmax
* Modify the introductory comment of softmax.cu
* Format cuda_softmax.h
* Modify where.cc (.cu, .h) and softmax.cu
* Fix formatting
* Fix cpu softmax kernel
* Modify the // introductory comment of softmax.cu
* Modify softmax.cu to use a 1D block
* Modify softmax.cu and its formatting; use a 1D block
* Introduce shared memory to speed up softmax
* Reduce the inputs of the function
* Fix formatting
* Rework the 2D-block softmax
* Rework the 1D-block softmax
* Modify the shared memory usage
* add warp reduce
* Resolve conflicts (second pass)
* Remove extra blank line
* Address review comments
---------
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
* feat: support sqrt op
* feat: support erf op
* feat: support expand op
* feat: support where op
* fix: gather op index can be int64_t (hard-coded)
* fix: correct some wrong usages
* style: fix the format style
* test: add tests for changed ops
* fix: rebase onto master
* fix: fix wrong computation of matmul B
* Add expand and where kernels
* Add int64 support for cuda gather kernel
* Add test_where.cc
* Add expand (.cu/.cc, test, CUDA); modify where.cu
* Separate initialization of datatypes to avoid compile error
* Modify where (.cu/.cc/.h, test), expand and clip
* Format fix
* Format fix
---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>