InfiniTensor/include/cuda/gather.h

#pragma once
#include "core/data_type.h"
#include "core/operator.h"
#include "operators/gather.h"
#include <cstring> // memset

namespace infini {
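// Metadata passed by value to the gather kernels declared below.
// The fixed-size arrays limit every tensor involved to rank 4 or lower.
struct GatherMetaData {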
    // Pointer to indices
    void *indexValue;
    // Type of index values
    DataType indexType;
    // Type of input and output data
    DataType dataType;
    // Axis of the gather operation
    int axis;
    // Rank of input
    int inNDim;
    // Rank of output
    int outNDim;
    // Rank of indices
    int idxNDim;
    // Shape of output
    int outDim[4];
    // Shape of indices
    int idxDim[4];
    // Strides of indices
    int idxStride[4];
    // Strides of input
    int inStride[4];
};

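// Fills metaData from a GatherBaseObj operator: input 0 is the data tensor,
// input 1 is the index tensor. A rank above 4 would overflow the fixed-size
// arrays in GatherMetaData.
inline void initGatherMetaData(GatherMetaData &metaData,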
                               const Ref<OperatorObj> &_op) {
    memset(&metaData, 0, sizeof(metaData));
    auto op = as<GatherBaseObj>(_op);
    Ref<TensorObj> in = op->getInputs(0);
    Ref<TensorObj> index = op->getInputs(1);
    Ref<TensorObj> out = op->getOutput();
    metaData.indexValue = index->getRawDataPtr<void *>();
    metaData.indexType = index->getDType();
    metaData.dataType = in->getDType();
    metaData.axis = op->getAxis();
    metaData.inNDim = in->getRank();
    metaData.outNDim = out->getRank();
    metaData.idxNDim = index->getRank();
    for (int i = 0; i < metaData.outNDim; ++i)
        metaData.outDim[i] = out->getDims()[i];
    for (int i = 0; i < metaData.idxNDim; ++i) {
        metaData.idxDim[i] = index->getDims()[i];
        metaData.idxStride[i] = index->getStride()[i];
    }
    for (int i = 0; i < metaData.inNDim; ++i) {
        metaData.inStride[i] = in->getStride()[i];
    }
}

template <typename T>
void gather_kernel(T *in, T *out, GatherMetaData metaData, size_t num);

void gather_elements_kernel(void *in, void *out, GatherMetaData metaData,
                            size_t num);
} // namespace infini
Commit history (from the blame view):

2022-09-29 14:44:20 +08:00: ADD: Gather operator and cuda kernel. (#41) fix a memory leak. add tests. ADD gather cuda kernel. ADD gather operator Co-authored-by: Haojie Wang <haojie0429@gmail.com>

2023-08-29 16:06:52 +08:00: Framework support for building bert/gpt2 model graphs (#94) * feat: support to sqrt op * feat: support to erf op * feat: support to expand op * feat: support to where op * fix: gather op index can be int64_t(hard coding) * fix: some wrong use * style: fix the format style * test: add test for change op * fix: rebase to master * fix: fix matmul b compute wrong * add expand and where kernel * Add int64 support for cuda gather kernel * add test_where.cc * add "expand.(cu/cc,test,cuda),modified where.cu" * Separate initialization of datatypes to avoid compile error * modify where.(cu/cc/h,test), expand and clip * Format fix * Format fix --------- Co-authored-by: xgqdut2016 <kenan_gewei@163.com> Co-authored-by: panzezhong <panzezhong@qiyuanlab.com> Co-authored-by: Haojie Wang <haojie0429@gmail.com>

2023-10-12 09:18:12 +08:00: Add GatherElements op and cuda kernel (#149) * Add GatherElements op and cuda kernel * fix format * remove print * remove unused var * fix spacing * fix format --------- Co-authored-by: panzezhong@qiyuanlab.com <panzezhong@zezhongpan> Co-authored-by: Haojie Wang <haojie0429@gmail.com>

2024-01-15 11:02:13 +08:00: Modify kernel registration & support fp16 (#205) * - Remove dataType from the kernel registration. * - support fp16 for conv * - cpu kernel: adapt the new registration mechanism * modified all register kernel * add where fp16 * add layernorm fp16 * add split_concat fp16 * - element_wise support fp16 * feat: support transpose fp16 * feat: support sliceOp fp16 * - unary support fp16 * - feat: support reduceOp fp16 * feat: support matmulOp/expandOp fp16 * feat: support powOp int8 * add cuda cast & support half-precision for gather * style: fix style * feat:support int8 for gather * style:fix style * modified test_cuda_conv_transposed * fix: fix dist code to support fp16 * fix(graph.cc): fix topo_sort * fix: fix recv and send kernel registration * feat: add field tensors for stub * refactor(frontend): sort first, then build the graph Signed-off-by: YdrMaster <ydrml@hotmail.com> * fix: provide a tensor-to-node mapping for intermediate results * fix (slice): add guard for area out of range * fix: fix matmul fp16 * fix: fix re-dataMalloc for weight tensor and use of naive allocator * feat: add dataType filter for cuda kernel * feat: bang kernel adapt the new registration mechanism * fix: fix some error on mlu * feat: intelcpu kernel adapt the new registration mechanism * feat: modify kernel registration on kunlun * fix intelcpu compiler bug * feat: bang reshape support all dataType * fix: fix bang reduce * fix(all_reduce.cc): fix as reviewer suggessted * fix: fix style and restore unary test codes --------- Signed-off-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: xgqdut2016 <kenan_gewei@163.com> Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com> Co-authored-by: zhangyunze <z13785159769@163.com> Co-authored-by: OdinaryWord <sx-hz@163.com> Co-authored-by: YdrMaster <ydrml@hotmail.com> Co-authored-by: panzezhong <panzezhong@qiyuanlab.com>