// InfiniTensor/include/operators/GBMM.h

#pragma once
#include "core/operator.h"
#include <cassert>
namespace infini {
/**
 * @brief General band matrix multiplication. See
 * https://cscproxy.mpi-magdeburg.mpg.de/mpcsc/benner/pub/brdeq-cle2014.pdf for
 * details.
 *
 */
class GBMMObj : public OperatorObj {
  private:
    int dilation;
    ActType act;

    int b, m, w, n;

  public:
    /**
     * @brief Construct a new GBMM object.
     *
     * @param graph The computation graph that this operator belongs to.
     * @param A The first input tensor.
     * @param B The second input tensor.
     * @param C The output tensor of GBMM. To have the constructor create the
     * output, pass an empty Ref (nullptr).
     * @param dilation The dilation of the attention window.
     * @param bias The optional bias tensor (nullptr by default).
     * @param act The activation function (ActType::None by default).
     */
    GBMMObj(GraphObj *graph, Tensor A, Tensor B, Tensor C, const int dilation,
            Tensor bias = nullptr, ActType act = ActType::None);
    OP_CLONE(GBMMObj);

    std::string toString() const override;
    optional<vector<Shape>> inferShape(const TensorVec &inputs) override;

    int numInputs() const override { return 2; }
    int numOutputs() const override { return 1; }

    int getDilation() const { return dilation; }
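    // numInputs() is 2, so inputs[2] may not exist; return nullptr rather
    // than indexing past the end of inputs.
    Tensor getBias() const { return inputs.size() > 2 ? inputs[2] : nullptr; }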
    ActType getAct() const { return act; }

    int getB() const { return b; }
    int getM() const { return m; }
    int getW() const { return w; }
    int getN() const { return n; }
    auto getBMWND() const { return tuple{b, m, w, n, dilation}; }

  private:
    vector<int> getWorkloadVector() const override;
    vector<int> getOpAttrVector() const override;
};

} // namespace infini
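
// ---------------------------------------------------------------------------
// Illustrative sketch only; not part of the InfiniTensor API and not taken
// from its kernels. It spells out, as a naive CPU reference, the computation
// GBMMObj is assumed to describe. Every detail below is an assumption that
// this header does not confirm:
//   * A has shape [b, m, 2w + 1] and stores, per batch, the 2w + 1 band
//     entries of each row of an m x m (dilated) band matrix;
//   * B has shape [b, m, n], and the output C has shape [b, m, n];
//   * band slot j of row i multiplies row i + (j - w) * dilation of B, and
//     slots that fall outside [0, m) contribute zero.
// The namespace gbmm_sketch and both function names are hypothetical.
// ---------------------------------------------------------------------------
#include <cstddef>
#include <vector>

namespace gbmm_sketch {

// Derive (b, m, w, n) from the assumed shapes A = [b, m, 2w + 1] and
// B = [b, m, n], mirroring getB()/getM()/getW()/getN(). Returns false when
// the shapes are incompatible under those assumptions.
inline bool deriveBMWN(const std::vector<int> &aDims,
                       const std::vector<int> &bDims, int &b, int &m, int &w,
                       int &n) {
    if (aDims.size() != 3 || bDims.size() != 3)
        return false;
    if (aDims[0] != bDims[0] || aDims[1] != bDims[1])
        return false; // batch and row counts must agree
    if (aDims[2] % 2 == 0)
        return false; // the band width 2w + 1 must be odd
    b = aDims[0];
    m = aDims[1];
    w = (aDims[2] - 1) / 2;
    n = bDims[2];
    return true;
}

// Naive reference: C[bi][i][k] = sum over j in [0, 2w] of
//   A[bi][i][j] * B[bi][i + (j - w) * dilation][k], skipping rows outside B.
// Example: b = 1, m = 3, w = 1, n = 1, dilation = 1, A all ones and
// B = {1, 2, 3} sums each row's in-band neighbours, giving {3, 6, 5}.
inline std::vector<float> gbmmReferenceSketch(const std::vector<float> &A,
                                              const std::vector<float> &B,
                                              int b, int m, int w, int n,
                                              int dilation) {
    const int band = 2 * w + 1;
    // Row-major offset of element (bi, row, col) in a [b, m, cols] tensor.
    auto idx = [m](int bi, int row, int col, int cols) {
        return (static_cast<std::size_t>(bi) * m + row) * cols + col;
    };
    std::vector<float> C(static_cast<std::size_t>(b) * m * n, 0.0f);
    for (int bi = 0; bi < b; ++bi)
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < band; ++j) {
                const int row = i + (j - w) * dilation; // row of B for slot j
                if (row < 0 || row >= m)
                    continue; // outside the matrix: contributes zero
                const float a = A[idx(bi, i, j, band)];
                for (int k = 0; k < n; ++k)
                    C[idx(bi, i, k, n)] += a * B[idx(bi, row, k, n)];
            }
    return C;
}

} // namespace gbmm_sketch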