* feat: support dynamic tensor part1
* feat: support dynamic-tensor part2
* feat: support dynamic tensor part 3
* fix: fix some ..
* - add kvcache example
* feat: support concat to identity kernel
* add a simple mempory pool for allocator
* fix: rebase to master
* fix bug after merging
* - remove outdated script
* fix: fix as review
---------
Co-authored-by: kilinchange <kilinchange@163.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
* add ceil mode for pooling
* do not print debug info for allocator by default
* fix test bugs after introducing pooling ceil mode
* fix onnx import bug
fix numInputs of batchNorm, add new line in file ending.
ADD: batch norm operator and cuda kernel.
add training
remove comments.
fix compile error.
add batch norm operator and cuda kernel.