* fix a little bug which found by new verison CMake
* add code for support BangC language kernel , just like Cuda kernel, not
library
* add bangc kernel
* support BangC kernel
* add code for support BangC kernel
* support bangc kernel
* fix some code from reviewer
* fix code of template fumction
* add code for support bangc kernel
* fix bangc format
Co-authored-by: wanghailu <wanghailu@qiyuanlab.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>
* Fix: add warm-up and repetition in timing
* Add: CUDA runtime and float support
* Refactor: Cuda and Cpu runtimes inherit Runtime
* Add: environment script for Lotus
* Add: Lotus build instructions
* Update README.md
Co-authored-by: Liyan Zheng <liyan-zheng@outlook.com>