forked from jiuyuan/InfiniTensor
a7293c12ba
* add layernorm kernel
* success: add layernorm kernel and test
* fix: remove unusable comments
* fix: modify code as reviewer suggested
* debug, modified .cu and test
* optional bias support
* overloading function
* fix bug after merging; remove time constraint in conv test

Co-authored-by: kilinchange <kilinchange@163.com>
Co-authored-by: Haojie Wang <haojie0429@gmail.com>

||
---|---|---|
cuda_attention_kvcache.h | ||
cuda_clip.h | ||
cuda_common.h | ||
cuda_element_wise.h | ||
cuda_expand.h | ||
cuda_kernel_wihtout_config.h | ||
cuda_layernorm.h | ||
cuda_pad_slice.h | ||
cuda_runtime.h | ||
cuda_softmax.h | ||
cuda_split_concat.h | ||
cuda_transpose.h | ||
cuda_unary.h | ||
cuda_utility.h | ||
cuda_where.h | ||
gather.h | ||
gbmm_g2bmm.cuh | ||
gbmm_g2bmm.h | ||
nccl_communicator.h | ||
operator_timer.h | ||
resize.cuh |