Merge branch 'main' of https://osredm.com/p04798526/LLaMA-Factory-310P3

2024-09-10 15:43:29 +08:00 · 2024-09-10 15:43:29 +08:00 · ccbea71b65
parent 42d9773188 9bbf989502
commit ccbea71b65
193 changed files with 24234 additions and 0 deletions
--- a/mindie/examples/README.md
+++ b/mindie/examples/README.md
@ -0,0 +1,220 @@
+# README
+
+- This README describes the scripts shared by all models and how to use them
+
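+## Path variables
+| Variable | Meaning |
+|--------|--------------------------------------------------|
+| working_dir | Directory where the acceleration library and the model repository are placed after download |
+| llm_path | Path of the model repository. With the prebuilt package it is `${working_dir}/MindIE-LLM/`; with code downloaded from Gitee it is `${working_dir}/MindIE-LLM/examples/atb_models` |
+| weight_path | Path to the model weights |
+| w8a8s_weight_path | Path to the sparse-quantized weights |
+| w8a8sc_weight_path | Path to the sparse-quantized weights after sharding and compression |
+| cur_dir | Directory from which the command or script is run (the current directory) |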
+
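+## Weights
+
+### Weight configuration
+- Set `torch_dtype` and `quantize` in `${weight_path}/config.json` to declare the precision and quantization type of the weights
+  - Add these fields if they are missing; non-quantized weights (FP16, BF16) do not set `quantize`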
+
+- Settings
+  | Quantization type and precision | torch_dtype | quantize |
+  |----------------|-------------|----------|
+  | FP16           | "float16"   | not set  |
+  | BF16           | "bfloat16"  | not set  |
+  | W8A8           | "float16"   | "w8a8"   |
+  | W8A8S          | "float16"   | "w8a8s"  |
+  | W8A8SC         | "float16"   | "w8a8sc" |
+  | W8A16          | "float16"   | "w8a16"  |
+
+- Examples
+  - LLaMA weights in BF16 precision, not quantized
+    ```json
+    {
+      "architectures": [
+        "LlamaForCausalLM"
+      ],
+      ...
+      "torch_dtype": "bfloat16",
+      ...
+    }
+    ```
+  - LLaMA weights in FP16 precision with W8A16 quantization
+    ```json
+    {
+      "architectures": [
+        "LlamaForCausalLM"
+      ],
+      ...
+      "torch_dtype": "float16",
+      ...
+      "quantize": "w8a16"
+    }
+    ```
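+
+- A minimal sketch of applying the table above to an existing `config.json` with plain Python. `PRESETS` mirrors the table. The helper `set_weight_type` is hypothetical and not part of MindIE-LLM. It assumes that an absent `quantize` field means non-quantized weights, so the FP16 and BF16 presets remove that field.
+
+```python
+import json
+from pathlib import Path
+
+# Mirrors the table above: preset -> (torch_dtype, quantize); None = field not set
+PRESETS = {
+    "FP16": ("float16", None),
+    "BF16": ("bfloat16", None),
+    "W8A8": ("float16", "w8a8"),
+    "W8A8S": ("float16", "w8a8s"),
+    "W8A8SC": ("float16", "w8a8sc"),
+    "W8A16": ("float16", "w8a16"),
+}
+
+
+def set_weight_type(weight_path, preset):
+    """Hypothetical helper: write torch_dtype/quantize into config.json in place."""
+    config_file = Path(weight_path) / "config.json"
+    config = json.loads(config_file.read_text(encoding="utf-8"))
+    torch_dtype, quantize = PRESETS[preset]
+    config["torch_dtype"] = torch_dtype
+    if quantize is None:
+        config.pop("quantize", None)  # assumption: no field means not quantized
+    else:
+        config["quantize"] = quantize
+    config_file.write_text(json.dumps(config, indent=4), encoding="utf-8")
+    return config
+```
+
+- For example, `set_weight_type(weight_path, "W8A16")` sets the two fields shown in the second example above and keeps the rest of `config.json` unchanged.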
+
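+### Weight conversion
+> Currently only weight files in safetensors format can be loaded.
+> If the downloaded weights already include safetensors files, no conversion is needed.
+> If only `.bin` weight files are available, convert them as described below.
+> If no model weights are available locally, download them from the Hugging Face website.
+- Use `${llm_path}/examples/convert/convert_weights.py` to convert `.bin` files to safetensors format
+- Example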
+    ```shell
+    cd ${llm_path}
+    python examples/convert/convert_weights.py --model_path ${weight_path}
+    ```
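+  - Note: run the command above from `${llm_path}`. The script uses relative paths, so running it from another directory causes a `module not found` error
+- The converted files are saved in the same directory as the `.bin` weights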
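+
+- `convert_weights.py` writes each `.bin` shard next to the original as a `.safetensors` file and drops a leading `pytorch_` from the file stem. Below is a sketch of that naming rule, assuming standard Hugging Face shard names. The helper `st_name` is hypothetical. It removes the prefix with a `startswith` check rather than `str.lstrip`, because `lstrip` strips a *set of characters*, not a prefix. The `pytorch_optimizer` name is only an illustration of that pitfall.
+
+```python
+from pathlib import Path
+
+PREFIX = "pytorch_"
+
+
+def st_name(bin_file):
+    """Map a .bin shard path to the .safetensors path written beside it."""
+    p = Path(bin_file)
+    stem = p.stem[len(PREFIX):] if p.stem.startswith(PREFIX) else p.stem
+    return p.parent / f"{stem}.safetensors"
+
+
+print(st_name("/w/pytorch_model-00001-of-00002.bin").name)  # model-00001-of-00002.safetensors
+
+# str.lstrip treats its argument as a character set, not a prefix:
+print("pytorch_optimizer".lstrip(PREFIX))  # imizer
+```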
+
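+### Multi-card quantization on NPU
+- Requirements
+  - Hardware: 910A or 910B
+  - PyTorch and the matching PTA, version 2.1 or later
+  - CANN >= 8.0.RC2.B010
+  - accelerate >= 0.28.0
+  - Virtual memory disabled: set the `PYTORCH_NPU_ALLOC_CONF` environment variable to `expandable_segments:False` (virtual memory is disabled by default)
+- When calling `${llm_path}/examples/convert/model_slim/quantifier.py`, set `--device_type` to `npu`
+- See each model's README for parameter settings and run commands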
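+
+- Two of the requirements above can be checked from Python without any Ascend libraries: the `PYTORCH_NPU_ALLOC_CONF` setting and the `accelerate` version. The sketch below does this with a hypothetical function `preflight`. It treats an unset `PYTORCH_NPU_ALLOC_CONF` as acceptable, because virtual memory is disabled by default. It compares numeric version components only and does not check the hardware, PyTorch/PTA, or CANN requirements.
+
+```python
+import os
+import re
+
+
+def version_tuple(version):
+    """'0.28.0' -> (0, 28, 0); stops at the first non-numeric component."""
+    parts = []
+    for piece in version.split("."):
+        match = re.match(r"\d+", piece)
+        if match is None:
+            break
+        parts.append(int(match.group()))
+    return tuple(parts)
+
+
+def preflight(accelerate_version, env=None):
+    """Return a list of problems; an empty list means both checks passed."""
+    env = os.environ if env is None else env
+    problems = []
+    # Unset is fine: virtual memory is disabled by default
+    alloc_conf = env.get("PYTORCH_NPU_ALLOC_CONF", "expandable_segments:False")
+    if "expandable_segments:True" in alloc_conf:
+        problems.append("set PYTORCH_NPU_ALLOC_CONF=expandable_segments:False")
+    if version_tuple(accelerate_version) < (0, 28, 0):
+        problems.append("accelerate >= 0.28.0 is required")
+    return problems
+```
+
+- In practice, the installed version string can be obtained with `importlib.metadata.version("accelerate")` from the Python standard library.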
+
+### Generating sparse-quantized weights
+- Step 1: generate the sparse-quantized weights
+  ```shell
+  cd ${llm_path}
+  python -m examples.convert.model_slim.quantifier --model_path ${weight_path} --save_directory ${w8a8s_weight_path} --w_bit 4 --a_bit 8 --calib_dataset_type TeacherQualification --fraction 0.011 --co_sparse True
+  ```
+  - Parameter settings in the model's README take precedence over the values above
+- Step 2: shard and compress the quantized weights (run from `${llm_path}`, as in Step 1)
+  ```shell
+  torchrun --nproc_per_node {TP} -m examples.convert.model_slim.sparse_compressor --model_path ${w8a8s_weight_path} --save_directory ${w8a8sc_weight_path}
+  ```
+  - `{TP}` is the tensor-parallel degree
+  - Note: inference must use the same TP as compression. For example, weights sharded with TP=4 must also be run with TP=4
+  - Example
+    ```shell
+    torchrun --nproc_per_node 2 -m examples.convert.model_slim.sparse_compressor --model_path /data1/weights/model_slim/llama2-7b_w8a8s --save_directory /data1/weights/model_slim/llama2-7b_w8a8sc_temp
+    ```
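+
+- The two steps can be scripted so that the TP size used for compression is recorded in one place. The sketch below only builds the argument lists for the commands shown above and does not run them. `sparse_quant_commands` is a hypothetical helper. The quantifier flags are copied from Step 1, so adjust them to the model's README. Each list could then be run from `${llm_path}` with `subprocess.run(cmd, check=True, cwd=llm_path)`.
+
+```python
+import shlex
+
+
+def sparse_quant_commands(weight_path, w8a8s_path, w8a8sc_path, tp):
+    """Build the Step 1 and Step 2 command lines (run both from ${llm_path})."""
+    step1 = [
+        "python", "-m", "examples.convert.model_slim.quantifier",
+        "--model_path", weight_path,
+        "--save_directory", w8a8s_path,
+        "--w_bit", "4", "--a_bit", "8",
+        "--calib_dataset_type", "TeacherQualification",
+        "--fraction", "0.011", "--co_sparse", "True",
+    ]
+    step2 = [
+        "torchrun", "--nproc_per_node", str(tp),
+        "-m", "examples.convert.model_slim.sparse_compressor",
+        "--model_path", w8a8s_path,  # output of Step 1 is the input of Step 2
+        "--save_directory", w8a8sc_path,
+    ]
+    return step1, step2
+
+
+s1, s2 = sparse_quant_commands("/data/llama2-7b", "/data/llama2-7b_w8a8s",
+                               "/data/llama2-7b_w8a8sc", tp=2)
+print(shlex.join(s2))
+```
+
+- Launch inference with the same TP value (here `--nproc_per_node 2`) that was passed to Step 2.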
+
+## Launch scripts
+- Flash Attention launch script: `${llm_path}/examples/run_fa.py`
+- Paged Attention launch script: `${llm_path}/examples/run_pa.py`
+
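+### Environment variables for the launch scripts
+  - `ASCEND_RT_VISIBLE_DEVICES`
+    - Logical NPU cores available on the current machine, separated by commas
+    - Look up the core IDs with the `npu-smi info` command
+    - On Atlas 800I A2 servers, use the NPU column of the output
+        ![npu_smi_info](../images/npu_smi_info_800i_a2.png)
+    - On Atlas 300I DUO servers, use the Device column of the output
+        ![npu_smi_info](../images/npu_smi_info_300i_duo.png)
+        - To use both chips of a single card, specify at least two visible cores; to use two cards (four chips), specify at least four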
+  - `BIND_CPU`
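+    - Switch for binding CPU cores
+    - 1 binds cores, 0 disables binding; cores are bound by default
+    - If NUMA is not configured on the machine or binding fails, set `BIND_CPU` to 0
+  - `PROFILING_LEVEL`
+    - Sets the ProfilerLevel; defaults to 0
+  - `ATB_PROFILING_ENABLE`
+    - Whether to write performance profiling files
+    - 1 generates profiling files, 0 does not; disabled by default
+  - `PROFILING_FILEPATH`
+    - Path for the profiling files (when profiling is enabled)
+    - Defaults to `${cur_dir}/profiling`
+  - `ATB_LLM_BENCHMARK_ENABLE`
+    - Whether to collect end-to-end and per-token performance data
+    - 1 records timings, 0 does not; disabled by default
+  - `ATB_LLM_BENCHMARK_FILEPATH`
+    - Where the performance data is saved
+    - Defaults to `${cur_dir}/benchmark_result/benchmark.csv`
+  - `ATB_LLM_LCOC_ENABLE`
+    - Whether to overlap communication with computation
+    - Enabling the overlap improves performance in the prefill phase
+  - `ATB_LLM_LOGITS_SAVE_ENABLE`
+    - Whether to save the logits of each token; each token's logits are saved to a separate `.pth` file
+    - 1 saves, 0 does not; disabled by default
+  - `ATB_LLM_LOGITS_SAVE_FOLDER`
+    - Folder for the saved logits
+    - Defaults to `${cur_dir}`
+  - `ATB_LLM_TOKEN_IDS_SAVE_ENABLE`
+    - Whether to save token IDs; input and output tokens are saved to two separate files
+    - 1 saves, 0 does not; disabled by default
+  - `ATB_LLM_TOKEN_IDS_SAVE_FOLDER`
+    - Folder for the saved token IDs
+    - Defaults to `${cur_dir}`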
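+
+- A sketch of assembling these variables in Python before launching a script, for example by passing the result as `env=` to `subprocess.run`. `launch_env` is a hypothetical helper. Options that are not requested are left unset, so the defaults listed above apply. `BIND_CPU` is always written explicitly.
+
+```python
+import os
+
+
+def launch_env(devices, bind_cpu=True, benchmark=False, profiling=False, base=None):
+    """Return a copy of the environment with the launch variables set."""
+    env = dict(os.environ if base is None else base)
+    # Comma-separated logical NPU core IDs, as reported by `npu-smi info`
+    env["ASCEND_RT_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)
+    env["BIND_CPU"] = "1" if bind_cpu else "0"
+    if benchmark:
+        env["ATB_LLM_BENCHMARK_ENABLE"] = "1"
+    if profiling:
+        env["ATB_PROFILING_ENABLE"] = "1"
+    return env
+
+
+env = launch_env([0, 1], bind_cpu=False, benchmark=True, base={})
+print(env["ASCEND_RT_VISIBLE_DEVICES"], env["BIND_CPU"])  # 0,1 0
+```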
+
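+### run_fa.py arguments
+- `--model_path`
+  - Path to the model weights
+- `--input_text`
+  - Input prompt(s)
+  - Accepts a string or a list of strings
+  - If it is a string, it is replicated according to `--batch_size` to build the inference inputs
+  - If it is a list, `--batch_size` is ignored and the actual batch size is the length of the list
+- `--max_input_length`
+  - Maximum input length
+  - Defaults to 512 tokens
+  - Inputs shorter than the maximum input length are padded automatically
+- `--max_output_length`
+  - Maximum output length
+  - Defaults to 20 tokens
+- `--batch_size`
+  - Fixed batch size used for inference
+  - Defaults to 1
+- `--is_flash_causal_lm`
+  - Whether to use Paged Attention; disabled by default
+- Example
+  ```shell
+  # Run Flash Attention on multiple cards with the given weights, a 2048-token output limit, and BF16 precision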
+  torchrun --nproc_per_node 2 --master_port 20038 -m examples.run_fa --model_path ${weight_path} --max_output_length 2048 --is_bf16
+  ```
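+
+- The `--input_text` / `--batch_size` rule above fits in a few lines. This is a sketch of the documented behaviour, not the code in `run_fa.py`; `build_inputs` is hypothetical. Note that `run_pa.py` also replicates a single-element list (see its arguments), whereas here a list is always used as-is.
+
+```python
+def build_inputs(input_text, batch_size=1):
+    """A string is replicated batch_size times; a list is used as-is."""
+    if isinstance(input_text, str):
+        return [input_text] * batch_size
+    # For a list, batch_size is ignored: the batch is the list itself
+    return list(input_text)
+
+
+print(len(build_inputs("What is deep learning?", batch_size=4)))  # 4
+print(len(build_inputs(["a", "b", "c"], batch_size=8)))  # 3
+```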
+
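+### run_pa.py arguments
+- `--model_path`
+  - Path to the model weights
+- `--input_text`
+  - Input prompt(s)
+  - Accepts a string or a list of strings
+  - If it is a string or a single-element list, it is replicated according to the batch size argument to build the inference inputs
+  - If it is a list with several elements, the batch size argument is ignored and the actual batch size is the length of the list
+- `--input_file`
+  - Only `.jsonl` files are currently supported; each line must be one conversation in `List[Dict]` form, with the turns in chronological order
+  - Each dict must contain at least the `"role"` and `"content"` fields
+- `--max_position_embeddings`
+  - Maximum input length the model accepts
+  - Read from the config file of the model weights by default
+- `--max_output_length`
+  - Maximum output length
+  - Defaults to 20 tokens
+- `--max_prefill_tokens`
+  - Maximum number of input tokens in the prefill phase
+  - Defaults to 4096 tokens
+- `--max_batch_size`
+  - Maximum batch size; the actual batch size varies at runtime and may never reach this value
+  - Defaults to 1
+- `--is_flash_model`
+  - Whether to use Paged Attention; enabled by default
+- `--is_chat_model`
+  - `store_true` flag; when set, the model is treated as a chat model
+  - Conversation data of type `List[Dict]` is read from `--input_file` (currently `.jsonl` only)
+  - If `--input_file` is not given, the text in `--input_text` is automatically turned into conversation data
+- `--chat_template`
+  - Defaults to None and only takes effect when `--is_chat_model` is set
+  - If set to a file name, the Jinja chat template is read from that file
+  - If set to a string, the string itself is parsed as a Jinja chat template
+- Example
+  ```shell
+  # Run Paged Attention on multiple cards with the given weights and a 2048-token output limit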
+  torchrun --nproc_per_node 2 --master_port 20038 -m examples.run_pa --model_path ${weight_path} --max_output_length 2048
+  ```
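+
+- A sketch of writing and checking an `--input_file` in the format described above. Each line holds one JSON list of `{"role": ..., "content": ...}` turns in chronological order. The role names `user`/`assistant` and the file name `chat.jsonl` are assumptions for illustration. `write_chat_jsonl` and `read_chat_jsonl` are hypothetical helpers. The resulting file would then be passed with `--input_file` together with `--is_chat_model`.
+
+```python
+import json
+import os
+import tempfile
+
+
+def write_chat_jsonl(path, conversations):
+    """Write one conversation (a list of role/content dicts) per line."""
+    with open(path, "w", encoding="utf-8") as f:
+        for turns in conversations:
+            f.write(json.dumps(turns, ensure_ascii=False) + "\n")
+
+
+def read_chat_jsonl(path):
+    """Read conversations back, enforcing the List[Dict] role/content rule."""
+    conversations = []
+    with open(path, encoding="utf-8") as f:
+        for line_no, line in enumerate(f, start=1):
+            if not line.strip():
+                continue  # tolerate blank lines
+            turns = json.loads(line)
+            valid = isinstance(turns, list) and all(
+                isinstance(t, dict) and "role" in t and "content" in t
+                for t in turns)
+            if not valid:
+                raise ValueError(f"line {line_no}: expected a List[Dict] with role/content")
+            conversations.append(turns)
+    return conversations
+
+
+path = os.path.join(tempfile.mkdtemp(), "chat.jsonl")
+write_chat_jsonl(path, [
+    [{"role": "user", "content": "What is the capital of France?"}],
+    [{"role": "user", "content": "Hi"},
+     {"role": "assistant", "content": "Hello!"},
+     {"role": "user", "content": "Tell me a joke."}],
+])
+print(len(read_chat_jsonl(path)))  # 2
+```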
+
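+### Special scenarios
+- Multiple users on one machine
+  - On 300I DUO and 800I A2, communication operators exchange data through shared memory. In a multi-user setup, each user must therefore set the following environment variable to keep their shared memory separate: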
+    ```shell
+    export ATB_SHARE_MEMORY_NAME_SUFFIX="user1"
+    ```
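+  - For example, if a 300I DUO machine has 4 cards and each card runs its own inference task, set this variable to a different value for each task, such as `user1` and `user2`
+- On 300I DUO cards, enable the following environment variable: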
+    ```shell
+    export INT8_FORMAT_NZ_ENABLE=1
+    ```
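+
+- A sketch of giving each task on a shared machine its own shared-memory suffix, following the `user1`, `user2`, ... naming above. `task_envs` is a hypothetical helper. It sets only the two variables from this section, so device selection (`ASCEND_RT_VISIBLE_DEVICES`) still has to be added for each task.
+
+```python
+def task_envs(num_tasks, is_300i_duo=False, base=None):
+    """Return one environment dict per task, each with a distinct suffix."""
+    envs = []
+    for i in range(1, num_tasks + 1):
+        env = dict(base or {})
+        env["ATB_SHARE_MEMORY_NAME_SUFFIX"] = f"user{i}"
+        if is_300i_duo:
+            env["INT8_FORMAT_NZ_ENABLE"] = "1"  # required on 300I DUO cards
+        envs.append(env)
+    return envs
+
+
+for env in task_envs(4, is_300i_duo=True):
+    print(env["ATB_SHARE_MEMORY_NAME_SUFFIX"])
+```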
--- a/mindie/examples/init.py
+++ b/mindie/examples/init.py
--- a/mindie/examples/convert/init.py
+++ b/mindie/examples/convert/init.py
--- a/mindie/examples/convert/convert_utils.py
+++ b/mindie/examples/convert/convert_utils.py
@ -0,0 +1,27 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+import json
+import os
+import shutil
+from atb_llm.utils.file_utils import safe_open
+
+
+def copy_tokenizer_files(model_dir, dest_dir):
+    os.makedirs(dest_dir, exist_ok=True)
+    for filename in os.listdir(model_dir):
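+        # Matches e.g. tokenizer.json, tokenizer_config.json, tokenization_*.py, special_tokens_map.json
+        if 'tokenizer' in filename or 'tokenization' in filename or 'special_tokens_map' in filename: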
+            src_filepath = os.path.join(model_dir, filename)
+            dest_filepath = os.path.join(dest_dir, filename)
+            shutil.copyfile(src_filepath, dest_filepath)
+
+
+def modify_config(model_dir, dest_dir, torch_dtype, quantize_type, kv_quant_type=False):
+    src_config_filepath = os.path.join(model_dir, 'config.json')
+    with open(src_config_filepath, 'r', encoding='utf-8') as fr:
+        data = json.load(fr)
+    data['torch_dtype'] = str(torch_dtype).split(".")[1]
+    data['quantize'] = quantize_type
+    if kv_quant_type:
+        data['kv_quant'] = "C8"  # only C8 quantization is currently supported for the KV cache
+    dest_config_filepath = os.path.join(dest_dir, 'config.json')
+    with safe_open(dest_config_filepath, 'w', encoding='utf-8', is_exist_ok=False) as fw:
+        json.dump(data, fw, indent=4)
--- a/mindie/examples/convert/convert_weights.py
+++ b/mindie/examples/convert/convert_weights.py
@ -0,0 +1,41 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+import argparse
+
+from atb_llm.utils.convert import convert_files
+from atb_llm.utils.hub import weight_files
+from atb_llm.utils.log import logger
+
+
+def parse_arguments():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--model_path', help="model and tokenizer path")
+    return parser.parse_args()
+
+
+def convert_bin2st(model_path):
+    local_pt_files = weight_files(model_path, revision=None, extension=".bin")
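+    prefix = "pytorch_"
+    # Remove a leading "pytorch_" from the stem; str.lstrip would strip a character set, not a prefix
+    local_st_files = [
+        p.parent / f"{p.stem[len(prefix):] if p.stem.startswith(prefix) else p.stem}.safetensors"
+        for p in local_pt_files
+    ]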
+    convert_files(local_pt_files, local_st_files, discard_names=[])
+    _ = weight_files(model_path)
+
+
+def convert_bin2st_from_pretrained(model_path):
+    from transformers import AutoModelForCausalLM
+    model = AutoModelForCausalLM.from_pretrained(
+        pretrained_model_name_or_path=model_path,
+        low_cpu_mem_usage=True,
+        torch_dtype="auto")
+    model.save_pretrained(model_path, safe_serialization=True)
+
+
+if __name__ == '__main__':
+    args = parse_arguments()
+
+    try:
+        convert_bin2st(args.model_path)
+    except RuntimeError:
+        logger.warning('convert weights failed with torch.load method, need model loaded to convert')
+        convert_bin2st_from_pretrained(args.model_path)
--- a/mindie/examples/convert/model_slim/init.py
+++ b/mindie/examples/convert/model_slim/init.py
--- a/mindie/examples/convert/model_slim/boolq.jsonl
+++ b/mindie/examples/convert/model_slim/boolq.jsonl
@ -0,0 +1,50 @@
+{"id": 0, "inputs_pretokenized": "Ghost in the Shell -- Animation studio Production I.G has produced several different anime adaptations of Ghost in the Shell, starting with the 1995 film of the same name, telling the story of Section 9's investigation of the Puppet Master. The television series Ghost in the Shell: Stand Alone Complex followed in 2002, telling an alternate story from the manga and first film, featuring Section 9's investigations of government corruption in the Laughing Man and Individual Eleven incidents. A sequel to the 1995 film, Ghost in the Shell 2: Innocence, was released in 2004. In 2006, the film Ghost in the Shell: Stand Alone Complex - Solid State Society retook the story of the television series. 2013 saw the start of the Ghost in the Shell: Arise original video animation (OVA) series, consisting of four parts through mid-2014. The series was recompiled in early 2015 as a television series titled Ghost in the Shell: Arise - Alternative Architecture, airing with an additional two episodes (one part). An animated feature film produced by most of the Arise staff, titled Ghost in the Shell: The New Movie, was released on June 20, 2015. A live-action American film of the same name was released on March 31, 2017.\nQuestion: is ghost in the shell based on the anime?\nAnswer:"}
+{"id": 1, "inputs_pretokenized": "The Walking Dead (season 8) -- The eighth season of The Walking Dead, an American post-apocalyptic horror television series on AMC, premiered on October 22, 2017, and concluded on April 15, 2018, consisting of 16 episodes. Developed for television by Frank Darabont, the series is based on the eponymous series of comic books by Robert Kirkman, Tony Moore, and Charlie Adlard. The executive producers are Kirkman, David Alpert, Scott M. Gimple, Greg Nicotero, Tom Luse, and Gale Anne Hurd, with Gimple as showrunner for his fifth and final season. The eighth season received positive reviews from critics. It was nominated for multiple awards and won two, including Best Horror Television Series for the third consecutive year, at the 44th Saturn Awards.\nQuestion: is there gonna be a season 8 of the walking dead?\nAnswer:"}
+{"id": 2, "inputs_pretokenized": "Onyx -- Brazilian green onyx was often used as plinths for art deco sculptures created in the 1920s and 1930s. The German sculptor Ferdinand Preiss used Brazilian green onyx for the base on the majority of his chryselephantine sculptures. Green onyx was also used for trays and pin dishes -- produced mainly in Austria -- often with small bronze animals or figures attached.\nQuestion: is there such a thing as green onyx?\nAnswer:"}
+{"id": 3, "inputs_pretokenized": "Wachovia -- The acquisition of Wachovia by Wells Fargo was completed on December 31, 2008 after a government-forced sale to avoid Wachovia's failure. The Wachovia brand was absorbed into the Wells Fargo brand in a process that lasted three years: on October 15, 2011, the last Wachovia branches in North Carolina were converted to Wells Fargo.\nQuestion: is wells fargo and wachovia the same bank?\nAnswer:"}
+{"id": 4, "inputs_pretokenized": "Friday Night Lights (film) -- Friday Night Lights is a 2004 American sports drama film, directed by Peter Berg, which 'dramatized' the coach and players of a high school football team in the Texas city of Odessa that supported and was obsessed with them. The book on which it was based, Friday Night Lights: A Town, a Team, and a Dream (1990) by H.G. Bissinger, followed the story of the 1988 Permian High School Panthers football team as they made a run towards the state championship. A television series of the same name premiered on October 3, 2006 on NBC. The film won the Best Sports Movie ESPY Award and was ranked number 37 on Entertainment Weekly's list of the Best High School Movies.\nQuestion: is friday night lights movie based on a true story?\nAnswer:"}
+{"id": 5, "inputs_pretokenized": "Peace bond -- The use of peace bonds is rather uncommon in the U.S. justice system, but a deferred prosecution has a similar effect. Since there is no conviction or admission of any guilt, signing a peace bond in Canada does not usually result in U.S. inadmissibility under INA \u00a7 212 (a) (2).\nQuestion: is a peace bond an admission of guilt?\nAnswer:"}
+{"id": 6, "inputs_pretokenized": "Eating mucus -- Mucophagy, despite its benefits on one's immunity, comes with some health risks due to the potential physical aggravation resulting from the action of nose picking, and the germs on fingers and in mucus. Picking one's nose can cause upper airway irritation as well as other injuries including nasal septal perforation (a ``through-and-through defect'' of the cartilage separating the nostrils), and epistaxis (nosebleed). In a study by Andrade and Srihari, 25% of subjects were ailed by nose bleeds, 17% with nasal infections, and 2% with damage more serious than bleeding. W. Buzina studied the fungal diversity in nasal mucus in 2003. 104 samples were gathered with 331 identifiable strains of fungi and 9 different species per patient.\nQuestion: does eating your boogers improve your immune system?\nAnswer:"}
+{"id": 7, "inputs_pretokenized": "High-altitude flatus expulsion -- High-altitude flatus expulsion (HAFE) is a gastrointestinal syndrome which involves the spontaneous passage of increased quantities of rectal gases at high altitudes. First described by Joseph Hamel in c. 1820 and occasionally described afterward, a landmark study of this phenomenon was published in 1981 by Paul Auerbach and York Miller.\nQuestion: do you have more gas at higher altitudes?\nAnswer:"}
+{"id": 8, "inputs_pretokenized": "Big Boss (Metal Gear) -- Big Boss is one of the central characters in the Metal Gear video game series. He was introduced in the original Metal Gear games for the MSX2 as the commanding officer and subsequent nemesis of Solid Snake. He is later featured as Naked Snake, the protagonist of Metal Gear Solid prequels where he is initially depicted as an American Special Forces Operator and decorated war hero until political manipulations cause him to be disillusioned and start his own private mercenary company. Big Boss's character has been praised by video game publications for his role as a villain as well for his relationship with Solid Snake. As the series' chronology progressed, his exact allegiance and motivations became increasingly complex; his first appearances are depicted as a traitor dreaming of a world of perpetual war, but subsequent appearances have revealed him to be a key figure in an ideological dispute that shaped the latter half of the twentieth century and a man whose conscience was disturbed by the attitude of leaders towards soldiers, prompting his decision to become a soldier of fortune and Venom Snake's mental template.\nQuestion: is solid snake and big boss the same person?\nAnswer:"}
+{"id": 9, "inputs_pretokenized": "Jessie (2011 TV series) -- After casting was finalized and changes were made to several of the characters to suit the actors chosen, the series skipped the pilot phase and was put directly into production. Filming began in June 2011 on Stage 3/8 at Hollywood Center Studios which, prior to start of production, served as the sound stage where the Disney Channel series Wizards of Waverly Place was taped. 13 episodes were originally ordered for the first season, but while the show's first season was in production, Disney Channel ordered an additional seven episodes, bringing the total number of episodes for the first season to 20. When asked about the atmosphere on set during an interview with MSN TV, Ryan described her relationship with the young cast: ``I definitely feel like a nanny! They are smart kids, but they're real kids. They like to have fun. My policy is: We can play hard, as long as we work hard, and because we work hard, we need to play hard.'' Filming on the series wrapped on February 22, 2015.\nQuestion: is the show jessie filmed in new york?\nAnswer:"}
+{"id": 10, "inputs_pretokenized": "Song of Songs -- The Song of Songs, also Song of Solomon or Canticles (Hebrew: \u05e9\u05b4\u05c1\u05d9\u05e8 \u05d4\u05b7\u05e9\u05b4\u05bc\u05c1\u05d9\u05e8\u05b4\u05d9\u05dd\u202c, \u0160\u00eer Ha\u0161\u0160\u00eer\u00eem, Greek: \u1f8e\u03c3\u03bc\u03b1 \u1f8e\u03c3\u03bc\u03ac\u03c4\u03c9\u03bd, asma asmaton, both meaning Song of Songs), is one of the megillot (scrolls) found in the last section of the Tanakh, known as the Ketuvim (or ``Writings''), and a book of the Old Testament.\nQuestion: is the song of songs the same as the song of solomon?\nAnswer:"}
+{"id": 11, "inputs_pretokenized": "Northwest Florida State College -- The school voted to change its name to Okaloosa-Walton Community College in 1988, and gained four-year status in 2003, thus changing its name to Okaloosa-Walton College.\nQuestion: is northwest florida state college a 4 year college?\nAnswer:"}
+{"id": 12, "inputs_pretokenized": "A Quiet Place (film) -- A Quiet Place is a production of Sunday Night and Platinum Dunes; it was produced on a budget of $17 million. Krasinski wrote the screenplay with story co-writers Scott Beck and Bryan Woods. Beck and Woods grew up together in the US state of Iowa, and had watched numerous silent films in college. By 2013, they began working on the story that would lead to the film. They used their experience growing up close to farmland as the basis, including a grain silo setting as a place considered dangerous in their upbringing. They initiated their approach with a 15-page proof of concept. Initially, the writers had considered developing the film into a Cloverfield installment, but after pitching their ideas to the studio collectively, all of those involved decided to keep the film its own entity.\nQuestion: is the movie the quiet place based on a book?\nAnswer:"}
+{"id": 13, "inputs_pretokenized": "2018 FIFA World Cup qualification \u2013 UEFA Group G -- The group winners, Spain, qualified directly for the 2018 FIFA World Cup. The group runners-up, Italy, advanced to the play-offs as one of the best 8 runners-up, where they lost to Sweden and thus failed to qualify for the first time since 1958.\nQuestion: did spain qualify for the 2018 world cup?\nAnswer:"}
+{"id": 14, "inputs_pretokenized": "Red squirrel -- The eastern grey squirrel and the red squirrel are not directly antagonistic, and violent conflict between these species is not a factor in the decline in red squirrel populations. However, the eastern grey squirrel appears to be able to decrease the red squirrel population due to several reasons:\nQuestion: are grey and red squirrels the same species?\nAnswer:"}
+{"id": 15, "inputs_pretokenized": "Bermuda -- Bermuda is a group of low-forming volcanoes in the Atlantic Ocean, near the western edge of the Sargasso Sea, roughly 578 nautical miles (1,070 km; 665 mi) east-southeast of Cape Hatteras on the Outer Banks of North Carolina and about 594 nautical miles (1,100 km; 684 mi) southeast of Martha's Vineyard of Massachusetts. It is 898 nautical miles (1,663 km; 1,033 mi) northeast of Miami, Florida, and 667 nautical miles (1,235 km; 768 mi) from Cape Sable Island, in Nova Scotia, Canada. The islands lie due east of Fripp Island, South Carolina, west-northwest of Cape Verde, southeast of New York City, New York, north-northwest of Brazil and 1,759 km (1,093 mi) north of Cuba.\nQuestion: is bermuda off the coast of south carolina?\nAnswer:"}
+{"id": 16, "inputs_pretokenized": "The People's Court -- The losing party does not actually need to pay the judgment, as such. Instead (as is stated in the disclaimer at the end of each show), both parties are paid from a fund (set up by Ralph Edwards-Stu Billett Productions). This fund was based on the amount of the lawsuit claim, but an exact formula was not stated. The fund was to be first divided equally, then any monetary judgment ordered was subtracted from the loser's half (and presumably both halves in the case of cross judgments). Each litigant received at least what remained of their half in shows concluding with that disclaimer.\nQuestion: do litigants on people's court get paid?\nAnswer:"}
+{"id": 17, "inputs_pretokenized": "Texas -- Texas (/\u02c8t\u025bks\u0259s/, locally /-s\u0259z/; Spanish: Texas or Tejas (\u02c8texas)) is the second largest state in the United States by both area and population. Geographically located in the South Central region of the country, Texas shares borders with the U.S. states of Louisiana to the east, Arkansas to the northeast, Oklahoma to the north, New Mexico to the west, and the Mexican states of Chihuahua, Coahuila, Nuevo Le\u00f3n, and Tamaulipas to the southwest, while the Gulf of Mexico is to the southeast.\nQuestion: is texas the biggest state in the us?\nAnswer:"}
+{"id": 18, "inputs_pretokenized": "The Adventures of Tintin (film) -- Spielberg acquired rights to produce a film based on The Adventures of Tintin series following Herg\u00e9's death in 1983, and re-optioned them in 2002. Filming was due to begin in October 2008 for a 2010 release, but release was delayed to 2011 after Universal opted out of producing the film with Paramount, who provided $30 million on pre-production. Sony chose to co-produce the film. The delay resulted in Thomas Sangster, who had been originally cast as Tintin, departing from the project. Producer Peter Jackson, whose company Weta Digital provided the computer animation, intends to direct a sequel. Spielberg and Jackson also hope to co-direct a third film. The world premi\u00e8re took place on 22 October 2011 in Brussels. The film was released in the United Kingdom and other European countries on 26 October 2011, and in the United States on 21 December 2011, in Digital 3D and IMAX.\nQuestion: will there be a adventures of tintin 2?\nAnswer:"}
+{"id": 19, "inputs_pretokenized": "Emma Pillsbury -- Emma Pillsbury Schuester (previously Pillsbury-Howell) is a fictional character from the Fox musical comedy-drama series Glee. Portrayed by actress Jayma Mays, Emma has appeared in Glee from its pilot episode, first broadcast on May 19, 2009. Emma was developed by Glee creators Ryan Murphy, Brad Falchuk and Ian Brennan. She is a guidance counselor at the fictional William McKinley High School in Lima, Ohio where the series is set. Emma suffers from obsessive-compulsive disorder and has romantic feelings for glee club director Will Schuester (Matthew Morrison), but becomes engaged to football coach Ken Tanaka (Patrick Gallagher) as Will is married. Ken ultimately breaks up with her on their wedding day because of her feelings for Will, and when Will leaves his wife Terri (Jessalyn Gilsig), he and Emma share a kiss. Their relationship is short-lived, and in the second season, Emma and her dentist boyfriend Carl Howell (John Stamos) marry in Las Vegas. The wedding is later annulled as it was unconsummated. At the beginning of the third season, she and Will are living together; they become engaged shortly after New Years, and consummate their relationship near the end of the school year. Emma leaves Will at the altar midway through the fourth season, but the two later reconcile and marry in the season finale. She becomes pregnant during the middle of the fifth season.\nQuestion: do will and emma get together in glee?\nAnswer:"}
+{"id": 20, "inputs_pretokenized": "The Princess and the Goblin (film) -- The Princess and the Goblin (Hungarian: A hercegn\u0151 \u00e9s a kobold) is a 1991 British-Hungarian-American animated musical fantasy film directed by J\u00f3zsef G\u00e9mes and written by Robin Lyons, an adaptation of George MacDonald's 1872 novel of the same name.\nQuestion: is the princess and the goblin a disney movie?\nAnswer:"}
+{"id": 21, "inputs_pretokenized": "WWE draft -- On May 25, 2016, due to SmackDown moving to Tuesdays and to a live broadcast starting July 19, necessitating a brand extension, WWE announced that the draft would be returning. It would later be announced that the 2016 WWE draft would take place on July 19 during SmackDown's first live broadcast, which was also the first time that the draft took place on SmackDown. The 2017 draft was labeled the Superstar Shake-up as instead of a traditional draft, the general managers of Raw and SmackDown could trade and make deals between their respective talent.\nQuestion: is there going to be a wwe draft in 2017?\nAnswer:"}
+{"id": 22, "inputs_pretokenized": "Izzie Stevens -- Heigl garnered critical acclaim for her performance as Izzie and received numerous awards and nominations for her role, winning the ``Outstanding Supporting Actress In A Drama Series'' at the 2007 Emmy Awards. She was critical of the character's development during the show's fourth season, particularly her romance with George. She declined to put herself forward for the 2008 Emmy Awards, citing insufficient material in the role. After speculation that Izzie would be killed off in the fifth season, the character was diagnosed with Stage 4 metastatic melanoma. She married Alex in the series' one-hundredth episode, and afterwards, her tumor was successfully removed. Izzie made her final appearance in the sixth season, leaving Seattle after Alex refused to resume their marriage. Heigl requested to be released from her contract 18 months early, in order to spend more time with her family. In January 2012, Heigl reported that she would like to return to Grey's Anatomy to give closure to her character, however, Rhimes confirmed that there were no plans to have the character return at that time and has since stated that she has no plans to ever re-approach Izzie's storyline again.\nQuestion: does izzie come back in grey's anatomy?\nAnswer:"}
+{"id": 23, "inputs_pretokenized": "Sam Beckett -- When Sam corrected the timeline, he leaped forward, but not all the way home; this time, he found himself assuming the identity of a minor-league professional baseball player named Tim Fox. For the rest of his life (an epilogue in the series finale tells us Sam never gets home, but in our terms, it was the next four years/five seasons, the duration of the show) Sam would continue to travel back and forth through time; swapping identities with various people and as a tagline for the show reiterated, ``setting right what once went wrong.''\nQuestion: did sam ever make it home in quantum leap?\nAnswer:"}
+{"id": 24, "inputs_pretokenized": "Safety (gridiron football score) -- In gridiron football, the safety (American football) or safety touch (Canadian football) is a scoring play that results in two points (or, in rare cases, one point) being awarded to the scoring team. Safeties can be scored in a number of ways, such as when a ball carrier is tackled in his own end zone or when a foul is committed by the offense in their own end zone. After a safety is scored in American football, the ball is kicked off to the team that scored the safety from the 20-yard line; in Canadian football, the scoring team also has the options of taking control of the ball at their own 35-yard line or kicking off the ball, also at their own 35-yard line. The ability of the scoring team to receive the ball through a kickoff differs from the touchdown and field goal, which require the scoring team to kick the ball off to the scored upon team. Despite being of relatively low point value, safeties can have a significant impact on the result of games, and Brian Burke of Advanced NFL Stats estimated that safeties have a greater abstract value than field goals, despite being worth a point less, due to the field position and reclaimed possession gained off the safety kick.\nQuestion: is it possible to get 1 point in football?\nAnswer:"}
+{"id": 25, "inputs_pretokenized": "Atomic number -- The atomic number or proton number (symbol Z) of a chemical element is the number of protons found in the nucleus of an atom. It is identical to the charge number of the nucleus. The atomic number uniquely identifies a chemical element. In an uncharged atom, the atomic number is also equal to the number of electrons.\nQuestion: is the atomic number equal to the number of protons?\nAnswer:"}
+{"id": 26, "inputs_pretokenized": "Tick (comics) -- In the Amazon Prime video series, The Tick is fixated on Arthur, and even mentions at one point that his thinking is fuzzy when away from Arthur. Despite Arthur's repeated attempts to push The Tick away, the hero won't leave Arthur's side for long. The Tick also frequently talks about Destiny as if she is a literal person, guiding Arthur's path (``Destiny gave him the suit. I just acted in more of a 'delivery man' role''), alluding to the Parcae in Roman mythology. At one point, Arthur starts to believe that The Tick is merely another hallucination, but that thought is quickly dispelled when Arthur's sister, Dot, interacts with ``The Blue Guy.''\nQuestion: is the tick part of arthur's imagination?\nAnswer:"}
+{"id": 27, "inputs_pretokenized": "Game of Thrones -- Game of Thrones is an American fantasy drama television series created by David Benioff and D.B. Weiss. It is an adaptation of A Song of Ice and Fire, George R.R. Martin's series of fantasy novels, the first of which is A Game of Thrones. It is filmed in Belfast and elsewhere in the United Kingdom, Canada, Croatia, Iceland, Malta, Morocco, Spain, and the United States. The series premiered on HBO in the United States on April 17, 2011, and its seventh season ended on August 27, 2017. The series will conclude with its eighth season premiering either in 2018 or 2019.\nQuestion: is this the last season of gsme of thrones?\nAnswer:"}
+{"id": 28, "inputs_pretokenized": "State supreme court -- The court consists of a panel of judges selected by methods outlined in the state constitution. State supreme courts are completely distinct from any United States federal courts located within the geographical boundaries of a state's territory, or the federal United States Supreme Court (although appeals, on some issues, from judgments of a state's highest court can be sought in the U.S. Supreme Court).\nQuestion: can a state supreme court decision be appealed?\nAnswer:"}
+{"id": 29, "inputs_pretokenized": "Snake River -- The Snake River is the thirteenth longest river in the United States. Its watershed is the 10th largest among North American rivers, and covers almost 108,000 square miles (280,000 km) in portions of six U.S. states: Wyoming, Idaho, Nevada, Utah, Oregon, and Washington, with the largest portion in Idaho. Most of the Snake River watershed lies between the Rocky Mountains on the east and the Columbia Plateau on the northwest. The largest tributary of the Columbia River, the Snake River watershed makes up about 41% of the entire Columbia River Basin. Its average discharge at the mouth constitutes 31% of the Columbia's flow at that point. Above the confluence, the Snake is slightly longer than the Columbia--1,078 miles (1,735 km) compared to 928 miles (1,493 km)--and its drainage basin is slightly larger--4% bigger than the upstream Columbia River watershed.\nQuestion: does the snake river flow into the columbia river?\nAnswer:"}
+{"id": 30, "inputs_pretokenized": "Outlier -- Deletion of outlier data is a controversial practice frowned upon by many scientists and science instructors; while mathematical criteria provide an objective and quantitative method for data rejection, they do not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known. An outlier resulting from an instrument reading error may be excluded but it is desirable that the reading is at least verified.\nQuestion: can there be outliers in a normal distribution?\nAnswer:"}
+{"id": 31, "inputs_pretokenized": "Ready Player One -- Ready Player One is a 2011 science fiction novel, and the debut novel of American author Ernest Cline. The story, set in a dystopian 2040s, follows protagonist Wade Watts on his search for an Easter egg in a worldwide virtual reality game, the discovery of which will lead him to inherit the game creator's fortune. Cline sold the rights to publish the novel in June 2010, in a bidding war to the Crown Publishing Group (a division of Random House). The book was published on August 16, 2011. An audiobook was released the same day; it was narrated by Wil Wheaton, who was mentioned briefly in one of the chapters. In 2012, the book received an Alex Award from the Young Adult Library Services Association division of the American Library Association and won the 2012 Prometheus Award.\nQuestion: is ready player one based on a true story?\nAnswer:"}
+{"id": 32, "inputs_pretokenized": "Four-leaf clover -- The four-leaf clover is a rare variation of the common three-leaf clover. According to traditional superstition, such clovers bring good luck, though it is not clear when or how that superstition got started. The earliest mention of ``Fower-leafed or purple grasse'' is from 1640 and simply says that it was kept in gardens because it was ``good for the purples in children or others''. A description from 1869 says that four-leaf clovers were ``gathered at night-time during the full moon by sorceresses, who mixed it with vervain and other ingredients, while young girls in search of a token of perfect happiness made quest of the plant by day''. The first reference to luck might be from an 11-year-old girl, who wrote in an 1877 letter to St. Nicholas Magazine, ``Did the fairies ever whisper in your ear, that a four-leaf clover brought good luck to the finder?''\nQuestion: is there such a thing as a four leaf clover?\nAnswer:"}
+{"id": 33, "inputs_pretokenized": "Statutory declaration -- Statutory declarations are commonly used to allow a person to declare something to be true for the purposes of satisfying some legal requirement or regulation when no other evidence is available. They are thus similar to affidavits (which are made on oath).\nQuestion: can a statutory declaration be used as evidence?\nAnswer:"}
+{"id": 34, "inputs_pretokenized": "Convention to propose amendments to the United States Constitution -- To become part of the Constitution, an amendment must be ratified by either--as determined by Congress--the legislatures of three-fourths (presently 38) of the states or State ratifying conventions in three-fourths of the states. Thirty-three amendments to the United States Constitution have been approved by Congress and sent to the states for ratification. Twenty-seven of these amendments have been ratified and are now part of the Constitution. As of 2018, the convention process has never been used for proposing constitutional amendments.\nQuestion: has there ever been a convention of states?\nAnswer:"}
+{"id": 35, "inputs_pretokenized": "South African English -- SAE is an extraterritorial (ET) variety of English, or a language variety that has been ``transported'' outside its mainland home. More specifically, SAE is a Southern hemisphere ET originating from later English colonisation in the 18th and 19th centuries (Zimbabwean, Australian, and New Zealand English are also Southern hemisphere ET varieties). SAE resembles British English more closely than it does American English due to the close ties that South African colonies maintained with the mainland in the 19th and 20th centuries. However, with the increasing influence of American pop-culture around the world via modes of contact like television, American English has become more familiar in South Africa. Indeed, some American lexical items are becoming alternatives to comparable British terms.\nQuestion: is south african english similar to british english?\nAnswer:"}
+{"id": 36, "inputs_pretokenized": "Haroun and the Sea of Stories -- Haroun and the Sea of Stories is a 1990 children's book by Salman Rushdie. It was Rushdie's fifth novel after The Satanic Verses. It is a phantasmagorical story that begins in a city so old and ruinous that it has forgotten its name.\nQuestion: is haroun and the sea of stories a children's book?\nAnswer:"}
+{"id": 37, "inputs_pretokenized": "Mandalay Bay -- Mandalay Bay is a 43-story luxury resort and casino on the Las Vegas Strip in Paradise, Nevada. It is owned and operated by MGM Resorts International. One of the property's towers operates as the Delano; the Four Seasons Hotel is independently operated within the Mandalay Bay tower, occupying 5 floors (35--39).\nQuestion: is four seasons las vegas part of mandalay bay?\nAnswer:"}
+{"id": 38, "inputs_pretokenized": "Lynette Scavo -- Her world is further shocked when Tom asks for a divorce, and announces that he and Jane will be moving in together. Lynette is devastated, and her rivalry with Jane becomes more heated at Penny's birthday party when they continually try to one up each other. Jane then later tries to reconcile with Lynette, but then she begins to choke on a snack. Lynette hesitates to help Jane, but ultimately comes to her aid and saves her. However, Jane is alarmed at Lynette thinking such an action over believing she thought of letting Jane die. Then on the day of Mike Delfino's funeral, Tom and Lynette comfort each other as Jane looks on. Sparks of their marriage appear and while sitting at the service Lynette thinks back to the day Tom moved out. Mike tries to understand why Lynette isn't fighting for her marriage. He then reveals that everyone in the neighborhood knows that she and Tom belong together. This memory finally causes Lynette to make the decision to fight for her marriage, win Tom back, and dissolve his romance with Jane. In With So Little to Be Sure Of Lynette and Tom officially sign their divorce papers ending their marriage. When Lynette hears Tom hasn't filed the papers, she is hopeful but after seeing Tom and Jane kiss at the office, she accepts a date from Tom's boss. It goes well at first but when he plans to transfer Tom to India, Lynette breaks it off. The boss sardonically insults Lynette before Tom about her being hung up on another man and after insults to her, Tom punches him. He and Jane argue with Jane realizing that Tom still loves Lynette and they break up. Tom goes to see Lynette but sees her hugging Lee and (not seeing who it is), thinks Lynette has moved on. He tells her he is filing but in a later talk, they realize how much they love each other and reconcile.\nQuestion: do tom and lynette get back together spoiler?\nAnswer:"}
+{"id": 39, "inputs_pretokenized": "List of Major League Baseball single-game home run leaders -- Writers of Sporting News described hitting four home runs in a single Major League Baseball (MLB) game as ``baseball's greatest single-game accomplishment''. Eighteen players have accomplished the feat to date, the most recent being Scooter Gennett on June 6, 2017 against the St. Louis Cardinals. No player has done this more than once in his career and no player has ever hit more than four in a game. Bobby Lowe was the first to hit four home runs in a single game, doing so on May 30, 1894. Fans were reportedly so excited that they threw $160 in silver coins ($4,500 today) onto the field after his fourth home run.\nQuestion: has there ever been a 5 home run game?\nAnswer:"}
+{"id": 40, "inputs_pretokenized": "Virginia Cavaliers men's basketball -- The Wahoos, as they are unofficially known, have appeared in the NCAA Tournament twenty-two times, advancing to the Elite Eight six times (1981, 1983, 1984, 1989, 1995, 2016). They further advanced to the 1981 and 1984 Final Fours; in the former winning the last NCAA third place game ever played, defeating No. 1 LSU 78--74. The Cavaliers won the post-season NIT Tournaments of 1980 and 1992.\nQuestion: has university of virginia ever won the ncaa tournament?\nAnswer:"}
+{"id": 41, "inputs_pretokenized": "Chiko Roll -- A Chiko Roll's filling is primarily cabbage and barley, as well as carrot, green beans, beef, beef tallow, wheat cereal, celery and onion. This filling is partially pulped and enclosed in a thick egg and flour pastry tube designed to survive handling at football matches. The roll is typically deep-fried in vegetable oil.\nQuestion: is there any meat in a chiko roll?\nAnswer:"}
+{"id": 42, "inputs_pretokenized": "Pupil -- The pupil is a hole located in the center of the iris of the eye that allows light to strike the retina. It appears black because light rays entering the pupil are either absorbed by the tissues inside the eye directly, or absorbed after diffuse reflections within the eye that mostly miss exiting the narrow pupil.\nQuestion: is your pupil a hole in your eye?\nAnswer:"}
+{"id": 43, "inputs_pretokenized": "Interleague play -- Interleague play in Major League Baseball refers to regular-season baseball games played between an American League (AL) team and a National League (NL) team. Interleague play was first introduced in the 1997 Major League Baseball season. Prior to that, matchups between AL teams and NL teams occurred only during spring training, the All-Star Game, other exhibition games (such as the Hall of Fame Game in Cooperstown, New York), and the World Series. Unlike modern interleague play, none of these contests, except for the World Series, counted toward official team or league records.\nQuestion: does the national league play the american league in the world series?\nAnswer:"}
+{"id": 44, "inputs_pretokenized": "Steel-toe boot -- A steel-toe boot (also known as a safety boot, steel-capped boot or safety shoe) is a durable boot or shoe that has a protective reinforcement in the toe which protects the foot from falling objects or compression, usually combined with a mid sole plate to protect against punctures from below.\nQuestion: are steel toe boots made to cut toes off?\nAnswer:"}
+{"id": 45, "inputs_pretokenized": "51st state -- Voters in Washington, D.C. and Puerto Rico have both voted for statehood in referendums. As statehood candidates, their admission to the Union requires congressional approval. American Samoa, Guam, the Northern Mariana Islands, and the United States Virgin Islands are also U.S. territories and could potentially become U.S. states someday.\nQuestion: is puerto rico the 51st state of the united states?\nAnswer:"}
+{"id": 46, "inputs_pretokenized": "List of The Waltons characters -- Mary Ellen (Judy Norton Taylor) is the oldest of Liv and John's daughters, born in April 1920, aged 13 in season one. Throughout the first few seasons, she is a typically whiny, sometimes rebellious teenager, somewhat of a tomboy who enjoys playing baseball, but could also be vain, engaging in a rivalry with rich-girl Martha-Rose Coverdale for the affections of the awkward G.W. Haines (David Doremus). Mary Ellen matures into a wiser young woman and her childish fantasy of becoming a movie star gives way for a more reasonable and realistic ambition to go into medicine after reading up on it and developing an interest. She then works to gain an education as a medical worker, and becomes a nurse. However, when she ends up taking care of the people out in the country by herself, she concludes they need more medical expertise than she can offer them and continues studying medicine until she succeeds in becoming a fully-fledged doctor. Even though some people frown upon female doctors and she receives mixed support from her family, she refuses to let this stop her. Mary Ellen has a special relationship with each of her six siblings, but she is especially close to her younger sister Erin. Mary Ellen and Erin fought a lot when they were younger girls, particularly in seasons 1 and 2. But in the middle seasons, Mary Ellen and Erin matured and became friends. In season 5 after Mary Ellen married Curt, her relationship with her sister deepened even further and by the end of the show, they truly did become each other's best friend.\nQuestion: does mary ellen become a doctor on the waltons?\nAnswer:"}
+{"id": 47, "inputs_pretokenized": "Switched at Birth (film) -- Switched at Birth is a 1991 American television film directed by Waris Hussein. It is based on the true story of Kimberly Mays and Arlena Twigg, babies switched soon after birth in a Florida hospital in 1978.\nQuestion: is switched at birth based on a real story?\nAnswer:"}
+{"id": 48, "inputs_pretokenized": "Pine oil -- Pine oil is distinguished from other products from pine, such as turpentine, the low-boiling fraction from the distillation of pine sap, and rosin, the thick tar remaining after turpentine is distilled.\nQuestion: is pine oil and turpentine the same thing?\nAnswer:"}
+{"id": 49, "inputs_pretokenized": "Mayfly -- Mayflies (also known as Canadian soldiers in the United States, or shadflies or fishflies in Canada and Michigan; also up-winged flies in the United Kingdom ) are aquatic insects belonging to the order Ephemeroptera. This order is part of an ancient group of insects termed the Palaeoptera, which also contains dragonflies and damselflies. Over 3,000 species of mayfly are known worldwide, grouped into over 400 genera in 42 families.\nQuestion: are canadian soldiers and mayflies the same thing?\nAnswer:"}
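Each record above is a BoolQ-style prompt stored under `inputs_pretokenized`. It has three parts: a `Title -- passage` paragraph, a `Question:` line, and a trailing `Answer:` cue. The sketch below splits one such prompt back into passage and question, which can help when inspecting calibration data. `split_boolq_prompt` is a hypothetical helper, not part of this repository, and the sketch assumes Python 3.9+ for `str.removesuffix`.

```python
import json


def split_boolq_prompt(prompt: str) -> tuple[str, str]:
    """Split 'Title -- passage\nQuestion: q?\nAnswer:' into (passage, question)."""
    passage, _, rest = prompt.partition("\nQuestion: ")
    question = rest.removesuffix("\nAnswer:")
    return passage, question


# Two records in the same shape as the lines above (passages shortened).
SAMPLE_LINES = [
    '{"id": 47, "inputs_pretokenized": "Switched at Birth (film) -- ...\\nQuestion: is switched at birth based on a real story?\\nAnswer:"}',
    '{"id": 48, "inputs_pretokenized": "Pine oil -- ...\\nQuestion: is pine oil and turpentine the same thing?\\nAnswer:"}',
]

records = [json.loads(line) for line in SAMPLE_LINES]
for record in records:
    passage, question = split_boolq_prompt(record["inputs_pretokenized"])
    print(record["id"], question)
```

The calibration step itself only consumes the full `inputs_pretokenized` string. The split is purely for inspection.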
--- a/mindie/examples/convert/model_slim/get_calibration_dataset.py
+++ b/mindie/examples/convert/model_slim/get_calibration_dataset.py
@ -0,0 +1,12 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+import json
+
+
+def load_jsonl(dataset_path, key_name='inputs_pretokenized'):
+    dataset = []
+    with open(dataset_path, encoding='utf-8') as file:
+        for line in file:
+            data = json.loads(line)
+            text = data[key_name]
+            dataset.append(text)
+    return dataset
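`load_jsonl` parses every line, so a blank line (for example, a trailing empty line) makes `json.loads` raise. Separately, `quantifier.py` exposes a `--calib_dataset_length` option for the maximum calibration dataset length. The minimal sketch below shows a loader that skips blank lines and stops after `max_length` texts. `load_jsonl_capped` is a hypothetical name, not a function in this repository.

```python
import json
import os
import tempfile


def load_jsonl_capped(dataset_path, key_name="inputs_pretokenized", max_length=None):
    """Hypothetical variant of load_jsonl that skips blank lines and caps the result."""
    texts = []
    with open(dataset_path, encoding="utf-8") as file:
        for line in file:
            if max_length is not None and len(texts) >= max_length:
                break
            if not line.strip():
                # json.loads would raise on an empty line, so skip it.
                continue
            texts.append(json.loads(line)[key_name])
    return texts


# Usage: write a small calibration file that ends with a blank line.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "calib.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for i in range(3):
            f.write(json.dumps({"id": i, "inputs_pretokenized": f"text {i}"}) + "\n")
        f.write("\n")
    capped = load_jsonl_capped(path, max_length=2)
    full = load_jsonl_capped(path)
```

Checking the cap before parsing each line means `max_length=0` returns an empty list rather than one text.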
--- a/mindie/examples/convert/model_slim/quantifier.py
+++ b/mindie/examples/convert/model_slim/quantifier.py
@ -0,0 +1,176 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+import os
+import argparse
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
+from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlier, AntiOutlierConfig
+from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
+
+
+from examples.convert.convert_utils import copy_tokenizer_files, modify_config
+from examples.convert.model_slim.get_calibration_dataset import load_jsonl
+
+
+CPU = "cpu"
+NPU = "npu"
+
+
+def cmd_bool(cmd_arg):
+    if cmd_arg == "True":
+        return True
+    elif cmd_arg == "False":
+        return False
+    raise ValueError(f"{cmd_arg} should be a boolean")
+
+
+def parse_arguments():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--model_path', help="model and tokenizer path")
+    parser.add_argument('--save_directory')
+    parser.add_argument(
+        '--calib_texts',
+        type=str,
+        nargs='+',
+        default=["What's deep learning?"])
+    parser.add_argument(
+        '--calib_file',
+        type=str,
+        help='JSONL file whose "inputs_pretokenized" field holds the calibration texts. Takes precedence over --calib_texts.',
+        default=os.path.join(os.path.dirname(__file__), 'teacher_qualification.jsonl'))
+    parser.add_argument(
+        '--calib_dataset_length',
+        type=int,
+        help='Max calibration dataset length.',
+        default=50)
+    parser.add_argument('--w_bit', type=int, default=8)
+    parser.add_argument('--a_bit', type=int, default=8)
+    parser.add_argument('--disable_names', type=str, nargs='+', default=None)
+    parser.add_argument('--device_type', type=str, choices=[CPU, NPU], default=CPU)
+    parser.add_argument('--fraction', type=float, default=0.01)
+    parser.add_argument("--act_method", type=int, choices=[1, 2, 3], default=1,
+                        help=" `1`: `MinMax`, `2`: `Histogram`, `3`: `Auto`")
+    parser.add_argument('--co_sparse', type=cmd_bool, default=False)
+    parser.add_argument('--anti_method', type=str, default='', help=" `m3`: `AWQ`")
+    parser.add_argument('--disable_level', type=str, default='L0')
+    parser.add_argument('--input_ids_name', type=str, default='input_ids')
+    parser.add_argument('--attention_mask_name', type=str, default='attention_mask')
+    parser.add_argument('--do_smooth', type=cmd_bool, default=False)
+    parser.add_argument('--use_sigma', type=cmd_bool, default=False)
+    parser.add_argument('--sigma_factor', type=float, default=3.0)
+    parser.add_argument('--is_lowbit', type=cmd_bool, default=False)
+    parser.add_argument('--mm_tensor', type=cmd_bool, default=True)
+    parser.add_argument('--w_sym', type=cmd_bool, default=True)
+    parser.add_argument('--use_kvcache_quant', type=cmd_bool, default=False)
+    parser.add_argument('--open_outlier', type=cmd_bool, default=True)
+    parser.add_argument('--group_size', type=int, default=64)
+    return parser.parse_args()
+
+
+class Quantifier:
+    def __init__(self, model_path_or_name, quant_config=None, anti_outlier_config=None, device_type='cpu', **kwargs):
+        self.device_type = device_type
+        device_map = CPU if self.device_type == CPU else "auto"
+
+        self.quant_config = quant_config
+        self.anti_outlier_config = anti_outlier_config
+        self.model_path_or_name = model_path_or_name
+        self.config = AutoConfig.from_pretrained(self.model_path_or_name, trust_remote_code=True)
+        self.dtype = self.config.torch_dtype if self.device_type == NPU else torch.float32
+        self.model = AutoModelForCausalLM.from_pretrained(
+            pretrained_model_name_or_path=model_path_or_name,
+            low_cpu_mem_usage=True, torch_dtype=self.dtype,
+            device_map=device_map,
+            use_safetensors=True, trust_remote_code=True)
+
+        tokenizer_args = kwargs.get("tokenizer_args", {})
+        self.tokenizer = AutoTokenizer.from_pretrained(
+            model_path_or_name, use_fast=False, trust_remote_code=True, legacy=False, **tokenizer_args
+        )
+
+    def get_tokenized_data(self, input_texts,
+                           input_ids_name='input_ids',
+                           attention_mask_name='attention_mask'):
+        tokenized_data = []
+        for input_text in input_texts:
+            inputs = self.tokenizer(input_text, return_tensors='pt', padding=True).to(self.device_type)
+            tokenized_data.append(
+                [inputs.data[input_ids_name], inputs.data[attention_mask_name]])
+        return tokenized_data
+
+    def convert(self, tokenized_data, save_path, disable_level):
+        if self.device_type == NPU:
+            # Avoid online (JIT) operator compilation; use the prebuilt binary operators instead
+            torch.npu.set_compile_mode(jit_compile=False)
+
+        if self.anti_outlier_config is not None:
+            anti_outlier = AntiOutlier(self.model, calib_data=tokenized_data, cfg=self.anti_outlier_config)
+            anti_outlier.process()
+
+        os.makedirs(save_path, exist_ok=True)
+
+        calibrator = Calibrator(self.model, self.quant_config, calib_data=tokenized_data, disable_level=disable_level)
+        calibrator.run()
+        calibrator.save(save_path, save_type=["safe_tensor"])
+
+
+if __name__ == '__main__':
+    args = parse_arguments()
+    rank = int(os.getenv("RANK", "0"))
+
+    calib_file = args.calib_file
+    calib_texts = load_jsonl(calib_file)[:args.calib_dataset_length] if calib_file else args.calib_texts
+    model_path = args.model_path
+    save_directory = args.save_directory
+
+    quant_conf = QuantConfig(
+        w_bit=args.w_bit,
+        a_bit=args.a_bit,
+        disable_names=args.disable_names,
+        dev_type=args.device_type,
+        dev_id=rank,
+        act_method=args.act_method,
+        pr=1.0,  # randseed
+        nonuniform=False,
+        w_sym=args.w_sym,
+        mm_tensor=False,
+        co_sparse=args.co_sparse,
+        fraction=args.fraction,
+        sigma_factor=args.sigma_factor,
+        use_sigma=args.use_sigma,
+        is_lowbit=args.is_lowbit,
+        do_smooth=args.do_smooth,
+        use_kvcache_quant=args.use_kvcache_quant,
+        open_outlier=args.open_outlier,
+        group_size=args.group_size
+    )
+    anti_outlier_config = None
+    if args.anti_method == 'm3':
+        anti_outlier_config = AntiOutlierConfig(
+            a_bit=args.a_bit, w_bit=args.w_bit,
+            anti_method=args.anti_method, w_sym=args.w_sym, dev_type=args.device_type)
+    elif args.anti_method:
+        anti_outlier_config = AntiOutlierConfig(anti_method=args.anti_method)
+    quantifier = Quantifier(
+        model_path, quant_conf, anti_outlier_config,
+        device_type=args.device_type
+    )
+    tokenized_calib_data = None
+    if calib_texts is not None:
+        tokenized_calib_data = quantifier.get_tokenized_data(
+            calib_texts,
+            input_ids_name=args.input_ids_name,
+            attention_mask_name=args.attention_mask_name
+        )
+
+    if not os.path.exists(save_directory):
+        os.makedirs(save_directory, exist_ok=True)
+    quantifier.convert(tokenized_calib_data, save_directory, args.disable_level)
+    quant_type = f"w{args.w_bit}a{args.a_bit}"
+    # The tool's sparse quantization is invoked with w_bit=4, a_bit=8; relabel the output as w8a8s for now.
+    is_sparseCompress = args.w_bit == 4 and args.a_bit == 8 and (args.co_sparse or args.is_lowbit)
+    if is_sparseCompress:
+        quant_type = "w8a8s"
+    auto_config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
+    modify_config(model_path, save_directory, auto_config.torch_dtype,
+                  quant_type, args.use_kvcache_quant)
+    copy_tokenizer_files(model_path, save_directory)
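The quant-type bookkeeping at the end of `quantifier.py` can be checked in isolation. The function below restates that logic as a hypothetical extraction; the script does not define a helper like this. Its return value is what gets passed to `modify_config`. For the combinations the README lists, it yields the matching `quantize` value, such as `w8a8`, `w8a16`, or `w8a8s`.

```python
def resolve_quant_type(w_bit, a_bit, co_sparse=False, is_lowbit=False):
    """Restates the quant_type logic from quantifier.py's __main__ block."""
    quant_type = f"w{w_bit}a{a_bit}"
    # Sparse quantization is run with w_bit=4, a_bit=8, but the saved
    # weights are labelled "w8a8s" (the README's quantize column).
    if w_bit == 4 and a_bit == 8 and (co_sparse or is_lowbit):
        quant_type = "w8a8s"
    return quant_type


for combo in [(8, 8), (8, 16), (4, 8, True)]:
    print(combo, "->", resolve_quant_type(*combo))
# (8, 8) -> w8a8
# (8, 16) -> w8a16
# (4, 8, True) -> w8a8s
```

Plain `w4a8` (neither `--co_sparse` nor `--is_lowbit`) is passed through unchanged, and the README's table does not list it.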
--- a/mindie/examples/convert/model_slim/sparse_compressor.py
+++ b/mindie/examples/convert/model_slim/sparse_compressor.py
@ -0,0 +1,94 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+import argparse
+import os
+import torch
+from atb_llm.runner import ModelRunner
+from atb_llm.utils.cpu_binding import NpuHbmInfo
+from atb_llm.utils.log import logger, print_log
+from atb_llm.models.base.model_utils import unwrap_model_state_dict
+
+from msmodelslim.pytorch.weight_compression import CompressConfig, Compressor
+from examples.convert.convert_utils import copy_tokenizer_files, modify_config
+
+
+class SparseCompressor:
+    def __init__(self, **kwargs):
+        self.rank = kwargs.get('rank', 0)
+        self.world_size = kwargs.get('world_size', 1)
+
+        self.model_path = kwargs.get('model_path', None)
+        self.save_directory = kwargs.get('save_directory', None)
+        self.multiprocess_num = kwargs.get('multiprocess_num', 8)
+        self.save_split_w8a8s_dir = kwargs.get('save_split_w8a8s_dir', None)
+
+        self.model = ModelRunner(self.model_path, rank=self.rank, world_size=self.world_size)
+        self.dtype = self.model.dtype
+        self.quantize = self.model.quantize
+        self.model.load_weights()
+
+        self.device = self.model.device
+        self.max_memory = NpuHbmInfo.get_hbm_capacity(self.rank, self.world_size, self.model.soc_info.need_nz)
+        self.init_memory = int(
+            self.max_memory * NpuHbmInfo.get_hbm_usage(self.rank, self.world_size, self.model.soc_info.need_nz))
+        print_log(self.rank, logger.info, f'hbm_capacity(GB): {self.max_memory / (1024 ** 3)}, '
+                                          f'init_memory(GB): {self.init_memory / (1024 ** 3)}')
+
+        self.warm_up_memory = 0
+        self.warm_up_num_blocks = 0
+        self.cache_manager = None
+
+        if self.save_split_w8a8s_dir is not None:
+            split_directory = f'{self.save_split_w8a8s_dir}_{self.world_size}'
+            self.model.save_pretrained(save_directory=split_directory, safe_serialization=True)
+            modify_config(self.model_path, split_directory, torch.float16, 'w8a8s')
+            copy_tokenizer_files(self.model_path, split_directory)
+
+    def compress(self):
+        model_dict = unwrap_model_state_dict(self.model.model.state_dict())
+        quant_desc = self.model.model.generate_description()
+        compress_config = CompressConfig(do_pseudo_sparse=False, sparse_ratio=1, is_debug=True,
+                                         record_detail_root=self.save_directory,
+                                         multiprocess_num=self.multiprocess_num)
+        compressor = Compressor(compress_config, weight=model_dict, quant_model_description=quant_desc)
+        compressor.run()
+        part_save_directory = os.path.join(self.save_directory, f'part{self.rank}-of-{self.world_size}')
+        os.makedirs(part_save_directory, exist_ok=True)
+        compressor.export_safetensors(part_save_directory)
+
+
+def parse_arguments():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--model_path',
+                        help="model and tokenizer path",
+                        default='/data/acltransformer_testdata/weights/llama2/llama-2-70b',
+                        )
+    parser.add_argument('--save_directory', type=str, required=True)
+    parser.add_argument('--multiprocess_num', type=int, default=8)
+    parser.add_argument('--save_split_w8a8s_dir', type=str, default=None)
+
+    return parser.parse_args()
+
+
+if __name__ == '__main__':
+    args = parse_arguments()
+
+    rank = int(os.getenv("RANK", "0"))
+    world_size = int(os.getenv("WORLD_SIZE", "1"))
+    input_dict = {
+        'rank': rank,
+        'world_size': world_size,
+        **vars(args)
+    }
+
+    model_path = args.model_path
+    save_directory = args.save_directory
+    if not os.path.exists(save_directory):
+        os.makedirs(save_directory, exist_ok=True)
+
+    sparse_compressor = SparseCompressor(**input_dict)
+
+    sparse_compressor.compress()
+
+    if rank == 0:
+        modify_config(model_path, save_directory, torch.float16, 'w8a8sc')
+        copy_tokenizer_files(model_path, save_directory)
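`sparse_compressor.py` runs once per rank and reads `RANK` and `WORLD_SIZE` from the environment. Each rank exports its compressed shard to `part{rank}-of-{world_size}` under `--save_directory`. The optional split W8A8S copy goes to `{save_split_w8a8s_dir}_{world_size}`. The sketch below reproduces only that naming scheme, so the expected output layout can be checked without an NPU. The helper names are hypothetical and do not exist in the repository.

```python
import os


def rank_and_world_size(env=None):
    """Read RANK/WORLD_SIZE the same way the script does (defaults 0 and 1)."""
    env = os.environ if env is None else env
    return int(env.get("RANK", "0")), int(env.get("WORLD_SIZE", "1"))


def part_directory(save_directory, rank, world_size):
    """Per-rank output directory used by SparseCompressor.compress()."""
    return os.path.join(save_directory, f"part{rank}-of-{world_size}")


def split_w8a8s_directory(save_split_w8a8s_dir, world_size):
    """Directory that receives the split W8A8S weights when --save_split_w8a8s_dir is set."""
    return f"{save_split_w8a8s_dir}_{world_size}"


rank, world_size = rank_and_world_size({"RANK": "1", "WORLD_SIZE": "4"})
print(part_directory("w8a8sc_out", rank, world_size))
print(split_w8a8s_directory("w8a8s_split", world_size))
print([part_directory("w8a8sc_out", r, world_size) for r in range(world_size)])
```

After compression, only rank 0 calls `modify_config(..., 'w8a8sc')` and copies the tokenizer files into the top-level `--save_directory`.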
--- a/mindie/examples/convert/model_slim/teacher_qualification.jsonl
+++ b/mindie/examples/convert/model_slim/teacher_qualification.jsonl
@ -0,0 +1,44 @@
+{"id": 0, "inputs_pretokenized": "编写中小学教科书的直接依据是____。\nA. 《中华人民共和国教育法》\nB. 课程计划\nC. 课程标准\nD. 课程表", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 1, "inputs_pretokenized": "下列关于课程的三种文本表现形式说法正确的是____\nA. 课程计划是由当地教育主管部门制订的\nB. 课程标准是依据课程计划制定的\nC. 课程标准的核心是实施建议\nD. 教材编写的基本方式有直线式、螺旋式、交叉式", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 2, "inputs_pretokenized": "悦悦是一名右耳失聪的残疾儿童，活动课上有时会听不清楚周老师所讲的内容，因此经常提问题。对此，周老师应当采取的措施是____。\nA. 给予悦悦更多的帮助和指导\nB. 指导家长带悦悦回家自学\nC. 建议家长将悦悦转到特殊幼儿园\nD. 照顾大多数幼儿，不理会悦悦", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 3, "inputs_pretokenized": "内流河也称“内陆河”，是指没有流入海洋的河流，大多分布在大陆内部干燥地区，上游降水或冰雪融水为其主要补给水源，最终消失于沙漠或注入内陆湖泊。下列中国内流河中，最长的是____。\nA. 塔里木河\nB. 柴达木河\nC. 尼雅河\nD. 疏勒河", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 4, "inputs_pretokenized": "学校规定学生不能烫染头发，但是小文为了彰显个性，在假期把头发染成了棕色。面对小文的情况，教师应该怎样处理？____\nA. 年轻人追求个性是合情合理的，应该宽容对待\nB. 违反学校的校规，应该严格处分\nC. 强制要求小文将头发颜色染回来才可以进校门\nD. 探明小文违反校规的原因，并对其进行劝导和教育", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 5, "inputs_pretokenized": "张老师根据自己班级的情况，为解决班级内部班干部的人际关系问题，建立和谐融洽的班级氛围，自主开发了“和谐人际”的班级课程，这体现了教师____。\nA. 是教育教学的研究者\nB. 是课程的建设者和开发者\nC. 是学生学习的促进者\nD. 是社区型的开放教师", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 6, "inputs_pretokenized": "刘老师工作很负责，学生在学校出现一点问题他就会与家长联系，在与家长沟通时他经常以前辈的姿态对待家长，对家长的教育方式指指点点。刘老师的做法____。\nA. 正确，老师就应该与家长经常沟通\nB. 正确，老师的经验比家长丰富，应该多指导家长\nC. 不正确，教师没有权利指导家长\nD. 不正确，教师应该与家长建立平等的沟通关系，尊重家长的人格", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 7, "inputs_pretokenized": "在古代印度，有一户人家经营一家棉布店销售自己手工制作的衣服。你认为这户人家属于哪个等级？____\nA. 婆罗门\nB. 刹帝利\nC. 吠舍\nD. 首陀罗", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 8, "inputs_pretokenized": "“小型分散，便于开展多种多样的活动，满足学生不同的兴趣、爱好，发展学生的才能，使学生得到更多的学习和锻炼的机会。”这种课外活动的形式是____。\nA. 科技活动\nB. 学科活动\nC. 个人活动\nD. 小组活动", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 9, "inputs_pretokenized": "小红每天晚上临睡前都要多次反复检查自己的书包，确保带齐了第二天需要用的教材和文具。她明知道没有这个必要，但就是控制不住。她可能出现了____。\nA. 抑郁症\nB. 焦虑症\nC. 强迫症\nD. 恐惧症", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 10, "inputs_pretokenized": "国家管理和评价课程的基础是____。\nA. 课程计划\nB. 课程标准\nC. 教学目标\nD. 教育目的", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 11, "inputs_pretokenized": "儿童坚持性发生明显质变的年龄约在____\nA. 3～4岁\nB. 4～5岁\nC. 5～6岁\nD. 6岁以后", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 12, "inputs_pretokenized": "《红楼梦》中人物众多、关系繁杂。为了帮助读者阅读，许多红学爱好者都在网络上发布了自己整理制作的主要人物关系图。这属于____。\nA. 纲要策略\nB. 精细加工策略\nC. 资源管理策略\nD. 监控策略", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 13, "inputs_pretokenized": "学期结束时，班主任王老师会对学生思想品德的发展变化情况进行评价。这项工作属于____。\nA. 工作总结\nB. 工作计划\nC. 操行评定\nD. 建立学生档案", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 14, "inputs_pretokenized": "人们常说：“教学有法而教无定法。”这反映了教师的劳动具有____。\nA. 连续性\nB. 示范性\nC. 长期性\nD. 创造性", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 15, "inputs_pretokenized": "县级以上地方各级人民代表大会是县级以上地方国家权力机关，其职权不包括____。\nA. 改变或撤销本级人大常务委员会不适当的决定\nB. 选举并有权罢免本级人民法院院长\nC. 批准本行政区域内的预算执行情况的报告\nD. 决定并宣布下一级行政区城进入紧急状态", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 16, "inputs_pretokenized": "在心理健康课上，同一批学生在第二次进行同样内容的人格测验时获得的分数与上次测验差别较大。这说明该测验存在的问题是____。\nA. 信度问题\nB. 效度问题\nC. 难度问题\nD. 区分度问题", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 17, "inputs_pretokenized": "李老师在教学生区分形近字“渴”“竭”“碣”“谒”时，将四个字相同的右半部分用白色粉笔写出，相异的左半部分用彩色粉笔写出。李老师运用了知觉的____。\nA. 整体性\nB. 选择性\nC. 理解性\nD. 恒常性", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 18, "inputs_pretokenized": "兰兰学会走路后,就要很喜欢尝试自己穿衣、吃饭、捡东西,喜欢探索周围世界。按照埃里克森人格发展阶段理论,兰兰所处的发展阶段是____\nA. 信任对怀疑\nB. 自立对羞怯\nC. 主动感对内疚感\nD. 勤奋感对自卑感", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 19, "inputs_pretokenized": "杨老师在教授生字词的过程中发现部分学生有缺笔少画的现象，于是他把“小学生缺笔少画现象的原因及对策研究”作为研究课题，拟订相应的研究计划，在工作中收集、整理相关资料并实施教学措施，最后根据反馈信息调整教学方案。这种研究方法属于____。\nA. 教育行动研究法\nB. 教育实验法\nC. 教育叙事研究法\nD. 个案研究法", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 20, "inputs_pretokenized": "小青的数学成绩不好，她认为这是因为自己脑子笨，不是学数学的料。她的这种归因属于____。\nA. 内部、稳定，不可控的归因\nB. 外部、稳定、可控的归因\nC. 内部、不稳定，可控的归因\nD. 外部，不稳定，不可控的归因", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 21, "inputs_pretokenized": "中小学教科书不同于其他任何书籍的基本特点是内容的____。\nA. 准确性\nB. 示范性\nC. 新颖性\nD. 基础性", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 22, "inputs_pretokenized": "王老师在课堂上给学生演示了与知识点有关的几个实验。这属于____。\nA. 实物直观\nB. 模象直观\nC. 言语直观\nD. 思维直观", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 23, "inputs_pretokenized": "在Excel中，单元格A1, A2, A3中的内容依次为数值1，2，3，单元格A4中的内容为字符前添加了英文单撇号“，”的文本字符“3”，在单元格A5的编辑栏输入公式“=COUNT( A1：A4) +12”并点击回车键，A5单元格的内容为____。\nA. 15\nB. 21\nC. 12\nD. 18", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 24, "inputs_pretokenized": "唐朝时形成了“父教其子，子教其弟”“五尺童子耻不言文墨焉”的社会风尚，它的形成主要得益于____。\nA. 社会经济的繁荣\nB. 科举制度的推行\nC. 学校体系的完备\nD. 三省六部制的确立", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 25, "inputs_pretokenized": "教导处的刘老师抓到两名学生藏在厕所里偷偷抽烟，于是把他们叫到办公室，慢悠悠地点燃了一根香烟，准备耐心细致地给他们做思想工作。对此，以下说法错误的是____。\nA. 刘老师既禁止学生抽烟，又能耐心劝导，严慈相济，真正做到了关爱学生\nB. 刘老师要求学生不要抽烟，却在学生面前抽烟，违背了为人师表的要求\nC. 刘老师的抽烟行为与他教导学生不能抽烟的言词相悖，很容易损害自己的威信\nD. 刘老师的行为表明教师队伍中存在一些教师需要对其加强师风师德建设的", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 26, "inputs_pretokenized": "小班幼儿看木偶剧表演时，看到“老虎”会感到害怕。这说明幼儿的____\nA. 想象脱离现实\nB. 想象与现实混淆\nC. 想象容易受情绪影响\nD. 想象内容零散", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 27, "inputs_pretokenized": "有的成语与历史人物密切相关。下列选项中，与“狡兔三窟”相关的历史人物是____。\nA. 管仲与齐桓公\nB. 毛遂与平原君\nC. 冯谖与孟尝君\nD. 曹刿与鲁庄公", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 28, "inputs_pretokenized": "王浩同学活动过多、注意力不集中、冲动行为多。这种心理障碍可能是____。\nA. 多动综合征\nB. 学习困难综合征\nC. 儿童厌学症\nD. 儿童强迫行为", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 29, "inputs_pretokenized": "在对班级学生进行教育时，班主任李老师引导学生对自己每日的学习、行为进行反省。李老师主要运用的德育方法是____。\nA. 自我修养法\nB. 榜样示范法\nC. 实践锻炼法\nD. 情感陶冶法", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 30, "inputs_pretokenized": "在讲解方程时，王老师先讲一元一次方程，再讲二元一次方程，然后讲一元二次方程，逐步加深难度。这种教学方式所遵循的原则是____。\nA. 理论联系实际原则\nB. 启发性原则\nC. 循序渐进原则\nD. 巩固性原则", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 31, "inputs_pretokenized": "近代原子核物理学之父是____。\nA. 普朗克\nB. 卢瑟福\nC. 玻尔\nD. 霍金", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 32, "inputs_pretokenized": "很多人因为有了受教育的机会而得到了和父辈完全不同的人生发展机遇。这说明教育在人的发展中起到____。\nA. 辅助作用\nB. 决定作用\nC. 次要作用\nD. 主导作用", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 33, "inputs_pretokenized": "下面是中国古代四大名著中的人物与情节，其中搭配不当的一项是____。\nA. 鲁智深——倒拔垂杨柳\nB. 孙悟空——大闹天宫\nC. 周瑜——三顾茅庐\nD. 刘姥姥——进大观园", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 34, "inputs_pretokenized": "找规律填数字是一项很有趣的活动，特别锻炼观察和思考能力。下列选项中，填入数列“1、7、8、57、____、26050”空缺处的数字，符合该组数字排列规律的是____。\nA. 456\nB. 457\nC. 458\nD. 459", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 35, "inputs_pretokenized": "教育自身的许多规律，是人类长期教育实践认识的结果，它们不会因政治经济制度和其他文化的发展而过时，更不会随时代的发展而被否定。这说明教育具有____。\nA. 历史性\nB. 永恒性\nC. 阶级性\nD. 相对独立性", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 36, "inputs_pretokenized": "高中毕业会考是一种达标考试，属于____。\nA. 定量评价\nB. 相对性评价\nC. 形成性评价\nD. 绝对性评价", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 37, "inputs_pretokenized": "下列选项中，与“图书”和“音乐书”的逻辑关系相同的一组是____。\nA. “钢笔”和“铅笔”\nB. “蛋糕”和“香油”\nC. “水果”和“西瓜”\nD. “白菜”和“黄瓜”", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 38, "inputs_pretokenized": "语文教师裴老师每天下课后都会对自己一天的工作进行总结反思，并记录下来。这属于布鲁巴奇反思方法中的____。\nA. 反思日记\nB. 详细描述\nC. 交流讨论\nD. 行动研究", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 39, "inputs_pretokenized": "以下关于幼儿有意注意发展的表述，不正确的是____\nA. 幼儿有意注意发展受大脑发育水平局限\nB. 幼儿有意注意的发展水平较低，无法依靠活动和操作来维持\nC. 幼儿在幼儿园需要遵守各种行为规则，完成各项任务，这都需要幼儿形成或发展有意注意\nD. 教师在组织活动时，要求幼儿保持注意的对象应该是幼儿认知范围以内或幼儿易于理解的事物", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 40, "inputs_pretokenized": "某幼儿园根据幼儿的发展情况将班级分为快班、中班和慢班。对于快班的幼儿安排大量优秀师资和先进设备，而对于慢班的幼儿则给予较少的优良教育资源。该幼儿园的做法违背了素质教育内涵中的____。\nA. 以提高国民素质为基本宗旨\nB. 面向全体幼儿\nC. 促进幼儿全面发展\nD. 促进幼儿个性发展", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 41, "inputs_pretokenized": "作为古埃及文明的象征之一，____既寄托了古埃及人对死后重生的向往，又证明了新一代法老王权统治的神圣不可侵犯，充分显示了古埃及人的高度智慧和精湛的建筑艺术。\nA. 金字塔\nB. 帕特农神庙\nC. 圆形竞技场\nD. 麦加清真寺", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 42, "inputs_pretokenized": "在太阳系的八大行星中，质量最大和最小的行星分别是____。\nA. 木星；水星\nB. 火星；地球\nC. 金星；水星\nD. 土星；天王星", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 43, "inputs_pretokenized": "据调查，教师对学生拳打脚踢的情况现在已经较少存在，取而代之的是“心罚”。比如，对于成绩不好的学生罚做题目、罚抄单词一百遍。教师这样的行为____。\nA. 是正确的，教育中适当的惩罚是必不可少的\nB. 是正确的，教师没有侵犯学生的身体健康\nC. 是不正确的，教师没能做到依法执教\nD. 是不正确的，教师没能做到团结合作", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
--- a/mindie/examples/input.jsonl
+++ b/mindie/examples/input.jsonl
@ -0,0 +1 @@
+[{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What's deep learning?"}, {"role": "assistant", "content": "Deep learning is a subset of machine learning that uses artificial neural networks to learn from data."}, {"role": "user", "content": "Can you explain in more detail?"}]
--- a/mindie/examples/models/aquila/README.md
+++ b/mindie/examples/models/aquila/README.md
@ -0,0 +1,181 @@
+# README
+
+- WuDao·Aquila (悟道·天鹰) is the first open-source large language model to combine Chinese-English bilingual knowledge, a license that permits commercial use, and compliance with domestic data regulations.
+
+- This repository implements Aquila inference on NPU hardware. Used together with the acceleration library, it aims to deliver maximum inference performance on NPUs.
+
+# Feature Matrix
+- This matrix lists the features supported by each Aquila model.
+
+| Model and size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | W4A16 quantization | KV cache quantization | Sparse quantization | MOE quantization | MindIE Service | TGI | Long sequence |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Aquila-7B | world size 1,2,4,8 | world size 1,2,4 | √ | × | √ | √ | × | × | × | × | × | × | × | × | × |
+| Aquila2-7B | world size 1,2,4,8 | world size 1,2,4 | √ | × | √ | √ | × | × | × | × | × | × | × | × | × |
+| Aquila2-34B | world size 4,8 | × | √ | × | √ | √ | × | × | × | × | × | × | × | × | × |
+
+- Model versions adapted in this repository:
+    - [FlagAI GitHub repository](https://github.com/FlagAI-Open/FlagAI/)
+
+# Usage
+
+## Path Variables
+| Variable    | Meaning |
+|-------------|---------|
+| working_dir | Directory where the acceleration library and model repository are placed after download |
+| llm_path    | Path of the model repository. With a prebuilt package it is `${working_dir}/MindIE-LLM/`; with code downloaded from gitee it is `${working_dir}/MindIE-LLM/examples/atb_models` |
+| script_path | Script directory; the working scripts for Aquila and Aquila2 are in `${llm_path}/examples/models/aquila` |
+| weight_path | Path of the model weights |
+
+## Weights
+**Downloading weights**
+- [Aquila-7B](https://huggingface.co/BAAI/Aquila-7B/tree/main)
+- [Aquila2-7B](https://huggingface.co/BAAI/Aquila2-7B/tree/main)
+- [Aquila2-34B](https://huggingface.co/BAAI/Aquila2-34B/tree/main)
+
+**Converting weights**
+- See [this README](../../README.md)
+
+**Generating quantized weights**
+- Quantized weights are generated from the original FP16 weights
+- W8A8 Antioutlier quantized weights: not supported yet
+- W8A8 quantized weights: not supported yet
+- W8A16 quantized weights: not supported yet
+- Sparse quantized weights: not supported yet
+
+**Basic environment variables**
+- See [this README](../../../README.md)
+
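+## Inference
+
+### Chat test
+**Running Flash Attention FP16**
+- All Aquila models are run as follows
+    - Run the launch script
+        - From the \${llm_path} directory, run
+          ```shell
+          bash ${script_path}/run_fa.sh ${weight_path}
+          ```
+    - Environment variables
+        - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+            - Specifies the logical NPU cores available on this machine, separated by commas
+            - See the "启动脚本相关环境变量" (launch script environment variables) section of [this README](../../README.md) for how to look up core IDs
+            - On 300I DUO cards, specify at least two visible cores to use one card with two chips, or at least four visible cores to use two cards with four chips
+            - See the Feature Matrix for the number of cores each model supports
+        - `export MASTER_PORT=20031`
+            - Sets the inter-card communication port
+            - Defaults to port 20031
+            - Avoids communication conflicts when several multi-card models run on the same machine at the same time
+            - Recommended port range: 20000-20050
+        - The following environment variables relate to performance and memory optimization and normally do not need to be changed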
+          ```shell
+          export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+          export INF_NAN_MODE_ENABLE=0
+          export ATB_OPERATION_EXECUTE_ASYNC=1
+          export TASK_QUEUE_ENABLE=1
+          export ATB_CONVERT_NCHW_TO_ND=1
+          export HCCL_BUFFSIZE=120
+          export HCCL_WHITELIST_DISABLE=1
+          export ATB_CONTEXT_WORKSPACE_RING=1
+          export ATB_CONTEXT_WORKSPACE_SIZE=2629145600
+          export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
+          export ATB_LAUNCH_KERNEL_WITH_TILING=0
+          export ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=1
+          export ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=0
+          ```
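Both launch scripts in this directory derive the process count by counting the commas in `ASCEND_RT_VISIBLE_DEVICES` and adding one, and pass that number to `torchrun --nproc_per_node`. A minimal Python sketch of the same arithmetic, plus a check of `MASTER_PORT` against the recommended 20000-20050 range; the helper names are hypothetical and not part of MindIE:

```python
import os


def world_size_from_devices(visible_devices: str) -> int:
    """Count devices the way run_fa.sh/run_pa.sh do: number of commas + 1."""
    return visible_devices.count(",") + 1


def check_master_port(port: int, low: int = 20000, high: int = 20050) -> bool:
    """Return True if the port lies in the recommended range (inclusive)."""
    return low <= port <= high


if __name__ == "__main__":
    devices = os.environ.get("ASCEND_RT_VISIBLE_DEVICES", "0,1,2,3,4,5,6,7")
    port = int(os.environ.get("MASTER_PORT", "20031"))
    print("world size:", world_size_from_devices(devices))
    print("port in recommended range:", check_master_port(port))
```

Like the shell version, this only counts entries; it does not check that the listed core IDs actually exist.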
+
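+**Running Flash Attention BF16**
+- Not supported yet
+
+**Running Flash Attention W8A8**
+- Not supported yet
+
+**Running Flash Attention W8A16**
+- Not supported yet
+
+**Running Paged Attention FP16**
+- Run the launch script
+    - From the \${llm_path} directory, run
+      ```shell
+      bash ${script_path}/run_pa.sh ${weight_path}
+      ```
+- Environment variables
+    - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+        - Specifies the logical NPU cores available on this machine, separated by commas
+        - See the "启动脚本相关环境变量" (launch script environment variables) section of [this README](../../README.md) for how to look up core IDs
+        - On 300I DUO cards, specify at least two visible cores to use one card with two chips, or at least four visible cores to use two cards with four chips
+        - See the Feature Matrix for the number of cores each model supports
+    - `export MASTER_PORT=20031`
+        - Sets the inter-card communication port
+        - Defaults to port 20031
+        - Avoids communication conflicts when several multi-card models run on the same machine at the same time
+        - Recommended port range: 20000-20050
+    - The following environment variables relate to performance and memory optimization and normally do not need to be changed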
+      ```shell
+      export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+      export INF_NAN_MODE_ENABLE=0
+      export ATB_OPERATION_EXECUTE_ASYNC=1
+      export TASK_QUEUE_ENABLE=1
+      export ATB_CONVERT_NCHW_TO_ND=1
+      export LCCL_ENABLE_FALLBACK=1
+      export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+      export ATB_CONTEXT_WORKSPACE_SIZE=0
+      ```
+
+**Running Paged Attention BF16**
+- Not supported yet
+
+**Running Paged Attention W8A8**
+- Not supported yet
+
+**Running Paged Attention W8A16**
+- Not supported yet
+
+**Running KV cache quantization**
+- Not supported yet
+
+**Running sparse quantization**
+- Not supported yet
+
+**Running MOE quantization**
+- Not supported yet
+
+## Accuracy testing
+- See [this README](../../../tests/modeltest/README.md)
+    - Example
+      ```shell
+      cd ${llm_path}/tests/modeltest
+      export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+      export MAX_MEMORY_GB=29
+      bash run.sh pa_fp16 full_BoolQ 1 aquila_7b ${aquila_7b_weight_path} 8
+      bash run.sh pa_fp16 full_BoolQ 1 aquila2_7b ${aquila2_7b_weight_path} 8
+      bash run.sh pa_fp16 full_BoolQ 1 aquila2_34b ${aquila2_34b_weight_path} 8
+      ```
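+    - MMLU accuracy test
+      - When testing Aquila models on the MMLU dataset on GPU, change the following settings:
+        1. In the config.json shipped with the open-source weights, set max_position_embeddings to a value greater than 3072
+        2. In the tokenizer_config.json shipped with the open-source weights, set model_max_length to 3072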
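The two MMLU changes can be scripted with the standard `json` module. A minimal sketch, assuming the weight directory holds the usual Hugging Face `config.json` and `tokenizer_config.json`; the helper name `patch_for_mmlu` and the choice of 4096 as "a value greater than 3072" are illustrative assumptions:

```python
import json
from pathlib import Path


def patch_for_mmlu(weight_dir: str, max_positions: int = 4096, max_length: int = 3072) -> None:
    """Raise max_position_embeddings above 3072 and set model_max_length to 3072."""
    if max_positions <= 3072:
        raise ValueError("max_position_embeddings must be greater than 3072")
    root = Path(weight_dir)

    # Model config: allow sequences longer than the MMLU prompt length
    config_path = root / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["max_position_embeddings"] = max_positions
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")

    # Tokenizer config: truncate/pad to exactly 3072 tokens
    tok_path = root / "tokenizer_config.json"
    tok_config = json.loads(tok_path.read_text(encoding="utf-8"))
    tok_config["model_max_length"] = max_length
    tok_path.write_text(json.dumps(tok_config, indent=2, ensure_ascii=False), encoding="utf-8")
```

Back up both files first; the sketch rewrites them in place.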
+
+## Performance testing
+- See [this README](../../../tests/modeltest/README.md)
+    - Example
+      ```shell
+      cd ${llm_path}/tests/modeltest
+      export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+      export MAX_MEMORY_GB=29
+      export ATB_LLM_BENCHMARK_ENABLE=1
+      bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 aquila_7b ${aquila_7b_weight_path} 8
+      bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 aquila2_7b ${aquila2_7b_weight_path} 8
+      bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 aquila2_34b ${aquila2_34b_weight_path} 8
+      ```
+
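+## FAQ
+- For more environment variables, see [this README](../../README.md)
+- The chat test actually runs the Python files `${llm_path}/examples/run_fa.py` and `${llm_path}/examples/run_pa.py`; their parameters are described in [this README](../../README.md)
+- Before running, check the protobuf version with `pip list | grep protobuf`; if it is newer than 3.20.x, downgrade it with `pip install protobuf==3.20.0`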
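The protobuf check in the last FAQ item can also be done from Python. A minimal sketch using the standard `importlib.metadata` module (Python 3.8+); treating anything newer than 3.20.x as needing the downgrade follows the FAQ note, and the function name is hypothetical:

```python
from importlib import metadata


def needs_protobuf_downgrade(version: str) -> bool:
    """Return True if the version is newer than 3.20.x (compares major.minor only)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) > (3, 20)


if __name__ == "__main__":
    try:
        installed = metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        print("protobuf is not installed")
    else:
        if needs_protobuf_downgrade(installed):
            print(f"protobuf {installed} found; run: pip install protobuf==3.20.0")
        else:
            print(f"protobuf {installed} is fine")
```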
--- a/mindie/examples/models/aquila/run_fa.sh
+++ b/mindie/examples/models/aquila/run_fa.sh
@ -0,0 +1,23 @@
+# Copyright (c) Huawei Technologies Co., Ltd. 2024-2024. All rights reserved.
+#
+export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export MASTER_PORT=20031
+
+# The following environment variables relate to performance and memory optimization and normally do not need to be changed
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export INF_NAN_MODE_ENABLE=0
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export ATB_CONVERT_NCHW_TO_ND=1
+export HCCL_BUFFSIZE=120
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+export ATB_CONTEXT_WORKSPACE_SIZE=0
+
+extra_param="--max_output_length=128"
+world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
+
+# TP_WORLD_SIZE, if set, overrides the device count derived from ASCEND_RT_VISIBLE_DEVICES
+if [ "${TP_WORLD_SIZE:-$world_size}" == "1" ]; then
+    python -m examples.run_fa --model_path "$1" $extra_param
+else
+    torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_fa --model_path "$1" $extra_param --input_text='假如你是小明，请给小红写一封情书？'
+fi
--- a/mindie/examples/models/aquila/run_pa.sh
+++ b/mindie/examples/models/aquila/run_pa.sh
@ -0,0 +1,24 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+# See README.md in the same directory for parameter settings and launch instructions
+export ASCEND_RT_VISIBLE_DEVICES=4,5
+export MASTER_PORT=20030
+
+# The following environment variables relate to performance and memory optimization and normally do not need to be changed
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export INF_NAN_MODE_ENABLE=0
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export ATB_CONVERT_NCHW_TO_ND=1
+export LCCL_ENABLE_FALLBACK=1
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+export ATB_CONTEXT_WORKSPACE_SIZE=0
+export INT8_FORMAT_NZ_ENABLE=1
+
+extra_param=""
+world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
+
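+# TP_WORLD_SIZE, if set, overrides the device count derived from ASCEND_RT_VISIBLE_DEVICES
+if [ "${TP_WORLD_SIZE:-$world_size}" == "1" ]; then
+    python -m examples.run_pa --model_path "$1" $extra_param
+else
+    torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_pa --model_path "$1" $extra_param
+fi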
--- a/mindie/examples/models/atb_speed_sdk/README.md
+++ b/mindie/examples/models/atb_speed_sdk/README.md
@ -0,0 +1,306 @@
+# atb_speed_sdk
+
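+*Improves the usability of the acceleration library, unifies downstream tasks, and integrates common capabilities*  
+Advantages:
+
+1. Works on both GPU and NPU, minimizing migration and adaptation effort
+2. Hides the differences between NPU and GPU, so users can switch between them transparently
+3. A single configuration file covers all settings
+4. Process-safe logging
+
+# SDK installation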
+
+```shell
+pip install .
+```
+
+# Configuration file: usage and example
+
+## Usage
+
+```python
+from atb_speed.common.config import atb_speed_config
+
+config_path = "xxxx"
+atb_speed_config.init_config(config_path)
+```
+
+## Example
+
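+```ini
+[model]
+;Model path
+model_path=../model
+;Device IDs to use, comma-separated for multiple devices; setting several devices enables parallel mode by default
+device_ids=2
+;Parallel communication backend, hccl by default; options: hccl/nccl (GPU)
+;parallel_backend=hccl
+;Log directory; defaults to atb_speed_log under the current working directory
+;log_dir=./
+;Whether to bind CPU cores, 0 or 1; defaults to 1 (enabled)
+;bind_cpu=1
+
+[precision]
+;Accuracy test method, ceval by default; options: ceval/mmlu
+mode=ceval
+;Working directory for the accuracy test
+work_dir=./
+;Batch size for the accuracy test, 1 by default
+batch=1
+;Number of shots per subject, 5 by default
+shot=5
+;Answer length for each question, 32 by default
+;seq_len_out=32
+
+[performance]
+;Model name for the performance test, used to name the result file
+model_name=vicuna_13b
+;Batch size for the test
+batch_size=1
+;Largest power of 2 used for the test input length
+max_len_exp=10
+;Smallest power of 2 used for the test input length
+min_len_exp=5
+;Specific test cases in the format [[seq_in,seq_out]]; when set, max_len_exp and min_len_exp are ignored
+;case_pair=[[1,2],[2,3]]
+;Name of the result file; generated automatically by default and usually left unset
+;save_file_name=
+;Performance test method, detail / normal, normal by default. detail requires the timing decorator and the environment variable TIMEIT=1
+;perf_mode=
+;Whether to test only generate and skip decode during the performance test, 0/1, 0 by default
+;skip_decode=
+```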
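The example is a standard INI file: lines starting with `;` are comments, so options such as `bind_cpu` or `case_pair` keep their defaults until uncommented. A minimal sketch with Python's `configparser` that mirrors two conversions done in the SDK's `config.py` (`device_num = len(device_ids.split(","))` and `ast.literal_eval` for `case_pair`); the sample values and the `summarize` helper are illustrative only:

```python
import ast
import configparser

SAMPLE = """
[model]
model_path=../model
device_ids=0,1
;bind_cpu=1

[performance]
model_name=vicuna_13b
case_pair=[[256,64],[512,128]]
"""


def summarize(text: str) -> dict:
    conf = configparser.ConfigParser()  # ';' and '#' start full-line comments by default
    conf.read_string(text)
    model = conf["model"]
    perf = conf["performance"]
    return {
        "device_num": len(model["device_ids"].split(",")),
        # Commented-out options are absent, so fall back to the documented default
        "bind_cpu": int(model.get("bind_cpu", "1")),
        "case_pair": ast.literal_eval(perf["case_pair"]) if "case_pair" in perf else None,
    }


print(summarize(SAMPLE))
```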
+
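+# Usage guide
+
+The core module is the launcher; every downstream task runs through it.
+
+## launcher [model]
+
+Implement a custom launcher by subclassing Launcher, or the ParallelLauncher base class for multi-device runs.  
+The launcher adapts automatically to GPU and NPU, so the same code works on both.  
+A launcher must implement its own init_model method; note that self.model_path is read from the configuration file.  
+For functional testing, a custom infer method can also be implemented; the example below uses the inherited one.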
+
+```python
+from atb_speed.common.config import atb_speed_config
+from atb_speed.common.launcher import Launcher
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+
+class BaichuanLM(Launcher):
+
+    def init_model(self):
+        """
+        Initialize the model and tokenizer.
+        :return: (model, tokenizer)
+        """
+        tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True, use_fast=False)
+        model = AutoModelForCausalLM.from_pretrained(self.model_path, trust_remote_code=True).half().to(self._device)
+        model.eval()
+        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
+        return model, tokenizer
+
+
+if __name__ == '__main__':
+    atb_speed_config.init_config("config.ini")
+    baichuan = BaichuanLM()
+    print("---------------warm-up---------------")
+    baichuan.infer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->')
+
+    print("---------------inference---------------")
+    baichuan.infer('登鹳雀楼->王之涣\n夜雨寄北->')
+    baichuan.infer('苹果公司的CEO是')
+
+    query_list = ["谷歌公司的CEO是",
+                  '登鹳雀楼->王之涣\n夜雨寄北->',
+                  '苹果公司的CEO是',
+                  '华为公司的CEO是',
+                  '微软公司的CEO是']
+    baichuan.infer_batch(query_list)
+
+```
+
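+# Accuracy testing
+
+The SDK provides two accuracy test methods: ceval and mmlu.
+
+## Configuration [precision]
+
+| Key         | Default | Notes                                                                              |
+|-------------|---------|------------------------------------------------------------------------------------|
+| mode        | ceval   | Accuracy test method; options: ceval/mmlu                                          |
+| work_dir    |         | Working directory for the accuracy test. Required                                  |
+| batch       | 1       | Batch size for the accuracy test; results with batch > 1 may differ from batch = 1 |
+| shot        | 5       | Number of shots per subject                                                        |
+| seq_len_out | 32      | Answer length for each question                                                    |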
+
+### 1. Download the test datasets
+
+ceval
+
+```shell
+wget https://huggingface.co/datasets/ceval/ceval-exam/resolve/main/ceval-exam.zip
+unzip ceval-exam.zip -d data
+```
+
+mmlu
+
+```shell
+wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
+tar -xvf data.tar
+```
+
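+Note: if wget cannot reach the network, download the files in a browser and copy them over.
+
+### 2. Configure the accuracy test
+
+0. Following the inference guide, download the model, configure its path, and install atb_speed_sdk
+1. Create a working directory ${precision_work_dir}.
+2. Extract the downloaded test dataset and place the data and scripts in ${precision_work_dir}
+3. In config.ini, set work_dir in the [precision] section to ${precision_work_dir}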
+
+Example directory structure of ${precision_work_dir}:
+
+- test_result (generated after the run)
+- data (contains the dev, test and val data folders)
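Before running, the layout can be checked with a few lines of `pathlib`. A minimal sketch, assuming the dataset was extracted into `data/` with `dev`, `test` and `val` subfolders as listed above; `missing_splits` is a hypothetical helper, not part of atb_speed:

```python
import sys
from pathlib import Path
from typing import List, Sequence


def missing_splits(work_dir: str, splits: Sequence[str] = ("dev", "test", "val")) -> List[str]:
    """Return the dataset split folders that are missing under <work_dir>/data."""
    data_dir = Path(work_dir) / "data"
    return [name for name in splits if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    work_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_splits(work_dir)
    print("missing:", missing or "none")
```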
+
+## Running the script
+
+Only a launcher needs to be defined:
+
+```python
+from atb_speed.common.precision import get_precision_test_cls
+from atb_speed.common.config import atb_speed_config
+from atb_speed.common.launcher import Launcher
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+
+class BaichuanLM(Launcher):
+    def init_model(self):
+        """
+        Initialize the model and tokenizer.
+        :return: (model, tokenizer)
+        """
+        tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True, use_fast=False)
+        model = AutoModelForCausalLM.from_pretrained(self.model_path, trust_remote_code=True).half().to(self._device)
+        model.eval()
+        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
+        return model, tokenizer
+
+
+if __name__ == '__main__':
+    atb_speed_config.init_config("config.ini")
+    baichuan = BaichuanLM()
+    c_t = get_precision_test_cls()(baichuan)
+    c_t.run()
+```
+
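+# Performance testing [performance]
+
+The SDK provides two ways to measure performance, the regular estimation method and the precise timing method, and two test schemes: a power-of-2 sweep and specific test cases.
+
+## Configuration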
+
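+| Key            | Default | Notes                                                                                                      |
+|----------------|---------|------------------------------------------------------------------------------------------------------------|
+| model_name     |         | Model name for the performance test, used to name the result file                                          |
+| batch_size     | 1       | Batch size for the test                                                                                    |
+| max_len_exp    | 11      | Largest power of 2 used for the test input length                                                          |
+| min_len_exp    | 5       | Smallest power of 2 used for the test input length                                                         |
+| case_pair      |         | Specific test cases in the format [[seq_in,seq_out]]; when set, max_len_exp and min_len_exp are ignored    |
+| save_file_name |         | Name of the result file; generated automatically by default and usually left unset                         |
+| perf_mode      | normal  | Performance test method, detail / normal. detail requires the `Timer.timing` decorator and the environment variable TIMEIT=1 |
+| skip_decode    | 0       | Whether to test only generate and skip decode during the performance test, 0/1                             |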
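With `min_len_exp=5` and `max_len_exp=11` (the default in `config.py`), the power-of-2 scheme covers input lengths from 2**5 = 32 to 2**11 = 2048 tokens. The sketch below enumerates those lengths and shows `case_pair` taking precedence. Pairing each input length with an equal output length is an assumption made for illustration; the exact pairing used by the SDK is not documented here:

```python
import ast
from typing import List, Optional


def build_cases(min_len_exp: int = 5, max_len_exp: int = 11,
                case_pair: Optional[str] = None) -> List[List[int]]:
    """Return [seq_in, seq_out] pairs; case_pair, when set, overrides the 2-power sweep."""
    if case_pair:
        return ast.literal_eval(case_pair)
    # Assumption: seq_out equals seq_in for each power of two
    return [[2 ** exp, 2 ** exp] for exp in range(min_len_exp, max_len_exp + 1)]


print(build_cases())
print(build_cases(case_pair="[[1,2],[2,3]]"))
```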
+
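+## Precise timing method
+
+- Times execution with the SDK's timing decorator applied in the modeling code
+- No intrusive changes to third-party source code are needed, and any version of transformers is supported
+- Set perf_mode to detail
+- Set the environment variable `TIMEIT` to 1 to enable the performance test; it defaults to 0 so that normal use is unaffected
+
+### Timer overview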
+
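+- Set the environment variable `TIMEIT` to 1 to enable timing; it defaults to 0 so that normal use is unaffected
+- Timing data accumulates; call Timer.reset() to reset the timer
+- Device work must be synchronized for accurate timing. Before timing, set the timer's sync function with `Timer.sync = getattr(torch, device_type).synchronize`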
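The three rules above (opt-in through `TIMEIT`, accumulated measurements, an explicit device sync) can be seen in a small stand-alone timer. `SimpleTimer` is a hypothetical illustration, not the SDK's `Timer`, whose internals are not shown in this README:

```python
import functools
import os
import time
from typing import List


class SimpleTimer:
    records: List[float] = []  # accumulated durations in seconds

    @staticmethod
    def sync() -> None:
        """Default no-op; replace with e.g. torch.cuda.synchronize on a GPU."""

    @classmethod
    def reset(cls) -> None:
        cls.records = []

    @classmethod
    def timing(cls, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.getenv("TIMEIT", "0") != "1":
                return func(*args, **kwargs)  # timing disabled
            cls.sync()  # wait for pending device work before starting the clock
            start = time.perf_counter()
            result = func(*args, **kwargs)
            cls.sync()  # make sure work launched by func has finished
            cls.records.append(time.perf_counter() - start)
            return result
        return wrapper


@SimpleTimer.timing
def busy_work(n: int) -> int:
    return sum(i * i for i in range(n))


os.environ["TIMEIT"] = "1"  # enable timing for this demo
SimpleTimer.reset()
for _ in range(3):
    busy_work(10_000)
print(f"{len(SimpleTimer.records)} calls, total {sum(SimpleTimer.records):.6f}s")
```

Synchronizing on both sides of the call matters because device kernels run asynchronously: without the second `sync()`, the clock would stop when the kernel is launched, not when it finishes.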
+
+### How to use
+
+Add the `@Timer.timing` decorator to the outermost forward function.  
+For example:
+
+```python
+import torch
+from torch import nn
+
+from atb_speed.common.timer import Timer
+
+
+class AddNet(nn.Module):
+    def __init__(self, in_dim, h_dim=5, out_dim=1):
+        super().__init__()
+        self.fc1 = nn.Linear(in_dim, h_dim)
+        self.fc2 = nn.Linear(h_dim, out_dim)
+
+    @Timer.timing
+    def forward(self, x, y):
+        out = torch.cat([x, y], dim=1)
+        out = torch.relu(self.fc1(out))
+        out = self.fc2(out)
+        return out
+
+
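+if __name__ == '__main__':
+    add_net = AddNet(in_dim=2)
+    # Use the device sync function when CUDA is available; CPU-only runs need no sync
+    Timer.sync = torch.cuda.synchronize if torch.cuda.is_available() else (lambda *args: None)
+    Timer.reset()
+    for _ in range(5):
+        x = torch.randn(1, 1)
+        y = torch.randn(1, 1)
+        result = add_net(x, y)  # nn.Module.__call__ dispatches to the decorated forward
+        print(result)
+    print(Timer.timeit_res)
+    print(Timer.timeit_res.first_token_delay)
+    print(Timer.timeit_res.next_token_avg_delay)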
+```
+
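+## Regular estimation method
+
+- Runs inference once generating 1 token and once generating n tokens, then estimates performance from the difference between the two timings.
+- *Assumes the first-token latency is the same in both runs*
+- Set perf_mode to normal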
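Under that assumption the arithmetic is simple: the 1-token run measures roughly the first-token latency, and the extra time of the n-token run is spent on the remaining n - 1 tokens. A minimal sketch with illustrative numbers (not measurements); the function name is hypothetical:

```python
from typing import Tuple


def estimate_latency(t_one: float, t_n: float, n: int) -> Tuple[float, float]:
    """Estimate (first-token latency, average latency per subsequent token).

    t_one: wall time of a run that generates 1 token
    t_n:   wall time of a run that generates n tokens (n > 1)
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    first_token = t_one
    next_token_avg = (t_n - t_one) / (n - 1)
    return first_token, next_token_avg


print(estimate_latency(0.5, 2.5, 5))  # (0.5, 0.5)
```

If the two runs have different first-token latencies (for example, because of caching during the first run), the error goes directly into the per-token estimate, which is why a warm-up run is used first.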
+
+## Running the script
+
+```python
+from atb_speed.common.config import atb_speed_config
+from atb_speed.common.launcher import Launcher
+from atb_speed.common.performance.base import PerformanceTest
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+
+class LMLauncher(Launcher):
+    """
+    Baichuan2_7B_NPU
+    """
+
+    def init_model(self):
+        """
+        模型初始化
+        :return:
+        """
+        tokenizer = AutoTokenizer.from_pretrained(
+            self.model_path, trust_remote_code=True, use_fast=False)
+        model = AutoModelForCausalLM.from_pretrained(self.model_path, trust_remote_code=True).half().to(self._device)
+        model.eval()
+        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
+        return model, tokenizer
+
+
+if __name__ == '__main__':
+    atb_speed_config.init_config("config.ini")
+    performance_test = PerformanceTest(LMLauncher())
+    performance_test.warm_up()
+    performance_test.run_test()
+```
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/__init__.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/__init__.py
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/__init__.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/__init__.py
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/config.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/config.py
@ -0,0 +1,122 @@
+#!/usr/bin/env python
+# coding:utf-8
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
+"""
+config
+"""
+import ast
+import configparser
+import os
+import warnings
+from dataclasses import dataclass
+from typing import Optional, List, Union, Type
+
+
+class ConfigInitializationError(Exception):
+    def __init__(self, message):
+        self.message = message
+        super().__init__(self.message)
+
+
+@dataclass
+class PrecisionConfig:
+    work_dir: str = ""
+    batch: int = 1
+    shot: int = 5
+    seq_len_out: int = 32
+    mode: str = "ceval"
+
+    def __post_init__(self):
+        int_attr = ("batch", "shot", "seq_len_out")
+        for attr in int_attr:
+            self.__dict__[attr] = int(self.__dict__[attr])
+        self.work_dir = os.path.realpath(self.work_dir)
+
+
+@dataclass
+class PerformanceConfig:
+    model_name: str = ""
+    batch_size: int = 1
+    max_len_exp: int = 11
+    min_len_exp: int = 5
+    case_pair: Optional[Union[List[List[int]], str]] = None
+    save_file_name: str = ""
+    perf_mode: str = "normal"
+    skip_decode: int = 0
+
+    def __post_init__(self):
+        int_attr = ("batch_size", "max_len_exp", "min_len_exp", "skip_decode")
+        for attr in int_attr:
+            self.__dict__[attr] = int(self.__dict__[attr])
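+        if isinstance(self.case_pair, str):
+            # Values read from config.ini are strings such as "[[1,2],[2,3]]"; an empty value means "not set"
+            self.case_pair = ast.literal_eval(self.case_pair) if self.case_pair.strip() else None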
+
+
+@dataclass
+class ModelConfig:
+    model_path: str = ""
+    device_ids: str = "0"
+    parallel_backend: str = "hccl"
+    device_num: int = 1
+    log_dir: str = os.path.join(os.getcwd(), "atb_speed_log")
+    bind_cpu: int = 1
+
+    def __post_init__(self):
+        self.model_path = os.path.realpath(self.model_path)
+        self.device_num = len(self.device_ids.split(","))
+        int_attr = ("bind_cpu",)
+        for attr in int_attr:
+            self.__dict__[attr] = int(self.__dict__[attr])
+
+
+@dataclass
+class Config:
+    model: ModelConfig = None
+    performance: PerformanceConfig = None
+    precision: PrecisionConfig = None
+
+    def init_config(self, raw_content_path, allow_modify=False):
+        if not os.path.exists(raw_content_path):
+            raise FileNotFoundError(f"{raw_content_path} not exists.")
+
+        section_map = {
+            "model": ModelConfig,
+            "performance": PerformanceConfig,
+            "precision": PrecisionConfig
+        }
+        if allow_modify:
+            warn_msg = "Warning, allow_modify has been set as True. " \
+                       "It is dangerous to modify the reserved fields below.\n"
+            for cfg_key, cfg_cls in section_map.items():
+                warn_msg = warn_msg + "\n".join(
+                    f"{cfg_key}.{sub_k} is reserved."
+                    for sub_k in cfg_cls.__dict__ if not sub_k.startswith("__")) + "\n"
+            warnings.warn(warn_msg, DeprecationWarning, stacklevel=2)
+        conf = configparser.ConfigParser()
+        conf.read(raw_content_path, encoding="utf-8")
+        for section_name, section_content in conf.items():
+            if section_name == "DEFAULT":
+                continue
+            if section_name == "ceval":
+                warnings.warn(
+                    "The section_name [ceval] is deprecated, "
+                    "please refer to readme and use [precision] instead",
+                    DeprecationWarning,
+                    stacklevel=2)
+                section_name = "precision"
+            if not hasattr(self, section_name) and not allow_modify:
+                warnings.warn(f"The section [{section_name}] is not recognized and not allowed to modify.",
+                              UserWarning,
+                              stacklevel=2)
+                continue
+            config_cls: Optional[Type] = section_map.get(section_name)
+            if not config_cls:
+                raise ConfigInitializationError(f"No configuration class found for section [{section_name}].")
+            try:
+                attr = config_cls(**section_content)
+            except TypeError as e:
+                raise ConfigInitializationError(f"Invalid configuration for section [{section_name}].") from e
+            setattr(self, section_name, attr)
+
+
+atb_speed_config = Config()
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/cpu_binding.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/cpu_binding.py
@ -0,0 +1,178 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+import logging
+import os
+import subprocess
+from dataclasses import dataclass
+from typing import List, Dict, Union
+
+import psutil
+
+
+def execute_command(cmd_list):
+    with subprocess.Popen(cmd_list,
+                          shell=False,
+                          stdout=subprocess.PIPE,
+                          stderr=subprocess.PIPE) as p:
+        out, err = p.communicate(timeout=1000)
+    res = out.decode()
+    return res
+
+
+@dataclass
+class DeviceInfo:
+    _info_line: str = ""
+    npu_id: int = 0
+    chip_id: int = 0
+    chip_logic_id: Union[int, str] = 0
+    chip_name: str = ""
+
+    def __post_init__(self):
+        self.npu_id, self.chip_id, self.chip_logic_id, self.chip_name = self._info_line.strip().split(None, 3)
+        self.npu_id = int(self.npu_id)
+        self.chip_id = int(self.chip_id)
+        if self.chip_logic_id.isnumeric():
+            self.chip_logic_id = int(self.chip_logic_id)
+
+
+@dataclass
+class CPUBinder:
+    logger: logging.Logger = logging.getLogger()
+
+    @staticmethod
+    def _get_device_map_info() -> Dict[int, DeviceInfo]:
+        device_map_info = {}
+        device_map = execute_command(["npu-smi", "info", "-m"]).strip().split("\n")[1:]
+        for line in device_map:
+            device_info = DeviceInfo(line.strip())
+            if isinstance(device_info.chip_logic_id, int):
+                device_map_info[device_info.chip_logic_id] = device_info
+        return device_map_info
+
+    @staticmethod
+    def _get_pcie_info(devices: List[int], keyword="PCIeBusInfo"):
+        device_map_info = CPUBinder._get_device_map_info()
+        device_pcie_tbl = {}
+        for device in devices:
+            device_info = device_map_info.get(device)
+            if not device_info:
+                raise RuntimeError("Can not get device info, binding cpu will skip.")
+            pcie_info = execute_command(["npu-smi", "info", "-t", "board", "-i", f"{device_info.npu_id}",
+                                         "-c", f"{device_info.chip_id}"]).strip().split("\n")
+            for _ in pcie_info:
+                line = ''.join(_.split())  # Remove all whitespace: 310P prints "PCIe Bus Info" while 910 prints "PCIeBusInfo"
+                if line.startswith(keyword):
+                    device_pcie_tbl[device] = line[len(keyword) + 1:]
+                    break
+
+        return device_pcie_tbl
+
+    @staticmethod
+    def _get_numa_info(pcie_tbl, keyword="NUMAnode"):
+        device_numa_tbl = {}  # key is device id, value is numa id
+        numa_devices_tbl = {}  # key is numa id, value is device id list
+
+        for device, pcie_no in pcie_tbl.items():
+            numa_info = execute_command(["lspci", "-s", f"{pcie_no}", "-vvv"]).strip().split("\n")
+            for _ in numa_info:
+                line = ''.join(_.split())
+                if line.startswith(keyword):
+                    numa_id = int(line[len(keyword) + 1:])
+                    device_numa_tbl[device] = numa_id
+
+                    numa_devices_tbl.setdefault(numa_id, []).append(device)
+                    break
+
+        return device_numa_tbl, numa_devices_tbl
+
+    @staticmethod
+    def _get_cpu_info(numa_ids, keyword1="NUMAnode", keyword2="CPU(s)"):
+        cpu_idx_tbl = dict()
+        numa_keywords = [keyword1 + str(idx) + keyword2 for idx in numa_ids]
+        cpu_info = execute_command(["lscpu"]).strip().split("\n")
+        for _ in cpu_info:
+            line = ''.join(_.split())
+            if any(line.startswith(word) for word in numa_keywords):
+                split_info = line.split(":")
+                cpu_id_ranges = split_info[-1].split(",")
+
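+                ranges = []
+                for range_str in cpu_id_ranges:
+                    # Each entry is either a single CPU id ("7") or an inclusive range ("0-23").
+                    endpoints = range_str.split("-")
+                    if len(endpoints) == 1:
+                        ranges.append(int(endpoints[0]))
+                    elif len(endpoints) == 2:
+                        ranges.extend(range(int(endpoints[0]), int(endpoints[1]) + 1))
+                    else:
+                        raise Exception("Unexpected lscpu output, please check!")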
+
+                numa_id = int(split_info[0].replace(keyword1, '').replace(keyword2, ''))
+                cpu_idx_tbl[numa_id] = ranges
+        return cpu_idx_tbl
+
+    def bind_cpus(self, visible_devices: List[int] = None, rank_id: int = 0, ratio: float = 0.5):
+        """
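+        Bind the current process to CPU cores on the NUMA node affine to its NPU.
+        The number of cores per process can be set with `export CPU_BINDING_NUM=<n>`. If CPU_BINDING_NUM
+        is not set, it is derived from `ratio` (NUMA utilization): on a 64-core NUMA node, 0.5 uses half
+        of the cores (32), split evenly among the NPUs affine to that node.
+        :param visible_devices: device ids visible to this job; defaults to ASCEND_RT_VISIBLE_DEVICES
+        :param rank_id: rank of the current process, used as an index into the device list
+        :param ratio: fraction of the NUMA node's cores to use when CPU_BINDING_NUM is not set
+        :return: None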
+        """
+
+        if visible_devices is None:
+            devices = [
+                int(item.strip())
+                for item in os.getenv("ASCEND_RT_VISIBLE_DEVICES", "").split(",")
+                if item.strip().isnumeric()
+            ]
+        else:
+            devices = visible_devices
+
+        # Map each NPU to its PCIe bus address
+        device_pcie_tbl = self._get_pcie_info(devices)
+        # Use the PCIe info to map each NPU to its NUMA node
+        device_numa_tbl, numa_devices_tbl = self._get_numa_info(device_pcie_tbl)
+        # Get the CPU cores of every NUMA node in use
+        cpu_idx_tbl = self._get_cpu_info(list(numa_devices_tbl.keys()))
+
+        # NPU id of the current rank
+        cur_device = devices[rank_id]
+        # NUMA node of that NPU
+        numa_id = device_numa_tbl.get(cur_device)
+
+        # NPUs sharing that NUMA node
+        shard_devices = numa_devices_tbl.get(numa_id)
+        # Sort by NPU id
+        shard_devices.sort()
+
+        # All CPU ids on that NUMA node
+        all_cpus = cpu_idx_tbl[numa_id]
+        info_msg = (f"rank_id: {rank_id}, device_id: {cur_device}, numa_id: {numa_id}, "
+                    f"shard_devices: {shard_devices}, cpus: {all_cpus}")
+        self.logger.info(info_msg)
+
+        cpu_nums = len(all_cpus)
+        # Number of cores to assign to each NPU sharing this NUMA node
+        cpu_binding_num = os.environ.get("CPU_BINDING_NUM", None)
+        if cpu_binding_num is None:
+            cpu_num_per_device = int(cpu_nums * ratio // len(shard_devices))
+        else:
+            cpu_num_per_device = int(cpu_binding_num)
+            if len(shard_devices) * cpu_num_per_device > cpu_nums:
+                raise Exception(
+                    f"Not enough CPUs on NUMA node {numa_id} to assign {cpu_num_per_device} to every device; "
+                    f"please decrease the value of CPU_BINDING_NUM!")
+
+        # Index of this NPU among the NPUs sharing the NUMA node
+        idx = shard_devices.index(cur_device)
+        # Assign this NPU a contiguous block of cpu_num_per_device CPU ids
+        binding_cpus = [all_cpus[_] for _ in range(idx * cpu_num_per_device, (idx + 1) * cpu_num_per_device)]
+
+        # cpu bind
+        p = psutil.Process()
+        p.cpu_affinity(binding_cpus)
+        new_affinity = p.cpu_affinity()
+        info_msg = f"process {p.pid}, new_affinity is {new_affinity}, cpu count {cpu_num_per_device}"
+        self.logger.info(info_msg)
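The core-allocation arithmetic in `bind_cpus` can be exercised without NPUs or `lscpu`. The sketch below is illustrative only: `parse_cpu_list` and `slice_for_device` are hypothetical helpers, and it assumes lscpu prints each NUMA node's CPU list as comma-separated single ids or inclusive `start-end` ranges. `per_device` stands in for `CPU_BINDING_NUM`; when it is absent the count is derived from `ratio`, and devices sharing a node take consecutive blocks in ascending device-id order.

```python
from typing import List, Optional


def parse_cpu_list(spec: str) -> List[int]:
    """Expand an lscpu-style CPU list such as "0-3,8-11" or "0-3,7" into CPU ids."""
    cpus: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))  # inclusive range
        else:
            cpus.append(int(part))  # single CPU id
    return cpus


def slice_for_device(all_cpus: List[int], shared_devices: List[int], device: int,
                     ratio: float = 0.5, per_device: Optional[int] = None) -> List[int]:
    """Return the contiguous block of CPU ids assigned to `device` on its NUMA node.

    per_device plays the role of CPU_BINDING_NUM; when it is None the count is
    derived from ratio, as in bind_cpus().
    """
    devices = sorted(shared_devices)
    if per_device is None:
        per_device = int(len(all_cpus) * ratio // len(devices))
    elif per_device * len(devices) > len(all_cpus):
        raise ValueError("not enough CPUs on this NUMA node")
    idx = devices.index(device)
    return all_cpus[idx * per_device:(idx + 1) * per_device]


cpus = parse_cpu_list("0-3,8-11")                    # [0, 1, 2, 3, 8, 9, 10, 11]
print(slice_for_device(cpus, [2, 0], 2, ratio=1.0))  # [8, 9, 10, 11]
```

With `ratio=0.5` on the same 8-CPU list, devices 0 and 2 would receive `[0, 1]` and `[2, 3]`, so the unused cores are always the tail of the node's list.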
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/launcher/init.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/launcher/init.py
@ -0,0 +1,12 @@
+#!/usr/bin/env python
+# coding:utf-8
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
+"""
+common launcher
+"""
+from atb_speed.common.launcher.base import get_device, DeviceType
+
+if get_device() == DeviceType.npu:
+    from atb_speed.common.launcher.npu import Launcher, ParallelLauncher
+else:
+    from atb_speed.common.launcher.gpu import Launcher, ParallelLauncher
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/launcher/base.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/launcher/base.py
@ -0,0 +1,244 @@
+#!/usr/bin/env python
+# coding:utf-8
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
+"""
+common launcher
+"""
+import inspect
+import logging
+import os
+import time
+from abc import abstractmethod
+from enum import Enum
+from typing import Dict, Tuple
+
+import torch
+from atb_speed.common.config import atb_speed_config
+from atb_speed.common.log.logging import init_logger
+from transformers import GenerationConfig
+
+
+class DeviceType(str, Enum):
+    npu = "npu"
+    cuda = "cuda"
+    cpu = "cpu"
+
+
+def get_device() -> str:
+    """
+    Detect the device type of the current environment: CUDA first, then NPU, otherwise CPU.
+    :return: a DeviceType member
+    """
+    flag = torch.cuda.is_available()
+    if flag:
+        return DeviceType.cuda
+    try:
+        import torch_npu  # noqa: F401  (importing torch_npu registers the torch.npu namespace)
+        flag = torch.npu.is_available()
+    except ImportError:
+        flag = False
+    return DeviceType.npu if flag else DeviceType.cpu
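Because `DeviceType` mixes in `str`, its members compare equal to plain strings. That is what lets `device_name` compare `self.model.device.type`, which is a string, against `DeviceType.npu` and `DeviceType.cuda`. A minimal illustration; `pick_device` is a hypothetical, hardware-free stand-in for `get_device()` that takes the availability checks as arguments so the priority order can be tested.

```python
from enum import Enum


class DeviceType(str, Enum):
    # Same definition as above: each member is also a str.
    npu = "npu"
    cuda = "cuda"
    cpu = "cpu"


def pick_device(cuda_available: bool, npu_available: bool) -> DeviceType:
    """Hypothetical version of get_device(): CUDA first, then NPU, else CPU."""
    if cuda_available:
        return DeviceType.cuda
    if npu_available:
        return DeviceType.npu
    return DeviceType.cpu


# A plain device-type string compares equal to the corresponding member.
print("npu" == DeviceType.npu)            # True
print(pick_device(False, True) == "npu")  # True
```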
+
+
+class BaseLauncher:
+    """
+    BaseLauncher
+    """
+
+    def __init__(self, device_ids: str = None, model_path="", options=None):
+        options = {} if options is None else options
+        self.model_path = atb_speed_config.model.model_path if not model_path else model_path
+
+        if device_ids is None and atb_speed_config.model:
+            device_ids = atb_speed_config.model.device_ids
+        self.device_ids = device_ids
+        self.device_id_list = [int(item.strip()) for item in self.device_ids.split(",") if item.strip().isnumeric()]
+        self.local_rank, self.world_size = self.setup_model_parallel()
+
+        self.logger_name = f"device{self.local_rank}_{self.world_size}_{time.time()}.log"
+        os.makedirs(atb_speed_config.model.log_dir, exist_ok=True)
+        self.logger_path = os.path.join(atb_speed_config.model.log_dir, self.logger_name)
+        self.logger = init_logger(logging.getLogger(f"device_{self.local_rank}"), self.logger_path)
+        if atb_speed_config.model.bind_cpu:
+            try:
+                self.bind_cpu()
+            except Exception as err:
+                self.logger.error("Failed to bind CPU, skipping CPU binding.\nDetail: %s", err)
+        self.set_torch_env(self.device_ids, options)
+        self.model, self.tokenizer = self.init_model()
+        self.logger.info(self.model.device)
+        self.logger.info(f"load model from %s successfully!", os.path.basename(inspect.getmodule(self.model).__file__))
+        self.logger.info(f"load model from %s successfully!", os.path.realpath(inspect.getmodule(self.model).__file__))
+
+    @property
+    def _device(self) -> str:
+        """
+        Device type detected in the current environment (see get_device).
+        :return:
+        """
+        return get_device()
+
+    @property
+    def device(self) -> torch.device:
+        """
+        Device on which the model resides.
+        :return:
+        """
+        return self.model.device
+
+    @property
+    def device_type(self) -> str:
+        """
+        Type string of the model's device, e.g. "npu", "cuda" or "cpu".
+        :return:
+        """
+        return self.model.device.type
+
+    @property
+    def device_name(self) -> str:
+        """
+        Detailed hardware name of the current device, with whitespace replaced by underscores.
+        :return:
+        """
+        if self.device_type == DeviceType.npu:
+            device_name = torch.npu.get_device_name()
+        elif self.device_type == DeviceType.cuda:
+            device_name = torch.cuda.get_device_name()
+        else:
+            device_name = "cpu"
+        return "_".join(device_name.split())
+
+    @abstractmethod
+    def init_model(self):
+        """
+        Initialize the model and tokenizer.
+        :return: (model, tokenizer)
+        """
+        ...
+
+    @staticmethod
+    def set_torch_env(device_ids, options: Dict = None):
+        """
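+        Configure the torch runtime for the given devices. No-op in the base class;
+        device-specific launchers override it.
+        :param device_ids: comma-separated device ids, e.g. "0,1"
+        :param options: backend-specific options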
+        :return:
+        """
+
+    @staticmethod
+    def bind_cpu():
+        ...
+
+    @staticmethod
+    def setup_model_parallel() -> Tuple[int, int]:
+        local_rank, world_size = 0, 1
+        return local_rank, world_size
+
+    @classmethod
+    def safe_serialization(cls, model, tokenizer, save_dir):
+        """
+        Save the model weights in safetensors format, together with the tokenizer.
+        :param model:
+        :param tokenizer:
+        :param save_dir:
+        :return:
+        """
+        os.makedirs(save_dir, exist_ok=True)
+        model.save_pretrained(save_dir, safe_serialization=True)
+        tokenizer.save_pretrained(save_dir)
+
+    def infer(self, query, model_params=None):
+        """
+        Run inference on a single query and log the output and throughput.
+        :param query:
+        :param model_params:
+        :return:
+        """
+        inputs = self.tokenizer(query, return_tensors='pt')
+        inputs = inputs.to(self.model.device)
+        with torch.no_grad():
+            start_time = time.time()
+            model_params = model_params if model_params is not None else {}
+            pred = self.model.generate(**inputs, **model_params)
+            end_time = time.time()
+            time_cost = end_time - start_time
+        output = self.tokenizer.decode(pred.cpu()[0], skip_special_tokens=True)
+        self.logger.info(output)
+        self.logger.info("cost %s s", time_cost)
+        new_tokens = len(pred[0]) - len(inputs.input_ids[0])
+        final_msg = f"generated {new_tokens} new tokens ({new_tokens / time_cost:.2f} tokens/s)"
+        self.logger.info(final_msg)
+        return output
+
+    def infer_batch(self, query, model_params=None):
+        """
+        Run batched (padded) inference on a list of queries and log each output and the throughput.
+        :param query:
+        :param model_params:
+        :return:
+        """
+        inputs = self.tokenizer(query, return_tensors='pt', padding=True)
+        inputs = inputs.to(self.model.device)
+        with torch.no_grad():
+            start_time = time.time()
+            model_params = model_params if model_params is not None else {}
+            pred = self.model.generate(**inputs, **model_params)
+            end_time = time.time()
+            time_cost = end_time - start_time
+        output = self.tokenizer.batch_decode(pred, skip_special_tokens=True, clean_up_tokenization_spaces=False)
+
+        for ind, item in enumerate(output):
+            self.logger.info(f"###### batch %s ", ind)
+            self.logger.info(item)
+
+        self.logger.info("cost %s s", time_cost)
+        # All sequences in the batch are padded to the same length, so the first one is representative.
+        new_tokens = len(pred[0]) - len(inputs.input_ids[0])
+        final_msg = f"generated {new_tokens} new tokens ({new_tokens / time_cost:.2f} tokens/s)"
+        self.logger.info(final_msg)
+        return output
+
+    def infer_test(self, batch_size: int = 1, seq_in: int = 2048, seq_out: int = 64):
+        """
+        Benchmark generation on random token ids of a fixed shape.
+        :param batch_size: batch size
+        :param seq_in: input sequence length
+        :param seq_out: number of tokens to generate
+        :return:
+        """
+        inputs = self.tokenizer("hi", return_tensors='pt')
+        dummy_input_ids_nxt = torch.randint(0, self.model.config.vocab_size, [batch_size, seq_in], dtype=torch.int64)
+        dummy_attention_mask = torch.ones((batch_size, seq_in), dtype=torch.int64)
+        inputs["input_ids"] = dummy_input_ids_nxt
+        inputs["attention_mask"] = dummy_attention_mask
+        inputs = inputs.to(self.model.device)
+        with torch.no_grad():
+            start_time = time.time()
+            # An eos_token_id outside the vocabulary prevents generation from stopping early.
+            pred = self.model.generate(**inputs, max_new_tokens=seq_out,
+                                       eos_token_id=self.model.config.vocab_size * 2)
+            end_time = time.time()
+            time_cost = end_time - start_time
+        output = self.tokenizer.decode(pred.cpu()[0], skip_special_tokens=True)
+        self.logger.info("cost %s s", time_cost)
+        new_tokens = len(pred[0]) - seq_in
+        final_msg = (f"generated {batch_size * new_tokens} new tokens "
+                     f"({batch_size * new_tokens / time_cost:.2f} tokens/s)")
+        self.logger.info(final_msg)
+        return output
+
+    def remove_part_of_generation_config(self, generation_config) -> GenerationConfig:
+        """
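+        Reset generation parameters whose post-processing is not currently supported to their
+        GenerationConfig defaults. Token-id fields (keys ending in "_id") and keys whose default
+        is None are left unchanged.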
+        :param generation_config:
+        :return:
+        """
+        ori_gen = GenerationConfig()
+        diff_dict = generation_config.to_diff_dict()
+        self.logger.info(diff_dict)
+        for key in diff_dict:
+            if key.endswith("_id"):
+                continue
+            ori_value = getattr(ori_gen, key, None)
+            if ori_value is not None:
+                setattr(generation_config, key, getattr(ori_gen, key))
+                self.logger.info("replace %s", key)
+        return generation_config
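The throughput figure logged by `infer`, `infer_batch` and `infer_test` is new tokens divided by wall-clock time; `infer_test` multiplies by the batch size, whereas `infer` and `infer_batch` report the count for a single sequence. A minimal sketch of that accounting; `tokens_per_second` is a hypothetical helper, and it assumes a decoder-only model whose `generate` output holds the prompt followed by the new tokens, as the subtraction in the code above implies.

```python
def tokens_per_second(output_len: int, input_len: int, seconds: float, batch_size: int = 1) -> float:
    """Generation throughput, counted over the whole batch.

    output_len is the length of one returned sequence, which for a decoder-only
    model includes the prompt, so the new tokens are output_len - input_len.
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    new_tokens = output_len - input_len
    return batch_size * new_tokens / seconds


# infer_test-style accounting: batch of 4, 2048-token prompts, 64 new tokens in 2 s.
print(tokens_per_second(2048 + 64, 2048, 2.0, batch_size=4))  # 128.0
```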
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/launcher/gpu.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/launcher/gpu.py
@ -0,0 +1,57 @@
+#!/usr/bin/env python
+# coding:utf-8
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
+"""
+common launcher
+"""
+import abc
+import os
+from typing import Dict
+
+import torch
+from atb_speed.common.launcher.base import BaseLauncher
+
+
+class Launcher(BaseLauncher):
+    """
+    Launcher for CUDA GPUs
+    """
+
+    @staticmethod
+    def set_torch_env(device_ids, options: Dict = None):
+        """
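+        Restrict the CUDA devices visible to this process via CUDA_VISIBLE_DEVICES.
+        :param device_ids: comma-separated device ids, e.g. "0,1"
+        :param options: unused on GPU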
+        :return:
+        """
+        os.environ['CUDA_VISIBLE_DEVICES'] = device_ids
+
+    @abc.abstractmethod
+    def init_model(self):
+        """
+        Initialize the model and tokenizer.
+        :return:
+        """
+        ...
+
+
+class ParallelLauncher(Launcher):
+    @staticmethod
+    def set_torch_env(device_ids, options: Dict = None):
+        os.environ['CUDA_VISIBLE_DEVICES'] = device_ids
+
+    @abc.abstractmethod
+    def init_model(self):
+        """
+        Initialize the model and tokenizer.
+        :return:
+        """
+        ...
+
+    def setup_model_parallel(self):
+        torch.distributed.init_process_group()
+        local_rank = torch.distributed.get_rank()
+        world_size = torch.distributed.get_world_size()
+        torch.manual_seed(1)
+        return local_rank, world_size
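`torch.distributed.init_process_group()` with its default `env://` initialization reads the rank and world size from environment variables that a launcher such as `torchrun` exports (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, plus `MASTER_ADDR`/`MASTER_PORT`). The NPU `ParallelLauncher` below then uses the rank to pick an entry from `device_id_list`. The helpers here are hypothetical and only sketch that mapping. Note that both launchers use `get_rank()`, the global rank, as `local_rank`, which coincides with the node-local rank only on a single node.

```python
import os
from typing import List, Mapping, Tuple


def rank_info(env: Mapping[str, str] = os.environ) -> Tuple[int, int, int]:
    """Read (rank, local_rank, world_size) as exported by a launcher such as torchrun."""
    rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", str(rank)))  # fall back to the global rank
    world_size = int(env.get("WORLD_SIZE", "1"))
    return rank, local_rank, world_size


def device_for_rank(device_ids: str, rank: int) -> int:
    """Map a rank to a physical device id from a list such as "4,5,6,7"."""
    ids: List[int] = [int(x) for x in device_ids.split(",") if x.strip().isnumeric()]
    return ids[rank]


print(rank_info({"RANK": "3", "LOCAL_RANK": "1", "WORLD_SIZE": "8"}))  # (3, 1, 8)
print(device_for_rank("4,5,6,7", 1))                                   # 5
```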
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/launcher/npu.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/launcher/npu.py
@ -0,0 +1,117 @@
+#!/usr/bin/env python
+# coding:utf-8
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
+"""
+common launcher
+"""
+import abc
+from dataclasses import dataclass
+from typing import Dict
+
+import torch
+import torch_npu
+from atb_speed.common.config import atb_speed_config
+from atb_speed.common.cpu_binding import CPUBinder
+from atb_speed.common.launcher.base import BaseLauncher
+
+
+@dataclass
+class NPUSocInfo:
+    soc_name: str = ""
+    soc_version: int = -1
+    need_nz: bool = False
+
+    def __post_init__(self):
+        self.soc_version = torch_npu._C._npu_get_soc_version()
+        # SoC versions that need FRACTAL_NZ weights (910A / 310P, see fit_npu)
+        if self.soc_version in (100, 101, 102, 103, 104, 200, 201, 202, 203):
+            self.need_nz = True
+
+
+class Launcher(BaseLauncher):
+    """
+    Launcher for Ascend NPUs
+    """
+
+    def __init__(self, device_ids: str = None, model_path="", options=None):
+        super().__init__(device_ids, model_path, options)
+        self.soc_info = NPUSocInfo()
+        self.fit_npu(self.model)
+
+    @staticmethod
+    def set_torch_env(device_ids, options: Dict = None):
+        """
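+        Select the NPU for this process (the first id in device_ids), disable JIT compilation
+        and apply the given backend options.
+        :param device_ids: comma-separated device ids, e.g. "0,1"
+        :param options: options passed to torch.npu.set_option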
+        :return:
+        """
+        torch_npu.npu.set_device(int(device_ids.split(",")[0]))
+        torch.npu.set_compile_mode(jit_compile=False)
+        torch.npu.set_option(options)
+
+    @abc.abstractmethod
+    def init_model(self):
+        """
+        Initialize the model and tokenizer.
+        :return:
+        """
+        ...
+
+    def fit_npu(self, model):
+        """
+        Adapt the model to the chip: convert weight formats ahead of time to improve performance.
+        :param model:
+        :return:
+        """
+        if not self.soc_info.need_nz:
+            for _, module in model.named_modules():
+                if isinstance(module, torch.nn.Linear):
+                    # format 2 is ND (ACL_FORMAT_ND)
+                    module.weight.data = torch_npu.npu_format_cast(module.weight.data, 2)
+            self.logger.info(f"soc info: {self.soc_info.soc_version} , {self.soc_info.soc_name}, support ND")
+        else:
+            # if on 910A or 310P chip, eliminate the TransData and Transpose ops by converting weight data types
+            for name, module in model.named_modules():
+                if isinstance(module, torch.nn.Linear):
+                    if name == 'lm_head':
+                        # eliminate TransData op before lm_head calculation
+                        module.weight.data = torch.nn.parameter.Parameter(module.weight.data)
+                    # format 29 is FRACTAL_NZ (ACL_FORMAT_FRACTAL_NZ)
+                    module.weight.data = torch_npu.npu_format_cast(module.weight.data, 29)
+            self.logger.info(f"soc info: {self.soc_info.soc_version} , {self.soc_info.soc_name}, support NZ")
+
+        # Embedding weights are cast to ND on every SoC
+        for _, module in model.named_modules():
+            if isinstance(module, torch.nn.Embedding):
+                module.weight.data = torch_npu.npu_format_cast(module.weight.data, 2)
+
+    def bind_cpu(self):
+        """
+        Bind this process to CPU cores on the NUMA node affine to its NPU.
+        :return:
+        """
+        cpu_binder = CPUBinder(self.logger)
+        cpu_binder.bind_cpus(self.device_id_list, self.local_rank, 1.0)
+        self.logger.info("Bind cpu successfully!")
+
+
+class ParallelLauncher(Launcher):
+
+    @staticmethod
+    def set_torch_env(device_ids, options: Dict = None):
+        torch.npu.set_compile_mode(jit_compile=False)
+        torch.npu.set_option(options)
+
+    @abc.abstractmethod
+    def init_model(self):
+        """
+        Initialize the model and tokenizer.
+        :return:
+        """
+        ...
+
+    def setup_model_parallel(self):
+        torch.distributed.init_process_group(atb_speed_config.model.parallel_backend)
+        local_rank = torch.distributed.get_rank()
+        world_size = torch.distributed.get_world_size()
+        torch_npu.npu.set_device(self.device_id_list[local_rank])
+        # seed must be the same in all processes
+        torch.manual_seed(1)
+        return local_rank, world_size
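`fit_npu` boils down to a small decision table. The sketch below assumes that format codes 2 and 29 correspond to CANN's `ACL_FORMAT_ND` and `ACL_FORMAT_FRACTAL_NZ` (the constant names are illustrative, not imported from `torch_npu`). `target_format` is a hypothetical helper, and the SoC-version set is copied from `NPUSocInfo`.

```python
ACL_FORMAT_ND = 2           # plain row-major ("ND") layout
ACL_FORMAT_FRACTAL_NZ = 29  # blocked "FRACTAL_NZ" layout

# SoC versions for which NPUSocInfo sets need_nz (910A / 310P per the comment in fit_npu).
NZ_SOC_VERSIONS = frozenset({100, 101, 102, 103, 104, 200, 201, 202, 203})


def target_format(soc_version: int, module_kind: str) -> int:
    """Weight format that fit_npu() casts to, for module_kind "linear" or "embedding"."""
    if module_kind == "embedding":
        return ACL_FORMAT_ND  # embeddings stay in ND on every SoC
    if module_kind != "linear":
        raise ValueError(f"unsupported module kind: {module_kind}")
    return ACL_FORMAT_FRACTAL_NZ if soc_version in NZ_SOC_VERSIONS else ACL_FORMAT_ND


for soc in (104, 150):  # 150 is just an arbitrary version outside the NZ set
    print(soc, target_format(soc, "linear"), target_format(soc, "embedding"))
```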
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/log/init.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/log/init.py
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/log/logging.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/log/logging.py
@ -0,0 +1,39 @@
+#!/usr/bin/env python
+# coding:utf-8
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
+"""
+logging
+"""
+import logging
+import os
+from logging.handlers import RotatingFileHandler
+
+from atb_speed.common.log.multiprocess_logging_handler import install_logging_handler
+
+
+def init_logger(logger: logging.Logger, file_name: str):
+    """
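+    Initialize a logger that writes to a rotating log file and to the console.
+    :param logger: logger to configure
+    :param file_name: path of the log file
+    :return: the configured logger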
+    """
+    logger.setLevel(logging.INFO)
+    # Rotating file handler: log path, max size per file (PYTHON_LOG_MAXSIZE, default 1 GiB) and number of backups
+    flask_file_handle = RotatingFileHandler(
+        filename=file_name,
+        maxBytes=int(os.getenv('PYTHON_LOG_MAXSIZE', "1073741824")),
+        backupCount=10,
+        encoding="utf-8")
+    formatter = logging.Formatter('%(asctime)s [%(levelname)s] pid: %(process)d %(filename)s-%(lineno)d: %(message)s')
+    # Apply the log format to the file handler
+    flask_file_handle.setFormatter(formatter)
+    # Attach the file handler to the logger
+    logger.addHandler(flask_file_handle)
+
+    # Also log to the console
+    console_handle = logging.StreamHandler()
+    console_handle.setFormatter(formatter)
+    logger.addHandler(console_handle)
+    install_logging_handler(logger)
+    return logger
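`init_logger` combines a size-capped `RotatingFileHandler` with a console `StreamHandler` that share one formatter. Below is a standard-library-only sketch of the same setup that leaves out `install_logging_handler`. `build_logger` is a hypothetical name, and unlike `init_logger` it returns early when the logger already has handlers, so calling it twice does not duplicate output.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def build_logger(name: str, file_name: str) -> logging.Logger:
    """File + console logger in the style of init_logger(), without the multiprocess wrapper."""
    logger = logging.getLogger(name)
    if logger.handlers:
        return logger  # already configured; avoid attaching duplicate handlers
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter(
        "%(asctime)s [%(levelname)s] pid: %(process)d %(filename)s-%(lineno)d: %(message)s")
    file_handler = RotatingFileHandler(
        filename=file_name,
        maxBytes=int(os.getenv("PYTHON_LOG_MAXSIZE", "1073741824")),  # default: 1 GiB per file
        backupCount=10,  # keep at most 10 rotated files
        encoding="utf-8")
    file_handler.setFormatter(fmt)
    console_handler = logging.StreamHandler()  # writes to stderr
    console_handler.setFormatter(fmt)
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```

A call such as `build_logger("device_0", "device0.log").info("cost %s s", 1.5)` writes the same line to the file and to stderr.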
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/log/multiprocess_logging_handler.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/log/multiprocess_logging_handler.py
@ -0,0 +1,135 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved.
+"""
+multiprocess_logging_handler
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import unicode_literals
+
+import logging
+import multiprocessing
+import threading
+
+
+def install_logging_handler(logger=None):
+    """
+    Wraps the handlers of the given Logger with a MultiLoggingHandler.
+    :param logger: logger whose handlers to wrap. By default, the "service_operation" logger.
+    """
+    if logger is None:
+        logger = logging.getLogger("service_operation")
+
+    for index, org_handler in enumerate(list(logger.handlers)):
+        handler = MultiLoggingHandler('mp-handler-{0}'.format(index), log_handler=org_handler)
+        logger.removeHandler(org_handler)
+        logger.addHandler(handler)
+
+
+class MultiLoggingHandler(logging.Handler):
+    """
+    multiprocessing handler.
+    """
+
+    def __init__(self, name, log_handler=None):
+        """
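+        Initialize the handler and start a daemon thread that emits queued records
+        :param name: name of the receiver thread
+        :param log_handler: handler that actually emits records; defaults to a StreamHandler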
+        :return:
+        """
+        super().__init__()
+
+        if log_handler is None:
+            log_handler = logging.StreamHandler()
+
+        self.log_handler = log_handler
+        self.queue = multiprocessing.Queue(-1)
+        self.setLevel(self.log_handler.level)
+        self.set_formatter(self.log_handler.formatter)
+        # The thread handles receiving records asynchronously.
+        t_thd = threading.Thread(target=self.receive, name=name)
+        t_thd.daemon = True
+        t_thd.start()
+
+    def set_formatter(self, fmt):
+        """
+        Set the formatter on both this handler and the wrapped handler.
+        :param fmt:
+        :return:
+        """
+        logging.Handler.setFormatter(self, fmt)
+        self.log_handler.setFormatter(fmt)
+
+    def receive(self):
+        """
+        Receiver loop: take records from the queue and emit them through the wrapped handler.
+        :return:
+        """
+        while True:
+            try:
+                record = self.queue.get()
+                self.log_handler.emit(record)
+            except (KeyboardInterrupt, SystemExit):
+                raise
+            except EOFError:
+                break
+            except ValueError:
+                pass
+
+    def send(self, message):
+        """
+        Put a record on the queue without blocking.
+        :param message:
+        :return:
+        """
+        self.queue.put_nowait(message)
+
+    def emit(self, record):
+        """
+        Flatten the record into a picklable form and put it on the queue.
+        :param record:
+        :return:
+        """
+        try:
+            sd_record = self._format_record(record)
+            self.send(sd_record)
+        except (KeyboardInterrupt, SystemExit):
+            raise
+        except ValueError:
+            self.handleError(record)
+
+    def close(self):
+        """
+        Close the wrapped handler, then this handler.
+        :return:
+        """
+        self.log_handler.close()
+        logging.Handler.close(self)
+
+    def handle(self, record):
+        """
+        Apply this handler's filters and emit the record if it passes; returns the filter result.
+        :param record:
+        :return:
+        """
+        rsv_record = self.filter(record)
+        if rsv_record:
+            self.emit(record)
+        return rsv_record
+
+    def _format_record(self, org_record):
+        """
+        Merge args into msg and render exc_info into exc_text so the record can be pickled.
+        :param org_record:
+        :return:
+        """
+        if org_record.args:
+            org_record.msg = org_record.msg % org_record.args
+            org_record.args = None
+        if org_record.exc_info:
+            self.format(org_record)
+            org_record.exc_info = None
+        return org_record
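The standard library ships a comparable queue-based pattern in `logging.handlers.QueueHandler` and `QueueListener`. This is a swapped-in technique, not what `MultiLoggingHandler` uses, shown here with a thread-safe `queue.Queue` rather than a process queue. `QueueHandler.prepare()` merges `args` into the message and clears `exc_info`, playing the same role as `_format_record`. `ListHandler` and `demo` are hypothetical names used only for illustration.

```python
import logging
import queue
from logging.handlers import QueueHandler, QueueListener


class ListHandler(logging.Handler):
    """Collects formatted messages in memory; a stand-in for a file or console handler."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def demo():
    q = queue.Queue(-1)                # unbounded, like multiprocessing.Queue(-1) above
    sink = ListHandler()
    listener = QueueListener(q, sink)  # background thread, like MultiLoggingHandler.receive
    producer = QueueHandler(q)         # emit() only enqueues, like MultiLoggingHandler.send
    logger = logging.getLogger("queue_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(producer)
    listener.start()
    try:
        logger.info("rank %d ready", 0)  # prepare() merges args into msg before enqueueing
    finally:
        listener.stop()                # processes queued records, then joins the thread
        logger.removeHandler(producer)
    return sink.messages


print(demo())  # ['rank 0 ready']
```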
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/performance/init.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/performance/init.py
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/performance/base.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/performance/base.py
@ -0,0 +1,231 @@
+#!/usr/bin/env python
+# coding:utf-8
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
+"""
+performance test base
+"""
+import time
+from dataclasses import dataclass
+from enum import Enum
+from typing import List, Callable
+
+import torch
+import torch.distributed as dist
+from atb_speed.common.config import atb_speed_config
+from atb_speed.common.launcher.base import BaseLauncher
+from atb_speed.common.timer import Timer
+from atb_llm.utils.file_utils import safe_open
+
+
+class PerfMode(str, Enum):
+    detail = "detail"
+    normal = "normal"
+
+
+@dataclass
+class PerformanceTestConfig:
+    """
+    Performance test configuration; values are read from atb_speed_config.performance
+    """
+    batch_size: int = 1
+    max_len_exp: int = 11
+    min_len_exp: int = 5
+    model_name: str = "model"
+    device_name: str = "cpu"
+    save_file_name: str = ""
+    case_pair: List[List[int]] = None
+
+    def __post_init__(self):
+        self.batch_size = atb_speed_config.performance.batch_size
+        self.max_len_exp = atb_speed_config.performance.max_len_exp
+        self.min_len_exp = atb_speed_config.performance.min_len_exp
+        self.model_name = atb_speed_config.performance.model_name
+        self.case_pair = atb_speed_config.performance.case_pair
+        if not atb_speed_config.performance.save_file_name:
+            self.save_file_name = f"performance_test_{self.model_name}_{self.device_name}_bs{self.batch_size}.csv"
+        else:
+            self.save_file_name = atb_speed_config.performance.save_file_name
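When `case_pair` is not configured, `generate_test_case` builds the full Cartesian grid of power-of-two lengths from `min_len_exp` and `max_len_exp`, so the number of cases grows as (max_len_exp - min_len_exp + 1)². A standalone sketch using `itertools.product` (the helper name `power_of_two_cases` is hypothetical): exponents 5 to 11 yield 7 × 7 = 49 cases, and if `min_len_exp` exceeds `max_len_exp` the grid is silently empty.

```python
from itertools import product
from typing import List


def power_of_two_cases(min_exp: int, max_exp: int) -> List[List[int]]:
    """[seq_len_in, seq_len_out] pairs with both lengths in 2**min_exp .. 2**max_exp."""
    lengths = [2 ** e for e in range(min_exp, max_exp + 1)]
    # product() varies the second length fastest, matching the nested loops in generate_test_case.
    return [[seq_in, seq_out] for seq_in, seq_out in product(lengths, lengths)]


print(power_of_two_cases(5, 6))        # [[32, 32], [32, 64], [64, 32], [64, 64]]
print(len(power_of_two_cases(5, 11)))  # 49
```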
+
+
+class PerformanceTest:
+    """
+    Runs the configured performance test cases through a launcher
+    """
+
+    def __init__(self, launcher: BaseLauncher):
+        """
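+        Prepare the test cases and choose the timing method according to perf_mode.
+        :param launcher: launcher that provides the model, tokenizer and logger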
+        """
+        self.launcher = launcher
+        self.local_rank, self.world_size = launcher.local_rank, launcher.world_size
+        self.config = PerformanceTestConfig(device_name=self.launcher.device_name)
+        self.launcher.logger.info(self.config.__dict__)
+        self.model, self.tokenizer = launcher.model, launcher.tokenizer
+        self.dummy_input = "Common sense questions and answers\n\nQuestion: Why do people need sleep\nFactual answer:"
+        if atb_speed_config.performance.perf_mode == PerfMode.detail:
+            self.perf = self._perf_detail_v2
+        else:
+            self.perf = self._perf
+        self.test_case = self.generate_test_case()
+
+    def generate_test_case(self):
+        if self.config.case_pair is None:
+            return [[2 ** i, 2 ** j]
+                    for i in range(self.config.min_len_exp, self.config.max_len_exp + 1)
+                    for j in range(self.config.min_len_exp, self.config.max_len_exp + 1)]
+        return self.config.case_pair
+
+    def warm_up(self, seq_len_in=None, seq_len_out=None):
+        """
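+        Run one generation at the largest input and output lengths among the test cases (unless given) to warm up.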
+        :return:
+        """
+        if seq_len_in is None:
+            seq_len_in = max(case[0] for case in self.test_case)
+        if seq_len_out is None:
+            seq_len_out = max(case[1] for case in self.test_case)
+        dummy_input_ids_nxt = torch.randint(0, self.model.config.vocab_size, [self.config.batch_size, seq_len_in],
+                                            dtype=torch.int64)
+        dummy_attention_mask = torch.ones((self.config.batch_size, seq_len_in), dtype=torch.int64)
+        inputs = self.tokenizer([self.dummy_input] * self.config.batch_size, return_tensors="pt", padding='max_length',
+                                max_length=seq_len_in)
+        inputs["input_ids"] = dummy_input_ids_nxt
+        inputs["attention_mask"] = dummy_attention_mask
+        inputs = inputs.to(self.model.device)
+        with torch.no_grad():
+            _ = self.model.generate(
+                **inputs,
+                max_new_tokens=seq_len_out,
+                eos_token_id=self.model.config.vocab_size * 2
+            )
+        self.launcher.logger.info("warm up finished.")
+
+    def run_test(self):
+        self.launcher.logger.info("---------------inference---------------")
+        file = None
+        if self.local_rank == 0:
+            file = safe_open(self.config.save_file_name, "w", encoding="utf-8")
+            file.write(
+                "batch_size,"
+                "input_seq_len(Encoding),"
+                "output_seq_len(Decoding),"
+                "ResponseTime(s),"
+                "forward_first_token_time(ms),"
+                "forward_next_token_time(ms),"
+                "pre_next_token_time(ms),"
+                "post_next_token_time_post(ms)\n")
+        for seq_len_in, seq_len_out in self.test_case:
+            time_tensor = self._run(seq_len_in, seq_len_out)
+            if self.local_rank == 0:
+                file.write(
+                    f"{self.config.batch_size},"
+                    f"{seq_len_in},"
+                    f"{seq_len_out},"
+                    f"{round(time_tensor[0], 2)},"
+                    f"{time_tensor[1]},"
+                    f"{time_tensor[2]},"
+                    f"{time_tensor[3]},"
+                    f"{time_tensor[4]}\n")
+        if self.local_rank == 0:
+            file.close()
+
+    def _run(self, seq_len_in, seq_len_out):
+        dummy_input_ids_nxt = torch.randint(0, self.model.config.vocab_size, [self.config.batch_size, seq_len_in],
+                                            dtype=torch.int64)
+        dummy_attention_mask = torch.ones((self.config.batch_size, seq_len_in), dtype=torch.int64)
+        inputs = self.tokenizer(
+            [self.dummy_input] * self.config.batch_size,
+            return_tensors="pt", padding='max_length', max_length=seq_len_in)
+        inputs["input_ids"] = dummy_input_ids_nxt
+        inputs["attention_mask"] = dummy_attention_mask
+        inputs = inputs.to(self.model.device)
+        self.launcher.logger.info("---------------inputs shape---------------")
+        self.launcher.logger.info(inputs.input_ids.shape)
+        self.launcher.logger.info(f"seq_len_in: {seq_len_in}, seq_len_out: {seq_len_out}")
+        start_time = time.time()
+        forward_first_token_time, forward_next_token_time, pre_next_token_time, post_next_token_time_post = (
+            self.perf(inputs, seq_len_out))
+        end_time = time.time()
+        # time analysis
+        total_time = end_time - start_time
+        time_tensor = torch.tensor(
+            [total_time,
+             forward_first_token_time,
+             forward_next_token_time,
+             pre_next_token_time,
+             post_next_token_time_post], device=self.model.device)
+        if self.world_size > 1:
+            dist.all_reduce(time_tensor, dist.ReduceOp.MAX)
+        time_tensor = time_tensor.tolist()
+        return time_tensor
+
+    def _perf_detail_v2(self, inputs, seq_len_out):
+        """
+        Time generation through the Timer decorators, which avoids invasively inserting timing probes into the model code.
+        :param inputs:
+        :param seq_len_out:
+        :return:
+        """
+        Timer.reset()
+        Timer.sync = getattr(torch, self.launcher.device_type).synchronize
+        with torch.no_grad():
+            generate_ids = self.model.generate(**inputs, max_new_tokens=seq_len_out,
+                                               eos_token_id=self.model.config.vocab_size * 2  # avoid stopping early
+                                               )
+            # decode
+            if not atb_speed_config.performance.skip_decode:
+                _ = self.tokenizer.batch_decode(generate_ids, skip_special_tokens=True,
+                                                clean_up_tokenization_spaces=False)
+        return [Timer.timeit_res.first_token_delay, Timer.timeit_res.next_token_avg_delay, 0, 0]
+
+    def _perf_detail(self, inputs, seq_len_out):
+        with torch.no_grad():
+            generate_ids, \
+                forward_first_token_time, \
+                forward_next_token_time, \
+                pre_next_token_time, \
+                post_next_token_time_post = \
+                self.model.generate(**inputs, max_new_tokens=seq_len_out,
+                                    eos_token_id=self.model.config.vocab_size * 2  # avoid stopping early
+                                    )
+            # decode
+            if not atb_speed_config.performance.skip_decode:
+                _ = self.tokenizer.batch_decode(generate_ids, skip_special_tokens=True,
+                                                clean_up_tokenization_spaces=False)
+        return [forward_first_token_time,
+                forward_next_token_time,
+                pre_next_token_time,
+                post_next_token_time_post]
+
+    def _perf(self, inputs, seq_len_out):
+        with torch.no_grad():
+            getattr(torch, self.launcher.device_type).synchronize()
+            first_token_start = time.time()
+            generate_ids = self.model.generate(**inputs,
+                                               min_new_tokens=1,
+                                               max_new_tokens=1)
+            getattr(torch, self.launcher.device_type).synchronize()
+            first_token_end = time.time()
+            if not atb_speed_config.performance.skip_decode:
+                _ = self.tokenizer.batch_decode(generate_ids, skip_special_tokens=True,
+                                                clean_up_tokenization_spaces=False)
+
+            getattr(torch, self.launcher.device_type).synchronize()
+            total_start = time.time()
+            generate_ids = self.model.generate(
+                **inputs,
+                min_new_tokens=seq_len_out,
+                max_new_tokens=seq_len_out
+            )
+            getattr(torch, self.launcher.device_type).synchronize()
+            total_end = time.time()
+        if not atb_speed_config.performance.skip_decode:
+            _ = self.tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
+        # time analysis
+        forward_first_token_time = (first_token_end - first_token_start) * 1000
+        time_inc_total = (total_end - total_start) * 1000
+
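+        # the remaining (seq_len_out - 1) tokens share the time left after the first token
+        forward_next_token_time = ((time_inc_total - forward_first_token_time) / (seq_len_out - 1)
+                                   if seq_len_out > 1 else 0)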
+        return [forward_first_token_time, forward_next_token_time, 0, 0]
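The latency split in `_perf` can be checked in isolation: the single-token `generate` call is taken as the first-token latency, and the extra time the full `seq_len_out`-token run needs beyond that is spread over the remaining `seq_len_out - 1` tokens. The sketch below reproduces only that arithmetic; `split_latency` is a hypothetical name, and it reports 0 for the next-token average when just one token is generated.

```python
from typing import Tuple


def split_latency(first_token_ms: float, total_ms: float, seq_len_out: int) -> Tuple[float, float]:
    """Split end-to-end generation time into first-token and average next-token latency.

    first_token_ms: wall time of a generate() call limited to one new token.
    total_ms: wall time of a generate() call producing seq_len_out new tokens.
    """
    if seq_len_out < 1:
        raise ValueError("seq_len_out must be at least 1")
    if seq_len_out == 1:
        # Only one token was generated, so there is no next-token phase to average.
        return first_token_ms, 0.0
    # The remaining (seq_len_out - 1) tokens share whatever time is left after the first token.
    next_token_avg_ms = (total_ms - first_token_ms) / (seq_len_out - 1)
    return first_token_ms, next_token_avg_ms


# Example: 120 ms to the first token, 1020 ms for 10 tokens in total.
first, nxt = split_latency(120.0, 1020.0, 10)
print(f"first token: {first:.1f} ms, next token avg: {nxt:.1f} ms")
```

Because the two figures come from separate `generate` calls, the split assumes the prefill cost is the same in both runs.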
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/precision/__init__.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/precision/__init__.py
@ -0,0 +1,21 @@
+#!/usr/bin/env python
+# coding:utf-8
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
+"""
+precision test entry point
+"""
+from atb_speed.common.config import atb_speed_config
+
+from .base import CEVALPrecisionTest, MMLUPrecisionTest
+
+
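+def get_precision_test_cls(mode=""):
+    """
+    Return the precision test class for the given mode.
+    :param mode: "mmlu" or "ceval"; falls back to atb_speed_config.precision.mode when empty
+    :return: MMLUPrecisionTest or CEVALPrecisionTest (CEVAL for unknown modes)
+    """
+    cls_map = {
+        "mmlu": MMLUPrecisionTest,
+        "ceval": CEVALPrecisionTest
+    }
+    return cls_map.get((mode or atb_speed_config.precision.mode).lower(), CEVALPrecisionTest)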
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/precision/base.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/precision/base.py
@ -0,0 +1,256 @@
+#!/usr/bin/env python
+# coding:utf-8
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
+"""
+precision base
+"""
+import json
+import os
+from string import ascii_letters
+
+import pandas as pd
+import torch
+from atb_llm.utils.file_utils import safe_open
+from atb_speed.common.config import atb_speed_config
+from atb_speed.common.launcher.base import BaseLauncher
+from atb_speed.common.utils import torch_parallel_info
+from tqdm import tqdm
+
+HARD_TASK = (
+    "advanced_mathematics", "discrete_mathematics", "probability_and_statistics", "college_chemistry",
+    "college_physics", "high_school_mathematics", "high_school_chemistry", "high_school_physics"
+)
+
+
+class Record:
+    """When debug is False, only rank 0 writes the log and cache; when True, every device writes its own files."""
+
+    def __init__(self, log_dir, log_flag, debug=False):
+        self.debug = debug
+        self.flag = log_flag if debug else ""
+        self.log_name = os.path.join(log_dir, f"device{self.flag}.log")
+        self.cache_name = os.path.join(log_dir, f"cache{self.flag}.csv")
+        self.begin_idx = self.load_cache()
+
+    def log(self, *msg):
+        if self.debug or torch_parallel_info.is_rank_0:
+            with safe_open(self.log_name, "a", encoding="utf-8") as f:
+                f.write(" ".join([str(i) for i in msg]) + '\n')
+
+    def load_cache(self):
+        if not os.path.exists(self.cache_name):
+            self.log("[-] No cache file, cache will be created")
+            return 0
+        self.log("[~] Loading cache left by the last abnormal exit and resuming from it ...")
+        with safe_open(self.cache_name, "r", encoding="utf-8") as f:
+            cache = f.read().strip().split()
+        if not cache:
+            return 0
+        cache = [row.split(",") for row in cache]
+        start_idx = cache[-1][0]
+        self.log(f"[+] Load cache successfully! start idx: {start_idx}")
+        return int(start_idx) + 1
+
+    def update_cache(self, task_name, question_id, truth_answer, predict_answer):
+        if self.debug or torch_parallel_info.is_rank_0:
+            with safe_open(self.cache_name, "a", encoding="utf-8") as f:
+                f.write(f"{question_id},{task_name},{truth_answer},{predict_answer}\n")
+
+
+class PrecisionTestBase:
+    def __init__(self, launcher: BaseLauncher, workdir="", **kwargs):
+        workdir = atb_speed_config.precision.work_dir if not workdir else workdir
+        self.data_dir = os.path.join(workdir, "data")
+        self.result_output_dir = os.path.join(workdir, "test_result")
+        self.init_result_dir()
+        self.choices = ["A", "B", "C", "D"]
+        self.shot = 5
+        self.batch = 1
+        self.seq_len_out = 32
+
+        self.model, self.tokenizer = launcher.model, launcher.tokenizer
+        self.local_rank = launcher.local_rank
+        self.launcher = launcher
+        self.recorder = Record(self.result_output_dir, self.local_rank)
+        self.subject_mapping_path = os.path.join(os.path.dirname(os.path.realpath(__file__)),
+                                                 f"{atb_speed_config.precision.mode}_subject_mapping.json")
+        # kwargs have higher priority
+        if atb_speed_config.precision:
+            self.update_param(atb_speed_config.precision.__dict__)
+        self.update_param(kwargs)
+
+    @staticmethod
+    def format_subject(subject):
+        sub_list = subject.split("_")
+        final_str = ""
+        for entry in sub_list:
+            final_str += " " + entry
+        return final_str
+
+    def update_param(self, param_dict):
+        for key, value in param_dict.items():
+            setattr(self, key, value)
+            self.recorder.log(f"[+] set {key} to {value}")
+
+    def init_result_dir(self):
+        if torch_parallel_info.is_rank_0:
+            os.makedirs(self.result_output_dir, exist_ok=True)
+        if torch_parallel_info.world_size > 1:
+            torch.distributed.barrier()
+
+    def compute_metric(self, subject_mapping):
+        run_results = pd.read_csv(
+            self.recorder.cache_name,
+            names=['question_id', 'task_name', 'truth_answer', 'predict_answer'])
+        classes_acc = dict()
+        subject_acc = dict()
+        hard_task = [0, 0]
+        for task in subject_mapping:
+            class_of_task = subject_mapping[task][2]
+            this_task = run_results.loc[run_results.task_name == task]
+            if not this_task.shape[0]:
+                continue
+            correct_num = (this_task.truth_answer == this_task.predict_answer).sum()
+            if class_of_task not in classes_acc:
+                classes_acc[class_of_task] = [0, 0]  # correct num, total num
+            if task in HARD_TASK:
+                hard_task[0] += correct_num
+                hard_task[1] += this_task.shape[0]
+            subject_acc[task] = correct_num / this_task.shape[0]
+            classes_acc[class_of_task][0] += correct_num
+            classes_acc[class_of_task][1] += this_task.shape[0]
+
+        avg_acc = sum([i[0] for i in classes_acc.values()]) / sum([j[1] for j in classes_acc.values()])
+        for c in classes_acc:
+            classes_acc[c] = classes_acc[c][0] / classes_acc[c][1]
+        classes_acc["Avg"] = avg_acc
+        classes_acc["Avg(Hard)"] = hard_task[0] / hard_task[1] if hard_task[1] else 0
+        with safe_open(os.path.join(self.result_output_dir, f"result{self.recorder.flag}_subject_acc.json"), "w") as fp:
+            json.dump(subject_acc, fp)
+        with safe_open(os.path.join(self.result_output_dir, f"result{self.recorder.flag}_classes_acc.json"), "w") as fp:
+            json.dump(classes_acc, fp)
+        if torch_parallel_info.is_rank_0:
+            self.launcher.logger.info(f"[+] Avg acc: {classes_acc['Avg']}")
+
+    def get_subject_mapping(self):
+        with safe_open(self.subject_mapping_path, "r", encoding="utf-8") as f:
+            subject_mapping = json.load(f)
+        return subject_mapping
+
+    def load_csv_by_task_name(self, task_name):
+        dev_df = pd.read_csv(os.path.join(self.data_dir, "dev", task_name + "_dev.csv"), header=None)[
+                 :self.shot + 1]
+        val_df = pd.read_csv(os.path.join(self.data_dir, "val", task_name + "_val.csv"), header=None)
+
+        return dev_df, val_df
+
+    def format_example(self, df, idx, include_answer=True):
+        prompt = df.iloc[idx, 0]
+        k = len(self.choices)
+        for j in range(k):
+            prompt += "\n{}. {}".format(self.choices[j], df.iloc[idx, j + 1])
+        prompt += "\nAnswer:"
+        if include_answer:
+            prompt += " {}\n\n".format(df.iloc[idx, k + 1])
+        return prompt
+
+    def gen_prompt(self, train_df, subject, k=-1):
+        prompt = "The following are multiple choice questions (with answers) about {}.\n\n".format(
+            self.format_subject(subject))
+        if k == -1:
+            k = train_df.shape[0]
+        for i in range(k):
+            prompt += self.format_example(train_df, i)
+        return prompt
+
+    def batch_infer(self, qr_pair, begin_idx):
+        prompts = [item['prompt'] for item in qr_pair]
+        truth_answers = [item['answer'] for item in qr_pair]
+        task_names = [item['task_name'] for item in qr_pair]
+
+        inputs = self.tokenizer(prompts, return_tensors="pt", padding='longest')
+        inputs = inputs.to(self.model.device)
+        input_len = len(inputs.input_ids[0])
+        with torch.no_grad():
+            output = self.model.generate(inputs.input_ids,
+                                         attention_mask=inputs.attention_mask,
+                                         max_new_tokens=self.seq_len_out)
+        answers = self.tokenizer.batch_decode(output.to(torch.int32)[:, input_len:])
+
+        for prompt, truth_answer, task_name, ori_answer in zip(prompts, truth_answers, task_names, answers):
+            self.recorder.log("\n========== prompt start ==========\n", prompt,
+                              "\n==========  prompt end  ==========\n")
+            self.recorder.log(f"[+] prompt length: {input_len}")
+            self.recorder.log("\n========== answer start ==========\n", ori_answer,
+                              "\n==========  answer end  ==========\n")
+            answer_list = [char.upper() for char in ori_answer if char in ascii_letters]
+            answer = answer_list[0] if answer_list else "-1"
+            is_correct = "Correct" if answer == truth_answer else "Wrong"
+            self.recorder.log(f"[{is_correct}] predict: {answer}, label: {truth_answer}")
+            self.recorder.update_cache(task_name, begin_idx, truth_answer, answer)
+            begin_idx += 1
+
+    def run(self):
+        subject_mapping = self.get_subject_mapping()
+        subject_name_list = sorted(list(subject_mapping.keys()))
+        qr_pair = []
+
+        total_len = 0
+        begin_idx = self.recorder.begin_idx
+        for task_name in subject_name_list:
+            dev_df, val_df = self.load_csv_by_task_name(task_name)
+            total_len += len(val_df)
+            if len(val_df) <= begin_idx:
+                self.recorder.log(f"[~] Skip Task: {task_name}")
+                begin_idx -= len(val_df)
+                continue
+
+            for i in range(val_df.shape[0]):
+                if begin_idx > 0:
+                    begin_idx -= 1
+                    continue
+                for cut_shot in range(self.shot):
+                    prompt_end = self.format_example(val_df, i, include_answer=False)
+                    train_prompt = self.gen_prompt(dev_df, task_name, self.shot - cut_shot)
+                    prompt = train_prompt + prompt_end
+                    input_len = len(self.tokenizer(prompt, return_tensors="pt").input_ids[0])
+                    if input_len > 2000:
+                        continue
+                    label = val_df.iloc[i, val_df.shape[1] - 1]
+                    qr_pair.append({'task_name': task_name, 'prompt': prompt, 'answer': label})
+                    break
+        pbar = None
+        if torch_parallel_info.is_rank_0:
+            pbar = tqdm(total=total_len, initial=self.recorder.begin_idx)
+        for i in range(0, len(qr_pair), self.batch):
+            self.batch_infer(qr_pair[i: i + self.batch], i + self.recorder.begin_idx)
+            if torch_parallel_info.is_rank_0:
+                pbar.update(self.batch if i + self.batch <= len(qr_pair) else len(qr_pair) - i)
+        if torch_parallel_info.is_rank_0:
+            pbar.close()
+        self.compute_metric(subject_mapping)
+
+
+class CEVALPrecisionTest(PrecisionTestBase):
+    """
+    CEVAL
+    """
+
+    def load_csv_by_task_name(self, task_name):
+        dev_df, val_df = super().load_csv_by_task_name(task_name)
+
+        # remove the first row "column names" and the first column "id"
+        dev_df = dev_df.iloc[1:, 1:]
+        val_df = val_df.iloc[1:, 1:]
+
+        return dev_df, val_df
+
+
+class MMLUPrecisionTest(PrecisionTestBase):
+    """
+    MMLU
+    """
+
+    def compute_metric(self, subject_mapping):
+        subject_mapping_adapt = {k: [None, None, v] for k, v in subject_mapping.items()}
+        return super().compute_metric(subject_mapping_adapt)
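The prompt layout built by `format_example` and `gen_prompt`, and the answer parsing in `batch_infer`, can be reproduced without a model or pandas. The sketch below uses plain tuples (question, options A to D, answer letter) in place of DataFrame rows; the helper signatures are simplified and hypothetical. It keeps the double space after "about" that comes from `format_subject` prefixing every word with a space.

```python
from string import ascii_letters
from typing import Sequence, Tuple

CHOICES = ["A", "B", "C", "D"]

# One row after CEVAL's column trimming: question, options A-D, answer letter.
Row = Tuple[str, str, str, str, str, str]


def format_subject(subject: str) -> str:
    # Same behavior as PrecisionTestBase.format_subject: each word gets a leading space.
    return "".join(" " + entry for entry in subject.split("_"))


def format_example(row: Row, include_answer: bool = True) -> str:
    prompt = row[0]
    for letter, option in zip(CHOICES, row[1:5]):
        prompt += f"\n{letter}. {option}"
    prompt += "\nAnswer:"
    if include_answer:
        prompt += f" {row[5]}\n\n"
    return prompt


def gen_prompt(dev_rows: Sequence[Row], subject: str, k: int) -> str:
    # Header followed by k solved dev examples (the few-shot part).
    prompt = f"The following are multiple choice questions (with answers) about {format_subject(subject)}.\n\n"
    for row in dev_rows[:k]:
        prompt += format_example(row)
    return prompt


def extract_answer(generated: str) -> str:
    # Like batch_infer: first ASCII letter of the generated text, upper-cased; "-1" if none.
    letters = [ch.upper() for ch in generated if ch in ascii_letters]
    return letters[0] if letters else "-1"


dev = [("1 + 1 = ?", "1", "2", "3", "4", "B")]
question = ("2 + 2 = ?", "2", "3", "4", "5", "C")
full_prompt = gen_prompt(dev, "high_school_mathematics", k=1) + format_example(question, include_answer=False)
print(full_prompt)
print(extract_answer(" C. 4"))
```

The parser relies on the model answering with the letter first: a reply such as "The answer is B" is scored as "T".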
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/precision/ceval_subject_mapping.json
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/precision/ceval_subject_mapping.json
@ -0,0 +1,262 @@
+{
+	"computer_network": [
+		"Computer Network",
+		"\u8ba1\u7b97\u673a\u7f51\u7edc",
+		"STEM"
+	],
+	"operating_system": [
+		"Operating System",
+		"\u64cd\u4f5c\u7cfb\u7edf",
+		"STEM"
+	],
+	"computer_architecture": [
+		"Computer Architecture",
+		"\u8ba1\u7b97\u673a\u7ec4\u6210",
+		"STEM"
+	],
+	"college_programming": [
+		"College Programming",
+		"\u5927\u5b66\u7f16\u7a0b",
+		"STEM"
+	],
+	"college_physics": [
+		"College Physics",
+		"\u5927\u5b66\u7269\u7406",
+		"STEM"
+	],
+	"college_chemistry": [
+		"College Chemistry",
+		"\u5927\u5b66\u5316\u5b66",
+		"STEM"
+	],
+	"advanced_mathematics": [
+		"Advanced Mathematics",
+		"\u9ad8\u7b49\u6570\u5b66",
+		"STEM"
+	],
+	"probability_and_statistics": [
+		"Probability and Statistics",
+		"\u6982\u7387\u7edf\u8ba1",
+		"STEM"
+	],
+	"discrete_mathematics": [
+		"Discrete Mathematics",
+		"\u79bb\u6563\u6570\u5b66",
+		"STEM"
+	],
+	"electrical_engineer": [
+		"Electrical Engineer",
+		"\u6ce8\u518c\u7535\u6c14\u5de5\u7a0b\u5e08",
+		"STEM"
+	],
+	"metrology_engineer": [
+		"Metrology Engineer",
+		"\u6ce8\u518c\u8ba1\u91cf\u5e08",
+		"STEM"
+	],
+	"high_school_mathematics": [
+		"High School Mathematics",
+		"\u9ad8\u4e2d\u6570\u5b66",
+		"STEM"
+	],
+	"high_school_physics": [
+		"High School Physics",
+		"\u9ad8\u4e2d\u7269\u7406",
+		"STEM"
+	],
+	"high_school_chemistry": [
+		"High School Chemistry",
+		"\u9ad8\u4e2d\u5316\u5b66",
+		"STEM"
+	],
+	"high_school_biology": [
+		"High School Biology",
+		"\u9ad8\u4e2d\u751f\u7269",
+		"STEM"
+	],
+	"middle_school_mathematics": [
+		"Middle School Mathematics",
+		"\u521d\u4e2d\u6570\u5b66",
+		"STEM"
+	],
+	"middle_school_biology": [
+		"Middle School Biology",
+		"\u521d\u4e2d\u751f\u7269",
+		"STEM"
+	],
+	"middle_school_physics": [
+		"Middle School Physics",
+		"\u521d\u4e2d\u7269\u7406",
+		"STEM"
+	],
+	"middle_school_chemistry": [
+		"Middle School Chemistry",
+		"\u521d\u4e2d\u5316\u5b66",
+		"STEM"
+	],
+	"veterinary_medicine": [
+		"Veterinary Medicine",
+		"\u517d\u533b\u5b66",
+		"STEM"
+	],
+	"college_economics": [
+		"College Economics",
+		"\u5927\u5b66\u7ecf\u6d4e\u5b66",
+		"Social Science"
+	],
+	"business_administration": [
+		"Business Administration",
+		"\u5de5\u5546\u7ba1\u7406",
+		"Social Science"
+	],
+	"marxism": [
+		"Marxism",
+		"\u9a6c\u514b\u601d\u4e3b\u4e49\u57fa\u672c\u539f\u7406",
+		"Social Science"
+	],
+	"mao_zedong_thought": [
+		"Mao Zedong Thought",
+		"\u6bdb\u6cfd\u4e1c\u601d\u60f3\u548c\u4e2d\u56fd\u7279\u8272\u793e\u4f1a\u4e3b\u4e49\u7406\u8bba\u4f53\u7cfb\u6982\u8bba",
+		"Social Science"
+	],
+	"education_science": [
+		"Education Science",
+		"\u6559\u80b2\u5b66",
+		"Social Science"
+	],
+	"teacher_qualification": [
+		"Teacher Qualification",
+		"\u6559\u5e08\u8d44\u683c",
+		"Social Science"
+	],
+	"high_school_politics": [
+		"High School Politics",
+		"\u9ad8\u4e2d\u653f\u6cbb",
+		"Social Science"
+	],
+	"high_school_geography": [
+		"High School Geography",
+		"\u9ad8\u4e2d\u5730\u7406",
+		"Social Science"
+	],
+	"middle_school_politics": [
+		"Middle School Politics",
+		"\u521d\u4e2d\u653f\u6cbb",
+		"Social Science"
+	],
+	"middle_school_geography": [
+		"Middle School Geography",
+		"\u521d\u4e2d\u5730\u7406",
+		"Social Science"
+	],
+	"modern_chinese_history": [
+		"Modern Chinese History",
+		"\u8fd1\u4ee3\u53f2\u7eb2\u8981",
+		"Humanities"
+	],
+	"ideological_and_moral_cultivation": [
+		"Ideological and Moral Cultivation",
+		"\u601d\u60f3\u9053\u5fb7\u4fee\u517b\u4e0e\u6cd5\u5f8b\u57fa\u7840",
+		"Humanities"
+	],
+	"logic": [
+		"Logic",
+		"\u903b\u8f91\u5b66",
+		"Humanities"
+	],
+	"law": [
+		"Law",
+		"\u6cd5\u5b66",
+		"Humanities"
+	],
+	"chinese_language_and_literature": [
+		"Chinese Language and Literature",
+		"\u4e2d\u56fd\u8bed\u8a00\u6587\u5b66",
+		"Humanities"
+	],
+	"art_studies": [
+		"Art Studies",
+		"\u827a\u672f\u5b66",
+		"Humanities"
+	],
+	"professional_tour_guide": [
+		"Professional Tour Guide",
+		"\u5bfc\u6e38\u8d44\u683c",
+		"Humanities"
+	],
+	"legal_professional": [
+		"Legal Professional",
+		"\u6cd5\u5f8b\u804c\u4e1a\u8d44\u683c",
+		"Humanities"
+	],
+	"high_school_chinese": [
+		"High School Chinese",
+		"\u9ad8\u4e2d\u8bed\u6587",
+		"Humanities"
+	],
+	"high_school_history": [
+		"High School History",
+		"\u9ad8\u4e2d\u5386\u53f2",
+		"Humanities"
+	],
+	"middle_school_history": [
+		"Middle School History",
+		"\u521d\u4e2d\u5386\u53f2",
+		"Humanities"
+	],
+	"civil_servant": [
+		"Civil Servant",
+		"\u516c\u52a1\u5458",
+		"Other"
+	],
+	"sports_science": [
+		"Sports Science",
+		"\u4f53\u80b2\u5b66",
+		"Other"
+	],
+	"plant_protection": [
+		"Plant Protection",
+		"\u690d\u7269\u4fdd\u62a4",
+		"Other"
+	],
+	"basic_medicine": [
+		"Basic Medicine",
+		"\u57fa\u7840\u533b\u5b66",
+		"Other"
+	],
+	"clinical_medicine": [
+		"Clinical Medicine",
+		"\u4e34\u5e8a\u533b\u5b66",
+		"Other"
+	],
+	"urban_and_rural_planner": [
+		"Urban and Rural Planner",
+		"\u6ce8\u518c\u57ce\u4e61\u89c4\u5212\u5e08",
+		"Other"
+	],
+	"accountant": [
+		"Accountant",
+		"\u6ce8\u518c\u4f1a\u8ba1\u5e08",
+		"Other"
+	],
+	"fire_engineer": [
+		"Fire Engineer",
+		"\u6ce8\u518c\u6d88\u9632\u5de5\u7a0b\u5e08",
+		"Other"
+	],
+	"environmental_impact_assessment_engineer": [
+		"Environmental Impact Assessment Engineer",
+		"\u73af\u5883\u5f71\u54cd\u8bc4\u4ef7\u5de5\u7a0b\u5e08",
+		"Other"
+	],
+	"tax_accountant": [
+		"Tax Accountant",
+		"\u7a0e\u52a1\u5e08",
+		"Other"
+	],
+	"physician": [
+		"Physician",
+		"\u533b\u5e08\u8d44\u683c",
+		"Other"
+	]
+}
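Each C-Eval entry maps a subject key to `[English name, Chinese name, category]`, and `compute_metric` reads index 2 to group subjects into categories. The aggregation can be sketched with plain per-subject counts in place of the cached CSV; the `aggregate` function, its input shape and the shortened `HARD_TASK` set are hypothetical. In this sketch the hard-subset average only counts subjects listed in `HARD_TASK`, and an empty subset yields `None` rather than dividing by zero.

```python
from typing import Dict, List, Optional, Tuple

HARD_TASK = {"advanced_mathematics", "college_physics"}  # abbreviated for the example


def aggregate(subject_mapping: Dict[str, List[str]],
              counts: Dict[str, Tuple[int, int]]) -> Dict[str, Optional[float]]:
    """counts maps subject -> (correct, total); subjects without results are skipped."""
    per_class: Dict[str, List[int]] = {}
    hard = [0, 0]
    for subject, (correct, total) in counts.items():
        if total == 0 or subject not in subject_mapping:
            continue
        category = subject_mapping[subject][2]  # third element is the category
        bucket = per_class.setdefault(category, [0, 0])
        bucket[0] += correct
        bucket[1] += total
        if subject in HARD_TASK:
            hard[0] += correct
            hard[1] += total
    result: Dict[str, Optional[float]] = {c: v[0] / v[1] for c, v in per_class.items()}
    # The overall average is taken over questions, not over categories.
    all_correct = sum(v[0] for v in per_class.values())
    all_total = sum(v[1] for v in per_class.values())
    result["Avg"] = all_correct / all_total if all_total else None
    result["Avg(Hard)"] = hard[0] / hard[1] if hard[1] else None
    return result


mapping = {
    "advanced_mathematics": ["Advanced Mathematics", "高等数学", "STEM"],
    "law": ["Law", "法学", "Humanities"],
}
result = aggregate(mapping, {"advanced_mathematics": (3, 10), "law": (8, 10)})
print(result)
```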
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/precision/mmlu_subject_mapping.json
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/precision/mmlu_subject_mapping.json
@ -0,0 +1,59 @@
+{
+	"abstract_algebra": "STEM",
+	"anatomy": "other",
+	"astronomy": "STEM",
+	"business_ethics": "other",
+	"clinical_knowledge": "other",
+	"college_biology": "STEM",
+	"college_chemistry": "STEM",
+	"college_computer_science": "STEM",
+	"college_mathematics": "STEM",
+	"college_medicine": "other",
+	"college_physics": "STEM",
+	"computer_security": "STEM",
+	"conceptual_physics": "STEM",
+	"econometrics": "social sciences",
+	"electrical_engineering": "STEM",
+	"elementary_mathematics": "STEM",
+	"formal_logic": "humanities",
+	"global_facts": "other",
+	"high_school_biology": "STEM",
+	"high_school_chemistry": "STEM",
+	"high_school_computer_science": "STEM",
+	"high_school_european_history": "humanities",
+	"high_school_geography": "social sciences",
+	"high_school_government_and_politics": "social sciences",
+	"high_school_macroeconomics": "social sciences",
+	"high_school_mathematics": "STEM",
+	"high_school_microeconomics": "social sciences",
+	"high_school_physics": "STEM",
+	"high_school_psychology": "social sciences",
+	"high_school_statistics": "STEM",
+	"high_school_us_history": "humanities",
+	"high_school_world_history": "humanities",
+	"human_aging": "other",
+	"human_sexuality": "social sciences",
+	"international_law": "humanities",
+	"jurisprudence": "humanities",
+	"logical_fallacies": "humanities",
+	"machine_learning": "STEM",
+	"management": "other",
+	"marketing": "other",
+	"medical_genetics": "other",
+	"miscellaneous": "other",
+	"moral_disputes": "humanities",
+	"moral_scenarios": "humanities",
+	"nutrition": "other",
+	"philosophy": "humanities",
+	"prehistory": "humanities",
+	"professional_accounting": "other",
+	"professional_law": "humanities",
+	"professional_medicine": "other",
+	"professional_psychology": "social sciences",
+	"public_relations": "social sciences",
+	"security_studies": "social sciences",
+	"sociology": "social sciences",
+	"us_foreign_policy": "social sciences",
+	"virology": "other",
+	"world_religions": "humanities"
+}
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/timer.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/timer.py
@ -0,0 +1,101 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved.
+"""
+decorator
+"""
+
+import logging
+import os
+import time
+import uuid
+from dataclasses import dataclass, field
+from functools import wraps, partial
+from typing import List
+from typing import Union
+
+
+@dataclass
+class TimeData:
+    step: int = 0
+    time_cost: Union[float, int] = 0
+
+
+@dataclass
+class SeqTimeData:
+    task_id: str = ""
+    time_data_list: List[TimeData] = field(default_factory=list)
+
+    @property
+    def generated_tokens(self):
+        return len(self.time_data_list)
+
+    @property
+    def first_token_delay(self):
+        return self.time_data_list[0].time_cost if self.time_data_list else 0
+
+    @property
+    def next_token_avg_delay(self):
+        if self.generated_tokens <= 1:
+            return 0
+        return sum(item.time_cost for item in self.time_data_list[1:]) / (self.generated_tokens - 1)
+
+
+class Timer:
+    """
+    Decorator-based timer that records the latency (ms) of every decorated call when TIMEIT=1
+    """
+    step: int = 0
+    timeit_res: SeqTimeData = SeqTimeData(str(uuid.uuid4()))
+
+    @classmethod
+    def reset(cls):
+        cls.step = 0
+        cls.timeit_res = SeqTimeData(str(uuid.uuid4()))
+
+    @classmethod
+    def sync(cls):
+        ...
+
+    @classmethod
+    def timing(cls, func=None, *, logger=None, level=logging.INFO):
+        """
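+        Time a function; usable as @Timer.timing or @Timer.timing(logger=..., level=...).
+        :return: the wrapped function, or a decorator when called with keyword arguments only
+        """
+        if logger is None:
+            logger = logging.getLogger()
+        if func is None:
+            # Without parentheses, func is the decorated function; with parentheses, func is None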
+            return partial(Timer.timing, logger=logger, level=level)
+
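+        # TIMEIT is read once, when the function is decorated, so set it before the decorated code is defined
+        run = cls._timeit_run if os.getenv("TIMEIT", "0") == "1" else cls._run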
+
+        @wraps(func)
+        def wrapper(*args, **kwargs):
+            """
+            wrapper
+            :param args:
+            :param kwargs:
+            :return:
+            """
+            res = run(func, *args, **kwargs)
+            return res
+
+        return wrapper
+
+    @classmethod
+    def _run(cls, func, *args, **kwargs):
+        res = func(*args, **kwargs)
+        return res
+
+    @classmethod
+    def _timeit_run(cls, func, *args, **kwargs):
+        cls.sync()
+        start_time = time.time()
+        res = func(*args, **kwargs)
+        cls.sync()
+        time_cost_ms = (time.time() - start_time) * 1000
+        cls.timeit_res.time_data_list.append(TimeData(cls.step, time_cost_ms))
+        cls.step = cls.step + 1
+        return res
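`Timer` appends one `TimeData` per decorated call, so, assuming the decorator wraps the model's per-step forward, entry 0 is the first-token latency and the remaining entries average into the next-token latency that `_perf_detail_v2` reads back. The self-contained sketch below shows that bookkeeping; `StepTimer` and `fake_forward` are hypothetical names, `time.perf_counter` replaces `time.time` as a monotonic clock, the `TIMEIT` switch is omitted, and the default `sync` hook is a no-op standing in for a device synchronize call.

```python
import time
from functools import wraps
from typing import Callable, List


class StepTimer:
    """Collects the wall time (ms) of every call to a decorated function."""

    def __init__(self, sync: Callable[[], None] = lambda: None):
        self.sync = sync  # e.g. a device synchronize function; no-op by default
        self.costs_ms: List[float] = []

    def reset(self) -> None:
        self.costs_ms.clear()

    def timing(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            self.sync()  # finish previously queued device work before starting the clock
            start = time.perf_counter()
            result = func(*args, **kwargs)
            self.sync()  # wait for this call's device work before stopping the clock
            self.costs_ms.append((time.perf_counter() - start) * 1000)
            return result
        return wrapper

    @property
    def first_token_delay(self) -> float:
        return self.costs_ms[0] if self.costs_ms else 0.0

    @property
    def next_token_avg_delay(self) -> float:
        if len(self.costs_ms) <= 1:
            return 0.0
        return sum(self.costs_ms[1:]) / (len(self.costs_ms) - 1)


timer = StepTimer()


@timer.timing
def fake_forward(step: int) -> int:
    # Stand-in for one decoding step: the first call (prefill) is slower.
    time.sleep(0.02 if step == 0 else 0.005)
    return step


for s in range(5):
    fake_forward(s)
print(f"first: {timer.first_token_delay:.1f} ms, next avg: {timer.next_token_avg_delay:.1f} ms")
```

Synchronizing on both sides of the call matters on accelerators, where kernels launch asynchronously and the host clock would otherwise measure only the launch.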
--- a/mindie/examples/models/atb_speed_sdk/atb_speed/common/utils.py
+++ b/mindie/examples/models/atb_speed_sdk/atb_speed/common/utils.py
@ -0,0 +1,81 @@
+#!/usr/bin/env python
+# coding:utf-8
+# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
+"""
+utils
+"""
+import os
+from dataclasses import dataclass
+
+import torch
+
+FLAG_OS_MAP = {
+    'r': os.O_RDONLY, 'r+': os.O_RDWR,
+    'w': os.O_CREAT | os.O_TRUNC | os.O_WRONLY,
+    'w+': os.O_CREAT | os.O_TRUNC | os.O_RDWR,
+    'a': os.O_CREAT | os.O_APPEND | os.O_WRONLY,
+    'a+': os.O_CREAT | os.O_APPEND | os.O_RDWR,
+    'x': os.O_CREAT | os.O_EXCL | os.O_WRONLY,
+    "b": getattr(os, "O_BINARY", 0)
+}
+
+
+@dataclass
+class TorchParallelInfo:
+    __is_initialized: bool = False
+    __world_size: int = 1
+    __local_rank: int = 0
+
+    def __post_init__(self):
+        self.try_to_init()
+
+    @property
+    def is_initialized(self):
+        return self.__is_initialized
+
+    @property
+    def world_size(self):
+        _ = self.try_to_init()
+        return self.__world_size
+
+    @property
+    def local_rank(self):
+        _ = self.try_to_init()
+        return self.__local_rank
+
+    @property
+    def is_rank_0(self) -> bool:
+        return self.local_rank == 0
+
+    @staticmethod
+    def get_rank() -> int:
+        return 0 if not torch.distributed.is_initialized() else torch.distributed.get_rank()
+
+    @staticmethod
+    def get_world_size() -> int:
+        return 1 if not torch.distributed.is_initialized() else torch.distributed.get_world_size()
+
+    def try_to_init(self):
+        """
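+        If torch.distributed was not initialized yet, refresh the initialization state, world_size and local_rank.
+        :return: whether torch.distributed is initialized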
+        """
+        if not self.__is_initialized:
+            is_initialized = torch.distributed.is_initialized()
+            if is_initialized:
+                self.__local_rank = self.get_rank()
+                self.__world_size = self.get_world_size()
+            self.__is_initialized = is_initialized
+        return self.__is_initialized
+
+
+def load_atb_speed():
+    env_name = "ATB_SPEED_HOME_PATH"
+    atb_speed_home_path = os.getenv(env_name)
+    if atb_speed_home_path is None:
+        raise RuntimeError(f"Environment variable {env_name} is not set; please source set_env.sh")
+    lib_path = os.path.join(atb_speed_home_path, "lib", "libatb_speed_torch.so")
+    torch.classes.load_library(lib_path)
+
+
+torch_parallel_info = TorchParallelInfo()
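`FLAG_OS_MAP` translates Python-style mode strings into `os.open` flags, which lets a caller pass explicit permission bits (still filtered by the process umask) instead of the `0o666` that the built-in `open()` uses. How the SDK combines the entries is not shown in this diff; the sketch below is a hypothetical `open_with_mode` helper that looks up the mode without `b`, ORs in the binary flag when `b` is present, and uses an assumed default permission of `0o640`.

```python
import os
import tempfile

# Mirrors FLAG_OS_MAP; 'x' includes O_WRONLY so a newly created file can be written.
FLAG_OS_MAP = {
    'r': os.O_RDONLY, 'r+': os.O_RDWR,
    'w': os.O_CREAT | os.O_TRUNC | os.O_WRONLY,
    'w+': os.O_CREAT | os.O_TRUNC | os.O_RDWR,
    'a': os.O_CREAT | os.O_APPEND | os.O_WRONLY,
    'a+': os.O_CREAT | os.O_APPEND | os.O_RDWR,
    'x': os.O_CREAT | os.O_EXCL | os.O_WRONLY,
    'b': getattr(os, "O_BINARY", 0),  # only non-zero on Windows
}


def open_with_mode(path, mode="r", perm=0o640, **kwargs):
    """Hypothetical helper: open `path` via os.open using FLAG_OS_MAP, then wrap the descriptor."""
    base = mode.replace("b", "")
    if base not in FLAG_OS_MAP:
        raise ValueError(f"unsupported mode: {mode!r}")
    flags = FLAG_OS_MAP[base]
    if "b" in mode:
        flags |= FLAG_OS_MAP["b"]
    fd = os.open(path, flags, perm)
    try:
        return os.fdopen(fd, mode, **kwargs)
    except Exception:
        os.close(fd)  # do not leak the descriptor if wrapping fails
        raise


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "device.log")
    with open_with_mode(log_path, "a", encoding="utf-8") as f:
        f.write("first\n")
    with open_with_mode(log_path, "a", encoding="utf-8") as f:
        f.write("second\n")
    with open_with_mode(log_path, "r", encoding="utf-8") as f:
        content = f.read()
print(content)
```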
--- a/mindie/examples/models/atb_speed_sdk/setup.py
+++ b/mindie/examples/models/atb_speed_sdk/setup.py
@ -0,0 +1,19 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
+"""
+setup
+"""
+
+from setuptools import find_packages, setup
+
+setup(name='atb_speed',
+      version='1.1.0',
+      description='atb speed sdk',
+      license='MIT',
+      keywords='atb_speed',
+      packages=find_packages(),
+      install_requires=["pandas"],
+      package_data={"atb_speed": ["**/*.json"]},
+      include_package_data=True
+      )
--- a/mindie/examples/models/atb_speed_sdk/test/sdk_ceval_config_test.py
+++ b/mindie/examples/models/atb_speed_sdk/test/sdk_ceval_config_test.py
@ -0,0 +1,36 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
+from atb_speed.common.config import atb_speed_config
+from atb_speed.common.launcher import Launcher
+from atb_speed.common.precision import get_precision_test_cls
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+
+class BaichuanLM(Launcher):
+    def init_model(self):
+        """
+        Initialize the model and tokenizer.
+        :return: (model, tokenizer)
+        """
+        tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True, use_fast=False)
+        model = AutoModelForCausalLM.from_pretrained(self.model_path, trust_remote_code=True).half().to(self._device)
+        model.eval()
+        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
+        return model, tokenizer
+
+
+def demo_ceval(launcher: Launcher):
+    """
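+    Run the precision test selected by the [precision] mode in config.ini on the given launcher.
+    :param launcher: launcher holding the model and tokenizer under test
+    :return: None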
+    """
+    c_t = get_precision_test_cls()(launcher)
+    c_t.run()
+
+
+if __name__ == '__main__':
+    atb_speed_config.init_config("config.ini")
+    baichuan = BaichuanLM()
+    demo_ceval(baichuan)
--- a/mindie/examples/models/atb_speed_sdk/test/sdk_perf_config_test.py
+++ b/mindie/examples/models/atb_speed_sdk/test/sdk_perf_config_test.py
@ -0,0 +1,32 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
+from atb_speed.common.config import atb_speed_config
+from atb_speed.common.launcher import Launcher
+from atb_speed.common.performance.base import PerformanceTest
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+
+class LMLauncher(Launcher):
+    """
+    LMLauncher
+    """
+
+    def init_model(self):
+        """
+        Initialize the model and tokenizer.
+        :return: (model, tokenizer)
+        """
+        tokenizer = AutoTokenizer.from_pretrained(
+            self.model_path, trust_remote_code=True, use_fast=False)
+        model = AutoModelForCausalLM.from_pretrained(self.model_path, trust_remote_code=True).half().to(self._device)
+        model.eval()
+        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
+        return model, tokenizer
+
+
+if __name__ == '__main__':
+    atb_speed_config.init_config("config.ini")
+    performance_test = PerformanceTest(LMLauncher("0"))
+    performance_test.warm_up()
+    performance_test.run_test()
--- a/mindie/examples/models/atb_speed_sdk/test/sdk_test.py
+++ b/mindie/examples/models/atb_speed_sdk/test/sdk_test.py
@ -0,0 +1,40 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
+import os
+
+from atb_speed.common.launcher import Launcher
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+
+class BaichuanLM(Launcher):
+
+    def init_model(self):
+        """
+        模型初始化
+        :return:
+        """
+        pwd = os.path.realpath(os.path.dirname(__file__))
+        model_path = os.path.join(pwd, "..", "model")
+        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, use_fast=False)
+        model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).half().to(self._device)
+        model.eval()
+        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
+        return model, tokenizer
+
+
+if __name__ == '__main__':
+    baichuan = BaichuanLM(device_ids="1")
+    baichuan.infer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->')
+
+    baichuan.infer('登鹳雀楼->王之涣\n夜雨寄北->')
+    baichuan.infer('苹果公司的CEO是')
+
+    query_list = [
+        "谷歌公司的CEO是",
+        '登鹳雀楼->王之涣\n夜雨寄北->',
+        '苹果公司的CEO是',
+        '华为公司的CEO是',
+        '微软公司的CEO是'
+    ]
+    baichuan.infer_batch(query_list)
--- a/mindie/examples/models/atb_speed_sdk/test/template.ini
+++ b/mindie/examples/models/atb_speed_sdk/test/template.ini
@ -0,0 +1,41 @@
+[model]
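+;Model path
+model_path=../model
+;Device IDs to use, comma-separated for multiple devices; setting multiple devices enables parallel mode by default
+device_ids=2
+;Parallel communication backend: hccl (default) or nccl (GPU)
+;parallel_backend=hccl
+;Log directory; defaults to the directory of the script being run
+;log_dir=./
+;Whether to bind CPU cores: 0 or 1, default 1 (enabled)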
+;bind_cpu=1
+
+[precision]
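+;Precision test method, default ceval; options: ceval / mmlu
+mode=ceval
+;Working directory for precision tests
+work_dir=./
+;Batch size for precision tests, default 1
+batch=1
+;Number of shots per subject, default 5
+shot=5
+;Output length of each answer, default 32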
+;seq_len_out=32
+
+[performance]
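+;Model name for performance tests, used to name the result files
+model_name=vicuna_13b
+;Batch size for the test
+batch_size=1
+;Largest input length to test, as a power-of-2 exponent
+max_len_exp=10
+;Smallest input length to test, as a power-of-2 exponent
+min_len_exp=5
+;Specific test cases in the format [[seq_in,seq_out]]; when set, max_len_exp and min_len_exp are ignored
+;case_pair=[[1,2],[2,3]]
+;Name of the result file; generated automatically by default and usually left unset
+;save_file_name=
+;Performance test mode, detail / normal, default normal. detail requires timing decorators and the environment variable TIMEIT=1
+;perf_mode=
+;Whether to test only generate and skip decode during performance tests, 0/1, default 0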
+;skip_decode=
--- a/mindie/examples/models/atb_speed_sdk/test/test_config.py
+++ b/mindie/examples/models/atb_speed_sdk/test/test_config.py
@ -0,0 +1,14 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
+import os
+from unittest import TestCase
+
+from atb_speed.common.config import atb_speed_config
+
+
+class ConfigTest(TestCase):
+    def test_1(self):
+        pwd = os.path.dirname(os.path.realpath(__file__))
+        atb_speed_config.init_config(os.path.join(pwd, "template.ini"))
+        self.assertEqual(atb_speed_config.performance.batch_size, 1)
--- a/mindie/examples/models/atb_speed_sdk/test/test_timer.py
+++ b/mindie/examples/models/atb_speed_sdk/test/test_timer.py
@ -0,0 +1,49 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
+"""
+@Time   :  2024/2/9 14:46
+"""
+import logging
+import os
+from unittest import TestCase
+
+import torch
+import torch.nn as nn
+from atb_speed.common.timer import Timer
+
+logging.basicConfig(level=logging.NOTSET)
+
+os.environ["TIMEIT"] = "1"
+
+
+class AddNet(nn.Module):
+    def __init__(self, in_dim, h_dim=5, out_dim=1):
+        super().__init__()
+        self.fc1 = nn.Linear(in_dim, h_dim)
+        self.fc2 = nn.Linear(h_dim, out_dim)
+
+    @Timer.timing
+    def forward(self, x_tensor, y_tensor):
+        out = torch.cat([x_tensor, y_tensor], dim=1)
+        out = torch.relu(self.fc1(out))
+        out = self.fc2(out)
+        return out
+
+
+class TimerTest(TestCase):
+    @classmethod
+    def setUpClass(cls):
+        Timer.reset()
+        # Timer.sync= xxxx
+        cls.add_net = AddNet(in_dim=2)
+
+    def test_1(self):
+        for _ in range(5):
+            x_tensor = torch.randn(1, 1)
+            y_tensor = torch.randn(1, 1)
+            result = self.add_net.forward(x_tensor, y_tensor)
+            logging.info(result)
+        logging.info(Timer.timeit_res)
+        logging.info(Timer.timeit_res.first_token_delay)
+        logging.info(Timer.timeit_res.next_token_avg_delay)
--- a/mindie/examples/models/baichuan/README.md
+++ b/mindie/examples/models/baichuan/README.md
@ -0,0 +1,302 @@
+# README
+
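+- The Baichuan large language models combine intent understanding, information retrieval, and reinforcement learning. They also use supervised fine-tuning and human-intent alignment, and they perform well in knowledge Q&A and text creation.
+
+- This repository implements Baichuan inference models for NPU hardware. Used together with the acceleration library, it aims to deliver maximum inference performance on NPUs.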
+
+# Feature Matrix
+
+- This matrix lists the features supported by each Baichuan model
+
+| Model                 | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | W4A16 quantization | KV cache quantization | Sparse quantization | MOE quantization | MindIE Service | TGI | Long sequence |
+|-----------------------|----------------------------|-----------------------------|------|------|-----------------|-----------------|-------------------|--------------------|--------------------|-----------------------|---------------------|------------------|----------------|-----|---------------|
+| Baichuan2-7B          | world size 1,2,4,8         | world size 2                | √    | ×    | √               | √               | √                 | ×                  | ×                  | ×                     | √                   | ×                | √              | √   | ×             |
+| Baichuan2-13B         | world size 2,4,8           | world size 2,4              | √    | ×    | √               | √               | √                 | ×                  | √                  | ×                     | √                   | ×                | √              | √   | ×             |
+| Baichuan-7B           | world size 1,2,4,8         | world size 2                | √    | ×    | √               | √               | ×                 | ×                  | ×                  | ×                     | ×                   | ×                | √              | ×   | ×             |
+| Baichuan-13B          | world size 2,4,8           | world size 2,4              | √    | ×    | √               | √               | ×                 | ×                  | ×                  | ×                     | ×                   | ×                | √              | ×   | ×             |
+
+# Usage
+
+## Path Variables
+
+| Variable    | Meaning |
+|-------------|---------|
+| working_dir | Directory where the acceleration library and model library are placed after download |
+| llm_path    | Path of the model repository. If you use the prebuilt package, the path is `${working_dir}/ModelLink/`; if you use code downloaded from gitee, the path is `${working_dir}/ModelLink/mindie_ref/mindie_llm/atb_models` |
+| script_path | Path of the scripts. The working scripts for Baichuan models are in `${llm_path}/examples/models/baichuan` |
+| weight_path | Path of the model weights |
+
+## Weights
+**Weight download**
+- [Baichuan-7B](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main)
+- [Baichuan-13B](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/tree/main)
+- [Baichuan2-7B](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/tree/main)
+- [Baichuan2-13B](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main)
+
+**Weight conversion**
+- Paged Attention requires weights in `.safetensors` format. If you do not have them, convert them by following [this README](../../README.md)
+
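+**Quantized weight generation**
+- Quantized weights are generated from the original FP16 weights
+- W8A8 Antioutlier quantized weights
+    - Not supported yet
+
+- Generate W8A8 quantized weights as follows
+  - Use quant_baichuan2_7b_w8a8.py for baichuan2-7b and quant_baichuan2_13b_w8a8.py for baichuan2-13b
+  - Note: for precision testing, generate the quantized weights on CPU. Weights generated on NPU can be used for debugging, but they lose some precision.
+  - Modify the weight paths
+    - In quant_baichuan2_7b_w8a8.py or quant_baichuan2_13b_w8a8.py in the current directory, set the input FP16 path and the output W8A8 path to your own floating-point weight directory and output directory. In quant_baichuan2_13b_w8a8.py these variables are `INPORT_FP16_PATH` and `OUTPORT_W8A8_PATH`.
+    - To convert the weights on NPU, change the device to npu as described in the code comments
+  - Run
+    ```shell
+    python quant_baichuan2_7b_w8a8.py    # baichuan2-7b
+    python quant_baichuan2_13b_w8a8.py   # baichuan2-13b
+    ```
+  - Copy all files from the original weight directory into the new quantized weight directory, except the `*.bin` weight files
+  - In `${weight_path}/config.json`, set `torch_dtype` and `quantize` to indicate the quantization type and precision
+  - If the `torch_dtype` or `quantize` field does not exist, add it
+    - Configuration
+
+      | Quantization type and precision | torch_dtype | quantize |
+      | -------------- | ----------- | -------- |
+      | FP16           | "float16"   | ""       |
+      | W8A8           | "float16"   | "w8a8"   |
+    - Example
+      - Baichuan model with FP16 precision and W8A8 quantization
+        ```json
+        {
+          "torch_dtype": "float16",
+          "quantize": "w8a8"
+        }
+        ```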
+
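+- W8A16 quantized weights
+    - Not supported yet
+
+- Generate W4A16 quantized weights as follows
+  - W4A16 currently supports only the baichuan2-13b model
+  - Use quant_baichuan2_13b_w4a16.py for baichuan2-13b
+  - Note: for precision testing, generate the quantized weights on CPU.
+  - Modify the weight paths
+    - In quant_baichuan2_13b_w4a16.py in the current directory, set `FP16_PATH` and `OUTPUT_PATH` to your own floating-point weight directory and output directory
+  - Run
+    ```shell
+    python quant_baichuan2_13b_w4a16.py   # baichuan2-13b
+    ```
+  - Copy all files from the original weight directory into the new quantized weight directory, except the `*.bin` weight files
+  - In `${weight_path}/config.json`, set `torch_dtype` and `quantize` to indicate the quantization type and precision
+  - If the `torch_dtype` or `quantize` field does not exist, add it
+    - Configuration
+
+      | Quantization type and precision | torch_dtype | quantize |
+      | -------------- | ----------- | -------- |
+      | FP16           | "float16"   | ""       |
+      | W4A16          | "float16"   | "w4a16"  |
+    - Example
+      - Baichuan model with FP16 precision and W4A16 quantization
+        ```json
+        {
+          "torch_dtype": "float16",
+          "quantize": "w4a16"
+        }
+        ```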
+
+- Generate sparse quantized weights as follows
+  - Step 1
+    ```shell
+    # Set the environment variables of the CANN package
+    source /usr/local/Ascend/ascend-toolkit/set_env.sh
+    cd ${llm_path}
+    python examples/models/llama/convert_quant_weights.py --model_path {FP16 weight path} --save_directory {W8A8S weight path} --w_bit 4 --a_bit 8 --calib_file ${llm_path}/examples/convert/model_slim/teacher_qualification.jsonl --fraction 0.011 --co_sparse True
+    ```
+    Make sure `transformers==4.30.2` is installed when converting the quantized weights
+  - Step 2: split and compress the quantized weights
+    > Make sure the compression tool has been compiled before running
+    >
+    > `cd /usr/local/Ascend/ascend-toolkit/latest/python/site-packages/msmodelslim/pytorch/weight_compression/compress_graph`
+    >
+    > `bash build.sh /usr/local/Ascend/ascend-toolkit/latest`
+
+    ```shell
+    torchrun --nproc_per_node {TP size} -m examples.convert.model_slim.sparse_compressor --model_path {W8A8S weight path} --save_directory {W8A8SC weight path}
+    ```
+
+    - TP size is the tensor parallelism degree
+    - Note: if the weights are split with TP=4 during generation, inference must also run with TP=4
+    - Example
+      ```
+        torchrun --nproc_per_node 4 -m examples.convert.model_slim.sparse_compressor --model_path /data1/weights/model_slim/baichuan2-7b_w8a8s --save_directory /data1/weights/model_slim/baichuan2-7b_w8a8sc
+      ```
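+
+The copy step and the `config.json` step described above can be scripted. The following is a minimal sketch that uses only the Python standard library. `finalize_quant_dir` is a hypothetical helper, not part of this repository, and it works under these assumptions:
+- The quantization script has already written its output into `dst_dir`.
+- The original `*.bin` files are skipped.
+- Files that already exist in `dst_dir` are never overwritten.
+- Only top-level files are copied.
+
+```python
+import json
+import shutil
+from pathlib import Path
+
+
+def finalize_quant_dir(src_dir, dst_dir, quantize, torch_dtype="float16"):
+    """Hypothetical helper: copy non-weight files and tag config.json."""
+    src, dst = Path(src_dir), Path(dst_dir)
+    dst.mkdir(parents=True, exist_ok=True)
+    for item in src.iterdir():
+        # Skip subdirectories, the original *.bin weights, and files the quantizer already wrote
+        if item.is_file() and item.suffix != ".bin" and not (dst / item.name).exists():
+            shutil.copy2(item, dst / item.name)
+    config_path = dst / "config.json"
+    config = json.loads(config_path.read_text(encoding="utf-8"))
+    # Add the two fields if they are missing, overwrite them otherwise
+    config["torch_dtype"] = torch_dtype
+    config["quantize"] = quantize
+    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
+    return config
+```
+
+For example, `finalize_quant_dir("/path/to/baichuan2-13b", "/path/to/baichuan2-13b_w4a16", quantize="w4a16")`. Both paths are placeholders.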
+
+**Basic environment variables**
+- See [this README](../../../README.md)
+
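+## Inference
+### Chat test
+**Run Flash Attention FP16**
+- Run the Baichuan models as follows
+    - Run the launch script
+        - Run the following command in the `${llm_path}` directory
+          ```shell
+          bash examples/models/baichuan/run_fa.sh ${weight_path}
+          ```
+    - Environment variables
+        - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+            - Specifies the logical NPU cores available on the current machine; separate multiple cores with commas
+            - To look up core IDs, see the 【启动脚本相关环境变量】 (launch script environment variables) section of [this README](../../README.md)
+            - On 300I DUO cards, specify at least two visible cores to use one card with two chips, or at least four visible cores to use two cards with four chips
+            - See the "Feature Matrix" for the number of cores each model supports
+        - `export MASTER_PORT=20036`
+            - Sets the inter-card communication port
+            - Port 20036 is used by default
+            - Setting it avoids communication conflicts when several multi-card models run on the same machine at the same time
+            - Recommended port range: 20000-20050
+        - The following environment variables relate to performance and memory optimization and usually do not need to be changed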
+          ```shell
+          export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+          export INF_NAN_MODE_ENABLE=0
+          export ATB_OPERATION_EXECUTE_ASYNC=1
+          export TASK_QUEUE_ENABLE=1
+          export ATB_CONVERT_NCHW_TO_ND=1
+          export HCCL_BUFFSIZE=120
+          export HCCL_WHITELIST_DISABLE=1
+          export ATB_CONTEXT_WORKSPACE_RING=1
+          export ATB_CONTEXT_WORKSPACE_SIZE=2629145600
+          export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
+          export ATB_LAUNCH_KERNEL_WITH_TILING=0
+          export ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=1
+          export ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=0
+    
+          ```
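+
+You can check the device and port rules above before launching. Below is a minimal Python sketch; the helper names are hypothetical and not part of the launch scripts.
+- It parses the comma-separated `ASCEND_RT_VISIBLE_DEVICES` list.
+- It looks for a port in the recommended 20000-20050 range that can currently be bound on the local host.
+
+The port check is only a heuristic: another process may take the port between the check and the launch.
+
+```python
+import os
+import socket
+
+
+def visible_npu_ids(environ=None):
+    """Parse the comma-separated ASCEND_RT_VISIBLE_DEVICES value into a list of ints."""
+    environ = os.environ if environ is None else environ
+    value = environ.get("ASCEND_RT_VISIBLE_DEVICES", "")
+    return [int(part) for part in value.split(",") if part.strip()]
+
+
+def pick_master_port(start=20000, end=20050, host="127.0.0.1"):
+    """Return the first port in [start, end] that can be bound right now."""
+    for port in range(start, end + 1):
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
+            try:
+                sock.bind((host, port))
+            except OSError:
+                continue  # port already in use, try the next one
+            return port
+    raise RuntimeError(f"no free port in {start}-{end}")
+```
+
+For example, if the launch script does not hard-code the port, you could export the value returned by `pick_master_port()` as `MASTER_PORT` before starting a second multi-card job. The check `len(visible_npu_ids()) >= 2` corresponds to the one-card, two-chip requirement on 300I DUO cards.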
+
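+**Run Flash Attention BF16**
+- Not supported yet
+
+**Run Flash Attention W8A8**
+- Not supported yet
+
+**Run Flash Attention W8A16**
+- Not supported yet
+
+**Run Flash Attention W4A16**
+- Not supported yet
+
+**Run Paged Attention FP16**
+- Run the launch script
+    - Run the following commands in the `${llm_path}` directory
+      ```shell
+      # chat mode (baichuan2 series only):
+      bash examples/models/baichuan/run_pa.sh ${weight_path} chat
+
+      # non-chat mode:
+      bash examples/models/baichuan/run_pa.sh ${weight_path}
+      ```
+- Environment variables
+    - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+        - Specifies the logical NPU cores available on the current machine; separate multiple cores with commas
+        - To look up core IDs, see the 【启动脚本相关环境变量】 (launch script environment variables) section of [this README](../../README.md)
+        - On 300I DUO cards, specify at least two visible cores to use one card with two chips, or at least four visible cores to use two cards with four chips
+        - See the "Feature Matrix" for the number of cores each model supports
+    - `export MASTER_PORT=20036`
+        - Sets the inter-card communication port
+        - Port 20036 is used by default
+        - Setting it avoids communication conflicts when several multi-card models run on the same machine at the same time
+        - Recommended port range: 20000-20050
+    - The following environment variables relate to performance and memory optimization and usually do not need to be changed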
+      ```shell
+      export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+      export INF_NAN_MODE_ENABLE=0
+      export ATB_OPERATION_EXECUTE_ASYNC=1
+      export TASK_QUEUE_ENABLE=1
+      export ATB_CONVERT_NCHW_TO_ND=1
+      export LCCL_ENABLE_FALLBACK=1
+      export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+      export ATB_CONTEXT_WORKSPACE_SIZE=0
+      ```
+
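+**Run Paged Attention BF16**
+- Not supported yet
+
+**Run Paged Attention W8A8**
+- Run the launch script
+  - Same as in "Run Paged Attention FP16"
+  - `${weight_path}` is the path of the W8A8 quantized weights
+- Environment variables
+  - See the environment variables described in "Run Paged Attention FP16"
+- Unlike FP16, running quantized weights requires setting the `quantize` field in `${weight_path}/config.json` of the W8A8 weights to `w8a8`
+  - If config.json does not contain this field, add it
+
+**Run Paged Attention W8A16**
+- Not supported yet
+
+**Run Paged Attention W4A16**
+- Run the launch script
+  - Same as in "Run Paged Attention FP16"
+  - `${weight_path}` is the path of the W4A16 quantized weights
+- Environment variables
+  - See the environment variables described in "Run Paged Attention FP16"
+- Unlike FP16, running quantized weights requires setting the `quantize` field in `${weight_path}/config.json` of the W4A16 weights to `w4a16`
+  - If config.json does not contain this field, add it
+
+**Run KV cache quantization**
+- To be added
+
+**Run sparse quantization**
+- Run the launch script
+  - Same as in "Run Paged Attention FP16"
+  - `${weight_path}` is the path of the split and compressed W8A8SC quantized weights
+- Environment variables
+  - See the environment variables described in "Run Paged Attention FP16"
+- Unlike FP16, running sparse quantization requires setting the `quantize` field in `${weight_path}/config.json` of the W8A8SC weights to `w8a8sc`
+  - If config.json does not contain this field, add it
+- Note: the compression algorithm is closely tied to the hardware; currently only 300I DUO cards support sparse quantization
+
+**Run MOE quantization**
+- To be added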
+
+## Precision Test
+- See [this README](https://gitee.com/ascend/MindIE-LLM/blob/master/examples/atb_models/tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
+    export MAX_MEMORY_GB=29
+    bash run.sh pa_fp16 full_BoolQ 1 baichuan2_7b ${baichuan_7b_weight_path} 4
+    bash run.sh pa_fp16 full_BoolQ 1 baichuan2_13b ${baichuan_13b_weight_path} 4
+    bash run.sh pa_fp16 full_BoolQ 1 baichuan2_7b ${baichuan2_7b_weight_path} 4
+    bash run.sh pa_fp16 full_BoolQ 1 baichuan2_13b ${baichuan2_13b_weight_path} 4
+    ```
+- Note: when tested, baichuan-7b and baichuan-13b reuse the model_name `baichuan2_7b` and `baichuan2_13b`, respectively
+- When running quantized weights, make sure the `quantize` and `torch_dtype` fields in `${weight_path}/config.json` match the weights; see [this README](https://gitee.com/ascend/MindIE-LLM/blob/master/examples/atb_models/examples/README.md)
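+
+If you script several precision runs, you can capture the `model_name` reuse rule in a small lookup table. Below is a minimal sketch with these assumptions:
+- It follows the `run.sh` argument order shown in the example above: test mode, dataset, the `1` argument (assumed here to be the batch size), model_name, weight path, and number of devices.
+- `modeltest_command` is a hypothetical helper.
+- The weight paths are placeholders.
+
+```python
+import shlex
+
+# baichuan-7b/13b reuse the baichuan2_* model_name when tested
+MODEL_NAME = {
+    "baichuan-7b": "baichuan2_7b",
+    "baichuan-13b": "baichuan2_13b",
+    "baichuan2-7b": "baichuan2_7b",
+    "baichuan2-13b": "baichuan2_13b",
+}
+
+
+def modeltest_command(model, weight_path, world_size=4, mode="pa_fp16", dataset="full_BoolQ", batch=1):
+    """Build a shell-quoted run.sh invocation for tests/modeltest (hypothetical helper)."""
+    args = ["bash", "run.sh", mode, dataset, str(batch), MODEL_NAME[model], weight_path, str(world_size)]
+    return shlex.join(args)
+```
+
+`shlex.join` (Python 3.8+) quotes paths that contain spaces, so each command can be pasted into a shell as-is.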
+
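+## Performance Test
+- ALiBi Mask Free is supported and disabled by default. To enable it, set the following environment variable in run_pa.sh in the current directory:
+```shell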
+export IS_ALIBI_MASK_FREE=1
+```
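+
+For background, the 13B Baichuan models use ALiBi position encoding. Instead of rotary embeddings, each attention head adds a linear distance penalty to its attention scores.
+- We assume "Mask Free" means the operator avoids materializing the full per-head bias and mask tensor.
+- The pure-Python sketch below only shows what that tensor would contain, and it is not the NPU implementation.
+- The per-head slopes follow the construction from the ALiBi reference code, including the usual handling of head counts that are not a power of two (Baichuan2-13B has 40 heads).
+
+```python
+import math
+
+
+def alibi_slopes(num_heads):
+    """Per-head ALiBi slopes: a geometric sequence starting at 2 ** (-8 / n)."""
+    def power_of_two_slopes(n):
+        start = 2 ** (-8 / n)
+        return [start ** (i + 1) for i in range(n)]
+
+    if math.log2(num_heads).is_integer():
+        return power_of_two_slopes(num_heads)
+    closest = 2 ** math.floor(math.log2(num_heads))
+    # Use the closest power of two, then fill the rest with every other slope of 2 * closest
+    extra = alibi_slopes(2 * closest)[0::2][: num_heads - closest]
+    return power_of_two_slopes(closest) + extra
+
+
+def alibi_bias(num_heads, seq_len):
+    """bias[h][i][j] = -slope_h * (i - j) for j <= i, and -inf for future positions (causal)."""
+    slopes = alibi_slopes(num_heads)
+    return [
+        [[-s * (i - j) if j <= i else -math.inf for j in range(seq_len)] for i in range(seq_len)]
+        for s in slopes
+    ]
+```
+
+The full tensor has `num_heads * seq_len * seq_len` entries, which grows quickly with sequence length. This is why an operator that computes the bias on the fly can save memory.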
+- See [this README](https://gitee.com/ascend/MindIE-LLM/blob/master/examples/atb_models/tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    export MAX_MEMORY_GB=29
+    export ATB_LLM_BENCHMARK_ENABLE=1
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 baichuan2_7b ${baichuan_7b_weight_path} 8
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 baichuan2_13b ${baichuan_13b_weight_path} 8
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 baichuan2_7b ${baichuan2_7b_weight_path} 8
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 baichuan2_13b ${baichuan2_13b_weight_path} 8
+    ```
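+- Note: when tested, baichuan-7b and baichuan-13b reuse the model_name `baichuan2_7b` and `baichuan2_13b`, respectively
+- When running quantized weights, make sure the `quantize` and `torch_dtype` fields in `${weight_path}/config.json` match the weights; see [this README](https://gitee.com/ascend/MindIE-LLM/blob/master/examples/atb_models/examples/README.md)
+- Special case: if performance fluctuates during testing, you can enable transparent huge pages to improve memory access performance. Enable this only when needed, because it affects memory usage.
+```shell
+# Enable transparent huge pages during performance tests as needed
+echo always > /sys/kernel/mm/transparent_hugepage/enabled
+# Disable transparent huge pages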
+echo never > /sys/kernel/mm/transparent_hugepage/enabled
+```
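+
+The `enabled` file lists all modes and marks the active one in brackets, for example `always [madvise] never`. Below is a minimal sketch for checking the current setting before and after a benchmark. `current_thp_mode` is a hypothetical helper, and the file exists only on Linux kernels built with transparent huge page support.
+
+```python
+from pathlib import Path
+
+THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")
+
+
+def parse_thp_mode(text):
+    """Return the bracketed (active) mode, e.g. 'madvise' for 'always [madvise] never'."""
+    for token in text.split():
+        if token.startswith("[") and token.endswith("]"):
+            return token[1:-1]
+    raise ValueError(f"no active mode in {text!r}")
+
+
+def current_thp_mode(path=THP_ENABLED):
+    """Read the active THP mode, or return None if the file does not exist on this system."""
+    return parse_thp_mode(path.read_text()) if path.exists() else None
+```
+
+Record the mode before switching to `always`, so that you can restore the original value after the test instead of assuming it was `never`.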
+
+## FAQ
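+- For more environment variables, see [this README](../../README.md)
+- The chat test actually runs the Python files `${llm_path}/examples/run_fa.py` and `${llm_path}/examples/run_pa.py`; their parameters are described in [this README](../../README.md)
+- Before running, check the protobuf version with `pip list | grep protobuf`. If it is later than 3.20.x, run `pip install protobuf==3.20.0` to downgrade it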
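+
+You can also run the protobuf check from Python. Below is a minimal sketch that uses the standard `importlib.metadata` module. `needs_protobuf_downgrade` is a hypothetical helper. Matching the rule above, it treats any version above the 3.20 series as too new, and it compares only the numeric major and minor parts.
+
+```python
+from importlib.metadata import PackageNotFoundError, version
+
+
+def needs_protobuf_downgrade(installed):
+    """True if the given protobuf version string is newer than 3.20.x."""
+    major, minor = (int(part) for part in installed.split(".")[:2])
+    return (major, minor) > (3, 20)
+
+
+def check_protobuf():
+    """Report whether the installed protobuf needs to be downgraded."""
+    try:
+        installed = version("protobuf")
+    except PackageNotFoundError:
+        return "protobuf is not installed"
+    if needs_protobuf_downgrade(installed):
+        return f"protobuf {installed} is too new: run pip install protobuf==3.20.0"
+    return f"protobuf {installed} is OK"
+```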
--- a/mindie/examples/models/baichuan/quant_baichuan2_13b_w4a16.py
+++ b/mindie/examples/models/baichuan/quant_baichuan2_13b_w4a16.py
@ -0,0 +1,208 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig, AntiOutlier
+from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
+
+SEQ_LEN_OUT = 32
+
+
+# for local path
+OUTPUT_PATH = "your output path"
+FP16_PATH = "your path to model"    # path to the original FP16 model directory
+tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=FP16_PATH, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    pretrained_model_name_or_path=FP16_PATH,
+    torch_dtype=torch.float32,
+    trust_remote_code=True
+)
+
+W_SYM = True
+
+
+# Build the calibration dataset from a list of prompt strings
+def get_calib_dataset(input_tokenizer, calib_list, device="cpu"):  # use device="npu:0" to quantize on NPU, device="cpu" for CPU
+    calib_dataset = []
+    for calib_data in calib_list:
+        inputs = input_tokenizer(calib_data, return_tensors='pt')
+        calib_dataset.append([
+            inputs.data['input_ids'].to(device),
+            inputs.data['attention_mask'].to(device)
+            ])
+    return calib_dataset
+
+
+CALIB_SET = [
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\
+B. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\
+A. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n编写中小学教科书的直接依据是____。\nA. 《中华人民共和国教育法》\nB. 课程计划\nC. 课程标准\
+D. 课程表\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
+年\nB. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的\
+是____。\nA. 坐井观天，所见甚少B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n下列关于课程的三种文本表现形式说法正确的是____。\nA. 课程计划是由当\
+地教育主管部门制订的\nB. 课程标准是依据课程计划制定的C. 课程标准的核心是实施建议\nD. 教材编写的基本方式有直线式、螺旋式、交叉式\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
+年\nB. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的\
+是____。\nA. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n悦悦是一名右耳失聪的残疾儿童，活动课上有时会听不清楚周老师所讲的内容，因此\
+经常提问题。对此，周老师应当采取的措施是____。\nA. 给予悦悦更多的帮助和指导\nB. 指导家长带悦悦回家自学\nC. 建议家长将悦悦转到特殊幼儿园\nD. 照顾大多数幼儿，不理会悦悦\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
+年\nB. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同\
+的是____。\nA. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n内流河也称“内陆河”，是指没有流入海洋的河流，大多分布在大陆内部干燥地区，上\
+游降水或冰雪融水为其主要补给水源，最终消失于沙漠或注入内陆湖泊。下列中国内流河中，最长的是____。\nA. 塔里木河\nB. 柴达木河\nC. 尼雅河\nD. 疏勒河\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
+年\nB. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同\
+的是____。\nA. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n学校规定学生不能烫染头发，但是小文为了彰显个性，在假期把头发染成了棕色。面\
+对小文的情况，教师应该怎样处理？____。\nA. 年轻人追求个性是合情合理的，应该宽容对待\nB. 违反学校的校规，应该严格处分\nC. 强制要求小文将头发颜色染回来才可以进校门\nD. 探明\
+小文违反校规的原因，并对其进行劝导和教育\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
+年\nB. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的\
+是____。\nA. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n张老师根据自己班级的情况，为解决班级内部班干部的人际关系问题，建立和谐融洽\
+的班级氛围，自主开发了“和谐人际”的班级课程，这体现了教师____。\nA. 是教育教学的研究者\nB. 是课程的建设者和开发者\nC. 是学生学习的促进者\nD. 是社区型的开放教师\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n刘老师工作很负责，学生在学校出现一点问题他就会与家长联系，在与家长沟通时他经常以前辈的姿态对待家长，对家长的教育方式指指点点。刘老师的做法\
+____。\nA. 正确，老师就应该与家长经常沟通\nB. 正确，老师的经验比家长丰富，应该多指导家长\nC. 不正确，教师没有权利指导家长\nD. 不正确，教师应该与家长建立平等的沟通关系，尊重家长的人格\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n在古代印度，有一户人家经营一家棉布店销售自己手工制作的衣服。你认为这户人家属于哪个等级？____\nA. 婆罗门\nB. 刹帝利\
+C. 吠舍\nD. 首陀罗\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n“小型分散，便于开展多种多样的活动，满足学生不同的兴趣、爱好，发展学生的才能，使学生得到更多的学习和锻炼的机会。\
+”这种课外活动的形式是____。\nA. 科技活动\nB. 学科活动\nC. 个人活动\nD. 小组活动\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n小红每天晚上临睡前都要多次反复检查自己的书包，确保带齐了第二天需要用的教材和文具。她明知道没有这个必要，但就是控制不住。她可\
+能出现了____。\nA. 抑郁症\nB. 焦虑症\nC. 强迫症\nD. 恐惧症\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n国家管理和评价课程的基础是____。\nA. 课程计划\nB. 课程标准\nC. 教学目标\nD. 教育目的\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n儿童坚持性发生明显质变的年龄约在____\nA. 3～4岁\nB. 4～5岁\nC. 5～6岁\nD. 6岁以后\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n《红楼梦》中人物众多、关系繁杂。为了帮助读者阅读，许多红学爱好者都在网络上发布了自己整理制作的主要人物关系图。这属于____。\
+A. 纲要策略\nB. 精细加工策略\nC. 资源管理策略\nD. 监控策略\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n学期结束时，班主任王老师会对学生思想品德的发展变化情况进行评价。这项工作属于____。\nA. 工作总结\nB. 工作计划\nC. 操行评定\
+D. 建立学生档案\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n人们常说：“教学有法而教无定法。”这反映了教师的劳动具有____。\nA. 连续性\nB. 示范性\nC. 长期性\nD. 创造\
+性\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n县级以上地方各级人民代表大会是县级以上地方国家权力机关，其职权不包括____。\nA. 改变或撤销本级人大常务委员会不适当的决定\
+B. 选举并有权罢免本级人民法院院长\nC. 批准本行政区域内的预算执行情况的报告\nD. 决定并宣布下一级行政区城进入紧急状态\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n在心理健康课上，同一批学生在第二次进行同样内容的人格测验时获得的分数与上次测验差别较大。这说明该测验存在的问题是____。\
+A. 信度问题\nB. 效度问题\nC. 难度问题\nD. 区分度问题\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n李老师在教学生区分形近字“渴”“竭”“碣”“谒”时，将四个字相同的右半部分用白色粉笔写出，相异的左半部分用彩色粉笔写出。李老师运用了\
+知觉的____。\nA. 整体性\nB. 选择性\nC. 理解性\nD. 恒常性\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n兰兰学会走路后,就要很喜欢尝试自己穿衣、吃饭、捡东西,喜欢探索周围世界。按照埃里克森人格发展阶段理论,兰兰所处的发展阶段是____\
+A. 信任对怀疑\nB. 自立对羞怯\nC. 主动感对内疚感\nD. 勤奋感对自卑感\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n杨老师在教授生字词的过程中发现部分学生有缺笔少画的现象，于是他把“小学生缺笔少画现象的原因及对策研究”作为研究课题，拟订相应的研究计划，\
+在工作中收集、整理相关资料并实施教学措施，最后根据反馈信息调整教学方案。这种研究方法属于____。\nA. 教育行动研究法\nB. 教育实验法\nC. 教育叙事研究法\nD. 个案研究法\nAnswer:"
+]
+
+
+def main():
+    dataset_calib = get_calib_dataset(tokenizer, CALIB_SET)
+    # If the activations of the linear operators have an overly large range or
+    # too many "spike" outliers, apply anti-outlier processing as follows
+    anti_config = AntiOutlierConfig(a_bit=16, w_bit=4, anti_method="m3", dev_type="cpu", w_sym=W_SYM)
+    anti_outlier = AntiOutlier(model, calib_data=dataset_calib, cfg=anti_config, norm_class_name="RMSNorm")
+    anti_outlier.process()
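+    # Fallback layers: some layers are too sensitive to quantization (with W8A8, for
+    # example, activations are quantized too, so the activation range matters), so
+    # these layers fall back to floating-point weights.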
+
+    disable_names = []
+    baichuan_layers = 40
+    disable_idx_lst = list(range(baichuan_layers))
+    for layer_index in disable_idx_lst:
+        down_proj_name = "model.layers.{}.mlp.down_proj".format(layer_index)
+        disable_names.append(down_proj_name)
+
+    model.eval()
+    quant_config = QuantConfig(a_bit=16, w_bit=4, disable_names=disable_names, dev_type='cpu',
+                               w_sym=W_SYM, mm_tensor=False, is_lowbit=True, open_outlier=False,
+                               group_size=64, disable_last_linear=False)
+    calibrator = Calibrator(model, quant_config, calib_data=[], disable_level='L0')
+    calibrator.run()
+    calibrator.save(OUTPUT_PATH, save_type=["safe_tensor", "numpy"])
+
+
+if __name__ == "__main__":
+    main()
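The `QuantConfig` in the script above requests 4-bit weights (`w_bit=4`) with 16-bit activations (`a_bit=16`) and `group_size=64`, with weight symmetry controlled by `W_SYM` (defined outside this excerpt; assumed `True` here). As a rough illustration of what group-wise symmetric weight quantization means, the plain-Python sketch below splits one weight row into groups of 64 and gives each group its own scale and no zero point. It is a conceptual sketch under those assumptions, not msmodelslim's implementation, and the function names are hypothetical.

```python
from typing import List, Tuple

# A quantized group: integer codes plus the single float scale shared by the group.
Group = Tuple[List[int], float]


def quantize_group_sym(weights: List[float], n_bits: int = 4) -> Group:
    """Symmetric quantization of one group: a single scale, no zero point."""
    qmax = 2 ** (n_bits - 1) - 1   # 7 for 4-bit
    qmin = -(2 ** (n_bits - 1))    # -8 for 4-bit
    absmax = max(abs(v) for v in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest integer code and clamp into the signed n-bit range.
    q = [max(qmin, min(qmax, round(v / scale))) for v in weights]
    return q, scale


def quantize_row(row: List[float], group_size: int = 64, n_bits: int = 4) -> List[Group]:
    """Split one weight row into contiguous groups, each with its own scale."""
    return [quantize_group_sym(row[i:i + group_size], n_bits)
            for i in range(0, len(row), group_size)]


def dequantize_row(groups: List[Group]) -> List[float]:
    """Map integer codes back to floats (w_hat = q * scale), group by group."""
    return [q * scale for codes, scale in groups for q in codes]


# A toy 128-element weight row, which yields two groups of 64.
row = [((3 * i) % 17 - 8) / 10.0 for i in range(128)]
groups = quantize_row(row, group_size=64)
restored = dequantize_row(groups)
max_err = max(abs(a - b) for a, b in zip(row, restored))
```

Because each group's scale is derived from that group's own maximum, the reconstruction error per element is at most half of that group's scale; a smaller `group_size` keeps a single large weight from inflating the scale of the whole row, at the cost of storing more scales.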
--- a/mindie/examples/models/baichuan/quant_baichuan2_13b_w8a8.py
+++ b/mindie/examples/models/baichuan/quant_baichuan2_13b_w8a8.py
@ -0,0 +1,197 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+from transformers import AutoTokenizer, AutoModelForCausalLM
+from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig, AntiOutlier
+from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
+
+INPORT_FP16_PATH = 'the_path_of_fp16_model_input'
+OUTPORT_W8A8_PATH = 'the_path_of_w8a8_model_output'
+tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=INPORT_FP16_PATH, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    pretrained_model_name_or_path=INPORT_FP16_PATH,
+    trust_remote_code=True).float().cpu()
+
+
+# Build the calibration dataset
+def get_calib_dataset(tokenizer, calib_list, device="cpu"):  # use device="npu:0" to quantize on an NPU, device="cpu" for CPU
+    calib_dataset = []
+    for calib_data in calib_list:
+        inputs = tokenizer(calib_data, return_tensors='pt')
+        calib_dataset.append([
+            inputs.data['input_ids'].to(device),
+            inputs.data['attention_mask'].to(device)
+            ])
+    return calib_dataset
+
+
+CALIB_SET = [
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\
+B. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\
+A. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n编写中小学教科书的直接依据是____。\nA. 《中华人民共和国教育法》\nB. 课程计划\nC. 课程标准\
+D. 课程表\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
+年\nB. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的\
+是____。\nA. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n下列关于课程的三种文本表现形式说法正确的是____。\nA. 课程计划是由当\
+地教育主管部门制订的\nB. 课程标准是依据课程计划制定的\nC. 课程标准的核心是实施建议\nD. 教材编写的基本方式有直线式、螺旋式、交叉式\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
+年\nB. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的\
+是____。\nA. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n悦悦是一名右耳失聪的残疾儿童，活动课上有时会听不清楚周老师所讲的内容，因此\
+经常提问题。对此，周老师应当采取的措施是____。\nA. 给予悦悦更多的帮助和指导\nB. 指导家长带悦悦回家自学\nC. 建议家长将悦悦转到特殊幼儿园\nD. 照顾大多数幼儿，不理会悦悦\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
+年\nB. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同\
+的是____。\nA. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n内流河也称“内陆河”，是指没有流入海洋的河流，大多分布在大陆内部干燥地区，上\
+游降水或冰雪融水为其主要补给水源，最终消失于沙漠或注入内陆湖泊。下列中国内流河中，最长的是____。\nA. 塔里木河\nB. 柴达木河\nC. 尼雅河\nD. 疏勒河\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
+年\nB. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同\
+的是____。\nA. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n学校规定学生不能烫染头发，但是小文为了彰显个性，在假期把头发染成了棕色。面\
+对小文的情况，教师应该怎样处理？____。\nA. 年轻人追求个性是合情合理的，应该宽容对待\nB. 违反学校的校规，应该严格处分\nC. 强制要求小文将头发颜色染回来才可以进校门\nD. 探明\
+小文违反校规的原因，并对其进行劝导和教育\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，\
+学习迁移产生的关键是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现\
+“####”符号时，表明____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
+年\nB. 1918年\nC. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的\
+是____。\nA. 坐井观天，所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n张老师根据自己班级的情况，为解决班级内部班干部的人际关系问题，建立和谐融洽\
+的班级氛围，自主开发了“和谐人际”的班级课程，这体现了教师____。\nA. 是教育教学的研究者\nB. 是课程的建设者和开发者\nC. 是学生学习的促进者\nD. 是社区型的开放教师\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n刘老师工作很负责，学生在学校出现一点问题他就会与家长联系，在与家长沟通时他经常以前辈的姿态对待家长，对家长的教育方式指指点点。刘老师的做法\
+____。\nA. 正确，老师就应该与家长经常沟通\nB. 正确，老师的经验比家长丰富，应该多指导家长\nC. 不正确，教师没有权利指导家长\nD. 不正确，教师应该与家长建立平等的沟通关系，尊重家长的人格\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n在古代印度，有一户人家经营一家棉布店销售自己手工制作的衣服。你认为这户人家属于哪个等级？____\nA. 婆罗门\nB. 刹帝利\
+C. 吠舍\nD. 首陀罗\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n“小型分散，便于开展多种多样的活动，满足学生不同的兴趣、爱好，发展学生的才能，使学生得到更多的学习和锻炼的机会。\
+”这种课外活动的形式是____。\nA. 科技活动\nB. 学科活动\nC. 个人活动\nD. 小组活动\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n小红每天晚上临睡前都要多次反复检查自己的书包，确保带齐了第二天需要用的教材和文具。她明知道没有这个必要，但就是控制不住。她可\
+能出现了____。\nA. 抑郁症\nB. 焦虑症\nC. 强迫症\nD. 恐惧症\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n国家管理和评价课程的基础是____。\nA. 课程计划\nB. 课程标准\nC. 教学目标\nD. 教育目的\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n儿童坚持性发生明显质变的年龄约在____\nA. 3～4岁\nB. 4～5岁\nC. 5～6岁\nD. 6岁以后\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n《红楼梦》中人物众多、关系繁杂。为了帮助读者阅读，许多红学爱好者都在网络上发布了自己整理制作的主要人物关系图。这属于____。\
+A. 纲要策略\nB. 精细加工策略\nC. 资源管理策略\nD. 监控策略\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n学期结束时，班主任王老师会对学生思想品德的发展变化情况进行评价。这项工作属于____。\nA. 工作总结\nB. 工作计划\nC. 操行评定\
+D. 建立学生档案\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n人们常说：“教学有法而教无定法。”这反映了教师的劳动具有____。\nA. 连续性\nB. 示范性\nC. 长期性\nD. 创造\
+性\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n县级以上地方各级人民代表大会是县级以上地方国家权力机关，其职权不包括____。\nA. 改变或撤销本级人大常务委员会不适当的决定\
+B. 选举并有权罢免本级人民法院院长\nC. 批准本行政区域内的预算执行情况的报告\nD. 决定并宣布下一级行政区域进入紧急状态\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n在心理健康课上，同一批学生在第二次进行同样内容的人格测验时获得的分数与上次测验差别较大。这说明该测验存在的问题是____。\
+A. 信度问题\nB. 效度问题\nC. 难度问题\nD. 区分度问题\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n李老师在教学生区分形近字“渴”“竭”“碣”“谒”时，将四个字相同的右半部分用白色粉笔写出，相异的左半部分用彩色粉笔写出。李老师运用了\
+知觉的____。\nA. 整体性\nB. 选择性\nC. 理解性\nD. 恒常性\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n兰兰学会走路后,就要很喜欢尝试自己穿衣、吃饭、捡东西,喜欢探索周围世界。按照埃里克森人格发展阶段理论,兰兰所处的发展阶段是____\
+A. 信任对怀疑\nB. 自立对羞怯\nC. 主动感对内疚感\nD. 勤奋感对自卑感\nAnswer:",
+  "The following are multiple choice questions (with answers) about  teacher qualification.\n\n下列对于多动症的说法，不正确的是____\
+A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理，学习迁移产生的关键\
+是学习者通过活动能概括出其共同原理。持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中，通常在单元格内出现“####”符号时，表明\
+____。\nA. 显示的是字符串“####”\nB. 列宽不够，无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
+C. 1939年\nD. 1945年\nAnswer: C\n\n在日常生活中，我们经常会接触一些民谚、俗语，这些民谚、俗语蕴含着丰富的物理知识。下列民谚、俗语蕴含的物理知识所属领域不同的是____。\nA. 坐井观天，所见甚少\
+B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n杨老师在教授生字词的过程中发现部分学生有缺笔少画的现象，于是他把“小学生缺笔少画现象的原因及对策研究”作为研究课题，拟订相应的研究计划，\
+在工作中收集、整理相关资料并实施教学措施，最后根据反馈信息调整教学方案。这种研究方法属于____。\nA. 教育行动研究法\nB. 教育实验法\nC. 教育叙事研究法\nD. 个案研究法\nAnswer:"
+]
+
+
+dataset_calib = get_calib_dataset(tokenizer, CALIB_SET)
+# If the activations feeding linear operators span too wide a range, or contain
+# too many "spike" outliers, apply the anti-outlier pass as follows.
+anti_config = AntiOutlierConfig(anti_method="m2", dev_type="cpu")  # to quantize on an NPU: dev_type="npu", dev_id=0
+anti_outlier = AntiOutlier(model, calib_data=dataset_calib, cfg=anti_config, norm_class_name="RMSNorm")
+anti_outlier.process()
+# Fallback layers: W8A8 also quantizes activations, and some layers are sensitive to the
+# activation range, so these layers fall back to floating-point weights.
+disable_names = []
+baichuan_layers = 40
+disable_idx_lst = list(range(baichuan_layers))
+for layer_index in disable_idx_lst:
+    down_proj_name = "model.layers.{}.mlp.down_proj".format(layer_index)
+    disable_names.append(down_proj_name)
+quant_config = QuantConfig(
+    a_bit=8,
+    w_bit=8,
+    disable_names=disable_names,
+    disable_last_linear=False,
+    dev_type='cpu',  # to quantize on an NPU: dev_type="npu", dev_id=0
+    act_method=3,
+    pr=1.0,
+    w_sym=True,
+    mm_tensor=False
+)
+
+calibrator = Calibrator(model, quant_config, calib_data=dataset_calib, disable_level='L0')
+calibrator.run()  # run PTQ calibration
+# save_type: "safe_tensor" writes safetensors weights, "numpy" writes .npy weights
+calibrator.save(OUTPORT_W8A8_PATH, save_type=["safe_tensor"])
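Both scripts run `AntiOutlier` before calibration to tame activation outliers. The exact algorithms selected by `anti_method="m2"` and `"m3"` are not described here, so the sketch below swaps in SmoothQuant-style migration purely to illustrate the general idea: divide each input channel of the activations X by a per-channel scale s and multiply the matching row of W by the same s, so X' W' equals X W while the outlier channel of X shrinks. In SmoothQuant the division of X is usually folded into the preceding normalization layer, which is presumably why the scripts pass `norm_class_name="RMSNorm"`; that is an inference, not documented behavior. All names below are hypothetical and `alpha=0.5` is an assumed migration strength.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x k) @ (k x cols) matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def smooth_scales(x: Matrix, w: Matrix, alpha: float = 0.5) -> List[float]:
    """Per input channel j: s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha)."""
    scales = []
    for j in range(len(w)):
        x_max = max(abs(row[j]) for row in x)
        w_max = max(abs(v) for v in w[j])
        if x_max > 0 and w_max > 0:
            scales.append(x_max ** alpha / w_max ** (1 - alpha))
        else:
            scales.append(1.0)  # leave degenerate channels untouched
    return scales


def apply_smoothing(x: Matrix, w: Matrix, scales: List[float]) -> Tuple[Matrix, Matrix]:
    """Return X' = X diag(s)^-1 and W' = diag(s) W, so that X' @ W' == X @ W."""
    x_s = [[v / scales[j] for j, v in enumerate(row)] for row in x]
    w_s = [[v * scales[j] for v in w[j]] for j in range(len(w))]
    return x_s, w_s


# Toy activations (2 tokens x 3 channels) with an outlier in channel 1,
# and toy weights (3 input channels x 2 outputs).
x = [[0.1, 50.0, -0.2], [0.3, -40.0, 0.1]]
w = [[0.5, -0.1], [0.02, 0.03], [-0.4, 0.2]]

s = smooth_scales(x, w)
x_s, w_s = apply_smoothing(x, w, s)
before = matmul(x, w)
after = matmul(x_s, w_s)
```

After smoothing, the largest magnitude in channel 1 drops from 50 to roughly 1.2, which leaves a much narrower range for 8-bit activation quantization, while the product stays the same up to floating-point rounding. The difficulty is moved into the weights, which are generally easier to quantize.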
--- a/mindie/examples/models/baichuan/quant_baichuan2_7b_w8a8.py
+++ b/mindie/examples/models/baichuan/quant_baichuan2_7b_w8a8.py
@ -0,0 +1,746 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+import logging
+from transformers import AutoTokenizer, AutoModelForCausalLM
+from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig, AntiOutlier
+from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
+
+INPORT_FP16_PATH = 'the_path_of_fp16_model_input'
+OUTPORT_W8A8_PATH = 'the_path_of_w8a8_model_output'
+tokenizer = AutoTokenizer.from_pretrained(
+    pretrained_model_name_or_path=INPORT_FP16_PATH,
+    use_fast=False,
+    padding_side='left',
+    trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    pretrained_model_name_or_path=INPORT_FP16_PATH,
+    trust_remote_code=True).float().cpu()
+
+# model = model.half().npu()  # uncomment to quantize on an NPU
+
+
+# Build the calibration dataset
+def get_calib_dataset(
+        auto_tokenizer,
+        calib_list,
+        device="cpu"):  # use device="npu:0" to quantize on an NPU, device="cpu" for CPU
+    calib_dataset = []
+    for calib_data in calib_list:
+        inputs = auto_tokenizer(calib_data, return_tensors='pt')
+        calib_dataset.append([
+            inputs.data['input_ids'].to(device),
+            inputs.data['attention_mask'].to(device)
+        ])
+    return calib_dataset
+
+
+calib_set = [
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，\
+静谧中含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸\
+露着贫瘠\nB. 也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都\
+是荒村野店。时而会有一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹\
+放进去时，渔夫就用重物将口封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如\
+此下去，即使篓口没有盖盖子，但也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必\
+然内耗，团结就是力量\nD. 与人方便，自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer:\
+ A\n\n①我的奶奶是这样，我的父亲也是这样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像\
+抽离出记忆，那么，在那个时代里成长、生活的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，\
+也会认认真真卷起来放好，我曾看别人卷过这画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生\
+活 ⑥从这个意义上说，尽管也许并不懂他，但人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. \
+⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n甲与乙准备进行一个游戏：向空中扔\
+三枚硬币，如果它们落地后全是正面向上或全是反面向上，乙就给甲钱；但若出现两正面一反面或两反面一正面的情况，则由甲给乙钱。乙要求甲每次给10元，那\
+么，从长远来看，甲应该要求乙每次至少给____元才可考虑参加这个游戏。\nA. 10\nB. 15\nC. 20\nD. 30\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧\
+中含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫\
+瘠\nB. 也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野\
+店。时而会有一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔\
+夫就用重物将口封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓\
+口没有盖盖子，但也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是\
+力量\nD. 与人方便，自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶\
+是这样，我的父亲也是这样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，\
+在那个时代里成长、生活的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起\
+来放好，我曾看别人卷过这画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义\
+上说，尽管也许并不懂他，但人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: \
+D\n\n相机：拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n下列著名诗人与其代表作对应有误的是____。\nA. \
+李白——《将进酒》\nB. 白居易——《琵琶行》\nC. 王之涣——《登鹳雀楼》\nD. 杜甫——《长恨歌》\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧\
+中含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫\
+瘠\nB. 也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。\
+时而会有一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用\
+重物将口封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖\
+盖子，但也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与\
+人方便，自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父\
+亲也是这样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成\
+长、生活的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别\
+人卷过这画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不\
+懂他，但人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n经济学上所推崇的“橄榄型”收入分配结构，是指低收入和高收入相对较\
+少、中等收入占绝大多数的分配结构。我国正在采取措施，实施“提低、扩中、调高、打非、保困”的方针，使收入分配朝着“橄榄型”方向发展。这主要是为了\
+促进____。\nA. 生产的发展\nB. 效率的提高\nC. 社会的公平\nD. 内需的扩大\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄\
+____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n-81，-36，-9，0，9，36，____\nA. 49\nB. 64\nC. 81\nD. 100\nAns\
+wer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\nVIP服务本来是个好东西，大企业作为市场竞争的主体，实行差别化服\
+务，无可厚非。但近年来，一些企业纷纷进军医院、机场、车站等公共场所，掏些赞助费，设立所谓“贵宾厅”，霸占公共资源，不仅带来浪费，更造成公共资源分配的不\
+公。这段文字主要强调的是____。\nA. 公共资源不该过度VIP\nB. VIP服务导致了公共资源的不公平分配\nC. 一些企业搬进医院、机场、车站办公\nD. 实行差别化\
+服务是VIP服务的优势所在\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样\
+——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的\
+许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n2，5，8，12，17，24，____\nA. 30\nB. 32\nC. 34\nD. 36\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样\
+——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的\
+许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n4，4，6，12，30，____\nA. 48\nB. 64\nC. 80\nD. 90\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n当下中国文学描写官斗、职斗、婚斗、家斗的作品比较流行，这些作品\
+中包含了不少对日常生活中权术和心机的描写。这样的写作有可能削弱文学对社会的积极影响。文学有必要与正义结盟，形成诗性正义，以提升生活。 作者想表达的主\
+要观点是____。\nA. 当下文学作品的社会影响力有下降的趋势\nB. 流行作品未必是好作品，这需要时间的检验\nC. 文学不应过度渲染权术机诈，否则有可能泯灭正\
+义\nD. 生活中没那么多权术机诈，文学创作应该贴近生活，不能闭门造车\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n一天，一个农民的驴子掉到枯井里，那可怜的驴子在井里凄凉地惨叫了\
+几个钟头，农民亦急得团团转，就是毫无办法把它救起来，最后，他断然认定：驴子已老了，这口枯井也该填起来，不值得花精力去救驴子。他请来所有邻居帮他填井。\
+大家抓起铁锹，开始往井里填土。驴子很快意识到发生了什么事，起初，它恐慌地大哭，不一会儿，居然安静下来。人们忍不住往井里看，奇迹发生了。每一铲砸到驴子\
+背上的土，它都作了出人意料的处理：迅速抖落一身尘土，然后狠狠地用脚踩紧。这样，没过多久，驴子竟然自己把自己升了起来，到了井口，它纵身一跳，平安地跑开\
+了，在场的人均惊诧不已。 这段文字告诉我们的道理是____。\nA. 人生中的每一个困难都是通往成功的垫脚石\nB. 换一种思维常常能够产生意想不到的效果\nC. 冷\
+静思考是克服困难的首要条件\nD. 求人不如求己，很多时候，自己才是自己最大的救星\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中含\
+着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. 也\
+许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有一\
+座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口封\
+住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也没\
+有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自己\
+方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样——\
+那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的许\
+多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画像，\
+那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人们心\
+甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. 空\
+调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n在现代社会，教育符号也即文凭和学历是一种重要的文化货币，手持符号资本，可进入相\
+应职业群体、身份团体和社会位置。譬如，凭借医学博士文凭，可成为医生。此为教育的筛选功能，亦被喻为人才的分类编码场，如同公共汽车总站，目的地不同的人选\
+择不同的路线，乘坐不同的车辆，到达不同的地方。 下列选项不符合文意的一项是____。\nA. 文凭与学历都是符号资本\nB. 教育符号是人才的分类编码\nC. 文凭体\
+现了教育的筛选功能\nD. 手持相应的符号资本才能进入相应的职业群体\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n侯方域：《桃花扇》____\nA. 蒲松龄：《聊斋志异》\nB. 石头记：\
+《红楼梦》\nC. 崔莺莺：《西厢记》\nD. 秦始皇：《后汉书》\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n____全党同志和全国人民团结一心，坚持不懈地奋斗，不断取得扎扎实\
+实的成效，我们____一定能够使社会主义新农村建设真正成为惠及广大农民群众的民心工程。 填入画横线部分最恰当的一项是____。\nA. 如果 就\nB. 只有 才\
+能\nC. 只要 就\nD. 倘若 也就\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中含\
+着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会\
+有一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物\
+将口封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖\
+子，但也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与\
+人方便，自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父\
+亲也是这样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成\
+长、生活的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别\
+人卷过这画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不\
+懂他，但人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n下列关于世界银行的说法中不正确的是____。\nA. 原名国际复兴开发\
+银行，于1944年开始营业\nB. 它是联合国下属的一个专门机构\nC. 是负责长期贷款的国际金融机构\nD. 贷款期限较长，一般为数年，最长可达30年\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n外资银行进入新兴市场国家，新兴市场国家银行业的各主体为了维持自\
+身的生存，会尽可能争取较大的市场份额，充分拓展自身竞争优势，努力向客户提供质优价廉的金融产品和金融服务，这个过程必然带动银行业微观效率的提升。 “这个\
+过程”指的是____。\nA. 外资银行进入新兴市场国家的过程\nB. 新兴市场国家银行业发展的过程\nC. 外资银行提供优质服务的过程\nD. 新兴市场国家银行业扩大市场\
+份额的过程\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中含\
+着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会\
+有一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将\
+口封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，\
+但也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方\
+便，自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也\
+是这样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、\
+生活的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷\
+过这画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂\
+他，但人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n按照行政层级标准来划分，我国政府机构的类型有____。\nA. 一\
+般地方国家行政机关和基层国家行政机关两大类\nB. 常设机构与非常设机构两类\nC. 领导机构、办公办事机构、职能机构和派出机构四类\nD. 中央国家行政机关和地\
+方国家行政机关两大类\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n在某市一项对公司年轻人员的最新调查中，与往年相比，今年有70％的\
+人打算购买房屋，这一比例已达到历史最高值。然而，在房屋管理局的统计中，该市今年的房屋成交量却比往年有所下降。以下哪项如果为真，最不能解释上述现\
+象?____\nA. 一些打算购买房屋的年轻人目前并不具备该市购买房屋的条件\nB. 往年资料表明，年轻人员购买房屋的比例不足购买房屋成员的30％\nC. 近年来爆发的\
+金融风暴，对房地产行业有一定的打击\nD. 近几个月该市楼市价格不稳定，使得一些购房者持观望态度\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样—\
+—那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的许\
+多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n我们以往所理解的“现代化”概念仅仅局限于物质层面，局限于表层经济现代化，这也是\
+迟发展国家长期存在的一个普遍性问题：在物质层面上求变的欲望很强，而在制度层面和观念层面上却是文化守成主义的，这种状况对于现代化实际进程的影响自不必说，\
+它对于学术的影响是导致知识的流俗化。不断地更换新词语，在新词语的装潢下重复古老的思想观念，结果是词语和口号不断地更换而社会精神气质则没有实质性的变化。 \
+这段文字要表达的主要意思是____。\nA. 现代化应包括物质的、制度的、观念的三个层面\nB. 片面理解现代化是迟发展国家长期存在的一个普遍性问题\nC. 物质层面\
+的落后现状是迟发展国家片面理解现代化的一个重要因素\nD. 片面理解现代化会导致知识的流俗化\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n皮肤破损出血、颈髓损伤、锐器插入体内、严重挤压伤等是灾害发生时\
+的常见损伤类型．掌握科学的自救方法对于延续生命、等待救援很重要。下列自救措施中，恰当的是____。\nA. 锐器插人体内后，应快速将锐器拔出，简单处理伤口后\
+立即送往医院救治\nB. 对颈后锐痛、活动时疼痛加剧等症状，即用颈托，一时无颈托，可临时用敷料、硬板纸或塑料板做成颈圈固定颈部\nC. 伤口发生喷射状出血时，\
+应立即用厚消毒纱布(或毛巾)包扎好伤口\nD. 被重物挤压引起肢体肿胀或青紫时，应尽快在患处用热毛巾湿敷消肿\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样—\
+—那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的许\
+多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画像，\
+那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人们心\
+甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. 空\
+调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n多年以来，医生和家属对待癌症患者大多采取这样的态度：即向患者隐瞒已得癌症的实情，\
+这样的做法在医学上叫作“保护性医疗”，其目的在于减少患者的心理负担。但是，某肿瘤医生新设立的康复科的张主任却主张实行“公开性治疗”。 由此可推知下文将要论\
+述的是____。\nA. 家属对实行“公开性治疗”的态度\nB. “保护性医疗”的弊端\nC. “公开性治疗”将使病情得到控制和好转\nD. “公开性治疗”的含义和形式\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样\
+——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的\
+许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画像\
+，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人们\
+心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. 空\
+调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n古人归纳总结出许多观天象识天气的谚语。下列与天气变化无关的谚语是____。\nA. 朝\
+霞不出门，晚霞行千里\nB. 天上鱼鳞云，地下雨淋淋\nC. 东风是个精，不下也要阴\nD. 百日连阴雨，总有一日晴\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧\
+中含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. 也许\
+是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有一座\
+小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口封\
+住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n从《论语》看，孔子对音乐的重视，可以说远远超出了后世那些尊敬他\
+的人的想象，这一方面来自他对于乐的精神艺术的新发现。艺术，只在人们精神的发现中才存在，可以说，就现在见到的材料看，孔子可能是中国历史上最伟大的艺术精\
+神的发现者。这段文字重点强调____。\nA. 孔子在音乐方面的成就与贡献\nB. 后人评价孔子时所存在的偏颇\nC. 艺术精神在乐教传承中的作用\nD. 《论语》作为文\
+献的重要意义\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n①当地球撞进尘埃带时，从地球上看，是短时间内无数尘埃以极高的速\
+度划破大气层下落 ②因此，流星雨实际上是彗星留下的无数尘埃形成的 ③进入大气层的尘埃被大气加热，发出明亮的光 ④彗星释放出的尘埃，并非顷刻扩散到宇宙空间，\
+消失得无影无踪，而是留在彗星的轨道上继续公转 ⑤这样看上去就有许多流星，也就是流星雨 ⑥这样形成的“尘埃带”，有些和地球的公转轨道交叉 将以上6个句子重新排\
+列，语序正确的是____。\nA. ④②⑥③⑤①\nB. ①④③⑥⑤②\nC. ④⑥①③⑤②\nD. ①③⑤②④⑥\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：\
+拍摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n3，7，16，107，____\nA. 1704\nB. 1072\nC. 1707\nD. \
+1068\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有一\
+座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口封\
+住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样\
+——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的\
+许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n我始终____，开始在内心生活得更严肃的人，也会在外在上开始生活得更____。在一个\
+奢华浪费的年代，我希望能向世界____，人类真正要的东西是非常之微小的。 填入画横线部分最恰当的一项是____。\nA. 确认 朴素 表明\nB. 相信 质朴 证明\nC. \
+确认 质朴 证明\nD. 相信 朴素 表明\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样—\
+—那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的许\
+多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n一特殊跑道为正三角形，某运动员用6米／秒的速度跑一圈耗时50秒，问该运动员提\
+速10％后从跑道的某个顶点横穿跑道跑向对边，问最少约需多少秒可踏足对边?(四舍五入到个位)____\nA. 9秒\nB. 10秒\nC. 13秒\nD. 15秒\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样—\
+—那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的许\
+多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n文学资料在思想史领域著作中，被使用得还是相当少。其实，作为记述史实的历史，可\
+能对有些夸张和虚构的小说需要警惕，但是，作为考察理性和情感的思想史，却不必胶柱鼓瑟或因噎废食，任何文学作品也许在事实上有想象，但在语言、立场和情感上，\
+却仿佛“当堂呈供”，并不能把自己的本相全盘隐匿。 对这段文字的主旨理解最准确的是____。\nA. 文学作品呈现艺术的真实\nB. 思想史研究应体现理性和情\
+感\nC. 文学资料可以作为思想史研究的史料\nD. 思想史研究中要慎用文学资料\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n下列关于国际组织的表述不正确的是____。\nA. 石油输出国组织通过实行\
+石油生产配额限制维护石油生产国利益\nB. 博鳌亚洲论坛是第一个总部设在中国的国际会议组织\nC. 蒙古国是上海合作组织的成员国之一\nD. 国际货币基金组织是联\
+合国的专门机构\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n实验证明，植物体内含有一种觉察光的蛋白质，可以“分辨”光的强弱。这\
+种能力很可能使植物看到人类视力所看不到的波长，而且具有较高的灵敏度。植物能感觉光照射过来的方向，光使植物知道早上什么时候该醒来，同样也能促使植物额外\
+分泌栎精和堪非醇这两种无色色素，他们能过滤强烈的阳光，充分发挥遮光剂的作用，从而保护植物免受紫外线的强烈照射。 这段文字主要介绍的是____。\nA. 植物是\
+怎么辨别方向的\nB. 植物是如何避免阳光暴晒的\nC. 植物具有一定意义上的“视觉”\nD. 感知阳光对植物生长的重要性\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方\
+便，自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也\
+是这样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、\
+生活的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷\
+过这画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂\
+他，但人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n1，10，37，82，145，____\nA. 170\nB. 197\nC. 224\nD. \
+226\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口封\
+住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也没\
+有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自己\
+方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样——\
+那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的许多\
+人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画像，\
+那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人们心\
+甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. 空调\
+：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n某县在一次招商引资活动中，投资商刁难引资方说：“我有三个项目：环境项目、旅游项目\
+和化工项目。如果你说的话是正确的，我会把其中一个项目投资到贵县，但是如果你说的话是错误的，我就一个项目也不投资。”引资方当然想获得环境项目，那么引资\
+方该如何说呢?____\nA. 你不会把环境项目或旅游项目投资到我县\nB. 你不会把环境项目或化工项目投资到我县\nC. 你不会把旅游项目或化工项目投资到我县\nD. 你\
+不会把旅游项目和化工项目都投资到我县\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n民意“被满意”，民众“不满意”，甚至“很生气”。尊重民意、顺应民意、采\
+纳民意是服务型政府的执政要义，是政治文明建设的题中之意。民意的力量一方面取决于民意征集占全民的比例，即广泛性；另一方面也体现在政府对民意的尊重程度\
+上。保障民众的知情权、参与权、表达权和监督权，就是要随时随地与民众进行多种途径的沟通、交流。民意内涵民智，民意关乎民生。我们不仅要从民意中看到民众欢\
+迎什么、反对什么，为科学决策提供依据，而且要充分发挥民智的作用。尊重民意、吸纳民智是科学决策的重要保证，也是衡量政府亲民为民的重要标志。阅读上面文字\
+，最符合文意的一项是____。\nA. 让民众“不满意”“很生气”的政府就不是服务型政府\nB. 知情权是监督权的前提，参与权是表达权的前提\nC. 尊重民意、吸纳民智\
+是科学决策的决定性因素\nD. 民意力量的发挥取决于民意征集的广度和尊重民意的程度\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样\
+——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的\
+许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n3，5，16，82，1315，____\nA. 107834\nB. 12849\nC. 12847\nD. 108847\nAns\
+wer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样—\
+—那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的许\
+多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画像\
+，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人们\
+心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. 空\
+调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n下列可以反映气候垂直变化的诗句是____。\nA. 东边日出西边雨，道是无晴却有晴\nB. \
+罗浮山下四时春，卢橘杨梅次第新\nC. 人间四月芳菲尽，山寺桃花始盛开\nD. 横看成岭侧成峰，远近高低各不同\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口封\
+住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也没\
+有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样\
+——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的\
+许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n日本松下公司日前在东京“松下中心”向当地媒体展示了其面向未来的“零排放概念环保房\
+屋”。环保屋的主要特点是“节能、创能、蓄能”。“节能”就是提高对自然界既有资源的利用率，同时采用环保隔热的建筑材料以及最先进的环保节能家电设备等。 下文最\
+有可能介绍的是____。\nA. 环保屋是怎样设计出来的\nB. 环保屋的创能、蓄能特点\nC. 环保屋的推广\nD. 环保屋的材料\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样—\
+—那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的许\
+多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n下列没有歧义的一项是____。\nA. 几个派出所的民警。\nB. 法院门前的石狮\
+子。\nC. 这份起诉书我写不好。\nD. 咬死了主人的藏獒。\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样\
+——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的\
+许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n我们发现零工制度有一个重要的支持机制就是完善的、科学化的员工培训系统。几乎所\
+有的现代企业和公司都非常重视内部培训，有的企业主甚至成为了培训狂，哪怕有一秒钟的空闲也要为员工安排一次培训。但真正有效的培训并不是无休止的洗脑和课程\
+轰炸，不是“潜能激发”和“感恩教育”，而是适合公司运营需求的专业性、针对性、科学性的业务训练。这种培训机制如果能够建立起来，无论你是否采用零工制度都会对\
+企业的发展起到重要的推动作用。 这段文字意在说明____。\nA. 很多公司培训缺乏科学性\nB. 科学的员工培训对企业很重要\nC. 零工制度不一定适合所有企业\nD.\
+ 过度培训可能会造成相反效果\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n全国人民代表大会举行会议时，主持大会正式会议的是____。\nA. 全国人\
+大常委会\nB. 大会主席团\nC. 全国人大常委会委员长\nD. 大会秘书长\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样—\
+—那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的许\
+多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n改革开放以来，中国农学会____“献身、创新、求实、协作”的宗旨，始终不渝地坚持以\
+推动农业科技进步、促进农村发展为己任，大力开展学术交流和科技普及，积极____和举荐人才，为提高广大农民科技素质、加快农业科技进步作出了重要贡献。 填入画\
+横线部分最恰当的一项是____。\nA. 继承 出谋划策\nB. 继承 建言献策\nC. 秉承 建言献策\nD. 秉承 出谋划策\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n0， 4， 3， 10， 6， 7， ____\nA. 101\nB. 102\nC. 103\nD. 1\
+04\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样\
+——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的\
+许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n“新生代散文”作家大多有写现代诗的背景，诗人所拥有的____的思维、大胆的想象、敏\
+锐的感觉，将“诗质”____在散文语言的血液和肌理里。这不同于平铺直叙式的浅浮的诗意，而是自我心灵的体认中____而成的诗质。 填入画横线部分最恰当的一项\
+是____。\nA. 跳脱 镶嵌 凝结\nB. 另类 浓缩 升华\nC. 感性 渗透 铸就\nD. 活跃 散播 提炼\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB.\
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n据《咬文嚼字》编辑部透露，编制年度“十大流行语”是一项十分严肃的事，既要____到\
+词语在当年的流行度，又要从语文伦理角度加以必要的____，选优汰劣，力争通过“十大流行语”向社会____正能量。 填入画横线部分最恰当的一项是____。\nA. 斟酌 \
+估量 传播\nB. 思考 权衡 传送\nC. 思索 考察 传达\nD. 考虑 考量 传递\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口封\
+住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也没\
+有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自己\
+方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样——\
+那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的许多\
+人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画像，\
+那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人们心\
+甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. 空\
+调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n20世纪60年代以前，世界各国普遍注重防洪的工程措施，即通过修建大堤、水库水利设施\
+对洪水进行控制。但在60年代以后，世界各国在防洪规划中越来越重视非工程措施的运用，即通过洪水预警、灾情评估、洪灾保险等多种手段，结合各种工程措施，从而\
+尽可能减少洪灾对人类经济、环境和社会发展的影响。 这段文字主要谈的是____。\nA. 世界各国防洪理念的转变\nB. 世界各国控制洪水的新途径\nC. 单纯重视防洪\
+工程不能有效控制洪水\nD. 非工程措施逐渐成为防洪规划的主导\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n近年来，国家房地产调控措施的出台十分密集，除了增加公共租赁住房供应\
+外，再加上央行加息，多个城市出现了房屋成交量下跌的态势，房价涨幅开始放缓。这表明____。\nA. 国家通过宏观调控平衡供求关系\nB. 价格的波动通过供求关系表\
+现出来\nC. 宏观调控是资源配置的基础性手段\nD. 宏观调控可以克服市场调节的滞后性\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n学生在操场上列队做操，只知人数在90-110之间。如果排成3排则不多不\
+少：排成5排则少2人；排成7排则少4人。问学生人数是多少人?____\nA. 102\nB. 98\nC. 104\nD. 108\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n有人说：人本是散落的珍珠，随地乱滚。文化就是那极____又强韧的细\
+线，将珠子串起来成为社会。也有人说：文化犹如空气中的氧气，自然界的春雨，不可或缺却____，飘飘洒洒，润物无声。可见，文化资源价值是无法用尺度衡量的。 填\
+入画横线部分最恰当的一项是____。\nA. 柔弱 视之无形\nB. 纤细 不可名状\nC. 结实 视而不见\nD. 薄弱 不可捉摸\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但也\
+没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，自\
+己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这样\
+——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活的\
+许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这画\
+像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但人\
+们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍摄____\nA. \
+空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n政府职能与成本问题一直备受争议，但这方面的研究似乎还处于一种观点与立场远未一致\
+的状态，一个重要原因是研究视角与方法的局限。大体上看，这类研究有两条思路，一条是信守新古典经济学理论预设，认为市场可以有效解决经济社会发展中的问\
+题，持“小政府”观点；另一条是信守政府干预主义理论预设，认为政府不时干预是市场能够健康运转的必要条件。笔者认为，要解决这种困境，必须有新的理论视野和新\
+的研究方法，而新兴古典经济学理论就是其中之一。 这段文字接下来最有可能讲述的是____。\nA. 新兴古典经济学的理论框架与研究方法\nB. 新理论视野对提高政府\
+的行政效率有何帮助\nC. 新古典经济学理论预设的局限性\nD. 政府职能与成本之间矛盾难解的原因\nAnswer:",
+    "The following are multiple choice questions (with answers) about  civil servant.\n\n透过车轮卷起的黄土，却见山野人秋，庄稼割过，静谧中\
+含着一些寂静，只有阳光在切割过的根茬上烁烁闪亮。____。 填入横线上最恰当的是____。\nA. 这是一段颠簸的行程，一路上景色苍凉雄浑，寂静中裸露着贫瘠\nB. \
+也许是久旱的缘故，这边的溪流也变成了涓涓细流，在盘踞的石缝间流动\nC. 同绿色的南方相比，这里是荒凉的，乃至荒蛮\nD. 偶见人迹，大都是荒村野店。时而会有\
+一座小小的孤庙一闪而过\nAnswer: D\n\n据说，在东南沿海一带，渔民在捕到螃蟹后，将螃蟹放进一个上小肚大的竹篓里面，第一只螃蟹放进去时，渔夫就用重物将口\
+封住，当第二只、第三只放进去后，渔夫就不再盖重物了，因为，第一只即将爬出篓口的螃蟹，会被第二只、第三只螃蟹拉到篓底。如此下去，即使篓口没有盖盖子，但\
+也没有一只蟹能够爬出去。 这个故事意在告诉我们____。\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗，团结就是力量\nD. 与人方便，\
+自己方便\nAnswer: C\n\n谨慎：成就____\nA. 温和：好感\nB. 勤奋：努力\nC. 轻松：普通\nD. 好学：智慧\nAnswer: A\n\n①我的奶奶是这样，我的父亲也是这\
+样——那张画像，已经成为许多老百姓生活必需品的一部分，没有它，似乎客厅都是空的 ②如果因为认知能力的提升而将偶像抽离出记忆，那么，在那个时代里成长、生活\
+的许多人，脑子里将空空如也，甚至不记得自己曾经活过这一回 ③卷的过程，是在收叠他个人的历史 ④有时挂旧了、破了，也会认认真真卷起来放好，我曾看别人卷过这\
+画像，那种澄澈的眼神令人难忘 ⑤有些伟大者永远不会被人遗忘，因为那个伟大者，在那个时代，其实是一种生活，精神生活 ⑥从这个意义上说，尽管也许并不懂他，但\
+人们心甘情愿尊他的名为圣 将以上6个句子重新排列，语序正确的是____。\nA. ②⑥⑤①④③\nB. ②⑥④③①⑤\nC. ①④③②⑥⑤\nD. ⑤②⑥①④③\nAnswer: D\n\n相机：拍\
+摄____\nA. 空调：降温\nB. B超：诊断\nC. 电脑：操作\nD. 地图：交通\nAnswer: B\n\n2009年有两次“立春”，很容易让人联想到“第二春”“二度春”，可想而知这\
+样的婚姻不稳定，所以网络上有“2009年不能结婚，或者2009年爱情不会长久”等传闻。但是，大多数年轻人认为，登记结婚是件水到渠成的事，不会因为赶日子仓促提前\
+或延迟。 根据这段文字，下列说法正确的是____。\nA. 作者认为2009年不适合结婚\nB. 大多数年轻人认为2009年是结婚的好年头\nC. 2009年结婚会使婚姻不稳定的\
+说法是无稽之谈\nD. 大多数年轻人不会因为2009年有两次“立春”而改变自己的结婚计划\nAnswer:"
+]
+dataset_calib = get_calib_dataset(tokenizer, calib_set)
+# If the activations feeding the linear operators have an overly wide range or many
+# spiky outliers, enable the anti-outlier feature as follows.
+
+logging.info("===============start AntiOutlier==============")
+anti_config = AntiOutlierConfig(
+    w_bit=8, a_bit=8, anti_method="m2",
+    dev_type="cpu")  # to quantize on an NPU instead, use dev_type="npu", dev_id=0
+anti_outlier = AntiOutlier(model,
+                           calib_data=dataset_calib,
+                           cfg=anti_config,
+                           norm_class_name="RMSNorm")
+anti_outlier.process()
+
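+logging.info("===============end AntiOutlier==============")
+
+# Fallback layers: W8A8 quantizes activations as well as weights, and some layers are
+# sensitive to the activation range, so those layers fall back to floating-point computation.
+# Here the mlp.down_proj of every decoder layer is excluded from quantization.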
+disable_names = []
+BAICHUAN_LAYERS = 32
+disable_idx_lst = list(range(BAICHUAN_LAYERS))
+for layer_index in disable_idx_lst:
+    down_proj_name = "model.layers.{}.mlp.down_proj".format(layer_index)
+    disable_names.append(down_proj_name)
+quant_config = QuantConfig(
+    a_bit=8,
+    w_bit=8,
+    disable_names=disable_names,
+    disable_last_linear=False,
+    dev_type='cpu',  # to quantize on an NPU instead, use dev_type="npu", dev_id=0
+    act_method=3,
+    pr=1.0,
+    w_sym=True,
+    mm_tensor=False)
+logging.info("===============start Calibrator==============")
+calibrator = Calibrator(model,
+                        quant_config,
+                        calib_data=dataset_calib,
+                        disable_level='L0')
+calibrator.run()  # run PTQ calibration
+calibrator.save(OUTPORT_W8A8_PATH, save_type=[
+    "safe_tensor"
+])  # "safe_tensor" saves safetensors weights; "numpy" saves .npy weights
+logging.info("===============end Calibrator==============")
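Why activation outliers matter for W8A8 can be shown with a toy example. The sketch below is not the algorithm behind the AntiOutlier step above (its `m2` method is not described here). It applies plain symmetric per-tensor int8 quantization, with scale = max|x| / 127, to two made-up activation vectors and compares the round-trip error. One large spike inflates the scale, so the ordinary values collapse onto a few quantization levels.

```python
def fake_quant_int8(values):
    """Symmetric per-tensor int8 quantize-dequantize: scale = max|x| / 127."""
    scale = max(abs(v) for v in values) / 127.0
    if scale == 0.0:
        return list(values)
    # Quantize to integers in [-127, 127], then map back to floats.
    return [max(-127, min(127, round(v / scale))) * scale for v in values]


def mean_abs_error(values):
    """Mean absolute round-trip error introduced by fake_quant_int8."""
    restored = fake_quant_int8(values)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Hypothetical activations: evenly spread values in [-0.5, 0.5] ...
smooth = [0.01 * i for i in range(-50, 51)]
# ... and the same values plus one large "spike".
spiky = smooth + [50.0]

print(f"smooth: {mean_abs_error(smooth):.5f}")
print(f"spiky:  {mean_abs_error(spiky):.5f}")
```

The error gap is the motivation for the two mitigations in the script: shrink the activation range before calibration, or keep sensitive layers such as `down_proj` in floating point.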
--- a/mindie/examples/models/baichuan/run_fa.sh
+++ b/mindie/examples/models/baichuan/run_fa.sh
@ -0,0 +1,23 @@
+#!/bin/bash
+# Copyright (c) Huawei Technologies Co., Ltd. 2024-2024. All rights reserved.
+
+export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export MASTER_PORT=20031
+
+# The following environment variables tune performance and memory usage; they usually need no changes
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export INF_NAN_MODE_ENABLE=0
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export ATB_CONVERT_NCHW_TO_ND=1
+export HCCL_BUFFSIZE=120
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+export ATB_CONTEXT_WORKSPACE_SIZE=0
+
+extra_param=""
+world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
+
+if [ "$world_size" == "1" ]; then
+    python -m examples.run_fa --model_path "$1" $extra_param
+else
+    torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_fa --model_path "$1" $extra_param
+fi
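The `world_size` line counts the commas in `ASCEND_RT_VISIBLE_DEVICES` and adds one. The Python sketch below mirrors that idiom. It assumes the launch mode is chosen from that computed world size: a single plain `python -m` process for one device, `torchrun` otherwise. `launch_command` is a hypothetical helper for illustration and is not part of the repository.

```python
def world_size_from_devices(visible_devices):
    """Mirror of the shell idiom: (number of commas) + 1."""
    return visible_devices.count(",") + 1


def launch_command(model_path, visible_devices="0,1,2,3,4,5,6,7", master_port=20031):
    """Hypothetical helper: build the argv that run_fa.sh would launch."""
    world_size = world_size_from_devices(visible_devices)
    module_args = ["-m", "examples.run_fa", "--model_path", model_path]
    if world_size == 1:
        # One device: no distributed launcher needed.
        return ["python"] + module_args
    # Several devices: one process per device via torchrun.
    return ["torchrun", "--nproc_per_node", str(world_size),
            "--master_port", str(master_port)] + module_args


print(launch_command("/path/to/weights", visible_devices="0"))
print(launch_command("/path/to/weights"))
```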
--- a/mindie/examples/models/baichuan/run_pa.sh
+++ b/mindie/examples/models/baichuan/run_pa.sh
@ -0,0 +1,20 @@
+#!/bin/bash
+set -ex
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+# See README.md in the same directory for parameter settings and launch commands
+export BIND_CPU=1
+export IS_QUANT=0
+export RESERVED_MEMORY_GB=3
+export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
+export MASTER_PORT=20036
+export IS_ALIBI_MASK_FREE=0
+export TP_WORLD_SIZE=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
+export INT8_FORMAT_NZ_ENABLE=1
+atb_options="ATB_LAUNCH_KERNEL_WITH_TILING=1 ATB_LAYER_INTERNAL_TENSOR_REUSE=1 ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1 PYTORCH_NPU_ALLOC_CONF='max_split_size_mb:2048' HCCL_BUFFSIZE=120"
+atb_async_options="ATB_OPERATION_EXECUTE_ASYNC=1 TASK_QUEUE_ENABLE=1"
+base_cmd="torchrun --nproc_per_node $TP_WORLD_SIZE --master_port $MASTER_PORT -m examples.run_pa --model_path $1"
+if [[ "$2" == "chat" ]]; then
+    base_cmd+=" --is_chat_model"
+fi
+run_cmd="${atb_options} ${atb_async_options} ${base_cmd}"
+eval "${run_cmd}"
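run_pa.sh builds its command as one string: inline environment assignments followed by `torchrun`, executed with `eval`. As an alternative, the Python sketch below passes the variables as an environment dict and the arguments as a list. This avoids shell re-parsing of paths that contain spaces. `build_run_pa_command` is a hypothetical helper; the variable values are copied from the script.

```python
import shlex

# Inline environment assignments copied from run_pa.sh (atb_options + atb_async_options).
ATB_OPTIONS = {
    "ATB_LAUNCH_KERNEL_WITH_TILING": "1",
    "ATB_LAYER_INTERNAL_TENSOR_REUSE": "1",
    "ATB_WORKSPACE_MEM_ALLOC_GLOBAL": "1",
    "PYTORCH_NPU_ALLOC_CONF": "max_split_size_mb:2048",
    "HCCL_BUFFSIZE": "120",
}
ATB_ASYNC_OPTIONS = {"ATB_OPERATION_EXECUTE_ASYNC": "1", "TASK_QUEUE_ENABLE": "1"}


def build_run_pa_command(model_path, mode="", visible_devices="0,1,2,3", master_port=20036):
    """Hypothetical helper: return (extra_env, argv) equivalent to run_pa.sh's run_cmd."""
    tp_world_size = visible_devices.count(",") + 1  # same comma-count idiom as the script
    extra_env = {**ATB_OPTIONS, **ATB_ASYNC_OPTIONS}
    argv = ["torchrun", "--nproc_per_node", str(tp_world_size),
            "--master_port", str(master_port),
            "-m", "examples.run_pa", "--model_path", model_path]
    if mode == "chat":
        argv.append("--is_chat_model")
    return extra_env, argv


extra_env, argv = build_run_pa_command("/path/to/weights", mode="chat")
print(shlex.join(argv))
# To launch: subprocess.run(argv, env={**os.environ, **extra_env}, check=True)
```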
--- a/mindie/examples/models/bge/large-zh-v1.5/README.md
+++ b/mindie/examples/models/bge/large-zh-v1.5/README.md
@ -0,0 +1,251 @@
+# README
+
+# Feature Matrix
+- This matrix lists the features supported by the bge-large-zh models
+
+| Model and parameter count | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | W4A16 quantization | KV cache quantization | Sparse quantization | MOE quantization | MindIE Service | TGI | Long sequence |
+|--------------|-------------------------|---------------------------| ---- |-----| --------------- | --------------- | -------- | --------- | --------- | ------------ |------| ---- | ------ | ---- |-----|
+| bge-large-zh | Supports world size 1 | Supports world size 1 | √ | × | × | × | × | × | × | × | × | × | × | × | × |
+
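+## Offline Model Version
+
+### Model Introduction
+
+bge-large-zh is a Chinese text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence). It maps arbitrary text to low-dimensional dense vectors for retrieval, classification, clustering, or semantic matching, and can also help large language models retrieve external knowledge. **Version 1.5** has a more reasonable similarity distribution.
+
+[Open-source model](https://huggingface.co/BAAI/bge-large-zh-v1.5)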
+
+`Commit-id 79e7739b6ab944e86d6171e44d24c997fc1e0116`
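The sketch below shows how such dense vectors are used for retrieval: it ranks documents by cosine similarity to a query. The 3-dimensional vectors are made up for illustration. Real bge-large-zh-v1.5 embeddings are 1024-dimensional and come from the model; they are usually L2-normalized, in which case the dot product alone equals the cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by descending cosine similarity to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical embeddings for one query and three documents.
query = [0.9, 0.1, 0.0]
docs = {"doc_a": [0.8, 0.2, 0.1], "doc_b": [0.0, 0.1, 0.9], "doc_c": [0.5, 0.5, 0.0]}
print(rank_documents(query, docs))
```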
+
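+### Model Conversion Workflow
+
+First obtain the open-source model from `huggingface` and convert it to ONNX format, then use the Ascend ATC tool to convert the ONNX model into an om model. The focus is on the model's accuracy and performance on Ascend devices.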
+
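+### Variable Definitions
+
+|Variable |Meaning |
+| ------------ | ------------ |
+|save_directory |Directory holding the ONNX model and the converted om offline model |
+|soc_version |Ascend AI processor version. Run **npu-smi info** and prefix the reported model with "Ascend", e.g. **Ascend910B4, Ascend310P3** |
+|precision_mode |Precision mode of the network, passed to ATC as `--precision_mode_v2`, e.g. **fp16, mixed_float16, origin** |
+| cur_dir |Directory from which commands or scripts are run (current directory) |
+|device_id |NPU chip ID; on a server with the CANN driver installed, run `npu-smi info` to list the available chip IDs |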
+
+### Install Python Dependencies
+
+```shell
+cd ${cur_dir}
+pip install -r requirements.txt
+```
+
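+### Install the ais_bench Inference Tool
+
+[ais_bench inference tool guide](https://gitee.com/ascend/tools/blob/master/ais-bench_workload/tool/ais_bench/README.md)
+
+- Both the **aclruntime** package and the **ais_bench** inference package must be installed
+
+#### Convert the Open-Source Model to ONNX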
+
+```shell
+cd ${cur_dir}
+python bin2onnx.py --model_path ${save_directory}
+```
+
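+#### Convert ONNX to an om Offline Model
+
+Use [Ascend ATC](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC1alpha003/devaids/auxiliarydevtool/atlasatc_16_0001.html) to convert the ONNX model into an om offline model
+
+- ATC is bundled with CANN; sourcing the CANN environment script is enough to make it available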
+
+```shell
+source /usr/local/Ascend/ascend-toolkit/set_env.sh
+```
+
+Run the following command in ${cur_dir}
+
+```shell
+atc --model=${save_directory}/model.onnx --framework=5 --output=${save_directory}/bge --soc_version=${soc_version} --input_shape="input_ids:-1,-1;attention_mask:-1,-1;token_type_ids:-1,-1" --optypelist_for_implmode="Gelu" --op_select_implmode=high_performance --input_format=ND --precision_mode_v2=${precision_mode} --modify_mixlist=${cur_dir}/ops_info.json
+```
+
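+#### Parameter Notes
+
+- The BERT model has three inputs, **input_ids**, **attention_mask**, and **token_type_ids**; `--input_shape` specifies the shape of each, in this order.
+
+- Per the ATC documentation, a dimension set to -1 accepts any value >= 0. Its upper bound is then the int64 range, but in practice it is limited by host- and device-side physical memory, which can be increased to support larger shapes.
+- The Gelu operator runs in high-performance mode, which improves model performance without affecting accuracy.
+
+- Different precision modes trade accuracy against performance:
+
+Accuracy (highest to lowest): `origin>mixed_float16>fp16`
+
+Performance (best to worst): `fp16>=mixed_float16>origin`
+
+Recommended: **mixed_float16**
+
+- `modify_mixlist` adjusts the blocklist, allowlist, and graylist used under mixed precision, so that operators that would overflow in fp16 keep their original precision. Here it points to a JSON file that adds those operators to the blocklist.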
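The `--input_shape` value in the ATC command above is one `name:dims` entry per input, separated by semicolons and listed in the model's input order. The sketch below builds that string, either fully dynamic (`-1`) or fixed to a batch size and sequence length. `atc_input_shape` is a hypothetical helper. Whether a fixed shape suits a given deployment is an assumption to check against the ATC documentation.

```python
def atc_input_shape(input_names, dims=(-1, -1)):
    """Build an ATC --input_shape value, e.g. "input_ids:-1,-1;attention_mask:-1,-1"."""
    dim_str = ",".join(str(d) for d in dims)
    return ";".join(f"{name}:{dim_str}" for name in input_names)


# Input order of the BERT-style ONNX model, as listed in the notes above.
BERT_INPUTS = ["input_ids", "attention_mask", "token_type_ids"]

print(atc_input_shape(BERT_INPUTS))                 # dynamic batch and sequence length
print(atc_input_shape(BERT_INPUTS, dims=(8, 100)))  # fixed batch_size=8, seq_len=100
```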
+
+### Download the Test Dataset
+
+```shell
+cd ${cur_dir}
+mkdir dataset
+cd dataset
+```
+
+Download [corpus, queries](https://huggingface.co/datasets/C-MTEB/T2Retrieval/tree/main/data) and [dev](https://huggingface.co/datasets/C-MTEB/T2Retrieval-qrels/tree/main/data) into this directory
+
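+### Offline Model Inference Scripts
+
+- The inference script for the om model is `${cur_dir}/infer.py`
+- The inference script for the open-source HF model is `${cur_dir}/demo.py`
+
+On an Ascend machine, **run** `python infer.py --model-path ${save_directory} --device ${device_id}`
+
+or, on a GPU machine, **run** `python demo.py` from the directory holding the weights.
+
+- **Note:** infer.py runs the first model ending in `.om` that it finds in the model directory. To use a specific om model, edit `session = InferSession(device_id=device, model_path=model_path)` in infer.py and set **model_path** to `${save_directory}/*.om`, where `*` is the file name of the OM offline model.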
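infer.py takes the first `.om` file it finds, so a directory holding several converted models is ambiguous. The sketch below makes the choice explicit. `find_om_model` is a hypothetical helper, and it sorts by file name, which is not necessarily the order infer.py uses. The returned path could then be passed as `model_path` to `InferSession`.

```python
from pathlib import Path


def find_om_model(directory, file_name=None):
    """Return the requested .om file, or the first .om file in sorted name order."""
    directory = Path(directory)
    if file_name is not None:
        candidate = directory / file_name
        if not candidate.is_file():
            raise FileNotFoundError(candidate)
        return candidate
    om_files = sorted(directory.glob("*.om"))
    if not om_files:
        raise FileNotFoundError(f"no .om file in {directory}")
    return om_files[0]
```

Passing `file_name` explicitly avoids depending on directory listing order when several om models coexist.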
+
+### Accuracy & Performance Testing
+
+- Set the model paths in config_bge.json to where each model is stored
+
+- Accuracy test script
+
+```shell
+python eval_cmteb.py --model_type_or_path om --device ${device_id}
+```
+
+- Performance test script
+
+```shell
+python eval_performance.py --model_type_or_path om --input_shape [batch_size, seq_len] --device ${device_id}
+```
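Instead of editing config_bge.json by hand, the paths can be updated with a short script. The sketch below assumes the file layout shown in this commit: a `default_path` object with `tokenizer_path`, `pytorch_model_path`, `onnx_model_path` and `om_model_path`. `set_model_paths` is a hypothetical helper, and it rejects keys that are not already present.

```python
import json
from pathlib import Path


def set_model_paths(config_file, **paths):
    """Update entries under "default_path" in config_bge.json; unknown keys are rejected."""
    config_file = Path(config_file)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    unknown = set(paths) - set(config["default_path"])
    if unknown:
        raise KeyError(f"unknown keys: {sorted(unknown)}")
    config["default_path"].update(paths)
    config_file.write_text(json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8")
    return config
```

Example (paths are placeholders): `set_model_paths("config_bge.json", om_model_path="/data/bge/bge.om")`.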
+
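+#### Inference Performance
+
+Performance is measured with the `OM` model on NPU and the `ONNX` model on GPU.
+
+Throughput: 1000 * batch_size / compute_time (compute_time in milliseconds)
+
+| Environment | Chip | batch_size | seq_len | Throughput (fps) |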
+|-----|-------------|------------|---------|----------|
+| NPU | Ascend310P3 | 8          | 100     | 449.22   |
+| NPU | Ascend310P3 | 20         | 512     | 39.40    |
+| NPU | Ascend310P3 | 128        | 512     | 39.63    |
+| GPU | NVIDIA A10  | 8          | 100     | 149.93   |
+| GPU | NVIDIA A10  | 20         | 512     | 48.21    |
+| GPU | NVIDIA A10  | 128        | 512     | 49.38    |
+
+Note: The Atlas 300I Duo inference card has two chips per card, so multiply its throughput by 2 when comparing per card.
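The throughput formula and the per-card adjustment from the note are written out below as a small sketch. It assumes `compute_time` is the per-batch latency in milliseconds, which is what makes the factor of 1000 yield frames per second. The helper names are hypothetical.

```python
def throughput_fps(batch_size, compute_time_ms):
    """Throughput = 1000 * batch_size / compute_time, with compute_time in milliseconds."""
    return 1000.0 * batch_size / compute_time_ms


def per_card_fps(per_chip_fps, chips_per_card=2):
    """Atlas 300I Duo carries two chips per card, so per-card throughput is doubled."""
    return per_chip_fps * chips_per_card


# Example: the Ascend310P3 row with batch_size=20, seq_len=512 (39.40 fps per chip).
print(f"{per_card_fps(39.40):.2f} fps per card")
```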
+
+| Environment | Chip | batch_size | seq_len | Throughput (fps) |
+|-----|-------------|------------|---------|----------|
+| NPU | Ascend910B4 | 8          | 100     | 696.06   |
+| NPU | Ascend910B4 | 20         | 512     | 132.96   |
+| NPU | Ascend910B4 | 128        | 512     | 123.94   |
+| GPU | NVIDIA L20  | 8          | 100     | 384.60   |
+| GPU | NVIDIA L20  | 20         | 512     | 112.80   |
+| GPU | NVIDIA L20  | 128        | 512     | 104.37   |
+
+#### Inference Accuracy
+
+Accuracy is measured with the `OM` model on NPU and the `ONNX` model on GPU.
+
+| Environment | Chip | ndcg@10 (%) |
+|-----|-------------|--------|
+| NPU | Ascend310P3 | 83.66  |
+| GPU | Nvidia A10  | 83.67  |
+
+| Environment | Chip | ndcg@10 (%) |
+|-----|-------------|--------|
+| NPU | Ascend910B4 | 83.86  |
+| GPU | Nvidia L20  | 83.67  |
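The tables report nDCG@10 as a percentage. For reference, a minimal sketch of nDCG@k with linear gain follows. The exact variant used by eval_cmteb.py, including tie handling, is not shown in this README. Treat this as an illustration of the metric, not a reimplementation of the script.

```python
import math


def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k with linear gain: DCG = sum(rel_i / log2(i + 1)) over ranks i = 1..k."""
    dcg = sum(relevance.get(doc_id, 0) / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked_ids[:k], start=1))
    # Ideal DCG: the same formula applied to the best possible ordering.
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query with one relevant passage "d1", retrieved at rank 2.
print(f"{100 * ndcg_at_k(['d2', 'd1', 'd3'], {'d1': 1}):.2f}%")
```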
+
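+### Ascend310P3 Performance Notes
+
+On Ascend 310P3, the following steps let the operators reach better performance:
+
+1. Enable VectorCore for SoftmaxV2: find the SoftmaxV2 entry in the JSON file at the following path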
+
+```
+/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/config/ascend310p/aic-ascend310p-ops-info.json
+```
+
+and add the following to enable VectorCore:
+
+```json
+"enableVectorCore":{
+        "flag":"true"
+}
+```
+
+2. In the following directory, rename the existing softmax_v2 to a different name; otherwise the setting does not take effect
+
+```shell
+ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/kernel/ascend310p
+```
+
+3. Re-run the ATC conversion, then run the performance test again
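Step 1 can be scripted. The sketch below assumes the ops-info file is a JSON object keyed by operator type, with `SoftmaxV2` as one top-level key. Check this against the real file before using it. `enable_vector_core` is a hypothetical helper. It saves a `.bak` copy first, because rewriting the file with `json.dumps` also changes its whitespace layout, and writing under the toolkit directory may need elevated permissions.

```python
import json
import shutil
from pathlib import Path


def enable_vector_core(ops_info_file, op_type="SoftmaxV2"):
    """Add "enableVectorCore": {"flag": "true"} to one operator entry, keeping a backup."""
    ops_info_file = Path(ops_info_file)
    backup = ops_info_file.with_name(ops_info_file.name + ".bak")
    if not backup.exists():
        shutil.copy2(ops_info_file, backup)  # keep the untouched original once
    ops_info = json.loads(ops_info_file.read_text(encoding="utf-8"))
    if op_type not in ops_info:
        raise KeyError(f"{op_type} not found in {ops_info_file}")
    ops_info[op_type]["enableVectorCore"] = {"flag": "true"}
    ops_info_file.write_text(json.dumps(ops_info, indent=4), encoding="utf-8")
    return backup
```

Steps 2 and 3 (renaming the existing `softmax_v2` kernel and re-running ATC) are still done by hand as described above.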
+
+------------
+
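+## Acceleration Library Version
+
+### Inference Scripts
+
+- The inference script that uses the FA acceleration library is `${cur_dir}/main.py`
+
+1. Replace the code in **modeling_bert.py** of the installed transformers package with the code from **modeling_bert_ascend.py**
+
+The file is located at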
+
+```shell
+/miniconda/envs/${conda_name}/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py
+```
+
+2. On an Ascend machine, **run** `python main.py`
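The path above hard-codes the conda environment name and the Python version. The sketch below locates the installed file through `importlib` instead and keeps a backup before overwriting it. `locate_module_file` and `replace_with_backup` are hypothetical helpers. Note that `importlib.util.find_spec` on a submodule imports its parent package (here `transformers`).

```python
import importlib.util
import shutil
from pathlib import Path


def locate_module_file(module_name):
    """Return the source file of an importable module, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. transformers) is not installed.
        return None
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def replace_with_backup(target, replacement):
    """Copy `replacement` over `target`, saving the original as <target>.bak once."""
    target = Path(target)
    backup = target.with_name(target.name + ".bak")
    if not backup.exists():
        shutil.copy2(target, backup)
    shutil.copyfile(replacement, target)
    return backup
```

Typical use: `replace_with_backup(locate_module_file("transformers.models.bert.modeling_bert"), "modeling_bert_ascend.py")`. First check that the lookup did not return `None`.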
+
+### Accuracy & Performance Testing
+
+- Set the model paths in config_bge.json to where each model is stored
+
+- Accuracy test script
+
+```shell
+python eval_cmteb.py --model_type_or_path pytorch --device ${device_id}
+```
+
+- Performance test script
+
+```shell
+python eval_performance.py --model_type_or_path pytorch --input_shape [batch_size, seq_len] --device ${device_id}
+```
+
+#### Inference Performance
+
+Performance is measured with the `PYTORCH` model on both NPU and GPU.
+
+Throughput: 1000 * batch_size / compute_time (compute_time in milliseconds)
+
+| Environment | Chip | batch_size | seq_len | Throughput (fps) |
+|-----|-------------|------------|---------|----------|
+| NPU | Ascend910B4 | 8          | 100     | 486.66   |
+| NPU | Ascend910B4 | 20         | 512     | 1100.48   |
+| NPU | Ascend910B4 | 128        | 512     | 4885.53   |
+| GPU | NVIDIA L40  | 8          | 100     | 453.42   |
+| GPU | NVIDIA L40  | 20         | 512     | 575.13   |
+| GPU | NVIDIA L40  | 128        | 512     | 2104.04   |
+
+#### Inference Accuracy
+
+Accuracy is measured with the `PYTORCH` model on both NPU and GPU.
+
+| Environment | Chip | ndcg@10 (%) |
+|-----|-------------       |--------|
+| NPU | Ascend910B4 (fp16) | 83.67  |
+| GPU | Nvidia L40 (fp32)  | 83.67  |
+
+- Ascend310P3: not yet tested
--- a/mindie/examples/models/bge/large-zh-v1.5/bin2onnx.py
+++ b/mindie/examples/models/bge/large-zh-v1.5/bin2onnx.py
@ -0,0 +1,28 @@
+# Copyright 2024 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the License);
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+from optimum.onnxruntime import ORTModelForFeatureExtraction
+
+parser = argparse.ArgumentParser(description="Export a model from transformers to ONNX format.")
+parser.add_argument("--model_path", type=str, required=True, help="Path to the model checkpoint to convert.")
+
+args = parser.parse_args()
+
+model_checkpoint = args.model_path
+
+ort_model = ORTModelForFeatureExtraction.from_pretrained(model_checkpoint, export=True, from_transformers=True)
+
+# Save the ONNX model
+ort_model.save_pretrained(model_checkpoint)
--- a/mindie/examples/models/bge/large-zh-v1.5/config_bge.json
+++ b/mindie/examples/models/bge/large-zh-v1.5/config_bge.json
@ -0,0 +1,8 @@
+{
+    "default_path": {
+        "tokenizer_path": "./bge-large-zh-v1.5",
+        "pytorch_model_path": "./bge-large-zh-v1.5",
+        "onnx_model_path": "./bge-large-zh-v1.5",
+        "om_model_path": "./bge-large-zh-v1.5/bge_liunx_aarch.om"
+    }
+}
--- a/mindie/examples/models/bge/large-zh-v1.5/configuration_bert.py
+++ b/mindie/examples/models/bge/large-zh-v1.5/configuration_bert.py
@ -0,0 +1,129 @@
+# coding=utf-8
+# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
+# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+""" BERT model configuration"""
+from transformers.configuration_utils import PretrainedConfig
+from transformers.utils import logging
+
+logger = logging.get_logger(__name__)
+
+
+class BertConfig(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`BertModel`] or a [`TFBertModel`]. It is used to
+    instantiate a BERT model according to the specified arguments, defining the model architecture. Instantiating a
+    configuration with the defaults will yield a similar configuration to that of the BERT
+    [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) architecture.
+
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+
+
+    Args:
+        vocab_size (`int`, *optional*, defaults to 30522):
+            Vocabulary size of the BERT model. Defines the number of different tokens that can be represented by the
+            `input_ids` passed when calling [`BertModel`] or [`TFBertModel`].
+        hidden_size (`int`, *optional*, defaults to 768):
+            Dimensionality of the encoder layers and the pooler layer.
+        num_hidden_layers (`int`, *optional*, defaults to 12):
+            Number of hidden layers in the Transformer encoder.
+        num_attention_heads (`int`, *optional*, defaults to 12):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        intermediate_size (`int`, *optional*, defaults to 3072):
+            Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
+        hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
+            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+            `"relu"`, `"silu"` and `"gelu_new"` are supported.
+        hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+        attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
+            The dropout ratio for the attention probabilities.
+        max_position_embeddings (`int`, *optional*, defaults to 512):
+            The maximum sequence length that this model might ever be used with. Typically set this to something large
+            just in case (e.g., 512 or 1024 or 2048).
+        type_vocab_size (`int`, *optional*, defaults to 2):
+            The vocabulary size of the `token_type_ids` passed when calling [`BertModel`] or [`TFBertModel`].
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+            The epsilon used by the layer normalization layers.
+        position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
+            Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For
+            positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
+            [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
+            For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
+            with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+        is_decoder (`bool`, *optional*, defaults to `False`):
+            Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        classifier_dropout (`float`, *optional*):
+            The dropout ratio for the classification head.
+
+    Examples:
+
+    ```python
+    >>> from transformers import BertConfig, BertModel
+
+    >>> # Initializing a BERT google-bert/bert-base-uncased style configuration
+    >>> configuration = BertConfig()
+
+    >>> # Initializing a model (with random weights) from the google-bert/bert-base-uncased style configuration
+    >>> model = BertModel(configuration)
+
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```"""
+
+    model_type = "bert"
+
+    def __init__(
+            self,
+            vocab_size=30522,
+            hidden_size=768,
+            num_hidden_layers=12,
+            num_attention_heads=12,
+            intermediate_size=3072,
+            hidden_act="gelu",
+            hidden_dropout_prob=0.1,
+            attention_probs_dropout_prob=0.1,
+            max_position_embeddings=512,
+            type_vocab_size=2,
+            initializer_range=0.02,
+            layer_norm_eps=1e-12,
+            pad_token_id=0,
+            position_embedding_type="absolute",
+            use_cache=True,
+            classifier_dropout=None,
+            **kwargs,
+    ):
+        super().__init__(pad_token_id=pad_token_id, **kwargs)
+
+        self.vocab_size = vocab_size
+        self.hidden_size = hidden_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.hidden_act = hidden_act
+        self.intermediate_size = intermediate_size
+        self.hidden_dropout_prob = hidden_dropout_prob
+        self.attention_probs_dropout_prob = attention_probs_dropout_prob
+        self.max_position_embeddings = max_position_embeddings
+        self.type_vocab_size = type_vocab_size
+        self.initializer_range = initializer_range
+        self.layer_norm_eps = layer_norm_eps
+        self.position_embedding_type = position_embedding_type
+        self.use_cache = use_cache
+        self.classifier_dropout = classifier_dropout
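The `relative_key` and `relative_key_query` options in the docstring above replace absolute position ids with learned embeddings of the distance between a query position and a key position. The sketch below assumes the Hugging Face BERT convention, where the distance-embedding table has `2 * max_position_embeddings - 1` rows, and shows how the row index for each (query, key) pair could be computed. `relative_distance_indices` is a hypothetical helper for illustration and is not part of `transformers`.

```python
from typing import List


def relative_distance_indices(query_length: int, key_length: int,
                              max_position_embeddings: int) -> List[List[int]]:
    """Hypothetical helper: row of the distance-embedding table used for every
    (query position, key position) pair under relative_key attention."""
    indices = []
    for q_pos in range(query_length):
        row = []
        for k_pos in range(key_length):
            distance = q_pos - k_pos  # negative when the key lies to the right of the query
            # Shift so the most negative allowed distance, -(max_position_embeddings - 1), maps to row 0
            row.append(distance + max_position_embeddings - 1)
        indices.append(row)
    return indices


if __name__ == "__main__":
    for row in relative_distance_indices(3, 3, max_position_embeddings=4):
        print(row)
    # [3, 2, 1]
    # [4, 3, 2]
    # [5, 4, 3]
```

As long as neither length exceeds `max_position_embeddings`, every index stays within `[0, 2 * max_position_embeddings - 2]`. This is why the table needs that many rows.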
--- a/mindie/examples/models/bge/large-zh-v1.5/convert.sh
+++ b/mindie/examples/models/bge/large-zh-v1.5/convert.sh
@ -0,0 +1,42 @@
+#!/bin/bash
+
+# Usage: bash convert.sh <model_checkpoint_dir>
+# The ONNX and OM files are written into the checkpoint directory itself.
+if [ -z "$1" ]; then
+    echo "Usage: $0 <model_checkpoint_dir>"
+    exit 1
+fi
+model_checkpoint="$1"
+save_directory="$model_checkpoint"
+# Query the SoC name through torch_npu; it is passed to ATC as --soc_version
+soc_version=$(python -c "import torch;import torch_npu;print(torch.npu.get_device_name())")
+
+precision_mode=allow_mix_precision
+
+# Refuse to overwrite an existing model.onnx in the model directory
+if [ -f "$save_directory/model.onnx" ]; then
+    echo "Error: $save_directory/model.onnx already exists"
+    exit 1
+fi
+
+# Load the checkpoint and export it to ONNX with Optimum.
+# The paths are passed as arguments instead of being spliced into the Python source.
+python -c "
+import sys
+from optimum.onnxruntime import ORTModelForFeatureExtraction
+
+ort_model = ORTModelForFeatureExtraction.from_pretrained(sys.argv[1], export=True, from_transformers=True)
+ort_model.save_pretrained(sys.argv[2])
+" "$model_checkpoint" "$save_directory"
+
+# Check that the ONNX model was saved
+if [ -f "$save_directory/model.onnx" ]; then
+    echo "ONNX model successfully saved at $save_directory/model.onnx"
+else
+    echo "Error: Failed to save ONNX model."
+    exit 1
+fi
+
+# Convert the ONNX model (--framework=5) into an OM model with ATC.
+# -1 marks the batch and sequence dimensions as dynamic.
+atc --model="$save_directory/model.onnx" --framework=5 --output="$save_directory/bge_$soc_version" --soc_version="$soc_version" --input_shape="input_ids:-1,-1;attention_mask:-1,-1;token_type_ids:-1,-1" --precision_mode="$precision_mode"
+
+# Check whether the ATC conversion succeeded
+if [ $? -eq 0 ]; then
+    echo "Model conversion with ATC successful."
+else
+    echo "Error: Failed to convert model with ATC."
+    exit 1
+fi
--- a/mindie/examples/models/bge/large-zh-v1.5/demo.py
+++ b/mindie/examples/models/bge/large-zh-v1.5/demo.py
@ -0,0 +1,85 @@
+# Copyright 2023 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the License);
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import time
+import logging
+import torch
+
+try:
+    import torch_npu
+
+    device = "npu:0"
+    torch_npu.npu.set_device(0)
+    torch.npu.set_compile_mode(jit_compile=False)
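+except ImportError:
+    # torch_npu is not installed: fall back to CUDA if available, otherwise CPU
+    device = "cuda:0" if torch.cuda.is_available() else "cpu"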
+from transformers import AutoTokenizer, AutoModel
+
+logging.getLogger().setLevel(logging.INFO)
+
+# Sentences we want sentence embeddings for
+sentences = ["样例数据-1", "样例数据-2"]
+MODEL_PATH = "./"
+# Load the tokenizer and model from the local directory MODEL_PATH
+tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+model = AutoModel.from_pretrained(MODEL_PATH).to(device)
+model.eval()
+
+
+def infer(text):
+    # Tokenize sentences
+    encoded_input = tokenizer(text, padding=True, truncation=True, return_tensors='pt', max_length=512)
+    encoded_input = encoded_input.to(device)
+    logging.info(encoded_input.input_ids.shape)
+
+    # Compute token embeddings
+    with torch.no_grad():
+        model_output = model(**encoded_input)
+        # Perform pooling. In this case, cls pooling.
+        sentence_embeddings = model_output[0][:, 0]
+    # Normalize embeddings to unit L2 norm
+    sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
+    logging.info("Sentence embeddings: %s", sentence_embeddings)
+    logging.info("Sentence embeddings.shape: %s", sentence_embeddings.shape)
+    return sentence_embeddings
+
+
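+def synchronize():
+    """Block until queued device work finishes so that wall-clock timings are accurate."""
+    if device.startswith("npu"):
+        torch.npu.synchronize()
+    elif device.startswith("cuda"):
+        torch.cuda.synchronize()
+
+
+def infer_test(text):
+    # Tokenize sentences, padding every input to the full 512 tokens for benchmarking
+    encoded_input = tokenizer(text, padding="max_length", truncation=True, return_tensors='pt', max_length=512)
+    encoded_input = encoded_input.to(device)
+    logging.info(encoded_input.input_ids.shape)
+
+    # Compute token embeddings; synchronize around the forward pass because
+    # device kernels run asynchronously
+    with torch.no_grad():
+        synchronize()
+        start_time = time.time()
+        model_output = model(**encoded_input)
+        synchronize()
+        end_time = time.time()
+        # Perform pooling. In this case, cls pooling.
+        sentence_embeddings = model_output[0][:, 0]
+    # Normalize embeddings to unit L2 norm
+    sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
+    time_cost = end_time - start_time
+    logging.info("Sentence embeddings: %s", sentence_embeddings)
+    logging.info("Sentence embeddings.shape: %s", sentence_embeddings.shape)
+    logging.info("generate cost %g ms", time_cost * 1000)
+    return sentence_embeddings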
+
+
+if __name__ == '__main__':
+    try:
+        # The first run acts as a warm-up; the second gives a more representative timing
+        infer_test(sentences)
+        infer_test(sentences)
+    except Exception as e:
+        logging.error("An error occurred during inference: %s", e)
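demo.py, along with the ONNX and OM paths in eval_cmteb.py, turns the encoder output into a sentence vector using CLS pooling. It takes the hidden state at position 0 and then applies L2 normalization, so the dot product of two embeddings equals their cosine similarity. The dependency-free sketch below mirrors that post-processing on nested lists. It assumes a `last_hidden_state` laid out as batch x seq_len x hidden. `cls_pool_and_normalize` and `dot` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def cls_pool_and_normalize(last_hidden_state: List[List[Vector]], eps: float = 1e-12) -> List[Vector]:
    """Hypothetical helper: take the first ([CLS]) token vector of each sequence
    and scale it to unit L2 norm."""
    embeddings = []
    for sequence in last_hidden_state:           # sequence: seq_len x hidden
        cls_vector = sequence[0]                  # CLS pooling: position 0
        norm = math.sqrt(sum(x * x for x in cls_vector))
        norm = max(norm, eps)                     # same clamp as torch.nn.functional.normalize
        embeddings.append([x / norm for x in cls_vector])
    return embeddings


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    batch = [
        [[3.0, 4.0, 0.0], [9.0, 9.0, 9.0]],  # sentence 1: CLS vector + one token
        [[0.0, 0.0, 2.0], [1.0, 1.0, 1.0]],  # sentence 2
    ]
    emb = cls_pool_and_normalize(batch)
    print(emb)                  # [[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
    print(dot(emb[0], emb[1]))  # 0.0, i.e. the cosine similarity
```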
--- a/mindie/examples/models/bge/large-zh-v1.5/eval_cmteb.py
+++ b/mindie/examples/models/bge/large-zh-v1.5/eval_cmteb.py
@ -0,0 +1,304 @@
+# Copyright 2024 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the License);
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import logging
+import os
+from typing import List, Any, Union
+from collections import defaultdict
+import json
+import numpy as np
+import torch
+import transformers.tokenization_utils_base
+from mteb import MTEB, AbsTaskRetrieval
+from datasets import load_dataset, DatasetDict
+from optimum.onnxruntime import ORTModelForFeatureExtraction
+from transformers import AutoTokenizer, AutoModel
+from tqdm import tqdm as progressbar
+
+from atb_llm.utils.file_utils import safe_open
+
+logging.getLogger().setLevel(logging.INFO)
+
+
+def get_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description='Evaluate the BGE embedding model on T2Retrieval (C-MTEB).')
+    parser.add_argument(
+        '--model_type_or_path',
+        type=str,
+        required=True,
+        help="Model type ('pytorch', 'onnx' or 'om') to load the default model, or a path to the model file or directory."
+    )
+    parser.add_argument(
+        '--batch_size',
+        type=int,
+        default=20,
+        help='Batch size of dataset for computing.'
+    )
+    parser.add_argument(
+        '--device',
+        type=int,
+        default=0,
+        choices=list(range(8)),
+        help='ID of the device to run the model on.'
+    )
+    return parser.parse_args()
+
+
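+def load_retrieval_data(hf_hub_name, eval_splits):
+    # The T2Retrieval data is read from local parquet copies under ./dataset;
+    # hf_hub_name is currently unused.
+    eval_split = eval_splits[0]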
+    dataset = load_dataset("parquet", data_files={'corpus': 'dataset/corpus-00000-of-00001-8afe7b7a7eca49e3.parquet',
+                                                  'queries': 'dataset/queries-00000-of-00001-930bf3b805a80dd9.parquet'})
+    qrels = load_dataset("parquet", data_files={eval_split: 'dataset/dev-00000-of-00001-92ed0416056ff7e1.parquet'})[
+        eval_split]
+
+    corpus = {e['id']: {'text': e['text']} for e in dataset['corpus']}
+    queries = {e['id']: e['text'] for e in dataset['queries']}
+    relevant_docs = defaultdict(dict)
+    for e in qrels:
+        relevant_docs[e['qid']][e['pid']] = e['score']
+
+    corpus = DatasetDict({eval_split: corpus})
+    queries = DatasetDict({eval_split: queries})
+    relevant_docs = DatasetDict({eval_split: relevant_docs})
+    return corpus, queries, relevant_docs
+
+
+class T2RetrievalLocal(AbsTaskRetrieval):
+    def __init__(self, **kwargs: Any):
+        super().__init__(**kwargs)
+        self.data_loaded = None
+        self.corpus = None
+        self.queries = None
+        self.relevant_docs = None
+
+    @property
+    def description(self) -> dict:
+        return {
+            'name': 'T2RetrievalLocal',
+            'description': 'T2Ranking: A large-scale Chinese Benchmark for Passage Ranking',
+            'hf_hub_name': 'C-MTEB/T2Retrieval',
+            'reference': "https://arxiv.org/abs/2304.03679",
+            'type': 'Retrieval',
+            'category': 's2p',
+            'eval_splits': ['test'],
+            'eval_langs': ['zh'],
+            'main_score': 'ndcg_at_10',
+        }
+
+    def load_data(self, **kwargs) -> None:
+        if self.data_loaded:
+            return
+        try:
+            self.corpus, self.queries, self.relevant_docs = load_retrieval_data(self.description['hf_hub_name'],
+                                                                                self.description['eval_splits'])
+        except KeyError as e:
+            raise RuntimeError('load dataset failed because {}'.format(e)) from e
+        else:
+            self.data_loaded = True
+
+
+class Model:
+    def __init__(self, tokenizer_path: str, batch_size: int) -> None:
+        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
+        self.batch_size = batch_size
+
+    def encode(self, sentences: List[str], **kwargs: Any) -> torch.Tensor:
+        """Returns the embeddings of the given sentences; implemented by subclasses.
+
+        Args:
+            sentences (`List[str]`): List of sentences to encode
+
+        Returns:
+            `torch.Tensor`: Tensor of embeddings for the given sentences
+        """
+        raise NotImplementedError
+
+    def _tokenize_sentences(self, sentences: List[str]) -> transformers.tokenization_utils_base.BatchEncoding:
+        return self.tokenizer(
+            sentences,
+            padding='max_length',
+            truncation=True,
+            return_tensors='pt',
+            max_length=512
+        )
+
+
+class PyTorchModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int):
+        super(PyTorchModel, self).__init__(tokenizer_path, batch_size)
+
+        # init model runtime
+        try:
+            import torch_npu
+        except ImportError:
+            self.device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
+        else:
+            self.device = 'npu:{}'.format(device_id)
+            torch_npu.npu.set_device(device_id)
+            torch.npu.set_compile_mode(jit_compile=False)
+
+        self.model = AutoModel.from_pretrained(
+            model_path,
+            local_files_only=True,
+            trust_remote_code=True
+        ).half().to(self.device)
+        self.model.eval()
+
+    def encode(self, sentences: List[str], **kwargs: Any) -> Union[np.ndarray, torch.Tensor]:
+        all_embs = []
+
+        for start_index in progressbar(range(0, len(sentences), self.batch_size)):
+            sentences_batch = sentences[start_index:start_index + self.batch_size]
+            # Tokenize sentences
+            encoded_inputs = self._tokenize_sentences(sentences_batch)
+            # Compute token embeddings
+            with torch.no_grad():
+                # model(...)[0] is last_hidden_state: (batch, seq_len, hidden)
+                last_hidden_state = self.model(**encoded_inputs.to(self.device))[0].float()
+                # Perform pooling. In this case, cls pooling.
+                sentence_embeddings = last_hidden_state[:, 0]
+                sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
+            all_embs.extend(sentence_embeddings.cpu())
+
+        # Stack the per-sentence vectors into a (num_sentences, hidden) tensor
+        return torch.stack(all_embs) if all_embs else torch.Tensor()
+
+
+class ONNXModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int):
+        super(ONNXModel, self).__init__(tokenizer_path, batch_size)
+
+        # init model runtime
+        try:
+            import torch_npu
+        except ImportError:
+            self.device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
+        else:
+            self.device = 'npu:{}'.format(device_id)
+            torch_npu.npu.set_device(device_id)
+            torch.npu.set_compile_mode(jit_compile=False)
+
+        self.ort = ORTModelForFeatureExtraction.from_pretrained(model_path).to(self.device)
+
+    def encode(self, sentences: List[str], **kwargs: Any) -> Union[np.ndarray, torch.Tensor]:
+        all_embs = []
+        for start_index in progressbar(range(0, len(sentences), self.batch_size)):
+            sentences_batch = sentences[start_index:start_index + self.batch_size]
+            # Tokenize sentences
+            encoded_inputs = self._tokenize_sentences(sentences_batch)
+            # Compute token embeddings
+            encoded_input = encoded_inputs.to(self.device)
+            with torch.no_grad():
+                model_output = self.ort(**encoded_input)
+                # Perform pooling. In this case, cls pooling.
+                sentence_embeddings = model_output[0][:, 0]
+            embs = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
+            all_embs.extend(embs.cpu())
+
+        # Stack the per-sentence vectors into a (num_sentences, hidden) tensor
+        return torch.stack(all_embs) if all_embs else torch.Tensor()
+
+
+class OMModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int) -> None:
+        super(OMModel, self).__init__(tokenizer_path, batch_size)
+
+        # init model runtime
+        from ais_bench.infer.interface import InferSession
+
+        self.session = InferSession(device_id, model_path)
+
+    def encode(self, sentences: List[str], **kwargs: Any) -> Union[np.ndarray, torch.Tensor]:
+        all_embs = []
+
+        for start_index in progressbar(range(0, len(sentences), self.batch_size)):
+            sentences_batch = sentences[start_index:start_index + self.batch_size]
+            # Tokenize sentences
+            encoded_inputs = self._tokenize_sentences(sentences_batch)
+            # Pass host-side NumPy arrays to InferSession, as infer.py does
+            input_ids = encoded_inputs['input_ids'].numpy()
+            attention_mask = encoded_inputs['attention_mask'].numpy()
+            token_type_ids = encoded_inputs['token_type_ids'].numpy()
+            # Compute token embeddings; [0][:, 0] takes the CLS vector of the first output
+            outputs = self.session.infer(feeds=[input_ids, attention_mask, token_type_ids], mode='dymshape',
+                                         custom_sizes=10000000)[0][:, 0]
+            outputs = torch.from_numpy(outputs)
+            embs = torch.nn.functional.normalize(outputs, p=2, dim=1)
+            all_embs.extend(embs)
+
+        # Stack the per-sentence vectors into a (num_sentences, hidden) tensor
+        return torch.stack(all_embs) if all_embs else torch.Tensor()
+
+
+def load_model(model_args: argparse.Namespace) -> Model:
+    # default model path
+    with safe_open('config_bge.json', 'r', encoding='utf-8') as reader:
+        text = reader.read()
+    default_path = json.loads(text)['default_path']
+    pytorch_model_path = tokenizer_path = os.path.abspath(default_path['tokenizer_path'])
+    onnx_model_path = os.path.abspath(default_path['onnx_model_path'])
+    om_model_path = os.path.abspath(default_path['om_model_path'])
+
+    model_path_map = {'pytorch': pytorch_model_path, 'onnx': onnx_model_path, 'om': om_model_path}
+    model_map = {'pytorch': PyTorchModel, 'onnx': ONNXModel, 'om': OMModel}
+
+    model_type = model_args.model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]
+    default_model_path = model_path_map.get(model_type, 'not exist')
+    if default_model_path != 'not exist':
+        model_path = (
+            model_args.model_type_or_path
+            if os.path.isdir(model_args.model_type_or_path) or os.path.isfile(model_args.model_type_or_path)
+            else default_model_path
+        )
+    else:
+        raise RuntimeError(
+            'load model failed because '
+            '\'{}\' is not a valid model type or path'.format(model_args.model_type_or_path)
+        )
+    try:
+        model_for_eval = model_map[model_type](
+            tokenizer_path=tokenizer_path,
+            model_path=model_path,
+            batch_size=model_args.batch_size,
+            device_id=model_args.device
+        )
+    except KeyError as e:
+        raise RuntimeError('load {} model failed because {}'.format(model_type, e)) from e
+    return model_for_eval
+
+
+if __name__ == '__main__':
+    args = get_args()
+    model = load_model(args)
+    task = ['T2RetrievalLocal']
+    evaluation = MTEB(tasks=task, task_langs=['zh'])
+    results = evaluation.run(model)
+    logging.info(results)
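Both evaluation scripts derive the model type from `--model_type_or_path` using the expression `removesuffix('/').split('.')[-1].split('/')[-1]`. As a result, the argument can be a bare type name, a file whose extension is the type, or a directory whose last component is the type. The sketch below copies that expression into a standalone function so you can see which inputs map to which type. The function name and all example paths are hypothetical. Note that `str.removesuffix` requires Python 3.9 or later.

```python
def parse_model_type(model_type_or_path: str) -> str:
    """Hypothetical standalone copy of the model-type expression in load_model()."""
    # Drop one trailing '/', keep the text after the last '.', then the text after the last '/'
    return model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]


if __name__ == "__main__":
    # All paths below are made-up examples
    for arg in ["pytorch", "/data/bge/model.onnx", "./bge_Ascend310P3.om",
                "/data/models/om/", "/data/bge-large-zh-v1.5/"]:
        print(arg, "->", parse_model_type(arg))
```

The last case is a pitfall. The dot in `v1.5` makes the parsed type `5`, so `load_model()` raises `RuntimeError`. In that situation, pass the bare type name (`pytorch`, `onnx` or `om`) and set the path in `config_bge.json` instead.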
--- a/mindie/examples/models/bge/large-zh-v1.5/eval_performance.py
+++ b/mindie/examples/models/bge/large-zh-v1.5/eval_performance.py
@ -0,0 +1,302 @@
+# Copyright 2024 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the License);
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import json
+import logging
+import os
+import time
+from typing import Any, List, Union, Tuple
+
+import datasets
+import numpy as np
+import torch
+import transformers.tokenization_utils_base
+from transformers import AutoTokenizer, AutoModel
+from optimum.onnxruntime import ORTModelForFeatureExtraction
+from tqdm import tqdm as progressbar
+
+from atb_llm.utils.file_utils import safe_open
+
+logging.getLogger().setLevel(logging.INFO)
+
+
+def get_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description='Benchmark the latency and throughput of the BGE embedding model.')
+    parser.add_argument(
+        '--model_type_or_path',
+        type=str,
+        required=True,
+        help="Model type ('pytorch', 'onnx' or 'om') to load the default model, or a path to the model file or directory."
+    )
+    parser.add_argument(
+        '--input_shape',
+        type=str,
+        required=True,
+        help='Input shape as "batch_size,seq_len", e.g. "20,512".'
+    )
+    parser.add_argument(
+        '--device',
+        type=int,
+        default=4,
+        choices=list(range(8)),
+        help='ID of the device to run the model on.'
+    )
+    parser.add_argument(
+        '--loop',
+        type=int,
+        default=50,
+        help='Number of timed inference runs.'
+    )
+    return parser.parse_args()
+
+
+class Model:
+    def __init__(self, tokenizer_path: str) -> None:
+        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
+
+    def init_runtime(self, device_id: int) -> Tuple[Union[str, int], Any]:
+        if self.__class__.__name__.startswith(('PyTorch', 'ONNX')):
+            try:
+                import torch_npu
+            except ImportError:
+                device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
+            else:
+                device = 'npu:{}'.format(device_id)
+                torch_npu.npu.set_device(device_id)
+                torch.npu.set_compile_mode(jit_compile=False)
+            return device, 0
+        elif self.__class__.__name__.startswith('OMModel'):
+            from ais_bench.infer.interface import InferSession
+            return device_id, InferSession
+        else:
+            raise RuntimeError('unsupported model class {}'.format(self.__class__.__name__))
+
+    def tokenize(
+            self,
+            sentences_batch: List[List[str]],
+            seq_len: int
+    ) -> transformers.tokenization_utils_base.BatchEncoding:
+        encoded_inputs = self.tokenizer(
+            sentences_batch,
+            padding='max_length',
+            truncation=True,
+            return_tensors='pt',
+            max_length=seq_len  # sequence length taken from --input_shape
+        ).to(self.device)
+        return encoded_inputs
+
+    def encode(self, pairs: List[List[str]], seq_len: int) -> float:
+        # Tokenize sentences
+        encoded_inputs = self.tokenize(pairs, seq_len)
+        # Compute token embedding time
+        computing_time = self._encode_batched(encoded_inputs)
+
+        return computing_time
+
+    def compute_scores(self, pairs: List[List[str]], batch_size: int, seq_len: int, loop: int) -> dict:
+        all_computing_time = []
+
+        for _ in progressbar(range(loop), 'Evaluating...'):
+            computing_time = self.encode(pairs, seq_len)
+            all_computing_time.append(computing_time)
+
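+        # np.mean of an empty list returns nan instead of raising, so check explicitly
+        if not all_computing_time or np.mean(all_computing_time) <= 0:
+            raise RuntimeError('no valid evaluation results')
+        # Latencies are in ms, so this is sentences per second
+        throughput = 1000 * batch_size / np.mean(all_computing_time)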
+
+        scores = {
+            'compute_time': {
+                'min': np.min(all_computing_time),
+                'max': np.max(all_computing_time),
+                'mean': np.mean(all_computing_time),
+                'median': np.median(all_computing_time),
+                'percentile(99%)': np.percentile(all_computing_time, 99)
+            },
+            'throughput': throughput
+        }
+
+        return scores
+
+    def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
+        """Runs one forward pass on the tokenized batch and returns its duration.
+
+        Args:
+            inputs (`BatchEncoding`): Tokenized batch of sentences
+
+        Returns:
+            `float`: Computing time of the embeddings in milliseconds
+        """
+        raise NotImplementedError
+
+
+class PyTorchModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, device_id: int):
+        super(PyTorchModel, self).__init__(tokenizer_path)
+        self.device, _ = self.init_runtime(device_id)
+        self.model = AutoModel.from_pretrained(model_path).half().to(self.device)
+        self.model.eval()
+
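+    def _synchronize(self) -> None:
+        # Kernels run asynchronously on NPU/GPU; wait for them before reading the clock
+        if str(self.device).startswith('npu'):
+            torch.npu.synchronize()
+        elif str(self.device).startswith('cuda'):
+            torch.cuda.synchronize()
+
+    def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
+        self._synchronize()
+        tick = time.time()
+        with torch.no_grad():
+            model_output = self.model(**inputs)
+            _ = model_output[0][:, 0]
+        self._synchronize()
+        tock = time.time()
+        return 1000 * (tock - tick)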
+
+
+class ONNXModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, device_id: int):
+        super(ONNXModel, self).__init__(tokenizer_path)
+        self.device, _ = self.init_runtime(device_id)
+        self.ort = ORTModelForFeatureExtraction.from_pretrained(model_path).to(self.device)
+
+    def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
+        tick = time.time()
+        with torch.no_grad():
+            _ = self.ort(**inputs)
+            # Pooling is skipped; only the forward pass is timed
+        tock = time.time()
+        return 1000 * (tock - tick)
+
+
+class OMModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, device_id: int) -> None:
+        super(OMModel, self).__init__(tokenizer_path)
+        device_id, infer_session = self.init_runtime(device_id)
+        # Tokenized inputs stay on the host; the OM session takes them as NumPy arrays
+        self.device = 'cpu'
+        # loop=4 makes each infer() call run the model 4 times, hence the division by 4 below
+        self.session = infer_session(device_id, model_path, loop=4)
+
+    def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
+        input_ids = inputs['input_ids'].numpy()
+        attention_mask = inputs['attention_mask'].numpy()
+        token_type_ids = inputs['token_type_ids'].numpy()
+        tick = time.time()
+        _ = self.session.infer(feeds=[input_ids, attention_mask, token_type_ids],
+                               mode='dymshape', custom_sizes=5000000)[0][:, 0]
+        tock = time.time()
+
+        # Average over the 4 runs performed by the session
+        return 1000 * (tock - tick) / 4
+
+
+class PerformanceEvaluator:
+    def __init__(self, metadata: dict) -> None:
+        self.metadata = metadata
+        self.dataset = datasets.load_dataset("parquet", data_files={
+            'corpus': 'dataset/corpus-00000-of-00001-8afe7b7a7eca49e3.parquet',
+            'queries': 'dataset/queries-00000-of-00001-930bf3b805a80dd9.parquet'})
+
+        self.samples = self.dataset[self.metadata['eval_splits'][0]]
+
+    def __call__(
+            self,
+            model: Model,
+            input_shape: Union[Tuple, List],
+            loop: int) -> dict:
+        """Benchmarks the model and returns its latency and throughput scores.
+
+        Args:
+            model (`Model`): the model to evaluate
+            input_shape (`Union[Tuple[int, int], List[int, int]]`): shape of input tensors
+            loop (`int`): evaluation loops
+        """
+        return self.compute_performance(model, input_shape, loop)
+
+    def compute_performance(
+            self,
+            model: Model,
+            input_shape: Union[Tuple, List],
+            loop: int) -> dict:
+        batch_size, seq_len = input_shape
+
+        # Use the first `batch_size` corpus passages as one input batch
+        num_samples = min(batch_size, len(self.samples))
+        pairs = [self.samples[i]['text'] for i in range(num_samples)]
+
+        scores = model.compute_scores(pairs, num_samples, seq_len, loop)
+
+        return scores
+
+
+class Evaluation:
+    def __init__(self, eval_args: argparse.Namespace):
+        self.input_shape = tuple(map(int, eval_args.input_shape.split(',')))
+        self.device_id = eval_args.device
+        self.loop = eval_args.loop
+        # dataset metadata
+        self.metadata = {
+            'name': 'T2RetrievalLocal',
+            'description': 'T2Ranking: A large-scale Chinese Benchmark for Passage Ranking',
+            'reference': 'https://arxiv.org/abs/2304.03679',
+            'type': 'Retrieval',
+            'category': 's2p',
+            'eval_splits': ['corpus'],
+            'eval_langs': ['zh'],
+            'main_score': 'ndcg_at_10'
+        }
+
+        # default model path
+        with safe_open('config_bge.json', 'r', encoding='utf-8') as reader:
+            text = reader.read()
+        default_path = json.loads(text)['default_path']
+        pytorch_model_path = self.tokenizer_path = os.path.abspath(default_path['tokenizer_path'])
+        onnx_model_path = os.path.abspath(default_path['onnx_model_path'])
+        om_model_path = os.path.abspath(default_path['om_model_path'])
+
+        model_path_map = {'pytorch': pytorch_model_path, 'onnx': onnx_model_path, 'om': om_model_path}
+
+        self.model_type = eval_args.model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]
+        default_model_path = model_path_map.get(self.model_type, 'not exist')
+        if default_model_path != 'not exist':
+            self.model_path = (
+                eval_args.model_type_or_path
+                if os.path.isdir(eval_args.model_type_or_path) or os.path.isfile(eval_args.model_type_or_path)
+                else default_model_path
+            )
+        else:
+            raise RuntimeError(
+                'load model failed because '
+                '\'{}\' is not a valid model type or path'.format(eval_args.model_type_or_path)
+            )
+
+    def load_model(self) -> Model:
+        model_map = {'pytorch': PyTorchModel, 'onnx': ONNXModel, 'om': OMModel}
+        try:
+            model = model_map[self.model_type](
+                tokenizer_path=self.tokenizer_path,
+                model_path=self.model_path,
+                device_id=self.device_id
+            )
+        except KeyError as e:
+            raise RuntimeError('load {} model failed because {}'.format(self.model_type, e)) from e
+        return model
+
+    def run(self) -> dict:
+        model = self.load_model()
+        evaluator = PerformanceEvaluator(self.metadata)
+        eval_results = evaluator(model, self.input_shape, self.loop)
+        return eval_results
+
+
+if __name__ == '__main__':
+    args = get_args()
+    evaluation = Evaluation(args)
+    results = evaluation.run()
+    logging.info(results)
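`compute_scores()` in eval_performance.py reduces the per-loop latencies (in milliseconds) to min, max, mean, median and 99th-percentile statistics. It reports throughput as `1000 * batch_size / mean_latency_ms`, i.e. sentences per second. Below is a stdlib-only sketch of the same arithmetic. `summarize` and `percentile_linear` are hypothetical names, and the latency values are made up. The percentile interpolates linearly between the two closest ranks, which is NumPy's default `np.percentile` method.

```python
import math
import statistics
from typing import Dict, List, Union


def percentile_linear(values: List[float], q: float) -> float:
    """q-th percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def summarize(latencies_ms: List[float], batch_size: int) -> Dict[str, Union[float, Dict[str, float]]]:
    """Latency statistics and throughput for one benchmark (hypothetical helper)."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    mean_ms = statistics.fmean(latencies_ms)
    if mean_ms <= 0:
        raise ValueError("mean latency must be positive")
    return {
        "compute_time": {
            "min": min(latencies_ms),
            "max": max(latencies_ms),
            "mean": mean_ms,
            "median": statistics.median(latencies_ms),
            "percentile(99%)": percentile_linear(latencies_ms, 99),
        },
        # batch_size sentences every mean_ms milliseconds -> sentences per second
        "throughput": 1000.0 * batch_size / mean_ms,
    }


if __name__ == "__main__":
    scores = summarize([40.0, 50.0, 60.0], batch_size=20)  # hypothetical latencies
    print(scores["throughput"])  # 400.0
```

The function checks for an empty list explicitly because a mean over no samples cannot be divided into the batch size meaningfully.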
--- a/mindie/examples/models/bge/large-zh-v1.5/infer.py
+++ b/mindie/examples/models/bge/large-zh-v1.5/infer.py
@ -0,0 +1,95 @@
+# Copyright 2023 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the License);
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import argparse
+import logging
+import torch
+from transformers import AutoTokenizer
+from ais_bench.infer.interface import InferSession
+
+parser = argparse.ArgumentParser(description='Infer with a specified .om model file and device id')
+parser.add_argument('--model-path', type=str, required=True,
+                    help='Path to the directory containing the .om model file and the tokenizer files')
+parser.add_argument('--device', type=int, default=0, choices=[0, 1, 2, 3, 4, 5, 6, 7],
+                    help='ID of the device to load the .om model on')
+
+logging.getLogger().setLevel(logging.INFO)
+
+
+class InferEngine:
+
+    def __init__(self, device_id, model_path):
+        self.device_id = device_id
+        self.model_path = model_path
+        # The tokenizer files are expected in the same directory as the .om file
+        self.tokenizer = AutoTokenizer.from_pretrained(os.path.dirname(model_path))
+        # Creating an InferSession loads the .om model onto the NPU with the given device ID
+        self.session = InferSession(device_id=device_id, model_path=model_path)
+
+    def infer(self, text):
+        encoded_input = self.tokenizer(text, padding=True, truncation=True, return_tensors='np', max_length=512)
+        input_ids = encoded_input['input_ids']
+        attention_mask = encoded_input['attention_mask']
+        token_type_ids = encoded_input['token_type_ids']
+        inputs = [input_ids, attention_mask, token_type_ids]
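+        # feeds: list of input arrays; mode: input-shape mode of the model ("static" for fixed
+        # input shapes, "dymshape" for dynamic shapes as used here); custom_sizes: output buffer size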
+        outputs = self.session.infer(feeds=inputs, mode="dymshape", custom_sizes=10000000)[0][:, 0]
+        outputs = torch.from_numpy(outputs)
+        outputs = torch.nn.functional.normalize(outputs, p=2, dim=1)
+
+        logging.info("Sentence embeddings: %s", outputs)
+        logging.info("Sentence embeddings.shape: %s", outputs.shape)
+        return outputs
+
+    def infer_test(self, text):
+        encoded_input = self.tokenizer(text, padding=True, truncation=True, return_tensors='np', max_length=512)
+        input_ids = encoded_input['input_ids']
+        attention_mask = encoded_input['attention_mask']
+        token_type_ids = encoded_input['token_type_ids']
+        inputs = [input_ids, attention_mask, token_type_ids]
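+        # feeds: list of input arrays; mode: input-shape mode of the model ("static" for fixed
+        # input shapes, "dymshape" for dynamic shapes as used here); custom_sizes: output buffer size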
+        outputs = self.session.infer(feeds=inputs, mode="dymshape", custom_sizes=10000000)[0][:, 0]
+        outputs = torch.from_numpy(outputs)
+        outputs = torch.nn.functional.normalize(outputs, p=2, dim=1)
+
+        logging.info("Sentence embeddings: %s", outputs)
+        logging.info("Sentence embeddings.shape: %s", outputs.shape)
+        # exec_time_list holds the (start, end) timestamps of every inference run by this session, in order
+        exec_time = self.session.summary().exec_time_list[-1]
+        time_cost = exec_time[1] - exec_time[0]
+        logging.info("generate cost %g ms", time_cost * 1000)
+        return outputs
+
+    def free(self):
+        self.session.free_resource()
+
+
+if __name__ == '__main__':
+    args = parser.parse_args()
+    device = args.device
+    # Local directory containing the tokenizer files and the .om model
+    hf_model_path = args.model_path
+    # Sentences we want sentence embeddings for
+    sentences = ["样例数据-1", "样例数据-2"]
+
+    om_files = [f for f in os.listdir(hf_model_path) if f.endswith('.om')]
+    if not om_files:
+        raise ValueError(f"No .om files found in {hf_model_path}")
+
+    # Use the first .om file found (os.listdir order is arbitrary)
+    om_file_name = om_files[0]
+    om_model_path = os.path.join(hf_model_path, om_file_name)
+    infer_engine = InferEngine(device_id=device, model_path=om_model_path)
+    infer_engine.infer_test(sentences)
+    infer_engine.infer_test(sentences)
+    infer_engine.free()
--- a/mindie/examples/models/bge/large-zh-v1.5/main.py
+++ b/mindie/examples/models/bge/large-zh-v1.5/main.py
@ -0,0 +1,83 @@
+# Copyright 2024 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the License);
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import time
+import logging
+import torch
+
+try:
+    import torch_npu
+
+    device = "npu:0"
+    torch_npu.npu.set_device(0)
+    torch.npu.set_compile_mode(jit_compile=False)
+except ImportError:
+    device = "cuda:0"
+from transformers import AutoTokenizer, AutoModel
+
+logging.getLogger().setLevel(logging.INFO)
+
+
+class ModelInference:
+    def __init__(self, model_path):
+        self.model_path = model_path
+        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
+        self.model = AutoModel.from_pretrained(model_path).half().to(device)
+        self.model.eval()
+
+    def infer(self, text):
+        encoded_input = self.tokenizer(
+            text, padding=True, truncation=True, return_tensors="pt", max_length=512
+        )
+        encoded_input = encoded_input.to(device)
+        logging.info(encoded_input.input_ids.shape)
+
+        with torch.no_grad():
+            model_output = self.model(**encoded_input)
+            sentence_embeddings = model_output[0][:, 0]
+
+        sentence_embeddings = torch.nn.functional.normalize(
+            sentence_embeddings, p=2, dim=1
+        )
+        logging.info("Sentence embeddings: %s", sentence_embeddings)
+        logging.info("Sentence embeddings.shape: %s", sentence_embeddings.shape)
+        return sentence_embeddings
+
+    def infer_test(self, text):
+        encoded_input = self.tokenizer(
+            text, padding="max_length", truncation=True, return_tensors="pt", max_length=512
+        )
+        encoded_input = encoded_input.to(device)
+
+        with torch.no_grad():
+            start_time = time.time()
+            model_output = self.model(**encoded_input)
+            end_time = time.time()
+            sentence_embeddings = model_output[0][:, 0]  # [CLS] vector of last_hidden_state
+
+        sentence_embeddings = torch.nn.functional.normalize(
+            sentence_embeddings, p=2, dim=1
+        )
+        time_cost = end_time - start_time
+        logging.info("Sentence embeddings: %s", sentence_embeddings)
+        logging.info("Sentence embeddings.shape: %s", sentence_embeddings.shape)
+        logging.info("generate cost %g ms", time_cost * 1000)
+        return sentence_embeddings
+
+
+if __name__ == "__main__":
+    MODEL_PATH = "/data1/models/BAAI/bge-large-zh-v1.5"
+    sentences = ["样例数据-1", "样例数据-2"]
+    model_inference = ModelInference(MODEL_PATH)
+    model_inference.infer_test(sentences)
+    model_inference.infer_test(sentences)
--- a/mindie/examples/models/bge/large-zh-v1.5/modeling_bert_ascend.py
+++ b/mindie/examples/models/bge/large-zh-v1.5/modeling_bert_ascend.py
--- a/mindie/examples/models/bge/large-zh-v1.5/ops_info.json
+++ b/mindie/examples/models/bge/large-zh-v1.5/ops_info.json
@ -0,0 +1,10 @@
+{
+    "black-list": {
+        "to-add": [
+            "Add",
+            "Sub",
+            "Mul",
+            "SoftmaxV2"
+        ]
+    }
+}
--- a/mindie/examples/models/bge/large-zh-v1.5/requirements.txt
+++ b/mindie/examples/models/bge/large-zh-v1.5/requirements.txt
@ -0,0 +1,3 @@
+optimum==1.18.0
+onnx==1.16.0
+onnxruntime==1.17.1
--- a/mindie/examples/models/bge/reranker-large/README.md
+++ b/mindie/examples/models/bge/reranker-large/README.md
@ -0,0 +1,251 @@
+# README
+
+# Feature Matrix
+- This matrix lists the features supported by the bge-reranker-large model
+
+| Model and parameter count | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | W4A16 quantization | KV cache quantization | Sparse quantization | MoE quantization | MindIE Service | TGI | Long sequence |
+|--------------|-------------------------|---------------------------| ---- |-----| --------------- | --------------- | -------- | --------- | --------- | ------------ | -------------------------- | ---- | ------ | ---- |-----|
+| bge-reranker-large | world size 1 supported | world size 1 supported | √ | × | × | × | × | × | × | × | × | × | × | × | × |
+
+# bge-reranker-large Model Inference Guide
+
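+- [Overview](#overview)
+  - [Input and Output Data](#input-and-output-data)
+- [Inference Environment Setup](#inference-environment-setup)
+- [Quick Start](#quick-start)
+  - [Obtaining the Source Code](#obtaining-the-source-code)
+  - [Model Conversion](#model-conversion)
+  - [Model Inference](#model-inference)
+- [Model Inference Performance & Accuracy](#model-inference-performance--accuracy)
+  - [Model Inference Performance](#model-inference-performance)
+  - [Model Inference Accuracy](#model-inference-accuracy)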
+
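+## Overview
+
+### Model Introduction
+
+`bge-reranker-large` is a cross-encoder reranking model developed by BAAI (Beijing Academy of Artificial Intelligence). It computes a relevance score for each query-answer pair on the fly, which is more accurate than an embedding model (i.e., a bi-encoder) but also more time-consuming.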
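
A cross-encoder reranker scores each (query, document) pair in a joint forward pass and then sorts the candidates by score; nothing can be precomputed per document, so every candidate needs its own pass, which is why it is slower than a bi-encoder. The sketch below shows only that control flow. `rerank` and `overlap_score` are hypothetical names, and `overlap_score` is a toy word-overlap stand-in for the model's relevance logit, not how bge-reranker-large actually scores pairs.

```python
from typing import Callable, List, Tuple


def rerank(query: str, docs: List[str],
           score_pair: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score every (query, doc) pair jointly and return the docs sorted best-first."""
    scored = [(doc, score_pair(query, doc)) for doc in docs]  # one "model call" per pair
    return sorted(scored, key=lambda item: item[1], reverse=True)


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for the cross-encoder logit: number of shared words."""
    return float(len(set(query.split()) & set(doc.split())))


if __name__ == "__main__":
    docs = ["shanghai food guide", "beijing weather today is sunny"]
    for doc, score in rerank("weather in beijing", docs, overlap_score):
        print(f"{score:.1f}  {doc}")
```

In the real pipeline, `score_pair` would be replaced by a batched call to the model on tokenized `[query, doc]` pairs.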
+
+### 开源模型地址
+
+```text
+url=https://huggingface.co/BAAI/bge-reranker-large
+commit_id=bc0c7056d15eaea221616887bf15da63743d19e1
+model_name=bge-reranker-large
+```
+
+### Path Variables
+
+```text
+{cur_dir}
+├─ .cache
+│  ├─ huggingface
+│  │  └─ datasets
+│  │     └─ C-MTEB
+│  │        └─ T2Reranking
+│  │           └─ dev-00000-of-00001-65d96bde8023d9b9.parquet
+├─ models
+│  ├─ om
+│  │  ├─ bge-reranker-large_{soc_version}_{precision_mode}_linux_aarch64.om
+│  ├─ onnx
+│  │  ├─ model.onnx
+│  └─ pytorch
+│     └─ pytorch_model.bin
+├─ eval_performance.py
+├─ eval_precision.py
+└─ run.py
+```
+
+| Variable       | Meaning                                                                                                                                                   |
+|----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
+| soc_version    | Processor version of the NPU chip; query it with `npu-smi info`                                                                                          |
+| precision_mode | Precision mode of the converted OM model; see [ATC tool parameters](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/devaids/auxiliarydevtool/atlasatc_16_0099.html) |
+
+
+### Input and Output Data
+
+**Input data**
+
+| Input          | Data type | Shape                | Data layout |
+|----------------|-------|----------------------|--------|
+| input_ids      | INT64 | batch_size * seq_len | ND     |
+| attention_mask | INT64 | batch_size * seq_len | ND     |
+
+**Output data**
+
+| Output | Data type | Shape              | Data layout |
+|--------|---------|--------------------|--------|
+| output | FLOAT32 | batch_size * class | ND     |
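
To check that the arrays you feed an OM model match this table, a sketch like the following builds placeholder inputs with the listed dtypes and shapes. `make_dummy_inputs` is a hypothetical helper; the values are arbitrary placeholders rather than real token IDs, and the commented-out call mirrors how `eval_performance.py` feeds an `ais_bench` `InferSession`.

```python
import numpy as np


def make_dummy_inputs(batch_size: int, seq_len: int) -> list:
    """Return [input_ids, attention_mask] as INT64 arrays of shape (batch_size, seq_len)."""
    input_ids = np.zeros((batch_size, seq_len), dtype=np.int64)      # placeholder token IDs
    attention_mask = np.ones((batch_size, seq_len), dtype=np.int64)  # 1 = real token, 0 = padding
    return [input_ids, attention_mask]  # same order as in eval_performance.py


if __name__ == "__main__":
    feeds = make_dummy_inputs(batch_size=20, seq_len=512)
    print([(a.dtype, a.shape) for a in feeds])
    # With a converted model (not run here):
    # outputs = session.infer(feeds=feeds, mode="dymshape", custom_sizes=10000000)
    # outputs[0] should then be FLOAT32 with shape (batch_size, class)
```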
+
+## Inference Environment Setup
+
+**This model requires the following plugins and drivers**
+
+| Component            | Version  | Environment setup guide                                                                                                       |
+|---------|----------|---------------------------------------------------------------------------------------------------------------|
+| Firmware and drivers | 23.0.RC3 | [PyTorch framework inference environment setup](https://www.hiascend.com/document/detail/zh/ModelZoo/pytorchframework/pies/pies_00001.html) |
+| CANN    | 7.0.RC1  | -                                                                                                             |
+| Python  | 3.10     | -                                                                                                             |
+| Pytorch | 2.1.0    | -                                                                                                             |
+
+Note: For the Atlas 300I Duo inference card, choose the actual firmware and driver versions according to the CANN version.
+
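+## Quick Start
+
+### Obtaining the Source Code
+
+1. Get the source code of this project
+    ```shell
+    git clone https://gitee.com/ascend/MindIE-LLM.git                 # Clone this repository
+    cd MindIE-LLM
+    git checkout master                                               # Switch to the matching branch
+    cd examples/atb_models/pytorch/examples/BAAI/bge-reranker-large   # Enter the working (current) directory {cur_dir}
+    ```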
+
+2. Install dependencies
+
+    Install the Python dependencies
+    ```shell
+    pip install -r requirements.txt
+    ```
+    Download and install the `ais_bench` inference tool
+
+    [ais_bench inference tool user guide](https://gitee.com/ascend/tools/blob/master/ais-bench_workload/tool/ais_bench/README.md)
+    ```shell
+    pip install ./aclruntime-{version}-{python_version}-linux_{arch}.whl
+    pip install ./ais_bench-{version}-py3-none-any.whl
+    # {version}: software version; {python_version}: Python version; {arch}: CPU architecture
+    ```
+
+3. Get the open-source model
+    ```shell
+    git lfs install
+    GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/BAAI/bge-reranker-large
+    ```
+
+4. Prepare the dataset
+
+    Download the [C-MTEB/T2Reranking](https://huggingface.co/datasets/C-MTEB/T2Reranking) dataset
+
+    ```shell
+    mkdir -p .cache/huggingface/datasets/C-MTEB/
+    cd .cache/huggingface/datasets/C-MTEB/
+    git clone https://huggingface.co/datasets/C-MTEB/T2Reranking
+    mv T2Reranking/data/dev-00000-of-00001-65d96bde8023d9b9.parquet T2Reranking/
+    cd -                                                              # Return to {cur_dir}
+    ```
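
The evaluation scripts turn each T2Reranking row (a `query` plus lists of `positive` and `negative` passages) into `[query, passage]` pairs before tokenizing, as `PerformanceEvaluator.compute_performance` in `eval_performance.py` does. A stdlib-only sketch of that step, using a hypothetical `build_pairs` helper and made-up rows:

```python
from typing import Dict, List, Optional


def build_pairs(samples: List[Dict], limit: Optional[int] = None) -> List[List[str]]:
    """Flatten rows into [query, doc] pairs: positives first, then negatives."""
    pairs = []
    for sample in samples:
        for doc in list(sample["positive"]) + list(sample["negative"]):
            pairs.append([sample["query"], doc])
    # eval_performance.py keeps only the first batch_size pairs
    return pairs if limit is None else pairs[:limit]


if __name__ == "__main__":
    rows = [
        {"query": "q1", "positive": ["p1"], "negative": ["n1", "n2"]},
        {"query": "q2", "positive": ["p2"], "negative": ["n3"]},
    ]
    print(build_pairs(rows, limit=4))
```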
+
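+### Model Conversion
+
+1. Download the open-source PyTorch weight file [pytorch_model.bin](https://huggingface.co/BAAI/bge-reranker-large/blob/main/pytorch_model.bin) and place it in the `models/pytorch` directory, together with the model's `config.json` and tokenizer files (by default the scripts load both the model and the tokenizer from this directory)
+
+2. Download the open-source ONNX weight file [model.onnx](https://huggingface.co/BAAI/bge-large-zh-v1.5/resolve/main/pytorch_model.bin?download=true) and place it in the `models/onnx` directory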
+
+3. Run the conversion script
+
+    ```shell
+    bash ${cur_dir}/convert.sh ${onnx} ${om} ${precision_mode}
+    ```
+
+    - Parameters (see [ATC tool parameters](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC1alpha003/devaids/auxiliarydevtool/atlasatc_16_0039.html))
+      - `onnx`: directory containing the ONNX model `model.onnx`
+      - `om`: directory where the converted OM model is written
+      - `precision_mode`: precision mode of the model. Accuracy ranks `origin>mixed_float16>fp16` and performance ranks `fp16>=mixed_float16>origin`; `mixed_float16` (the default) is recommended for the best performance while preserving accuracy
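
The converted file is named after the SoC and precision mode, following the pattern in the directory tree (`bge-reranker-large_{soc_version}_{precision_mode}_linux_aarch64.om`), and `om_model_path` in `config.json` has to point at that file. A hypothetical helper that builds the expected path, assuming ATC appends `_linux_<arch>` to the `--output` prefix; the arch defaults to `platform.machine()` unless passed explicitly.

```python
import os
import platform
from typing import Optional


def expected_om_path(om_dir: str, soc_version: str,
                     precision_mode: str = "mixed_float16",
                     arch: Optional[str] = None) -> str:
    """Path of the OM file the conversion is expected to produce (assumed naming)."""
    arch = arch or platform.machine()  # e.g. "aarch64" on Arm servers
    name = f"bge-reranker-large_{soc_version}_{precision_mode}_linux_{arch}.om"
    return os.path.join(om_dir, name)


if __name__ == "__main__":
    print(expected_om_path("models/om", "Ascend310P3", arch="aarch64"))
```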
+
+### Model Inference
+
+1. Run inference
+
+    ```shell
+    python run.py \
+      --model_type_or_path=${model_type_or_path} \
+      --device=${device}
+    ```
+
+    - Parameters
+      - `model_type_or_path`: type of the model to run, or path to the model file
+      - `device`: ID of the chip to load the model on
+
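+2. Performance test
+
+    ```shell
+    python eval_performance.py \
+      --model_type_or_path=${model_type_or_path} \
+      --input_shape=${batch_size},${seq_len} \
+      --device=${device} \
+      --loop=${loop}
+    ```
+
+    - Parameters
+      - `model_type_or_path`: `pytorch`, `onnx`, or `om` to use the default path from `config.json`, or a path to the model file or directory
+      - `batch_size`: number of sentence pairs per inference batch
+      - `seq_len`: sequence length each input is padded or truncated to
+      - `device`: ID of the chip to load the model on
+      - `loop`: number of evaluation iterations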
+
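+3. Accuracy test
+
+    ```shell
+    python eval_precision.py \
+      --model_type_or_path=${model_type_or_path} \
+      --batch_size=${batch_size} \
+      --device=${device}
+    ```
+
+    - Parameters
+      - `model_type_or_path`: `pytorch`, `onnx`, or `om` to use the default path from `config.json`, or a path to the model file or directory
+      - `batch_size`: number of samples per inference batch
+      - `device`: ID of the chip to load the model on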
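
Both evaluation scripts accept either a model type or a path for `--model_type_or_path`: they take the file extension or last path component as the model type, then use the given path if it exists on disk and otherwise fall back to the default from `config.json`. A stdlib sketch of that logic; `resolve_model` is a hypothetical name, and `DEFAULT_PATHS` copies the `default_path` values from `config.json` instead of reading the file (the scripts additionally make them absolute).

```python
import os
from typing import Tuple

# Copied from config.json "default_path"; the scripts read this file at startup.
DEFAULT_PATHS = {
    "pytorch": "models/pytorch",
    "onnx": "models/onnx",
    "om": "models/om/bge-reranker-large_Ascend910B4_allow_mix_precision_linux_aarch64.om",
}


def resolve_model(model_type_or_path: str) -> Tuple[str, str]:
    """Return (model_type, model_path) the way the evaluation scripts derive them."""
    # "om" -> "om", "models/onnx/" -> "onnx", "x/model.om" -> "om"
    model_type = model_type_or_path.removesuffix("/").split(".")[-1].split("/")[-1]
    if model_type not in DEFAULT_PATHS:
        raise RuntimeError(f"'{model_type_or_path}' is not a valid model type or path")
    # Prefer the argument if it exists on disk, otherwise use the default path
    if os.path.isdir(model_type_or_path) or os.path.isfile(model_type_or_path):
        return model_type, model_type_or_path
    return model_type, DEFAULT_PATHS[model_type]


if __name__ == "__main__":
    print(resolve_model("om"))
```

Because the type is inferred from the name, a directory such as `/data/bge-reranker-large` is rejected even if it exists; it has to end in `pytorch`, `onnx`, or `.om`.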
+
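+## Model Inference Performance & Accuracy
+
+### Model Inference Performance
+
+Throughput: `1000 * batch_size / compute_time`, where `compute_time` is the mean time per batch in milliseconds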
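
`eval_performance.py` times `--loop` runs over one fixed batch and applies this formula to the mean per-batch time. A sketch of the parsing and arithmetic, with hypothetical helper names and made-up timings; unlike the script, it raises explicitly on an empty timing list.

```python
from statistics import mean
from typing import List, Tuple


def parse_input_shape(text: str) -> Tuple[int, int]:
    """'20,512' -> (20, 512), i.e. (batch_size, seq_len)."""
    batch_size, seq_len = map(int, text.split(","))
    return batch_size, seq_len


def throughput(batch_size: int, batch_times_ms: List[float]) -> float:
    """Samples per second: 1000 * batch_size / mean per-batch time in ms."""
    if not batch_times_ms:
        raise RuntimeError("no evaluation results")
    return 1000 * batch_size / mean(batch_times_ms)


if __name__ == "__main__":
    batch_size, seq_len = parse_input_shape("20,512")
    print(round(throughput(batch_size, [450.0, 460.0, 470.0]), 2))
```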
+
+| Environment | Chip | batch_size | seq_len | Throughput (fps) |
+|-----|-------------|------------|---------|----------|
+| NPU | Ascend310P3 | 20         | 512     | 43.84    |
+| NPU | Ascend310P3 | 50         | 512     | 44.23    |
+| GPU | NVIDIA A10  | 20         | 512     | 46.43    |
+| GPU | NVIDIA A10  | 50         | 512     | 49.16    |
+
+Note: The Atlas 300I Duo inference card has two chips on a single card, so multiply its throughput by 2 when comparing cards.
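
As a worked example of that note, assuming the Ascend310P3 rows in the first table are per-chip figures, the per-card throughput of an Atlas 300I Duo would be:

$$
2 \times 43.84 = 87.68\ \text{fps}\ \ (\text{batch\_size} = 20), \qquad 2 \times 44.23 = 88.46\ \text{fps}\ \ (\text{batch\_size} = 50)
$$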
+
+| Environment | Chip | batch_size | seq_len | Throughput (fps) |
+|-----|-------------|------------|---------|----------|
+| NPU | Ascend910B4 | 20         | 512     | 144.02   |
+| NPU | Ascend910B4 | 50         | 512     | 135.82   |
+| GPU | NVIDIA L40S | 20         | 512     | 119.75   |
+| GPU | NVIDIA L40S | 50         | 512     | 113.42   |
+
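+### Model Inference Accuracy
+
+Accuracy is validated with the `OM` model in the NPU environment and with the `ONNX` model in the GPU environment.
+
+For validation with a dataset, the [C-MTEB/T2Reranking](https://huggingface.co/datasets/C-MTEB/T2Reranking) task is used; the open-source model scores 67.28 MAP on this task.
+
+| Environment | Chip | MAP (%) | MRR@10 (%) | Run time (s) |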
+|-----|-------------|--------|-----------|---------|
+| NPU | Ascend310P3 | 67.60  | 77.68     | 4496.25 |
+| GPU | Nvidia A10  | 67.61  | 77.66     | 2216.56 |
+
+| Environment | Chip | MAP (%) | MRR@10 (%) | Run time (s) |
+|-----|-------------|--------|-----------|---------|
+| NPU | Ascend910B4 | 67.60  | 77.66     | 985.30  |
+| GPU | Nvidia L40S | 67.61  | 77.66     | 991.57  |
+
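+Notes:
+
+- MAP: Mean Average Precision, $MAP = \frac{1}{|Q|} \sum_{q \in Q} AP(q)$ with $AP(q) = \frac{1}{R_q} \sum_{i=1}^{R_q} \frac{i}{r_i}$, where $Q$ is the set of queries, $R_q$ is the number of relevant documents for query $q$, and $r_i$ is the rank of the $i$-th relevant document
+- MRR: Mean Reciprocal Rank, $MRR = \frac{1}{N} \sum_{i=1}^N \frac{1}{p_i}$, where $N$ is the number of queries and $p_i$ is the rank of the first relevant document for query $i$ (for MRR@10, the term is 0 when $p_i > 10$)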
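
Both metrics can be checked by hand on toy rankings. Average precision for one query averages the precision at the rank of each relevant document, and reciprocal rank is one over the rank of the first relevant document (0 if it falls outside the top 10). The scripts obtain these through `ap_score` and `mrr_at_k_score` inherited from the mteb evaluator; the sketch below is an independent stdlib re-implementation for illustration that ignores score ties, not that code.

```python
from typing import List


def average_precision(relevance: List[bool]) -> float:
    """AP for one query; `relevance` is ordered by predicted score, best first."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at the rank of this relevant doc
    return total / hits if hits else 0.0


def reciprocal_rank_at_k(relevance: List[bool], k: int = 10) -> float:
    """1 / rank of the first relevant doc within the top k, else 0."""
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    rankings = [[True, False, True], [False, True, False, False]]  # toy queries
    map_score = sum(average_precision(r) for r in rankings) / len(rankings)
    mrr_score = sum(reciprocal_rank_at_k(r) for r in rankings) / len(rankings)
    print(f"MAP={map_score:.4f}  MRR@10={mrr_score:.4f}")
```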
+
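+For validation without a dataset, the input is the text pair list `[[query, positive], [query, negative]]`; the NPU and GPU results agree within 1% under `torch.allclose`.
+
+| Environment | Chip | Inference result |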
+|-----|-------------|--------------------------|
+| NPU | Ascend310P3 | tensor([7.5195, 1.3613]) |
+| GPU | Nvidia A10  | tensor([7.5152, 1.3654]) |
+
+| Environment | Chip | Inference result |
+|-----|-------------|--------------------------|
+| NPU | Ascend910B4 | tensor([7.5195, 1.3779]) |
+| GPU | Nvidia L40S | tensor([7.5140, 1.3697]) |
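
The 1% agreement in these tables can be re-checked from the printed values. `torch.allclose(input, other, rtol, atol)` tests `|input - other| <= atol + rtol * |other|` elementwise; the stdlib-only sketch below applies the same criterion with `rtol=0.01` (the 1% threshold) and torch's default `atol=1e-8`, treating the GPU result as the reference. `allclose` here is a hypothetical helper, not the torch function itself.

```python
from typing import List


def allclose(actual: List[float], reference: List[float],
             rtol: float = 0.01, atol: float = 1e-8) -> bool:
    """Elementwise |actual - reference| <= atol + rtol * |reference| (torch.allclose's criterion)."""
    return len(actual) == len(reference) and all(
        abs(a - r) <= atol + rtol * abs(r) for a, r in zip(actual, reference)
    )


if __name__ == "__main__":
    # Values copied from the tables above (NPU result, GPU reference)
    print(allclose([7.5195, 1.3613], [7.5152, 1.3654]))  # Ascend310P3 vs Nvidia A10
    print(allclose([7.5195, 1.3779], [7.5140, 1.3697]))  # Ascend910B4 vs Nvidia L40S
```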
--- a/mindie/examples/models/bge/reranker-large/config.json
+++ b/mindie/examples/models/bge/reranker-large/config.json
@ -0,0 +1,8 @@
+{
+    "default_path": {
+        "tokenizer_path": "models/pytorch",
+        "pytorch_model_path": "models/pytorch",
+        "onnx_model_path": "models/onnx",
+        "om_model_path": "models/om/bge-reranker-large_Ascend910B4_allow_mix_precision_linux_aarch64.om"
+    }
+}
--- a/mindie/examples/models/bge/reranker-large/convert.sh
+++ b/mindie/examples/models/bge/reranker-large/convert.sh
@ -0,0 +1,39 @@
+#!/bin/bash
+
+# Input ONNX model directory and output OM model directory
+onnx_directory="$1"
+om_directory="$2"
+soc_version=$(python -c "import torch;import torch_npu;print(torch.npu.get_device_name())")
+
+# Use the precision mode passed as the third argument, defaulting to mixed_float16
+if [ -z "$3" ]; then
+    precision_mode=mixed_float16
+else
+    precision_mode="$3"
+fi
+
+# Check that the ONNX model exists
+if [ -f "$onnx_directory/model.onnx" ]; then
+    echo "ONNX model found at $onnx_directory/model.onnx"
+else
+    echo "Error: Unable to find ONNX model."
+    exit 1
+fi
+
+# Convert the ONNX model to an OM model with ATC
+atc --model="$onnx_directory/model.onnx" \
+    --framework=5 \
+    --output="$om_directory/bge-reranker-large_${soc_version}_${precision_mode}" \
+    --soc_version="$soc_version" \
+    --input_shape="input_ids:-1,-1;attention_mask:-1,-1" \
+    --precision_mode_v2="$precision_mode" \
+    --modify_mixlist="$om_directory/ops_info.json"
+
+# Check whether the ATC command succeeded
+# shellcheck disable=SC2181
+if [ $? -eq 0 ]; then
+    echo "Model conversion with ATC successful."
+else
+    echo "Error: Failed to convert model with ATC."
+    exit 1
+fi
--- a/mindie/examples/models/bge/reranker-large/eval_performance.py
+++ b/mindie/examples/models/bge/reranker-large/eval_performance.py
@ -0,0 +1,299 @@
+# Copyright Huawei Technologies Co., Ltd. 2024-2024. All rights reserved.
+import argparse
+import json
+import logging
+import os
+import time
+from typing import Any, List, Union, Tuple
+
+import datasets
+import numpy as np
+import torch
+import transformers.tokenization_utils_base
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+from optimum.onnxruntime import ORTModelForSequenceClassification
+from tqdm import tqdm as progressbar
+
+from atb_llm.utils.file_utils import safe_open
+
+
+def get_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description='Evaluate LLM.')
+    parser.add_argument(
+        '--model_type_or_path',
+        type=str,
+        required=True,
+        help='Specify a model type to load the default model, or the path to the directory containing the model file.'
+    )
+    parser.add_argument(
+        '--input_shape',
+        type=str,
+        required=True,
+        help='Shape of input tensors.'
+    )
+    parser.add_argument(
+        '--device',
+        type=int,
+        default=6,
+        choices=list(range(8)),
+        help='Adapt model on device id x.'
+    )
+    parser.add_argument(
+        '--loop',
+        type=int,
+        default=50,
+        help='Evaluation loops.'
+    )
+    return parser.parse_args()
+
+
+class Model:
+    def __init__(self, tokenizer_path: str, device_id: int) -> None:
+        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
+        self.device, self.runtime = self.init_runtime(device_id)
+
+    def init_runtime(self, device_id: int) -> Tuple[Union[str, int], Any]:
+        if self.__class__.__name__.startswith(('PyTorch', 'ONNX')):
+            try:
+                import torch_npu
+            except ImportError:
+                device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
+            else:
+                device = 'npu:{}'.format(device_id)
+                torch_npu.npu.set_device(device_id)
+                torch.npu.set_compile_mode(jit_compile=False)
+            return device, 0
+        elif self.__class__.__name__.startswith('OM'):
+            from ais_bench.infer.interface import InferSession
+            return device_id, InferSession
+        else:
+            raise RuntimeError
+
+    def tokenize(
+            self,
+            sentences_batch: List[List[str]],
+            seq_len: int
+    ) -> transformers.tokenization_utils_base.BatchEncoding:
+        encoded_inputs = self.tokenizer(
+            sentences_batch,
+            padding='max_length',
+            truncation=True,
+            return_tensors='pt',
+            max_length=seq_len
+        ).to(self.device)
+        return encoded_inputs
+
+    def encode(self, encoded_inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
+        # Compute token embedding time
+        computing_time = self._encode_batched(encoded_inputs)
+
+        return computing_time
+
+    def compute_scores(self, pairs: List[List[str]], batch_size: int, seq_len: int, loop: int) -> dict:
+        # Tokenize sentences
+        encoded_inputs = self.tokenize(pairs, seq_len)
+
+        all_computing_time = []
+        for _ in progressbar(range(loop), 'Evaluating...'):
+            computing_time = self.encode(encoded_inputs)
+            all_computing_time.append(computing_time)
+
+        try:
+            throughput = 1000 * batch_size / np.mean(all_computing_time)
+        except ZeroDivisionError as e:
+            raise RuntimeError('{} because no evaluation results'.format(e)) from e
+
+        scores = {
+            'compute_time': {
+                'min': np.min(all_computing_time),
+                'max': np.max(all_computing_time),
+                'mean': np.mean(all_computing_time),
+                'median': np.median(all_computing_time),
+                'percentile(99%)': np.percentile(all_computing_time, 99)
+            },
+            'throughput': throughput
+        }
+
+        return scores
+
+    def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
+        """ Runs the model on one tokenized batch and returns the computing time.
+
+        Args:
+            inputs (`BatchEncoding`): Tokenized sentence pairs to encode
+
+        Returns:
+            `float`: Computing time of the batch in milliseconds
+        """
+        # Work around Huawei Python coding guideline G.CLS.07 (recommendation): a method that does not
+        # access the instance should be defined as a staticmethod or classmethod
+        _ = self
+        return 0.0
+
+
+class PyTorchModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, device_id: int):
+        super(PyTorchModel, self).__init__(tokenizer_path, device_id)
+        self.model = AutoModelForSequenceClassification.from_pretrained(
+            model_path,
+            local_files_only=True,
+            trust_remote_code=True
+        ).half().to(self.device)
+        self.model.eval()
+
+    def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
+        tick = time.time()
+        with torch.no_grad():
+            self.model(**inputs, return_dict=True).logits.view(-1, ).float().cpu()
+        tock = time.time()
+        return 1000 * (tock - tick)
+
+
+class ONNXModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, device_id: int):
+        super(ONNXModel, self).__init__(tokenizer_path, device_id)
+        self.ort = ORTModelForSequenceClassification.from_pretrained(model_path).to(self.device)
+
+    def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
+        tick = time.time()
+        with torch.inference_mode():
+            self.ort(**inputs, return_dict=True).logits.view(-1, ).float().cpu()
+        tock = time.time()
+        return 1000 * (tock - tick)
+
+
+class OMModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, device_id: int) -> None:
+        super(OMModel, self).__init__(tokenizer_path, device_id)
+        self.session = self.runtime(device_id, model_path)
+
+    def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
+        input_ids = inputs.data['input_ids'].numpy().astype(np.int64)
+        attention_mask = inputs.data['attention_mask'].numpy().astype(np.int64)
+
+        tick = time.time()
+        self.session.infer(feeds=[input_ids, attention_mask], mode='dymshape', custom_sizes=10000000)
+        tock = time.time()
+
+        return 1000 * (tock - tick)
+
+
+class PerformanceEvaluator:
+    def __init__(self, metadata: dict) -> None:
+        self.metadata = metadata
+
+        # load dataset from HuggingFace hub
+        self.dataset = datasets.load_dataset(
+            self.metadata['dataset']['path'].split('.')[-1],
+            data_files={self.metadata['eval_splits'][0]: self.metadata['dataset']['path']}
+        )
+        self.samples = self.dataset[self.metadata['eval_splits'][0]]
+
+    def __call__(
+            self,
+            model: Model,
+            input_shape: Union[Tuple, List],
+            loop: int) -> dict:
+        """Evaluates the inference performance of the model and returns the timing scores.
+
+        Args:
+            model (`Model`): the model to evaluate
+            input_shape (`Union[Tuple[int, int], List[int, int]]`): shape of input tensors
+            loop (`int`): evaluation loops
+        """
+        return self.compute_performance(model, input_shape, loop)
+
+    def compute_performance(
+            self,
+            model: Model,
+            input_shape: Union[Tuple, List],
+            loop: int) -> dict:
+        batch_size, seq_len = input_shape
+
+        pairs = []
+        for sample in self.samples:
+            query = sample['query']
+            docs = []
+            docs.extend(sample['positive'])
+            docs.extend(sample['negative'])
+            for doc in docs:
+                pairs.append([query, doc])
+        pairs = pairs[:batch_size]
+
+        scores = model.compute_scores(pairs, batch_size, seq_len, loop)
+
+        return scores
+
+
+class Evaluation:
+    def __init__(self, eval_args: argparse.Namespace):
+        self.input_shape = tuple(map(int, eval_args.input_shape.split(',')))
+        self.device_id = eval_args.device
+        self.loop = eval_args.loop
+
+        # dataset metadata
+        self.metadata = {
+            'name': 'T2RerankingLocal',
+            'description': 'T2Ranking: A large-scale Chinese Benchmark for Passage Ranking',
+            'reference': 'https://arxiv.org/abs/2304.03679',
+            'dataset': {
+                'path': '.cache/huggingface/datasets/C-MTEB/T2Reranking/dev-00000-of-00001-65d96bde8023d9b9.parquet',
+                'revision': '76631901a18387f85eaa53e5450019b87ad58ef9',
+            },
+            'type': 'Reranking',
+            'category': 's2p',
+            'eval_splits': ['test'],
+            'eval_langs': ['zh'],
+            'main_score': 'map'
+        }
+
+        # default model path
+        with safe_open('config.json', 'r', encoding='utf-8') as reader:
+            text = reader.read()
+        default_path = json.loads(text)['default_path']
+        pytorch_model_path = self.tokenizer_path = os.path.abspath(default_path['tokenizer_path'])
+        onnx_model_path = os.path.abspath(default_path['onnx_model_path'])
+        om_model_path = os.path.abspath(default_path['om_model_path'])
+
+        model_path_map = {'pytorch': pytorch_model_path, 'onnx': onnx_model_path, 'om': om_model_path}
+
+        self.model_type = eval_args.model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]
+        default_model_path = model_path_map.get(self.model_type, 'not exist')
+        if default_model_path != 'not exist':
+            self.model_path = (
+                eval_args.model_type_or_path
+                if os.path.isdir(eval_args.model_type_or_path) or os.path.isfile(eval_args.model_type_or_path)
+                else default_model_path
+            )
+        else:
+            raise RuntimeError(
+                'load model failed because '
+                '\'{}\' is not a valid model type or path'.format(eval_args.model_type_or_path)
+            )
+
+    def load_model(self) -> Model:
+        model_map = {'pytorch': PyTorchModel, 'onnx': ONNXModel, 'om': OMModel}
+        try:
+            model = model_map[self.model_type](
+                tokenizer_path=self.tokenizer_path,
+                model_path=self.model_path,
+                device_id=self.device_id
+            )
+        except KeyError as e:
+            raise RuntimeError('load {} model failed because {}'.format(self.model_type, e)) from e
+        return model
+
+    def run(self) -> dict:
+        model = self.load_model()
+        evaluator = PerformanceEvaluator(self.metadata)
+        eval_results = evaluator(model, self.input_shape, self.loop)
+        return eval_results
+
+
+if __name__ == '__main__':
+    logger = logging.getLogger(__name__)
+    logging.basicConfig(format='[%(levelname)s] %(message)s', level=logging.INFO)
+    args = get_args()
+    evaluation = Evaluation(args)
+    results = evaluation.run()
+    logging.info(results)
--- a/mindie/examples/models/bge/reranker-large/eval_precision.py
+++ b/mindie/examples/models/bge/reranker-large/eval_precision.py
@ -0,0 +1,351 @@
+# Copyright Huawei Technologies Co., Ltd. 2024-2024. All rights reserved.
+import argparse
+import json
+import logging
+import os
+from typing import Any, List, Union, Tuple
+
+import datasets
+import numpy as np
+import torch
+import transformers.tokenization_utils_base
+from mteb import MTEB, AbsTaskReranking
+from C_MTEB.tasks import ChineseRerankingEvaluator
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+from optimum.onnxruntime import ORTModelForSequenceClassification
+from tqdm import tqdm as progressbar
+
+from atb_llm.utils.file_utils import safe_open
+
+
+def get_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description='Evaluate LLM.')
+    parser.add_argument(
+        '--model_type_or_path',
+        type=str,
+        required=True,
+        help='Specify a model type to load the default model, or the path to the directory containing the model file.'
+    )
+    parser.add_argument(
+        '--batch_size',
+        type=int,
+        default=20,
+        help='Batch size of dataset for computing.'
+    )
+    parser.add_argument(
+        '--device',
+        type=int,
+        default=6,
+        choices=list(range(8)),
+        help='Adapt model on device id x.'
+    )
+    return parser.parse_args()
+
+
+# copied from mteb.evaluation.evaluators.utils.cos_sim
+def cos_sim(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
+    """Computes the cosine similarity cos_sim(a[i], b[j]) for all i and j.
+
+    Returns:
+        Matrix with res[i][j]  = cos_sim(a[i], b[j])
+    """
+    if not isinstance(a, torch.Tensor):
+        a = torch.tensor(a)
+    if not isinstance(b, torch.Tensor):
+        b = torch.tensor(b)
+    if len(a.shape) == 1:
+        a = a.unsqueeze(0)
+    if len(b.shape) == 1:
+        b = b.unsqueeze(0)
+
+    a_norm = torch.nn.functional.normalize(a, p=2, dim=1)
+    b_norm = torch.nn.functional.normalize(b, p=2, dim=1)
+
+    # transpose will cause RuntimeError in C_MTEB.tasks.ChineseRerankingEvaluator.compute_metrics_from_biencoder():
+    # mat1 and mat2 shapes cannot be multiplied
+    try:
+        similarity = torch.mm(a_norm, b_norm.transpose(0, 1))
+    except RuntimeError:
+        similarity = torch.mm(a_norm, b_norm)
+
+    return similarity
+
+
+class ChineseRerankingEvaluatorTweaked(ChineseRerankingEvaluator):
+    # copied from mteb.evaluation.evaluators.RerankingEvaluator._compute_metrics_instance with similarity_fct->cos_sim
+    def _compute_metrics_instance(
+            self,
+            query_emb: torch.Tensor,
+            docs_emb: torch.Tensor,
+            is_relevant: List[bool]
+    ) -> dict[str, float]:
+        """Computes metrics for a single instance = (query, positives, negatives)
+
+        Args:
+            query_emb (`torch.Tensor` of shape `(num_queries, hidden_size)`): Query embedding
+                if `num_queries` > 0: we take the closest document to any of the queries
+            docs_emb (`torch.Tensor` of shape `(num_pos+num_neg, hidden_size)`): Candidates documents embeddings
+            is_relevant (`List[bool]` of length `num_pos+num_neg`): True if the document is relevant
+
+        Returns:
+            scores (`Dict[str, float]`):
+                - `mrr`: Mean Reciprocal Rank @ `self.mrr_at_k`
+                - `ap`: Average Precision
+        """
+        pred_scores = cos_sim(query_emb, docs_emb)
+        if len(pred_scores.shape) > 1:
+            pred_scores = torch.amax(pred_scores, dim=0)
+
+        pred_scores_argsort = torch.argsort(-pred_scores)  # Sort in decreasing order
+
+        mrr = self.mrr_at_k_score(is_relevant, pred_scores_argsort, self.mrr_at_k)
+        ap = self.ap_score(is_relevant, pred_scores.cpu().tolist())
+        return {'mrr': mrr, 'ap': ap}
+
+
+# copied from C_MTEB.tasks.Reranking.evaluate
+def evaluate(self, model_for_eval, split: str = 'test', **kwargs: Any) -> dict[str, float]:
+    if not self.data_loaded:
+        self.load_data()
+
+    data_split = self.dataset[split]
+
+    evaluator = ChineseRerankingEvaluatorTweaked(data_split, **kwargs)
+    scores = evaluator(model_for_eval)
+
+    return dict(scores)
+
+
+# rewrite
+AbsTaskReranking.evaluate = evaluate
+
+
+# custom task
+class T2RerankingLocal(AbsTaskReranking):
+    # Work around Huawei Python coding guideline G.CLS.08 (requirement): avoid defining instance attributes outside __init__
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+        self.dataset = None
+        self.data_loaded = None
+
+    @property
+    def description(self) -> dict:
+        return {
+            'name': 'T2RerankingLocal',
+            'description': 'T2Ranking: A large-scale Chinese Benchmark for Passage Ranking',
+            'reference': "https://arxiv.org/abs/2304.03679",
+            'dataset': {
+                'path': '.cache/huggingface/datasets/C-MTEB/T2Reranking/dev-00000-of-00001-65d96bde8023d9b9.parquet',
+                'revision': '76631901a18387f85eaa53e5450019b87ad58ef9',
+            },
+            'type': 'Reranking',
+            'category': 's2p',
+            'eval_splits': ['test'],
+            'eval_langs': ['zh'],
+            'main_score': 'map',
+        }
+
+    def load_data(self, **kwargs) -> None:
+        if self.data_loaded:
+            return
+
+        try:
+            self.dataset = datasets.load_dataset(
+                'parquet',
+                data_files={self.description['eval_splits'][0]: self.description['dataset']['path']}
+            )
+        except KeyError as e:
+            raise RuntimeError('load dataset failed because {}'.format(e)) from e
+        else:
+            self.data_loaded = True
+
+
+# custom model
+class Model:
+    def __init__(self, tokenizer_path: str, batch_size: int, device_id: int) -> None:
+        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
+        self.device, self.runtime = self.init_runtime(device_id)
+        self.batch_size = batch_size
+
+    def init_runtime(self, device_id: int) -> Tuple[Union[str, int], Any]:
+        if self.__class__.__name__.startswith(('PyTorch', 'ONNX')):
+            try:
+                import torch_npu
+            except ImportError:
+                device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
+            else:
+                device = 'npu:{}'.format(device_id)
+                torch_npu.npu.set_device(device_id)
+                torch.npu.set_compile_mode(jit_compile=False)
+            return device, 0
+        elif self.__class__.__name__.startswith('OM'):
+            from ais_bench.infer.interface import InferSession
+            return device_id, InferSession
+        else:
+            raise RuntimeError('unsupported model class: {}'.format(self.__class__.__name__))
+
+    def encode(self, sentences: List[str]) -> torch.Tensor:
+        """ Encodes the given sentences in batches and returns their embeddings.
+        Args:
+            sentences (`List[str]`): List of sentences to encode
+
+        Returns:
+            `torch.Tensor`: Tensor of embeddings for the given sentences
+        """
+        all_embeddings = []
+
+        for start_index in progressbar(range(0, len(sentences), self.batch_size)):
+            sentences_batch = sentences[start_index:start_index + self.batch_size]
+            # Tokenize sentences
+            encoded_inputs = self.tokenizer(
+                sentences_batch,
+                padding='max_length',
+                truncation=True,
+                return_tensors='pt',
+                max_length=512
+            )
+            # Compute token embeddings
+            embeddings = self._encode_or_compute_batched(encoded_inputs)
+            all_embeddings.extend(embeddings)
+
+        if all_embeddings:
+            if isinstance(all_embeddings, np.ndarray):
+                all_embeddings = torch.from_numpy(all_embeddings)
+            else:
+                all_embeddings = torch.stack(all_embeddings)
+        else:
+            all_embeddings = torch.Tensor()
+
+        return all_embeddings
+
+    def compute_score(self, sentence_pairs: Union[List[List[str]], Tuple[str, str]]) -> List[float]:
+        """ Returns a list of scores for the given sentence pairs.
+        Args:
+            sentence_pairs (`Union[List[List[str]], Tuple[str, str]]`): Sentence pairs to score, or a single pair
+
+        Returns:
+            `List[float]`: Scores for the given sentence pairs (a single `float` if only one pair is scored)
+        """
+        if not isinstance(sentence_pairs, (list, tuple)):
+            raise TypeError('type of `sentence_pairs` must be `list` or `tuple`')
+        if isinstance(sentence_pairs[0], str):
+            sentence_pairs = [sentence_pairs]
+
+        all_scores = []
+
+        for start_index in progressbar(range(0, len(sentence_pairs), self.batch_size), 'Computing'):
+            pairs_batch = sentence_pairs[start_index:start_index + self.batch_size]
+            # Tokenize sentences
+            encoded_inputs = self.tokenizer(
+                pairs_batch,
+                padding='max_length',
+                truncation=True,
+                return_tensors='pt',
+                max_length=512
+            ).to(self.device)
+            scores = self._encode_or_compute_batched(encoded_inputs)
+            all_scores.extend(scores.numpy().tolist())
+
+        return all_scores[0] if len(all_scores) == 1 else all_scores
+
+    def _encode_or_compute_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> torch.Tensor:
+        """ Runs the backend on one tokenized batch; subclasses override this.
+
+        Args:
+            inputs (`BatchEncoding`): Tokenized batch produced by `self.tokenizer`
+
+        Returns:
+            `torch.Tensor`: Model outputs for the batch (relevance scores for the rerankers below)
+        """
+        # Workaround for the Huawei Python coding standard [suggestion] G.CLS.07: methods that do not access the instance should be staticmethod or classmethod
+        _ = self
+        return torch.tensor(0)
+
+
+class PyTorchModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int):
+        super(PyTorchModel, self).__init__(tokenizer_path, batch_size, device_id)
+        self.model = AutoModelForSequenceClassification.from_pretrained(
+            model_path,
+            local_files_only=True,
+            trust_remote_code=True
+        ).half().to(self.device)
+        self.model.eval()
+
+    def _encode_or_compute_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> torch.Tensor:
+        with torch.no_grad():
+            outputs = self.model(**inputs, return_dict=True).logits.view(-1, ).float().cpu()
+        return outputs
+
+
+class ONNXModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int):
+        super(ONNXModel, self).__init__(tokenizer_path, batch_size, device_id)
+        self.ort = ORTModelForSequenceClassification.from_pretrained(model_path).to(self.device)
+
+    def _encode_or_compute_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> torch.Tensor:
+        with torch.inference_mode():
+            outputs = self.ort(**inputs, return_dict=True).logits.view(-1, ).float().cpu()
+        return outputs
+
+
+class OMModel(Model):
+    def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int) -> None:
+        super(OMModel, self).__init__(tokenizer_path, batch_size, device_id)
+        self.session = self.runtime(device_id, model_path)
+
+    def _encode_or_compute_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> torch.Tensor:
+        input_ids = inputs.data['input_ids'].numpy().astype(np.int64)
+        attention_mask = inputs.data['attention_mask'].numpy().astype(np.int64)
+        session_outputs = self.session.infer(feeds=[input_ids, attention_mask], mode='dymshape', custom_sizes=10000000)
+        outputs = torch.from_numpy(session_outputs[0][:, 0]).view(-1, ).float()
+        return outputs
+
+
+def load_model(model_args: argparse.Namespace) -> Model:
+    # default model path
+    with safe_open('config.json', 'r', encoding='utf-8') as reader:
+        text = reader.read()
+    default_path = json.loads(text)['default_path']
+    pytorch_model_path = tokenizer_path = os.path.abspath(default_path['tokenizer_path'])
+    onnx_model_path = os.path.abspath(default_path['onnx_model_path'])
+    om_model_path = os.path.abspath(default_path['om_model_path'])
+
+    model_path_map = {'pytorch': pytorch_model_path, 'onnx': onnx_model_path, 'om': om_model_path}
+    model_map = {'pytorch': PyTorchModel, 'onnx': ONNXModel, 'om': OMModel}
+
+    model_type = model_args.model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]
+    default_model_path = model_path_map.get(model_type, 'not exist')
+    if default_model_path != 'not exist':
+        model_path = (
+            model_args.model_type_or_path
+            if os.path.isdir(model_args.model_type_or_path) or os.path.isfile(model_args.model_type_or_path)
+            else default_model_path
+        )
+    else:
+        raise RuntimeError(
+            'load model failed because '
+            '\'{}\' is not a valid model type or path'.format(model_args.model_type_or_path)
+        )
+    try:
+        model_for_eval = model_map[model_type](
+            tokenizer_path=tokenizer_path,
+            model_path=model_path,
+            batch_size=model_args.batch_size,
+            device_id=model_args.device
+        )
+    except KeyError as e:
+        raise RuntimeError('load {} model failed because {}'.format(model_type, e)) from e
+
+    return model_for_eval
+
+
+if __name__ == '__main__':
+    logger = logging.getLogger(__name__)
+    logging.basicConfig(format='[%(levelname)s] %(message)s', level=logging.INFO)
+    args = get_args()
+    model = load_model(args)
+    task = ['T2RerankingLocal']
+    evaluation = MTEB(tasks=task, task_langs=['zh'])
+    results = evaluation.run(model)
+    logging.info(results)
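The reranking evaluator method near the top of this script ranks candidates by cosine similarity (keeping each document's maximum score when several query embeddings are given) and then calls `self.mrr_at_k_score` and `self.ap_score`, helpers it presumably inherits from MTEB's reranking evaluator. As a rough, self-contained sketch of what those two metrics compute for a single query, assuming the standard definitions (`mrr_at_k` and `average_precision` are hypothetical names, and MTEB's own helpers may treat ties or queries without relevant documents differently):

```python
from typing import List, Sequence


def _rank_by_score(scores: Sequence[float]) -> List[int]:
    """Return document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def mrr_at_k(is_relevant: Sequence[bool], scores: Sequence[float], k: int) -> float:
    """Reciprocal rank of the first relevant document within the top-k, else 0."""
    for rank, idx in enumerate(_rank_by_score(scores)[:k], start=1):
        if is_relevant[idx]:
            return 1.0 / rank
    return 0.0


def average_precision(is_relevant: Sequence[bool], scores: Sequence[float]) -> float:
    """Mean of precision@rank, taken at the rank of every relevant document."""
    hits, precisions = 0, []
    for rank, idx in enumerate(_rank_by_score(scores), start=1):
        if is_relevant[idx]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


if __name__ == '__main__':
    relevant = [False, True, False, True]
    scores = [0.9, 0.8, 0.1, 0.3]  # ranking: doc0, doc1, doc3, doc2
    print(mrr_at_k(relevant, scores, k=10))      # first hit at rank 2 -> 0.5
    print(average_precision(relevant, scores))   # (1/2 + 2/3) / 2
```

MRR only rewards the first relevant hit, while AP averages precision over every relevant document, which is why the task reports `map` as its main score.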
--- a/mindie/examples/models/bge/reranker-large/models/om/ops_info.json
+++ b/mindie/examples/models/bge/reranker-large/models/om/ops_info.json
@ -0,0 +1,16 @@
+{
+  "black-list": {
+    "to-remove": [],
+    "to-add": []
+  },
+  "white-list": {
+    "to-remove": [],
+    "to-add": [
+      "Cast",
+      "FlattenV2",
+      "LayerNorm",
+      "GatherShapes",
+      "GatherV2"
+    ]
+  }
+}
--- a/mindie/examples/models/bge/reranker-large/models/pytorch/config.json
+++ b/mindie/examples/models/bge/reranker-large/models/pytorch/config.json
@ -0,0 +1,39 @@
+{
+  "_name_or_path": "models/pytorch",
+  "architectures": [
+    "XLMRobertaForSequenceClassification"
+  ],
+  "auto_map": {
+    "AutoConfig": "models/pytorch--configuration_xlm_roberta.XLMRobertaConfig",
+    "AutoModel": "models/pytorch--modeling_xlm_roberta_ascend.XLMRobertaModel",
+    "AutoModelForSequenceClassification": "models/pytorch--modeling_xlm_roberta_ascend.XLMRobertaForSequenceClassification"
+  },
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "classifier_dropout": null,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 1024,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 514,
+  "model_type": "xlm-roberta",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "output_past": true,
+  "pad_token_id": 1,
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.30.0",
+  "type_vocab_size": 1,
+  "use_cache": true,
+  "vocab_size": 250002
+}
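In this config `max_position_embeddings` is 514, not 512. XLM-RoBERTa (like RoBERTa) numbers positions starting at `pad_token_id + 1`, so two slots are effectively reserved, which matches the `max_length=512` used by the tokenizer calls in these scripts. A small arithmetic check of that relationship and of the per-head dimension, using values copied from the config above (the helper names are illustrative):

```python
import json

# Values copied from models/pytorch/config.json above.
CONFIG_TEXT = """
{"hidden_size": 1024, "num_attention_heads": 16,
 "max_position_embeddings": 514, "pad_token_id": 1}
"""


def max_sequence_length(cfg: dict) -> int:
    # Position ids run from pad_token_id + 1 up to pad_token_id + seq_len,
    # so the largest usable length is max_position_embeddings - pad_token_id - 1.
    return cfg["max_position_embeddings"] - cfg["pad_token_id"] - 1


def head_dim(cfg: dict) -> int:
    # Each attention head works on an equal slice of the hidden state.
    assert cfg["hidden_size"] % cfg["num_attention_heads"] == 0
    return cfg["hidden_size"] // cfg["num_attention_heads"]


cfg = json.loads(CONFIG_TEXT)
print(max_sequence_length(cfg), head_dim(cfg))  # 512 64
```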
--- a/mindie/examples/models/bge/reranker-large/models/pytorch/configuration_xlm_roberta.py
+++ b/mindie/examples/models/bge/reranker-large/models/pytorch/configuration_xlm_roberta.py
@ -0,0 +1,170 @@
+# coding=utf-8
+# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
+# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+""" XLM-RoBERTa configuration"""
+
+from collections import OrderedDict
+from typing import Mapping
+
+from transformers.configuration_utils import PretrainedConfig
+from transformers.onnx import OnnxConfig
+from transformers.utils import logging
+
+
+logger = logging.get_logger(__name__)
+
+XLM_ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP = {
+    "xlm-roberta-base": "https://huggingface.co/xlm-roberta-base/resolve/main/config.json",
+    "xlm-roberta-large": "https://huggingface.co/xlm-roberta-large/resolve/main/config.json",
+    "xlm-roberta-large-finetuned-conll02-dutch": (
+        "https://huggingface.co/xlm-roberta-large-finetuned-conll02-dutch/resolve/main/config.json"
+    ),
+    "xlm-roberta-large-finetuned-conll02-spanish": (
+        "https://huggingface.co/xlm-roberta-large-finetuned-conll02-spanish/resolve/main/config.json"
+    ),
+    "xlm-roberta-large-finetuned-conll03-english": (
+        "https://huggingface.co/xlm-roberta-large-finetuned-conll03-english/resolve/main/config.json"
+    ),
+    "xlm-roberta-large-finetuned-conll03-german": (
+        "https://huggingface.co/xlm-roberta-large-finetuned-conll03-german/resolve/main/config.json"
+    ),
+}
+
+
+class XLMRobertaConfig(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`XLMRobertaModel`] or a [`TFXLMRobertaModel`]. It
+    is used to instantiate a XLM-RoBERTa model according to the specified arguments, defining the model architecture.
+    Instantiating a configuration with the defaults will yield a similar configuration to that of the XLMRoBERTa
+    [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) architecture.
+
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+
+
+    Args:
+        vocab_size (`int`, *optional*, defaults to 30522):
+            Vocabulary size of the XLM-RoBERTa model. Defines the number of different tokens that can be represented by
+            the `input_ids` passed when calling [`XLMRobertaModel`] or [`TFXLMRobertaModel`].
+        hidden_size (`int`, *optional*, defaults to 768):
+            Dimensionality of the encoder layers and the pooler layer.
+        num_hidden_layers (`int`, *optional*, defaults to 12):
+            Number of hidden layers in the Transformer encoder.
+        num_attention_heads (`int`, *optional*, defaults to 12):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        intermediate_size (`int`, *optional*, defaults to 3072):
+            Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
+        hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
+            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+            `"relu"`, `"silu"` and `"gelu_new"` are supported.
+        hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+        attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
+            The dropout ratio for the attention probabilities.
+        max_position_embeddings (`int`, *optional*, defaults to 512):
+            The maximum sequence length that this model might ever be used with. Typically set this to something large
+            just in case (e.g., 512 or 1024 or 2048).
+        type_vocab_size (`int`, *optional*, defaults to 2):
+            The vocabulary size of the `token_type_ids` passed when calling [`XLMRobertaModel`] or
+            [`TFXLMRobertaModel`].
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+            The epsilon used by the layer normalization layers.
+        position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
+            Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For
+            positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
+            [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
+            For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
+            with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+        is_decoder (`bool`, *optional*, defaults to `False`):
+            Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        classifier_dropout (`float`, *optional*):
+            The dropout ratio for the classification head.
+
+    Examples:
+
+    ```python
+    >>> from transformers import XLMRobertaConfig, XLMRobertaModel
+
+    >>> # Initializing a XLM-RoBERTa xlm-roberta-base style configuration
+    >>> configuration = XLMRobertaConfig()
+
+    >>> # Initializing a model (with random weights) from the xlm-roberta-base style configuration
+    >>> model = XLMRobertaModel(configuration)
+
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```"""
+    model_type = "xlm-roberta"
+
+    def __init__(
+        self,
+        vocab_size=30522,
+        hidden_size=768,
+        num_hidden_layers=12,
+        num_attention_heads=12,
+        intermediate_size=3072,
+        hidden_act="gelu",
+        hidden_dropout_prob=0.1,
+        attention_probs_dropout_prob=0.1,
+        max_position_embeddings=512,
+        type_vocab_size=2,
+        initializer_range=0.02,
+        layer_norm_eps=1e-12,
+        pad_token_id=1,
+        bos_token_id=0,
+        eos_token_id=2,
+        position_embedding_type="absolute",
+        use_cache=True,
+        classifier_dropout=None,
+        **kwargs,
+    ):
+        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
+
+        self.vocab_size = vocab_size
+        self.hidden_size = hidden_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.hidden_act = hidden_act
+        self.intermediate_size = intermediate_size
+        self.hidden_dropout_prob = hidden_dropout_prob
+        self.attention_probs_dropout_prob = attention_probs_dropout_prob
+        self.max_position_embeddings = max_position_embeddings
+        self.type_vocab_size = type_vocab_size
+        self.initializer_range = initializer_range
+        self.layer_norm_eps = layer_norm_eps
+        self.position_embedding_type = position_embedding_type
+        self.use_cache = use_cache
+        self.classifier_dropout = classifier_dropout
+
+
+# Copied from transformers.models.roberta.configuration_roberta.RobertaOnnxConfig with Roberta->XLMRoberta
+class XLMRobertaOnnxConfig(OnnxConfig):
+    @property
+    def inputs(self) -> Mapping[str, Mapping[int, str]]:
+        if self.task == "multiple-choice":
+            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
+        else:
+            dynamic_axis = {0: "batch", 1: "sequence"}
+        return OrderedDict(
+            [
+                ("input_ids", dynamic_axis),
+                ("attention_mask", dynamic_axis),
+            ]
+        )
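`XLMRobertaOnnxConfig.inputs` tells the ONNX exporter which input dimensions are dynamic. A dependency-free sketch of the same selection logic (`onnx_dynamic_inputs` is a hypothetical name; the real property lives on the `OnnxConfig` subclass and reads `self.task`):

```python
from collections import OrderedDict
from typing import Mapping


def onnx_dynamic_inputs(task: str) -> Mapping[str, Mapping[int, str]]:
    # Multiple-choice inputs carry an extra "choice" axis between batch and sequence.
    if task == "multiple-choice":
        dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
    else:
        dynamic_axis = {0: "batch", 1: "sequence"}
    # Both model inputs share the same dynamic axes.
    return OrderedDict([("input_ids", dynamic_axis), ("attention_mask", dynamic_axis)])


axes = onnx_dynamic_inputs("sequence-classification")
print(list(axes), axes["input_ids"])
```

Only `input_ids` and `attention_mask` are declared, which is consistent with `type_vocab_size` being 1 in the config and with the OM runner feeding exactly these two arrays.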
--- a/mindie/examples/models/bge/reranker-large/models/pytorch/modeling_xlm_roberta_fa.py
+++ b/mindie/examples/models/bge/reranker-large/models/pytorch/modeling_xlm_roberta_fa.py
--- a/mindie/examples/models/bge/reranker-large/requirements.txt
+++ b/mindie/examples/models/bge/reranker-large/requirements.txt
@ -0,0 +1,4 @@
+optimum==1.18.0
+onnx==1.16.0
+onnxruntime==1.17.1
+transformers==4.33.0
--- a/mindie/examples/models/bge/reranker-large/run.py
+++ b/mindie/examples/models/bge/reranker-large/run.py
@ -0,0 +1,181 @@
+# Copyright Huawei Technologies Co., Ltd. 2024-2024. All rights reserved.
+import argparse
+import logging
+import json
+import os
+import time
+
+import numpy as np
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+from optimum.onnxruntime import ORTModelForSequenceClassification
+
+from atb_llm.utils.file_utils import safe_open
+
+logger = logging.getLogger(__name__)
+logging.basicConfig(format='[%(levelname)s] %(message)s', level=logging.DEBUG)
+
+
+parser = argparse.ArgumentParser(description='Adapting LLM on Ascend.')
+parser.add_argument(
+    '--model_type_or_path',
+    type=str,
+    required=True,
+    help='Specify a model type (pytorch, onnx or om) to load the default model, or a path to the model file or directory.'
+)
+parser.add_argument(
+    '--device',
+    type=int,
+    default=6,
+    choices=list(range(8)),
+    help='Adapt model on device id x.'
+)
+
+# Default model path
+with safe_open('config.json', 'r', encoding='utf-8') as reader:
+    text = reader.read()
+default_path = json.loads(text)['default_path']
+pytorch_model_path = tokenizer_path = os.path.abspath(default_path['tokenizer_path'])
+onnx_model_path = os.path.abspath(default_path['onnx_model_path'])
+om_model_path = os.path.abspath(default_path['om_model_path'])
+
+# Query and candidate passages whose relevance scores we want to compute
+QUERY = '什么是大熊猫？'
+POSITIVE = '大熊猫（Ailuropoda melanoleuca），属于食肉目熊科的一种哺乳动物，体色为黑白两色。是中国特有物种'
+NEGATIVE = '比熊犬（法语：Bichon Frisé，bichon à poil frisé，意指“白色卷毛的玩赏用小狗”）是一种小型犬品种'
+pairs = [[QUERY, POSITIVE], [QUERY, NEGATIVE]]
+logger.info('query and passage for inference: %s', pairs)
+
+# Load local tokenizer
+tokenizer = AutoTokenizer.from_pretrained(pytorch_model_path)
+
+# Tokenize sentences
+encoded_input = tokenizer(pairs, padding='max_length', truncation=True, return_tensors='pt', max_length=512)
+
+
+def infer_pytorch(model_path: str, device_id: int) -> None:
+    # Set device
+    try:
+        import torch_npu
+    except ImportError:
+        device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
+    else:
+        device = 'npu:{}'.format(device_id)
+        torch_npu.npu.set_device(device_id)
+        torch.npu.set_compile_mode(jit_compile=False)
+    # Load model from local
+    model = AutoModelForSequenceClassification.from_pretrained(
+        model_path,
+        local_files_only=True,
+        trust_remote_code=True
+    ).half().to(device)
+    model.eval()
+    encoded_input_to_device = encoded_input.to(device)
+    # Compute similarity scores
+    for iters in range(2):
+        with torch.no_grad():
+            start_time = time.time()
+            scores = model(**encoded_input_to_device, return_dict=True).logits.view(-1, ).float()
+            exec_time = time.time() - start_time
+            logger.info(
+                '%s%s inference time: %.2f ms',
+                iters + 1,
+                'tsnrhtdd'[(iters + 1) % 5 * ((iters + 1) % 100 ^ 15 > 4 > (iters + 1) % 10)::4],
+                exec_time * 1000
+            )
+            logger.info('scores [positive, negative]: %s', scores.cpu())
+    # Free resource
+    if device.startswith('npu'):
+        try:
+            torch.npu.empty_cache()
+        except AttributeError:
+            pass
+    elif device.startswith('cuda'):
+        torch.cuda.empty_cache()
+
+
+def infer_onnx(model_path: str, device_id: int) -> None:
+    # Set device
+    try:
+        import torch_npu
+    except ImportError:
+        device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
+    else:
+        device = 'npu:{}'.format(device_id)
+        torch_npu.npu.set_device(device_id)
+        torch.npu.set_compile_mode(jit_compile=False)
+    # Load model from local
+    ort = ORTModelForSequenceClassification.from_pretrained(model_path).to(device)
+    encoded_input_to_device = encoded_input.to(device)
+    # Compute similarity scores
+    for iters in range(2):
+        with torch.inference_mode():
+            start_time = time.time()
+            scores = ort(**encoded_input_to_device, return_dict=True).logits.view(-1, ).float()
+            exec_time = time.time() - start_time
+            logger.info(
+                '%s%s inference time: %.2f ms',
+                iters + 1,
+                'tsnrhtdd'[(iters + 1) % 5 * ((iters + 1) % 100 ^ 15 > 4 > (iters + 1) % 10)::4],
+                exec_time * 1000
+            )
+            logger.info('scores [positive, negative]: %s', scores.cpu())
+    # Free resource
+    if device.startswith('npu'):
+        try:
+            torch.npu.empty_cache()
+        except AttributeError:
+            pass
+    elif device.startswith('cuda'):
+        torch.cuda.empty_cache()
+
+
+def infer_om(model_path: str, device_id: int) -> None:
+    # Tokenize sentences
+    input_ids = encoded_input.data['input_ids'].numpy().astype(np.int64)
+    attention_mask = encoded_input.data['attention_mask'].numpy().astype(np.int64)
+    # Load model from local
+    from ais_bench.infer.interface import InferSession
+    session = InferSession(device_id, model_path)
+    # Compute similarity scores
+    for iters in range(2):
+        output = session.infer(feeds=[input_ids, attention_mask], mode='dymshape', custom_sizes=10000000)
+        scores = torch.from_numpy(output[0][:, 0]).view(-1, ).float()
+        exec_time = session.summary().exec_time_list[-1]
+        logger.info(
+            '%s%s inference time: %.2f ms',
+            iters + 1,
+            'tsnrhtdd'[(iters + 1) % 5 * ((iters + 1) % 100 ^ 15 > 4 > (iters + 1) % 10)::4],
+            exec_time[1] - exec_time[0]
+        )
+        logger.info('scores [positive, negative]: %s', scores)
+    # Free resource
+    session.free_resource()
+
+
+def infer(model_type_or_path: str = None, device_id: int = 0) -> None:
+    model_path_map = {'pytorch': pytorch_model_path, 'onnx': onnx_model_path, 'om': om_model_path}
+    model_map = {'pytorch': infer_pytorch, 'onnx': infer_onnx, 'om': infer_om}
+
+    model_type = model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]
+    default_model_path = model_path_map.get(model_type, 'not exist')
+    if default_model_path != 'not exist':
+        model_path = (
+            model_type_or_path
+            if os.path.isdir(model_type_or_path) or os.path.isfile(model_type_or_path)
+            else default_model_path
+        )
+    else:
+        raise RuntimeError(
+            'load model failed because '
+            '\'{}\' is not a valid model type or path'.format(model_type_or_path)
+        )
+    try:
+        model_map[model_type](model_path, device_id)
+    except KeyError as e:
+        raise RuntimeError('load {} model failed because {}'.format(model_type, e)) from e
+
+
+if __name__ == '__main__':
+    args = parser.parse_args()
+    infer(args.model_type_or_path, args.device)
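Both `infer` in `run.py` and `load_model` in the evaluation script derive the model type from `--model_type_or_path` with the same string expression: the text after the last `.` if there is one, otherwise the last path component. A standalone sketch of that rule (`parse_model_type` is a hypothetical name; the expression itself is copied from the scripts):

```python
def parse_model_type(model_type_or_path: str) -> str:
    # Same expression as in infer()/load_model(): strip a trailing '/',
    # keep the text after the last '.', then keep the last path component.
    # str.removesuffix requires Python 3.9 or later.
    return model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]


for arg in ['om', 'models/om/bge.om', 'models/pytorch/', './models/onnx']:
    print(arg, '->', parse_model_type(arg))
```

The result must be `pytorch`, `onnx` or `om`. If the argument is also an existing file or directory it is used as the model path; otherwise the default path from `config.json` is used. One consequence of this rule: a directory whose name contains a dot, such as `models/bge.v1/`, parses as `v1` and is rejected.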
--- a/mindie/examples/models/bloom/README.md
+++ b/mindie/examples/models/bloom/README.md
@ -0,0 +1,138 @@
+# BLOOM
+
+* [BLOOM](https://huggingface.co/bigscience/bloom) (BigScience Large Open-science Open-access Multilingual Language Model)
+* This repository implements BLOOM inference models for NPU hardware.
+
+## Feature Matrix
+
+- This matrix lists the features supported by each BLOOM model:
+
+| Model (parameters) | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | KV cache quantization | Sparse quantization | MoE quantization | MindIE Service | TGI | Long sequence |
+|-------------|----------------------------|-----------------------------|------|----------------------|-----------------|-----------------|---------|-----------|--------------|--------------------------|-----|--------|-----|-----|
+| bloom (176B) | world size 8 | No | Yes | No | No | Yes | No | Yes | No | No | No | No | No | No |
+| bloom-7b1 | world size 1,2,4,8 | world size 1,2,4 | Yes | No | No | Yes | No | No | No | No | No | No | No | No |
+| bloomz-7b1-mt | world size 1,2,4,8 | world size 1,2,4 | Yes | No | No | Yes | No | No | No | No | No | No | No | No |
+
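+## Inference Guide
+
+### Path Variables
+
+| Variable      | Meaning                                                      |
+| ------------- | ------------------------------------------------------------ |
+| `working_dir` | Directory where the acceleration library and model repository are placed after download |
+| `llm_path`    | Path to the model repository. If you use the prebuilt package, the path is `${working_dir}/MindIE-LLM/`; if you use the code downloaded from gitee, the path is `${working_dir}/MindIE-LLM/examples/atb_models` |
+| `script_path` | Script path. The working scripts for the BLOOM series are in `${llm_path}/examples/models/bloom` |
+| `weight_path` | Path to the original HF model weights (`.safetensors` format) |
+
+Weight download links: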
+
+* bloom (176b): https://huggingface.co/bigscience/bloom
+* bloomz-7b1-mt: https://huggingface.co/bigscience/bloomz-7b1-mt
+* bloom-7b1: https://huggingface.co/bigscience/bloom-7b1
+
+> You do not need to download `pytorch_model.bin.index.json` or the `.bin` files.
+
+The framework reads `torch_dtype` from the downloaded `config.json` when loading the weights, so you must manually add `"torch_dtype": "float16"` to `config.json`.
+
+### Environment Setup
+
+1. Install CANN 8.0 and run `source /path/to/cann/set_env.sh`;
+
+2. Use Python 3.9 or later;
+
+3. Use torch 2.0 or later and install the matching torch_npu;
+
+4. Install the dependencies:
+
+```shell
+pip install transformers==4.34.0
+pip install accelerate
+```
+
+5. Install `atb_llm`:
+
+```shell
+cd $llm_path
+python setup.py bdist_wheel
+python -m pip install dist/*.whl --force-reinstall
+```
+
+## BLOOMZ-7B1-MT
+
+### Weight Preparation
+
+Download the model weights from Hugging Face (`.safetensors` is recommended; `.bin` files must be converted to `.safetensors`). The weight path is `weight_path`.
+
+### PagedAttention Model
+
+Go to the `modeltest` directory:
+
+```shell
+cd tests/modeltest
+```
+
+Set the following environment variables before running the tests:
+
+```shell
+export HCCL_BUFFSIZE=110
+export PYTHONWARNINGS="ignore"
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export ATB_USE_TILING_COPY_STREAM=1
+export ATB_CONTEXT_WORKSPACE_RING=1
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
+```
+
+#### Performance Testing
+
+> `$weight_path` can be either the original HuggingFace weight path or a quantized weight path (the same applies below).
+
+```shell
+bash run.sh pa_fp16 performance [[seq_in,seq_out],[seq_in,seq_out]] $batch_size bloom $weight_path $tp
+```
+
+For example, with `TP = 8` and `batch_size = 1`:
+
+```shell
+bash run.sh pa_fp16 performance [[256,256],[512,512],[1024,1024],[2048,2048]] 1 bloom /path/to/model 8
+```
+
+#### Downstream Task Accuracy Testing
+
+```shell
+bash run.sh pa_fp16 full_CEval $n_shot $batch_size bloom $weight_path $tp
+```
+
+For example, with `TP = 8`, `batch_size = 1`, and CEval 5-shot:
+
+```shell
+bash run.sh pa_fp16 full_CEval 5 1 bloom /path/to/model 8
+```
+
+For more configuration options, see `examples/atb_models/tests/modeltest/README.md`.
+
+## BLOOM-7B1
+
+### PagedAttention Model
+
+Test it the same way as the BLOOMZ-7B1-MT PagedAttention model.
+
+## BLOOM-176B
+
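+### Weight Preparation
+
+Because its weights are large (about 328 GB), BLOOM-176B supports only TP8 W8A16 inference on 800I A2 machines. First, quantize the original weights downloaded from HuggingFace:
+
+```shell
+# Source the CANN environment
+source /path/to/cann/set_env.sh
+# Enter the model repository path (see `llm_path` in the path variable table above)
+cd $llm_path
+# {float_weight_path} is the original weight path downloaded from HuggingFace
+python examples/models/bloom/convert_quant_weights.py --model_path {float_weight_path} --save_directory {w8a16_weight_path} --w_bit 8 --a_bit 16 --act_method 3 --calib_file ""
+```
+
+### PagedAttention Model
+
+Test it the same way as the BLOOMZ-7B1-MT PagedAttention model; just set `$weight_path` to `{w8a16_weight_path}`.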
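The README asks you to add `"torch_dtype": "float16"` to the downloaded `config.json` by hand. That step can also be scripted; below is a minimal sketch using only the standard library (`ensure_torch_dtype` is a hypothetical helper, not part of `atb_llm`; back up `config.json` before rewriting it):

```python
import json
import os
import tempfile


def ensure_torch_dtype(weight_path: str, dtype: str = "float16") -> dict:
    """Add `torch_dtype` to ${weight_path}/config.json if it is missing."""
    config_file = os.path.join(weight_path, "config.json")
    with open(config_file, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Keep an existing value; only fill in the field when it is absent.
    config.setdefault("torch_dtype", dtype)
    with open(config_file, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    return config


if __name__ == "__main__":
    # Demo on a throwaway directory instead of a real weight path.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "config.json"), "w", encoding="utf-8") as f:
            json.dump({"architectures": ["BloomForCausalLM"]}, f)
        print(ensure_torch_dtype(tmp)["torch_dtype"])  # float16
```

Using `setdefault` means an explicitly set dtype is never overwritten; drop it in favour of a plain assignment if you want to force `float16`.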
--- a/mindie/examples/models/bloom/convert_quant_weights.py
+++ b/mindie/examples/models/bloom/convert_quant_weights.py
@ -0,0 +1,76 @@
+# Copyright Huawei Technologies Co., Ltd. 2024. All rights reserved.
+
+import os
+
+from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import QuantConfig
+from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig
+
+from transformers import BloomConfig
+from examples.convert.model_slim.get_calibration_dataset import load_jsonl
+from examples.convert.model_slim.quantifier import parse_arguments, Quantifier
+from examples.convert.convert_utils import copy_tokenizer_files, modify_config
+
+
+
+if __name__ == "__main__":
+    args = parse_arguments()
+
+    rank = int(os.getenv("RANK", "0"))
+
+    config = BloomConfig.from_pretrained(args.model_path)
+
+    disable_names = []
+    if args.a_bit != 16:
+        # W8A16/W4A16 need no fallback layers; other schemes keep the MLP down-projection and lm_head unquantized
+        num_layers = config.num_hidden_layers
+        # BLOOM names its MLP down-projection `transformer.h.{i}.mlp.dense_4h_to_h`
+        disable_names = [f"transformer.h.{layer}.mlp.dense_4h_to_h" for layer in range(num_layers)]
+        disable_names.append("lm_head")
+
+    anti_outlier_config = None
+    if args.anti_method:
+        anti_outlier_config = AntiOutlierConfig(anti_method=args.anti_method)
+
+    quant_config = QuantConfig(
+        a_bit=args.a_bit,
+        w_bit=args.w_bit,
+        disable_names=disable_names,
+        act_method=args.act_method,
+        w_sym=args.w_sym,
+        mm_tensor=False,
+        dev_type=args.device_type,
+        dev_id=rank,
+        pr=1.0,
+        fraction=args.fraction,
+        co_sparse=args.co_sparse,
+        do_smooth=args.do_smooth,
+        use_sigma=args.use_sigma,
+        sigma_factor=args.sigma_factor,
+        is_lowbit=args.is_lowbit,
+        use_kvcache_quant=args.use_kvcache_quant,
+    )
+
+    # No calibration dataset by default
+    calibration_dataset = None
+    # If calib_file is given, use it as the calibration dataset
+    if args.calib_file:
+        calibration_dataset = load_jsonl(args.calib_file)
+    quant_weight_generator = Quantifier(args.model_path, quant_config, anti_outlier_config, args.device_type)
+    quant_weight_generator.tokenizer.pad_token_id = 0
+
+    tokenized_data = None
+    if calibration_dataset is not None:
+        tokenized_data = quant_weight_generator.get_tokenized_data(calibration_dataset)
+
+    quant_weight_generator.convert(tokenized_data, args.save_directory, args.disable_level)
+    # The tool's sparse quantization is invoked with w_bit=4, a_bit=8, so quant_type is temporarily remapped to "w8a8s"
+    quant_type = f"w{args.w_bit}a{args.a_bit}" + ("s" if (args.co_sparse or args.is_lowbit) else "")
+    is_sparseCompress = args.w_bit == 4 and args.a_bit == 8 and (args.co_sparse or args.is_lowbit)
+    if is_sparseCompress:
+        quant_type = "w8a8s"
+    modify_config(
+        args.model_path, args.save_directory, config.torch_dtype,
+        quant_type,
+        args.use_kvcache_quant
+    )
+    copy_tokenizer_files(args.model_path, args.save_directory)
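The tail of convert_quant_weights.py decides which `quantize` string ends up in config.json. The sketch below restates that logic as a pure function so the mapping, including the w4a8-to-"w8a8s" remap used for sparse quantization, can be checked in isolation; `resolve_quant_type` is a name chosen here for illustration, not a function in the repository.

```python
def resolve_quant_type(w_bit: int, a_bit: int, co_sparse: bool, is_lowbit: bool) -> str:
    """Restates the quant_type logic at the end of convert_quant_weights.py."""
    sparse = co_sparse or is_lowbit
    # Base label, e.g. "w8a16"; a trailing "s" marks sparse quantization
    quant_type = f"w{w_bit}a{a_bit}" + ("s" if sparse else "")
    # Sparse quantization runs the tool with w_bit=4, a_bit=8,
    # but the resulting weights are labeled "w8a8s"
    if w_bit == 4 and a_bit == 8 and sparse:
        quant_type = "w8a8s"
    return quant_type


if __name__ == "__main__":
    print(resolve_quant_type(8, 16, False, False))  # the BLOOM-176B W8A16 case
    print(resolve_quant_type(4, 8, False, True))    # the sparse path
```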
--- a/mindie/examples/models/bloom/run_fa.sh
+++ b/mindie/examples/models/bloom/run_fa.sh
@ -0,0 +1,37 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+# See README.md in this directory for parameter settings and launch instructions
+export MAX_MEMORY_GB=29
+export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export MASTER_PORT=20030
+
+# The following environment variables affect performance and memory; normally they do not need to be changed
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export INF_NAN_MODE_ENABLE=0
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export ATB_CONVERT_NCHW_TO_ND=1
+export HCCL_BUFFSIZE=120
+export HCCL_WHITELIST_DISABLE=1
+export ATB_CONTEXT_WORKSPACE_RING=1
+export ATB_CONTEXT_WORKSPACE_SIZE=2629145600
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
+export ATB_LAUNCH_KERNEL_WITH_TILING=0
+export ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=1
+export ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=0
+
+# Work around num_blocks < 0 / free_memory < 0 errors (overrides the values set above)
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+export RESERVED_MEMORY_GB=0
+export ATB_CONTEXT_WORKSPACE_SIZE=0
+
+extra_param=""
+world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
+
+
+if [ "$TP_WORLD_SIZE" == "1" ]; then
+    python -m examples.run_fa --model_path $1 $extra_param
+else
+    torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_fa --model_path $1 $extra_param
+fi
+
+#  --input_text "Common sense questions and answers\n\nQuestion: Why do we need to learn a new language\nFactual answer:" --max_output_length 32
--- a/mindie/examples/models/bloom/run_pa.sh
+++ b/mindie/examples/models/bloom/run_pa.sh
@ -0,0 +1,36 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+# See README.md in this directory for parameter settings and launch instructions
+export IS_QUANT=0
+export MAX_MEMORY_GB=29
+export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export world_size=8
+export MASTER_PORT=20030
+export IS_BF16=false
+
+# The following environment variables affect performance and memory; normally they do not need to be changed
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export INF_NAN_MODE_ENABLE=0
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export ATB_CONVERT_NCHW_TO_ND=1
+export LCCL_ENABLE_FALLBACK=1
+
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+export RESERVED_MEMORY_GB=0
+export ATB_CONTEXT_WORKSPACE_SIZE=0
+
+export INT8_FORMAT_NZ_ENABLE=1
+
+extra_param=""
+world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
+
+if [ "$IS_BF16" = true ]; then
+    extra_param="${extra_param} --is_bf16"
+fi
+
+if [ "$TP_WORLD_SIZE" == "1" ]; then
+    python -m examples.run_pa --model_path $1 $extra_param
+else
+    torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_pa --model_path $1 $extra_param 
+fi
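run_fa.sh and run_pa.sh derive the world size by counting commas in `ASCEND_RT_VISIBLE_DEVICES`, optionally append `--is_bf16`, and then choose between a single-process `python -m` launch and `torchrun`. The Python sketch below mirrors that selection logic so you can check which command a given environment produces; `build_launch_command` is a hypothetical helper, not part of the repository. Neither script exports `TP_WORLD_SIZE` itself, so unless the calling shell sets it to 1, the torchrun branch is taken.

```python
import os
import shlex


def build_launch_command(env: dict, model_path: str, module: str = "examples.run_pa") -> list:
    """Mirror the launcher selection in run_pa.sh (hypothetical helper)."""
    devices = env.get("ASCEND_RT_VISIBLE_DEVICES", "0")
    # Same as: $(( $(echo "$DEVICES" | grep -o , | wc -l) + 1 ))
    world_size = devices.count(",") + 1
    extra = ["--is_bf16"] if env.get("IS_BF16") == "true" else []
    if env.get("TP_WORLD_SIZE") == "1":
        # Single process, no torchrun
        return ["python", "-m", module, "--model_path", model_path, *extra]
    port = env.get("MASTER_PORT", "20030")
    return ["torchrun", "--nproc_per_node", str(world_size), "--master_port", port,
            "-m", module, "--model_path", model_path, *extra]


if __name__ == "__main__":
    cmd = build_launch_command(dict(os.environ), "/path/to/model")
    print(shlex.join(cmd))
```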
--- a/mindie/examples/models/chatglm/v2_6b/README.md
+++ b/mindie/examples/models/chatglm/v2_6b/README.md
@ -0,0 +1,231 @@
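+# ChatGLM2-6B Model Inference Guide <!-- omit in toc -->
+
+# Overview
+
+- [ChatGLM2-6B](https://github.com/THUDM/ChatGLM2-6B/) is the second generation of the open-source Chinese-English bilingual dialogue model [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B). It keeps many strengths of the first generation, such as fluent conversation and a low deployment threshold, while offering stronger performance, longer context, more efficient inference, and a more open license.
+- This repository implements a ChatGLM2 inference model for NPU hardware. Used together with the acceleration library, it aims to deliver maximum inference performance on the NPU.
+
+# Feature matrix
+- This matrix lists the features supported by ChatGLM2-6B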
+
+| Model & size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | W4A16 quantization | KV cache quantization | Sparse quantization (300I DUO only) | MOE | MindIE | TGI | Long sequence |
+|-------------|-------------------------|-------------------------|------|------|-----------------|-----------------|---------|--------------|----------|--------|--------|-----|-----|-----|-----|
+| ChatGLM2-6B | World size 1,2,4,8 | World size 1,2,4 | Yes | No | No | Yes | Yes | No | No | No | Yes | No | Yes | Yes | No |
+
+- Model versions adapted in this repository
+  - [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b/tree/main)
+
+# Usage
+
+## Path variables
+| Variable | Meaning |
+|--------|--------------------------------------------------|
+| working_dir | Directory where the acceleration library and model repository are placed after download |
+| llm_path | Path of the model repository. With a prebuilt package it is `${working_dir}/MindIE-LLM/`; with code downloaded from gitee it is `${working_dir}/MindIE-LLM/examples/atb_models` |
+| script_path | Script directory: `${llm_path}/examples/models/chatglm/v2_6b` |
+| weight_path | Model weight path |
+
+## Weight conversion
+- See [this README](../../../README.md)
+
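+## Exporting quantized weights
+Quantized weights can be produced with msmodelslim (the Ascend model compression toolkit).
+
+### Environment setup
+For environment setup, see the msmodelslim documentation: https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/devtools/auxiliarydevtool/modelslim_0002.html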
+
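+### Exporting W8A8 quantized weights
+Use `${llm_path}/examples/models/chatglm/v2_6b/quant_chatglm_w8a8.py` to export the quantized weights (do not put the quantized weights in the same directory as the float weights). The script imports `examples.models.chatglm.v2_6b.quant_utils`, so `${llm_path}` must be on `PYTHONPATH`:
+```shell
+# This thread count must be set
+export OMP_NUM_THREADS=48
+python quant_chatglm_w8a8.py --model_path ${float_weight_path} --save_path ${quant_weight_save_path} --dataset_path ${calib_dataset_path}
+```
+Download the calibration dataset from [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/e84444333b6d434ea7b0/), extract it, and use `CEval/val/Other/civil_servant.jsonl` from the extracted directory as the calibration dataset.
+
+After export, two files should be generated: `quant_model_weight_w8a8.safetensors` and `quant_model_description_w8a8.json`.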
+
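+### Exporting W4A16 quantized weights
+Use `${llm_path}/examples/models/chatglm/v2_6b/quant_chatglm_w4a16.py` to export the quantized weights (do not put the quantized weights in the same directory as the float weights):
+```shell
+python quant_chatglm_w4a16.py --model_path ${float_weight_path} --save_path ${quant_weight_save_path} --dataset_path ${calib_dataset_path}
+```
+Download the calibration dataset from [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/e84444333b6d434ea7b0/), extract it, and use `CEval/val/Social_Science/teacher_qualification.jsonl` from the extracted directory as the calibration dataset.
+
+After export, two files should be generated: `quant_model_weight_w4a16.safetensors` and `quant_model_description_w4a16.json`.
+
+Notes:
+
+1. quant_chatglm_w8a8.py and quant_chatglm_w4a16.py are preconfigured with well-performing quantization strategies. You can use them as-is or change the strategy.
+
+2. When generating quantized weights, the script adds (or updates) the `quantize` field in config.json under the output weight path, set to the corresponding quantization type. Currently only `w8a8` and `w4a16` are supported.
+
+3. After completing the steps above, running the quantized model only requires replacing the weight path.
+
+4. If you see `OpenBLAS Warning: Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP = 1 option` while generating weights, set `export OMP_NUM_THREADS=1` to disable multithreading as a workaround.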
+
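+### Exporting sparse quantized weights
+Run generate_sparse.sh to export sparse quantized weights (do not put the quantized weights in the same directory as the float weights). The script changes into `${ATB_SPEED_HOME_PATH}` before running, so pass absolute paths:
+```shell
+bash generate_sparse.sh ${float_weight_path} ${sparse_quant_weight_save_path} ${llm_path}/examples/models/chatglm/v2_6b/calib_data.jsonl ${tp_size}
+```
+
+This creates a `compress` directory under `${sparse_quant_weight_save_path}`; use `${sparse_quant_weight_save_path}/compress` as the weight directory for inference.
+
+Notes:
+
+1. generate_sparse.sh is preconfigured with a well-performing quantization strategy. You can use it as-is or change the strategy.
+
+2. After completing the steps above, running the quantized model only requires replacing the weight path with `${sparse_quant_weight_save_path}/compress`.
+
+3. When generating sparse quantized weights on the NPU (i.e. with `--device_type npu`, which generate_sparse.sh uses), comment out the `@torch.jit.script` decorator on line 168 of `${float_weight_path}/modeling_chatglm.py`.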
+
+## Running on 300I DUO
+- Enable CPU performance mode to improve inference performance
+
+  ```
+  cpupower frequency-set -g performance
+  ```
+
+### Chat test
+- Run the launch script
+  - Run the following from the \${llm_path} directory
+    ```shell
+    bash ${script_path}/run_300i_duo_pa.sh ${weight_path}
+    ```
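+- Environment variables
+  - `export BIND_CPU=1`
+    - Switch for binding CPU cores
+    - CPU binding is enabled by default
+    - If NUMA is not configured on this machine or binding fails, set BIND_CPU to 0
+  - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3`
+    - Logical NPU cores available on this machine, separated by commas
+    - For how to look up core IDs, see the 【启动脚本相关环境变量】 (launch script environment variables) section of [this README](../../../README.md)
+  - `export TP_WORLD_SIZE=2`
+    - TP size (world size) used when running the model
+    - Defaults to one card with two chips
+    - See the feature matrix for the TP sizes each model supports
+    - To run on one card with two chips, set `TP_WORLD_SIZE` to `2`
+  - `export MASTER_PORT=20030`
+    - Inter-card communication port
+    - Defaults to port 20030
+    - Avoids communication conflicts when several multi-card models run on the same machine
+    - Recommended port range: 20000-20050
+  - `export PYTHONPATH=${llm_path}:$PYTHONPATH`
+    - Adds the model repository to Python's module and package search path
+    - Replace ${llm_path} with the actual path
+  - `export INT8_FORMAT_NZ_ENABLE=1`
+    - Enable for quantized models in service (MindIE Service) deployments
+  - The following environment variables affect performance and memory; normally they do not need to be changed
+    ```shell
+    # Memory
+    export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+    export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+    # Performance
+    export HCCL_OP_BASE_FFTS_MODE_ENABLE=TRUE
+    export ATB_OPERATION_EXECUTE_ASYNC=1
+    export TASK_QUEUE_ENABLE=1
+    export HCCL_BUFFSIZE=110
+    ```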
+
+
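+## Running on 800I A2
+- Enable CPU performance mode to improve inference performance
+
+  ```
+  cpupower frequency-set -g performance
+  ```
+
+### Chat test
+
+**Running Paged Attention FP16**
+- Run the launch script
+  - Run the following from the \${llm_path} directory
+    ```shell
+    bash ${script_path}/run_800i_a2_pa.sh ${weight_path}
+    ```
+- Environment variables
+  - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+    - Logical NPU cores available on this machine, separated by commas
+    - For how to look up core IDs, see the 【启动脚本相关环境变量】 (launch script environment variables) section of [this README](../../../README.md)
+  - `export TP_WORLD_SIZE=1`
+    - TP size (world size) used when running the model
+    - Defaults to a single card
+    - See the feature matrix for the TP sizes each model supports
+  - `export MASTER_PORT=20030`
+    - Inter-card communication port
+    - Defaults to port 20030
+    - Avoids communication conflicts when several multi-card models run on the same machine
+    - Recommended port range: 20000-20050
+  - `export PYTHONPATH=${llm_path}:$PYTHONPATH`
+    - Adds the model repository to Python's module and package search path
+    - Replace ${llm_path} with the actual path
+  - `export IS_BF16=false`
+    - Whether to run inference in BF16
+    - FP16 is used by default
+  - The following environment variables affect performance and memory; normally they do not need to be changed
+    ```shell
+    # Memory
+    export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+    export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+    # Performance
+    export HCCL_OP_BASE_FFTS_MODE_ENABLE=TRUE
+    export ATB_OPERATION_EXECUTE_ASYNC=1
+    export TASK_QUEUE_ENABLE=1
+    export LCCL_ENABLE_FALLBACK=1
+    ```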
+
+
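+**Running Paged Attention BF16**
+- Not supported yet
+
+**Running Paged Attention W8A8 quantization**
+- Run the launch script
+  - Same as in "Running Paged Attention FP16"
+  - `${weight_path}` is the path to the W8A8 quantized weights
+- Environment variables
+  - See the environment variables under "Running Paged Attention FP16"
+- Unlike FP16, set the `quantize` field in `${weight_path}/config.json` of the W8A8 quantized weights to `w8a8`
+  - If config.json has no such field, add it
+
+**Running KV cache quantization**
+- Not supported yet
+
+**Running Paged Attention sparse quantization**
+- Run the launch script
+  - Same as in "Running Paged Attention FP16"
+  - `${weight_path}` is the path to the sparse quantized weights
+- Environment variables
+  - See the environment variables under "Running Paged Attention FP16"
+- Unlike FP16, set the `quantize` field in `${weight_path}/config.json` of the sparse quantized weights to `w8a8sc`
+  - If config.json has no such field, add it
+- Note: the compression algorithm is strongly hardware-dependent; currently only the 300I DUO card supports sparse quantization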
+
+
+## Accuracy testing
+- See [this README](../../../../tests/modeltest/README.md)
+
+## Performance testing
+- See [this README](../../../../tests/modeltest/README.md)
+
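+## Web interaction
+- Start the MindIE Service backend
+- Start the web backend from `${script_path}`, which contains web_requirements.txt and web_demo.patch
+  ```shell
+  # Install dependencies
+  pip install -r web_requirements.txt
+
+  # Clone the GitHub repository
+  git clone https://github.com/THUDM/ChatGLM2-6B.git
+  cd ChatGLM2-6B
+  git reset --hard 921d7e9adc69020a19169d1ba4f76c2675a2dd29
+
+  # Apply the adaptation patch
+  git apply ../web_demo.patch
+  cd ..
+  python3 ChatGLM2-6B/web_demo.py --model_path ${weight_path}
+  ```
+- If MindIE Service is not at the default 127.0.0.1:1025, pass `--mindie_sever_ip` and `--mindie_sever_port` (flag names as spelled in the patch)
+- Open the IP and port printed in the console in a browser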
+
+## FAQ
+- If `import torch_npu` fails with `xxx/libgomp.so.1: cannot allocate memory in static TLS block`, work around it by setting `LD_PRELOAD`.
+  - Example: `export LD_PRELOAD=/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD`
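The launch guidance in this README has a few checkable constraints: the TP sizes from the feature matrix (1, 2, 4, 8 on 800I A2; 1, 2, 4 on 300I DUO) and the recommended MASTER_PORT range 20000-20050. A hypothetical pre-flight check in Python follows; the rule that TP_WORLD_SIZE should not exceed the number of visible devices is an added assumption, not something this README states.

```python
# Supported TP sizes, taken from the ChatGLM2-6B feature matrix
SUPPORTED_TP = {"800I A2": {1, 2, 4, 8}, "300I DUO": {1, 2, 4}}
# Recommended MASTER_PORT range 20000-20050 (inclusive)
PORT_RANGE = range(20000, 20051)


def check_launch_env(hardware: str, env: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK (sketch)."""
    problems = []
    tp = int(env.get("TP_WORLD_SIZE", "1"))
    devices = [d for d in env.get("ASCEND_RT_VISIBLE_DEVICES", "").split(",") if d.strip()]
    port = int(env.get("MASTER_PORT", "20030"))
    if tp not in SUPPORTED_TP[hardware]:
        problems.append(f"TP_WORLD_SIZE={tp} is not supported on {hardware}")
    # Assumption: each TP rank needs its own visible device
    if tp > len(devices):
        problems.append(f"TP_WORLD_SIZE={tp} exceeds {len(devices)} visible device(s)")
    if port not in PORT_RANGE:
        problems.append(f"MASTER_PORT={port} is outside the recommended 20000-20050 range")
    return problems


if __name__ == "__main__":
    env = {"TP_WORLD_SIZE": "2", "ASCEND_RT_VISIBLE_DEVICES": "0,1", "MASTER_PORT": "20030"}
    print(check_launch_env("300I DUO", env) or "ok")
```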
--- a/mindie/examples/models/chatglm/v2_6b/calib_data.jsonl
+++ b/mindie/examples/models/chatglm/v2_6b/calib_data.jsonl
@ -0,0 +1,15 @@
+{"id": 0, "inputs_pretokenized": "编写中小学教科书的直接依据是____。\nA. 《中华人民共和国教育法》\nB. 课程计划\nC. 课程标准\nD. 课程表", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 1, "inputs_pretokenized": "下列关于课程的三种文本表现形式说法正确的是____\nA. 课程计划是由当地教育主管部门制订的\nB. 课程标准是依据课程计划制定的\nC. 课程标准的核心是实施建议\nD. 教材编写的基本方式有直线式、螺旋式、交叉式", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 2, "inputs_pretokenized": "悦悦是一名右耳失聪的残疾儿童，活动课上有时会听不清楚周老师所讲的内容，因此经常提问题。对此，周老师应当采取的措施是____。\nA. 给予悦悦更多的帮助和指导\nB. 指导家长带悦悦回家自学\nC. 建议家长将悦悦转到特殊幼儿园\nD. 照顾大多数幼儿，不理会悦悦", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 3, "inputs_pretokenized": "内流河也称“内陆河”，是指没有流入海洋的河流，大多分布在大陆内部干燥地区，上游降水或冰雪融水为其主要补给水源，最终消失于沙漠或注入内陆湖泊。下列中国内流河中，最长的是____。\nA. 塔里木河\nB. 柴达木河\nC. 尼雅河\nD. 疏勒河", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 4, "inputs_pretokenized": "学校规定学生不能烫染头发，但是小文为了彰显个性，在假期把头发染成了棕色。面对小文的情况，教师应该怎样处理？____\nA. 年轻人追求个性是合情合理的，应该宽容对待\nB. 违反学校的校规，应该严格处分\nC. 强制要求小文将头发颜色染回来才可以进校门\nD. 探明小文违反校规的原因，并对其进行劝导和教育", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 5, "inputs_pretokenized": "张老师根据自己班级的情况，为解决班级内部班干部的人际关系问题，建立和谐融洽的班级氛围，自主开发了“和谐人际”的班级课程，这体现了教师____。\nA. 是教育教学的研究者\nB. 是课程的建设者和开发者\nC. 是学生学习的促进者\nD. 是社区型的开放教师", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 6, "inputs_pretokenized": "刘老师工作很负责，学生在学校出现一点问题他就会与家长联系，在与家长沟通时他经常以前辈的姿态对待家长，对家长的教育方式指指点点。刘老师的做法____。\nA. 正确，老师就应该与家长经常沟通\nB. 正确，老师的经验比家长丰富，应该多指导家长\nC. 不正确，教师没有权利指导家长\nD. 不正确，教师应该与家长建立平等的沟通关系，尊重家长的人格", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 7, "inputs_pretokenized": "在古代印度，有一户人家经营一家棉布店销售自己手工制作的衣服。你认为这户人家属于哪个等级？____\nA. 婆罗门\nB. 刹帝利\nC. 吠舍\nD. 首陀罗", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 8, "inputs_pretokenized": "“小型分散，便于开展多种多样的活动，满足学生不同的兴趣、爱好，发展学生的才能，使学生得到更多的学习和锻炼的机会。”这种课外活动的形式是____。\nA. 科技活动\nB. 学科活动\nC. 个人活动\nD. 小组活动", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
+{"id": 9, "inputs_pretokenized": "小红每天晚上临睡前都要多次反复检查自己的书包，确保带齐了第二天需要用的教材和文具。她明知道没有这个必要，但就是控制不住。她可能出现了____。\nA. 抑郁症\nB. 焦虑症\nC. 强迫症\nD. 恐惧症", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 10, "inputs_pretokenized": "国家管理和评价课程的基础是____。\nA. 课程计划\nB. 课程标准\nC. 教学目标\nD. 教育目的", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 11, "inputs_pretokenized": "儿童坚持性发生明显质变的年龄约在____\nA. 3～4岁\nB. 4～5岁\nC. 5～6岁\nD. 6岁以后", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
+{"id": 12, "inputs_pretokenized": "《红楼梦》中人物众多、关系繁杂。为了帮助读者阅读，许多红学爱好者都在网络上发布了自己整理制作的主要人物关系图。这属于____。\nA. 纲要策略\nB. 精细加工策略\nC. 资源管理策略\nD. 监控策略", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
+{"id": 13, "inputs_pretokenized": "学期结束时，班主任王老师会对学生思想品德的发展变化情况进行评价。这项工作属于____。\nA. 工作总结\nB. 工作计划\nC. 操行评定\nD. 建立学生档案", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
+{"id": 14, "inputs_pretokenized": "人们常说：“教学有法而教无定法。”这反映了教师的劳动具有____。\nA. 连续性\nB. 示范性\nC. 长期性\nD. 创造性", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
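Each line of calib_data.jsonl is a CEval-style multiple-choice record. get_calib_dataset in quant_utils.py only tokenizes `inputs_pretokenized`, while `label` is a 0-based index into `choices_pretokenized` and should agree with `targets_pretokenized`. A small consistency check, sketched in Python; the helper names are illustrative and the sample omits the question text for brevity.

```python
import json


def load_jsonl(lines):
    """Parse JSON Lines, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def answer_letter(record):
    """Map the 0-based `label` index to its choice letter ("A".."D")."""
    return record["choices_pretokenized"][record["label"]].strip()


# Fields of record id 10 from calib_data.jsonl (inputs_pretokenized omitted)
sample = '{"id": 10, "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}'
records = load_jsonl([sample, ""])
for rec in records:
    # label and targets_pretokenized must describe the same answer
    assert answer_letter(rec) == rec["targets_pretokenized"][0]
    print(f"id={rec['id']} answer={answer_letter(rec)}")
```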
--- a/mindie/examples/models/chatglm/v2_6b/generate_sparse.sh
+++ b/mindie/examples/models/chatglm/v2_6b/generate_sparse.sh
@ -0,0 +1,17 @@
+export PYTORCH_NPU_ALLOC_CONF=expandable_segments:False
+
+disable_names="transformer.encoder.layers.0.mlp.dense_4h_to_h transformer.encoder.layers.1.self_attention.query_key_value transformer.encoder.layers.1.self_attention.dense transformer.encoder.layers.1.mlp.dense_h_to_4h transformer.encoder.layers.1.mlp.dense_4h_to_h transformer.encoder.layers.2.self_attention.query_key_value transformer.encoder.layers.2.self_attention.dense transformer.encoder.layers.2.mlp.dense_h_to_4h transformer.encoder.layers.2.mlp.dense_4h_to_h transformer.encoder.layers.3.self_attention.query_key_value transformer.encoder.layers.3.self_attention.dense transformer.encoder.layers.4.self_attention.query_key_value transformer.encoder.layers.4.self_attention.dense transformer.encoder.layers.5.self_attention.query_key_value transformer.encoder.layers.5.self_attention.dense transformer.encoder.layers.6.self_attention.query_key_value transformer.encoder.layers.6.self_attention.dense transformer.encoder.layers.7.self_attention.query_key_value transformer.encoder.layers.7.self_attention.dense transformer.encoder.layers.8.self_attention.query_key_value transformer.encoder.layers.8.self_attention.dense transformer.encoder.layers.9.self_attention.query_key_value transformer.encoder.layers.9.self_attention.dense transformer.encoder.layers.11.self_attention.query_key_value transformer.encoder.layers.11.self_attention.dense transformer.encoder.layers.14.self_attention.query_key_value transformer.encoder.layers.14.self_attention.dense transformer.encoder.layers.19.self_attention.query_key_value transformer.encoder.layers.19.self_attention.dense transformer.encoder.layers.20.mlp.dense_4h_to_h transformer.encoder.layers.27.mlp.dense_4h_to_h transformer.output_layer"
+
+weight_path=$1
+w8a8s_weight_path=$2
+w8a8sc_weight_path=${w8a8s_weight_path}/compress
+calib_data=$3
+tp_size=$4
+
+cd ${ATB_SPEED_HOME_PATH}
+
+python -m examples.convert.model_slim.quantifier --model_path ${weight_path} --save_directory ${w8a8s_weight_path} --calib_file ${calib_data} --disable_names ${disable_names} --device_type npu --is_lowbit True --w_bit 4 --a_bit 8
+
+torchrun --nproc_per_node $tp_size -m examples.convert.model_slim.sparse_compressor --model_path ${w8a8s_weight_path} --save_directory ${w8a8sc_weight_path}
+
+cp $weight_path/modeling_chatglm.py $w8a8sc_weight_path/
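generate_sparse.sh passes `${disable_names}` unquoted, so the shell splits the space-separated string into one argument per layer name. A minimal sketch of how such a list arrives in Python, assuming the quantifier declares `--disable_names` with argparse `nargs="+"` (its actual parser is not part of this diff); `shlex.split` stands in for the shell's word splitting, and the second parse shows why quoting the variable would collapse the list into a single name.

```python
import argparse
import shlex

# Hypothetical parser; the real examples.convert.model_slim.quantifier may differ
parser = argparse.ArgumentParser()
parser.add_argument("--disable_names", nargs="+", default=[])
parser.add_argument("--w_bit", type=int, default=8)
parser.add_argument("--a_bit", type=int, default=8)

disable_names = "transformer.encoder.layers.0.mlp.dense_4h_to_h transformer.output_layer"

# Unquoted expansion: each layer name becomes its own argument
args = parser.parse_args(shlex.split(f"--disable_names {disable_names} --w_bit 4 --a_bit 8"))
# Quoted expansion: the whole string becomes one (invalid) layer name
quoted = parser.parse_args(shlex.split(f'--disable_names "{disable_names}" --w_bit 4 --a_bit 8'))

print(len(args.disable_names), len(quoted.disable_names))
```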
--- a/mindie/examples/models/chatglm/v2_6b/quant_chatglm_w4a16.py
+++ b/mindie/examples/models/chatglm/v2_6b/quant_chatglm_w4a16.py
@ -0,0 +1,50 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+
+from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig, AntiOutlier
+from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
+from examples.models.chatglm.v2_6b.quant_utils \
+    import parse_args, get_model_and_tokenizer, get_calib_dataset, copy_config_files, read_dataset
+
+
+disable_names = [
+    'transformer.encoder.layers.0.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.1.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.2.self_attention.query_key_value',
+    'transformer.encoder.layers.2.mlp.dense_4h_to_h',
+    'transformer.output_layer'
+]
+
+
+def main():
+    args = parse_args()
+    fp16_path = args.model_path  # path to the original floating-point model
+    model, tokenizer = get_model_and_tokenizer(fp16_path)
+
+    calib_set = read_dataset(args.dataset_path)
+    dataset_calib = get_calib_dataset(tokenizer, calib_set[:1])
+
+    w_sym = True
+    anti_config = AntiOutlierConfig(a_bit=16, w_bit=4, anti_method="m3", dev_type="cpu", w_sym=w_sym)
+    anti_outlier = AntiOutlier(model, calib_data=dataset_calib, cfg=anti_config)
+    anti_outlier.process()
+    quant_config = QuantConfig(
+        a_bit=16, 
+        w_bit=4, 
+        disable_names=disable_names, 
+        dev_type='cpu',
+        w_sym=w_sym,
+        mm_tensor=False,
+        is_lowbit=True,
+        open_outlier=False,
+        group_size=args.group_size
+    )
+
+    calibrator = Calibrator(model, quant_config, calib_data=[], disable_level='L0')
+    calibrator.run()  # run PTQ calibration
+    calibrator.save(args.save_path, save_type=["safe_tensor"])  # "safe_tensor" saves weights in safetensors format
+    copy_config_files(fp16_path, args.save_path, 'w4a16')
+    
+
+if __name__ == '__main__':
+    main()
+    
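quant_chatglm_w4a16.py passes `group_size=args.group_size` (default 128 in quant_utils.parse_args) to QuantConfig, so each group of consecutive weights shares one scale. The sketch below shows generic symmetric per-group INT4 quantization of a single weight row in plain Python. It illustrates the idea only; it is not msmodelslim's algorithm, which also runs the m3 anti-outlier pass first and may lay out groups differently.

```python
def quantize_groups(row, group_size=128, bits=4):
    """Symmetric per-group quantization of one weight row (illustrative sketch)."""
    qmax = 2 ** (bits - 1) - 1      # 7 for INT4
    qmin = -(2 ** (bits - 1))       # -8 for INT4
    codes, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        # One scale per group; fall back to 1.0 for an all-zero group
        scale = max(abs(x) for x in group) / qmax or 1.0
        scales.append(scale)
        codes.extend(max(qmin, min(qmax, round(x / scale))) for x in group)
    return codes, scales


def dequantize_groups(codes, scales, group_size=128):
    """Reconstruct approximate float weights from codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


if __name__ == "__main__":
    row = [((i * 37) % 101 - 50) / 100 for i in range(256)]
    codes, scales = quantize_groups(row)
    restored = dequantize_groups(codes, scales)
    print(len(scales), max(abs(a - b) for a, b in zip(row, restored)))
```

Rounding keeps each reconstruction error within half a group's scale, so a smaller `group_size` means more scales to store but tighter error for groups that contain outliers.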
--- a/mindie/examples/models/chatglm/v2_6b/quant_chatglm_w8a8.py
+++ b/mindie/examples/models/chatglm/v2_6b/quant_chatglm_w8a8.py
@ -0,0 +1,59 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+
+from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
+from examples.models.chatglm.v2_6b.quant_utils \
+    import parse_args, get_model_and_tokenizer, get_calib_dataset, copy_config_files, read_dataset
+
+
+disable_names = [
+    'transformer.encoder.layers.0.self_attention.query_key_value',
+    'transformer.encoder.layers.0.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.1.self_attention.query_key_value',
+    'transformer.encoder.layers.1.mlp.dense_h_to_4h',
+    'transformer.encoder.layers.1.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.2.self_attention.query_key_value',
+    'transformer.encoder.layers.2.mlp.dense_h_to_4h',
+    'transformer.encoder.layers.2.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.3.self_attention.query_key_value',
+    'transformer.encoder.layers.4.self_attention.query_key_value',
+    'transformer.encoder.layers.5.self_attention.query_key_value',
+    'transformer.encoder.layers.6.self_attention.query_key_value',
+    'transformer.encoder.layers.7.self_attention.query_key_value',
+    'transformer.encoder.layers.8.self_attention.query_key_value',
+    'transformer.encoder.layers.9.self_attention.query_key_value',
+    'transformer.encoder.layers.11.self_attention.query_key_value',
+    'transformer.encoder.layers.14.self_attention.query_key_value',
+    'transformer.encoder.layers.19.self_attention.query_key_value',
+    'transformer.encoder.layers.20.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.27.mlp.dense_4h_to_h',
+    'transformer.output_layer'
+]
+
+quant_config = QuantConfig(
+    a_bit=8, 
+    w_bit=8, 
+    disable_names=disable_names, 
+    dev_type='cpu',
+    act_method=1,
+    pr=1.0, 
+    w_sym=True, 
+    mm_tensor=False
+)
+
+
+def main():
+    args = parse_args()
+    fp16_path = args.model_path  # path to the original floating-point model
+    model, tokenizer = get_model_and_tokenizer(fp16_path)
+
+    calib_set = read_dataset(args.dataset_path)
+    dataset_calib = get_calib_dataset(tokenizer, calib_set)
+    calibrator = Calibrator(model, quant_config, calib_data=dataset_calib, disable_level='L0')
+    calibrator.run()  # run PTQ calibration
+    calibrator.save(args.save_path, save_type=["safe_tensor"])  # "safe_tensor" saves weights in safetensors format
+    copy_config_files(fp16_path, args.save_path, 'w8a8')
+    
+
+if __name__ == '__main__':
+    main()
+    
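`disable_names` in quant_chatglm_w8a8.py lists the Linear layers kept in floating point. Assuming ChatGLM2-6B's 28 GLM blocks each expose the four Linear modules below plus `transformer.output_layer`, the sketch enumerates candidate layer names and splits them by the disable list; with the 21-entry list above, 92 of 113 layers would be quantized to W8A8. The name pattern and layer count are assumptions inferred from the names in this file, not from msmodelslim's matching logic. The unknown-name check also catches lists written for another architecture.

```python
NUM_LAYERS = 28  # assumption: ChatGLM2-6B has 28 GLM blocks
LINEAR_SUFFIXES = (
    "self_attention.query_key_value",
    "self_attention.dense",
    "mlp.dense_h_to_4h",
    "mlp.dense_4h_to_h",
)


def all_linear_names(num_layers=NUM_LAYERS):
    """Candidate Linear layer names, following the naming used in disable_names."""
    names = [f"transformer.encoder.layers.{i}.{suffix}"
             for i in range(num_layers) for suffix in LINEAR_SUFFIXES]
    names.append("transformer.output_layer")
    return names


def split_by_disable(names, disable_names):
    """Return (quantized, kept_in_float); reject names that match no layer."""
    disabled = set(disable_names)
    unknown = disabled.difference(names)
    if unknown:
        raise ValueError(f"names not found in model: {sorted(unknown)}")
    quantized = [n for n in names if n not in disabled]
    kept = [n for n in names if n in disabled]
    return quantized, kept


if __name__ == "__main__":
    example = ["transformer.encoder.layers.0.mlp.dense_4h_to_h", "transformer.output_layer"]
    quantized, kept = split_by_disable(all_linear_names(), example)
    print(len(quantized), len(kept))
```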
--- a/mindie/examples/models/chatglm/v2_6b/quant_utils.py
+++ b/mindie/examples/models/chatglm/v2_6b/quant_utils.py
@ -0,0 +1,62 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+
+import os
+import json
+import shutil
+import argparse
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+from atb_llm.utils.file_utils import safe_open
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="Creating quant weights for ChatGLM2-6B or ChatGLM3-6B")
+    parser.add_argument("--model_path", type=str, required=True, help="The path to model float weights")
+    parser.add_argument("--save_path", type=str, default="./quant_weight_glm", help="The path to save quant weights")
+    parser.add_argument("--dataset_path", type=str, required=True, help="The dataset path")
+    parser.add_argument("--group_size", type=int, default=128, help="The group size for w4a16")
+
+    return parser.parse_args()
+
+
+def get_model_and_tokenizer(model_path):
+    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_path, trust_remote_code=True) 
+    model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_path, torch_dtype=torch.float32,
+                                                 trust_remote_code=True).cpu()
+    model.eval()
+    return model, tokenizer
+
+
+def read_dataset(dataset_path):
+    calib_set = []
+    with safe_open(dataset_path, encoding='utf-8') as file:
+        for line in file:
+            calib_set.append(json.loads(line))
+    return calib_set
+
+
+# Build the calibration dataset
+def get_calib_dataset(tokenizer, calib_list, device="cpu"):  # use device="npu:0" to quantize on the NPU
+    calib_dataset = []
+    for calib_data in calib_list:
+        text = calib_data['inputs_pretokenized']
+        inputs = tokenizer([text], return_tensors='pt')
+        calib_dataset.append([
+            inputs.data['input_ids'].to(device),
+            inputs.data['position_ids'].to(device),
+            inputs.data['attention_mask'].to(device)
+            ])
+    return calib_dataset
+
+
+def copy_config_files(fp16_path, quant_path, quant_type):
+    model_files = [f for f in os.listdir(fp16_path) if f.startswith(("config", "tokeniz", "modeling_chatglm.py"))]
+    for f in model_files:
+        shutil.copy2(os.path.join(fp16_path, f), os.path.join(quant_path, f))
+    with safe_open(os.path.join(quant_path, "config.json"), 'r+', encoding='utf-8') as f:
+        config = json.load(f)
+        config['quantize'] = quant_type
+        f.seek(0)
+        json.dump(config, f, indent=4)
+        f.truncate()
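copy_config_files updates config.json in place with `r+`, `seek(0)` and `truncate()`. The truncate call matters: if the rewritten JSON is shorter than the original, the leftover bytes would otherwise remain after the closing brace and make `json.load` fail with "Extra data". The sketch below reproduces the pattern with the built-in `open` (the repository uses atb_llm's `safe_open` instead) and forces the shorter-file case with a deliberately over-indented input; `set_quantize_in_place` is a name chosen here for illustration.

```python
import json
import os
import tempfile


def set_quantize_in_place(config_path, quant_type):
    """Set config["quantize"] by rewriting the file in place (same pattern as copy_config_files)."""
    with open(config_path, "r+", encoding="utf-8") as f:
        config = json.load(f)
        config["quantize"] = quant_type
        f.seek(0)                       # rewind so the new JSON overwrites from byte 0
        json.dump(config, f, indent=4)
        f.truncate()                    # cut off leftover bytes if the new JSON is shorter


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        # indent=24 makes the original file longer than the rewritten one
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"model_type": "chatglm", "torch_dtype": "float16"}, f, indent=24)
        set_quantize_in_place(path, "w8a8")
        with open(path, encoding="utf-8") as f:
            print(json.load(f)["quantize"])
```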
--- a/mindie/examples/models/chatglm/v2_6b/run_300i_duo_pa.sh
+++ b/mindie/examples/models/chatglm/v2_6b/run_300i_duo_pa.sh
@ -0,0 +1,18 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+# See README.md in this directory for parameter settings and launch instructions
+export BIND_CPU=1
+export ASCEND_RT_VISIBLE_DEVICES=0,1
+export TP_WORLD_SIZE=2
+export MASTER_PORT=20030
+
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+
+export HCCL_OP_BASE_FFTS_MODE_ENABLE=TRUE
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export HCCL_BUFFSIZE=110
+export INT8_FORMAT_NZ_ENABLE=1
+
+export PYTHONPATH=${llm_path}:$PYTHONPATH
+torchrun --nproc_per_node $TP_WORLD_SIZE --master_port $MASTER_PORT -m examples.run_pa --model_path $1
--- a/mindie/examples/models/chatglm/v2_6b/run_800i_a2_pa.sh
+++ b/mindie/examples/models/chatglm/v2_6b/run_800i_a2_pa.sh
@ -0,0 +1,27 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+# See README.md in this directory for parameter settings and launch instructions
+export ASCEND_RT_VISIBLE_DEVICES=0
+export TP_WORLD_SIZE=1
+export MASTER_PORT=20030
+export PYTHONPATH=${llm_path}:$PYTHONPATH
+export IS_BF16=false
+
+# The following environment variables affect performance and memory; normally they do not need to be changed
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+
+export HCCL_OP_BASE_FFTS_MODE_ENABLE=TRUE
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export LCCL_ENABLE_FALLBACK=1
+
+extra_param=""
+
+# if [ "$IS_BF16" = true ]; then
+#     extra_param="${extra_param} --is_bf16"
+# fi
+
+if [ "$TP_WORLD_SIZE" == "1" ]; then
+    python -m examples.run_pa --model_path $1 $extra_param
+else
+    torchrun --nproc_per_node $TP_WORLD_SIZE --master_port $MASTER_PORT -m examples.run_pa --model_path $1 $extra_param
+fi
--- a/mindie/examples/models/chatglm/v2_6b/web_demo.patch
+++ b/mindie/examples/models/chatglm/v2_6b/web_demo.patch
@ -0,0 +1,109 @@
+diff --git a/web_demo.py b/web_demo.py
+index 1af24c9..8c0e765 100644
+--- a/web_demo.py
+++ b/web_demo.py
+@@ -1,14 +1,23 @@
+-from transformers import AutoModel, AutoTokenizer
+import json
+import argparse
+import requests
+from transformers import AutoTokenizer
+ import gradio as gr
+ import mdtex2html
+-from utils import load_model_on_gpus
+ 
+-tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
+-model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).cuda()
+-# 多显卡支持，使用下面两行代替上面一行，将num_gpus改为你实际的显卡数量
+-# from utils import load_model_on_gpus
+-# model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2)
+-model = model.eval()
+def parse_args():
+    parser = argparse.ArgumentParser(description="ChatGLM2-6B/ChatGLM3-6b web demo")
+    parser.add_argument("--model_path", type=str, required=True, help="The path to model weights")
+    parser.add_argument("--mindie_sever_ip", type=str, default="127.0.0.1", help="The IP address of mindie server")
+    parser.add_argument("--mindie_sever_port", type=int, default=1025, help="The port of mindie server")
+    parser.add_argument("--max_new_tokens", type=int, default=512, help="Max new tokens to generate")
+    parser.add_argument("--concurrency", type=int, default=10, help="Concurrency count of web demo")
+
+    return parser.parse_args()
+
+
+args = parse_args()
+tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
+ 
+ """Override Chatbot.postprocess"""
+ 
+@@ -71,6 +80,49 @@ def predict(input, chatbot, max_length, top_p, temperature, history, past_key_va
+         yield chatbot, history, past_key_values
+ 
+ 
+def build_inputs(tokenizer, query: str):
+    # history is handled internally by the MindIE service
+    prompt = tokenizer.build_prompt(query, history=None)
+    return prompt
+
+
+def request(input, chatbot, max_length, top_p, temperature, history, past_key_values):
+    chatbot.append((parse_text(input), ""))
+
+    # Apply the chat prompt template
+    prompt = build_inputs(tokenizer, input)
+
+    response = requests.post(
+        f"http://{args.mindie_sever_ip}:{args.mindie_sever_port}/generate_stream",
+        json={
+            "inputs": prompt,
+            "parameters": {
+                "max_new_tokens": max_length,
+                "do_sample": True,
+                "repetition_penalty": 1.05,
+                "seed": None,
+                "temperature": temperature,
+                # "top_k": 1,
+                "top_p": top_p,
+                "batch_size": 1
+            },
+        },
+        verify=False, stream=True
+    )
+
+    generate_text = ""
+    for line in response.iter_lines():
+        if not line:
+            continue
+        # Strip the leading 'data: ' prefix
+        res = line.decode('utf-8')[6:]
+        # Extract the streamed text chunk
+        res_text = json.loads(res).get('token').get('text')
+        generate_text += res_text
+        chatbot[-1] = (parse_text(input), parse_text(generate_text))
+        yield chatbot, history, past_key_values
+
+
+ def reset_user_input():
+     return gr.update(value='')
+ 
+@@ -92,17 +144,17 @@ with gr.Blocks() as demo:
+                 submitBtn = gr.Button("Submit", variant="primary")
+         with gr.Column(scale=1):
+             emptyBtn = gr.Button("Clear History")
+-            max_length = gr.Slider(0, 32768, value=8192, step=1.0, label="Maximum length", interactive=True)
+-            top_p = gr.Slider(0, 1, value=0.8, step=0.01, label="Top P", interactive=True)
+-            temperature = gr.Slider(0, 1, value=0.95, step=0.01, label="Temperature", interactive=True)
+            max_length = gr.Slider(1, args.max_new_tokens, value=args.max_new_tokens, step=1.0, label="Maximum New Tokens", interactive=True)
+            top_p = gr.Slider(0.01, 0.99, value=0.01, step=0.01, label="Top P", interactive=True)
+            temperature = gr.Slider(0.01, 1, value=0.01, step=0.01, label="Temperature", interactive=True)
+ 
+     history = gr.State([])
+     past_key_values = gr.State(None)
+ 
+-    submitBtn.click(predict, [user_input, chatbot, max_length, top_p, temperature, history, past_key_values],
+    submitBtn.click(request, [user_input, chatbot, max_length, top_p, temperature, history, past_key_values],
+                     [chatbot, history, past_key_values], show_progress=True)
+     submitBtn.click(reset_user_input, [], [user_input])
+ 
+     emptyBtn.click(reset_state, outputs=[chatbot, history, past_key_values], show_progress=True)
+ 
+-demo.queue().launch(share=False, inbrowser=True)
+demo.queue(concurrency_count=args.concurrency).launch(server_name='0.0.0.0', share=False, inbrowser=True)
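The patched web_demo.py reads the `/generate_stream` response line by line, drops the 6-character `data: ` prefix with `[6:]`, and appends `token.text` from each JSON event. A standalone sketch of that parsing, assuming the same `{"token": {"text": ...}}` event shape the patch relies on; it strips the prefix by name rather than by fixed offset, so a missing space after `data:` does not cost a character. `collect_stream_text` is a hypothetical helper.

```python
import json


def collect_stream_text(lines):
    """Concatenate token texts from 'data: {...}' lines of a streaming response."""
    text = ""
    for raw in lines:
        if not raw:
            continue  # skip blank keep-alive lines, as the patch does
        line = raw.decode("utf-8")
        if line.startswith("data:"):
            line = line[len("data:"):].lstrip()
        event = json.loads(line)
        text += event.get("token", {}).get("text", "")
    return text


if __name__ == "__main__":
    fake_stream = [b'data: {"token": {"text": "Hel"}}', b"", b'data: {"token": {"text": "lo"}}']
    print(collect_stream_text(fake_stream))
```

With `requests`, the same function can consume `response.iter_lines()` directly, since that yields `bytes` lines like the fake stream above.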
--- a/mindie/examples/models/chatglm/v2_6b/web_requirements.txt
+++ b/mindie/examples/models/chatglm/v2_6b/web_requirements.txt
@ -0,0 +1,3 @@
+gradio==3.39
+mdtex2html
+streamlit
--- a/mindie/examples/models/chatglm/v3_6b/README.md
+++ b/mindie/examples/models/chatglm/v3_6b/README.md
@ -0,0 +1,33 @@
+# ChatGLM3-6B Model Inference Guide <!-- omit in toc -->
+
+# Overview
+
+- ChatGLM3 is a conversational pre-trained model jointly released by Zhipu AI and the KEG Lab of Tsinghua University. ChatGLM3-6B is the open-source model in the [ChatGLM3](https://github.com/THUDM/ChatGLM3) series. It keeps many strengths of the previous two generations, such as fluent dialogue and a low deployment barrier, and adds a stronger base model, more complete feature support, and a more comprehensive open-source lineup.
+- This repository implements ChatGLM3-6B inference on NPU hardware. Used together with the acceleration library, it aims to deliver maximum inference performance on NPUs.
+
+# Feature Matrix
+- This matrix lists the features supported by the ChatGLM3-6B model
+
+| Model and Size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 Quantization | W8A16 Quantization | W4A16 Quantization | KV Cache Quantization | Sparse Quantization | MoE Quantization | MindIE | TGI | Long Sequence |
+|-------------|-------------------------|-------------------------|------|------|-----------------|-----------------|---------|--------------|----------|--------|--------|-----|-----|-----|-----|
+| ChatGLM3-6B | world size 1,2,4,8 | world size 1,2,4 | Yes | No | No | Yes | No | No | No | No | No | No | Yes | No | No |
+
+- Model versions supported by this repository
+  - [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b)
+  - [ChatGLM3-6B-32K](https://huggingface.co/THUDM/chatglm3-6b-32k)
+  - Note: for ChatGLM3-6B, the model repository at commit id `a5ba5501eb873d40d48bd0983bd2a8dd006bb838` is recommended
+
+
+# Usage
+
+- See [this README](../../chatglm/v2_6b/README.md)
+
+## Accuracy Test
+- See [this README](../../../../tests/modeltest/README.md)
+
+## Performance Test
+- See [this README](../../../../tests/modeltest/README.md)
+
+## FAQ
+- If `import torch_npu` fails with `xxx/libgomp.so.1: cannot allocate memory in static TLS block`, configure `LD_PRELOAD` to resolve it.
+  - Example: `export LD_PRELOAD=/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD`
--- a/mindie/examples/models/chinese_alpaca/README.md
+++ b/mindie/examples/models/chinese_alpaca/README.md
@ -0,0 +1,99 @@
+# README
+
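+The [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) project open-sources Chinese LLaMA models and instruction-tuned Alpaca models to further promote open research on large models in the Chinese NLP community. Building on the original LLaMA, these models extend the vocabulary with Chinese tokens and undergo secondary pre-training on Chinese data, which further improves basic Chinese semantic understanding. The Chinese Alpaca models are additionally fine-tuned on Chinese instruction data, which significantly improves their ability to understand and follow instructions.
+
+- This repository implements the Chinese-LLaMA-Alpaca model series on NPU hardware. Used together with the acceleration library, it aims to deliver maximum inference performance on NPUs.
+
+# Feature Matrix
+- This matrix lists the features supported by each Chinese-LLaMA-Alpaca model
+
+| Model and Size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 Quantization | W8A16 Quantization | KV Cache Quantization | Sparse Quantization | MoE Quantization | MindIE | TGI |
+|-------------|-------------------------|-------------------------|------|------|-----------------|-----------------|---------|---------|--------------|----------|--------|--------|-----|
+| Chinese-Alpaca-13B | world size 1,2,4,8 | world size 1,2,4 | Yes | No | No | Yes | No | No | No | No | No | No | No |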
+
+# Usage
+
+## Path Variables
+| Variable | Meaning |
+|--------|--------------------------------------------------|
+| working_dir | Directory where the acceleration library and model repository are placed after download |
+| llm_path | Path of the model repository. If a prebuilt package is used, the path is `${working_dir}/`; if the code is downloaded from gitee, the path is `${working_dir}/ModelLink/mindie_ref/mindie_llm/atb_models` |
+| script_path | Path of the scripts; the Chinese-Alpaca-13B working scripts are in `${llm_path}/examples/models/chinese_alpaca` |
+| weight_path | Path of the model weights |
+
+## Weights
+**Weight Download**
+
+- LoRA weights: [Chinese-Alpaca-Lora-13B](https://pan.baidu.com/s/1wYoSF58SnU9k0Lndd5VEYg?pwd=mm8i)
+- Original model weights: [LLaMA-13B](https://huggingface.co/huggyllama/llama-13b)
+> After downloading, be sure to check that the SHA256 checksums of the model files in the archive match; see [SHA256.md](https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/SHA256.md)
+
+**Merging LoRA Weights**
+- To merge the LoRA weights with the original model weights, follow the [merge tutorial](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/%E6%89%8B%E5%8A%A8%E6%A8%A1%E5%9E%8B%E5%90%88%E5%B9%B6%E4%B8%8E%E8%BD%AC%E6%8D%A2#%E5%A4%9Alora%E6%9D%83%E9%87%8D%E5%90%88%E5%B9%B6%E9%80%82%E7%94%A8%E4%BA%8Echinese-alpaca-plus)
+
+**Weight Conversion**
+> If the weights do not include safetensors files, perform the weight conversion step; otherwise skip it
+- See [this README](../../README.md)
+
+**Basic Environment Variables**
+- See [this README](../../../README.md)
+
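+## Inference
+
+### Chat Test
+
+**Running Paged Attention FP16**
+- Run the launch script
+  - Add `${llm_path}` to the `PYTHONPATH` search path
+    ```shell
+    export PYTHONPATH=${llm_path}:${PYTHONPATH}
+    ```
+  - Run the following command in the `${llm_path}` directory
+    ```shell
+    bash ${script_path}/run_pa.sh ${weight_path}
+    ```
+- Environment variables
+  - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+    - Specifies the logical NPU cores available on the current machine, separated by commas
+    - For how to look up core IDs, see the "Launch Script Environment Variables" section of [this README](../../README.md)
+    - On 300I DUO cards, specify at least two visible cores to use one card with two chips, or at least four visible cores to use two cards with four chips
+    - For the number of cores supported by each model, see the "Feature Matrix"
+  - `export MASTER_PORT=20030`
+    - Sets the inter-card communication port
+    - Port 20030 is used by default
+    - This avoids communication conflicts when multiple multi-card models run on the same machine at the same time
+    - Recommended port range: 20000-20050
+  - The following environment variables relate to performance and memory optimization and normally do not need to be changed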
+    ```shell
+    export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+    export INF_NAN_MODE_ENABLE=0
+    export ATB_OPERATION_EXECUTE_ASYNC=1
+    export TASK_QUEUE_ENABLE=1
+    export ATB_CONVERT_NCHW_TO_ND=1
+    export LCCL_ENABLE_FALLBACK=1
+    export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+    export ATB_CONTEXT_WORKSPACE_SIZE=0
+    ```
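The launch script `run_pa.sh` in this commit derives the number of processes by counting the commas in `ASCEND_RT_VISIBLE_DEVICES` and adding one, and passes `MASTER_PORT` to torchrun for inter-card communication. The sketch below mirrors that arithmetic in Python and, as an assumption not taken from this repository, probes the recommended 20000-20050 range for a port that is currently free on localhost. Both helper names are hypothetical; a port that is free when probed can still be taken by another process before torchrun binds it.

```python
import os
import socket


def world_size_from_devices(visible_devices: str) -> int:
    """Mirror run_pa.sh: number of commas in ASCEND_RT_VISIBLE_DEVICES plus one."""
    return visible_devices.count(",") + 1


def find_free_port(start: int = 20000, end: int = 20050) -> int:
    """Return the first port in [start, end] that can currently be bound on localhost."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is already in use, try the next one
            return port  # the socket is closed when the with-block exits
    raise RuntimeError(f"no free port in {start}-{end}")


if __name__ == "__main__":
    devices = os.environ.get("ASCEND_RT_VISIBLE_DEVICES", "0")
    print("world_size =", world_size_from_devices(devices))
    print("MASTER_PORT candidate =", find_free_port())
```

A value found this way could then be exported as `MASTER_PORT` before calling `run_pa.sh`, for example when two multi-card models share one machine.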
+
+## Accuracy Test
+- See [this README](../../../tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    bash run.sh pa_fp16 full_CEval 1 llama ${weight_path} 8
+    ```
+
+## Performance Test
+- See [this README](../../../tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 llama ${weight_path} 8
+    ```
+
+## FAQ
+- For more environment variables, see [this README](../../README.md)
+- The Python file actually executed by the chat test is `${llm_path}/examples/run_pa.py`; its parameters are described in [this README](../../README.md)
+- Before running, check the protobuf version with `pip list | grep protobuf`; if it is later than 3.20.x, downgrade it with `pip install protobuf==3.20.0`
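The protobuf check in the last FAQ item can also be done from Python. This is a minimal sketch that assumes "later than 3.20.x" means any version whose major.minor is above 3.20. `needs_protobuf_downgrade` is a hypothetical helper, while `importlib.metadata.version` comes from the standard library (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def needs_protobuf_downgrade(installed: str) -> bool:
    """True if the installed protobuf version is newer than the 3.20.x series."""
    major, minor = (int(part) for part in installed.split(".")[:2])
    return (major, minor) > (3, 20)


if __name__ == "__main__":
    try:
        installed = version("protobuf")
    except PackageNotFoundError:
        print("protobuf is not installed")
    else:
        if needs_protobuf_downgrade(installed):
            print(f"protobuf {installed} is later than 3.20.x; run: pip install protobuf==3.20.0")
        else:
            print(f"protobuf {installed} is fine")
```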
--- a/mindie/examples/models/chinese_alpaca/run_pa.sh
+++ b/mindie/examples/models/chinese_alpaca/run_pa.sh
@ -0,0 +1,23 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
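+# See README.md in the same directory for parameter settings and launch commands
+export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export MASTER_PORT=20030
+
+# The following environment variables relate to performance and memory optimization and normally do not need to be changed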
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export INF_NAN_MODE_ENABLE=0
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export ATB_CONVERT_NCHW_TO_ND=1
+export LCCL_ENABLE_FALLBACK=1
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+export ATB_CONTEXT_WORKSPACE_SIZE=0
+export INT8_FORMAT_NZ_ENABLE=1
+
+world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
+
+if [ "$world_size" == "1" ]; then
+    python -m examples.run_pa --model_path "$1"
+else
+    torchrun --nproc_per_node "$world_size" --master_port "$MASTER_PORT" -m examples.run_pa --model_path "$1"
+fi
--- a/mindie/examples/models/codegeex/v2_6b/README.md
+++ b/mindie/examples/models/codegeex/v2_6b/README.md
@ -0,0 +1,57 @@
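+# CodeGeeX2-6B Model Inference Guide <!-- omit in toc -->
+
+# Overview
+
+- [CodeGeeX2-6B](https://github.com/THUDM/CodeGeeX2) is the second-generation model of the multilingual code generation model [CodeGeeX](https://github.com/THUDM/CodeGeeX) ([KDD’23](https://arxiv.org/abs/2303.17568)). Unlike the first-generation CodeGeeX, which was trained entirely on Huawei's domestically produced Ascend chip platform, CodeGeeX2 is built on the [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) architecture with additional code pre-training. Benefiting from the stronger performance of ChatGLM2, CodeGeeX2 improves on multiple metrics (+107% over CodeGeeX; with only 6 billion parameters, it outperforms the 15-billion-parameter StarCoder-15B by nearly 10%).
+- This repository implements CodeGeeX2 inference on NPU hardware. Used together with the acceleration library, it aims to deliver maximum inference performance on NPUs.
+
+# Feature Matrix
+- This matrix lists the features supported by the CodeGeeX2-6B model
+
+| Model and Size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 Quantization | W8A16 Quantization | W4A16 Quantization | KV Cache Quantization | Sparse Quantization | MoE Quantization | MindIE | TGI | Long Sequence |
+|-------------|-------------------------|-------------------------|------|------|-----------------|-----------------|---------|--------------|----------|--------|--------|-----|-----|-----|-----|
+| CodeGeeX2-6B | world size 1,2,4,8 | world size 1,2,4 | Yes | No | No | Yes | Yes | No | No | No | No | No | Yes | Yes | No |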
+
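+- Model versions supported by this repository
+  - [CodeGeeX2-6B](https://huggingface.co/THUDM/codegeex2-6b/tree/main)
+
+
+# Usage
+
+- Before running inference, change `torch_dtype` in the config.json of the weight directory to `"float16"`
+- Apart from the "Quantized Weight Export" section, follow [this README](../../chatglm/v2_6b/README.md) for all other steps
+
+## Quantized Weight Export
+Quantized weights can be generated with msmodelslim (the Ascend model compression and acceleration tool).
+
+### Environment Setup
+For environment setup, see the msmodelslim documentation: https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/devtools/auxiliarydevtool/modelslim_0002.html
+
+### Exporting Quantized Weights
+Export the model's quantized weights with `${llm_path}/examples/models/codegeex/v2_6b/quant_codegeex2_6b_w8a8.py` (do not store the quantized weights in the same directory as the floating-point weights):
+```shell
+python quant_codegeex2_6b_w8a8.py --model_path ${float_weight_path} --save_path ${quant_weight_save_path} --dataset_path ${calib_dataset_path}
+```
+The calibration dataset is `${llm_path}/tests/modeltest/dataset/full/BoolQ/dev.jsonl`.
+
+After the export, two files should be generated: `quant_model_weight_w8a8.safetensors` and `quant_model_description_w8a8.json`.
+
+Notes:
+
+1. quant_codegeex2_6b_w8a8.py already configures a well-tuned quantization strategy; you can use it as-is when exporting quantized weights or switch to another strategy.
+
+2. When generating quantized weights, the script adds (or modifies) the `quantize` field in the config.json under the output weight path and sets it to the corresponding quantization type; currently only `w8a8` is supported.
+
+3. After completing the steps above, running the quantized model only requires replacing the weight path.
+
+4. If `OpenBLAS Warning: Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP = 1 option` appears while generating weights, set `export OMP_NUM_THREADS=1` to disable multithreading and work around the issue.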
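A minimal sketch for sanity-checking an export directory after running quant_codegeex2_6b_w8a8.py: it looks for the two files named above and for `quantize` set to `w8a8` in config.json. `check_quant_export` is a hypothetical helper; it only checks file presence and the config field, not whether the weights themselves are valid.

```python
import json
from pathlib import Path
from typing import List

# File names listed in this README for a W8A8 export
EXPECTED_FILES = (
    "quant_model_weight_w8a8.safetensors",
    "quant_model_description_w8a8.json",
)


def check_quant_export(save_path: str) -> List[str]:
    """Return problems found in a W8A8 export directory; an empty list means the checks passed."""
    root = Path(save_path)
    problems = [f"missing {name}" for name in EXPECTED_FILES if not (root / name).is_file()]
    config_file = root / "config.json"
    if not config_file.is_file():
        problems.append("missing config.json")
    else:
        # The export script is expected to set quantize to "w8a8" in config.json
        quantize = json.loads(config_file.read_text(encoding="utf-8")).get("quantize")
        if quantize != "w8a8":
            problems.append(f"config.json quantize is {quantize!r}, expected 'w8a8'")
    return problems


if __name__ == "__main__":
    import sys

    # ./quant_weight_geex is the script's default --save_path
    issues = check_quant_export(sys.argv[1] if len(sys.argv) > 1 else "./quant_weight_geex")
    print("OK" if not issues else "\n".join(issues))
```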
+
+
+## Accuracy Test
+- See [this README](../../../../tests/modeltest/README.md)
+
+## Performance Test
+- See [this README](../../../../tests/modeltest/README.md)
+
+## FAQ
+- If `import torch_npu` fails with `xxx/libgomp.so.1: cannot allocate memory in static TLS block`, configure `LD_PRELOAD` to resolve it.
+  - Example: `export LD_PRELOAD=/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD`
--- a/mindie/examples/models/codegeex/v2_6b/quant_codegeex2_6b_w8a8.py
+++ b/mindie/examples/models/codegeex/v2_6b/quant_codegeex2_6b_w8a8.py
@ -0,0 +1,93 @@
+import os
+import json
+import shutil
+import argparse
+
+from transformers import AutoTokenizer, AutoModelForCausalLM
+from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
+
+from atb_llm.utils.file_utils import safe_open
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="Creating quant weights for CodeGeeX2-6B")
+    parser.add_argument("--model_path", type=str, required=True, help="The path to model float weights")
+    parser.add_argument("--save_path", type=str, default="./quant_weight_geex", help="The path to save quant weights")
+    parser.add_argument("--dataset_path", type=str, required=True, help="The dataset path")
+
+    return parser.parse_args()
+
+
+# Build the calibration dataset from raw text lines
+def get_calib_dataset(tokenizer, calib_list, device="cpu"):  # use device="npu:0" to quantize on an NPU
+    calib_dataset = []
+    for calib_data in calib_list:
+        inputs = tokenizer(calib_data, return_tensors='pt')
+        calib_dataset.append([
+            inputs.data['input_ids'].to(device),
+            inputs.data['position_ids'].to(device),
+            inputs.data['attention_mask'].to(device)
+        ])
+    return calib_dataset
+
+
+# Layers excluded from quantization (kept in floating point)
+disable_names = [
+    'transformer.encoder.layers.0.self_attention.query_key_value',
+    'transformer.encoder.layers.0.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.1.self_attention.query_key_value',
+    'transformer.encoder.layers.1.mlp.dense_h_to_4h',
+    'transformer.encoder.layers.1.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.2.self_attention.query_key_value',
+    'transformer.encoder.layers.2.mlp.dense_h_to_4h',
+    'transformer.encoder.layers.2.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.3.self_attention.query_key_value',
+    'transformer.encoder.layers.4.self_attention.query_key_value',
+    'transformer.encoder.layers.5.self_attention.query_key_value',
+    'transformer.encoder.layers.6.self_attention.query_key_value',
+    'transformer.encoder.layers.7.self_attention.query_key_value',
+    'transformer.encoder.layers.8.self_attention.query_key_value',
+    'transformer.encoder.layers.9.self_attention.query_key_value',
+    'transformer.encoder.layers.11.self_attention.query_key_value',
+    'transformer.encoder.layers.17.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.23.mlp.dense_4h_to_h',
+    'transformer.encoder.layers.27.mlp.dense_4h_to_h',
+    'transformer.output_layer',
+]
+
+quant_config = QuantConfig(
+    a_bit=8,
+    w_bit=8,
+    disable_names=disable_names,
+    dev_type='cpu',  # to quantize on an NPU, use dev_type="npu", dev_id=0
+    act_method=3,
+    pr=1.0,
+    w_sym=True,
+    mm_tensor=False
+)
+
+
+def main():
+    args = parse_args()
+    fp16_path = args.model_path  # path to the original floating-point model
+    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=fp16_path, trust_remote_code=True)
+    model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=fp16_path, trust_remote_code=True).float().cpu()
+
+    calib_set = []
+    with safe_open(args.dataset_path, 'r', encoding='utf-8') as file:
+        calib_set = file.readlines()
+
+    dataset_calib = get_calib_dataset(tokenizer, calib_set[:5])
+    calibrator = Calibrator(model, quant_config, calib_data=dataset_calib, disable_level='L0')
+    calibrator.run()  # run PTQ calibration
+    calibrator.save(args.save_path, save_type=["safe_tensor"])  # "safe_tensor" saves safetensors weights; "numpy" saves .npy weights
+
+    model_files = [f for f in os.listdir(args.model_path) if f.startswith(("config", "tokeniz", "modeling_chatglm.py"))]
+    for f in model_files:
+        shutil.copy2(os.path.join(args.model_path, f), os.path.join(args.save_path, f))
+    with safe_open(os.path.join(args.save_path, "config.json"), 'r+', encoding='utf-8') as f:
+        config = json.load(f)
+        config['quantize'] = 'w8a8'
+        f.seek(0)
+        json.dump(config, f, indent=4)
+        f.truncate()
+
+
+if __name__ == '__main__':
+    main()
--- a/mindie/examples/models/codellama/README.md
+++ b/mindie/examples/models/codellama/README.md
@ -0,0 +1,172 @@
+# README
+
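+- [Code Llama](https://github.com/Meta-Llama/codellama) is a code-generation large language model released by Meta. For programming tasks it supports infilling and zero-shot instruction following, accepts long input sequences, and delivers advanced performance among open models. Code Llama is a code-specialized version of Llama 2, created by further training Llama 2 on code datasets and sampling more data from the same datasets for longer. In essence, Code Llama has stronger coding capabilities: it can generate code, and natural language about code, from both code and natural-language prompts (for example, "Write me a function that outputs the Fibonacci sequence"). It can also be used for code completion and debugging, and it supports many of today's most popular programming languages, including Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash.
+
+- This repository implements Code Llama inference on NPU hardware. Used together with the acceleration library, it aims to deliver maximum inference performance on NPUs.
+
+# Feature Matrix
+- This matrix lists the features supported by each CodeLlama model
+
+| Model and Size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 Quantization | W8A16 Quantization | KV Cache Quantization | Sparse Quantization | MoE Quantization | MindIE Service | TGI | Long Sequence |
+|-------------|----------------------------|-----------------------------|------|----------------------|-----------------|-----------------|---------|-----------|--------------|--------------------------|-----|--------|-----|-----|
+| CodeLlama-7B | world size 1,2,4,8 | No | Yes | No | No | Yes | No | No | No | No | No | No | No | No |
+| CodeLlama-13B | world size 1,2,4,8 | No | Yes | Yes | No | Yes | No | No | No | No | No | Yes | No | No |
+| CodeLlama-34B | world size 4,8 | world size 2,4,8 | Yes | Yes | No | Yes | Yes | No | No | Yes | No | Yes | No | No |
+| CodeLlama-70B | world size 4,8 | No | Yes | Yes | No | Yes | No | No | No | No | No | No | No | No |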
+
+# Usage
+
+## Path Variables
+| Variable | Meaning |
+|--------|--------------------------------------------------|
+| working_dir | Directory where the acceleration library and model repository are placed after download |
+| llm_path | Path of the ATB_Models repository. If a prebuilt package is used, the path is `${working_dir}/`; if the code is downloaded from gitee, the path is `${working_dir}/MindIE-LLM/examples/atb_models/` |
+| script_path | Path of the scripts; the CodeLlama working scripts are in `${llm_path}/examples/models/codellama` |
+| weight_path | Path of the model weights |
+
+## Weights
+**Weight Download**
+- [CodeLlama-7B](https://huggingface.co/codellama/CodeLlama-7b-hf)
+- [CodeLlama-13B](https://huggingface.co/codellama/CodeLlama-13b-hf)
+- [CodeLlama-34B](https://huggingface.co/codellama/CodeLlama-34b-hf)
+- [CodeLlama-70B](https://huggingface.co/codellama/CodeLlama-70b-hf)
+
+**Weight Conversion**
+> If the weights do not include safetensors files, perform the weight conversion step; otherwise skip it
+- See [this README](../../README.md)
+
+**Quantized Weight Generation**
+> Generate quantized weights from the original floating-point weights
+
+- Set environment variables
+  ```shell
+  # Set the CANN environment variables
+  source /usr/local/Ascend/ascend-toolkit/set_env.sh
+  # transformers 4.33.0 is recommended for quantized weight conversion; model inference requires transformers >= 4.33.0
+  pip uninstall transformers -y
+  pip install transformers=={version}
+  # Disable virtual memory (expandable segments) for multi-NPU quantization
+  export PYTORCH_NPU_ALLOC_CONF=expandable_segments:False
+  # Specify the logical NPU cores available on the current machine
+  export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  # Add ${llm_path} to the PYTHONPATH search path
+  export PYTHONPATH=${llm_path}:${PYTHONPATH}
+  ```
+- Generate W8A8 quantized weights with the following commands
+  - Step 1
+    - Change the `torch_dtype` field in the model weights' config.json to `float16`
+  - Step 2: generate W8A8 quantized weights
+    ```shell
+    cd ${llm_path}/examples/models/codellama
+    python convert_quant_weights.py --model_path {float_weight_path} --save_directory {w8a8_weight_path} --w_bit 8 --a_bit 8 --act_method 3 --anti_method m2 --device_type npu --calib_file ./humaneval_python.json
+    ```
+  > For notes and environment requirements for multi-NPU quantization, see the "NPU Multi-Card Quantization" section of [this README](../../README.md)
+
+- Generate sparse quantized weights with the following commands
+  > Weights generated by sparse quantization can only be used for inference on 300I DUO hardware
+  - Step 1
+    - Change the `torch_dtype` field in the model weights' config.json to `float16`
+  - Step 2: generate sparse quantized weights
+    ```shell
+    cd ${llm_path}/examples/models/codellama
+    python convert_quant_weights.py --model_path {float_weight_path} --save_directory {w8a8s_weight_path} --w_bit 4 --a_bit 8 --act_method 2 --do_smooth True --use_sigma True --is_lowbit True --device_type npu --calib_file ./humaneval_python.json
+    ```
+  - Step 3: split and compress the quantized weights
+    > Make sure the compression tool has been built before running
+    >
+    > `cd /usr/local/Ascend/ascend-toolkit/latest/python/site-packages/msmodelslim/pytorch/weight_compression/compress_graph`
+    >
+    > `bash build.sh /usr/local/Ascend/ascend-toolkit/latest`
+    ```shell
+    torchrun --nproc_per_node {tp_size} -m examples.convert.model_slim.sparse_compressor --model_path {w8a8s_weight_path} --save_directory {w8a8sc_weight_path}
+    ```
+    > `{tp_size}` is the tensor parallel (TP) degree
+    >
+    > Note: if the weights are split with TP=4 during generation, inference must also run with TP=4
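The sparse command passes `--w_bit 4 --a_bit 8 --is_lowbit True`, yet the result is labeled W8A8S. The function below is a hypothetical restatement of the rule that convert_quant_weights.py (in this commit) uses to derive `quant_type` before passing it to `modify_config`; it is a sketch for reading the flags, not part of atb_llm or msmodelslim.

```python
def quant_type_label(w_bit: int, a_bit: int, co_sparse: bool = False, is_lowbit: bool = False) -> str:
    """Hypothetical restatement of how convert_quant_weights.py derives quant_type."""
    sparse = co_sparse or is_lowbit
    # The sparse flow is requested with w_bit=4, a_bit=8, but the weights are labeled W8A8S
    if w_bit == 4 and a_bit == 8 and sparse:
        return "w8a8s"
    # Otherwise the label is built from the bit widths, with an "s" suffix for sparse runs
    return f"w{w_bit}a{a_bit}" + ("s" if sparse else "")


if __name__ == "__main__":
    print(quant_type_label(8, 8))                  # W8A8 command above -> w8a8
    print(quant_type_label(4, 8, is_lowbit=True))  # sparse command above -> w8a8s
```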
+
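+**Basic Environment Variables**
+- See [this README](../../../README.md)
+
+## Inference
+
+### Chat Test
+
+**Running Paged Attention BF16**
+- Run the launch script
+  - Add `${llm_path}` to the `PYTHONPATH` search path
+    ```shell
+    export PYTHONPATH=${llm_path}:${PYTHONPATH}
+    ```
+  - Run the following command in the `${llm_path}` directory
+    ```shell
+    bash ${script_path}/run_pa.sh ${weight_path}
+    ```
+- Environment variables
+  - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+    - Specifies the logical NPU cores available on the current machine, separated by commas
+    - For how to look up core IDs, see the "Launch Script Environment Variables" section of [this README](../../README.md)
+    - On 300I DUO cards, specify at least two visible cores to use one card with two chips, or at least four visible cores to use two cards with four chips
+    - For the number of cores supported by each model, see the "Feature Matrix"
+  - `export MASTER_PORT=20030`
+    - Sets the inter-card communication port
+    - Port 20030 is used by default
+    - This avoids communication conflicts when multiple multi-card models run on the same machine at the same time
+    - Recommended port range: 20000-20050
+  - The following environment variables relate to performance and memory optimization and normally do not need to be changed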
+    ```shell
+    export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+    export INF_NAN_MODE_ENABLE=0
+    export ATB_OPERATION_EXECUTE_ASYNC=1
+    export TASK_QUEUE_ENABLE=1
+    export ATB_CONVERT_NCHW_TO_ND=1
+    export LCCL_ENABLE_FALLBACK=1
+    export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+    export ATB_CONTEXT_WORKSPACE_SIZE=0
+    export INT8_FORMAT_NZ_ENABLE=1
+    ```
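+- 300I DUO cards do not support the BF16 data type
+
+**Running Paged Attention FP16**
+- Run the launch script
+  - Same as in "Running Paged Attention BF16"
+- Environment variables
+  - See the environment variable description in "Running Paged Attention BF16"
+- Unlike BF16, running FP16 requires changing the `torch_dtype` field in `${weight_path}/config.json` to `float16`
+
+**Running Paged Attention W8A8**
+- Generate the W8A8 quantized weights first, as described in "Quantized Weight Generation"
+- Run the launch script
+  - Same as in "Running Paged Attention BF16"
+  - `${weight_path}` is the path of the W8A8 quantized weights
+- Environment variables
+  - See the environment variable description in "Running Paged Attention BF16"
+- Unlike BF16, running the quantized model requires setting the `quantize` field in the W8A8 weights' `${weight_path}/config.json` to `w8a8`
+  - If config.json does not contain this field, add it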
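The config.json edits above (set `torch_dtype` to `float16` for FP16; add or set `quantize` to `w8a8` for W8A8) can be scripted. This is a minimal sketch assuming config.json is plain JSON; `apply_config_overrides` and `update_config_file` are hypothetical helpers, not part of atb_llm.

```python
import json
from pathlib import Path
from typing import Optional


def apply_config_overrides(config: dict, torch_dtype: Optional[str] = None,
                           quantize: Optional[str] = None) -> dict:
    """Return a copy of config with torch_dtype/quantize set; missing keys are added."""
    updated = dict(config)
    if torch_dtype is not None:
        updated["torch_dtype"] = torch_dtype
    if quantize is not None:
        updated["quantize"] = quantize
    return updated


def update_config_file(weight_path: str, torch_dtype: Optional[str] = None,
                       quantize: Optional[str] = None) -> dict:
    """Rewrite <weight_path>/config.json in place with the requested overrides."""
    config_file = Path(weight_path) / "config.json"
    config = json.loads(config_file.read_text(encoding="utf-8"))
    updated = apply_config_overrides(config, torch_dtype=torch_dtype, quantize=quantize)
    config_file.write_text(json.dumps(updated, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return updated
```

For example, `update_config_file("/path/to/w8a8_weights", torch_dtype="float16", quantize="w8a8")` would prepare a W8A8 weight directory; the path is a placeholder.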
+
+## Accuracy Test
+- See [this README](../../../tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    # Run Paged Attention BF16
+    bash run.sh pa_bf16 full_HumanEval 1 codellama ${weight_path} 8
+    # Run Paged Attention FP16
+    bash run.sh pa_fp16 full_HumanEval 1 codellama ${weight_path} 8
+    ```
+
+## Performance Test
+- See [this README](../../../tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    # Run Paged Attention BF16
+    bash run.sh pa_bf16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 codellama ${weight_path} 8
+    # Run Paged Attention FP16
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 codellama ${weight_path} 8
+    ```
+
+## FAQ
+- For more environment variables, see [this README](../../README.md)
+- The Python files actually executed by the chat test are `${llm_path}/examples/run_fa.py` and `${llm_path}/examples/run_pa.py`; their parameters are described in [this README](../../README.md)
+- Before running, check the protobuf version with `pip list | grep protobuf`; if it is later than 3.20.x, downgrade it with `pip install protobuf==3.20.0`
--- a/mindie/examples/models/codellama/convert_quant_weights.py
+++ b/mindie/examples/models/codellama/convert_quant_weights.py
@ -0,0 +1,84 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+import os
+import torch
+
+from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import QuantConfig
+from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig
+
+from atb_llm.models.llama.modeling_llama import LlamaConfig
+from atb_llm.utils.log import logger, print_log
+from examples.convert.convert_utils import copy_tokenizer_files, modify_config
+from examples.convert.model_slim.get_calibration_dataset import load_jsonl
+from examples.convert.model_slim.quantifier import parse_arguments, Quantifier
+
+
+if __name__ == "__main__":
+    args = parse_arguments()
+
+    rank = int(os.getenv("RANK", "0"))
+
+    config = LlamaConfig.from_pretrained(args.model_path)
+
+    disable_names = []
+    if args.a_bit != 16:
+        # W8A16 and W4A16 (a_bit == 16) have no fallback layers
+        num_layers = config.num_hidden_layers
+        disable_names = [f"model.layers.{layer}.mlp.down_proj" for layer in range(num_layers)]
+        disable_names.append("lm_head")
+
+    anti_outlier_config = None
+    if args.anti_method:
+        anti_outlier_config = AntiOutlierConfig(anti_method=args.anti_method, dev_type=args.device_type, dev_id=rank)
+
+    quant_config = QuantConfig(
+        a_bit=args.a_bit,
+        w_bit=args.w_bit,
+        disable_names=disable_names,
+        act_method=args.act_method,
+        w_sym=args.w_sym,
+        mm_tensor=False,
+        dev_type=args.device_type,
+        dev_id=rank,
+        pr=1.0,
+        fraction=args.fraction,
+        co_sparse=args.co_sparse,
+        do_smooth=args.do_smooth,
+        use_sigma=args.use_sigma,
+        sigma_factor=args.sigma_factor,
+        is_lowbit=args.is_lowbit,
+    )
+
+    # No calibration dataset by default
+    calibration_dataset = None
+    # If calib_file is given, use it as the calibration dataset and truncate it to calib_dataset_length
+    if args.calib_file:
+        calibration_dataset = load_jsonl(args.calib_file, key_name='prompt')
+        if args.calib_dataset_length <= len(calibration_dataset):
+            calibration_dataset = calibration_dataset[:args.calib_dataset_length]
+            print_log(rank, logger.info, f"calib_dataset_length: {args.calib_dataset_length}")
+        else:
+            print_log(rank, logger.warning,
+                      f"calib_dataset_length is too large, use default {len(calibration_dataset)}")
+    quant_weight_generator = Quantifier(
+        args.model_path, quant_config, anti_outlier_config,
+        device_type=args.device_type, tokenizer_args={"padding_side": "left"}
+    )
+    quant_weight_generator.tokenizer.pad_token_id = 2
+
+    tokenized_data = None
+    if calibration_dataset is not None:
+        dataloader = torch.utils.data.DataLoader(calibration_dataset, batch_size=4)
+        tokenized_data = quant_weight_generator.get_tokenized_data(dataloader)
+
+    quant_weight_generator.convert(tokenized_data, args.save_directory, args.disable_level)
+    # The tool takes w_bit=4, a_bit=8 for sparse quantization, so temporarily relabel quant_type as w8a8s
+    quant_type = f"w{args.w_bit}a{args.a_bit}" + ("s" if (args.co_sparse or args.is_lowbit) else "")
+    is_sparse_compress = args.w_bit == 4 and args.a_bit == 8 and (args.co_sparse or args.is_lowbit)
+    if is_sparse_compress:
+        quant_type = "w8a8s"
+
+    modify_config(
+        args.model_path, args.save_directory, config.torch_dtype,
+        quant_type
+    )
+    copy_tokenizer_files(args.model_path, args.save_directory)
--- a/mindie/examples/models/codellama/humaneval_python.json
+++ b/mindie/examples/models/codellama/humaneval_python.json
@ -0,0 +1,7 @@
+{"task_id": "HumanEval/0", "prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n", "entry_point": "has_close_elements", "canonical_solution": "    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False\n", "test": "\n\nMETADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\n    assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\n\n"}
+{"task_id": "HumanEval/1", "prompt": "from typing import List\n\n\ndef separate_paren_groups(paren_string: str) -> List[str]:\n    \"\"\" Input to this function is a string containing multiple groups of nested parentheses. Your goal is to\n    separate those group into separate strings and return the list of those.\n    Separate groups are balanced (each open brace is properly closed) and not nested within each other\n    Ignore any spaces in the input string.\n    >>> separate_paren_groups('( ) (( )) (( )( ))')\n    ['()', '(())', '(()())']\n    \"\"\"\n", "entry_point": "separate_paren_groups", "canonical_solution": "    result = []\n    current_string = []\n    current_depth = 0\n\n    for c in paren_string:\n        if c == '(':\n            current_depth += 1\n            current_string.append(c)\n        elif c == ')':\n            current_depth -= 1\n            current_string.append(c)\n\n            if current_depth == 0:\n                result.append(''.join(current_string))\n                current_string.clear()\n\n    return result\n", "test": "\n\nMETADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate('(()()) ((())) () ((())()())') == [\n        '(()())', '((()))', '()', '((())()())'\n    ]\n    assert candidate('() (()) ((())) (((())))') == [\n        '()', '(())', '((()))', '(((())))'\n    ]\n    assert candidate('(()(())((())))') == [\n        '(()(())((())))'\n    ]\n    assert candidate('( ) (( )) (( )( ))') == ['()', '(())', '(()())']\n"}
+{"task_id": "HumanEval/2", "prompt": "\n\ndef truncate_number(number: float) -> float:\n    \"\"\" Given a positive floating point number, it can be decomposed into\n    and integer part (largest integer smaller than given number) and decimals\n    (leftover part always smaller than 1).\n\n    Return the decimal part of the number.\n    >>> truncate_number(3.5)\n    0.5\n    \"\"\"\n", "entry_point": "truncate_number", "canonical_solution": "    return number % 1.0\n", "test": "\n\nMETADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate(3.5) == 0.5\n    assert abs(candidate(1.33) - 0.33) < 1e-6\n    assert abs(candidate(123.456) - 0.456) < 1e-6\n"}
+{"task_id": "HumanEval/3", "prompt": "from typing import List\n\n\ndef below_zero(operations: List[int]) -> bool:\n    \"\"\" You're given a list of deposit and withdrawal operations on a bank account that starts with\n    zero balance. Your task is to detect if at any point the balance of account fallls below zero, and\n    at that point function should return True. Otherwise it should return False.\n    >>> below_zero([1, 2, 3])\n    False\n    >>> below_zero([1, 2, -4, 5])\n    True\n    \"\"\"\n", "entry_point": "below_zero", "canonical_solution": "    balance = 0\n\n    for op in operations:\n        balance += op\n        if balance < 0:\n            return True\n\n    return False\n", "test": "\n\nMETADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate([]) == False\n    assert candidate([1, 2, -3, 1, 2, -3]) == False\n    assert candidate([1, 2, -4, 5, 6]) == True\n    assert candidate([1, -1, 2, -2, 5, -5, 4, -4]) == False\n    assert candidate([1, -1, 2, -2, 5, -5, 4, -5]) == True\n    assert candidate([1, -2, 2, -2, 5, -5, 4, -4]) == True\n"}
+{"task_id": "HumanEval/7", "prompt": "from typing import List\n\n\ndef filter_by_substring(strings: List[str], substring: str) -> List[str]:\n    \"\"\" Filter an input list of strings only for ones that contain given substring\n    >>> filter_by_substring([], 'a')\n    []\n    >>> filter_by_substring(['abc', 'bacd', 'cde', 'array'], 'a')\n    ['abc', 'bacd', 'array']\n    \"\"\"\n", "entry_point": "filter_by_substring", "canonical_solution": "    return [x for x in strings if substring in x]\n", "test": "\n\nMETADATA = {\n    'author': 'jt',\n    'dataset': 'test'\n}\n\n\ndef check(candidate):\n    assert candidate([], 'john') == []\n    assert candidate(['xxx', 'asd', 'xxy', 'john doe', 'xxxAAA', 'xxx'], 'xxx') == ['xxx', 'xxxAAA', 'xxx']\n    assert candidate(['xxx', 'asd', 'aaaxxy', 'john doe', 'xxxAAA', 'xxx'], 'xx') == ['xxx', 'aaaxxy', 'xxxAAA', 'xxx']\n    assert candidate(['grunt', 'trumpet', 'prune', 'gruesome'], 'run') == ['grunt', 'prune']\n"}
+{"task_id": "HumanEval/65", "prompt": "\ndef circular_shift(x, shift):\n    \"\"\"Circular shift the digits of the integer x, shift the digits right by shift\n    and return the result as a string.\n    If shift > number of digits, return digits reversed.\n    >>> circular_shift(12, 1)\n    \"21\"\n    >>> circular_shift(12, 2)\n    \"12\"\n    \"\"\"\n", "entry_point": "circular_shift", "canonical_solution": "    s = str(x)\n    if shift > len(s):\n        return s[::-1]\n    else:\n        return s[len(s) - shift:] + s[:len(s) - shift]\n", "test": "def check(candidate):\n\n    # Check some simple cases\n    assert candidate(100, 2) == \"001\"\n    assert candidate(12, 2) == \"12\"\n    assert candidate(97, 8) == \"79\"\n    assert candidate(12, 1) == \"21\", \"This prints if this assert fails 1 (good for debugging!)\"\n\n    # Check some edge cases that are easy to work out by hand.\n    assert candidate(11, 101) == \"11\", \"This prints if this assert fails 2 (also good for debugging!)\"\n\n"}
+{"task_id": "HumanEval/79", "prompt": "\ndef decimal_to_binary(decimal):\n    \"\"\"You will be given a number in decimal form and your task is to convert it to\n    binary format. The function should return a string, with each character representing a binary\n    number. Each character in the string will be '0' or '1'.\n\n    There will be an extra couple of characters 'db' at the beginning and at the end of the string.\n    The extra characters are there to help with the format.\n\n    Examples:\n    decimal_to_binary(15)   # returns \"db1111db\"\n    decimal_to_binary(32)   # returns \"db100000db\"\n    \"\"\"\n", "entry_point": "decimal_to_binary", "canonical_solution": "    return \"db\" + bin(decimal)[2:] + \"db\"\n", "test": "def check(candidate):\n\n    # Check some simple cases\n    assert candidate(0) == \"db0db\"\n    assert candidate(32) == \"db100000db\"\n    assert candidate(103) == \"db1100111db\"\n    assert candidate(15) == \"db1111db\", \"This prints if this assert fails 1 (good for debugging!)\"\n\n    # Check some edge cases that are easy to work out by hand.\n    assert True, \"This prints if this assert fails 2 (also good for debugging!)\"\n\n"}
--- a/mindie/examples/models/codellama/run_pa.sh
+++ b/mindie/examples/models/codellama/run_pa.sh
@ -0,0 +1,23 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+# For parameter settings and launch instructions, see the README.md in this directory
+export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export MASTER_PORT=20030
+
+# The following environment variables tune performance and memory usage; normally they do not need to be changed
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export INF_NAN_MODE_ENABLE=0
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export ATB_CONVERT_NCHW_TO_ND=1
+export LCCL_ENABLE_FALLBACK=1
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+export ATB_CONTEXT_WORKSPACE_SIZE=0
+export INT8_FORMAT_NZ_ENABLE=1
+
+world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
+
+if [ "$TP_WORLD_SIZE" == "1" ]; then
+    python -m examples.run_pa --model_path "$1"
+else
+    torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_pa --model_path "$1"
+fi
--- a/mindie/examples/models/codeshell/README.md
+++ b/mindie/examples/models/codeshell/README.md
@ -0,0 +1,33 @@
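+# CodeShell-7B Model Inference Guide <!-- omit in toc -->
+
+# Overview
+
+- [CodeShell-7B](https://github.com/WisdomShell/codeshell) is a multilingual code foundation model developed jointly by the Knowledge Computing Lab of Peking University and the AI team of Sichuan Tianfu Bank. It has 7 billion parameters, was trained on 500 billion tokens, and has a context window of 8192 tokens. According to its authors, CodeShell achieves the best performance among models of its size on the HumanEval and MBPP code benchmarks, making it a strong base for multilingual code processing and understanding.
+- This repository implements CodeShell inference for NPU hardware. Used together with the acceleration library, it aims to achieve maximum inference performance on NPUs.
+
+# Feature Matrix
+- This matrix lists the features supported by the CodeShell-7B model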
+
+| Model & size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | W4A16 quantization | KV cache quantization | Sparse quantization | MoE quantization | MindIE | TGI | Long sequence |
+|-------------|-------------------------|-------------------------|------|------|-----------------|-----------------|---------|--------------|----------|--------|--------|-----|-----|-----|-----|
+| CodeShell-7B | world size 1,2,4,8 | world size 1,2,4 | Yes | No | No | Yes | No | No | No | No | No | No | No | No | No |
+
+- Model versions adapted in this repository
+  - [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell)
+
+
+# Usage
+
+- Before running inference, set `torch_dtype` in `config.json` under the weight directory to `"float16"`
+- Set `model_type` in `config.json` to `"codeshell"`
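+
+The two `config.json` edits above can be scripted instead of made by hand. The Python sketch below is illustrative only: it assumes the weight directory holds a standard Hugging Face `config.json`, and `patch_codeshell_config` is a hypothetical helper, not part of MindIE-LLM.
+
+```python
+import json
+from pathlib import Path
+
+
+def patch_codeshell_config(weight_path: str) -> dict:
+    """Set the two fields CodeShell-7B inference expects in config.json.
+
+    Hypothetical helper: rewrites the file in place, keeps every other
+    field unchanged, and returns the updated config dict.
+    """
+    config_file = Path(weight_path) / "config.json"
+    config = json.loads(config_file.read_text(encoding="utf-8"))
+    config["torch_dtype"] = "float16"   # inference runs in FP16
+    config["model_type"] = "codeshell"  # model type used by the model repo
+    config_file.write_text(
+        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
+    )
+    return config
+```
+
+Example call (hypothetical path): `patch_codeshell_config("/data/weights/CodeShell-7B")`. The file is rewritten in place, so back it up first if you may need the original values.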
+
+
+## Accuracy Test
+- See [this README](../../../tests/modeltest/README.md)
+
+## Performance Test
+- See [this README](../../../tests/modeltest/README.md)
+
+## FAQ
+- If `import torch_npu` fails with `xxx/libgomp.so.1: cannot allocate memory in static TLS block`, preload the library by setting `LD_PRELOAD`.
+  - Example: `export LD_PRELOAD=/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD`
--- a/mindie/examples/models/deepseek/README_DeepSeek_Coder.md
+++ b/mindie/examples/models/deepseek/README_DeepSeek_Coder.md
@ -0,0 +1,112 @@
+# README
+
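+- [DeepSeek-Coder](https://github.com/deepseek-ai/DeepSeek-Coder) is a series of code language models available in 1.3B, 6.7B, 7B and 33B sizes, so users can choose the one that best fits their requirements. (The current scripts support all four sizes.)
+
+- This repository implements DeepSeek-Coder inference for NPU hardware. Used together with the acceleration library, it aims to achieve maximum inference performance on NPUs.
+
+# Feature Matrix
+- This matrix lists the features supported by each DeepSeek-Coder model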
+
+| Model & size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 (800I A2 only) | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | KV cache quantization | Sparse quantization (300I DUO only) | MoE | MindIE | TGI | Long sequence |
+|-------------|----------------------------|-----------------------------|------|----------------------|-----------------|-----------------|---------|-----------|--------------|--------------------------|-----|--------|-----|-----|
+| DeepSeek-Coder-1.3B | world size 1,2,4,8 | × | × | √ | √ | √ | × | × | × | × | × | × | × | × |
+| DeepSeek-Coder-6.7B | world size 1,2,4,8 | world size 2,4 | √ | √ | √ | √ | × | × | × | × | × | × | × | × |
+| DeepSeek-Coder-7B | world size 1,2,4,8 | world size 2,4 | √ | √ | √ | √ | × | × | × | × | × | × | × | × |
+| DeepSeek-Coder-33B | world size 4,8 | × | × | √ | √ | √ | × | × | × | × | × | × | × | × |
+
+- Model versions adapted in this repository
+  - [DeepSeek-Coder series](https://github.com/deepseek-ai/DeepSeek-Coder)
+
+# Usage
+
+## Path Variables
+| Variable | Meaning |
+|--------|--------------------------------------------------|
+| working_dir | Directory where the acceleration library and model repository are placed after download |
+| llm_path | Path of the model repository. With the prebuilt package it is `${working_dir}/MindIE-LLM/`; with code downloaded from gitee it is `${working_dir}/MindIE-LLM/examples/atb_models` |
+| script_path | Script path; the DeepSeek-Coder scripts are in `${llm_path}/examples/models/deepseek` |
+| weight_path | Model weight path |
+
+## Weights
+**Weight download**
+- [Deepseek-Coder-1.3B](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct)
+- [Deepseek-Coder-6.7B](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
+- [Deepseek-Coder-7B](https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5)
+- [Deepseek-Coder-33B](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)
+
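+**Basic environment variables**
+- See [this README](../../../README.md)
+
+**Weight conversion**
+- See [this README](../../README.md)
+
+**Quantized weight generation**
+- Not supported yet
+
+
+## Inference
+
+### Chat Test
+**Run Paged Attention FP16**
+- Run the launch script (the chat_template API requires transformers 4.34.0)
+  - Run the following command from the `${llm_path}` directory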
+    ```shell
+    bash ${script_path}/run_pa.sh ${weight_path}
+    ```
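+- A custom prompt can be set in the launch script by editing the value after `input_text`
+- Environment variables
+  - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+    - Specifies the logical NPU cores available on this machine, separated by commas
+    - For how to look up core IDs, see the "Launch script environment variables" section of [this README](../../README.md)
+    - On 300I DUO cards, specify at least two visible cores to use both chips of a single card, or at least four visible cores to use two cards (four chips)
+    - See the feature matrix for the core counts each model supports
+  - `export MASTER_PORT=20030`
+    - Sets the port used for inter-card communication
+    - Port 20030 is used by default
+    - A distinct port avoids communication conflicts when several multi-card models run on the same machine at once
+    - Recommended range: 20000-20050
+  - The following environment variables affect performance and memory optimization and normally do not need to be changed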
+    ```shell
+    export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+    export INF_NAN_MODE_ENABLE=0
+    export ATB_OPERATION_EXECUTE_ASYNC=1
+    export TASK_QUEUE_ENABLE=1
+    export ATB_CONVERT_NCHW_TO_ND=1
+    export LCCL_ENABLE_FALLBACK=1
+    export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+    export ATB_CONTEXT_WORKSPACE_SIZE=0
+    ```
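+
+For reference, `run_pa.sh` derives the process count from `ASCEND_RT_VISIBLE_DEVICES` by counting commas (`grep -o , | wc -l`, plus one), and launches with `torchrun` unless `TP_WORLD_SIZE` is set to `1` in the calling environment. The Python sketch below mirrors that logic to show which command results. `build_launch_command` and `world_size_from_devices` are hypothetical names, and the defaults are the values the script exports.
+
+```python
+import shlex
+from typing import List, Mapping
+
+
+def world_size_from_devices(visible_devices: str) -> int:
+    """Process count as run_pa.sh computes it: number of commas plus one."""
+    return visible_devices.count(",") + 1
+
+
+def build_launch_command(weight_path: str,
+                         env: Mapping[str, str],
+                         extra_param: str = "--is_chat_model") -> List[str]:
+    """Return the argv run_pa.sh would execute for the given environment."""
+    extra = shlex.split(extra_param)  # an empty string adds no argument
+    if env.get("TP_WORLD_SIZE") == "1":
+        # Single process: plain python, no torchrun
+        return ["python", "-m", "examples.run_pa", "--model_path", weight_path, *extra]
+    devices = env.get("ASCEND_RT_VISIBLE_DEVICES", "0,1,2,3,4,5,6,7")
+    return ["torchrun",
+            "--nproc_per_node", str(world_size_from_devices(devices)),
+            "--master_port", env.get("MASTER_PORT", "20030"),
+            "-m", "examples.run_pa",
+            "--model_path", weight_path, *extra]
+```
+
+For example, with `ASCEND_RT_VISIBLE_DEVICES=0,1` the sketch returns a `torchrun --nproc_per_node 2 ...` command. For that reason the visible-core list should correspond to a world size listed in the feature matrix.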
+
+## Accuracy Test
+- See [this README](../../../tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    export MAX_MEMORY_GB=29
+    bash run.sh pa_fp16 full_BoolQ 1 deepseek_coder ${deepseek-coder-1.3b_weight_path} 8
+    bash run.sh pa_fp16 full_BoolQ 1 deepseek_coder ${deepseek-coder-6.7b_weight_path} 8
+    bash run.sh pa_fp16 full_BoolQ 1 deepseek_coder ${deepseek-coder-7b_weight_path} 8
+    bash run.sh pa_fp16 full_BoolQ 1 deepseek_coder ${deepseek-coder-33b_weight_path} 8
+    ```
+- When running quantized weights or BF16, make sure the `quantize` and `torch_dtype` fields in `${weight_path}/config.json` match the weights; see [this README](../../README.md)
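+
+A quick pre-flight check can catch a mismatch before launching. The sketch below compares `config.json` against the `torch_dtype`/`quantize` pairs listed in the weight-settings table of the top-level examples README. The mode names and the `check_weight_config` helper are assumptions made for illustration.
+
+```python
+import json
+from pathlib import Path
+from typing import Dict, List, Optional, Tuple
+
+# Expected (torch_dtype, quantize) per mode, from the weight-settings table in
+# the top-level examples README; None means the "quantize" field is absent.
+EXPECTED: Dict[str, Tuple[str, Optional[str]]] = {
+    "fp16": ("float16", None),
+    "bf16": ("bfloat16", None),
+    "w8a8": ("float16", "w8a8"),
+    "w8a8s": ("float16", "w8a8s"),
+    "w8a8sc": ("float16", "w8a8sc"),
+    "w8a16": ("float16", "w8a16"),
+}
+
+
+def check_weight_config(weight_path: str, mode: str) -> List[str]:
+    """Return mismatches between config.json and the expected mode (empty if none)."""
+    config = json.loads((Path(weight_path) / "config.json").read_text(encoding="utf-8"))
+    want_dtype, want_quant = EXPECTED[mode]
+    problems = []
+    if config.get("torch_dtype") != want_dtype:
+        problems.append(f"torch_dtype is {config.get('torch_dtype')!r}, expected {want_dtype!r}")
+    if config.get("quantize") != want_quant:
+        problems.append(f"quantize is {config.get('quantize')!r}, expected {want_quant!r}")
+    return problems
+```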
+
+## Performance Test
+- See [this README](../../../tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    export MAX_MEMORY_GB=29
+    export ATB_LLM_BENCHMARK_ENABLE=1
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_coder ${deepseek-coder-1.3b_weight_path} 8
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_coder ${deepseek-coder-6.7b_weight_path} 8
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_coder ${deepseek-coder-7b_weight_path} 8
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_coder ${deepseek-coder-33b_weight_path} 8
+    ```
+- When running quantized weights or BF16, make sure the `quantize` and `torch_dtype` fields in `${weight_path}/config.json` match the weights; see [this README](../../README.md)
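+
+Each inner pair in the performance argument appears to be an input length and an output length; check the modeltest README for the authoritative meaning. The argument must not contain spaces, so generating it is less error-prone than typing it by hand. `format_shape_arg` below is a hypothetical helper.
+
+```python
+from typing import Iterable, Tuple
+
+
+def format_shape_arg(pairs: Iterable[Tuple[int, int]]) -> str:
+    """Render (input_len, output_len) pairs as a space-free nested list string."""
+    return "[" + ",".join(f"[{inp},{out}]" for inp, out in pairs) + "]"
+
+
+# Reproduces the argument used in the examples above.
+print(format_shape_arg([(2048, 2048), (1024, 1024), (512, 512), (256, 256)]))
+# prints [[2048,2048],[1024,1024],[512,512],[256,256]]
+```
+
+When you pass the result on the command line, quote it (for example `"[[2048,2048],[1024,1024]]"`). Otherwise the shell may treat the brackets as a glob pattern.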
+
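+## FAQ
+- For more environment variables, see [this README](../../README.md)
+- The chat test actually runs the Python file `${llm_path}/examples/run_pa.py`; its parameters are described in [this README](../../README.md)
+- Before running, check the protobuf version with `pip list | grep protobuf`; if it is newer than 3.20.x, run `pip install protobuf==3.20.0` to install 3.20.0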
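+
+The protobuf check can also be done from Python. This sketch reads the installed version through the standard `importlib.metadata` module. It treats anything newer than the 3.20.x series (for example 3.21.x or 4.x) as needing the downgrade. The helper names are hypothetical.
+
+```python
+import re
+from importlib.metadata import PackageNotFoundError, version
+
+
+def protobuf_needs_downgrade(installed: str) -> bool:
+    """True if `installed` is newer than the 3.20.x series (e.g. 3.21.x or 4.x)."""
+    match = re.match(r"(\d+)\.(\d+)", installed)
+    if match is None:
+        raise ValueError(f"unrecognized version string: {installed!r}")
+    major, minor = int(match.group(1)), int(match.group(2))
+    return (major, minor) > (3, 20)
+
+
+def check_installed_protobuf() -> str:
+    """Return a one-line report on the protobuf version in this environment."""
+    try:
+        installed = version("protobuf")
+    except PackageNotFoundError:
+        return "protobuf is not installed"
+    if protobuf_needs_downgrade(installed):
+        return f"protobuf {installed} is newer than 3.20.x; run: pip install protobuf==3.20.0"
+    return f"protobuf {installed} is not newer than 3.20.x"
+```
+
+Usage: `print(check_installed_protobuf())` in the same Python environment that runs `examples.run_pa`.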
--- a/mindie/examples/models/deepseek/README_deepseek_llm.md
+++ b/mindie/examples/models/deepseek/README_deepseek_llm.md
@ -0,0 +1,101 @@
+# README
+
+- [DeepSeek-LLM](https://github.com/deepseek-ai/deepseek-LLM) provides four models, 7B Base, 7B Chat, 67B Base and 67B Chat, trained on a mixed Chinese-English dataset of 2T tokens
+
+# Feature Matrix
+| Model & size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 (800I A2 only) | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | KV cache quantization | Sparse quantization (300I DUO only) | MoE | MindIE | TGI | Long sequence |
+|------------------|----------------------------|-----------------------------|------|---------------------|-----------------|-----------------|---------|-----------|--------------|------------------------|-----|--------|-----|-----|
+| DeepSeek-LLM-7B | world size 1,2,4,8 | world size 1,2,4,8 | √ | × | × | √ | × | × | × | × | × | × | × | × |
+| DeepSeek-LLM-67B | world size 8 | × | √ | × | × | √ | × | × | × | × | × | × | × | × |
+
+
+# Usage
+
+## Path Variables
+
+| Variable      | Meaning                          |
+| --------------| --------------------------------|
+| `working_dir` | Directory where the acceleration library and model repository are placed after download |
+| `llm_path`    | Path of the model repository. With the prebuilt package it is `${working_dir}/MindIE-LLM/`; with code downloaded from gitee it is `${working_dir}/MindIE-LLM/examples/atb_models` |
+| `script_path` | Script path; the DeepSeek-LLM scripts are in `${llm_path}/examples/models/deepseek` |
+| `weight_path` | Model weight path                 |
+
+## Weights
+
+### Weight Download
+- [Deepseek-LLM-7B-Base](https://huggingface.co/deepseek-ai/deepseek-llm-7b-base)
+- [Deepseek-LLM-7B-Chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat)
+- [Deepseek-LLM-67B-Base](https://huggingface.co/deepseek-ai/deepseek-llm-67b-base)
+- [Deepseek-LLM-67B-Chat](https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat)
+
+### Weight Conversion
+- Only weights in safetensors format can currently be loaded; if your weights are in bin format, convert them as described in [this README](../../README.md)
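+
+You can tell whether a weight directory needs conversion from its file extensions alone. The sketch below is an assumption-based helper (`weight_format` is hypothetical). It reports `safetensors` when any `*.safetensors` file is present and `bin` when only `*.bin` files are. In the `bin` case, convert with `${llm_path}/examples/convert/convert_weights.py` as described in the top-level README.
+
+```python
+from pathlib import Path
+
+
+def weight_format(weight_path: str) -> str:
+    """Classify a weight directory by the weight files it contains.
+
+    Returns "safetensors" if any *.safetensors file exists (loadable as-is),
+    "bin" if only *.bin files exist (conversion needed), otherwise "none".
+    """
+    directory = Path(weight_path)
+    if any(directory.glob("*.safetensors")):
+        return "safetensors"
+    if any(directory.glob("*.bin")):
+        return "bin"
+    return "none"
+```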
+
+
+## Basic Environment Variables
+- See [this README](../../../README.md)
+
+## Inference
+
+### Chat Test
+
+**Run Paged Attention FP16**
+- Run the launch script (requires `transformers` >= 4.35.0)
+  - Run the following command from the `${llm_path}` directory
+    ```shell
+    bash ${script_path}/run_pa.sh ${weight_path}
+    ```
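+- A custom prompt can be set in the launch script by editing the value after `input_text` (default: "Who is the CEO of Google?")
+- A custom output length can be set in the launch script by editing the value after `max_output_length` (default: 10)
+- If you use the "chat" weights, assign `"--is_chat_model"` to `extra_param`; if you use the "base" weights, assign an empty string to `extra_param` (the default is the chat model)
+- Environment variables
+  - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+    - Specifies the logical NPU cores available on this machine, separated by commas
+    - For how to look up core IDs, see the "Launch script environment variables" section of [this README](../../README.md)
+    - On 300I DUO cards, specify at least two visible cores to use both chips of a single card, or at least four visible cores to use two cards (four chips)
+    - See the feature matrix for the core counts each model supports
+  - `export MASTER_PORT=20030`
+    - Sets the port used for inter-card communication
+    - Port 20030 is used by default
+    - A distinct port avoids communication conflicts when several multi-card models run on the same machine at once
+    - Recommended range: 20000-20050
+  - The following environment variables affect performance and memory optimization and normally do not need to be changed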
+    ```shell
+    export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+    export INF_NAN_MODE_ENABLE=0
+    export ATB_OPERATION_EXECUTE_ASYNC=1
+    export TASK_QUEUE_ENABLE=1
+    export ATB_CONVERT_NCHW_TO_ND=1
+    export LCCL_ENABLE_FALLBACK=1
+    export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+    export ATB_CONTEXT_WORKSPACE_SIZE=1
+    export INT8_FORMAT_NZ_ENABLE=1
+    ```
+
+## Accuracy Test
+- See [this README](../../../tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    export MAX_MEMORY_GB=29
+    bash run.sh pa_fp16 full_BoolQ 1 deepseek_llm ${deepseek-llm-7b-base_weight_path} 2
+    bash run.sh pa_fp16 full_BoolQ 1 deepseek_llm ${deepseek-llm-67b-base_weight_path} 8
+    ```
+
+## Performance Test
+- See [this README](../../../tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    export MAX_MEMORY_GB=29
+    export ATB_LLM_BENCHMARK_ENABLE=1
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_llm ${deepseek-llm-7b-base_weight_path} 2
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_llm ${deepseek-llm-67b-base_weight_path} 8
+    ```
+
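+## FAQ
+- For more environment variables, see [this README](../../README.md)
+- The chat test actually runs the Python file `${llm_path}/examples/run_pa.py`; its parameters are described in [this README](../../README.md)
+- Before running, check the `protobuf` version with `pip list | grep protobuf`; if it is newer than 3.20.x, run `pip install protobuf==3.20.0` to install 3.20.0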
--- a/mindie/examples/models/deepseek/README_deepseek_moe.md
+++ b/mindie/examples/models/deepseek/README_deepseek_moe.md
@ -0,0 +1,103 @@
+# README
+
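+- DeepSeekMoE 16B is a Mixture-of-Experts (MoE) language model with 16.4B parameters. It relies on two main innovations: fine-grained expert segmentation and shared experts. It reaches accuracy comparable to DeepSeek 7B and Llama2 7B with about 40% of their computation. (The current scripts support 16B-Base and 16B-Chat.)
+- This repository implements DeepSeek-MoE inference for NPU hardware. Used together with the acceleration library, it aims to achieve maximum inference performance on NPUs.
+
+# Feature Matrix
+- This matrix lists the features supported by each DeepSeek-MoE model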
+
+| Model & size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 (800I A2 only) | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | W4A16 quantization | KV cache quantization | Sparse quantization (300I DUO only) | MindIE | TGI | Long sequence |
+|-------------|----------------------------|-----------------------------|------|----------------------|-----------------|-----------------|---------|-----------|-----------|--------------|--------------------------|--------|-----|-----|
+| DeepSeek-MoE-16B-Chat | world size 4,8 | × | √ | × | √ | √ | × | × | × | × | × | √ | × | × |
+
+# Usage
+
+## Path Variables
+
+| Variable    | Meaning |
+| ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| working_dir | Directory where the acceleration library and model repository are placed after download |
+| llm_path    | Path of the model repository. With the prebuilt package it is `${working_dir}/MindIE-LLM/`; with code downloaded from gitee it is `${working_dir}/MindIE-LLM/examples/atb_models` |
+| script_path | Script path; the DeepSeek-MoE scripts are in `${llm_path}/examples/models/deepseek` |
+| weight_path | Model weight path |
+
+## Weights
+
+**Weight download**
+
+- [Deepseek-MoE-16B-Base](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base)
+- [Deepseek-MoE-16B-Chat](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat)
+
+**Basic environment variables**
+
+- See [this README](../../../README.md)
+
+## Inference
+
+### Chat Test
+
+**Run Paged Attention FP16**
+
+- Run the launch script (requires transformers 4.36.2)
+  - Run the following command from the `${llm_path}` directory
+    ```shell
+    bash ${script_path}/run_pa_deepseek_moe.sh ${weight_path}
+    ```
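+- A custom prompt can be set in the launch script by editing the value after `input_text` (default: "Who is the CEO of Google?")
+- A custom output length can be set in the launch script by editing the value after `max_output_length` (default: 10)
+- If you use the "chat" weights, assign `"--is_chat_model"` to `extra_param`; if you use the "base" weights, assign an empty string to `extra_param` (the default is the chat model)
+- Environment variables
+  - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
+    - Specifies the logical NPU cores available on this machine, separated by commas
+    - For how to look up core IDs, see the "Launch script environment variables" section of [this README](../../README.md)
+    - On 300I DUO cards, specify at least two visible cores to use both chips of a single card, or at least four visible cores to use two cards (four chips)
+    - See the feature matrix for the core counts each model supports
+  - `export MASTER_PORT=20030`
+    - Sets the port used for inter-card communication
+    - Port 20030 is used by default
+    - A distinct port avoids communication conflicts when several multi-card models run on the same machine at once
+    - Recommended range: 20000-20050
+  - The following environment variables affect performance and memory optimization and normally do not need to be changed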
+    ```shell
+    export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+    export INF_NAN_MODE_ENABLE=0
+    export ATB_OPERATION_EXECUTE_ASYNC=1
+    export TASK_QUEUE_ENABLE=1
+    export ATB_CONVERT_NCHW_TO_ND=1
+    export LCCL_ENABLE_FALLBACK=1
+    export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+    export ATB_CONTEXT_WORKSPACE_SIZE=1
+    export INT8_FORMAT_NZ_ENABLE=1
+    export ATB_LLM_ENABLE_AUTO_TRANSPOSE=0
+    ```
+
+## Accuracy Test
+
+- See [this README](../../../tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    export MAX_MEMORY_GB=29
+    bash run.sh pa_fp16 full_BoolQ 1 deepseek ${deepseek-moe-16b-base_weight_path} 8
+    bash run.sh pa_fp16 full_BoolQ 1 deepseek ${deepseek-moe-16b-chat_weight_path} 8
+    ```
+
+## Performance Test
+
+- See [this README](../../../tests/modeltest/README.md)
+  - Example
+    ```shell
+    cd ${llm_path}/tests/modeltest
+    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+    export MAX_MEMORY_GB=29
+    export ATB_LLM_BENCHMARK_ENABLE=1
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek ${deepseek-moe-16b-base_weight_path} 8
+    bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek ${deepseek-moe-16b-chat_weight_path} 8
+    ```
+
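+## FAQ
+
+- For more environment variables, see [this README](../../README.md)
+- The chat test actually runs the Python file `${llm_path}/examples/run_pa.py`; its parameters are described in [this README](../../README.md)
+- Before running, check the protobuf version with `pip list | grep protobuf`; if it is newer than 3.20.x, run `pip install protobuf==3.20.0` to install 3.20.0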
--- a/mindie/examples/models/deepseek/run_pa.sh
+++ b/mindie/examples/models/deepseek/run_pa.sh
@ -0,0 +1,26 @@
+# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+# For parameter settings and launch instructions, see the README files in this directory
+export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export MASTER_PORT=20030
+
+# The following environment variables tune performance and memory usage; normally they do not need to be changed
+export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
+export INF_NAN_MODE_ENABLE=0
+export ATB_OPERATION_EXECUTE_ASYNC=1
+export TASK_QUEUE_ENABLE=1
+export ATB_CONVERT_NCHW_TO_ND=1
+export LCCL_ENABLE_FALLBACK=1
+export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
+export ATB_CONTEXT_WORKSPACE_SIZE=1
+export INT8_FORMAT_NZ_ENABLE=1
+
+
+extra_param="--is_chat_model"
+world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
+
+
+if [ "$TP_WORLD_SIZE" == "1" ]; then
+    python -m examples.run_pa --model_path "$1" $extra_param
+else
+    torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_pa --model_path "$1" $extra_param
+fi
--- a/Show More
+++ b/Show More
@ -0,0 +1 @@
+[{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What's deep learning?"}, {"role": "assistant", "content": "Deep learning is a subset of machine learning that uses artificial neural networks to learn from data."}, {"role": "user", "content": "Can you explain in more detail?"}]