add: add mindie file

This commit is contained in:
wql 2024-09-10 15:38:33 +08:00
parent a61372ee0f
commit faa909dcc3
193 changed files with 24234 additions and 0 deletions

mindie/examples/README.md Normal file
@@ -0,0 +1,220 @@
# README
- This README describes the scripts shared by all models and how to use them.
## Path variables
| Variable | Meaning |
|--------|--------------------------------------------------|
| working_dir | Directory where the acceleration library and model repository are placed after download |
| llm_path | Path of the model repository. For the prebuilt package this is `${working_dir}/MindIE-LLM/`; for code downloaded from gitee it is `${working_dir}/MindIE-LLM/examples/atb_models` |
| weight_path | Path of the model weights |
| w8a8s_weight_path | Path of the sparse-quantized weights |
| w8a8sc_weight_path | Path of the sparse-quantized weights after splitting and compression |
| cur_dir | Path from which the command or script is run (current directory) |
## Weights
### Weight configuration
- The `torch_dtype` and `quantize` fields in `${weight_path}/config.json` must be set to identify the quantization type and precision of the weights
- If the `torch_dtype` or `quantize` field is missing, add it
- Configuration

| Quantization type and precision | torch_dtype | quantize |
|----------------|-------------|----------|
| FP16 | "float16" | (none) |
| BF16 | "bfloat16" | (none) |
| W8A8 | "float16" | "w8a8" |
| W8A8S | "float16" | "w8a8s" |
| W8A8SC | "float16" | "w8a8sc" |
| W8A16 | "float16" | "w8a16" |
- Examples
- LLaMa model weights using BF16 precision, unquantized
```json
{
"architectures": [
"LlamaForCausalLM"
],
...
"torch_dtype": "bfloat16",
...
}
```
- LLaMa model weights using FP16 precision with W8A16 quantization
```json
{
"architectures": [
"LlamaForCausalLM"
],
...
"torch_dtype": "float16",
...
"quantize": "w8a16",
}
```
### Weight conversion
> Currently only weight files in safetensors format can be loaded
> If the downloaded weights already include safetensors files, no conversion is needed
> If the environment only contains weights in bin format, convert them as follows
> If the environment contains no model weights, download them from the Hugging Face website
- Use `${llm_path}/examples/convert/convert_weights.py` to convert bin files to safetensors format
- Example
```shell
cd ${llm_path}
python examples/convert/convert_weights.py --model_path ${weight_path}
```
- Note: the command above must be run from inside `${llm_path}`; otherwise the relative paths in the script cause "module not found" errors
- The output is saved in the same directory as the bin weights
### NPU multi-card quantization
- Requirements
    - Hardware: 910A or 910B environment
    - Matching PyTorch and PTA versions, 2.1 or later
    - CANN >= 8.0.RC2.B010
    - accelerate >= 0.28.0
    - Disable virtual memory: set the `PYTORCH_NPU_ALLOC_CONF` environment variable to `expandable_segments:False` (virtual memory is disabled by default)
- When calling the `${llm_path}/examples/convert/model_slim/quantifier.py` script, set the `--device_type` parameter to `npu`
- Parameter settings and run commands are documented in each model's README; a sketch follows below
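- Example: a minimal sketch of an NPU quantization run; `${npu_weight_path}` is a placeholder output directory, and the authoritative arguments are those in each model's README
```shell
# Illustrative sketch only; values and paths are placeholders
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:False
cd ${llm_path}
python -m examples.convert.model_slim.quantifier \
    --model_path ${weight_path} \
    --save_directory ${npu_weight_path} \
    --device_type npu
```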
### Generating sparse-quantized weights
- Step 1: generate the sparse-quantized weights
```shell
cd ${llm_path}
python -m examples.convert.model_slim.quantifier --model_path ${weight_path} --save_directory ${w8a8s_weight_path} --w_bit 4 --a_bit 8 --calib_dataset_type TeacherQualification --fraction 0.011 --co_sparse True
```
- Parameter settings follow the description in each model's README
- Step 2: split and compress the quantized weights
```shell
torchrun --nproc_per_node {TP_count} -m examples.convert.model_slim.sparse_compressor --model_path ${w8a8s_weight_path} --save_directory ${w8a8sc_weight_path}
```
- `TP_count` is the tensor-parallel degree (number of parallel ranks)
- Note: if the weights were generated with a TP=4 split, inference must also run with TP=4
- Example
```shell
torchrun --nproc_per_node 2 -m examples.convert.model_slim.sparse_compressor --model_path /data1/weights/model_slim/llama2-7b_w8a8s --save_directory /data1/weights/model_slim/llama2-7b_w8a8sc_temp
```
## Launch scripts
- The Flash Attention launch script is `${llm_path}/examples/run_fa.py`
- The Paged Attention launch script is `${llm_path}/examples/run_pa.py`
### Environment variables for the launch scripts
- `ASCEND_RT_VISIBLE_DEVICES`
    - Specifies the logical NPU cores available on the current machine, separated by commas
    - Core ids must be looked up with the `npu-smi info` command
        - On Atlas 800I A2 servers, read the NPU column of the output
        ![npu_smi_info](../images/npu_smi_info_800i_a2.png)
        - On Atlas 300I DUO servers, read the Device column of the output
        ![npu_smi_info](../images/npu_smi_info_300i_duo.png)
    - To use one card with two chips, make at least two cores visible; to use two cards with four chips, at least four
- `BIND_CPU`
    - Switch for CPU core binding
    - Set to 1 to bind cores, 0 to disable; binding is enabled by default
    - If the machine has no NUMA configuration or core binding fails, set BIND_CPU to 0
- `PROFILING_LEVEL`
    - Sets the ProfilerLevel; defaults to 0
- `ATB_PROFILING_ENABLE`
    - Whether to dump performance profiling files
    - Set to 1 to generate profiling files, 0 to disable; disabled by default
- `PROFILING_FILEPATH`
    - Path of the profiling files, if generated
    - Defaults to `${cur_dir}/profiling`
- `ATB_LLM_BENCHMARK_ENABLE`
    - Whether to collect end-to-end and per-token performance statistics
    - Set to 1 to collect timings, 0 to disable; disabled by default
- `ATB_LLM_BENCHMARK_FILEPATH`
    - Save path of the performance data
    - Defaults to `${cur_dir}/benchmark_result/benchmark.csv`
- `ATB_LLM_LCOC_ENABLE`
    - Whether to enable communication-computation overlap
    - Enabling it improves performance in the prefill phase
- `ATB_LLM_LOGITS_SAVE_ENABLE`
    - Whether to save the logits of each token; each token's logits are saved as a separate pth file
    - Set to 1 to save, 0 to disable; disabled by default
- `ATB_LLM_LOGITS_SAVE_FOLDER`
    - Save path of the logits
    - Defaults to `${cur_dir}`
- `ATB_LLM_TOKEN_IDS_SAVE_ENABLE`
    - Whether to save the id of each token; input and output tokens are saved as two separate files
    - Set to 1 to save, 0 to disable; disabled by default
- `ATB_LLM_TOKEN_IDS_SAVE_FOLDER`
    - Save path of the token ids
    - Defaults to `${cur_dir}`
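- Example: a minimal sketch of a two-chip launch environment combining the variables above (all values are illustrative, not prescriptive)
```shell
# Illustrative values; adjust device ids and paths to your machine
export ASCEND_RT_VISIBLE_DEVICES=0,1
export BIND_CPU=1
export ATB_LLM_BENCHMARK_ENABLE=1
export ATB_LLM_BENCHMARK_FILEPATH=${cur_dir}/benchmark_result/benchmark.csv
export ATB_LLM_LCOC_ENABLE=1
```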
### run_fa.py script parameters
- `--model_path`
    - Path of the model weights
- `--input_text`
    - Input prompt(s)
    - Accepts a string or a list of strings
    - If a string, the inference input is replicated according to the batch size argument
    - If a list, the batch size argument is ignored and the effective batch size is the length of the list
- `--max_input_length`
    - Maximum input length
    - Defaults to 512 tokens
    - Inputs shorter than 512 tokens are automatically padded
- `--max_output_length`
    - Maximum output length
    - Defaults to 20 output tokens
- `--batch_size`
    - Fixed batch size used during inference
    - Defaults to a single batch
- `--is_flash_causal_lm`
    - Whether to use Paged Attention; disabled by default
- Example
```shell
# Run Flash Attention on multiple devices: set the model weight path, output length of 2048 tokens, BF16 precision
torchrun --nproc_per_node 2 --master_port 20038 -m examples.run_fa --model_path ${weight_path} --max_output_length 2048 --is_bf16
```
### run_pa.py script parameters
- `--model_path`
    - Path of the model weights
- `--input_text`
    - Input prompt(s)
    - Accepts a string or a list of strings
    - If a single-element list or a string, the inference input is replicated according to the batch size argument
    - If a multi-element list, the batch size argument is ignored and the effective batch size is the length of the list
- `--input_file`
    - Currently only jsonl files are supported; each line must be conversation data in List[Dict] format, ordered chronologically (see the example after this list)
    - Each Dict must contain at least the "role" and "content" fields
- `--max_position_embeddings`
    - Maximum input length the model accepts
    - Read from the config file of the model weights by default
- `--max_output_length`
    - Maximum output length
    - Defaults to 20 output tokens
- `--max_prefill_tokens`
    - Maximum input length during the prefill phase
    - Defaults to 4096 tokens
- `--max_batch_size`
    - Maximum batch size; the actual batch size varies dynamically and may not reach the configured maximum
    - Defaults to a single batch
- `--is_flash_model`
    - Whether to use Paged Attention; enabled by default
- `--is_chat_model`
    - A store_true flag; if present, the model is treated as a chat model
    - Conversation data of type List[Dict] is read from input_file (currently jsonl only)
    - If input_file is not given, the text in input_text is assembled into conversation data automatically
- `--chat_template`
    - Defaults to None; only takes effect when is_chat_model is set
    - If set to a file name while is_chat_model is set, the jinja chat template is read from that file
    - If set to a string while is_chat_model is set, the string itself is parsed as a jinja chat template
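- Example of a single `--input_file` line, as a sketch (the conversation content itself is hypothetical):
```json
[{"role": "user", "content": "What's deep learning?"}, {"role": "assistant", "content": "A family of machine learning methods based on deep neural networks."}, {"role": "user", "content": "Give one application."}]
```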
- Example
```shell
# Run Paged Attention on multiple devices: set the model weight path, output length of 2048 tokens
torchrun --nproc_per_node 2 --master_port 20038 -m examples.run_pa --model_path ${weight_path} --max_output_length 2048
```
### Special scenarios
- Multiple users on one machine
    - On 300I DUO and 800I A2, when several users share one machine, the communication operators exchange data through shared memory, so each user must set the following environment variable to keep the shared memory segments apart:
```shell
export ATB_SHARE_MEMORY_NAME_SUFFIX="user1"
```
    - In a multi-user scenario, e.g. a 300I DUO machine with 4 cards each running its own model inference task, set the variable above to a different value per task, such as `user1` and `user2`
- The following environment variable must be enabled on 300I DUO cards:
```shell
export INT8_FORMAT_NZ_ENABLE=1
```

mindie/examples/convert/convert_utils.py Normal file
@@ -0,0 +1,27 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
import json
import os.path
import shutil
from atb_llm.utils.file_utils import safe_open

def copy_tokenizer_files(model_dir, dest_dir):
    os.makedirs(dest_dir, exist_ok=True)
    for filename in os.listdir(model_dir):
        # Copy tokenizer.*, tokenization*.py and special_tokens_map.json files
        if 'tokenizer' in filename or 'tokenization' in filename or 'special_tokens_map' in filename:
            src_filepath = os.path.join(model_dir, filename)
            dest_filepath = os.path.join(dest_dir, filename)
            shutil.copyfile(src_filepath, dest_filepath)

def modify_config(model_dir, dest_dir, torch_dtype, quantize_type, kv_quant_type=False):
    src_config_filepath = os.path.join(model_dir, 'config.json')
    with open(src_config_filepath, 'r', encoding='utf-8') as fr:
        data = json.load(fr)
    data['torch_dtype'] = str(torch_dtype).split(".")[1]  # e.g. torch.float16 -> "float16"
    data['quantize'] = quantize_type
    if kv_quant_type:
        data['kv_quant'] = "C8"  # currently C8 is the only supported kv cache quantization type
    dest_config_filepath = os.path.join(dest_dir, 'config.json')
    with safe_open(dest_config_filepath, 'w', encoding='utf-8', is_exist_ok=False) as fw:
        json.dump(data, fw, indent=4)

mindie/examples/convert/convert_weights.py Normal file
@@ -0,0 +1,41 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
import argparse
from atb_llm.utils.convert import convert_files
from atb_llm.utils.hub import weight_files
from atb_llm.utils.log import logger

def parse_arguments():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_path', help="model and tokenizer path")
    return parser.parse_args()

def convert_bin2st(model_path):
    local_pt_files = weight_files(model_path, revision=None, extension=".bin")
    # "pytorch_model-00001-of-00002.bin" -> "model-00001-of-00002.safetensors";
    # removeprefix (not lstrip) so only the literal "pytorch_" prefix is dropped
    local_st_files = [
        p.parent / f"{p.stem.removeprefix('pytorch_')}.safetensors"
        for p in local_pt_files
    ]
    convert_files(local_pt_files, local_st_files, discard_names=[])
    _ = weight_files(model_path)  # verify the safetensors files are now discoverable

def convert_bin2st_from_pretrained(model_path):
    from transformers import AutoModelForCausalLM
    model = AutoModelForCausalLM.from_pretrained(
        pretrained_model_name_or_path=model_path,
        low_cpu_mem_usage=True,
        torch_dtype="auto")
    model.save_pretrained(model_path, safe_serialization=True)

if __name__ == '__main__':
    args = parse_arguments()
    try:
        convert_bin2st(args.model_path)
    except RuntimeError:
        logger.warning('convert weights failed with torch.load method, need model loaded to convert')
        convert_bin2st_from_pretrained(args.model_path)

@@ -0,0 +1,50 @@
{"id": 0, "inputs_pretokenized": "Ghost in the Shell -- Animation studio Production I.G has produced several different anime adaptations of Ghost in the Shell, starting with the 1995 film of the same name, telling the story of Section 9's investigation of the Puppet Master. The television series Ghost in the Shell: Stand Alone Complex followed in 2002, telling an alternate story from the manga and first film, featuring Section 9's investigations of government corruption in the Laughing Man and Individual Eleven incidents. A sequel to the 1995 film, Ghost in the Shell 2: Innocence, was released in 2004. In 2006, the film Ghost in the Shell: Stand Alone Complex - Solid State Society retook the story of the television series. 2013 saw the start of the Ghost in the Shell: Arise original video animation (OVA) series, consisting of four parts through mid-2014. The series was recompiled in early 2015 as a television series titled Ghost in the Shell: Arise - Alternative Architecture, airing with an additional two episodes (one part). An animated feature film produced by most of the Arise staff, titled Ghost in the Shell: The New Movie, was released on June 20, 2015. A live-action American film of the same name was released on March 31, 2017.\nQuestion: is ghost in the shell based on the anime?\nAnswer:"}
{"id": 1, "inputs_pretokenized": "The Walking Dead (season 8) -- The eighth season of The Walking Dead, an American post-apocalyptic horror television series on AMC, premiered on October 22, 2017, and concluded on April 15, 2018, consisting of 16 episodes. Developed for television by Frank Darabont, the series is based on the eponymous series of comic books by Robert Kirkman, Tony Moore, and Charlie Adlard. The executive producers are Kirkman, David Alpert, Scott M. Gimple, Greg Nicotero, Tom Luse, and Gale Anne Hurd, with Gimple as showrunner for his fifth and final season. The eighth season received positive reviews from critics. It was nominated for multiple awards and won two, including Best Horror Television Series for the third consecutive year, at the 44th Saturn Awards.\nQuestion: is there gonna be a season 8 of the walking dead?\nAnswer:"}
{"id": 2, "inputs_pretokenized": "Onyx -- Brazilian green onyx was often used as plinths for art deco sculptures created in the 1920s and 1930s. The German sculptor Ferdinand Preiss used Brazilian green onyx for the base on the majority of his chryselephantine sculptures. Green onyx was also used for trays and pin dishes -- produced mainly in Austria -- often with small bronze animals or figures attached.\nQuestion: is there such a thing as green onyx?\nAnswer:"}
{"id": 3, "inputs_pretokenized": "Wachovia -- The acquisition of Wachovia by Wells Fargo was completed on December 31, 2008 after a government-forced sale to avoid Wachovia's failure. The Wachovia brand was absorbed into the Wells Fargo brand in a process that lasted three years: on October 15, 2011, the last Wachovia branches in North Carolina were converted to Wells Fargo.\nQuestion: is wells fargo and wachovia the same bank?\nAnswer:"}
{"id": 4, "inputs_pretokenized": "Friday Night Lights (film) -- Friday Night Lights is a 2004 American sports drama film, directed by Peter Berg, which 'dramatized' the coach and players of a high school football team in the Texas city of Odessa that supported and was obsessed with them. The book on which it was based, Friday Night Lights: A Town, a Team, and a Dream (1990) by H.G. Bissinger, followed the story of the 1988 Permian High School Panthers football team as they made a run towards the state championship. A television series of the same name premiered on October 3, 2006 on NBC. The film won the Best Sports Movie ESPY Award and was ranked number 37 on Entertainment Weekly's list of the Best High School Movies.\nQuestion: is friday night lights movie based on a true story?\nAnswer:"}
{"id": 5, "inputs_pretokenized": "Peace bond -- The use of peace bonds is rather uncommon in the U.S. justice system, but a deferred prosecution has a similar effect. Since there is no conviction or admission of any guilt, signing a peace bond in Canada does not usually result in U.S. inadmissibility under INA \u00a7 212 (a) (2).\nQuestion: is a peace bond an admission of guilt?\nAnswer:"}
{"id": 6, "inputs_pretokenized": "Eating mucus -- Mucophagy, despite its benefits on one's immunity, comes with some health risks due to the potential physical aggravation resulting from the action of nose picking, and the germs on fingers and in mucus. Picking one's nose can cause upper airway irritation as well as other injuries including nasal septal perforation (a ``through-and-through defect'' of the cartilage separating the nostrils), and epistaxis (nosebleed). In a study by Andrade and Srihari, 25% of subjects were ailed by nose bleeds, 17% with nasal infections, and 2% with damage more serious than bleeding. W. Buzina studied the fungal diversity in nasal mucus in 2003. 104 samples were gathered with 331 identifiable strains of fungi and 9 different species per patient.\nQuestion: does eating your boogers improve your immune system?\nAnswer:"}
{"id": 7, "inputs_pretokenized": "High-altitude flatus expulsion -- High-altitude flatus expulsion (HAFE) is a gastrointestinal syndrome which involves the spontaneous passage of increased quantities of rectal gases at high altitudes. First described by Joseph Hamel in c. 1820 and occasionally described afterward, a landmark study of this phenomenon was published in 1981 by Paul Auerbach and York Miller.\nQuestion: do you have more gas at higher altitudes?\nAnswer:"}
{"id": 8, "inputs_pretokenized": "Big Boss (Metal Gear) -- Big Boss is one of the central characters in the Metal Gear video game series. He was introduced in the original Metal Gear games for the MSX2 as the commanding officer and subsequent nemesis of Solid Snake. He is later featured as Naked Snake, the protagonist of Metal Gear Solid prequels where he is initially depicted as an American Special Forces Operator and decorated war hero until political manipulations cause him to be disillusioned and start his own private mercenary company. Big Boss's character has been praised by video game publications for his role as a villain as well for his relationship with Solid Snake. As the series' chronology progressed, his exact allegiance and motivations became increasingly complex; his first appearances are depicted as a traitor dreaming of a world of perpetual war, but subsequent appearances have revealed him to be a key figure in an ideological dispute that shaped the latter half of the twentieth century and a man whose conscience was disturbed by the attitude of leaders towards soldiers, prompting his decision to become a soldier of fortune and Venom Snake's mental template.\nQuestion: is solid snake and big boss the same person?\nAnswer:"}
{"id": 9, "inputs_pretokenized": "Jessie (2011 TV series) -- After casting was finalized and changes were made to several of the characters to suit the actors chosen, the series skipped the pilot phase and was put directly into production. Filming began in June 2011 on Stage 3/8 at Hollywood Center Studios which, prior to start of production, served as the sound stage where the Disney Channel series Wizards of Waverly Place was taped. 13 episodes were originally ordered for the first season, but while the show's first season was in production, Disney Channel ordered an additional seven episodes, bringing the total number of episodes for the first season to 20. When asked about the atmosphere on set during an interview with MSN TV, Ryan described her relationship with the young cast: ``I definitely feel like a nanny! They are smart kids, but they're real kids. They like to have fun. My policy is: We can play hard, as long as we work hard, and because we work hard, we need to play hard.'' Filming on the series wrapped on February 22, 2015.\nQuestion: is the show jessie filmed in new york?\nAnswer:"}
{"id": 10, "inputs_pretokenized": "Song of Songs -- The Song of Songs, also Song of Solomon or Canticles (Hebrew: \u05e9\u05b4\u05c1\u05d9\u05e8 \u05d4\u05b7\u05e9\u05b4\u05bc\u05c1\u05d9\u05e8\u05b4\u05d9\u05dd\u202c, \u0160\u00eer Ha\u0161\u0160\u00eer\u00eem, Greek: \u1f8e\u03c3\u03bc\u03b1 \u1f8e\u03c3\u03bc\u03ac\u03c4\u03c9\u03bd, asma asmaton, both meaning Song of Songs), is one of the megillot (scrolls) found in the last section of the Tanakh, known as the Ketuvim (or ``Writings''), and a book of the Old Testament.\nQuestion: is the song of songs the same as the song of solomon?\nAnswer:"}
{"id": 11, "inputs_pretokenized": "Northwest Florida State College -- The school voted to change its name to Okaloosa-Walton Community College in 1988, and gained four-year status in 2003, thus changing its name to Okaloosa-Walton College.\nQuestion: is northwest florida state college a 4 year college?\nAnswer:"}
{"id": 12, "inputs_pretokenized": "A Quiet Place (film) -- A Quiet Place is a production of Sunday Night and Platinum Dunes; it was produced on a budget of $17 million. Krasinski wrote the screenplay with story co-writers Scott Beck and Bryan Woods. Beck and Woods grew up together in the US state of Iowa, and had watched numerous silent films in college. By 2013, they began working on the story that would lead to the film. They used their experience growing up close to farmland as the basis, including a grain silo setting as a place considered dangerous in their upbringing. They initiated their approach with a 15-page proof of concept. Initially, the writers had considered developing the film into a Cloverfield installment, but after pitching their ideas to the studio collectively, all of those involved decided to keep the film its own entity.\nQuestion: is the movie the quiet place based on a book?\nAnswer:"}
{"id": 13, "inputs_pretokenized": "2018 FIFA World Cup qualification \u2013 UEFA Group G -- The group winners, Spain, qualified directly for the 2018 FIFA World Cup. The group runners-up, Italy, advanced to the play-offs as one of the best 8 runners-up, where they lost to Sweden and thus failed to qualify for the first time since 1958.\nQuestion: did spain qualify for the 2018 world cup?\nAnswer:"}
{"id": 14, "inputs_pretokenized": "Red squirrel -- The eastern grey squirrel and the red squirrel are not directly antagonistic, and violent conflict between these species is not a factor in the decline in red squirrel populations. However, the eastern grey squirrel appears to be able to decrease the red squirrel population due to several reasons:\nQuestion: are grey and red squirrels the same species?\nAnswer:"}
{"id": 15, "inputs_pretokenized": "Bermuda -- Bermuda is a group of low-forming volcanoes in the Atlantic Ocean, near the western edge of the Sargasso Sea, roughly 578 nautical miles (1,070 km; 665 mi) east-southeast of Cape Hatteras on the Outer Banks of North Carolina and about 594 nautical miles (1,100 km; 684 mi) southeast of Martha's Vineyard of Massachusetts. It is 898 nautical miles (1,663 km; 1,033 mi) northeast of Miami, Florida, and 667 nautical miles (1,235 km; 768 mi) from Cape Sable Island, in Nova Scotia, Canada. The islands lie due east of Fripp Island, South Carolina, west-northwest of Cape Verde, southeast of New York City, New York, north-northwest of Brazil and 1,759 km (1,093 mi) north of Cuba.\nQuestion: is bermuda off the coast of south carolina?\nAnswer:"}
{"id": 16, "inputs_pretokenized": "The People's Court -- The losing party does not actually need to pay the judgment, as such. Instead (as is stated in the disclaimer at the end of each show), both parties are paid from a fund (set up by Ralph Edwards-Stu Billett Productions). This fund was based on the amount of the lawsuit claim, but an exact formula was not stated. The fund was to be first divided equally, then any monetary judgment ordered was subtracted from the loser's half (and presumably both halves in the case of cross judgments). Each litigant received at least what remained of their half in shows concluding with that disclaimer.\nQuestion: do litigants on people's court get paid?\nAnswer:"}
{"id": 17, "inputs_pretokenized": "Texas -- Texas (/\u02c8t\u025bks\u0259s/, locally /-s\u0259z/; Spanish: Texas or Tejas (\u02c8texas)) is the second largest state in the United States by both area and population. Geographically located in the South Central region of the country, Texas shares borders with the U.S. states of Louisiana to the east, Arkansas to the northeast, Oklahoma to the north, New Mexico to the west, and the Mexican states of Chihuahua, Coahuila, Nuevo Le\u00f3n, and Tamaulipas to the southwest, while the Gulf of Mexico is to the southeast.\nQuestion: is texas the biggest state in the us?\nAnswer:"}
{"id": 18, "inputs_pretokenized": "The Adventures of Tintin (film) -- Spielberg acquired rights to produce a film based on The Adventures of Tintin series following Herg\u00e9's death in 1983, and re-optioned them in 2002. Filming was due to begin in October 2008 for a 2010 release, but release was delayed to 2011 after Universal opted out of producing the film with Paramount, who provided $30 million on pre-production. Sony chose to co-produce the film. The delay resulted in Thomas Sangster, who had been originally cast as Tintin, departing from the project. Producer Peter Jackson, whose company Weta Digital provided the computer animation, intends to direct a sequel. Spielberg and Jackson also hope to co-direct a third film. The world premi\u00e8re took place on 22 October 2011 in Brussels. The film was released in the United Kingdom and other European countries on 26 October 2011, and in the United States on 21 December 2011, in Digital 3D and IMAX.\nQuestion: will there be a adventures of tintin 2?\nAnswer:"}
{"id": 19, "inputs_pretokenized": "Emma Pillsbury -- Emma Pillsbury Schuester (previously Pillsbury-Howell) is a fictional character from the Fox musical comedy-drama series Glee. Portrayed by actress Jayma Mays, Emma has appeared in Glee from its pilot episode, first broadcast on May 19, 2009. Emma was developed by Glee creators Ryan Murphy, Brad Falchuk and Ian Brennan. She is a guidance counselor at the fictional William McKinley High School in Lima, Ohio where the series is set. Emma suffers from obsessive-compulsive disorder and has romantic feelings for glee club director Will Schuester (Matthew Morrison), but becomes engaged to football coach Ken Tanaka (Patrick Gallagher) as Will is married. Ken ultimately breaks up with her on their wedding day because of her feelings for Will, and when Will leaves his wife Terri (Jessalyn Gilsig), he and Emma share a kiss. Their relationship is short-lived, and in the second season, Emma and her dentist boyfriend Carl Howell (John Stamos) marry in Las Vegas. The wedding is later annulled as it was unconsummated. At the beginning of the third season, she and Will are living together; they become engaged shortly after New Years, and consummate their relationship near the end of the school year. Emma leaves Will at the altar midway through the fourth season, but the two later reconcile and marry in the season finale. She becomes pregnant during the middle of the fifth season.\nQuestion: do will and emma get together in glee?\nAnswer:"}
{"id": 20, "inputs_pretokenized": "The Princess and the Goblin (film) -- The Princess and the Goblin (Hungarian: A hercegn\u0151 \u00e9s a kobold) is a 1991 British-Hungarian-American animated musical fantasy film directed by J\u00f3zsef G\u00e9mes and written by Robin Lyons, an adaptation of George MacDonald's 1872 novel of the same name.\nQuestion: is the princess and the goblin a disney movie?\nAnswer:"}
{"id": 21, "inputs_pretokenized": "WWE draft -- On May 25, 2016, due to SmackDown moving to Tuesdays and to a live broadcast starting July 19, necessitating a brand extension, WWE announced that the draft would be returning. It would later be announced that the 2016 WWE draft would take place on July 19 during SmackDown's first live broadcast, which was also the first time that the draft took place on SmackDown. The 2017 draft was labeled the Superstar Shake-up as instead of a traditional draft, the general managers of Raw and SmackDown could trade and make deals between their respective talent.\nQuestion: is there going to be a wwe draft in 2017?\nAnswer:"}
{"id": 22, "inputs_pretokenized": "Izzie Stevens -- Heigl garnered critical acclaim for her performance as Izzie and received numerous awards and nominations for her role, winning the ``Outstanding Supporting Actress In A Drama Series'' at the 2007 Emmy Awards. She was critical of the character's development during the show's fourth season, particularly her romance with George. She declined to put herself forward for the 2008 Emmy Awards, citing insufficient material in the role. After speculation that Izzie would be killed off in the fifth season, the character was diagnosed with Stage 4 metastatic melanoma. She married Alex in the series' one-hundredth episode, and afterwards, her tumor was successfully removed. Izzie made her final appearance in the sixth season, leaving Seattle after Alex refused to resume their marriage. Heigl requested to be released from her contract 18 months early, in order to spend more time with her family. In January 2012, Heigl reported that she would like to return to Grey's Anatomy to give closure to her character, however, Rhimes confirmed that there were no plans to have the character return at that time and has since stated that she has no plans to ever re-approach Izzie's storyline again.\nQuestion: does izzie come back in grey's anatomy?\nAnswer:"}
{"id": 23, "inputs_pretokenized": "Sam Beckett -- When Sam corrected the timeline, he leaped forward, but not all the way home; this time, he found himself assuming the identity of a minor-league professional baseball player named Tim Fox. For the rest of his life (an epilogue in the series finale tells us Sam never gets home, but in our terms, it was the next four years/five seasons, the duration of the show) Sam would continue to travel back and forth through time; swapping identities with various people and as a tagline for the show reiterated, ``setting right what once went wrong.''\nQuestion: did sam ever make it home in quantum leap?\nAnswer:"}
{"id": 24, "inputs_pretokenized": "Safety (gridiron football score) -- In gridiron football, the safety (American football) or safety touch (Canadian football) is a scoring play that results in two points (or, in rare cases, one point) being awarded to the scoring team. Safeties can be scored in a number of ways, such as when a ball carrier is tackled in his own end zone or when a foul is committed by the offense in their own end zone. After a safety is scored in American football, the ball is kicked off to the team that scored the safety from the 20-yard line; in Canadian football, the scoring team also has the options of taking control of the ball at their own 35-yard line or kicking off the ball, also at their own 35-yard line. The ability of the scoring team to receive the ball through a kickoff differs from the touchdown and field goal, which require the scoring team to kick the ball off to the scored upon team. Despite being of relatively low point value, safeties can have a significant impact on the result of games, and Brian Burke of Advanced NFL Stats estimated that safeties have a greater abstract value than field goals, despite being worth a point less, due to the field position and reclaimed possession gained off the safety kick.\nQuestion: is it possible to get 1 point in football?\nAnswer:"}
{"id": 25, "inputs_pretokenized": "Atomic number -- The atomic number or proton number (symbol Z) of a chemical element is the number of protons found in the nucleus of an atom. It is identical to the charge number of the nucleus. The atomic number uniquely identifies a chemical element. In an uncharged atom, the atomic number is also equal to the number of electrons.\nQuestion: is the atomic number equal to the number of protons?\nAnswer:"}
{"id": 26, "inputs_pretokenized": "Tick (comics) -- In the Amazon Prime video series, The Tick is fixated on Arthur, and even mentions at one point that his thinking is fuzzy when away from Arthur. Despite Arthur's repeated attempts to push The Tick away, the hero won't leave Arthur's side for long. The Tick also frequently talks about Destiny as if she is a literal person, guiding Arthur's path (``Destiny gave him the suit. I just acted in more of a 'delivery man' role''), alluding to the Parcae in Roman mythology. At one point, Arthur starts to believe that The Tick is merely another hallucination, but that thought is quickly dispelled when Arthur's sister, Dot, interacts with ``The Blue Guy.''\nQuestion: is the tick part of arthur's imagination?\nAnswer:"}
{"id": 27, "inputs_pretokenized": "Game of Thrones -- Game of Thrones is an American fantasy drama television series created by David Benioff and D.B. Weiss. It is an adaptation of A Song of Ice and Fire, George R.R. Martin's series of fantasy novels, the first of which is A Game of Thrones. It is filmed in Belfast and elsewhere in the United Kingdom, Canada, Croatia, Iceland, Malta, Morocco, Spain, and the United States. The series premiered on HBO in the United States on April 17, 2011, and its seventh season ended on August 27, 2017. The series will conclude with its eighth season premiering either in 2018 or 2019.\nQuestion: is this the last season of gsme of thrones?\nAnswer:"}
{"id": 28, "inputs_pretokenized": "State supreme court -- The court consists of a panel of judges selected by methods outlined in the state constitution. State supreme courts are completely distinct from any United States federal courts located within the geographical boundaries of a state's territory, or the federal United States Supreme Court (although appeals, on some issues, from judgments of a state's highest court can be sought in the U.S. Supreme Court).\nQuestion: can a state supreme court decision be appealed?\nAnswer:"}
{"id": 29, "inputs_pretokenized": "Snake River -- The Snake River is the thirteenth longest river in the United States. Its watershed is the 10th largest among North American rivers, and covers almost 108,000 square miles (280,000 km) in portions of six U.S. states: Wyoming, Idaho, Nevada, Utah, Oregon, and Washington, with the largest portion in Idaho. Most of the Snake River watershed lies between the Rocky Mountains on the east and the Columbia Plateau on the northwest. The largest tributary of the Columbia River, the Snake River watershed makes up about 41% of the entire Columbia River Basin. Its average discharge at the mouth constitutes 31% of the Columbia's flow at that point. Above the confluence, the Snake is slightly longer than the Columbia--1,078 miles (1,735 km) compared to 928 miles (1,493 km)--and its drainage basin is slightly larger--4% bigger than the upstream Columbia River watershed.\nQuestion: does the snake river flow into the columbia river?\nAnswer:"}
{"id": 30, "inputs_pretokenized": "Outlier -- Deletion of outlier data is a controversial practice frowned upon by many scientists and science instructors; while mathematical criteria provide an objective and quantitative method for data rejection, they do not make the practice more scientifically or methodologically sound, especially in small sets or where a normal distribution cannot be assumed. Rejection of outliers is more acceptable in areas of practice where the underlying model of the process being measured and the usual distribution of measurement error are confidently known. An outlier resulting from an instrument reading error may be excluded but it is desirable that the reading is at least verified.\nQuestion: can there be outliers in a normal distribution?\nAnswer:"}
{"id": 31, "inputs_pretokenized": "Ready Player One -- Ready Player One is a 2011 science fiction novel, and the debut novel of American author Ernest Cline. The story, set in a dystopian 2040s, follows protagonist Wade Watts on his search for an Easter egg in a worldwide virtual reality game, the discovery of which will lead him to inherit the game creator's fortune. Cline sold the rights to publish the novel in June 2010, in a bidding war to the Crown Publishing Group (a division of Random House). The book was published on August 16, 2011. An audiobook was released the same day; it was narrated by Wil Wheaton, who was mentioned briefly in one of the chapters. In 2012, the book received an Alex Award from the Young Adult Library Services Association division of the American Library Association and won the 2012 Prometheus Award.\nQuestion: is ready player one based on a true story?\nAnswer:"}
{"id": 32, "inputs_pretokenized": "Four-leaf clover -- The four-leaf clover is a rare variation of the common three-leaf clover. According to traditional superstition, such clovers bring good luck, though it is not clear when or how that superstition got started. The earliest mention of ``Fower-leafed or purple grasse'' is from 1640 and simply says that it was kept in gardens because it was ``good for the purples in children or others''. A description from 1869 says that four-leaf clovers were ``gathered at night-time during the full moon by sorceresses, who mixed it with vervain and other ingredients, while young girls in search of a token of perfect happiness made quest of the plant by day''. The first reference to luck might be from an 11-year-old girl, who wrote in an 1877 letter to St. Nicholas Magazine, ``Did the fairies ever whisper in your ear, that a four-leaf clover brought good luck to the finder?''\nQuestion: is there such a thing as a four leaf clover?\nAnswer:"}
{"id": 33, "inputs_pretokenized": "Statutory declaration -- Statutory declarations are commonly used to allow a person to declare something to be true for the purposes of satisfying some legal requirement or regulation when no other evidence is available. They are thus similar to affidavits (which are made on oath).\nQuestion: can a statutory declaration be used as evidence?\nAnswer:"}
{"id": 34, "inputs_pretokenized": "Convention to propose amendments to the United States Constitution -- To become part of the Constitution, an amendment must be ratified by either--as determined by Congress--the legislatures of three-fourths (presently 38) of the states or State ratifying conventions in three-fourths of the states. Thirty-three amendments to the United States Constitution have been approved by Congress and sent to the states for ratification. Twenty-seven of these amendments have been ratified and are now part of the Constitution. As of 2018, the convention process has never been used for proposing constitutional amendments.\nQuestion: has there ever been a convention of states?\nAnswer:"}
{"id": 35, "inputs_pretokenized": "South African English -- SAE is an extraterritorial (ET) variety of English, or a language variety that has been ``transported'' outside its mainland home. More specifically, SAE is a Southern hemisphere ET originating from later English colonisation in the 18th and 19th centuries (Zimbabwean, Australian, and New Zealand English are also Southern hemisphere ET varieties). SAE resembles British English more closely than it does American English due to the close ties that South African colonies maintained with the mainland in the 19th and 20th centuries. However, with the increasing influence of American pop-culture around the world via modes of contact like television, American English has become more familiar in South Africa. Indeed, some American lexical items are becoming alternatives to comparable British terms.\nQuestion: is south african english similar to british english?\nAnswer:"}
{"id": 36, "inputs_pretokenized": "Haroun and the Sea of Stories -- Haroun and the Sea of Stories is a 1990 children's book by Salman Rushdie. It was Rushdie's fifth novel after The Satanic Verses. It is a phantasmagorical story that begins in a city so old and ruinous that it has forgotten its name.\nQuestion: is haroun and the sea of stories a children's book?\nAnswer:"}
{"id": 37, "inputs_pretokenized": "Mandalay Bay -- Mandalay Bay is a 43-story luxury resort and casino on the Las Vegas Strip in Paradise, Nevada. It is owned and operated by MGM Resorts International. One of the property's towers operates as the Delano; the Four Seasons Hotel is independently operated within the Mandalay Bay tower, occupying 5 floors (35--39).\nQuestion: is four seasons las vegas part of mandalay bay?\nAnswer:"}
{"id": 38, "inputs_pretokenized": "Lynette Scavo -- Her world is further shocked when Tom asks for a divorce, and announces that he and Jane will be moving in together. Lynette is devastated, and her rivalry with Jane becomes more heated at Penny's birthday party when they continually try to one up each other. Jane then later tries to reconcile with Lynette, but then she begins to choke on a snack. Lynette hesitates to help Jane, but ultimately comes to her aid and saves her. However, Jane is alarmed at Lynette thinking such an action over believing she thought of letting Jane die. Then on the day of Mike Delfino's funeral, Tom and Lynette comfort each other as Jane looks on. Sparks of their marriage appear and while sitting at the service Lynette thinks back to the day Tom moved out. Mike tries to understand why Lynette isn't fighting for her marriage. He then reveals that everyone in the neighborhood knows that she and Tom belong together. This memory finally causes Lynette to make the decision to fight for her marriage, win Tom back, and dissolve his romance with Jane. In With So Little to Be Sure Of Lynette and Tom officially sign their divorce papers ending their marriage. When Lynette hears Tom hasn't filed the papers, she is hopeful but after seeing Tom and Jane kiss at the office, she accepts a date from Tom's boss. It goes well at first but when he plans to transfer Tom to India, Lynette breaks it off. The boss sardonically insults Lynette before Tom about her being hung up on another man and after insults to her, Tom punches him. He and Jane argue with Jane realizing that Tom still loves Lynette and they break up. Tom goes to see Lynette but sees her hugging Lee and (not seeing who it is), thinks Lynette has moved on. He tells her he is filing but in a later talk, they realize how much they love each other and reconcile.\nQuestion: do tom and lynette get back together spoiler?\nAnswer:"}
{"id": 39, "inputs_pretokenized": "List of Major League Baseball single-game home run leaders -- Writers of Sporting News described hitting four home runs in a single Major League Baseball (MLB) game as ``baseball's greatest single-game accomplishment''. Eighteen players have accomplished the feat to date, the most recent being Scooter Gennett on June 6, 2017 against the St. Louis Cardinals. No player has done this more than once in his career and no player has ever hit more than four in a game. Bobby Lowe was the first to hit four home runs in a single game, doing so on May 30, 1894. Fans were reportedly so excited that they threw $160 in silver coins ($4,500 today) onto the field after his fourth home run.\nQuestion: has there ever been a 5 home run game?\nAnswer:"}
{"id": 40, "inputs_pretokenized": "Virginia Cavaliers men's basketball -- The Wahoos, as they are unofficially known, have appeared in the NCAA Tournament twenty-two times, advancing to the Elite Eight six times (1981, 1983, 1984, 1989, 1995, 2016). They further advanced to the 1981 and 1984 Final Fours; in the former winning the last NCAA third place game ever played, defeating No. 1 LSU 78--74. The Cavaliers won the post-season NIT Tournaments of 1980 and 1992.\nQuestion: has university of virginia ever won the ncaa tournament?\nAnswer:"}
{"id": 41, "inputs_pretokenized": "Chiko Roll -- A Chiko Roll's filling is primarily cabbage and barley, as well as carrot, green beans, beef, beef tallow, wheat cereal, celery and onion. This filling is partially pulped and enclosed in a thick egg and flour pastry tube designed to survive handling at football matches. The roll is typically deep-fried in vegetable oil.\nQuestion: is there any meat in a chiko roll?\nAnswer:"}
{"id": 42, "inputs_pretokenized": "Pupil -- The pupil is a hole located in the center of the iris of the eye that allows light to strike the retina. It appears black because light rays entering the pupil are either absorbed by the tissues inside the eye directly, or absorbed after diffuse reflections within the eye that mostly miss exiting the narrow pupil.\nQuestion: is your pupil a hole in your eye?\nAnswer:"}
{"id": 43, "inputs_pretokenized": "Interleague play -- Interleague play in Major League Baseball refers to regular-season baseball games played between an American League (AL) team and a National League (NL) team. Interleague play was first introduced in the 1997 Major League Baseball season. Prior to that, matchups between AL teams and NL teams occurred only during spring training, the All-Star Game, other exhibition games (such as the Hall of Fame Game in Cooperstown, New York), and the World Series. Unlike modern interleague play, none of these contests, except for the World Series, counted toward official team or league records.\nQuestion: does the national league play the american league in the world series?\nAnswer:"}
{"id": 44, "inputs_pretokenized": "Steel-toe boot -- A steel-toe boot (also known as a safety boot, steel-capped boot or safety shoe) is a durable boot or shoe that has a protective reinforcement in the toe which protects the foot from falling objects or compression, usually combined with a mid sole plate to protect against punctures from below.\nQuestion: are steel toe boots made to cut toes off?\nAnswer:"}
{"id": 45, "inputs_pretokenized": "51st state -- Voters in Washington, D.C. and Puerto Rico have both voted for statehood in referendums. As statehood candidates, their admission to the Union requires congressional approval. American Samoa, Guam, the Northern Mariana Islands, and the United States Virgin Islands are also U.S. territories and could potentially become U.S. states someday.\nQuestion: is puerto rico the 51st state of the united states?\nAnswer:"}
{"id": 46, "inputs_pretokenized": "List of The Waltons characters -- Mary Ellen (Judy Norton Taylor) is the oldest of Liv and John's daughters, born in April 1920, aged 13 in season one. Throughout the first few seasons, she is a typically whiny, sometimes rebellious teenager, somewhat of a tomboy who enjoys playing baseball, but could also be vain, engaging in a rivalry with rich-girl Martha-Rose Coverdale for the affections of the awkward G.W. Haines (David Doremus). Mary Ellen matures into a wiser young woman and her childish fantasy of becoming a movie star gives way for a more reasonable and realistic ambition to go into medicine after reading up on it and developing an interest. She then works to gain an education as a medical worker, and becomes a nurse. However, when she ends up taking care of the people out in the country by herself, she concludes they need more medical expertise than she can offer them and continues studying medicine until she succeeds in becoming a fully-fledged doctor. Even though some people frown upon female doctors and she receives mixed support from her family, she refuses to let this stop her. Mary Ellen has a special relationship with each of her six siblings, but she is especially close to her younger sister Erin. Mary Ellen and Erin fought a lot when they were younger girls, particularly in seasons 1 and 2. But in the middle seasons, Mary Ellen and Erin matured and became friends. In season 5 after Mary Ellen married Curt, her relationship with her sister deepened even further and by the end of the show, they truly did become each other's best friend.\nQuestion: does mary ellen become a doctor on the waltons?\nAnswer:"}
{"id": 47, "inputs_pretokenized": "Switched at Birth (film) -- Switched at Birth is a 1991 American television film directed by Waris Hussein. It is based on the true story of Kimberly Mays and Arlena Twigg, babies switched soon after birth in a Florida hospital in 1978.\nQuestion: is switched at birth based on a real story?\nAnswer:"}
{"id": 48, "inputs_pretokenized": "Pine oil -- Pine oil is distinguished from other products from pine, such as turpentine, the low-boiling fraction from the distillation of pine sap, and rosin, the thick tar remaining after turpentine is distilled.\nQuestion: is pine oil and turpentine the same thing?\nAnswer:"}
{"id": 49, "inputs_pretokenized": "Mayfly -- Mayflies (also known as Canadian soldiers in the United States, or shadflies or fishflies in Canada and Michigan; also up-winged flies in the United Kingdom ) are aquatic insects belonging to the order Ephemeroptera. This order is part of an ancient group of insects termed the Palaeoptera, which also contains dragonflies and damselflies. Over 3,000 species of mayfly are known worldwide, grouped into over 400 genera in 42 families.\nQuestion: are canadian soldiers and mayflies the same thing?\nAnswer:"}

mindie/examples/convert/model_slim/get_calibration_dataset.py Normal file
@@ -0,0 +1,12 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
import json

def load_jsonl(dataset_path, key_name='inputs_pretokenized'):
    dataset = []
    with open(dataset_path, encoding='utf-8') as file:
        for line in file:
            data = json.loads(line)
            text = data[key_name]
            dataset.append(text)
    return dataset

mindie/examples/convert/model_slim/quantifier.py Normal file
@@ -0,0 +1,176 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
import os
import argparse
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlier, AntiOutlierConfig
from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
from examples.convert.convert_utils import copy_tokenizer_files, modify_config
from examples.convert.model_slim.get_calibration_dataset import load_jsonl
CPU = "cpu"
NPU = "npu"

def cmd_bool(cmd_arg):
    if cmd_arg == "True":
        return True
    elif cmd_arg == "False":
        return False
    raise ValueError(f"{cmd_arg} should be a boolean")

def parse_arguments():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_path', help="model and tokenizer path")
    parser.add_argument('--save_directory')
    parser.add_argument(
        '--calib_texts',
        type=str,
        nargs='+',
        default=["What's deep learning?"])
    parser.add_argument(
        '--calib_file',
        type=str,
        help='CSV or Numpy file containing tokenized input. Alternative to text input.',
        default=f"{os.path.join(os.path.dirname(__file__), 'teacher_qualification.jsonl')}")
    parser.add_argument(
        '--calib_dataset_length',
        type=int,
        help='Max calibration dataset length.',
        default=50)
    parser.add_argument('--w_bit', type=int, default=8)
    parser.add_argument('--a_bit', type=int, default=8)
    parser.add_argument('--disable_names', type=str, nargs='+', default=None)
    parser.add_argument('--device_type', type=str, choices=[CPU, NPU], default=CPU)
    parser.add_argument('--fraction', type=float, default=0.01)
    parser.add_argument("--act_method", type=int, choices=[1, 2, 3], default=1,
                        help="`1`: `MinMax`, `2`: `Histogram`, `3`: `Auto`")
    parser.add_argument('--co_sparse', type=cmd_bool, default=False)
    parser.add_argument('--anti_method', type=str, default='', help="`m3`: `AWQ`")
    parser.add_argument('--disable_level', type=str, default='L0')
    parser.add_argument('--input_ids_name', type=str, default='input_ids')
    parser.add_argument('--attention_mask_name', type=str, default='attention_mask')
    parser.add_argument('--do_smooth', type=cmd_bool, default=False)
    parser.add_argument('--use_sigma', type=cmd_bool, default=False)
    parser.add_argument('--sigma_factor', type=float, default=3.0)
    parser.add_argument('--is_lowbit', type=cmd_bool, default=False)
    parser.add_argument('--mm_tensor', type=cmd_bool, default=True)
    parser.add_argument('--w_sym', type=cmd_bool, default=True)
    parser.add_argument('--use_kvcache_quant', type=cmd_bool, default=False)
    parser.add_argument('--open_outlier', type=cmd_bool, default=True)
    parser.add_argument('--group_size', type=int, default=64)
    return parser.parse_args()

class Quantifier:
    def __init__(self, model_path_or_name, quant_config=None, anti_outlier_config=None, device_type='cpu', **kwargs):
        self.device_type = device_type
        device_map = CPU if self.device_type == CPU else "auto"
        self.quant_config = quant_config
        self.anti_outlier_config = anti_outlier_config
        self.model_path_or_name = model_path_or_name
        self.config = AutoConfig.from_pretrained(self.model_path_or_name, trust_remote_code=True)
        self.dtype = self.config.torch_dtype if self.device_type == NPU else torch.float32
        self.model = AutoModelForCausalLM.from_pretrained(
            pretrained_model_name_or_path=model_path_or_name,
            low_cpu_mem_usage=True, torch_dtype=self.dtype,
            device_map=device_map,
            use_safetensors=True, trust_remote_code=True)
        tokenizer_args = kwargs.get("tokenizer_args", {})
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_path_or_name, use_fast=False, trust_remote_code=True, legacy=False, **tokenizer_args
        )

    def get_tokenized_data(self, input_texts,
                           input_ids_name='input_ids',
                           attention_mask_name='attention_mask'):
        tokenized_data = []
        for input_text in input_texts:
            inputs = self.tokenizer(input_text, return_tensors='pt', padding=True).to(self.device_type)
            tokenized_data.append(
                [inputs.data[input_ids_name], inputs.data[attention_mask_name]])
        return tokenized_data

    def convert(self, tokenized_data, save_path, disable_level):
        if self.device_type == NPU:
            # Avoid online operator compilation; use pre-compiled binary operators
            torch.npu.set_compile_mode(jit_compile=False)
        if self.anti_outlier_config is not None:
            anti_outlier = AntiOutlier(self.model, calib_data=tokenized_data, cfg=self.anti_outlier_config)
            anti_outlier.process()
        if not os.path.exists(save_path):
            os.mkdir(save_path)
        calibrator = Calibrator(self.model, self.quant_config, calib_data=tokenized_data, disable_level=disable_level)
        calibrator.run()
        calibrator.save(save_path, save_type=["safe_tensor"])

if __name__ == '__main__':
    args = parse_arguments()
    rank = int(os.getenv("RANK", "0"))
    calib_file = args.calib_file
    calib_texts = load_jsonl(calib_file) if calib_file else args.calib_texts
    model_path = args.model_path
    save_directory = args.save_directory
    quant_conf = QuantConfig(
        w_bit=args.w_bit,
        a_bit=args.a_bit,
        disable_names=args.disable_names,
        dev_type=args.device_type,
        dev_id=rank,
        act_method=args.act_method,
        pr=1.0,  # randseed
        nonuniform=False,
        w_sym=args.w_sym,
        mm_tensor=False,
        co_sparse=args.co_sparse,
        fraction=args.fraction,
        sigma_factor=args.sigma_factor,
        use_sigma=args.use_sigma,
        is_lowbit=args.is_lowbit,
        do_smooth=args.do_smooth,
        use_kvcache_quant=args.use_kvcache_quant,
        open_outlier=args.open_outlier,
        group_size=args.group_size
    )
    anti_outlier_config = None
    if args.anti_method == 'm3':
        anti_outlier_config = AntiOutlierConfig(a_bit=args.a_bit, w_bit=args.w_bit,
                                                anti_method=args.anti_method, w_sym=args.w_sym,
                                                dev_type=args.device_type)
    elif args.anti_method:
        anti_outlier_config = AntiOutlierConfig(anti_method=args.anti_method)
    quantifier = Quantifier(
        model_path, quant_conf, anti_outlier_config,
        device_type=args.device_type
    )
    tokenized_calib_data = None
    if calib_texts is not None:
        tokenized_calib_data = quantifier.get_tokenized_data(
            calib_texts,
            input_ids_name=args.input_ids_name,
            attention_mask_name=args.attention_mask_name
        )
    if not os.path.exists(save_directory):
        os.makedirs(save_directory, exist_ok=True)
    quantifier.convert(tokenized_calib_data, save_directory, args.disable_level)
    quant_type = f"w{args.w_bit}a{args.a_bit}"
    # For sparse quantization the tool is fed w_bit=4, a_bit=8, so rewrite quant_type to the real type
    is_sparse_compress = args.w_bit == 4 and args.a_bit == 8 and (args.co_sparse or args.is_lowbit)
    if is_sparse_compress:
        quant_type = "w8a8s"
    auto_config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
    modify_config(model_path, save_directory, auto_config.torch_dtype,
                  quant_type, args.use_kvcache_quant)
    copy_tokenizer_files(model_path, save_directory)

mindie/examples/convert/model_slim/sparse_compressor.py Normal file
@@ -0,0 +1,94 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
import argparse
import os
import torch
from atb_llm.runner import ModelRunner
from atb_llm.utils.cpu_binding import NpuHbmInfo
from atb_llm.utils.log import logger, print_log
from atb_llm.models.base.model_utils import unwrap_model_state_dict
from msmodelslim.pytorch.weight_compression import CompressConfig, Compressor
from examples.convert.convert_utils import copy_tokenizer_files, modify_config

class SparseCompressor:
    def __init__(self, **kwargs):
        self.rank = kwargs.get('rank', '0')
        self.world_size = kwargs.get('world_size', '1')
        self.model_path = kwargs.get('model_path', None)
        self.save_directory = kwargs.get('save_directory', None)
        self.multiprocess_num = kwargs.get('multiprocess_num', 8)
        self.save_split_w8a8s_dir = kwargs.get('save_split_w8a8s_dir', None)
        self.model = ModelRunner(self.model_path, rank=self.rank, world_size=self.world_size)
        self.dtype = self.model.dtype
        self.quantize = self.model.quantize
        self.model.load_weights()
        self.device = self.model.device
        self.max_memory = NpuHbmInfo.get_hbm_capacity(self.rank, self.world_size, self.model.soc_info.need_nz)
        self.init_memory = int(
            self.max_memory * NpuHbmInfo.get_hbm_usage(self.rank, self.world_size, self.model.soc_info.need_nz))
        print_log(self.rank, logger.info, f'hbm_capacity(GB): {self.max_memory / (1024 ** 3)}, '
                                          f'init_memory(GB): {self.init_memory / (1024 ** 3)}')
        self.warm_up_memory = 0
        self.warm_up_num_blocks = 0
        self.cache_manager = None
        if self.save_split_w8a8s_dir is not None:
            self.model.save_pretrained(save_directory=f'{self.save_split_w8a8s_dir}_{self.world_size}',
                                       safe_serialization=True)
            # use the instance paths rather than relying on module-level globals
            modify_config(self.model_path, self.save_directory, torch.float16, 'w8a8s')
            copy_tokenizer_files(self.model_path, self.save_directory)

    def compress(self):
        model_dict = unwrap_model_state_dict(self.model.model.state_dict())
        quant_desc = self.model.model.generate_description()
        compress_config = CompressConfig(do_pseudo_sparse=False, sparse_ratio=1, is_debug=True,
                                         record_detail_root=self.save_directory,
                                         multiprocess_num=self.multiprocess_num)
        compressor = Compressor(compress_config, weight=model_dict, quant_model_description=quant_desc)
        compressor.run()
        part_save_directory = os.path.join(self.save_directory, f'part{self.rank}-of-{self.world_size}')
        os.makedirs(part_save_directory, exist_ok=True)
        compressor.export_safetensors(part_save_directory)

def parse_arguments():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_path',
                        help="model and tokenizer path",
                        default='/data/acltransformer_testdata/weights/llama2/llama-2-70b',
                        )
    parser.add_argument('--save_directory', type=str, required=True)
    parser.add_argument('--multiprocess_num', type=int, default=8)
    parser.add_argument('--save_split_w8a8s_dir', type=str, default=None)
    return parser.parse_args()

if __name__ == '__main__':
    args = parse_arguments()
    rank = int(os.getenv("RANK", "0"))
    world_size = int(os.getenv("WORLD_SIZE", "1"))
    input_dict = {
        'rank': rank,
        'world_size': world_size,
        **vars(args)
    }
    model_path = args.model_path
    save_directory = args.save_directory
    if not os.path.exists(save_directory):
        os.makedirs(save_directory, exist_ok=True)
    sparse_compressor = SparseCompressor(**input_dict)
    sparse_compressor.compress()
    if rank == 0:
        # only rank 0 writes the shared config and tokenizer files
        modify_config(model_path, save_directory, torch.float16, 'w8a8sc')
        copy_tokenizer_files(model_path, save_directory)

mindie/examples/convert/model_slim/teacher_qualification.jsonl Normal file
@@ -0,0 +1,44 @@
{"id": 0, "inputs_pretokenized": "编写中小学教科书的直接依据是____。\nA. 《中华人民共和国教育法》\nB. 课程计划\nC. 课程标准\nD. 课程表", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 1, "inputs_pretokenized": "下列关于课程的三种文本表现形式说法正确的是____\nA. 课程计划是由当地教育主管部门制订的\nB. 课程标准是依据课程计划制定的\nC. 课程标准的核心是实施建议\nD. 教材编写的基本方式有直线式、螺旋式、交叉式", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 2, "inputs_pretokenized": "悦悦是一名右耳失聪的残疾儿童活动课上有时会听不清楚周老师所讲的内容因此经常提问题。对此周老师应当采取的措施是____。\nA. 给予悦悦更多的帮助和指导\nB. 指导家长带悦悦回家自学\nC. 建议家长将悦悦转到特殊幼儿园\nD. 照顾大多数幼儿,不理会悦悦", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 3, "inputs_pretokenized": "内流河也称“内陆河”是指没有流入海洋的河流大多分布在大陆内部干燥地区上游降水或冰雪融水为其主要补给水源最终消失于沙漠或注入内陆湖泊。下列中国内流河中最长的是____。\nA. 塔里木河\nB. 柴达木河\nC. 尼雅河\nD. 疏勒河", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 4, "inputs_pretokenized": "学校规定学生不能烫染头发但是小文为了彰显个性在假期把头发染成了棕色。面对小文的情况教师应该怎样处理____\nA. 年轻人追求个性是合情合理的,应该宽容对待\nB. 违反学校的校规,应该严格处分\nC. 强制要求小文将头发颜色染回来才可以进校门\nD. 探明小文违反校规的原因,并对其进行劝导和教育", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 5, "inputs_pretokenized": "张老师根据自己班级的情况为解决班级内部班干部的人际关系问题建立和谐融洽的班级氛围自主开发了“和谐人际”的班级课程这体现了教师____。\nA. 是教育教学的研究者\nB. 是课程的建设者和开发者\nC. 是学生学习的促进者\nD. 是社区型的开放教师", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 6, "inputs_pretokenized": "刘老师工作很负责学生在学校出现一点问题他就会与家长联系在与家长沟通时他经常以前辈的姿态对待家长对家长的教育方式指指点点。刘老师的做法____。\nA. 正确,老师就应该与家长经常沟通\nB. 正确,老师的经验比家长丰富,应该多指导家长\nC. 不正确,教师没有权利指导家长\nD. 不正确,教师应该与家长建立平等的沟通关系,尊重家长的人格", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 7, "inputs_pretokenized": "在古代印度有一户人家经营一家棉布店销售自己手工制作的衣服。你认为这户人家属于哪个等级____\nA. 婆罗门\nB. 刹帝利\nC. 吠舍\nD. 首陀罗", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 8, "inputs_pretokenized": "“小型分散便于开展多种多样的活动满足学生不同的兴趣、爱好发展学生的才能使学生得到更多的学习和锻炼的机会。”这种课外活动的形式是____。\nA. 科技活动\nB. 学科活动\nC. 个人活动\nD. 小组活动", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 9, "inputs_pretokenized": "小红每天晚上临睡前都要多次反复检查自己的书包确保带齐了第二天需要用的教材和文具。她明知道没有这个必要但就是控制不住。她可能出现了____。\nA. 抑郁症\nB. 焦虑症\nC. 强迫症\nD. 恐惧症", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 10, "inputs_pretokenized": "国家管理和评价课程的基础是____。\nA. 课程计划\nB. 课程标准\nC. 教学目标\nD. 教育目的", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 11, "inputs_pretokenized": "儿童坚持性发生明显质变的年龄约在____\nA. 34岁\nB. 45岁\nC. 56岁\nD. 6岁以后", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 12, "inputs_pretokenized": "《红楼梦》中人物众多、关系繁杂。为了帮助读者阅读许多红学爱好者都在网络上发布了自己整理制作的主要人物关系图。这属于____。\nA. 纲要策略\nB. 精细加工策略\nC. 资源管理策略\nD. 监控策略", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 13, "inputs_pretokenized": "学期结束时班主任王老师会对学生思想品德的发展变化情况进行评价。这项工作属于____。\nA. 工作总结\nB. 工作计划\nC. 操行评定\nD. 建立学生档案", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 14, "inputs_pretokenized": "人们常说“教学有法而教无定法。”这反映了教师的劳动具有____。\nA. 连续性\nB. 示范性\nC. 长期性\nD. 创造性", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 15, "inputs_pretokenized": "县级以上地方各级人民代表大会是县级以上地方国家权力机关其职权不包括____。\nA. 改变或撤销本级人大常务委员会不适当的决定\nB. 选举并有权罢免本级人民法院院长\nC. 批准本行政区域内的预算执行情况的报告\nD. 决定并宣布下一级行政区城进入紧急状态", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 16, "inputs_pretokenized": "在心理健康课上同一批学生在第二次进行同样内容的人格测验时获得的分数与上次测验差别较大。这说明该测验存在的问题是____。\nA. 信度问题\nB. 效度问题\nC. 难度问题\nD. 区分度问题", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 17, "inputs_pretokenized": "李老师在教学生区分形近字“渴”“竭”“碣”“谒”时将四个字相同的右半部分用白色粉笔写出相异的左半部分用彩色粉笔写出。李老师运用了知觉的____。\nA. 整体性\nB. 选择性\nC. 理解性\nD. 恒常性", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 18, "inputs_pretokenized": "兰兰学会走路后,就要很喜欢尝试自己穿衣、吃饭、捡东西,喜欢探索周围世界。按照埃里克森人格发展阶段理论,兰兰所处的发展阶段是____\nA. 信任对怀疑\nB. 自立对羞怯\nC. 主动感对内疚感\nD. 勤奋感对自卑感", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 19, "inputs_pretokenized": "杨老师在教授生字词的过程中发现部分学生有缺笔少画的现象于是他把“小学生缺笔少画现象的原因及对策研究”作为研究课题拟订相应的研究计划在工作中收集、整理相关资料并实施教学措施最后根据反馈信息调整教学方案。这种研究方法属于____。\nA. 教育行动研究法\nB. 教育实验法\nC. 教育叙事研究法\nD. 个案研究法", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 20, "inputs_pretokenized": "小青的数学成绩不好她认为这是因为自己脑子笨不是学数学的料。她的这种归因属于____。\nA. 内部、稳定,不可控的归因\nB. 外部、稳定、可控的归因\nC. 内部、不稳定,可控的归因\nD. 外部,不稳定,不可控的归因", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 21, "inputs_pretokenized": "中小学教科书不同于其他任何书籍的基本特点是内容的____。\nA. 准确性\nB. 示范性\nC. 新颖性\nD. 基础性", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 22, "inputs_pretokenized": "王老师在课堂上给学生演示了与知识点有关的几个实验。这属于____。\nA. 实物直观\nB. 模象直观\nC. 言语直观\nD. 思维直观", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 23, "inputs_pretokenized": "在Excel中单元格A1, A2, A3中的内容依次为数值123单元格A4中的内容为字符前添加了英文单撇号“”的文本字符“3”在单元格A5的编辑栏输入公式“=COUNT( A1A4) +12”并点击回车键A5单元格的内容为____。\nA. 15\nB. 21\nC. 12\nD. 18", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 24, "inputs_pretokenized": "唐朝时形成了“父教其子子教其弟”“五尺童子耻不言文墨焉”的社会风尚它的形成主要得益于____。\nA. 社会经济的繁荣\nB. 科举制度的推行\nC. 学校体系的完备\nD. 三省六部制的确立", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 25, "inputs_pretokenized": "教导处的刘老师抓到两名学生藏在厕所里偷偷抽烟于是把他们叫到办公室慢悠悠地点燃了一根香烟准备耐心细致地给他们做思想工作。对此以下说法错误的是____。\nA. 刘老师既禁止学生抽烟,又能耐心劝导,严慈相济,真正做到了关爱学生\nB. 刘老师要求学生不要抽烟,却在学生面前抽烟,违背了为人师表的要求\nC. 刘老师的抽烟行为与他教导学生不能抽烟的言词相悖,很容易损害自己的威信\nD. 刘老师的行为表明教师队伍中存在一些教师需要对其加强师风师德建设的", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 26, "inputs_pretokenized": "小班幼儿看木偶剧表演时看到“老虎”会感到害怕。这说明幼儿的____\nA. 想象脱离现实\nB. 想象与现实混淆\nC. 想象容易受情绪影响\nD. 想象内容零散", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 27, "inputs_pretokenized": "有的成语与历史人物密切相关。下列选项中与“狡兔三窟”相关的历史人物是____。\nA. 管仲与齐桓公\nB. 毛遂与平原君\nC. 冯谖与孟尝君\nD. 曹刿与鲁庄公", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 28, "inputs_pretokenized": "王浩同学活动过多、注意力不集中、冲动行为多。这种心理障碍可能是____。\nA. 多动综合征\nB. 学习困难综合征\nC. 儿童厌学症\nD. 儿童强迫行为", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 29, "inputs_pretokenized": "在对班级学生进行教育时班主任李老师引导学生对自己每日的学习、行为进行反省。李老师主要运用的德育方法是____。\nA. 自我修养法\nB. 榜样示范法\nC. 实践锻炼法\nD. 情感陶冶法", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 30, "inputs_pretokenized": "在讲解方程时王老师先讲一元一次方程再讲二元一次方程然后讲一元二次方程逐步加深难度。这种教学方式所遵循的原则是____。\nA. 理论联系实际原则\nB. 启发性原则\nC. 循序渐进原则\nD. 巩固性原则", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 31, "inputs_pretokenized": "近代原子核物理学之父是____。\nA. 普朗克\nB. 卢瑟福\nC. 玻尔\nD. 霍金", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 32, "inputs_pretokenized": "很多人因为有了受教育的机会而得到了和父辈完全不同的人生发展机遇。这说明教育在人的发展中起到____。\nA. 辅助作用\nB. 决定作用\nC. 次要作用\nD. 主导作用", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 33, "inputs_pretokenized": "下面是中国古代四大名著中的人物与情节其中搭配不当的一项是____。\nA. 鲁智深——倒拔垂杨柳\nB. 孙悟空——大闹天宫\nC. 周瑜——三顾茅庐\nD. 刘姥姥——进大观园", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 34, "inputs_pretokenized": "找规律填数字是一项很有趣的活动特别锻炼观察和思考能力。下列选项中填入数列“1、7、8、57、____、26050”空缺处的数字符合该组数字排列规律的是____。\nA. 456\nB. 457\nC. 458\nD. 459", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 35, "inputs_pretokenized": "教育自身的许多规律是人类长期教育实践认识的结果它们不会因政治经济制度和其他文化的发展而过时更不会随时代的发展而被否定。这说明教育具有____。\nA. 历史性\nB. 永恒性\nC. 阶级性\nD. 相对独立性", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 36, "inputs_pretokenized": "高中毕业会考是一种达标考试属于____。\nA. 定量评价\nB. 相对性评价\nC. 形成性评价\nD. 绝对性评价", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 37, "inputs_pretokenized": "下列选项中与“图书”和“音乐书”的逻辑关系相同的一组是____。\nA. “钢笔”和“铅笔”\nB. “蛋糕”和“香油”\nC. “水果”和“西瓜”\nD. “白菜”和“黄瓜”", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 38, "inputs_pretokenized": "语文教师裴老师每天下课后都会对自己一天的工作进行总结反思并记录下来。这属于布鲁巴奇反思方法中的____。\nA. 反思日记\nB. 详细描述\nC. 交流讨论\nD. 行动研究", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 39, "inputs_pretokenized": "以下关于幼儿有意注意发展的表述不正确的是____\nA. 幼儿有意注意发展受大脑发育水平局限\nB. 幼儿有意注意的发展水平较低,无法依靠活动和操作来维持\nC. 幼儿在幼儿园需要遵守各种行为规则,完成各项任务,这都需要幼儿形成或发展有意注意\nD. 教师在组织活动时,要求幼儿保持注意的对象应该是幼儿认知范围以内或幼儿易于理解的事物", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 40, "inputs_pretokenized": "某幼儿园根据幼儿的发展情况将班级分为快班、中班和慢班。对于快班的幼儿安排大量优秀师资和先进设备而对于慢班的幼儿则给予较少的优良教育资源。该幼儿园的做法违背了素质教育内涵中的____。\nA. 以提高国民素质为基本宗旨\nB. 面向全体幼儿\nC. 促进幼儿全面发展\nD. 促进幼儿个性发展", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 41, "inputs_pretokenized": "作为古埃及文明的象征之一____既寄托了古埃及人对死后重生的向往又证明了新一代法老王权统治的神圣不可侵犯充分显示了古埃及人的高度智慧和精湛的建筑艺术。\nA. 金字塔\nB. 帕特农神庙\nC. 圆形竞技场\nD. 麦加清真寺", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 42, "inputs_pretokenized": "在太阳系的八大行星中质量最大和最小的行星分别是____。\nA. 木星;水星\nB. 火星;地球\nC. 金星;水星\nD. 土星;天王星", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 43, "inputs_pretokenized": "据调查教师对学生拳打脚踢的情况现在已经较少存在取而代之的是“心罚”。比如对于成绩不好的学生罚做题目、罚抄单词一百遍。教师这样的行为____。\nA. 是正确的,教育中适当的惩罚是必不可少的\nB. 是正确的,教师没有侵犯学生的身体健康\nC. 是不正确的,教师没能做到依法执教\nD. 是不正确的,教师没能做到团结合作", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}


@ -0,0 +1 @@
[{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What's deep learning?"}, {"role": "assistant", "content": "Deep learning is a subset of machine learning that uses artificial neural networks to learn from data."}, {"role": "user", "content": "Can you explain in more detail?"}]


@ -0,0 +1,181 @@
# README
- 悟道·天鹰(Aquila)语言大模型是首个具备中英双语知识、支持商用许可协议、满足国内数据合规需求的开源语言大模型。
- 此代码仓中实现了一套基于NPU硬件的Aquila推理模型,配合加速库使用,旨在NPU上获得极致的推理性能。
# 特性矩阵
- 此矩阵罗列了各Aquila模型支持的特性
| 模型及参数量 | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8量化 | W8A16量化 | W4A16量化 | KV cache量化 | 稀疏量化 | MOE量化 | MindIE Service | TGI | 长序列 |
| ---------------------- |-------------------------|---------------------------| ---- |-----| --------------- | --------------- | -------- | --------- | --------- | ------------ | -------------------------- | ---- | ------ | ---- |-----|
| Aquila-7B | 支持world size 1,2,4,8 | 支持world size 1,2,4 | √ | × | √ | √ | × | × | × | × | × | × | × | × | × |
| Aquila2-7B | 支持world size 1,2,4,8 | 支持world size 1,2,4 | √ | × | √ | √ | × | × | × | × | × | × | × | × | × |
| Aquila2-34B | 支持world size 4,8 | × | √ | × | √ | √ | × | × | × | × | × | × | × | × | × |
- 此模型仓已适配的模型版本
- [FlagAI GitHub仓](https://github.com/FlagAI-Open/FlagAI/)
# 使用说明
## 路径变量解释
| 变量名 | 含义 |
|--------|---------------------------------------------------------------------------------------------------------------------|
| working_dir | 加速库及模型库下载后放置的目录 |
| llm_path | 模型仓所在路径。若使用编译好的包,则路径为`${working_dir}/MindIE-LLM/`若使用gitee下载的代码则路径为`${working_dir}/MindIE-LLM/examples/atb_models` |
| script_path | 脚本所在路径;Aquila和Aquila2的工作脚本所在路径为`${llm_path}/examples/models/aquila` |
| weight_path | 模型权重路径 |
## 权重
**权重下载**
- [Aquila-7B](https://huggingface.co/BAAI/Aquila-7B/tree/main)
- [Aquila2-7B](https://huggingface.co/BAAI/Aquila2-7B/tree/main)
- [Aquila2-34B](https://huggingface.co/BAAI/Aquila2-34B/tree/main)
**权重转换**
- 参考[此README文件](../../README.md)
**量化权重生成**
- 基于原始的FP16的权重生成量化权重
- W8A8 Antioutlier量化权重:暂不支持
- W8A8量化权重:暂不支持
- W8A16量化权重:暂不支持
- 稀疏量化权重:暂不支持
**基础环境变量**
- 参考[此README文件](../../../README.md)
## 推理
### 对话测试
**运行Flash Attention FP16**
- 其余Aquila模型参考以下运行方式
- 运行启动脚本
- 在\${llm_path}目录下执行以下指令
```shell
bash ${script_path}/run_fa.sh ${weight_path}
```
- 环境变量说明
- `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
- 指定当前机器上可用的逻辑NPU核心,多个核心间使用逗号相连
- 核心ID查阅方式见[此README文件](../../README.md)的【启动脚本相关环境变量】章节
- 对于300I DUO卡而言,若要使用单卡双芯,请指定至少两个可见核心;若要使用双卡四芯,请指定至少四个可见核心
- 各模型支持的核心数参考“特性矩阵”
- `export MASTER_PORT=20031`
- 设置卡间通信端口
- 默认使用20031端口
- 目的是为了避免同一台机器同时运行多个多卡模型时出现通信冲突
- 设置时端口建议范围为20000-20050
- 以下环境变量与性能和内存优化相关,通常情况下无需修改
```shell
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export HCCL_BUFFSIZE=120
export HCCL_WHITELIST_DISABLE=1
export ATB_CONTEXT_WORKSPACE_RING=1
export ATB_CONTEXT_WORKSPACE_SIZE=2629145600
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
export ATB_LAUNCH_KERNEL_WITH_TILING=0
export ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=1
export ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=0
```
**运行Flash Attention BF16**
- 暂不支持
**运行Flash Attention W8A8**
- 暂不支持
**运行Flash Attention W8A16**
- 暂不支持
**运行Paged Attention FP16**
- 运行启动脚本
- 在\${llm_path}目录下执行以下指令
```shell
bash ${script_path}/run_pa.sh ${weight_path}
```
- 环境变量说明
- `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
- 指定当前机器上可用的逻辑NPU核心,多个核心间使用逗号相连
- 核心ID查阅方式见[此README文件](../../README.md)的【启动脚本相关环境变量】章节
- 对于300I DUO卡而言,若要使用单卡双芯,请指定至少两个可见核心;若要使用双卡四芯,请指定至少四个可见核心
- 各模型支持的核心数参考“特性矩阵”
- `export MASTER_PORT=20031`
- 设置卡间通信端口
- 默认使用20031端口
- 目的是为了避免同一台机器同时运行多个多卡模型时出现通信冲突
- 设置时端口建议范围为20000-20050
- 以下环境变量与性能和内存优化相关,通常情况下无需修改
```shell
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=0
```
**运行Paged Attention BF16**
- 暂不支持
**运行Paged Attention W8A8**
- 暂不支持
**运行Paged Attention W8A16**
- 暂不支持
**运行KV cache量化**
- 暂不支持
**运行稀疏量化**
- 暂不支持
**运行MOE量化**
- 暂不支持
## 精度测试
- 参考[此README文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MAX_MEMORY_GB=29
bash run.sh pa_fp16 full_BoolQ 1 aquila_7b ${aquila-7b权重路径} 8
bash run.sh pa_fp16 full_BoolQ 1 aquila2_7b ${aquila2-7b权重路径} 8
bash run.sh pa_fp16 full_BoolQ 1 aquila2_34b ${aquila2-34b权重路径} 8
```
- MMLU测试集精度测试
- 使用GPU测试Aquila模型的MMLU数据集精度前,需修改如下配置(可参考下方的示意脚本):
- 1、将开源权重文件config.json中的max_position_embeddings修改为大于3072的值
- 2、将开源权重文件tokenizer_config.json中的model_max_length修改为3072
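下面给出一个完成上述两处修改的最小示意脚本(非官方配套工具;其中weight_path为假设路径,请替换为实际权重目录):

```python
# 示意脚本:按上述要求修改config.json与tokenizer_config.json
import json
import os

weight_path = "/path/to/aquila/weights"  # 假设路径,请替换为实际权重目录

config_file = os.path.join(weight_path, "config.json")
with open(config_file, "r", encoding="utf-8") as f:
    cfg = json.load(f)
cfg["max_position_embeddings"] = 4096  # 任意大于3072的值
with open(config_file, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)

tokenizer_file = os.path.join(weight_path, "tokenizer_config.json")
with open(tokenizer_file, "r", encoding="utf-8") as f:
    tok_cfg = json.load(f)
tok_cfg["model_max_length"] = 3072
with open(tokenizer_file, "w", encoding="utf-8") as f:
    json.dump(tok_cfg, f, indent=2, ensure_ascii=False)
```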
## 性能测试
- 参考[此README文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MAX_MEMORY_GB=29
export ATB_LLM_BENCHMARK_ENABLE=1
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 aquila_7b ${aquila-7b权重路径} 8
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 aquila2_7b ${aquila2-7b权重路径} 8
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 aquila2_34b ${aquila2-34b权重路径} 8
```
## FAQ
- 更多环境变量见[此README文件](../../README.md)
- 对话测试实际执行的Python文件为`${llm_path}/examples/run_fa.py`和`${llm_path}/examples/run_pa.py`;这两个文件的参数说明见[此README文件](../../README.md)
- 运行时需要通过指令`pip list | grep protobuf`确认protobuf版本,如果版本高于3.20.x,请运行指令`pip install protobuf==3.20.0`进行更新


@ -0,0 +1,23 @@
# Copyright (c) Huawei Technologies Co., Ltd. 2024-2024. All rights reserved.
#
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MASTER_PORT=20031
# 以下环境变量与性能和内存优化相关,通常情况下无需修改
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export HCCL_BUFFSIZE=120
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=0
extra_param="--max_output_length=128"
world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) + 1))
# world_size由可见核心数推导得到;单芯场景用python直接拉起,多芯场景用torchrun拉起
if [ "$world_size" == "1" ]; then
    python -m examples.run_fa --model_path $1 $extra_param
else
    torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_fa --model_path $1 $extra_param --input_text='假如你是小明,请给小红写一封情书?'
fi


@ -0,0 +1,24 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
# 参数配置以及启动指令的说明见同级目录下的README.md文件
export ASCEND_RT_VISIBLE_DEVICES=4,5
export MASTER_PORT=20030
# 以下环境变量与性能和内存优化相关,通常情况下无需修改
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=0
export INT8_FORMAT_NZ_ENABLE=1
extra_param=""
world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) + 1))
# world_size由可见核心数推导得到;单芯场景用python直接拉起,多芯场景用torchrun拉起
if [ "$world_size" == "1" ]; then
    python -m examples.run_pa --model_path $1 $extra_param
else
    torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_pa --model_path $1 $extra_param
fi


@ -0,0 +1,306 @@
# atb_speed_sdk
*提高加速库的易用性,统一下游任务,集成公共能力*
优点:
1. 同时兼容GPU与NPU,最大程度减少迁移适配的工作量
2. 屏蔽NPU与GPU的差异用户无感切换
3. 一个配置文件覆盖所有配置
4. 进程安全的日志
# sdk安装
```shell
pip install .
```
# 配置文件使用及样例
## 使用
```python
from atb_speed.common.config import atb_speed_config
config_path = "xxxx"
atb_speed_config.init_config(config_path)
```
## 样例
```
[model]
;模型路径
model_path=../model
;使用的设备号,多卡用逗号分隔;设置多卡时将默认使用并行模式
device_ids=2
;并行通信类型,默认是hccl,可选hccl/nccl(GPU)
;parallel_backend=hccl
;日志保存路径,默认是执行脚本所在路径
;log_dir=./
;是否绑核,0或1,默认是1,表示开启
;bind_cpu=1
[precision]
;精度测试方法,默认为ceval,可选ceval/mmlu
mode=ceval
;精度测试工作路径
work_dir=./
;批量精度测试的批数,默认是1
batch=1
;每个科目的shot数量,默认是5
shot=5
;每个问题的回答长度,默认是32
;seq_len_out=32
[performance]
;性能测试模型名称,用于结果文件的命名
model_name=vicuna_13b
;测试的batch size
batch_size=1
;测试输入长度的最大2的幂
max_len_exp=10
;测试输入长度的最小2的幂
min_len_exp=5
;特定用例测试,格式为[[seq_in,seq_out]];注意:设置该参数时max_len_exp和min_len_exp不生效
;case_pair=[[1,2],[2,3]]
;生成的结果文件名称,默认会自动生成,一般不设置
;save_file_name=
;性能测试方法,detail/normal,默认是normal;要使用detail,需配合计时装饰器并设置环境变量TIMEIT=1
;perf_mode=
;性能测试时是否只测试generate而跳过decode,0/1,默认是0
;skip_decode=
```
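读取配置后,即可以属性方式访问各分区的配置项。下面是一个最小示意(假设上述样例已保存为config.ini):

```python
from atb_speed.common.config import atb_speed_config

atb_speed_config.init_config("config.ini")
# 各分区对应dataclass,字符串配置在__post_init__中完成类型转换
print(atb_speed_config.model.device_ids)        # "2"
print(atb_speed_config.performance.batch_size)  # 1,已转为int
```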
# 使用说明
最核心的模块是launcher,所有的下游任务都围绕launcher来执行
## launcher [model]
用户通过继承Launcher(多卡场景继承ParallelLauncher)基类来实现自定义launcher。
当前的launcher对GPU和NPU做了自适应适配,因此可以通用。
使用launcher时,用户需要实现自定义的init_model方法;需要注意的是,self.model_path是从配置文件中读出的。
如果要进行功能测试,则需要实现自定义的infer方法。
```python
from atb_speed.common.config import atb_speed_config
from atb_speed.common.launcher import Launcher
from transformers import AutoTokenizer, AutoModelForCausalLM


class BaichuanLM(Launcher):
    def init_model(self):
        """
        模型初始化
        :return:
        """
        tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True, use_fast=False)
        model = AutoModelForCausalLM.from_pretrained(self.model_path, trust_remote_code=True).half().to(self._device)
        model.eval()
        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
        return model, tokenizer


if __name__ == '__main__':
    atb_speed_config.init_config()
    baichuan = BaichuanLM()
    print("---------------warm-up---------------")
    baichuan.infer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->')

    print("---------------inference---------------")
    baichuan.infer('登鹳雀楼->王之涣\n夜雨寄北->')
    baichuan.infer('苹果公司的CEO是')

    query_list = ["谷歌公司的CEO是",
                  '登鹳雀楼->王之涣\n夜雨寄北->',
                  '苹果公司的CEO是',
                  '华为公司的CEO是',
                  '微软公司的CEO是']
    baichuan.infer_batch(query_list)
```
# 精度测试
SDK提供了两种精度测试方法:ceval和mmlu
## 配置说明 [precision]
| 配置项key | 默认值 | 备注 |
|-------------|-------|-----------------------------------|
| mode | ceval | 精度测试方法。可选ceval/mmlu |
| work_dir | | 精度测试工作路径。必填 |
| batch | 1 | 批量精度测试的批数;请注意batch大于1时,精度会和等于1时有差别 |
| shot | 5 | 每个科目的shot数量 |
| seq_len_out | 32 | 每个问题的回答长度 |
### 1. 下载测试数据集
ceval
```
wget https://huggingface.co/datasets/ceval/ceval-exam/resolve/main/ceval-exam.zip
unzip ceval-exam.zip -d data
```
mmlu
```shell
wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
tar -xvf data.tar
```
注:若wget网络不通,请从网页手动下载后复制到工作目录
### 2. 配置精度测试相关项
0. 按照推理指导,下载模型及配置路径,并安装atb_speed_sdk
1. 新建工作文件夹${precision_work_dir}
2. 将下载的测试数据集解压,解压后的数据和脚本放置在${precision_work_dir}
3. 修改config.ini文件,设置ceval相关路径
目录结构示例(${ceval_work_dir}):
- data:解压后的数据集,包含dev、test、val三个文件夹
- test_result:精度测试跑完之后生成
## 运行脚本
只需要声明一个launcher即可使用
```python
from atb_speed.common.precision import get_precision_test_cls
from atb_speed.common.config import atb_speed_config
from atb_speed.common.launcher import Launcher
from transformers import AutoTokenizer, AutoModelForCausalLM


class BaichuanLM(Launcher):
    def init_model(self):
        """
        模型初始化
        :return:
        """
        tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True, use_fast=False)
        model = AutoModelForCausalLM.from_pretrained(self.model_path, trust_remote_code=True).half().to(self._device)
        model.eval()
        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
        return model, tokenizer


if __name__ == '__main__':
    atb_speed_config.init_config("config.ini")
    baichuan = BaichuanLM()
    c_t = get_precision_test_cls()(baichuan)
    c_t.run()
```
# 性能测试 [performance]
SDK提供了两种性能测试方法:常规估计法和精确打点法;也提供了两种测试方案:2的幂测试和特定case测试
## 配置说明
| 配置项key | 默认值 | 备注 |
|----------------|--------|---------------------------------------------------------------------------------------|
| model_name | | 性能测试模型名称,用于结果文件的命名 |
| batch_size | 1 | 测试的batch size |
| max_len_exp | 10 | 测试的输入的最大2的幂 |
| min_len_exp | 5 | 测试的输入的最小2的幂 |
| case_pair | | 特定用例测试,格式为[[seq_in,seq_out]],注意当设置这个参数时max_len_exp min_len_exp不生效 |
| save_file_name | | 生成的结果文件名称,默认会自动生成,一般不设置 |
| perf_mode | normal | 性能测试方法,detail/normal,默认是normal;要使用detail,需配合计时装饰器并设置环境变量TIMEIT=1 |
| skip_decode | 0 | 性能测试时是否只测试generate而跳过decode0/1 默认是0 |
## 精确打点法
- 通过在modeling中使用sdk里的计时装饰器进行计时
- 不再需要侵入式修改任何三方件中的源码,支持任意版本的transformers
- perf_mode设为detail
- 将环境变量`TIMEIT`设置成1来开启性能测试;为了不影响正常使用,默认是0
### Timer介绍
- 将环境变量`TIMEIT`设置成1来开启计时;为了不影响正常使用,默认是0
- 计时的数据是累积的,使用 Timer.reset() 来重置计时器
- 硬件设备上的数据需要同步才能准确计时。在计时前,请使用`Timer.sync = getattr(torch, device_type).synchronize`设置计时器的同步函数
### 如何使用
只需要在最外层的forward函数上方加上Timer.timing计时装饰器即可。
例如:
```python
import torch
from torch import nn
from atb_speed.common.timer import Timer


class AddNet(nn.Module):
    def __init__(self, in_dim, h_dim=5, out_dim=1):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, h_dim)
        self.fc2 = nn.Linear(h_dim, out_dim)

    @Timer.timing
    def forward(self, x, y):
        out = torch.cat([x, y], dim=1)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out


if __name__ == '__main__':
    add_net = AddNet(in_dim=2)
    Timer.sync = torch.cuda.synchronize
    Timer.reset()
    for i in range(5):
        x = torch.randn(1, 1)
        y = torch.randn(1, 1)
        result = add_net.forward(x, y)
        print(result)
    print(Timer.timeit_res)
    print(Timer.timeit_res.first_token_delay)
    print(Timer.timeit_res.next_token_avg_delay)
```
## 常规估计法
- 通过第一次只生成1个token、第二次生成n个token,对两次耗时作差来估计性能(计算方式见下方示意代码)。
- *假设两次推理首token的时延相同*
- perf_mode设为normal
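下面用一个可独立运行的Python片段演示上述估计法的计时作差逻辑(其中用sleep模拟生成过程,时延数值均为演示假设;实际实现见本SDK性能测试模块的_perf方法):

```python
# 示意:常规估计法,两次生成计时作差
import time


def fake_generate(num_tokens, first_token_s=0.05, next_token_s=0.01):
    """用sleep模拟一次生成:首token较慢,后续token较快(参数为演示假设)"""
    time.sleep(first_token_s + (num_tokens - 1) * next_token_s)


seq_out = 64

start = time.time()
fake_generate(1)  # 第一次:只生成1个token,近似首token时延
first_token_time = time.time() - start

start = time.time()
fake_generate(seq_out)  # 第二次:生成n个token
total_time = time.time() - start

# 假设两次推理的首token时延相同,作差平摊即得增量时延
next_token_avg = (total_time - first_token_time) / (seq_out - 1)
print(f"first token: {first_token_time * 1000:.1f} ms, "
      f"next token avg: {next_token_avg * 1000:.1f} ms")
```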
## 运行脚本
```python
from atb_speed.common.config import atb_speed_config
from atb_speed.common.launcher import Launcher
from atb_speed.common.performance.base import PerformanceTest
from transformers import AutoTokenizer, AutoModelForCausalLM


class LMLauncher(Launcher):
    """
    LMLauncher
    """

    def init_model(self):
        """
        模型初始化
        :return:
        """
        tokenizer = AutoTokenizer.from_pretrained(
            self.model_path, trust_remote_code=True, use_fast=False)
        model = AutoModelForCausalLM.from_pretrained(self.model_path, trust_remote_code=True).half().to(self._device)
        model.eval()
        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
        return model, tokenizer


if __name__ == '__main__':
    atb_speed_config.init_config("config.ini")
    performance_test = PerformanceTest(LMLauncher())
    performance_test.warm_up()
    performance_test.run_test()
```


@ -0,0 +1,122 @@
#!/usr/bin/env python
# coding:utf-8
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
"""
config
"""
import ast
import configparser
import os
import warnings
from dataclasses import dataclass
from typing import Optional, List, Union, Type


class ConfigInitializationError(Exception):
    def __init__(self, message):
        self.message = message
        super().__init__(self.message)


@dataclass
class PrecisionConfig:
    work_dir: str = ""
    batch: int = 1
    shot: int = 5
    seq_len_out: int = 32
    mode: str = "ceval"

    def __post_init__(self):
        int_attr = ("batch", "shot", "seq_len_out")
        for attr in int_attr:
            self.__dict__[attr] = int(self.__dict__[attr])
        self.work_dir = os.path.realpath(self.work_dir)


@dataclass
class PerformanceConfig:
    model_name: str = ""
    batch_size: int = 1
    max_len_exp: int = 11
    min_len_exp: int = 5
    case_pair: Union[Optional[List[int]], str] = None
    save_file_name: str = ""
    perf_mode: str = "normal"
    skip_decode: int = 0

    def __post_init__(self):
        int_attr = ("batch_size", "max_len_exp", "min_len_exp", "skip_decode")
        for attr in int_attr:
            self.__dict__[attr] = int(self.__dict__[attr])
        if self.case_pair is not None:
            # 配置文件中的case_pair是字符串,此处安全地解析为嵌套列表
            self.case_pair = ast.literal_eval(self.case_pair)


@dataclass
class ModelConfig:
    model_path: str = ""
    device_ids: str = "0"
    parallel_backend: str = "hccl"
    device_num: int = 1
    log_dir: str = os.path.join(os.getcwd(), "atb_speed_log")
    bind_cpu: int = 1

    def __post_init__(self):
        self.model_path = os.path.realpath(self.model_path)
        self.device_num = len(self.device_ids.split(","))
        int_attr = ("bind_cpu",)
        for attr in int_attr:
            self.__dict__[attr] = int(self.__dict__[attr])


@dataclass
class Config:
    model: ModelConfig = None
    performance: PerformanceConfig = None
    precision: PrecisionConfig = None

    def init_config(self, raw_content_path, allow_modify=False):
        if not os.path.exists(raw_content_path):
            raise FileNotFoundError(f"{raw_content_path} not exists.")
        section_map = {
            "model": ModelConfig,
            "performance": PerformanceConfig,
            "precision": PrecisionConfig
        }
        if allow_modify:
            warn_msg = "Warning, allow_modify has been set as True. " \
                       "It is dangerous to modify the reserved fields below.\n"
            for cfg_key, cfg_cls in section_map.items():
                warn_msg = warn_msg + "\n".join(
                    f"{cfg_key}.{sub_k} is reserved."
                    for sub_k in cfg_cls.__dict__ if not sub_k.startswith("__")) + "\n"
            warnings.warn(warn_msg, DeprecationWarning, stacklevel=2)
        conf = configparser.ConfigParser()
        conf.read(raw_content_path, encoding="utf-8")
        for section_name, section_content in conf.items():
            if section_name == "DEFAULT":
                continue
            if section_name == "ceval":
                warnings.warn(
                    "The section_name [ceval] is deprecated, "
                    "please refer to readme and use [precision] instead",
                    DeprecationWarning,
                    stacklevel=2)
                section_name = "precision"
            if not hasattr(self, section_name) and not allow_modify:
                warnings.warn(f"The section [{section_name}] is not recognized and not allowed to modify.",
                              UserWarning,
                              stacklevel=2)
                continue
            config_cls: Optional[Type] = section_map.get(section_name)
            if not config_cls:
                raise ConfigInitializationError(f"No configuration class found for section [{section_name}].")
            try:
                attr = config_cls(**section_content)
            except TypeError as e:
                raise ConfigInitializationError(f"Invalid configuration for section [{section_name}].") from e
            setattr(self, section_name, attr)


atb_speed_config = Config()


@ -0,0 +1,178 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
import logging
import os
import subprocess
from dataclasses import dataclass
from typing import List, Dict, Union

import psutil


def execute_command(cmd_list):
    with subprocess.Popen(cmd_list,
                          shell=False,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE) as p:
        out, err = p.communicate(timeout=1000)
    res = out.decode()
    return res


@dataclass
class DeviceInfo:
    _info_line: str = ""
    npu_id: int = 0
    chip_id: int = 0
    chip_logic_id: Union[int, str] = 0
    chip_name: str = ""

    def __post_init__(self):
        self.npu_id, self.chip_id, self.chip_logic_id, self.chip_name = self._info_line.strip().split(None, 3)
        self.npu_id = int(self.npu_id)
        self.chip_id = int(self.chip_id)
        if self.chip_logic_id.isnumeric():
            self.chip_logic_id = int(self.chip_logic_id)


@dataclass
class CPUBinder:
    logger: logging.Logger = logging.getLogger()

    @staticmethod
    def _get_device_map_info() -> Dict[int, DeviceInfo]:
        device_map_info = {}
        device_map = execute_command(["npu-smi", "info", "-m"]).strip().split("\n")[1:]
        for line in device_map:
            device_info = DeviceInfo(line.strip())
            if isinstance(device_info.chip_logic_id, int):
                device_map_info[device_info.chip_logic_id] = device_info
        return device_map_info

    @staticmethod
    def _get_pcie_info(devices: List[int], keyword="PCIeBusInfo"):
        device_map_info = CPUBinder._get_device_map_info()
        device_pcie_tbl = {}
        for device in devices:
            device_info = device_map_info.get(device)
            if not device_info:
                raise RuntimeError("Can not get device info, binding cpu will skip.")
            pcie_info = execute_command(["npu-smi", "info", "-t", "board", "-i", f"{device_info.npu_id}",
                                         "-c", f"{device_info.chip_id}"]).strip().split("\n")
            for _ in pcie_info:
                line = ''.join(_.split())  # 310P的关键字是PCIe Bus Info,910是PCIeBusInfo,故去掉空格以兼容两者
                if line.startswith(keyword):
                    device_pcie_tbl[device] = line[len(keyword) + 1:]
                    break
        return device_pcie_tbl

    @staticmethod
    def _get_numa_info(pcie_tbl, keyword="NUMAnode"):
        device_numa_tbl = {}  # key is device id, value is numa id
        numa_devices_tbl = {}  # key is numa id, value is device id list
        for device, pcie_no in pcie_tbl.items():
            numa_info = execute_command(["lspci", "-s", f"{pcie_no}", "-vvv"]).strip().split("\n")
            for _ in numa_info:
                line = ''.join(_.split())
                if line.startswith(keyword):
                    numa_id = int(line[len(keyword) + 1:])
                    device_numa_tbl[device] = numa_id

                    devices = numa_devices_tbl.get(numa_id, None)
                    if devices is None:
                        numa_devices_tbl[numa_id] = list()
                    numa_devices_tbl[numa_id].append(device)
                    break
        return device_numa_tbl, numa_devices_tbl

    @staticmethod
    def _get_cpu_info(numa_ids, keyword1="NUMAnode", keyword2="CPU(s)"):
        cpu_idx_tbl = dict()
        numa_keywords = [keyword1 + str(idx) + keyword2 for idx in numa_ids]
        cpu_info = execute_command(["lscpu"]).strip().split("\n")
        for _ in cpu_info:
            line = ''.join(_.split())
            if any(line.startswith(word) for word in numa_keywords):
                split_info = line.split(":")
                cpu_id_ranges = split_info[-1].split(",")

                ranges = list()
                for range_str in cpu_id_ranges:
                    endpoints = range_str.split("-")
                    if len(endpoints) != 2:
                        raise Exception("lscpu command output error, please check !")
                    ranges += [cid for cid in range(int(endpoints[0]), int(endpoints[1]) + 1)]

                numa_id = int(split_info[0].replace(keyword1, '').replace(keyword2, ''))
                cpu_idx_tbl[numa_id] = ranges
        return cpu_idx_tbl

    def bind_cpus(self, visible_devices: List[int] = None, rank_id: int = 0, ratio: float = 0.5):
        """
        可以用export CPU_BINDING_NUM设置每个进程绑的核数;如果不设置CPU_BINDING_NUM,
        会根据ratio(numa利用率)进行计算。如果有64个核,0.5表示用一半即32个核,平分给亲和在这个numa上的npu
        :param visible_devices:
        :param rank_id:
        :param ratio:
        :return:
        """
        if visible_devices is None:
            # 环境变量缺省时默认空串,避免对None调用split报错
            devices = [
                int(item.strip())
                for item in os.getenv("ASCEND_RT_VISIBLE_DEVICES", "").split(",")
                if item.isnumeric()
            ]
        else:
            devices = visible_devices

        # 获取npu和pcie的对应关系
        device_pcie_tbl = self._get_pcie_info(devices)
        # 根据pcie信息获取npu和numa的对应关系
        device_numa_tbl, numa_devices_tbl = self._get_numa_info(device_pcie_tbl)
        # 获取使用的numa对应的cpu核分配信息
        cpu_idx_tbl = self._get_cpu_info(list(numa_devices_tbl.keys()))

        # 当前rank的npu id
        cur_device = devices[rank_id]
        # 获取npu对应的numa id
        numa_id = device_numa_tbl.get(cur_device)

        # 获取共享该numa的npu信息
        shard_devices = numa_devices_tbl.get(numa_id)
        # 按照npu id进行排序
        shard_devices.sort()

        # 获取该numa上所有的cpu id信息
        all_cpus = cpu_idx_tbl[numa_id]
        info_msg = (f"rank_id: {rank_id}, device_id: {cur_device}, numa_id: {numa_id}, "
                    f"shard_devices: {shard_devices}, cpus: {all_cpus}")
        self.logger.info(info_msg)

        cpu_nums = len(all_cpus)
        # 计算给该共享numa的npu分配的核的个数
        cpu_binding_num = os.environ.get("CPU_BINDING_NUM", None)
        if cpu_binding_num is None:
            cpu_num_per_device = int(cpu_nums * ratio // len(shard_devices))
        else:
            cpu_num_per_device = int(cpu_binding_num)
            if len(shard_devices) * cpu_num_per_device > cpu_nums:
                raise Exception(
                    f"Cpu num in numa {numa_id} to assign {cpu_num_per_device} for every device is not enough, "
                    f"please decrease the value of CPU_BINDING_NUM!")

        # 获取该npu的下标信息
        idx = shard_devices.index(cur_device)
        # 给该npu分配要绑定的cpu id
        binding_cpus = [all_cpus[_] for _ in range(idx * cpu_num_per_device, (idx + 1) * cpu_num_per_device)]

        # cpu bind
        p = psutil.Process()
        p.cpu_affinity(binding_cpus)
        new_affinity = p.cpu_affinity()
        info_msg = f"process {p.pid}, new_affinity is {new_affinity}, cpu count {cpu_num_per_device}"
        self.logger.info(info_msg)

@ -0,0 +1,12 @@
#!/usr/bin/env python
# coding:utf-8
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
"""
common launcher
"""
from atb_speed.common.launcher.base import get_device, DeviceType

if get_device() == DeviceType.npu:
    from atb_speed.common.launcher.npu import Launcher, ParallelLauncher
else:
    from atb_speed.common.launcher.gpu import Launcher, ParallelLauncher


@ -0,0 +1,244 @@
#!/usr/bin/env python
# coding:utf-8
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
"""
common launcher
"""
import inspect
import logging
import os
import time
from abc import abstractmethod
from enum import Enum
from typing import Dict, Tuple

import torch
from atb_speed.common.config import atb_speed_config
from atb_speed.common.log.logging import init_logger
from transformers import GenerationConfig


class DeviceType(str, Enum):
    npu = "npu"
    cuda = "cuda"
    cpu = "cpu"


def get_device() -> str:
    """
    获取当前所在设备
    :return:
    """
    flag = torch.cuda.is_available()
    if flag:
        return DeviceType.cuda
    try:
        import torch_npu
        flag = torch.npu.is_available()
    except ImportError:
        flag = False
    return DeviceType.npu if flag else DeviceType.cpu


class BaseLauncher:
    """
    BaseLauncher
    """

    def __init__(self, device_ids: str = None, model_path="", options=None):
        options = {} if options is None else options
        self.model_path = atb_speed_config.model.model_path if not model_path else model_path
        if device_ids is None and atb_speed_config.model:
            device_ids = atb_speed_config.model.device_ids
        self.device_ids = device_ids
        self.device_id_list = [int(item.strip()) for item in self.device_ids.split(",") if item.isnumeric()]
        self.local_rank, self.world_size = self.setup_model_parallel()
        self.logger_name = f"device{self.local_rank}_{self.world_size}_{time.time()}.log"
        os.makedirs(atb_speed_config.model.log_dir, exist_ok=True)
        self.logger_path = os.path.join(atb_speed_config.model.log_dir, self.logger_name)
        self.logger = init_logger(logging.getLogger(f"device_{self.local_rank}"), self.logger_path)
        if atb_speed_config.model.bind_cpu:
            try:
                self.bind_cpu()
            except Exception as err:
                self.logger.error("Failed to bind cpu, skip to bind cpu. \nDetail: %s ", err)
        self.set_torch_env(self.device_ids, options)
        self.model, self.tokenizer = self.init_model()
        self.logger.info(self.model.device)
        self.logger.info("load model from %s successfully!", os.path.realpath(inspect.getmodule(self.model).__file__))

    @property
    def _device(self) -> str:
        """
        获取当前所在设备
        :return:
        """
        return get_device()

    @property
    def device(self) -> torch.device:
        """
        获取模型所在的设备
        :return:
        """
        return self.model.device

    @property
    def device_type(self) -> str:
        """
        获取模型所在的设备的字符串
        :return:
        """
        return self.model.device.type

    @property
    def device_name(self) -> str:
        """
        获取所在设备的详细硬件名称
        :return:
        """
        if self.device_type == DeviceType.npu:
            device_name = torch.npu.get_device_name()
        elif self.device_type == DeviceType.cuda:
            device_name = torch.cuda.get_device_name()
        else:
            device_name = "cpu"
        return "_".join(device_name.split())

    @abstractmethod
    def init_model(self):
        """
        模型初始化
        :return:
        """
        ...

    @staticmethod
    def set_torch_env(device_ids, options: Dict = None):
        """
        :param device_ids:
        :param options:
        :return:
        """

    @staticmethod
    def bind_cpu():
        ...

    @staticmethod
    def setup_model_parallel() -> Tuple[int, int]:
        local_rank, world_size = 0, 1
        return local_rank, world_size

    @classmethod
    def safe_serialization(cls, model, tokenizer, save_dir):
        """
        权重转safetensors
        :param model:
        :param tokenizer:
        :param save_dir:
        :return:
        """
        os.makedirs(save_dir, exist_ok=True)
        model.save_pretrained(save_dir, safe_serialization=True)
        tokenizer.save_pretrained(save_dir)

    def infer(self, query, model_params=None):
        """
        推理代码
        :param query:
        :param model_params:
        :return:
        """
        inputs = self.tokenizer(query, return_tensors='pt')
        inputs = inputs.to(self.model.device)
        with torch.no_grad():
            start_time = time.time()
            model_params = model_params if model_params is not None else {}
            pred = self.model.generate(**inputs, **model_params)
            end_time = time.time()
        time_cost = end_time - start_time
        output = self.tokenizer.decode(pred.cpu()[0], skip_special_tokens=True)
        self.logger.info(output)
        self.logger.info("cost %s s", time_cost)
        new_tokens = len(pred[0]) - len(inputs.input_ids[0])
        final_msg = f"generate {new_tokens} new tokens({new_tokens / time_cost:.2f} tokens/s)"
        self.logger.info(final_msg)
        return output

    def infer_batch(self, query, model_params=None):
        """
        推理代码
        :param query:
        :param model_params:
        :return:
        """
        inputs = self.tokenizer(query, return_tensors='pt', padding=True)
        inputs = inputs.to(self.model.device)
        with torch.no_grad():
            start_time = time.time()
            model_params = model_params if model_params is not None else {}
            pred = self.model.generate(**inputs, **model_params)
            end_time = time.time()
        time_cost = end_time - start_time
        output = self.tokenizer.batch_decode(pred, skip_special_tokens=True, clean_up_tokenization_spaces=False)
        for ind, item in enumerate(output):
            self.logger.info("###### batch %s ", ind)
            self.logger.info(item)
        self.logger.info("cost %s s", time_cost)
        new_tokens = len(pred[0]) - len(inputs.input_ids[0])
        final_msg = f"generate {new_tokens} new tokens({new_tokens / time_cost:.2f} tokens/s)"
        self.logger.info(final_msg)
        return output

    def infer_test(self, batch_size: int = 1, seq_in: int = 2048, seq_out: int = 64):
        """
        推理代码
        :param batch_size: 特定batch size
        :param seq_in: 特定长度输入
        :param seq_out: 特定长度输出
        :return:
        """
        inputs = self.tokenizer("hi", return_tensors='pt')
        dummy_input_ids_nxt = torch.randint(0, self.model.config.vocab_size, [batch_size, seq_in], dtype=torch.int64)
        dummy_attention_mask = torch.ones((batch_size, seq_in), dtype=torch.int64)
        inputs["input_ids"] = dummy_input_ids_nxt
        inputs["attention_mask"] = dummy_attention_mask
        inputs = inputs.to(self.model.device)
        with torch.no_grad():
            start_time = time.time()
            pred = self.model.generate(**inputs, max_new_tokens=seq_out,
                                       eos_token_id=self.model.config.vocab_size * 2)
            end_time = time.time()
        time_cost = end_time - start_time
        output = self.tokenizer.decode(pred.cpu()[0], skip_special_tokens=True)
        self.logger.info("cost %s s", time_cost)
        new_tokens = len(pred[0]) - seq_in
        final_msg = (f"generate {batch_size * new_tokens} new tokens"
                     f"({batch_size * new_tokens / time_cost:.2f} tokens/s)")
        self.logger.info(final_msg)
        return output

    def remove_part_of_generation_config(self, generation_config) -> GenerationConfig:
        """
        移除部分当前不支持后处理相关参数
        :param generation_config:
        :return:
        """
        ori_gen = GenerationConfig()
        diff_dict = generation_config.to_diff_dict()
        self.logger.info(diff_dict)
        for key in diff_dict:
            if key.endswith("_id"):
                continue
            ori_value = getattr(ori_gen, key, None)
            if ori_value is not None:
                setattr(generation_config, key, getattr(ori_gen, key))
                self.logger.info("replace %s", key)
        return generation_config


@ -0,0 +1,57 @@
#!/usr/bin/env python
# coding:utf-8
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
"""
common launcher
"""
import abc
import os
from typing import Dict

import torch
from atb_speed.common.launcher.base import BaseLauncher


class Launcher(BaseLauncher):
    """
    Launcher
    """

    @staticmethod
    def set_torch_env(device_ids, options: Dict = None):
        """
        :param device_ids:
        :param options:
        :return:
        """
        os.environ['CUDA_VISIBLE_DEVICES'] = device_ids

    @abc.abstractmethod
    def init_model(self):
        """
        模型初始化
        :return:
        """
        ...


class ParallelLauncher(Launcher):
    @staticmethod
    def set_torch_env(device_ids, options: Dict = None):
        os.environ['CUDA_VISIBLE_DEVICES'] = device_ids

    @abc.abstractmethod
    def init_model(self):
        """
        模型初始化
        :return:
        """
        ...

    def setup_model_parallel(self):
        torch.distributed.init_process_group()
        local_rank = torch.distributed.get_rank()
        world_size = torch.distributed.get_world_size()
        torch.manual_seed(1)
        return local_rank, world_size


@ -0,0 +1,117 @@
#!/usr/bin/env python
# coding:utf-8
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
"""
common launcher
"""
import abc
from dataclasses import dataclass
from typing import Dict

import torch
import torch_npu
from atb_speed.common.config import atb_speed_config
from atb_speed.common.cpu_binding import CPUBinder
from atb_speed.common.launcher.base import BaseLauncher


@dataclass
class NPUSocInfo:
    soc_name: str = ""
    soc_version: int = -1
    need_nz: bool = False

    def __post_init__(self):
        self.soc_version = torch_npu._C._npu_get_soc_version()
        if self.soc_version in (100, 101, 102, 103, 104, 200, 201, 202, 203):
            self.need_nz = True


class Launcher(BaseLauncher):
    """
    Launcher
    """

    def __init__(self, device_ids: str = None, model_path="", options=None):
        super().__init__(device_ids, model_path, options)
        self.soc_info = NPUSocInfo()
        self.fit_npu(self.model)

    @staticmethod
    def set_torch_env(device_ids, options: Dict = None):
        """
        :param device_ids:
        :param options:
        :return:
        """
        torch_npu.npu.set_device(int(device_ids.split(",")[0]))
        torch.npu.set_compile_mode(jit_compile=False)
        torch.npu.set_option(options)

    @abc.abstractmethod
    def init_model(self):
        """
        模型初始化
        :return:
        """
        ...

    def fit_npu(self, model):
        """
        芯片适配,提前转换提高性能
        :param model:
        :return:
        """
        if not self.soc_info.need_nz:
            # 支持ND格式的芯片:将Linear权重提前转成ND格式(2即ACL_FORMAT_ND)
            for _, module in model.named_modules():
                if isinstance(module, torch.nn.Linear):
                    module.weight.data = torch_npu.npu_format_cast(module.weight.data, 2)
            self.logger.info(f"soc info: {self.soc_info.soc_version} , {self.soc_info.soc_name}, support ND")
        else:
            # if on 910A or 310P chip, eliminate the TransData and Transpose ops by converting weight data types
            for name, module in model.named_modules():
                if isinstance(module, torch.nn.Linear):
                    if name == 'lm_head':
                        # eliminate TransData op before lm_head calculation
                        module.weight.data = torch.nn.parameter.Parameter(module.weight.data)
                    # 29即ACL_FORMAT_FRACTAL_NZ
                    module.weight.data = torch_npu.npu_format_cast(module.weight.data, 29)
            self.logger.info(f"soc info: {self.soc_info.soc_version} , {self.soc_info.soc_name}, support NZ")

        for _, module in model.named_modules():
            if isinstance(module, torch.nn.Embedding):
                module.weight.data = torch_npu.npu_format_cast(module.weight.data, 2)

    def bind_cpu(self):
        """
        绑核
        :return:
        """
        cpu_binder = CPUBinder(self.logger)
        cpu_binder.bind_cpus(self.device_id_list, self.local_rank, 1.0)
        self.logger.info("Bind cpu successfully!")


class ParallelLauncher(Launcher):
    @staticmethod
    def set_torch_env(device_ids, options: Dict = None):
        torch.npu.set_compile_mode(jit_compile=False)
        torch.npu.set_option(options)

    @abc.abstractmethod
    def init_model(self):
        """
        模型初始化
        :return:
        """
        ...

    def setup_model_parallel(self):
        torch.distributed.init_process_group(atb_speed_config.model.parallel_backend)
        local_rank = torch.distributed.get_rank()
        world_size = torch.distributed.get_world_size()
        torch_npu.npu.set_device(self.device_id_list[local_rank])
        # seed must be the same in all processes
        torch.manual_seed(1)
        return local_rank, world_size


@ -0,0 +1,39 @@
#!/usr/bin/env python
# coding:utf-8
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
"""
logging
"""
import logging
import os
from logging.handlers import RotatingFileHandler

from atb_speed.common.log.multiprocess_logging_handler import install_logging_handler


def init_logger(logger: logging.Logger, file_name: str):
    """
    日志初始化
    :param logger:
    :param file_name:
    :return:
    """
    logger.setLevel(logging.INFO)
    # 创建日志记录器,指明日志保存路径,每个日志的大小,保存日志的上限
    flask_file_handle = RotatingFileHandler(
        filename=file_name,
        maxBytes=int(os.getenv('PYTHON_LOG_MAXSIZE', "1073741824")),
        backupCount=10,
        encoding="utf-8")
    formatter = logging.Formatter('%(asctime)s [%(levelname)s] pid: %(process)d %(filename)s-%(lineno)d: %(message)s')
    # 将日志记录器指定日志的格式
    flask_file_handle.setFormatter(formatter)
    # 为全局的日志工具对象添加日志记录器
    logger.addHandler(flask_file_handle)
    # 添加控制台输出日志
    console_handle = logging.StreamHandler()
    console_handle.setFormatter(formatter)
    logger.addHandler(console_handle)
    install_logging_handler(logger)
    return logger


@ -0,0 +1,135 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved.
"""
multiprocess_logging_handler
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import unicode_literals

import logging
import multiprocessing
import threading


def install_logging_handler(logger=None):
    """
    Wraps the handlers in the given Logger with an MultiProcessingHandler.
    :param logger: whose handlers to wrap. By default, the root logger.
    """
    if logger is None:
        logger = logging.getLogger("service_operation")
    for index, org_handler in enumerate(list(logger.handlers)):
        handler = MultiLoggingHandler('mp-handler-{0}'.format(index), log_handler=org_handler)
        logger.removeHandler(org_handler)
        logger.addHandler(handler)


class MultiLoggingHandler(logging.Handler):
    """
    multiprocessing handler.
    """

    def __init__(self, name, log_handler=None):
        """
        Init multiprocessing handler
        :param name:
        :param log_handler:
        :return:
        """
        super().__init__()
        if log_handler is None:
            log_handler = logging.StreamHandler()
        self.log_handler = log_handler
        self.queue = multiprocessing.Queue(-1)
        self.setLevel(self.log_handler.level)
        self.set_formatter(self.log_handler.formatter)
        # The thread handles receiving records asynchronously.
        t_thd = threading.Thread(target=self.receive, name=name)
        t_thd.daemon = True
        t_thd.start()

    def set_formatter(self, fmt):
        """
        :param fmt:
        :return:
        """
        logging.Handler.setFormatter(self, fmt)
        self.log_handler.setFormatter(fmt)

    def receive(self):
        """
        :return:
        """
        while True:
            try:
                record = self.queue.get()
                self.log_handler.emit(record)
            except (KeyboardInterrupt, SystemExit) as err:
                raise err
            except EOFError:
                break
            except ValueError:
                pass

    def send(self, message):
        """
        :param message:
        :return:
        """
        self.queue.put_nowait(message)

    def emit(self, record):
        """
        :param record:
        :return:
        """
        try:
            sd_record = self._format_record(record)
            self.send(sd_record)
        except (KeyboardInterrupt, SystemExit) as err:
            raise err
        except ValueError:
            self.handleError(record)

    def close(self):
        """
        :return:
        """
        self.log_handler.close()
        logging.Handler.close(self)

    def handle(self, record):
        """
        :param record:
        :return:
        """
        rsv_record = self.filter(record)
        if rsv_record:
            self.emit(record)
        return rsv_record

    def _format_record(self, org_record):
        """
        :param org_record:
        :return:
        """
        if org_record.args:
            org_record.msg = org_record.msg % org_record.args
            org_record.args = None
        if org_record.exc_info:
            self.format(org_record)
            org_record.exc_info = None
        return org_record


@ -0,0 +1,231 @@
#!/usr/bin/env python
# coding:utf-8
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
"""
performance test base
"""
import time
from dataclasses import dataclass
from enum import Enum
from typing import List

import torch
import torch.distributed as dist
from atb_speed.common.config import atb_speed_config
from atb_speed.common.launcher.base import BaseLauncher
from atb_speed.common.timer import Timer
from atb_llm.utils.file_utils import safe_open


class PerfMode(str, Enum):
    detail = "detail"
    normal = "normal"


@dataclass
class PerformanceTestConfig:
    """
    PerformanceTestConfig
    """
    batch_size: int = 1
    max_len_exp: int = 11
    min_len_exp: int = 5
    model_name: str = "model"
    device_name: str = "cpu"
    save_file_name: str = ""
    case_pair: List[List[int]] = None

    def __post_init__(self):
        self.batch_size = atb_speed_config.performance.batch_size
        self.max_len_exp = atb_speed_config.performance.max_len_exp
        self.min_len_exp = atb_speed_config.performance.min_len_exp
        self.model_name = atb_speed_config.performance.model_name
        self.case_pair = atb_speed_config.performance.case_pair
        if not atb_speed_config.performance.save_file_name:
            self.save_file_name = f"performance_test_{self.model_name}_{self.device_name}_bs{self.batch_size}.csv"
        else:
            self.save_file_name = atb_speed_config.performance.save_file_name


class PerformanceTest:
    """
    PerformanceTest
    """

    def __init__(self, launcher: BaseLauncher):
        """
        :param launcher:
        """
        self.launcher = launcher
        self.local_rank, self.world_size = launcher.local_rank, launcher.world_size
        self.config = PerformanceTestConfig(device_name=self.launcher.device_name)
        self.launcher.logger.info(self.config.__dict__)
        self.model, self.tokenizer = launcher.model, launcher.tokenizer
        self.dummy_input = "Common sense questions and answers\n\nQuestion: Why do people need sleep\nFactual answer:"
        if atb_speed_config.performance.perf_mode == PerfMode.detail:
            self.perf = self._perf_detail_v2
        else:
            self.perf = self._perf
        self.test_case = self.generate_test_case()

    def generate_test_case(self):
        if self.config.case_pair is None:
            return [[2 ** i, 2 ** j]
                    for i in range(self.config.min_len_exp, self.config.max_len_exp + 1)
                    for j in range(self.config.min_len_exp, self.config.max_len_exp + 1)]
        return self.config.case_pair

    def warm_up(self, seq_len_in=None, seq_len_out=None):
        """
        :return:
        """
        if seq_len_in is None:
            seq_len_in = max(case[0] for case in self.test_case)
        if seq_len_out is None:
            seq_len_out = max(case[1] for case in self.test_case)
        dummy_input_ids_nxt = torch.randint(0, self.model.config.vocab_size, [self.config.batch_size, seq_len_in],
                                            dtype=torch.int64)
        dummy_attention_mask = torch.ones((self.config.batch_size, seq_len_in), dtype=torch.int64)
        inputs = self.tokenizer([self.dummy_input] * self.config.batch_size, return_tensors="pt", padding='max_length',
                                max_length=seq_len_in)
        inputs["input_ids"] = dummy_input_ids_nxt
        inputs["attention_mask"] = dummy_attention_mask
        inputs = inputs.to(self.model.device)
        with torch.no_grad():
            _ = self.model.generate(
                **inputs,
                max_new_tokens=seq_len_out,
                eos_token_id=self.model.config.vocab_size * 2
            )
        self.launcher.logger.info("warm up finished.")

    def run_test(self):
        self.launcher.logger.info("---------------inference---------------")
        file = None
        if self.local_rank == 0:
            file = safe_open(self.config.save_file_name, "w", encoding="utf-8")
            file.write(
                "batch_size,"
                "input_seq_len(Encoding),"
                "output_seq_len(Decoding),"
                "ResponseTime(s),"
                "forward_first_token_time(ms),"
                "forward_next_token_time(ms),"
                "pre_next_token_time(ms),"
                "post_next_token_time_post(ms)\n")
        for seq_len_in, seq_len_out in self.test_case:
            time_tensor = self._run(seq_len_in, seq_len_out)
            if self.local_rank == 0:
                file.write(
                    f"{self.config.batch_size},"
                    f"{seq_len_in},"
                    f"{seq_len_out},"
                    f"{round(time_tensor[0], 2)},"
                    f"{time_tensor[1]},"
                    f"{time_tensor[2]},"
                    f"{time_tensor[3]},"
                    f"{time_tensor[4]}\n")
        if self.local_rank == 0:
            file.close()

    def _run(self, seq_len_in, seq_len_out):
        dummy_input_ids_nxt = torch.randint(0, self.model.config.vocab_size, [self.config.batch_size, seq_len_in],
                                            dtype=torch.int64)
        dummy_attention_mask = torch.ones((self.config.batch_size, seq_len_in), dtype=torch.int64)
        inputs = self.tokenizer(
            [self.dummy_input] * self.config.batch_size,
            return_tensors="pt", padding='max_length', max_length=seq_len_in)
        inputs["input_ids"] = dummy_input_ids_nxt
        inputs["attention_mask"] = dummy_attention_mask
        inputs = inputs.to(self.model.device)
        self.launcher.logger.info("---------------inputs shape---------------")
        self.launcher.logger.info(inputs.input_ids.shape)
        self.launcher.logger.info(f"seq_len_in: {seq_len_in}, seq_len_out: {seq_len_out}")
        start_time = time.time()
        forward_first_token_time, forward_next_token_time, pre_next_token_time, post_next_token_time_post = (
            self.perf(inputs, seq_len_out))
        end_time = time.time()
        # time analysis
        total_time = end_time - start_time
        time_tensor = torch.tensor(
            [total_time,
             forward_first_token_time,
             forward_next_token_time,
             pre_next_token_time,
             post_next_token_time_post], device=self.model.device)
        if self.world_size > 1:
            dist.all_reduce(time_tensor, dist.ReduceOp.MAX)
        time_tensor = time_tensor.tolist()
        return time_tensor

    def _perf_detail_v2(self, inputs, seq_len_out):
        """
        使用装饰器的方式进行计时,从而从根本上避免侵入式修改打点的方式
        :param inputs:
        :param seq_len_out:
        :return:
        """
        Timer.reset()
        Timer.sync = getattr(torch, self.launcher.device_type).synchronize
        with torch.no_grad():
            generate_ids = self.model.generate(**inputs, max_new_tokens=seq_len_out,
                                               eos_token_id=self.model.config.vocab_size * 2  # 避免提前停止
                                               )
        # decode
        if not atb_speed_config.performance.skip_decode:
            _ = self.tokenizer.batch_decode(generate_ids, skip_special_tokens=True,
                                            clean_up_tokenization_spaces=False)
        return [Timer.timeit_res.first_token_delay, Timer.timeit_res.next_token_avg_delay, 0, 0]

    def _perf_detail(self, inputs, seq_len_out):
        with torch.no_grad():
            generate_ids, \
                forward_first_token_time, \
                forward_next_token_time, \
                pre_next_token_time, \
                post_next_token_time_post = \
                self.model.generate(**inputs, max_new_tokens=seq_len_out,
                                    eos_token_id=self.model.config.vocab_size * 2  # 避免提前停止
                                    )
        # decode
        if not atb_speed_config.performance.skip_decode:
            _ = self.tokenizer.batch_decode(generate_ids, skip_special_tokens=True,
                                            clean_up_tokenization_spaces=False)
        return [forward_first_token_time,
                forward_next_token_time,
                pre_next_token_time,
                post_next_token_time_post]

    def _perf(self, inputs, seq_len_out):
        with torch.no_grad():
            getattr(torch, self.launcher.device_type).synchronize()
            first_token_start = time.time()
            generate_ids = self.model.generate(**inputs,
                                               min_new_tokens=1,
                                               max_new_tokens=1)
            getattr(torch, self.launcher.device_type).synchronize()
            first_token_end = time.time()
            if not atb_speed_config.performance.skip_decode:
                _ = self.tokenizer.batch_decode(generate_ids, skip_special_tokens=True,
                                                clean_up_tokenization_spaces=False)

            getattr(torch, self.launcher.device_type).synchronize()
            total_start = time.time()
            generate_ids = self.model.generate(
                **inputs,
                min_new_tokens=seq_len_out,
                max_new_tokens=seq_len_out
            )
            getattr(torch, self.launcher.device_type).synchronize()
            total_end = time.time()
            if not atb_speed_config.performance.skip_decode:
                _ = self.tokenizer.batch_decode(generate_ids, skip_special_tokens=True,
                                                clean_up_tokenization_spaces=False)
        # time analysis
        forward_first_token_time = (first_token_end - first_token_start) * 1000
        time_inc_total = (total_end - total_start) * 1000
        forward_next_token_time = (time_inc_total - forward_first_token_time) / (seq_len_out - 1)
        return [forward_first_token_time, forward_next_token_time, 0, 0]


@ -0,0 +1,21 @@
#!/usr/bin/env python
# coding:utf-8
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
"""
common precision
"""
from atb_speed.common.config import atb_speed_config

from .base import CEVALPrecisionTest, MMLUPrecisionTest


def get_precision_test_cls(mode=""):
    """
    :return:
    """
    cls_map = {
        "mmlu": MMLUPrecisionTest,
        "ceval": CEVALPrecisionTest
    }
    return cls_map.get(mode or atb_speed_config.precision.mode.lower(), CEVALPrecisionTest)


@ -0,0 +1,256 @@
#!/usr/bin/env python
# coding:utf-8
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
"""
precision base
"""
import json
import os
from string import ascii_letters

import pandas as pd
import torch
from atb_llm.utils.file_utils import safe_open
from atb_speed.common.config import atb_speed_config
from atb_speed.common.launcher.base import BaseLauncher
from atb_speed.common.utils import torch_parallel_info
from tqdm import tqdm

HARD_TASK = (
    "advanced_mathematics", "discrete_mathematics", "probability_and_statistics", "college_chemistry",
    "college_physics", "high_school_mathematics", "high_school_chemistry", "high_school_physics"
)


class Record:
    """only keep one card's result when debug is False"""

    def __init__(self, log_dir, log_flag, debug=False):
        self.debug = debug
        self.flag = log_flag if debug else ""
        self.log_name = os.path.join(log_dir, f"device{self.flag}.log")
        self.cache_name = os.path.join(log_dir, f"cache{self.flag}.csv")
        self.begin_idx = self.load_cache()

    def log(self, *msg):
        if self.debug or torch_parallel_info.is_rank_0:
            with safe_open(self.log_name, "a", encoding="utf-8") as f:
                f.write(" ".join([str(i) for i in msg]) + '\n')

    def load_cache(self):
        if not os.path.exists(self.cache_name):
            self.log("[-] No cache file, cache will be created")
            return 0
        self.log("[~] Loading cache of last abnormal exit ... (and continuing from the cache)")
        with safe_open(self.cache_name, "r", encoding="utf-8") as f:
            cache = f.read().strip().split()
        if not cache:
            return 0
        cache = [row.split(",") for row in cache]
        start_idx = cache[-1][0]
        self.log(f"[+] Cache loaded successfully! start idx: {start_idx}")
        return int(start_idx) + 1

    def update_cache(self, task_name, question_id, truth_answer, predict_answer):
        if self.debug or torch_parallel_info.is_rank_0:
            with safe_open(self.cache_name, "a", encoding="utf-8") as f:
                f.write(f"{question_id},{task_name},{truth_answer},{predict_answer}\n")


class PrecisionTestBase:
    def __init__(self, launcher: BaseLauncher, workdir="", **kwargs):
        workdir = atb_speed_config.precision.work_dir if not workdir else workdir
        self.data_dir = os.path.join(workdir, "data")
        self.result_output_dir = os.path.join(workdir, "test_result")
        self.init_result_dir()
        self.choices = ["A", "B", "C", "D"]
        self.shot = 5
        self.batch = 1
        self.seq_len_out = 32
        self.model, self.tokenizer = launcher.model, launcher.tokenizer
        self.local_rank = launcher.local_rank
        self.launcher = launcher
        self.recorder = Record(self.result_output_dir, self.local_rank)
        self.subject_mapping_path = os.path.join(os.path.dirname(os.path.realpath(__file__)),
                                                 f"{atb_speed_config.precision.mode}_subject_mapping.json")
        # kwargs have higher priority than the config file
        if atb_speed_config.precision:
            self.update_param(atb_speed_config.precision.__dict__)
        self.update_param(kwargs)

    @staticmethod
    def format_subject(subject):
        # note: the returned string keeps a leading space; gen_prompt accounts for it
        sub_list = subject.split("_")
        final_str = ""
        for entry in sub_list:
            final_str += " " + entry
        return final_str

    def update_param(self, param_dict):
        for key, value in param_dict.items():
            setattr(self, key, value)
            self.recorder.log(f"[+] set {key} to {value}")

    def init_result_dir(self):
        if torch_parallel_info.is_rank_0:
            os.makedirs(self.result_output_dir, exist_ok=True)
        if torch_parallel_info.world_size > 1:
            torch.distributed.barrier()

    def compute_metric(self, subject_mapping):
        run_results = pd.read_csv(
            self.recorder.cache_name,
            names=['question_id', 'task_name', 'truth_answer', 'predict_answer'])
        classes_acc = dict()
        subject_acc = dict()
        hard_task = [0, 0]  # correct num, total num over the HARD_TASK subjects
        for task in subject_mapping:
            class_of_task = subject_mapping[task][2]
            this_task = run_results.loc[run_results.task_name == task]
            if not this_task.shape[0]:
                continue
            correct_num = (this_task.truth_answer == this_task.predict_answer).sum()
            if class_of_task not in classes_acc:
                classes_acc[class_of_task] = [0, 0]  # correct num, total num
            if task in HARD_TASK:
                hard_task[0] += correct_num
                hard_task[1] += this_task.shape[0]
            subject_acc[task] = correct_num / this_task.shape[0]
            classes_acc[class_of_task][0] += correct_num
            classes_acc[class_of_task][1] += this_task.shape[0]
        avg_acc = sum(i[0] for i in classes_acc.values()) / sum(j[1] for j in classes_acc.values())
        for c in classes_acc:
            classes_acc[c] = classes_acc[c][0] / classes_acc[c][1]
        classes_acc["Avg"] = avg_acc
        if hard_task[1]:  # guard: not every mapping contains HARD_TASK subjects
            classes_acc["Avg(Hard)"] = hard_task[0] / hard_task[1]
        with safe_open(os.path.join(self.result_output_dir, f"result{self.recorder.flag}_subject_acc.json"), "w") as fp:
            json.dump(subject_acc, fp)
        with safe_open(os.path.join(self.result_output_dir, f"result{self.recorder.flag}_classes_acc.json"), "w") as fp:
            json.dump(classes_acc, fp)
        if torch_parallel_info.is_rank_0:
            self.launcher.logger.info(f"[+] Avg acc: {classes_acc['Avg']}")

    def get_subject_mapping(self):
        with safe_open(self.subject_mapping_path, "r", encoding="utf-8") as f:
            subject_mapping = json.load(f)
        return subject_mapping

    def load_csv_by_task_name(self, task_name):
        dev_df = pd.read_csv(os.path.join(self.data_dir, "dev", task_name + "_dev.csv"), header=None)[
                 :self.shot + 1]
        val_df = pd.read_csv(os.path.join(self.data_dir, "val", task_name + "_val.csv"), header=None)
        return dev_df, val_df

    def format_example(self, df, idx, include_answer=True):
        prompt = df.iloc[idx, 0]
        k = len(self.choices)
        for j in range(k):
            prompt += "\n{}. {}".format(self.choices[j], df.iloc[idx, j + 1])
        prompt += "\nAnswer:"
        if include_answer:
            prompt += " {}\n\n".format(df.iloc[idx, k + 1])
        return prompt

    def gen_prompt(self, train_df, subject, k=-1):
        # no space before "{}": format_subject already returns a leading space
        prompt = "The following are multiple choice questions (with answers) about{}.\n\n".format(
            self.format_subject(subject))
        if k == -1:
            k = train_df.shape[0]
        for i in range(k):
            prompt += self.format_example(train_df, i)
        return prompt

    def batch_infer(self, qr_pair, begin_idx):
        prompts = [item['prompt'] for item in qr_pair]
        truth_answers = [item['answer'] for item in qr_pair]
        task_names = [item['task_name'] for item in qr_pair]
        inputs = self.tokenizer(prompts, return_tensors="pt", padding='longest')
        inputs = inputs.to(self.model.device)
        input_len = len(inputs.input_ids[0])
        with torch.no_grad():
            output = self.model.generate(inputs.input_ids,
                                         attention_mask=inputs.attention_mask,
                                         max_new_tokens=self.seq_len_out)
        answers = self.tokenizer.batch_decode(output.to(torch.int32)[:, input_len:])
        for prompt, truth_answer, task_name, ori_answer in zip(prompts, truth_answers, task_names, answers):
            self.recorder.log("\n========== prompt start ==========\n", prompt,
                              "\n========== prompt end ==========\n")
            self.recorder.log(f"[+] prompt length: {input_len}")
            self.recorder.log("\n========== answer start ==========\n", ori_answer,
                              "\n========== answer end ==========\n")
            # take the first letter of the generated text as the predicted option
            answer_list = [char.upper() for char in ori_answer if char in ascii_letters]
            answer = answer_list[0] if answer_list else "-1"
            is_correct = "Correct" if answer == truth_answer else "Wrong"
            self.recorder.log(f"[{is_correct}] predict: {answer}, label: {truth_answer}")
            self.recorder.update_cache(task_name, begin_idx, truth_answer, answer)
            begin_idx += 1

    def run(self):
        subject_mapping = self.get_subject_mapping()
        subject_name_list = sorted(list(subject_mapping.keys()))
        qr_pair = []
        total_len = 0
        begin_idx = self.recorder.begin_idx
        for task_name in subject_name_list:
            dev_df, val_df = self.load_csv_by_task_name(task_name)
            total_len += len(val_df)
            if len(val_df) <= begin_idx:
                self.recorder.log(f"[~] Skip Task: {task_name}")
                begin_idx -= len(val_df)
                continue
            for i in range(val_df.shape[0]):
                if begin_idx > 0:
                    begin_idx -= 1
                    continue
                # drop few-shot examples one by one until the prompt fits the length budget
                for cut_shot in range(self.shot):
                    prompt_end = self.format_example(val_df, i, include_answer=False)
                    train_prompt = self.gen_prompt(dev_df, task_name, self.shot - cut_shot)
                    prompt = train_prompt + prompt_end
                    input_len = len(self.tokenizer(prompt, return_tensors="pt").input_ids[0])
                    if input_len > 2000:
                        continue
                    label = val_df.iloc[i, val_df.shape[1] - 1]
                    qr_pair.append({'task_name': task_name, 'prompt': prompt, 'answer': label})
                    break
        pbar = None
        if torch_parallel_info.is_rank_0:
            pbar = tqdm(total=total_len, initial=self.recorder.begin_idx)
        for i in range(0, len(qr_pair), self.batch):
            self.batch_infer(qr_pair[i: i + self.batch], i + self.recorder.begin_idx)
            if torch_parallel_info.is_rank_0:
                pbar.update(self.batch if i + self.batch <= len(qr_pair) else len(qr_pair) - i)
        if torch_parallel_info.is_rank_0:
            pbar.close()
        self.compute_metric(subject_mapping)


class CEVALPrecisionTest(PrecisionTestBase):
    """
    CEVAL
    """

    def load_csv_by_task_name(self, task_name):
        dev_df, val_df = super().load_csv_by_task_name(task_name)
        # remove the first row (column names) and the first column (id)
        dev_df = dev_df.iloc[1:, 1:]
        val_df = val_df.iloc[1:, 1:]
        return dev_df, val_df


class MMLUPrecisionTest(PrecisionTestBase):
    """
    MMLU
    """

    def compute_metric(self, subject_mapping):
        subject_mapping_adapt = {k: [None, None, v] for k, v in subject_mapping.items()}
        return super().compute_metric(subject_mapping_adapt)
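Putting the pieces together, a minimal end-to-end sketch (assuming a `config.ini` and C-Eval data under `work_dir/data`; `BaichuanLM` is a hypothetical `Launcher` subclass like the demo further below):

```python
# Minimal sketch: kwargs passed to the test override values from config.ini.
from atb_speed.common.config import atb_speed_config
from atb_speed.common.precision import get_precision_test_cls

atb_speed_config.init_config("config.ini")
launcher = BaichuanLM()  # hypothetical launcher, defined as in the demo below
tester = get_precision_test_cls()(launcher, shot=5, batch=1, seq_len_out=32)
tester.run()  # writes cache*.csv and result*_acc.json under work_dir/test_result
```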

View File

@ -0,0 +1,262 @@
{
"computer_network": [
"Computer Network",
"\u8ba1\u7b97\u673a\u7f51\u7edc",
"STEM"
],
"operating_system": [
"Operating System",
"\u64cd\u4f5c\u7cfb\u7edf",
"STEM"
],
"computer_architecture": [
"Computer Architecture",
"\u8ba1\u7b97\u673a\u7ec4\u6210",
"STEM"
],
"college_programming": [
"College Programming",
"\u5927\u5b66\u7f16\u7a0b",
"STEM"
],
"college_physics": [
"College Physics",
"\u5927\u5b66\u7269\u7406",
"STEM"
],
"college_chemistry": [
"College Chemistry",
"\u5927\u5b66\u5316\u5b66",
"STEM"
],
"advanced_mathematics": [
"Advanced Mathematics",
"\u9ad8\u7b49\u6570\u5b66",
"STEM"
],
"probability_and_statistics": [
"Probability and Statistics",
"\u6982\u7387\u7edf\u8ba1",
"STEM"
],
"discrete_mathematics": [
"Discrete Mathematics",
"\u79bb\u6563\u6570\u5b66",
"STEM"
],
"electrical_engineer": [
"Electrical Engineer",
"\u6ce8\u518c\u7535\u6c14\u5de5\u7a0b\u5e08",
"STEM"
],
"metrology_engineer": [
"Metrology Engineer",
"\u6ce8\u518c\u8ba1\u91cf\u5e08",
"STEM"
],
"high_school_mathematics": [
"High School Mathematics",
"\u9ad8\u4e2d\u6570\u5b66",
"STEM"
],
"high_school_physics": [
"High School Physics",
"\u9ad8\u4e2d\u7269\u7406",
"STEM"
],
"high_school_chemistry": [
"High School Chemistry",
"\u9ad8\u4e2d\u5316\u5b66",
"STEM"
],
"high_school_biology": [
"High School Biology",
"\u9ad8\u4e2d\u751f\u7269",
"STEM"
],
"middle_school_mathematics": [
"Middle School Mathematics",
"\u521d\u4e2d\u6570\u5b66",
"STEM"
],
"middle_school_biology": [
"Middle School Biology",
"\u521d\u4e2d\u751f\u7269",
"STEM"
],
"middle_school_physics": [
"Middle School Physics",
"\u521d\u4e2d\u7269\u7406",
"STEM"
],
"middle_school_chemistry": [
"Middle School Chemistry",
"\u521d\u4e2d\u5316\u5b66",
"STEM"
],
"veterinary_medicine": [
"Veterinary Medicine",
"\u517d\u533b\u5b66",
"STEM"
],
"college_economics": [
"College Economics",
"\u5927\u5b66\u7ecf\u6d4e\u5b66",
"Social Science"
],
"business_administration": [
"Business Administration",
"\u5de5\u5546\u7ba1\u7406",
"Social Science"
],
"marxism": [
"Marxism",
"\u9a6c\u514b\u601d\u4e3b\u4e49\u57fa\u672c\u539f\u7406",
"Social Science"
],
"mao_zedong_thought": [
"Mao Zedong Thought",
"\u6bdb\u6cfd\u4e1c\u601d\u60f3\u548c\u4e2d\u56fd\u7279\u8272\u793e\u4f1a\u4e3b\u4e49\u7406\u8bba\u4f53\u7cfb\u6982\u8bba",
"Social Science"
],
"education_science": [
"Education Science",
"\u6559\u80b2\u5b66",
"Social Science"
],
"teacher_qualification": [
"Teacher Qualification",
"\u6559\u5e08\u8d44\u683c",
"Social Science"
],
"high_school_politics": [
"High School Politics",
"\u9ad8\u4e2d\u653f\u6cbb",
"Social Science"
],
"high_school_geography": [
"High School Geography",
"\u9ad8\u4e2d\u5730\u7406",
"Social Science"
],
"middle_school_politics": [
"Middle School Politics",
"\u521d\u4e2d\u653f\u6cbb",
"Social Science"
],
"middle_school_geography": [
"Middle School Geography",
"\u521d\u4e2d\u5730\u7406",
"Social Science"
],
"modern_chinese_history": [
"Modern Chinese History",
"\u8fd1\u4ee3\u53f2\u7eb2\u8981",
"Humanities"
],
"ideological_and_moral_cultivation": [
"Ideological and Moral Cultivation",
"\u601d\u60f3\u9053\u5fb7\u4fee\u517b\u4e0e\u6cd5\u5f8b\u57fa\u7840",
"Humanities"
],
"logic": [
"Logic",
"\u903b\u8f91\u5b66",
"Humanities"
],
"law": [
"Law",
"\u6cd5\u5b66",
"Humanities"
],
"chinese_language_and_literature": [
"Chinese Language and Literature",
"\u4e2d\u56fd\u8bed\u8a00\u6587\u5b66",
"Humanities"
],
"art_studies": [
"Art Studies",
"\u827a\u672f\u5b66",
"Humanities"
],
"professional_tour_guide": [
"Professional Tour Guide",
"\u5bfc\u6e38\u8d44\u683c",
"Humanities"
],
"legal_professional": [
"Legal Professional",
"\u6cd5\u5f8b\u804c\u4e1a\u8d44\u683c",
"Humanities"
],
"high_school_chinese": [
"High School Chinese",
"\u9ad8\u4e2d\u8bed\u6587",
"Humanities"
],
"high_school_history": [
"High School History",
"\u9ad8\u4e2d\u5386\u53f2",
"Humanities"
],
"middle_school_history": [
"Middle School History",
"\u521d\u4e2d\u5386\u53f2",
"Humanities"
],
"civil_servant": [
"Civil Servant",
"\u516c\u52a1\u5458",
"Other"
],
"sports_science": [
"Sports Science",
"\u4f53\u80b2\u5b66",
"Other"
],
"plant_protection": [
"Plant Protection",
"\u690d\u7269\u4fdd\u62a4",
"Other"
],
"basic_medicine": [
"Basic Medicine",
"\u57fa\u7840\u533b\u5b66",
"Other"
],
"clinical_medicine": [
"Clinical Medicine",
"\u4e34\u5e8a\u533b\u5b66",
"Other"
],
"urban_and_rural_planner": [
"Urban and Rural Planner",
"\u6ce8\u518c\u57ce\u4e61\u89c4\u5212\u5e08",
"Other"
],
"accountant": [
"Accountant",
"\u6ce8\u518c\u4f1a\u8ba1\u5e08",
"Other"
],
"fire_engineer": [
"Fire Engineer",
"\u6ce8\u518c\u6d88\u9632\u5de5\u7a0b\u5e08",
"Other"
],
"environmental_impact_assessment_engineer": [
"Environmental Impact Assessment Engineer",
"\u73af\u5883\u5f71\u54cd\u8bc4\u4ef7\u5de5\u7a0b\u5e08",
"Other"
],
"tax_accountant": [
"Tax Accountant",
"\u7a0e\u52a1\u5e08",
"Other"
],
"physician": [
"Physician",
"\u533b\u5e08\u8d44\u683c",
"Other"
]
}

View File

@ -0,0 +1,59 @@
{
"abstract_algebra": "STEM",
"anatomy": "other",
"astronomy": "STEM",
"business_ethics": "other",
"clinical_knowledge": "other",
"college_biology": "STEM",
"college_chemistry": "STEM",
"college_computer_science": "STEM",
"college_mathematics": "STEM",
"college_medicine": "other",
"college_physics": "STEM",
"computer_security": "STEM",
"conceptual_physics": "STEM",
"econometrics": "social sciences",
"electrical_engineering": "STEM",
"elementary_mathematics": "STEM",
"formal_logic": "humanities",
"global_facts": "other",
"high_school_biology": "STEM",
"high_school_chemistry": "STEM",
"high_school_computer_science": "STEM",
"high_school_european_history": "humanities",
"high_school_geography": "social sciences",
"high_school_government_and_politics": "social sciences",
"high_school_macroeconomics": "social sciences",
"high_school_mathematics": "STEM",
"high_school_microeconomics": "social sciences",
"high_school_physics": "STEM",
"high_school_psychology": "social sciences",
"high_school_statistics": "STEM",
"high_school_us_history": "humanities",
"high_school_world_history": "humanities",
"human_aging": "other",
"human_sexuality": "social sciences",
"international_law": "humanities",
"jurisprudence": "humanities",
"logical_fallacies": "humanities",
"machine_learning": "STEM",
"management": "other",
"marketing": "other",
"medical_genetics": "other",
"miscellaneous": "other",
"moral_disputes": "humanities",
"moral_scenarios": "humanities",
"nutrition": "other",
"philosophy": "humanities",
"prehistory": "humanities",
"professional_accounting": "other",
"professional_law": "humanities",
"professional_medicine": "other",
"professional_psychology": "social sciences",
"public_relations": "social sciences",
"security_studies": "social sciences",
"sociology": "social sciences",
"us_foreign_policy": "social sciences",
"virology": "other",
"world_religions": "humanities"
}
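The MMLU mapping above is a flat `subject -> class` dict, while `PrecisionTestBase.compute_metric` reads index 2 of a three-element `[name, name_zh, class]` list (the C-Eval layout). `MMLUPrecisionTest` bridges the two along these lines:

```python
# Minimal sketch: adapt the flat MMLU mapping to the C-Eval-style layout.
import json

with open("mmlu_subject_mapping.json", "r", encoding="utf-8") as f:
    flat = json.load(f)
adapted = {k: [None, None, v] for k, v in flat.items()}  # only index 2 (the class) is read
```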

View File

@ -0,0 +1,101 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved.
"""
decorator
"""
import logging
import os
import time
import uuid
from dataclasses import dataclass, field
from functools import wraps, partial
from typing import List, Union


@dataclass
class TimeData:
    step: int = 0
    time_cost: Union[float, int] = 0


@dataclass
class SeqTimeData:
    task_id: str = ""
    time_data_list: List[TimeData] = field(default_factory=list)

    @property
    def generated_tokens(self):
        return len(self.time_data_list)

    @property
    def first_token_delay(self):
        return self.time_data_list[0].time_cost if self.time_data_list else 0

    @property
    def next_token_avg_delay(self):
        if self.generated_tokens <= 1:
            return 0
        return sum(item.time_cost for item in self.time_data_list[1:]) / (self.generated_tokens - 1)


class Timer:
    """
    common timing decorator
    """
    step: int = 0
    timeit_res: SeqTimeData = SeqTimeData(str(uuid.uuid4()))

    @classmethod
    def reset(cls):
        cls.step = 0
        cls.timeit_res = SeqTimeData(str(uuid.uuid4()))

    @classmethod
    def sync(cls):
        # hook: override with a device synchronize (e.g. torch.npu.synchronize) for accurate timing
        ...

    @classmethod
    def timing(cls, func=None, *, logger=None, level=logging.INFO):
        """
        Time a function call.
        Supports both bare @Timer.timing and parameterized @Timer.timing(...) usage.
        """
        if logger is None:
            logger = logging.getLogger()
        if func is None:
            # called with parentheses: func is None, so return a partially-applied decorator
            return partial(Timer.timing, logger=logger, level=level)
        run = cls._timeit_run if os.getenv("TIMEIT", "0") == "1" else cls._run

        @wraps(func)
        def wrapper(*args, **kwargs):
            return run(func, *args, **kwargs)

        return wrapper

    @classmethod
    def _run(cls, func, *args, **kwargs):
        return func(*args, **kwargs)

    @classmethod
    def _timeit_run(cls, func, *args, **kwargs):
        cls.sync()
        start_time = time.time()
        res = func(*args, **kwargs)
        cls.sync()
        time_cost = (time.time() - start_time) * 1000  # ms
        cls.timeit_res.time_data_list.append(TimeData(cls.step, time_cost))
        cls.step += 1
        return res
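A minimal usage sketch. Note that `TIMEIT` is read when the function is decorated (inside `timing`), so it must be set before the decorated module or class is defined, as the unit test further below also does:

```python
import os

os.environ["TIMEIT"] = "1"  # must be set before @Timer.timing is applied

from atb_speed.common.timer import Timer

@Timer.timing
def decode_one_token():
    ...  # one forward step

Timer.reset()
for _ in range(4):
    decode_one_token()
print(Timer.timeit_res.first_token_delay, Timer.timeit_res.next_token_avg_delay)
```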

View File

@ -0,0 +1,81 @@
#!/usr/bin/env python
# coding:utf-8
# Copyright Huawei Technologies Co., Ltd. 2010-2018. All rights reserved
"""
utils
"""
import os
from dataclasses import dataclass

import torch

FLAG_OS_MAP = {
    'r': os.O_RDONLY, 'r+': os.O_RDWR,
    'w': os.O_CREAT | os.O_TRUNC | os.O_WRONLY,
    'w+': os.O_CREAT | os.O_TRUNC | os.O_RDWR,
    'a': os.O_CREAT | os.O_APPEND | os.O_WRONLY,
    'a+': os.O_CREAT | os.O_APPEND | os.O_RDWR,
    'x': os.O_CREAT | os.O_EXCL,
    "b": getattr(os, "O_BINARY", 0)
}


@dataclass
class TorchParallelInfo:
    __is_initialized: bool = False
    __world_size: int = 1
    __local_rank: int = 0

    def __post_init__(self):
        self.try_to_init()

    @property
    def is_initialized(self):
        return self.__is_initialized

    @property
    def world_size(self):
        _ = self.try_to_init()
        return self.__world_size

    @property
    def local_rank(self):
        _ = self.try_to_init()
        return self.__local_rank

    @property
    def is_rank_0(self) -> bool:
        return self.local_rank == 0

    @staticmethod
    def get_rank() -> int:
        return 0 if not torch.distributed.is_initialized() else torch.distributed.get_rank()

    @staticmethod
    def get_world_size() -> int:
        return 1 if not torch.distributed.is_initialized() else torch.distributed.get_world_size()

    def try_to_init(self):
        """
        Refresh the initialization state, world_size and local_rank once
        torch.distributed has been initialized.
        """
        if not self.__is_initialized:
            is_initialized = torch.distributed.is_initialized()
            if is_initialized:
                self.__local_rank = self.get_rank()
                self.__world_size = self.get_world_size()
            self.__is_initialized = is_initialized
        return self.__is_initialized


def load_atb_speed():
    env_name = "ATB_SPEED_HOME_PATH"
    atb_speed_home_path = os.getenv(env_name)
    if atb_speed_home_path is None:
        raise RuntimeError(f"env {env_name} does not exist, run `source set_env.sh` first")
    lib_path = os.path.join(atb_speed_home_path, "lib", "libatb_speed_torch.so")
    torch.classes.load_library(lib_path)


torch_parallel_info = TorchParallelInfo()
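`torch_parallel_info` lazily picks up the distributed context on first access, so it is safe to import it before `torch.distributed.init_process_group` is called. A minimal sketch of the rank-0 gating pattern used throughout the precision code:

```python
from atb_speed.common.utils import torch_parallel_info

# Only rank 0 should write shared files or print summaries in multi-card runs.
if torch_parallel_info.is_rank_0:
    print(f"world size: {torch_parallel_info.world_size}")
```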

View File

@ -0,0 +1,19 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
"""
setup
"""
from setuptools import find_packages, setup

setup(
    name='atb_speed',
    version='1.1.0',
    description='atb speed sdk',
    license='MIT',
    keywords='atb_speed',
    packages=find_packages(),
    install_requires=["pandas"],
    package_data={"atb_speed": ["**/*.json"]},
    include_package_data=True
)
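The SDK can then be installed from the directory containing this setup script; a minimal sketch using pip's module interface:

```python
# Minimal sketch: install the atb_speed SDK from the directory holding setup.py.
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "."])
```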

View File

@ -0,0 +1,36 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
from atb_speed.common.config import atb_speed_config
from atb_speed.common.launcher import Launcher
from atb_speed.common.precision import get_precision_test_cls
from transformers import AutoTokenizer, AutoModelForCausalLM


class BaichuanLM(Launcher):
    def init_model(self):
        """
        Model initialization.
        :return: model, tokenizer
        """
        tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True, use_fast=False)
        model = AutoModelForCausalLM.from_pretrained(self.model_path, trust_remote_code=True).half().to(self._device)
        model.eval()
        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
        return model, tokenizer


def demo_ceval(launcher: Launcher):
    """
    Run the precision test configured in config.ini (C-Eval by default).
    :param launcher: initialized model launcher
    """
    c_t = get_precision_test_cls()(launcher)
    c_t.run()


if __name__ == '__main__':
    atb_speed_config.init_config("config.ini")
    baichuan = BaichuanLM()
    demo_ceval(baichuan)

View File

@ -0,0 +1,32 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
from atb_speed.common.config import atb_speed_config
from atb_speed.common.launcher import Launcher
from atb_speed.common.performance.base import PerformanceTest
from transformers import AutoTokenizer, AutoModelForCausalLM


class LMLauncher(Launcher):
    """
    LMLauncher
    """

    def init_model(self):
        """
        Model initialization.
        :return: model, tokenizer
        """
        tokenizer = AutoTokenizer.from_pretrained(
            self.model_path, trust_remote_code=True, use_fast=False)
        model = AutoModelForCausalLM.from_pretrained(self.model_path, trust_remote_code=True).half().to(self._device)
        model.eval()
        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
        return model, tokenizer


if __name__ == '__main__':
    atb_speed_config.init_config("config.ini")
    performance_test = PerformanceTest(LMLauncher("0"))
    performance_test.warm_up()
    performance_test.run_test()

View File

@ -0,0 +1,40 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
import os

from atb_speed.common.launcher import Launcher
from transformers import AutoTokenizer, AutoModelForCausalLM


class BaichuanLM(Launcher):
    def init_model(self):
        """
        Model initialization.
        :return: model, tokenizer
        """
        pwd = os.path.realpath(os.path.dirname(__file__))
        model_path = os.path.join(pwd, "..", "model")
        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, use_fast=False)
        model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).half().to(self._device)
        model.eval()
        model.generation_config = self.remove_part_of_generation_config(model.generation_config)
        return model, tokenizer


if __name__ == '__main__':
    baichuan = BaichuanLM(device_ids="1")
    baichuan.infer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->')
    baichuan.infer('登鹳雀楼->王之涣\n夜雨寄北->')
    baichuan.infer('苹果公司的CEO是')
    query_list = [
        "谷歌公司的CEO是",
        '登鹳雀楼->王之涣\n夜雨寄北->',
        '苹果公司的CEO是',
        '华为公司的CEO是',
        '微软公司的CEO是'
    ]
    baichuan.infer_batch(query_list)

View File

@ -0,0 +1,41 @@
[model]
; model path
model_path=../model
; device ids to use; separate multiple cards with commas (multiple cards default to parallel mode)
device_ids=2
; parallel communication backend, hccl by default; options: hccl / nccl (GPU)
;parallel_backend=hccl
; log directory, defaults to the directory the script is run from
;log_dir=./
; whether to bind CPU cores, 0 or 1; defaults to 1 (enabled)
;bind_cpu=1

[precision]
; precision test method, ceval by default; options: ceval / mmlu
mode=ceval
; working directory of the precision test
work_dir=./
; batch size of the precision test, defaults to 1
batch=1
; number of shots per subject, defaults to 5
shot=5
; answer length per question, defaults to 32
;seq_len_out=32

[performance]
; model name used to name the result file
model_name=vicuna_13b
; batch size to test
batch_size=1
; maximum input length to test, as a power of 2
max_len_exp=10
; minimum input length to test, as a power of 2
min_len_exp=5
; specific test cases, in the form [[seq_in,seq_out]]; when set, max_len_exp and min_len_exp take no effect
;case_pair=[[1,2],[2,3]]
; result file name; auto-generated by default, usually left unset
;save_file_name=
; performance test method, detail / normal, defaults to normal; detail requires the timing decorator plus the env var TIMEIT=1
;perf_mode=
; whether to test generate only and skip decode during the performance test, 0/1, defaults to 0
;skip_decode=
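The demos above consume this file through `atb_speed_config.init_config`; a minimal sketch:

```python
# Minimal sketch: load the template and read back one value from each section.
from atb_speed.common.config import atb_speed_config

atb_speed_config.init_config("config.ini")       # the demos pass the path of this file
print(atb_speed_config.precision.mode)           # -> "ceval"
print(atb_speed_config.performance.batch_size)   # -> 1, as asserted in the unit test below
```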

View File

@ -0,0 +1,14 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
import os
from unittest import TestCase

from atb_speed.common.config import atb_speed_config


class ConfigTest(TestCase):
    def test_1(self):
        pwd = os.path.dirname(os.path.realpath(__file__))
        atb_speed_config.init_config(os.path.join(pwd, "template.ini"))
        self.assertEqual(atb_speed_config.performance.batch_size, 1)

View File

@ -0,0 +1,49 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright Huawei Technologies Co., Ltd. 2022-2022. All rights reserved.
"""
@Time : 2024/2/9 14:46
"""
import logging
import os
from unittest import TestCase

import torch
import torch.nn as nn
from atb_speed.common.timer import Timer

logging.basicConfig(level=logging.NOTSET)
os.environ["TIMEIT"] = "1"  # must be set before @Timer.timing decorates forward


class AddNet(nn.Module):
    def __init__(self, in_dim, h_dim=5, out_dim=1):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, h_dim)
        self.fc2 = nn.Linear(h_dim, out_dim)

    @Timer.timing
    def forward(self, x_tensor, y_tensor):
        out = torch.cat([x_tensor, y_tensor], dim=1)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out


class TimerTest(TestCase):
    @classmethod
    def setUpClass(cls):
        Timer.reset()
        # Timer.sync can be overridden here with a device synchronize if needed
        cls.add_net = AddNet(in_dim=2)

    def test_1(self):
        for _ in range(5):
            x_tensor = torch.randn(1, 1)
            y_tensor = torch.randn(1, 1)
            result = self.add_net.forward(x_tensor, y_tensor)
            logging.info(result)
        logging.info(Timer.timeit_res)
        logging.info(Timer.timeit_res.first_token_delay)
        logging.info(Timer.timeit_res.next_token_avg_delay)

View File

@ -0,0 +1,302 @@
# README

- The Baichuan family of large models combines intent understanding, information retrieval and reinforcement learning, together with supervised fine-tuning and human-intent alignment, and performs strongly in knowledge Q&A and text creation.
- This repository implements Baichuan inference models for NPU hardware. Used together with the acceleration library, it aims at peak inference performance on NPU.

# Feature matrix

- The matrix below lists the features supported by each Baichuan model.

| Model & size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quant | W8A16 quant | W4A16 quant | KV cache quant | Sparse quant | MOE quant | MindIE Service | TGI | Long sequence |
|---------------|----------------------------|-----------------------------|------|------|-----------------|-----------------|------------|-------------|-------------|----------------|--------------|-----------|----------------|-----|---------------|
| Baichuan2-7B  | supports world size 1,2,4,8 | supports world size 2      | √    | ×    | √               | √               | √          | ×           | ×           | ×              | √            | ×         | √              | √   | ×             |
| Baichuan2-13B | supports world size 2,4,8   | supports world size 2,4    | √    | ×    | √               | √               | √          | ×           | √           | ×              | √            | ×         | √              | √   | ×             |
| Baichuan-7B   | supports world size 1,2,4,8 | supports world size 2      | √    | ×    | √               | √               | ×          | ×           | ×           | ×              | ×            | ×         | √              | ×   | ×             |
| Baichuan-13B  | supports world size 2,4,8   | supports world size 2,4    | √    | ×    | √               | √               | ×          | ×           | ×           | ×              | ×            | ×         | √              | ×   | ×             |

# Usage

## Path variables

| Variable | Meaning |
|-------------|--------------------------------------------------|
| working_dir | directory where the acceleration library and model repository are placed after download |
| llm_path | model repository path: `${working_dir}/ModelLink/` when using the prebuilt package, or `${working_dir}/ModelLink/mindie_ref/mindie_llm/atb_models` when using the code downloaded from gitee |
| script_path | script path; the working scripts of the Baichuan family live in ${llm_path}/examples/models/baichuan |
| weight_path | model weight path |

## Weights

**Weight download**

- [Baichuan-7B](https://huggingface.co/baichuan-inc/Baichuan-7B/tree/main)
- [Baichuan-13B](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/tree/main)
- [Baichuan2-7B](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/tree/main)
- [Baichuan2-13B](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main)

**Weight conversion**

- The Paged Attention scenario requires weights in .safetensors format; if you do not have them, convert following [this README](../../README.md)

**Quantized weight generation**

- Quantized weights are generated from the original FP16 weights
- W8A8 Antioutlier quantized weights:
  - not supported yet
- W8A8 quantized weights are generated as follows:
  - use quant_baichuan2_7b_w8a8.py for baichuan2-7b and quant_baichuan2_13b_w8a8.py for baichuan2-13b
  - Note: for precision testing it is recommended to generate quantized weights on CPU; weights quantized on NPU can be used for debugging but lose some precision
  - Edit the weight paths
    - depending on the model, set input_fp16_path and output_w8a8_path in quant_baichuan2_7b_w8a8.py or quant_baichuan2_13b_w8a8.py in this directory to your own float weight path and output weight path
    - to convert the weights on NPU, set the device to npu following the comments in the script
  - Run
    ```
    python quant_baichuan2_7b_w8a8.py (baichuan2-7b)
    python quant_baichuan2_13b_w8a8.py (baichuan2-13b)
    ```
  - Copy every file in the original weight directory (except the *.bin weight files) into the new quantized weight directory
  - Set the `dtype` and `quantize` fields in `${weight_path}/config.json` to identify the quantization type and precision of the weights
    - add the `dtype` and `quantize` fields if they do not exist
  - Configuration

    | Quant type & precision | torch_dtype | quantize |
    | ---------------------- | ----------- | -------- |
    | FP16                   | "float16"   | ""       |
    | W8A8                   | "float16"   | "w8a8"   |

  - Example
    - Baichuan weights in FP16 precision with W8A8 quantization
    ```
    {
        "torch_dtype": "float16",
        "quantize": "w8a8"
    }
    ```
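Where this step needs to be scripted, a minimal sketch (hypothetical paths) of patching the copied config.json:

```python
# Minimal sketch (hypothetical path): add the quantize/torch_dtype fields to the
# config.json copied into a quantized weight directory.
import json
import pathlib

cfg_path = pathlib.Path("/path/to/baichuan2-7b_w8a8/config.json")  # assumed location
cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
cfg.update({"torch_dtype": "float16", "quantize": "w8a8"})
cfg_path.write_text(json.dumps(cfg, ensure_ascii=False, indent=2), encoding="utf-8")
```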
- W8A16 quantized weights:
  - not supported yet
- W4A16 quantized weights are generated as follows:
  - W4A16 currently supports the baichuan2-13b model only
  - use quant_baichuan2_13b_w4a16.py for baichuan2-13b
  - Note: for precision testing it is recommended to generate quantized weights on CPU
  - Edit the weight paths
    - set FP16_PATH and OUTPUT_PATH in quant_baichuan2_13b_w4a16.py in this directory to your own float weight path and output weight path
  - Run
    ```
    python quant_baichuan2_13b_w4a16.py (baichuan2-13b)
    ```
  - Copy every file in the original weight directory (except the *.bin weight files) into the new quantized weight directory
  - Set the `dtype` and `quantize` fields in `${weight_path}/config.json` to identify the quantization type and precision
    - add the `dtype` and `quantize` fields if they do not exist
  - Configuration

    | Quant type & precision | torch_dtype | quantize |
    | ---------------------- | ----------- | -------- |
    | FP16                   | "float16"   | ""       |
    | W4A16                  | "float16"   | "w4a16"  |

  - Example
    - Baichuan weights in FP16 precision with W4A16 quantization
    ```
    {
        "torch_dtype": "float16",
        "quantize": "w4a16"
    }
    ```
- Sparse quantized weights are generated as follows:
  - Step 1: generate the sparse quantized weights
    ```shell
    # set the CANN environment variables
    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    cd ${llm_path}
    python examples/models/llama/convert_quant_weights.py --model_path {float weight path} --save_directory {W8A8S weight path} --w_bit 4 --a_bit 8 --calib_file ${llm_path}/examples/convert/model_slim/teacher_qualification.jsonl --fraction 0.011 --co_sparse True
    ```
    Make sure transformers==4.30.2 is installed when converting quantized weights
  - Step 2: slice and compress the quantized weights
    > Before running, make sure the compression tool has been built:
    >
    > `cd /usr/local/Ascend/ascend-toolkit/latest/python/site-packages/msmodelslim/pytorch/weight_compression/compress_graph`
    >
    > `bash build.sh /usr/local/Ascend/ascend-toolkit/latest`
    ```
    torchrun --nproc_per_node {TP size} -m examples.convert.model_slim.sparse_compressor --model_path {W8A8S weight path} --save_directory {W8A8SC weight path}
    ```
    - TP size is the tensor-parallel world size
    - Note: if the weights were sliced with TP=4 at generation time, they must also be run with TP=4
    - Example
    ```
    torchrun --nproc_per_node 4 -m examples.convert.model_slim.sparse_compressor --model_path /data1/weights/model_slim/baichuan2-7b_w8a8s --save_directory /data1/weights/model_slim/baichuan2-7b_w8a8sc
    ```

**Base environment variables**

- See [this README](../../../README.md)

## Inference

### Chat test

**Run Flash Attention FP16**

- The other Baichuan models are run the same way
- Run the launch script
  - In the \${llm_path} directory, run
    ```shell
    bash examples/models/baichuan/run_fa.sh ${weight_path}
    ```
- Environment variables
  - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
    - specifies the logical NPU cores available on the current machine, comma-separated
    - see the launch script environment variables section of [this README](../../README.md) for how to look up core IDs
    - on 300I DUO cards, specify at least two visible cores for single-card dual-chip and at least four for dual-card quad-chip
    - see the feature matrix for the world sizes each model supports
  - `export MASTER_PORT=20036`
    - sets the inter-card communication port
    - defaults to port 20036
    - this avoids communication conflicts when several multi-card models run on the same machine at once
    - the recommended port range is 20000-20050
  - The following environment variables relate to performance and memory optimization and normally need no change
    ```shell
    export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
    export INF_NAN_MODE_ENABLE=0
    export ATB_OPERATION_EXECUTE_ASYNC=1
    export TASK_QUEUE_ENABLE=1
    export ATB_CONVERT_NCHW_TO_ND=1
    export HCCL_BUFFSIZE=120
    export HCCL_WHITELIST_DISABLE=1
    export ATB_CONTEXT_WORKSPACE_RING=1
    export ATB_CONTEXT_WORKSPACE_SIZE=2629145600
    export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
    export ATB_LAUNCH_KERNEL_WITH_TILING=0
    export ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=1
    export ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=0
    ```

**Run Flash Attention BF16**

- not supported yet

**Run Flash Attention W8A8**

- not supported yet

**Run Flash Attention W8A16**

- not supported yet

**Run Flash Attention W4A16**

- not supported yet

**Run Paged Attention FP16**

- Run the launch script
  - In the \${llm_path} directory, run
    ```shell
    # chat mode (Baichuan2 family only):
    bash examples/models/baichuan/run_pa.sh ${weight_path} chat
    # non-chat mode:
    bash examples/models/baichuan/run_pa.sh ${weight_path}
    ```
- Environment variables
  - `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
    - specifies the logical NPU cores available on the current machine, comma-separated
    - see the launch script environment variables section of [this README](../../README.md) for how to look up core IDs
    - on 300I DUO cards, specify at least two visible cores for single-card dual-chip and at least four for dual-card quad-chip
    - see the feature matrix for the world sizes each model supports
  - `export MASTER_PORT=20036`
    - sets the inter-card communication port
    - defaults to port 20036
    - this avoids communication conflicts when several multi-card models run on the same machine at once
    - the recommended port range is 20000-20050
  - The following environment variables relate to performance and memory optimization and normally need no change
    ```shell
    export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
    export INF_NAN_MODE_ENABLE=0
    export ATB_OPERATION_EXECUTE_ASYNC=1
    export TASK_QUEUE_ENABLE=1
    export ATB_CONVERT_NCHW_TO_ND=1
    export LCCL_ENABLE_FALLBACK=1
    export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
    export ATB_CONTEXT_WORKSPACE_SIZE=0
    ```

**Run Paged Attention BF16**

- not supported yet

**Run Paged Attention W8A8**

- Run the launch script
  - same launch procedure as "Run Paged Attention FP16"
  - `${weight_path}` is the path of the W8A8 quantized weights
- Environment variables
  - see the environment variable notes of "Run Paged Attention FP16"
  - compared with FP16, set the `quantize` field in `${weight_path}/config.json` of the W8A8 quantized weights to `w8a8`
  - add the field if config.json does not contain it

**Run Paged Attention W8A16**

- not supported yet

**Run Paged Attention W4A16**

- Run the launch script
  - same launch procedure as "Run Paged Attention FP16"
  - `${weight_path}` is the path of the W4A16 quantized weights
- Environment variables
  - see the environment variable notes of "Run Paged Attention FP16"
  - compared with FP16, set the `quantize` field in `${weight_path}/config.json` of the W4A16 quantized weights to `w4a16`
  - add the field if config.json does not contain it

**Run KV cache quantization**

- to be added

**Run sparse quantization**

- Run the launch script
  - same launch procedure as "Run Paged Attention FP16"
  - `${weight_path}` is the path of the sliced and compressed sparse (W8A8SC) quantized weights
- Environment variables
  - see the environment variable notes of "Run Paged Attention FP16"
  - compared with FP16, set the `quantize` field in `${weight_path}/config.json` of the quantized weights to `w8a8sc`
  - add the field if config.json does not contain it
  - Note: the compression algorithm is strongly hardware-dependent; sparse quantization is currently supported on 300I DUO cards only

**Run MOE quantization**

- to be added

## Precision testing

- See [this README](https://gitee.com/ascend/MindIE-LLM/blob/master/examples/atb_models/tests/modeltest/README.md)
- Example
  ```shell
  cd ${llm_path}/tests/modeltest
  export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
  export MAX_MEMORY_GB=29
  bash run.sh pa_fp16 full_BoolQ 1 baichuan2_7b ${baichuan-7b weight path} 4
  bash run.sh pa_fp16 full_BoolQ 1 baichuan2_13b ${baichuan-13b weight path} 4
  bash run.sh pa_fp16 full_BoolQ 1 baichuan2_7b ${baichuan2-7b weight path} 4
  bash run.sh pa_fp16 full_BoolQ 1 baichuan2_13b ${baichuan2-13b weight path} 4
  ```
- Note: baichuan-7b and baichuan-13b reuse the baichuan2_7b and baichuan2_13b model_name during testing
- When running quantized weights, check that the `quantize` and `torch_dtype` fields in `${weight_path}/config.json` match the weights; see [this README](https://gitee.com/ascend/MindIE-LLM/blob/master/examples/atb_models/examples/README.md)

## Performance testing

- ALiBi Mask Free is supported. It is off by default; to enable it, set the following environment variable in run_pa.sh in this directory:
  ```
  export IS_ALIBI_MASK_FREE=1
  ```
- See [this README](https://gitee.com/ascend/MindIE-LLM/blob/master/examples/atb_models/tests/modeltest/README.md)
- Example
  ```shell
  cd ${llm_path}/tests/modeltest
  export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
  export MAX_MEMORY_GB=29
  export ATB_LLM_BENCHMARK_ENABLE=1
  bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 baichuan2_7b ${baichuan-7b weight path} 8
  bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 baichuan2_13b ${baichuan-13b weight path} 8
  bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 baichuan2_7b ${baichuan2-7b weight path} 8
  bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 baichuan2_13b ${baichuan2-13b weight path} 8
  ```
- Note: baichuan-7b and baichuan-13b reuse the baichuan2_7b and baichuan2_13b model_name during testing
- When running quantized weights, check that the `quantize` and `torch_dtype` fields in `${weight_path}/config.json` match the weights; see [this README](https://gitee.com/ascend/MindIE-LLM/blob/master/examples/atb_models/examples/README.md)
- Special case: if performance fluctuates during testing, transparent huge pages can be configured to improve memory access performance. Enable this on demand; it has some impact on memory usage.
  ```shell
  # enable transparent huge pages on demand during performance testing
  echo always > /sys/kernel/mm/transparent_hugepage/enabled
  # disable transparent huge pages
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  ```

## FAQ

- More environment variables are described in [this README](../../README.md)
- The Python files actually executed by the chat test are `${llm_path}/examples/run_fa.py` and `${llm_path}/examples/run_pa.py`; their arguments are described in [this README](../../README.md)
- At runtime, check the protobuf version with `pip list | grep protobuf`; if the version is above 3.20.x, run `pip install protobuf==3.20.0` to downgrade
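For a quick in-process check equivalent to the pip command above:

```python
# Verify the installed protobuf version from Python; versions above 3.20.x need a downgrade.
import google.protobuf

print(google.protobuf.__version__)
```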

View File

@ -0,0 +1,208 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
import torch.utils.data
from transformers import AutoTokenizer, AutoModelForCausalLM
from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig, AntiOutlier
from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig

SEQ_LEN_OUT = 32
# local paths
OUTPUT_PATH = "your output path"
FP16_PATH = "your path to model"  # path of the original float model

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=FP16_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=FP16_PATH,
    torch_dtype=torch.float32,
    trust_remote_code=True
)
W_SYM = True


# calibration dataset helper
def get_calib_dataset(input_tokenizer, calib_list, device="cpu"):  # use device="npu:0" to quantize on NPU, device="cpu" for CPU
    calib_dataset = []
    for calib_data in calib_list:
        inputs = input_tokenizer(calib_data, return_tensors='pt')
        calib_dataset.append([
            inputs.data['input_ids'].to(device),
            inputs.data['attention_mask'].to(device)
        ])
    return calib_dataset


CALIB_SET = [
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\
B. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\
A. 坐井观天所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n编写中小学教科书的直接依据是____\nA. 中华人民共和国教育法\nB. 课程计划\nC. 课程标准\
D. 课程表\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
\nB. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的\
是____\nA. 坐井观天所见甚少B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n下列关于课程的三种文本表现形式说法正确的是____\nA. 课程计划是由当\
地教育主管部门制订的\nB. 课程标准是依据课程计划制定的C. 课程标准的核心是实施建议\nD. 教材编写的基本方式有直线式螺旋式交叉式\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
\nB. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的\
是____\nA. 坐井观天所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n悦悦是一名右耳失聪的残疾儿童活动课上有时会听不清楚周老师所讲的内容因此\
经常提问题对此周老师应当采取的措施是____\nA. 给予悦悦更多的帮助和指导\nB. 指导家长带悦悦回家自学\nC. 建议家长将悦悦转到特殊幼儿园\nD. 照顾大多数幼儿不理会悦悦\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
\nB. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同\
的是____\nA. 坐井观天所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n内流河也称内陆河是指没有流入海洋的河流大多分布在大陆内部干燥地区\
游降水或冰雪融水为其主要补给水源最终消失于沙漠或注入内陆湖泊下列中国内流河中最长的是____\nA. 塔里木河\nB. 柴达木河\nC. 尼雅河\nD. 疏勒河\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
\nB. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同\
的是____\nA. 坐井观天所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n学校规定学生不能烫染头发但是小文为了彰显个性在假期把头发染成了棕色\
对小文的情况教师应该怎样处理____\nA. 年轻人追求个性是合情合理的应该宽容对待\nB. 违反学校的校规应该严格处分\nC. 强制要求小文将头发颜色染回来才可以进校门\nD. 探明\
小文违反校规的原因并对其进行劝导和教育\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
\nB. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的\
是____\nA. 坐井观天所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n张老师根据自己班级的情况为解决班级内部班干部的人际关系问题建立和谐融洽\
的班级氛围自主开发了和谐人际的班级课程这体现了教师____\nA. 是教育教学的研究者\nB. 是课程的建设者和开发者\nC. 是学生学习的促进者\nD. 是社区型的开放教师\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n刘老师工作很负责学生在学校出现一点问题他就会与家长联系在与家长沟通时他经常以前辈的姿态对待家长对家长的教育方式指指点点刘老师的做法\
____\nA. 正确老师就应该与家长经常沟通\nB. 正确老师的经验比家长丰富应该多指导家长\nC. 不正确教师没有权利指导家长\nD. 不正确教师应该与家长建立平等的沟通关系尊重家长的人格\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n在古代印度有一户人家经营一家棉布店销售自己手工制作的衣服你认为这户人家属于哪个等级____\nA. 婆罗门\nB. 刹帝利\
C. 吠舍\nD. 首陀罗\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n小型分散便于开展多种多样的活动满足学生不同的兴趣爱好发展学生的才能使学生得到更多的学习和锻炼的机会\
这种课外活动的形式是____\nA. 科技活动\nB. 学科活动\nC. 个人活动\nD. 小组活动\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n小红每天晚上临睡前都要多次反复检查自己的书包确保带齐了第二天需要用的教材和文具她明知道没有这个必要但就是控制不住她可\
能出现了____\nA. 抑郁症\nB. 焦虑症\nC. 强迫症\nD. 恐惧症\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n国家管理和评价课程的基础是____\nA. 课程计划\nB. 课程标准\nC. 教学目标\nD. 教育目的\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n儿童坚持性发生明显质变的年龄约在____\nA. 34\nB. 45\nC. 56\nD. 6岁以后\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n红楼梦中人物众多关系繁杂为了帮助读者阅读许多红学爱好者都在网络上发布了自己整理制作的主要人物关系图这属于____\
A. 纲要策略\nB. 精细加工策略\nC. 资源管理策略\nD. 监控策略\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n学期结束时班主任王老师会对学生思想品德的发展变化情况进行评价这项工作属于____\nA. 工作总结\nB. 工作计划\nC. 操行评定\
D. 建立学生档案\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n人们常说教学有法而教无定法这反映了教师的劳动具有____\nA. 连续性\nB. 示范性\nC. 长期性\nD. 创造\
\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n县级以上地方各级人民代表大会是县级以上地方国家权力机关其职权不包括____\nA. 改变或撤销本级人大常务委员会不适当的决定\
B. 选举并有权罢免本级人民法院院长\nC. 批准本行政区域内的预算执行情况的报告\nD. 决定并宣布下一级行政区城进入紧急状态\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n在心理健康课上同一批学生在第二次进行同样内容的人格测验时获得的分数与上次测验差别较大这说明该测验存在的问题是____\
A. 信度问题\nB. 效度问题\nC. 难度问题\nD. 区分度问题\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n李老师在教学生区分形近字将四个字相同的右半部分用白色粉笔写出相异的左半部分用彩色粉笔写出李老师运用了\
知觉的____\nA. 整体性\nB. 选择性\nC. 理解性\nD. 恒常性\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n兰兰学会走路后,就要很喜欢尝试自己穿衣吃饭捡东西,喜欢探索周围世界按照埃里克森人格发展阶段理论,兰兰所处的发展阶段是____\
A. 信任对怀疑\nB. 自立对羞怯\nC. 主动感对内疚感\nD. 勤奋感对自卑感\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n杨老师在教授生字词的过程中发现部分学生有缺笔少画的现象于是他把小学生缺笔少画现象的原因及对策研究作为研究课题拟订相应的研究计划\
在工作中收集整理相关资料并实施教学措施最后根据反馈信息调整教学方案这种研究方法属于____\nA. 教育行动研究法\nB. 教育实验法\nC. 教育叙事研究法\nD. 个案研究法\nAnswer:"
]
def main():
    dataset_calib = get_calib_dataset(tokenizer, CALIB_SET)
    # If the activations of the linear operators have an excessively wide range
    # or too many outlier spikes, the anti-outlier feature is needed, as follows:
    anti_config = AntiOutlierConfig(a_bit=16, w_bit=4, anti_method="m3", dev_type="cpu", w_sym=W_SYM)
    anti_outlier = AntiOutlier(model, calib_data=dataset_calib, cfg=anti_config, norm_class_name="RMSNorm")
    anti_outlier.process()
    # Roll-back layers: some layers are sensitive to the representable range of
    # quantized activations, so they must fall back to float weights.
    disable_names = []
    baichuan_layers = 40
    disable_idx_lst = list(range(baichuan_layers))
    for layer_index in disable_idx_lst:
        down_proj_name = "model.layers.{}.mlp.down_proj".format(layer_index)
        disable_names.append(down_proj_name)
    model.eval()
    quant_config = QuantConfig(a_bit=16, w_bit=4, disable_names=disable_names, dev_type='cpu',
                               w_sym=W_SYM, mm_tensor=False, is_lowbit=True, open_outlier=False,
                               group_size=64, disable_last_linear=False)
    calibrator = Calibrator(model, quant_config, calib_data=[], disable_level='L0')
    calibrator.run()
    calibrator.save(OUTPUT_PATH, save_type=["safe_tensor", "numpy"])


if __name__ == "__main__":
    main()

View File

@ -0,0 +1,197 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
from transformers import AutoTokenizer, AutoModelForCausalLM
from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig, AntiOutlier
from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig

INPORT_FP16_PATH = 'the_path_of_fp16_model_input'
OUTPORT_W8A8_PATH = 'the_path_of_w8a8_model_output'
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=INPORT_FP16_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=INPORT_FP16_PATH, trust_remote_code=True).\
    float().cpu()


# calibration dataset helper
def get_calib_dataset(tokenizer, calib_list, device="cpu"):  # use device="npu:0" to quantize on NPU, device="cpu" for CPU
    calib_dataset = []
    for calib_data in calib_list:
        inputs = tokenizer(calib_data, return_tensors='pt')
        calib_dataset.append([
            inputs.data['input_ids'].to(device),
            inputs.data['attention_mask'].to(device)
        ])
    return calib_dataset


CALIB_SET = [
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\
B. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\
A. 坐井观天所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n编写中小学教科书的直接依据是____\nA. 中华人民共和国教育法\nB. 课程计划\nC. 课程标准\
D. 课程表\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
\nB. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的\
是____\nA. 坐井观天所见甚少B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n下列关于课程的三种文本表现形式说法正确的是____\nA. 课程计划是由当\
地教育主管部门制订的\nB. 课程标准是依据课程计划制定的C. 课程标准的核心是实施建议\nD. 教材编写的基本方式有直线式螺旋式交叉式\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
\nB. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的\
是____\nA. 坐井观天所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n悦悦是一名右耳失聪的残疾儿童活动课上有时会听不清楚周老师所讲的内容因此\
经常提问题对此周老师应当采取的措施是____\nA. 给予悦悦更多的帮助和指导\nB. 指导家长带悦悦回家自学\nC. 建议家长将悦悦转到特殊幼儿园\nD. 照顾大多数幼儿不理会悦悦\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
\nB. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同\
的是____\nA. 坐井观天所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n内流河也称内陆河是指没有流入海洋的河流大多分布在大陆内部干燥地区\
游降水或冰雪融水为其主要补给水源最终消失于沙漠或注入内陆湖泊下列中国内流河中最长的是____\nA. 塔里木河\nB. 柴达木河\nC. 尼雅河\nD. 疏勒河\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
\nB. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同\
的是____\nA. 坐井观天所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n学校规定学生不能烫染头发但是小文为了彰显个性在假期把头发染成了棕色\
对小文的情况教师应该怎样处理____\nA. 年轻人追求个性是合情合理的应该宽容对待\nB. 违反学校的校规应该严格处分\nC. 强制要求小文将头发颜色染回来才可以进校门\nD. 探明\
小文违反校规的原因并对其进行劝导和教育\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理\
学习迁移产生的关键是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现\
####”符号时表明____。\nA. 显示的是字符串“####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914\
\nB. 1918\nC. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的\
是____\nA. 坐井观天所见甚少\nB. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n张老师根据自己班级的情况为解决班级内部班干部的人际关系问题建立和谐融洽\
的班级氛围自主开发了和谐人际的班级课程这体现了教师____\nA. 是教育教学的研究者\nB. 是课程的建设者和开发者\nC. 是学生学习的促进者\nD. 是社区型的开放教师\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n刘老师工作很负责学生在学校出现一点问题他就会与家长联系在与家长沟通时他经常以前辈的姿态对待家长对家长的教育方式指指点点刘老师的做法\
____\nA. 正确老师就应该与家长经常沟通\nB. 正确老师的经验比家长丰富应该多指导家长\nC. 不正确教师没有权利指导家长\nD. 不正确教师应该与家长建立平等的沟通关系尊重家长的人格\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n在古代印度有一户人家经营一家棉布店销售自己手工制作的衣服你认为这户人家属于哪个等级____\nA. 婆罗门\nB. 刹帝利\
C. 吠舍\nD. 首陀罗\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n小型分散便于开展多种多样的活动满足学生不同的兴趣爱好发展学生的才能使学生得到更多的学习和锻炼的机会\
这种课外活动的形式是____\nA. 科技活动\nB. 学科活动\nC. 个人活动\nD. 小组活动\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n小红每天晚上临睡前都要多次反复检查自己的书包确保带齐了第二天需要用的教材和文具她明知道没有这个必要但就是控制不住她可\
能出现了____\nA. 抑郁症\nB. 焦虑症\nC. 强迫症\nD. 恐惧症\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n国家管理和评价课程的基础是____\nA. 课程计划\nB. 课程标准\nC. 教学目标\nD. 教育目的\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n儿童坚持性发生明显质变的年龄约在____\nA. 34\nB. 45\nC. 56\nD. 6岁以后\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n红楼梦中人物众多关系繁杂为了帮助读者阅读许多红学爱好者都在网络上发布了自己整理制作的主要人物关系图这属于____\
A. 纲要策略\nB. 精细加工策略\nC. 资源管理策略\nD. 监控策略\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n学期结束时班主任王老师会对学生思想品德的发展变化情况进行评价这项工作属于____\nA. 工作总结\nB. 工作计划\nC. 操行评定\
D. 建立学生档案\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n人们常说教学有法而教无定法这反映了教师的劳动具有____\nA. 连续性\nB. 示范性\nC. 长期性\nD. 创造\
\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n县级以上地方各级人民代表大会是县级以上地方国家权力机关其职权不包括____\nA. 改变或撤销本级人大常务委员会不适当的决定\
B. 选举并有权罢免本级人民法院院长\nC. 批准本行政区域内的预算执行情况的报告\nD. 决定并宣布下一级行政区城进入紧急状态\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n在心理健康课上同一批学生在第二次进行同样内容的人格测验时获得的分数与上次测验差别较大这说明该测验存在的问题是____\
A. 信度问题\nB. 效度问题\nC. 难度问题\nD. 区分度问题\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n李老师在教学生区分形近字将四个字相同的右半部分用白色粉笔写出相异的左半部分用彩色粉笔写出李老师运用了\
知觉的____\nA. 整体性\nB. 选择性\nC. 理解性\nD. 恒常性\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n兰兰学会走路后,就要很喜欢尝试自己穿衣吃饭捡东西,喜欢探索周围世界按照埃里克森人格发展阶段理论,兰兰所处的发展阶段是____\
A. 信任对怀疑\nB. 自立对羞怯\nC. 主动感对内疚感\nD. 勤奋感对自卑感\nAnswer:",
"The following are multiple choice questions (with answers) about teacher qualification.\n\n下列对于多动症的说法不正确的是____\
A. 由多种原因引起的一组综合征\nB. 某种神经递质的缺陷可诱发该病\nC. 神经髓鞘发育落后可诱发该病\nD. 营养不良可诱发该病\nAnswer: D\n\n学习迁移发生的必要条件是两种学习活动之间存在共同原理学习迁移产生的关键\
是学习者通过活动能概括出其共同原理持这种观点的迁移理论被称为____\nA. 形式训练说\nB. 相同要素说\nC. 概括化理论\nD. 关系理论\nAnswer: C\n\nExcel中通常在单元格内出现####”符号时,表明\
____\nA. 显示的是字符串####”\nB. 列宽不够,无法显示数值数据\nC. 数值溢出\nD. 计算错误\nAnswer: B\n\n第二次世界大战开始时间是____。\nA. 1914年\nB. 1918年\
C. 1939\nD. 1945\nAnswer: C\n\n在日常生活中我们经常会接触一些民谚俗语这些民谚俗语蕴含着丰富的物理知识下列民谚俗语蕴含的物理知识所属领域不同的是____\nA. 坐井观天所见甚少\
B. 瑞雪兆丰年\nC. 酒香不怕巷子深\nD. 下雪不寒化雪寒\nAnswer: A\n\n杨老师在教授生字词的过程中发现部分学生有缺笔少画的现象于是他把小学生缺笔少画现象的原因及对策研究作为研究课题拟订相应的研究计划\
在工作中收集整理相关资料并实施教学措施最后根据反馈信息调整教学方案这种研究方法属于____\nA. 教育行动研究法\nB. 教育实验法\nC. 教育叙事研究法\nD. 个案研究法\nAnswer:"
]
dataset_calib = get_calib_dataset(tokenizer, CALIB_SET)
# If the activations of linear operators have an overly wide dynamic range or too
# many "spike" outliers, use the anti-outlier feature as follows.
anti_config = AntiOutlierConfig(anti_method="m2", dev_type="cpu")  # use dev_type="npu", dev_id=0 to quantize on NPU
anti_outlier = AntiOutlier(model, calib_data=dataset_calib, cfg=anti_config, norm_class_name="RMSNorm")
anti_outlier.process()
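# process() applies the suppression in place (its return value is not used), so the
# Calibrator below quantizes the already-adjusted model.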
# The layers listed below are rolled back to floating point: w8a8 quantizes
# activations as well, and some layers are sensitive to the reduced activation
# range, so they keep float weights for computation.
disable_names = []
baichuan_layers = 40
disable_idx_lst = list(range(baichuan_layers))
for layer_index in disable_idx_lst:
down_proj_name = "model.layers.{}.mlp.down_proj".format(layer_index)
disable_names.append(down_proj_name)
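# Note: baichuan_layers should match num_hidden_layers in the model's config.json;
# adjust it when reusing this rollback recipe for a model with a different depth.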
quant_config = QuantConfig(
a_bit=8,
w_bit=8,
disable_names=disable_names,
disable_last_linear=False,
dev_type='cpu', # use dev_type="npu", dev_id=0 to quantize on NPU
act_method=3,
pr=1.0,
w_sym=True,
mm_tensor=False
)
calibrator = Calibrator(model, quant_config, calib_data=dataset_calib, disable_level='L0')
calibrator.run() # run the PTQ quantization calibration
# "safe_tensor" saves safetensors-format weights; "numpy" saves npy-format weights
calibrator.save(OUTPORT_W8A8_PATH, save_type=["safe_tensor"])
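# A minimal follow-up sketch (illustrative, not required by msmodelslim): confirm
# that the export produced safetensors weights in the output directory.
import os
saved = [f for f in os.listdir(OUTPORT_W8A8_PATH) if f.endswith((".safetensors", ".json"))]
print("saved quantized artifacts:", saved)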


@ -0,0 +1,746 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
import logging
from transformers import AutoTokenizer, AutoModelForCausalLM
from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig, AntiOutlier
from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
INPORT_FP16_PATH = 'the_path_of_fp16_model_input'
OUTPORT_W8A8_PATH = 'the_path_of_w8a8_model_output'
tokenizer = AutoTokenizer.from_pretrained(
pretrained_model_name_or_path=INPORT_FP16_PATH,
use_fast=False,
padding_side='left',
trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
pretrained_model_name_or_path=INPORT_FP16_PATH,
trust_remote_code=True).float().cpu()
# model = model.half().npu() # uncomment to quantize on NPU
# Helper that builds the calibration dataset
def get_calib_dataset(
auto_tokenizer,
calib_list,
device="cpu"): # 如果需要使用npu进行量化, device="npu:0"。使用cpu,device="cpu"
calib_dataset = []
for calib_data in calib_list:
inputs = auto_tokenizer(calib_data, return_tensors='pt')
calib_dataset.append([
inputs.data['input_ids'].to(device),
inputs.data['attention_mask'].to(device)
])
return calib_dataset
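# Each entry returned above is a [input_ids, attention_mask] pair on the chosen
# device, which is the per-sample format passed to the Calibrator via calib_data.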
calib_set = [
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,\
静谧中含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸\
露着贫瘠\nB. 也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都\
是荒村野店时而会有一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只\螃蟹\
放进去时渔夫就用重物将口封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底\
此下去即使篓口没有盖盖子但也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必\
然内耗团结就是力量\nD. 与人方便自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer:\
A\n\n①我的奶奶是这样我的父亲也是这样那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像\
抽离出记忆那么在那个时代里成长生活的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了\
也会认认真真卷起来放好我曾看别人卷过这画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生\
从这个意义上说尽管也许并不懂他但人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \
\nAnswer: D\n\n相机拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n甲与乙准备进行一个游戏向空中扔\
三枚硬币如果它们落地后全是\正面向上或全是反面向上乙就给甲钱但若出现两正面一反面或两反面一正面的情况则由甲给乙钱乙要求甲每次给10元\
从长远来看甲应该要求乙每次至少给____元才可考虑参加这个游戏\nA. 10\nB. 15\nC. 20\nD. 30\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧\
中含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫\
\nB. 也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野\
时而会有一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时\
夫就用重物将口封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓\
口没有盖盖子但也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是\
力量\nD. 与人方便自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶\
是这样我的父亲也是这样那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么\
在那个时代里成长生活的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起\
来放好我曾看别人卷过这画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义\
上说尽管也许并不懂他但人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: \
D\n\n相机拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n下列著名诗人与其代表作对应有误的是____\nA. \
李白将进酒\nB. 白居易琵琶行\nC. 王之焕登鹳雀楼\nD. 杜甫长恨歌\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧\
中含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫\
\nB. 也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店\
时而会有一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用\
重物将口封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖\
盖子但也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. \
人方便自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父\
亲也是这样那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成\
生活的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别\
人卷过这画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不\
懂他但人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n经济学上所推崇的橄榄型收入分配结构是指低收入和高收入相对较\
中等收入占绝大多数的分配结构我国正在采取措施实施提低扩中调高打非保困的方针使收入分配朝着橄榄型方向发展这主要是为了\
促进____\nA. 生产的发展\nB. 效率的提高\nC. 社会的公平\nD. 内需的扩大\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄\
____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n-81-36-90936____\nA. 49\nB. 64\nC. 81\nD. 100\nAns\
wer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\nVIP服务本来是个好东西大企业作为市场竞争的主体实行差别化服\
无可厚非但近年来一些企业纷纷进军医院机场车站等公共场所掏些赞助费设立所谓贵宾厅霸占公共资源不仅带来浪费更造成公共资源分配的不\
这段文字主要强调的是____\nA. 公共资源不该过度VIP\nB. VIP服务导致了公共资源的不公平分配\nC. 一些企业搬进医院机场车站办公\nD. 实行差别化\
服务是VIP服务的优势所在\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的\
许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n258121724____\nA. 30\nB. 32\nC. 34\nD. 36\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的\
许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n4461230____\nA. 48\nB. 64\nC. 80\nD. 90\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n当下中国文学描写官斗职斗婚斗家斗的作品比较流行这些作品\
中包含了不少对日常生活中权术和心机的描写这样的写作有可能削弱文学对社会的积极影响文学有必要与正义结盟形成诗性正义以提升生活 作者想表达的主\
要观点是____\nA. 当下文学作品的社会影响力有下降的趋势\nB. 流行作品未必是好作品这需要时间的检验\nC. 文学不应过度渲染权术机诈否则有可能泯灭正\
\nD. 生活中没那么多权术机诈文学创作应该贴近生活不能闭门造车\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n一天一个农民的驴子掉到枯井里那可怜的驴子在井里凄凉地惨叫了\
几个钟头农民亦急得团团转就是毫无办法把它救起来最后他断然认定驴子已老了这口枯井也该填起来不值得花精力去救驴子他请来所有邻居帮他填井\
大家抓起铁锹开始往井里填土驴子很快意识到发生了什么事起初它恐慌地大哭不一会儿居然安静下来人们忍不住往井里看奇迹发生了每一铲砸到驴子\
背上的土它都作了出人意料的处理迅速抖落一身尘土然后狠狠地用脚踩紧这样没过多久驴子竟然自己把自己升了起来到了井口它纵身一跳平安地跑开\
在场的人均惊诧不已 这段文字告诉我们的道理是____\nA. 人生中的每一个困难都是通往成功的垫脚石\nB. 换一种思维常常能够产生意想不到的效果\nC. \
静思考是克服困难的首要条件\nD. 求人不如求己很多时候自己才是自己最大的救星\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中含\
着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有一\
座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口封\
当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也没\
有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便自己\
方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的许\
多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画像\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人们心\
甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n在现代社会教育符号也即文凭和学历是一种重要的文化货币手持符号资本可进入相\
应职业群体身份团体和社会位置譬如凭借医学博士文凭可成为医生此为教育的筛选功能亦被喻为人才的分类编码场如同公共汽车总站目的地不同的人选\
择不同的路线乘坐不同的车辆到达不同的地方 下列选项不符合文意的一项是____\nA. 文凭与学历都是符号资本\nB. 教育符号是人才的分类编码\nC. 文凭体\
现了教育的筛选功能\nD. 手持相应的符号资本才能进入相应的职业群体\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n侯方域桃花扇____\nA. 蒲松龄聊斋志异\nB. 石头记\
红楼梦\nC. 崔莺莺西厢记\nD. 秦始皇后汉书\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n____全党同志和全国人民团结一心坚持不懈地奋斗不断取得扎扎实\
实的成效我们____一定能够使社会主义新农村建设真正成为惠及广大农民群众的民心工程 填入画横线部分最恰当的一项是____\nA. 如果 \nB. 只有 \
\nC. 只要 \nD. 倘若 也就\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中含\
着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会\
有一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物\
将口封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖\
但也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. \
人方便自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父\
亲也是这样那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成\
生活的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别\
人卷过这画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不\
懂他但人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n下列关于世界银行的说法中不正确的是____\nA. 原名国际复兴开发\
银行于1944年开始营业\nB. 它是联合国下属的一个专门机构\nC. 是负责长期贷款的国际金融机构\nD. 贷款期限较长一般为数年最长可达30年\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n外资银行进入新兴市场国家新兴市场国家银行业的各主体为了维持自\
身的生存会尽可能争取较大的市场份额充分拓展自身竞争优势努力向客户提供质优价廉的金融产品和金融服务这个过程必然带动银行业微观效率的提升 这个\
过程指的是____\nA. 外资银行进入新兴市场国家的过程\nB. 新兴市场国家银行业发展的过程\nC. 外资银行提供优质服务的过程\nD. 新兴市场国家银行业扩大市场\
份额的过程\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中含\
着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会\
有一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将\
口封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
但也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方\
便自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也\
是这样那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长\
生活的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷\
过这画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂\
但人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n按照行政层级标准来划分我国政府机构的类型有____\nA. \
般地方国家行政机关和基层国家行政机关两大类\nB. 常设机构与非常设机构两类\nC. 领导机构办公办事机构职能机构和派出机构四类\nD. 中央国家行政机关和地\
方国家行政机关两大类\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n在某市一项对公司年轻人员的最新调查中与往年相比今年有70\
人打算购买房屋这一比例已达到历史最高值然而在房屋管理局的统计中该市今年的房屋成交量却比往年有所下降以下哪项如果为真最不能解释上述现\
?____\nA. 一些打算购买房屋的年轻人目前并不具备该市购买房屋的条件\nB. 往年资料表明年轻人员购买房屋的比例不足购买房屋成员的30\nC. 近年来爆发的\
金融风暴对房地产行业有一定的打击\nD. 近几个月该市楼市价格不稳定使得一些购房者持观望态度\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的许\
多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n我们以往所理解的现代化概念仅仅局限于物质层面局限于表层经济现代化这也是\
迟发展国家长期存在的一个普遍性问题在物质层面上求变的欲望很强而在制度层面和观念层面上却是文化守成主义的这种状况对于现代化实际进程的影响自不必说\
它对于学术的影响是导致知识的流俗化不断地更换新词语在新词语的装潢下重复古老的思想观念结果是词语和口号不断地更换而社会精神气质则没有实质性的变化 \
这段文字要表达的主要意思是____\nA. 现代化应包括物质的制度的观念的三个层面\nB. 片面理解现代化是迟发展国家长期存在的一个普遍性问题\nC. 物质层面\
的落后现状是迟发展国家片面理解现代化的一个重要因素\nD. 片面理解现代化会导致知识的流俗化\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n皮肤破损出血颈髓损伤锐器插入体内严重挤压伤等是灾害发生时\
的常见损伤类型掌握科学的自救方法对于延续生命等待救援很重要下列自救措施中恰当的是____\nA. 锐器插人体内后应快速将锐器拔出简单处理伤口后\
立即送往医院救治\nB. 对颈后锐痛活动时疼痛加剧等症状即用颈托一时无颈托可临时用敷料硬板纸或塑料板做成颈圈固定颈部\nC. 伤口发生喷射状出血时\
应立即用厚消毒纱布(或毛巾)包扎好伤口\nD. 被重物挤压引起肢体肿胀或青紫时应尽快在患处用热毛巾湿敷消肿\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的许\
多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画像\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人们心\
甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n多年以来医生和家属对待癌症患者大多采取这样的态度即向患者隐瞒已得癌症的实情\
这样的做法在医学上叫作保护性医疗其目的在于减少患者的心理负担但是某肿瘤医生新设立的康复科的张主任却主张实行公开性治疗 由此可推知下文将要论\
述的是____\nA. 家属对实行公开性治疗的态度\nB. 保护性医疗的弊端\nC. 公开性治疗将使病情得到控制和好转\nD. 公开性治疗的含义和形式\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的\
许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画像\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人们\
心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n古人归纳总结出许多观天象识天气的谚语下列与天气变化无关的谚语是____\nA. \
霞不出门晚霞行千里\nB. 天上鱼鳞云地下雨淋淋\nC. 东风是个精不下也要阴\nD. 百日连阴雨总有一日晴\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧\
中含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸瘠\nB. 也许\
是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有一座\
小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口封\
当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n从论语孔子对音乐的重视可以说远远超出了后世那些尊敬他\
的人的想象这一方面来自他对于乐的精神艺术的新发现艺术只在人们精神的发现中才存在可以说就现在见到的材料看孔子可能是中国历史上最伟大的艺术精\
神的发现者这段文字重点强调____\nA. 孔子在音乐方面的成就与贡献\nB. 后人评价孔子时所存在的偏颇\nC. 艺术精神在乐教传承中的作用\nD. 论语作为文\
献的重要意义\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n①当地球撞进尘埃带时从地球上看是短时间内无数尘埃以极高的速\
度划破大气层下落 因此流星雨实际上是彗星留下的无数尘埃形成的 进入大气层的尘埃被大气加热发出明亮的光 彗星释放出的尘埃并非顷刻扩散到宇宙空间\
消失得无影无踪而是留在彗星的轨道上继续公转 这样看上去就有许多流星也就是流星雨 这样形成的尘埃带有些和地球的公转轨道交叉 将以上6个句子重新排\
语序正确的是____\nA. \nB. \nC. \nD. \nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
拍摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n3716107____\nA. 1704\nB. 1072\nC. 1707\nD. \
\1068\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有一\
座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口封\
当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的\
许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n我始终____开始在内心生活得更严肃的人也会在外在上开始生活得更____在一个\
奢华浪费的年代我希望能向世界____人类真正要的东西是非常之微小的 填入画横线部分最恰当的一项是____\nA. 确认 朴素 表明\nB. 相信 质朴 证明\nC. \
确认 质朴 证明\nD. 相信 朴素 表明\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的许\
多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n一特殊跑道为正三角形某运动员用6米秒的速度跑一圈耗时50秒问该运动员提\
速10后从跑道的某个顶点横穿跑道跑向对边问最少约需多少秒可踏足对边?(四舍五入到个位)____\nA. 9\nB. 10\nC. 13\nD. 15\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的许\
多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n文学资料在思想史领域著作中被使用得还是相当少其实作为记述史实的历史\
能对有些夸张和虚构的小说需要警惕但是作为考察理性和情感的思想史却不必胶柱鼓瑟或因噎废食任何文学作品也许在事实上有想象但在语言立场和情感上\
却仿佛当堂呈供并不能把自己的本相全盘隐匿 对这段文字的主旨理解最准确的是____\nA. 文学作品呈现艺术的真实\nB. 思想史研究应体现理性和情\
\nC. 文学资料可以作为思想史研究的史料\nD. 思想史研究中要慎用文学资料\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n下列关于国际组织的表述不正确的是____\nA. 石油输出国组织通过实行\
石油生产配额限制维护石油生产国利益\nB. 博鳌亚洲论坛是第一个总部设在中国的国际会议组织\nC. 蒙古国是上海合作组织的成员国之一\nD. 国际货币基金组织是联\
合国的专门机构\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n实验证明植物体内含有一种觉察光的蛋白质可以分辨光的强弱\
种能力很可能使植物看到人类视力所看不到的波长而且具有较高的灵敏度植物能感觉光照射过来的方向光使植物知道早上什么时候该醒来同样也能促使植物额外\
分泌栎精和堪非醇这两种无色色素他们能过滤强烈的阳光充分发挥遮光剂的作用从而保护植物免受紫外线的强烈照射 这段文字主要介绍的是____\nA. 植物是\
怎么辨别方向的\nB. 植物是如何避免阳光暴晒的\nC. 植物具有一定意义上的视觉\nD. 感知阳光对植物生长的重要性\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方\
便自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也\
是这样那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长\
生活的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷\
过这画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂\
但人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n1103782145____\nA. 170\nB. 197\nC. 224\nD. \
226\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口封\
当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也没\
有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便自己\
方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的许多\
脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画像\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人们心\
甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. 空调\
降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n某县在一次招商引资活动中投资商刁难引资方说我有三个项目环境项目旅游项目\
和化工项目如果你说的话是正确的我会把其中一个项目投资到贵县但是如果你说的话是错误的我就一个项目也不投资引资方当然想获得环境项目那么引资\
方该如何说呢?____\nA. 你不会把环境项目或旅游项目投资到我县\nB. 你不会把环境项目或化工项目投资到我县\nC. 你不会把旅游项目或化工项目投资到我县\nD. \
不会把旅游项目和化工项目都投资到我县\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n民意被满意民众不满意甚至很生气尊重民意顺应民意\
纳民意是服务型政府的执政要义是政治文明建设的题中之意民意的力量一方面取决于民意征集占全民的比例即广泛性另一方面也体现在政府对民意的尊重程度\
保障民众的知情权参与权表达权和监督权就是要随时随地与民众进行多种途径的沟通交流民意内涵民智民意关乎民生我们不仅要从民意中看到民众欢\
迎什么反对什么为科学决策提供依据而且要充分发挥民智的作用尊重民意吸纳民智是科学决策的重要保证也是衡量政府亲民为民的重要标志阅读上面文\
最符合文意的一项是____\nA. 让民众不满意很生气的政府就不是服务型政府\nB. 知情权是监督权的前提参与权是表达权的前提\nC. 尊重民意吸纳民智\
是科学决策的决定性因素\nD. 民意力量的发挥取决于民意征集的广度和尊重民意的程度\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的\
许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n3516821315____\nA. 107834\nB. 12849\nC. 12847\nD. 108847\nAns\
wer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的许\
多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画像\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人们\
心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n下列可以反映气候垂直变化的诗句是____\nA. 东边日出西边雨道是无晴却有晴\nB. \
罗浮山下四时春卢橘杨梅次第新\nC. 人间四月芳菲尽山寺桃花始盛开\nD. 横看成岭侧成峰远近高低各不同\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口封\
当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也没\
有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的\
许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n日本松下公司日前在东京松下中心向当地媒体展示了其面向未来的零排放概念环保房\
环保屋的主要特点是节能创能蓄能节能就是提高对自然界既有资源的利用率同时采用环保隔热的建筑材料以及最先进的环保节能家电设备等 下文最\
有可能介绍的是____\nA. 环保屋是怎样设计出来的\nB. 环保屋的创能蓄能特点\nC. 环保屋的推广\nD. 环保屋的材料\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的许\
多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n下列没有歧义的一项是____\nA. 几个派出所的民警\nB. 法院门前的石狮\
\nC. 这份起诉书我写不好\nD. 咬死了主人的藏獒\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的\
许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n我们发现零工制度有一个重要的支持机制就是完善的科学化的员工培训系统几乎所\
有的现代企业和公司都非常重视内部培训有的企业主甚至成为了培训狂哪怕有一秒钟的空闲也要为员工安排一次培训但真正有效的培训并不是无休止的洗脑和课程\
轰炸不是潜能激发感恩教育而是适合公司运营需求的专业性针对性科学性的业务训练这种培训机制如果能够建立起来无论你是否采用零工制度都会对\
企业的发展起到重要的推动作用 这段文字意在说明____\nA. 很多公司培训缺乏科学性\nB. 科学的员工培训对企业很重要\nC. 零工制度不一定适合所有企业\nD.\
过度培训可能会造成相反效果\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n全国人民代表大会举行会议时主持大会正式会议的是____\nA. 全国人\
大常委会\nB. 大会主席团\nC. 全国人大常委会委员长\nD. 大会秘书长\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的许\
多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n改革开放以来中国农学会____献身创新求实协作的宗旨始终不渝地坚持以\
推动农业科技进步促进农村发展为己任大力开展学术交流和科技普及积极____和举荐人才为提高广大农民科技素质加快农业科技进步作出了重要贡献 填入画\
横线部分最恰当的一项是____\nA. 继承 出谋划策\nB. 继承 建言献策\nC. 秉承 建言献策\nD. 秉承 出谋划策\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n0 4 3 10 6 7 ____\nA. 101\nB. 102\nC. 103\nD. 1\
04\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的\
许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n新生代散文作家大多有写现代诗的背景诗人所拥有的____的思维大胆的想象\
锐的感觉诗质____在散文语言的血液和肌理里这不同于平铺直叙式的浅浮的诗意而是自我心灵的体认中____而成的诗质 填入画横线部分最恰当的一项\
是____\nA. 跳脱 镶嵌 凝结\nB. 另类 浓缩 升华\nC. 感性 渗透 铸就\nD. 活跃 散播 提炼\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB.\
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n据咬文嚼字编辑部透露编制年度十大流行语是一项十分严肃的事既要____到\
词语在当年的流行度又要从语文伦理角度加以必要的____选优汰劣力争通过十大流行语向社会____正能量 填入画横线部分最恰当的一项是____\nA. 斟酌 \
估量 传播\nB. 思考 权衡 传送\nC. 思索 考察 传达\nD. 考虑 考量 传递\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口封\
当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也没\
有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便自己\
方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的许多\
脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画像\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人们心\
甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n20世纪60年代以前世界各国普遍注重防洪的工程措施即通过修建大堤水库水利设施\
对洪水进行控制但在60年代以后世界各国在防洪规划中越来越重视非工程措施的运用即通过洪水预警灾情评估洪灾保险等多种手段结合各种工程措施从而\
尽可能减少洪灾对人类经济环境和社会发展的影响 这段文字主要谈的是____\nA. 世界各国防洪理念的转变\nB. 世界各国控制洪水的新途径\nC. 单纯重视防洪\
工程不能有效控制洪水\nD. 非工程措施逐渐成为防洪规划的主导\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n近年来国家房地产调控措施的出台十分密集除了增加公共租赁住房供应\
再加上央行加息多个城市出现了房屋成交量下跌的态势房价涨幅开始放缓这表明____\nA. 国家通过宏观调控平衡供求关系\nB. 价格的波动通过供求关系表\
现出来\nC. 宏观调控是资源配置的基础性手段\nD. 宏观调控可以克服市场调节的滞后性\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n学生在操场上列队做操只知人数在90-110之间如果排成3排则不多不\
排成5排则少2人排成7排则少4人问学生人数是多少人?____\nA. 102\nB. 98\nC. 104\nD. 108\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n有人说人本是散落的珍珠随地乱滚文化就是那极____又强韧的细\
线将珠子串起来成为社会也有人说文化犹如空气中的氧气自然界的春雨不可或缺却____飘飘洒洒润物无声可见文化资源价值是无法用尺度衡量的 \
入画横线部分最恰当的一项是____\nA. 柔弱 视之无形\nB. 纤细 不可名状\nC. 结实 视而不见\nD. 薄弱 不可捉摸\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子但也\
没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这样\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活的\
许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这画\
那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他但人\
们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机拍摄____\nA. \
空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n政府职能与成本问题一直备受争议但这方面的研究似乎还处于一种观点与立场远未一致\
的状态一个重要原因是研究视角与方法的局限大体上看这类研究有两条思路一条是信守新古典经济学理论预设认为市场可以有效解决经济社会发展中的问\
小政府观点另一条是信守政府干预主义理论预设认为政府不时干预是市场能够健康运转的必要条件笔者认为要解决这种困境必须有新的理论视野和新\
的研究方法而新兴古典经济学理论就是其中之一 这段文字接下来最有可能讲述的是____\nA. 新兴古典经济学的理论框架与研究方法\nB. 新理论视野对提高政府\
的行政效率有何帮助\nC. 新古典经济学理论预设的局限性\nD. 政府职能与成本之间矛盾难解的原因\nAnswer:",
"The following are multiple choice questions (with answers) about civil servant.\n\n透过车轮卷起的黄土,却见山野人秋,庄稼割过,静谧中\
含着一些寂静只有阳光在切割过的根茬上烁烁闪亮____ 填入横线上最恰当的是____\nA. 这是一段颠簸的行程一路上景色苍凉雄浑寂静中裸露着贫瘠\nB. \
也许是久旱的缘故这边的溪流也变成了涓涓细流在盘踞的石缝间流动\nC. 同绿色的南方相比这里是荒凉的乃至荒蛮\nD. 偶见人迹大都是荒村野店时而会有\
一座小小的孤庙一闪而过\nAnswer: D\n\n据说在东南沿海一带渔民在捕到螃蟹后将螃蟹放进一个上小肚大的竹篓里面第一只螃蟹放进去时渔夫就用重物将口\
封住当第二只第三只放进去后渔夫就不再盖重物了因为第一只即将爬出篓口的螃蟹会被第二只第三只螃蟹拉到篓底如此下去即使篓口没有盖盖子\
也没有一只蟹能够爬出去 这个故事意在告诉我们____\nA. 人多不一定好办事\nB. 恶性竞争必然导致两败俱伤\nC. 内讧必然内耗团结就是力量\nD. 与人方便\
自己方便\nAnswer: C\n\n谨慎成就____\nA. 温和好感\nB. 勤奋努力\nC. 轻松普通\nD. 好学智慧\nAnswer: A\n\n①我的奶奶是这样我的父亲也是这\
那张画像已经成为许多老百姓生活必需品的一部分没有它似乎客厅都是空的 如果因为认知能力的提升而将偶像抽离出记忆那么在那个时代里成长生活\
的许多人脑子里将空空如也甚至不记得自己曾经活过这一回 卷的过程是在收叠他个人的历史 有时挂旧了破了也会认认真真卷起来放好我曾看别人卷过这\
画像那种澄澈的眼神令人难忘 有些伟大者永远不会被人遗忘因为那个伟大者在那个时代其实是一种生活精神生活 从这个意义上说尽管也许并不懂他\
人们心甘情愿尊他的名为圣 将以上6个句子重新排列语序正确的是____\nA. \nB. \nC. \nD. \nAnswer: D\n\n相机\
摄____\nA. 空调降温\nB. B超诊断\nC. 电脑操作\nD. 地图交通\nAnswer: B\n\n2009年有两次立春很容易让人联想到第二春二度春可想而知这\
样的婚姻不稳定所以网络上有2009年不能结婚或者2009年爱情不会长久等传闻但是大多数年轻人认为登记结婚是件水到渠成的事不会因为赶日子仓促提前\
或延迟 根据这段文字下列说法正确的是____\nA. 作者认为2009年不适合结婚\nB. 大多数年轻人认为2009年是结婚的好年头\nC. 2009年结婚会使婚姻不稳定的\
说法是无稽之谈\nD. 大多数年轻人不会因为2009年有两次立春而改变自己的结婚计划\nAnswer:"
]
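# For reference: get_calib_dataset (called below) is expected to tokenize each
# calibration prompt above into [input_ids, attention_mask] pairs. A minimal
# sketch of such a helper, assuming a standard Hugging Face tokenizer, is:
def get_calib_dataset(tokenizer, calib_list, device="cpu"):
    calib_dataset = []
    for calib_data in calib_list:
        # One prompt per entry; move the tensors to the calibration device
        inputs = tokenizer(calib_data, return_tensors="pt").to(device)
        calib_dataset.append([inputs.input_ids, inputs.attention_mask])
    return calib_dataset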
dataset_calib = get_calib_dataset(tokenizer, calib_set)
# If the activations of a linear operator have an overly wide dynamic range
# or contain too many spike outliers, apply the anti-outlier pass as follows:
logging.info("===============start AntiOutlier==============")
anti_config = AntiOutlierConfig(
w_bit=8, a_bit=8, anti_method="m2",
dev_type="cpu") # dev_type="npu", dev_id=0 如果需要使用npu进行量化。
anti_outlier = AntiOutlier(model,
calib_data=dataset_calib,
cfg=anti_config,
norm_class_name="RMSNorm")
anti_outlier.process()
logging.info("===============end AntiOutlier==============")
# Layer rollback settings: W8A8 also quantizes activations, and some layers are
# sensitive to the representable activation range, so those layers are rolled
# back to floating-point weights for computation.
disable_names = []
BAICHUAN_LAYERS = 32
disable_idx_lst = list(range(BAICHUAN_LAYERS))
for layer_index in disable_idx_lst:
down_proj_name = "model.layers.{}.mlp.down_proj".format(layer_index)
disable_names.append(down_proj_name)
quant_config = QuantConfig(
a_bit=8,
w_bit=8,
disable_names=disable_names,
disable_last_linear=False,
dev_type='cpu', # use dev_type="npu", dev_id=0 to quantize on an NPU
act_method=3,
pr=1.0,
w_sym=True,
mm_tensor=False)
logging.info("===============start Calibrator==============")
calibrator = Calibrator(model,
quant_config,
calib_data=dataset_calib,
disable_level='L0')
calibrator.run() # run PTQ quantization calibration
calibrator.save(OUTPORT_W8A8_PATH, save_type=[
"safe_tensor"
]) # "safe_tensor" saves safetensors-format weights; "numpy" saves npy-format weights
logging.info("===============end Calibrator==============")

View File

@ -0,0 +1,23 @@
# Copyright (c) Huawei Technologies Co., Ltd. 2024-2024. All rights reserved.
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MASTER_PORT=20031
# The environment variables below relate to performance and memory tuning and normally need no changes
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export HCCL_BUFFSIZE=120
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=0
extra_param=""
world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
if [ "$TP_WORLD_SIZE" == "1" ]; then
python -m examples.run_fa --model_path $1 $extra_param
else
torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_fa --model_path $1 $extra_param
fi

View File

@ -0,0 +1,20 @@
#!/bin/bash
set -ex
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
# See README.md in this directory for parameter and launch-command details
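# Typical invocation (weight path is illustrative): bash <this script> ${weight_path} [chat]
# The optional second argument "chat" appends --is_chat_model for chat-tuned weights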
export BIND_CPU=1
export IS_QUANT=0
export RESERVED_MEMORY_GB=3
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3
export MASTER_PORT=20036
export IS_ALIBI_MASK_FREE=0
export TP_WORLD_SIZE=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
export INT8_FORMAT_NZ_ENABLE=1
atb_options="ATB_LAUNCH_KERNEL_WITH_TILING=1 ATB_LAYER_INTERNAL_TENSOR_REUSE=1 ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1 PYTORCH_NPU_ALLOC_CONF='max_split_size_mb:2048' HCCL_BUFFSIZE=120"
atb_async_options="ATB_OPERATION_EXECUTE_ASYNC=1 TASK_QUEUE_ENABLE=1"
base_cmd="torchrun --nproc_per_node $TP_WORLD_SIZE --master_port $MASTER_PORT -m examples.run_pa --model_path $1"
if [[ "$2" == "chat" ]]; then
base_cmd+=" --is_chat_model"
fi
run_cmd="${atb_options} ${atb_async_options} ${base_cmd}"
eval "${run_cmd}"

View File

@ -0,0 +1,251 @@
# README
# Feature Matrix
- The matrix below lists the features supported by the bge-large-zh models
| Model & Size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | W4A16 quantization | KV cache quantization | Sparse quantization | MoE quantization | MindIE Service | TGI | Long sequence |
|--------------|-------------------------|---------------------------| ---- |-----| --------------- | --------------- | -------- | --------- | --------- | ------------ |------| ---- | ------ | ---- |-----|
| bge-large-zh | Supported (world size 1) | Supported (world size 1) | √ | × | × | × | × | × | × | × | × | × | × | × | × |
## Offline Model Version
### Model Introduction
bge-large-zh is a Chinese text-representation model developed by the Beijing Academy of Artificial Intelligence (BAAI). It maps arbitrary text to low-dimensional dense vectors for retrieval, classification, clustering, or semantic matching, and can supply external knowledge to large language models. The **1.5 version** has a more reasonable similarity distribution
[Open-source model address](https://huggingface.co/BAAI/bge-large-zh-v1.5)
`Commit-id 79e7739b6ab944e86d6171e44d24c997fc1e0116`
### Model Conversion Flow
First obtain the `huggingface` open-source model and convert it to ONNX format, then use the Ascend ATC tool to convert the ONNX model to OM format; the focus here is the model's accuracy and performance on Ascend devices.
### Variable Name Explanation
|Variable |Meaning |
| ------------ | ------------ |
|save_directory |directory holding the ONNX model and the converted OM offline model |
|soc_version |version of the Ascend AI processor; run **npu-smi info** and prefix the reported model with "Ascend", e.g. **Ascend910B4, Ascend310P3** |
|precision_mode_v2 |precision mode of the network model, e.g. **fp16, mixed_float16, origin** |
| cur_dir |path from which commands or scripts are run (current directory) |
|device_id |id of the NPU chip; on a server with the CANN driver installed, use npu-smi info to list available NPU chip ids |
### Install Python Dependencies
```shell
cd ${cur_dir}
pip install -r requirements.txt
```
### Install the ais_bench Inference Tool
[ais_bench inference tool user guide](https://gitee.com/ascend/tools/blob/master/ais-bench_workload/tool/ais_bench/README.md)
- Install both the **aclruntime** package and the **ais_bench** inference package
#### Convert the Open-Source Model to ONNX Format
```shell
cd ${cur_dir}
python bin2onnx.py --model_path ${save_directory}
```
#### Convert ONNX to an OM Offline Model
Use [Ascend ATC](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC1alpha003/devaids/auxiliarydevtool/atlasatc_16_0001.html) on the target environment to convert the ONNX model into an OM offline model
- The ATC tool ships with CANN; just source the corresponding environment variables
```shell
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```
Run the script under ${cur_dir}
```shell
atc --model=${save_directory}/model.onnx --framework=5 --output=${save_directory}/bge --soc_version=${soc_version} --input_shape="input_ids:-1,-1;attention_mask:-1,-1;token_type_ids:-1,-1" --optypelist_for_implmode="Gelu" --op_select_implmode=high_performance --input_format=ND --precision_mode_v2=${precision_mode} --modify_mixlist=${cur_dir}/ops_info.json
```
#### Parameter Notes
- The BERT model takes three inputs, in order **input_ids**, **attention_mask**, and **token_type_ids**; specify the shapes of the model inputs in that order.
- Per the ATC documentation, setting a shape dimension to -1 allows any value >= 0 for that dimension. The nominal upper bound is the int64 range, but in practice it is limited by host- and device-side physical memory, which the user can increase.
- The Gelu operator runs in high-performance mode, improving model performance without hurting accuracy
- Different precision modes trade accuracy against performance, specifically:
Accuracy, high to low: `origin>mixed_float16>fp16`
Performance, fast to slow: `fp16>=mixed_float16>origin`
Recommended setting: **mixed_float16**
- The modify_mixlist parameter configures the black/white/gray operator lists under mixed precision, so that operators which overflow in fp16 keep their original precision; here a JSON file adds the affected operators to the blacklist
### Get the Test Dataset
```shell
cd ${cur_dir}
mkdir dataset
cd dataset
```
Download [corpus, queries](https://huggingface.co/datasets/C-MTEB/T2Retrieval/tree/main/data) and [dev](https://huggingface.co/datasets/C-MTEB/T2Retrieval-qrels/tree/main/data) into this directory
### Offline Model Inference Script Guide
- The OM model inference script entry point is `${cur_dir}/infer.py`
- The HF open-source model inference script entry point is `${cur_dir}/demo.py`
On the Ascend machine, **run** `python infer.py --model-path ${save_directory} --device ${device_id}`;
from the GPU weight directory, **run** `python demo.py` instead
- **Note:** infer.py runs the first model ending in .om found in the model directory. To pin a specific OM model, edit
`session = InferSession(device_id=device, model_path=model_path)` in infer.py, where **model_path** is `${save_directory}/*.om`
and * is the OM offline model file name.
### 精度 & 性能测试
- 修改Config_bge.json内的模型路径为各模型所在的相应路径
- 精度测试脚本
```shell
python eval_cmteb.py --model_type_or_path om --device ${device_id}
```
- 性能测试脚本
```shell
python eval_performance.py --model_type_or_path om --input_shape [batch_size, seq_len] --device ${device_id}
```
#### 模型推理性能
性能验证NPU环境使用 `OM` 模型GPU环境使用 `ONNX` 模型
吞吐率1000 * batch_size / compute_time
| Env | Chip | batch_size | seq_len | Throughput (fps) |
|-----|-------------|------------|---------|----------|
| NPU | Ascend310P3 | 8 | 100 | 449.22 |
| NPU | Ascend310P3 | 20 | 512 | 39.40 |
| NPU | Ascend310P3 | 128 | 512 | 39.63 |
| GPU | NVIDIA A10 | 8 | 100 | 149.93 |
| GPU | NVIDIA A10 | 20 | 512 | 48.21 |
| GPU | NVIDIA A10 | 128 | 512 | 49.38 |
Note: the Atlas 300I Duo inference card has two chips per card; multiply throughput by 2 when comparing
| Env | Chip | batch_size | seq_len | Throughput (fps) |
|-----|-------------|------------|---------|----------|
| NPU | Ascend910B4 | 8 | 100 | 696.06 |
| NPU | Ascend910B4 | 20 | 512 | 132.96 |
| NPU | Ascend910B4 | 128 | 512 | 123.94 |
| GPU | NVIDIA L20 | 8 | 100 | 384.60 |
| GPU | NVIDIA L20 | 20 | 512 | 112.80 |
| GPU | NVIDIA L20 | 128 | 512 | 104.37 |
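As a quick sanity check of the throughput formula, the first Ascend310P3 row above (batch_size=8, 449.22 fps) implies a mean compute time of roughly 17.8 ms per batch:
```python
batch_size, throughput_fps = 8, 449.22  # values taken from the first NPU row above
compute_time_ms = 1000 * batch_size / throughput_fps
print(f"{compute_time_ms:.1f} ms")  # ~17.8 ms per batch
```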
#### Model Inference Accuracy
Accuracy validation uses the `OM` model on NPU and the `ONNX` model on GPU
| Env | Chip | ndcg@10% |
|-----|-------------|--------|
| NPU | Ascend310P3 | 83.66 |
| GPU | Nvidia A10 | 83.67 |
| Env | Chip | ndcg@10% |
|-----|-------------|--------|
| NPU | Ascend910B4 | 83.86 |
| GPU | Nvidia L20 | 83.67 |
### Ascend310P3 Performance Note
On Ascend 310P3, one extra step is needed to get the best operator performance:
1. Enable VectorCore for SoftmaxV2: find SoftmaxV2 in the JSON file at
```
/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/config/ascend310p/aic-ascend310p-ops-info.json
```
and add the VectorCore enable flag
```json
"enableVectorCore":{
"flag":"true"
}
```
2. Also rename the existing softmax_v2 under the following path, otherwise the flag does not take effect
```shell
ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/kernel/ascend310p
```
3. Redo the ATC conversion, then rerun the performance test
------------
## Acceleration Library Version
### Inference Script Guide
- With the FA acceleration library attached, the inference script entry point is `${cur_dir}/main.py`
1. Replace the code of the native transformers **modeling_bert.py** with the code from **modeling_bert_ascend.py**,
located at
```shell
/miniconda/envs/${conda_name}/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py
```
2. On the Ascend machine, **run** `python main.py`
### Accuracy & Performance Testing
- In config_bge.json, set each model path to where that model actually resides
- Accuracy test script
```shell
python eval_cmteb.py --model_type_or_path pytorch --device ${device_id}
```
- Performance test script
```shell
python eval_performance.py --model_type_or_path pytorch --input_shape ${batch_size},${seq_len} --device ${device_id}
```
#### Model Inference Performance
Performance validation uses the `PYTORCH` model on both NPU and GPU
Throughput: 1000 * batch_size / compute_time
| Env | Chip | batch_size | seq_len | Throughput (fps) |
|-----|-------------|------------|---------|----------|
| NPU | Ascend910B4 | 8 | 100 | 486.66 |
| NPU | Ascend910B4 | 20 | 512 | 1100.48 |
| NPU | Ascend910B4 | 128 | 512 | 4885.53 |
| GPU | NVIDIA L40 | 8 | 100 | 453.42 |
| GPU | NVIDIA L40 | 20 | 512 | 575.13 |
| GPU | NVIDIA L40 | 128 | 512 | 2104.04 |
#### Model Inference Accuracy
Accuracy validation uses the `PYTORCH` model on both NPU and GPU
| Env | Chip | ndcg@10% |
|-----|------------- |--------|
| NPU | Ascend910B4 (fp16) | 83.67 |
| GPU | Nvidia L40 (fp32) | 83.67 |
- Ascend310P3: to be tested

View File

@ -0,0 +1,28 @@
# Copyright 2024 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the License);
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
from optimum.onnxruntime import ORTModelForFeatureExtraction
parser = argparse.ArgumentParser(description="Export a model from transformers to ONNX format.")
parser.add_argument("--model_path", type=str, required=True, help="Path to the model checkpoint to convert.")
args = parser.parse_args()
model_checkpoint = args.model_path
ort_model = ORTModelForFeatureExtraction.from_pretrained(model_checkpoint, export=True, from_transformers=True)
# Save the ONNX model
ort_model.save_pretrained(model_checkpoint)
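# Example usage (the checkpoint path below is illustrative):
#   python bin2onnx.py --model_path ./bge-large-zh-v1.5
# The exported model.onnx is written back into the same checkpoint directory.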

View File

@ -0,0 +1,8 @@
{
"default_path": {
"tokenizer_path": "./bge-large-zh-v1.5",
"pytorch_model_path": "./bge-large-zh-v1.5",
"onnx_model_path": "./bge-large-zh-v1.5",
"om_model_path": "./bge-large-zh-v1.5/bge_liunx_aarch.om"
}
}

View File

@ -0,0 +1,129 @@
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" BERT model configuration"""
from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging
logger = logging.get_logger(__name__)
class BertConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`BertModel`] or a [`TFBertModel`]. It is used to
instantiate a BERT model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the BERT
[google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 30522):
Vocabulary size of the BERT model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`BertModel`] or [`TFBertModel`].
hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (`int`, *optional*, defaults to 3072):
Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"silu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (`int`, *optional*, defaults to 2):
The vocabulary size of the `token_type_ids` passed when calling [`BertModel`] or [`TFBertModel`].
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For
positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
[Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
is_decoder (`bool`, *optional*, defaults to `False`):
Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if `config.is_decoder=True`.
classifier_dropout (`float`, *optional*):
The dropout ratio for the classification head.
Examples:
```python
>>> from transformers import BertConfig, BertModel
>>> # Initializing a BERT google-bert/bert-base-uncased style configuration
>>> configuration = BertConfig()
>>> # Initializing a model (with random weights) from the google-bert/bert-base-uncased style configuration
>>> model = BertModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "bert"
def __init__(
self,
vocab_size=30522,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=2,
initializer_range=0.02,
layer_norm_eps=1e-12,
pad_token_id=0,
position_embedding_type="absolute",
use_cache=True,
classifier_dropout=None,
**kwargs,
):
super().__init__(pad_token_id=pad_token_id, **kwargs)
self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.hidden_act = hidden_act
self.intermediate_size = intermediate_size
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.max_position_embeddings = max_position_embeddings
self.type_vocab_size = type_vocab_size
self.initializer_range = initializer_range
self.layer_norm_eps = layer_norm_eps
self.position_embedding_type = position_embedding_type
self.use_cache = use_cache
self.classifier_dropout = classifier_dropout

View File

@ -0,0 +1,42 @@
#!/bin/bash
# Define the model checkpoint and save directory
model_checkpoint="$1"
save_directory="$model_checkpoint"
soc_version=$(python -c "import torch;import torch_npu;print(torch.npu.get_device_name())")
precision_mode=allow_mix_precision
# Make sure no model.onnx already exists under the current model path
if [ -f "$save_directory/model.onnx" ]; then
echo "Error: model.onnx already exists in the current path"
exit 1
fi
# Load the model and export it to ONNX via an inline Python script
python -c "
from optimum.onnxruntime import ORTModelForFeatureExtraction
ort_model = ORTModelForFeatureExtraction.from_pretrained('$model_checkpoint', export=True, from_transformers=True)
ort_model.save_pretrained('$save_directory')
"
# Check whether the ONNX model was saved successfully
if [ -f "$save_directory/model.onnx" ]; then
echo "ONNX model successfully saved at $save_directory/model.onnx"
else
echo "Error: Failed to save ONNX model."
exit 1
fi
# Convert/optimize the ONNX model with the ATC command
atc --model=$save_directory/model.onnx --framework=5 --output=$save_directory/bge_"$soc_version" --soc_version="$soc_version" --input_shape="input_ids:-1,-1;attention_mask:-1,-1;token_type_ids:-1,-1" --precision_mode="$precision_mode"
# Check whether the ATC command succeeded
if [ $? -eq 0 ]; then
echo "Model conversion with ATC successful."
else
echo "Error: Failed to convert model with ATC."
exit 1
fi

View File

@ -0,0 +1,85 @@
# Copyright 2023 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the License);
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import logging
import torch
try:
import torch_npu
device = "npu:0"
torch_npu.npu.set_device(0)
torch.npu.set_compile_mode(jit_compile=False)
except ImportError:
device = "cuda:0"
from transformers import AutoTokenizer, AutoModel
logging.getLogger().setLevel(logging.INFO)
# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]
MODEL_PATH = "./"
# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModel.from_pretrained(MODEL_PATH).to(device)
model.eval()
def infer(text):
# Tokenize sentences
encoded_input = tokenizer(text, padding=True, truncation=True, return_tensors='pt', max_length=512)
encoded_input = encoded_input.to(device)
logging.info(encoded_input.input_ids.shape)
# Compute token embeddings
with torch.no_grad():
model_output = model(**encoded_input)
# Perform pooling. In this case, cls pooling.
sentence_embeddings = model_output[0][:, 0]
# normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
logging.info("Sentence embeddings:", sentence_embeddings)
logging.info("Sentence embeddings.shape:", sentence_embeddings.shape)
def infer_test(text):
# Tokenize sentences
encoded_input = tokenizer(text, padding="max_length", return_tensors='pt', max_length=512)
encoded_input = encoded_input.to(device)
logging.info(encoded_input.input_ids.shape)
# Compute token embeddings
with torch.no_grad():
start_time = time.time()
model_output = model(**encoded_input)
end_time = time.time()
# Perform pooling. In this case, cls pooling.
sentence_embeddings = model_output[0][:, 0]
# normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
time_cost = end_time - start_time
logging.info("Sentence embeddings:", sentence_embeddings)
logging.info("Sentence embeddings.shape:", sentence_embeddings.shape)
logging.info("generate cost %g ms", time_cost * 1000)
return sentence_embeddings
if __name__ == '__main__':
try:
infer_test(sentences)
infer_test(sentences)
except Exception as e:
logging.error("An error occurred during inference:", str(e))

View File

@ -0,0 +1,304 @@
# Copyright 2024 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the License);
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import logging
import os
from typing import List, Any, Union
from collections import defaultdict
import json
import numpy as np
import torch
import transformers.tokenization_utils_base
from mteb import MTEB, AbsTaskRetrieval
from datasets import load_dataset, DatasetDict
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer, AutoModel
from tqdm import tqdm as progressbar
from atb_llm.utils.file_utils import safe_open
logging.getLogger().setLevel(logging.INFO)
def get_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description='Evaluate LLM.')
parser.add_argument(
'--model_type_or_path',
type=str,
required=True,
help='Specify model type to load default model or path to the directory containing model file.'
)
parser.add_argument(
'--batch_size',
type=int,
default=20,
help='Batch size of dataset for computing.'
)
parser.add_argument(
'--device',
type=int,
default=0,
choices=list(range(8)),
help='Adapt model on device id x.'
)
return parser.parse_args()
def load_retrieval_data(hf_hub_name, eval_splits):
eval_split = eval_splits[0]
dataset = load_dataset("parquet", data_files={'corpus': 'dataset/corpus-00000-of-00001-8afe7b7a7eca49e3.parquet',
'queries': 'dataset/queries-00000-of-00001-930bf3b805a80dd9.parquet'})
qrels = load_dataset("parquet", data_files={eval_split: 'dataset/dev-00000-of-00001-92ed0416056ff7e1.parquet'})[
eval_split]
corpus = {e['id']: {'text': e['text']} for e in dataset['corpus']}
queries = {e['id']: e['text'] for e in dataset['queries']}
relevant_docs = defaultdict(dict)
for e in qrels:
relevant_docs[e['qid']][e['pid']] = e['score']
corpus = DatasetDict({eval_split: corpus})
queries = DatasetDict({eval_split: queries})
relevant_docs = DatasetDict({eval_split: relevant_docs})
return corpus, queries, relevant_docs
class T2RetrievalLocal(AbsTaskRetrieval):
def __init__(self, **kwargs: Any):
super().__init__(**kwargs)
self.data_loaded = None
self.corpus = None
self.queries = None
self.relevant_docs = None
@property
def description(self) -> dict:
return {
'name': 'T2RetrievalLocal',
'description': 'T2Ranking: A large-scale Chinese Benchmark for Passage Ranking',
'hf_hub_name': 'C-MTEB/T2Retrieval',
'reference': "https://arxiv.org/abs/2304.03679",
'type': 'Retrieval',
'category': 's2p',
'eval_splits': ['test'],
'eval_langs': ['zh'],
'main_score': 'ndcg_at_10',
}
def load_data(self, **kwargs) -> None:
if self.data_loaded:
return
try:
self.corpus, self.queries, self.relevant_docs = load_retrieval_data(self.description['hf_hub_name'],
self.description['eval_splits'])
except KeyError as e:
raise RuntimeError('load dataset failed because {}'.format(e)) from e
else:
self.data_loaded = True
class Model:
def __init__(self, tokenizer_path: str, batch_size: int) -> None:
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
self.batch_size = batch_size
def encode(self, sentences: List[str], **kwargs: Any) -> torch.Tensor:
""" Returns a list of embeddings for the given sentences.
Args:
sentences (`List[str]`): List of sentences to encode
Returns:
`torch.Tensor`: Tensor of embeddings for the given sentences
"""
pass
def _tokenize_sentences(self, sentences: List[str]) -> transformers.tokenization_utils_base.BatchEncoding:
return self.tokenizer(
sentences,
padding='max_length',
truncation=True,
return_tensors='pt',
max_length=512
)
class PyTorchModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int):
super(PyTorchModel, self).__init__(tokenizer_path, batch_size)
# init model runtime
try:
import torch_npu
except ImportError:
self.device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
else:
self.device = 'npu:{}'.format(device_id)
torch_npu.npu.set_device(device_id)
torch.npu.set_compile_mode(jit_compile=False)
self.model = AutoModel.from_pretrained(
model_path,
local_files_only=True,
trust_remote_code=True
).half().to(self.device)
self.model.eval()
def encode(self, sentences: List[str], **kwargs: Any) -> Union[np.ndarray, torch.Tensor]:
all_embs = []
for start_index in progressbar(range(0, len(sentences), self.batch_size)):
sentences_batch = sentences[start_index:start_index + self.batch_size]
# Tokenize sentences
encoded_inputs = self._tokenize_sentences(sentences_batch)
# Compute token embeddings
with torch.no_grad():
model_output = self.model(**encoded_inputs.to(self.device))
# cls pooling + L2 normalization, consistent with the ONNX/OM backends
sentence_embeddings = model_output[0][:, 0].float()
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
all_embs.extend(sentence_embeddings.cpu())
if all_embs:
if isinstance(all_embs, np.ndarray):
all_embs = torch.from_numpy(all_embs)
else:
all_embs = torch.stack(all_embs)
else:
all_embs = torch.Tensor()
return all_embs
class ONNXModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int):
super(ONNXModel, self).__init__(tokenizer_path, batch_size)
# init model runtime
try:
import torch_npu
except ImportError:
self.device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
else:
self.device = 'npu:{}'.format(device_id)
torch_npu.npu.set_device(device_id)
torch.npu.set_compile_mode(jit_compile=False)
self.ort = ORTModelForFeatureExtraction.from_pretrained(model_path).to(self.device)
def encode(self, sentences: List[str], **kwargs: Any) -> Union[np.ndarray, torch.Tensor]:
all_embs = []
for start_index in progressbar(range(0, len(sentences), self.batch_size)):
sentences_batch = sentences[start_index:start_index + self.batch_size]
# Tokenize sentences
encoded_inputs = self._tokenize_sentences(sentences_batch)
# Compute token embeddings
encoded_input = encoded_inputs.to(self.device)
with torch.no_grad():
model_output = self.ort(**encoded_input)
# Perform pooling. In this case, cls pooling.
sentence_embeddings = model_output[0][:, 0]
embs = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
all_embs.extend(embs)
if all_embs:
if isinstance(all_embs, np.ndarray):
all_embs = torch.from_numpy(all_embs)
else:
all_embs = torch.stack(all_embs)
else:
all_embs = torch.Tensor()
return all_embs
class OMModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int) -> None:
super(OMModel, self).__init__(tokenizer_path, batch_size)
# init model runtime
from ais_bench.infer.interface import InferSession
self.session = InferSession(device_id, model_path)
def encode(self, sentences: List[str], **kwargs: Any) -> Union[np.ndarray, torch.Tensor]:
all_embs = []
for start_index in progressbar(range(0, len(sentences), self.batch_size)):
sentences_batch = sentences[start_index:start_index + self.batch_size]
# Tokenize sentences
encoded_inputs = self._tokenize_sentences(sentences_batch)
input_ids = encoded_inputs.data['input_ids']
attention_mask = encoded_inputs.data['attention_mask']
token_type_ids = encoded_inputs.data['token_type_ids']
# Compute token embeddings
outputs = self.session.infer(feeds=[input_ids, attention_mask, token_type_ids], mode='dymshape',
custom_sizes=10000000)[0][:, 0]
outputs = torch.from_numpy(outputs)
embs = torch.nn.functional.normalize(outputs, p=2, dim=1)
all_embs.extend(embs)
if all_embs:
if isinstance(all_embs, np.ndarray):
all_embs = torch.from_numpy(all_embs)
else:
all_embs = torch.stack(all_embs)
else:
all_embs = torch.Tensor()
return all_embs
def load_model(model_args: argparse.Namespace) -> Model:
# default model path
with safe_open('config_bge.json', 'r', encoding='utf-8') as reader:
text = reader.read()
default_path = json.loads(text)['default_path']
pytorch_model_path = tokenizer_path = os.path.abspath(default_path['tokenizer_path'])
onnx_model_path = os.path.abspath(default_path['onnx_model_path'])
om_model_path = os.path.abspath(default_path['om_model_path'])
model_path_map = {'pytorch': pytorch_model_path, 'onnx': onnx_model_path, 'om': om_model_path}
model_map = {'pytorch': PyTorchModel, 'onnx': ONNXModel, 'om': OMModel}
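# Accept either a bare model type ('pytorch', 'onnx', 'om') or a concrete path;
# for a path, the extension / last segment is used to infer the model type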
model_type = model_args.model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]
default_model_path = model_path_map.get(model_type, 'not exist')
if default_model_path != 'not exist':
model_path = (
model_args.model_type_or_path
if os.path.isdir(model_args.model_type_or_path) or os.path.isfile(model_args.model_type_or_path)
else default_model_path
)
else:
raise RuntimeError(
'load model failed because '
'\'{}\' is not a valid model type or path'.format(model_args.model_type_or_path)
)
try:
model_for_eval = model_map[model_type](
tokenizer_path=tokenizer_path,
model_path=model_path,
batch_size=model_args.batch_size,
device_id=model_args.device
)
except KeyError as e:
raise RuntimeError('load {} model failed because {}'.format(model_type, e)) from e
return model_for_eval
if __name__ == '__main__':
args = get_args()
model = load_model(args)
task = ['T2RetrievalLocal']
evaluation = MTEB(tasks=task, task_langs=['zh'])
results = evaluation.run(model)
logging.info(results)

View File

@ -0,0 +1,302 @@
# Copyright 2024 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the License);
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import json
import logging
import os
import time
from typing import Any, List, Union, Tuple
import datasets
import numpy as np
import torch
import transformers.tokenization_utils_base
from transformers import AutoTokenizer, AutoModel
from optimum.onnxruntime import ORTModelForFeatureExtraction
from tqdm import tqdm as progressbar
from atb_llm.utils.file_utils import safe_open
logging.getLogger().setLevel(logging.INFO)
def get_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description='Evaluate LLM.')
parser.add_argument(
'--model_type_or_path',
type=str,
required=True,
help='Specify model type to load default model or path to the directory containing model file.'
)
parser.add_argument(
'--input_shape',
type=str,
required=True,
help='Shape of input tensors.'
)
parser.add_argument(
'--device',
type=int,
default=4,
choices=list(range(8)),
help='Adapt model on device id x.'
)
parser.add_argument(
'--loop',
type=int,
default=50,
help='Evaluation loops.'
)
return parser.parse_args()
class Model:
def __init__(self, tokenizer_path: str) -> None:
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
def init_runtime(self, device_id: int) -> Tuple[Union[str, int], Any]:
if self.__class__.__name__.startswith(('PyTorch', 'ONNX')):
try:
import torch_npu
except ImportError:
device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
else:
device = 'npu:{}'.format(device_id)
torch_npu.npu.set_device(device_id)
torch.npu.set_compile_mode(jit_compile=False)
return device, 0
elif self.__class__.__name__.startswith('OMModel'):
from ais_bench.infer.interface import InferSession
return device_id, InferSession
else:
raise RuntimeError
def tokenize(
self,
sentences_batch: List[List[str]],
seq_len: int
) -> transformers.tokenization_utils_base.BatchEncoding:
encoded_inputs = self.tokenizer(
sentences_batch,
padding='max_length',
truncation=True,
return_tensors='pt',
max_length=512 # padded to the model maximum; the seq_len argument is not applied here
).to(self.device)
return encoded_inputs
def encode(self, pairs: List[List[str]], seq_len: int) -> float:
# Tokenize sentences
encoded_inputs = self.tokenize(pairs, seq_len)
# Compute token embedding time
computing_time = self._encode_batched(encoded_inputs)
return computing_time
def compute_scores(self, pairs: List[List[str]], batch_size: int, seq_len: int, loop: int) -> dict:
all_computing_time = []
for _ in progressbar(range(loop), 'Evaluating...'):
computing_time = self.encode(pairs, seq_len)
all_computing_time.append(computing_time)
try:
throughput = 1000 * batch_size / np.mean(all_computing_time)
except ZeroDivisionError as e:
raise RuntimeError('{} because no evaluation results'.format(e)) from e
scores = {
'compute_time': {
'min': np.min(all_computing_time),
'max': np.max(all_computing_time),
'mean': np.mean(all_computing_time),
'median': np.median(all_computing_time),
'percentile(99%)': np.percentile(all_computing_time, 99)
},
'throughput': throughput
}
return scores
def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
""" Returns a list of embeddings for the given sentences.
Args:
inputs (`BatchEncoding`): List of sentences to encode
Returns:
`float`: Computing time of embeddings for the given sentences
"""
_ = self
return 0.0
class PyTorchModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, device_id: int):
super(PyTorchModel, self).__init__(tokenizer_path)
self.device, _ = self.init_runtime(device_id)
self.model = AutoModel.from_pretrained(model_path).half().to(self.device)
self.model.eval()
def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
tick = time.time()
with torch.no_grad():
model_output = self.model(**inputs)
_ = model_output[0][:, 0]
tock = time.time()
return 1000 * (tock - tick)
class ONNXModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, device_id: int):
super(ONNXModel, self).__init__(tokenizer_path)
self.device, _ = self.init_runtime(device_id)
self.ort = ORTModelForFeatureExtraction.from_pretrained(model_path).to(self.device)
def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
tick = time.time()
with torch.no_grad():
_ = self.ort(**inputs)
# Perform pooling. In this case, cls pooling.
tock = time.time()
return 1000 * (tock - tick)
class OMModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, device_id: int) -> None:
super(OMModel, self).__init__(tokenizer_path)
self.device, infer_session = self.init_runtime(device_id)
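# loop=4 makes every InferSession.infer call execute four times back to back;
# _encode_batched below divides the measured wall time by 4 accordingly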
self.session = infer_session(device_id, model_path, loop=4)
def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
input_ids = inputs.data['input_ids']
attention_mask = inputs.data['attention_mask']
token_type_ids = inputs.data['token_type_ids']
tick = time.time()
_ = self.session.infer(feeds=[input_ids, attention_mask, token_type_ids],
mode='dymshape', custom_sizes=5000000)[0][:, 0]
tock = time.time()
return 1000 * (tock - tick) / 4
class PerformanceEvaluator:
def __init__(self, metadata: dict) -> None:
self.metadata = metadata
self.dataset = datasets.load_dataset("parquet", data_files={
'corpus': 'dataset/corpus-00000-of-00001-8afe7b7a7eca49e3.parquet',
'queries': 'dataset/queries-00000-of-00001-930bf3b805a80dd9.parquet'})
self.samples = self.dataset[self.metadata['eval_splits'][0]]
def __call__(
self,
model: Model,
input_shape: Union[Tuple, List],
loop: int) -> dict:
"""This is called during training to evaluate the model.
It returns scores.
Args:
model (`Model`): the model to evaluate
input_shape (`Union[Tuple[int, int], List[int, int]]`): shape of input tensors
loop (`int`): evaluation loops
"""
return self.compute_performance(model, input_shape, loop)
def compute_performance(
self,
model: Model,
input_shape: Union[Tuple, List],
loop: int) -> dict:
batch_size, seq_len = input_shape
pairs = []
docs = []
for sample in self.samples:
docs.append(sample['text'])
pairs = docs
pairs = pairs[:batch_size]
scores = model.compute_scores(pairs, batch_size, seq_len, loop)
return scores
class Evaluation:
def __init__(self, eval_args: argparse.Namespace):
self.input_shape = tuple(map(int, eval_args.input_shape.split(',')))
self.device_id = eval_args.device
self.loop = eval_args.loop
# dataset metadata
self.metadata = {
'name': 'T2RetrievalLocal',
'description': 'T2Ranking: A large-scale Chinese Benchmark for Passage Ranking',
'reference': 'https://arxiv.org/abs/2304.03679',
'type': 'Retrieval',
'category': 's2p',
'eval_splits': ['corpus'],
'eval_langs': ['zh'],
'main_score': 'ndcg_at_10'
}
# default model path
with safe_open('config_bge.json', 'r', encoding='utf-8') as reader:
text = reader.read()
default_path = json.loads(text)['default_path']
pytorch_model_path = self.tokenizer_path = os.path.abspath(default_path['tokenizer_path'])
onnx_model_path = os.path.abspath(default_path['onnx_model_path'])
om_model_path = os.path.abspath(default_path['om_model_path'])
model_path_map = {'pytorch': pytorch_model_path, 'onnx': onnx_model_path, 'om': om_model_path}
self.model_type = eval_args.model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]
default_model_path = model_path_map.get(self.model_type, 'not exist')
if default_model_path != 'not exist':
self.model_path = (
eval_args.model_type_or_path
if os.path.isdir(eval_args.model_type_or_path) or os.path.isfile(eval_args.model_type_or_path)
else default_model_path
)
else:
raise RuntimeError(
'load model failed because '
'\'{}\' is not a valid model type or path'.format(eval_args.model_type_or_path)
)
def load_model(self) -> Model:
model_map = {'pytorch': PyTorchModel, 'onnx': ONNXModel, 'om': OMModel}
try:
model = model_map[self.model_type](
tokenizer_path=self.tokenizer_path,
model_path=self.model_path,
device_id=self.device_id
)
except KeyError as e:
raise RuntimeError('load {} model failed because {}'.format(self.model_type, e)) from e
return model
def run(self) -> dict:
model = self.load_model()
evaluator = PerformanceEvaluator(self.metadata)
eval_results = evaluator(model, self.input_shape, self.loop)
return eval_results
if __name__ == '__main__':
args = get_args()
evaluation = Evaluation(args)
results = evaluation.run()
logging.info(results)

View File

@ -0,0 +1,95 @@
# Copyright 2023 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the License);
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import logging
import torch
from transformers import AutoTokenizer
from ais_bench.infer.interface import InferSession
parser = argparse.ArgumentParser(description='Infer with a specified .om model file and device id')
parser.add_argument('--model-path', type=str, required=True, help='Path to the directory containing the .om model file')
parser.add_argument('--device', type=int, default=0, choices=[0, 1, 2, 3, 4, 5, 6, 7],
help='load the model.om on device id x')
logging.getLogger().setLevel(logging.INFO)
class InferEngine:
def __init__(self, device_id, model_path):
self.device_id = device_id
self.model_path = model_path
self.tokenizer = AutoTokenizer.from_pretrained(hf_model_path)
# Initializing InferSession loads model.om onto the NPU chip of the given device
self.session = InferSession(device_id=device_id, model_path=model_path)
def infer(self, text):
encoded_input = self.tokenizer(text, padding=True, truncation=True, return_tensors='np', max_length=512)
input_ids = encoded_input['input_ids']
attention_mask = encoded_input['attention_mask']
token_type_ids = encoded_input['token_type_ids']
inputs = [input_ids, attention_mask, token_type_ids]
# feeds takes one set of input tensors; mode selects the model type, and "dymshape" matches a model converted with dynamic input shapes
outputs = self.session.infer(feeds=inputs, mode="dymshape", custom_sizes=10000000)[0][:, 0]
outputs = torch.from_numpy(outputs)
outputs = torch.nn.functional.normalize(outputs, p=2, dim=1)
logging.info("Sentence embeddings: %s", outputs)
logging.info("Sentence embeddings.shape: %s", outputs.shape)
return outputs
def infer_test(self, text):
encoded_input = self.tokenizer(text, padding=True, truncation=True, return_tensors='np', max_length=512)
input_ids = encoded_input['input_ids']
attention_mask = encoded_input['attention_mask']
token_type_ids = encoded_input['token_type_ids']
inputs = [input_ids, attention_mask, token_type_ids]
# feeds takes one set of input tensors; mode selects the model type, and "dymshape" matches a model converted with dynamic input shapes
outputs = self.session.infer(feeds=inputs, mode="dymshape", custom_sizes=10000000)[0][:, 0]
outputs = torch.from_numpy(outputs)
outputs = torch.nn.functional.normalize(outputs, p=2, dim=1)
logging.info("Sentence embeddings: %s", outputs)
logging.info("Sentence embeddings.shape: %s", outputs.shape)
# exec_time_list keeps, in order, the execution times of all inferences run by this session.
exec_time = self.session.summary().exec_time_list[-1]
time_cost = exec_time[1] - exec_time[0]
logging.info("generate cost %g ms", time_cost * 1000)
return outputs
def free(self):
self.session.free_resource()
if __name__ == '__main__':
args = parser.parse_args()
device = args.device
# Load model from HuggingFace Hub
hf_model_path = args.model_path
# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]
om_files = [f for f in os.listdir(hf_model_path) if f.endswith('.om')]
if not om_files:
raise ValueError(f"No .om files found in {hf_model_path}")
# pick the first .om file found
om_file_name = om_files[0]
om_model_path = os.path.join(hf_model_path, om_file_name)
infer_engine = InferEngine(device_id=device, model_path=om_model_path)
infer_engine.infer_test(sentences)
infer_engine.infer_test(sentences)
infer_engine.free()

View File

@ -0,0 +1,83 @@
# Copyright 2024 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the License);
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import logging
import torch
try:
import torch_npu
device = "npu:0"
torch_npu.npu.set_device(0)
torch.npu.set_compile_mode(jit_compile=False)
except ImportError:
device = "cuda:0"
from transformers import AutoTokenizer, AutoModel
logging.getLogger().setLevel(logging.INFO)
class ModelInference:
def __init__(self, model_path):
self.model_path = model_path
self.tokenizer = AutoTokenizer.from_pretrained(model_path)
self.model = AutoModel.from_pretrained(model_path).half().to(device)
self.model.eval()
def infer(self, text):
encoded_input = self.tokenizer(
text, padding=True, truncation=True, return_tensors="pt", max_length=512
)
encoded_input = encoded_input.to(device)
logging.info(encoded_input.input_ids.shape)
with torch.no_grad():
model_output = self.model(**encoded_input)
sentence_embeddings = model_output[0][:, 0]
sentence_embeddings = torch.nn.functional.normalize(
sentence_embeddings, p=2, dim=1
)
logging.info("Sentence embeddings: %s", sentence_embeddings)
logging.info("Sentence embeddings.shape: %s", sentence_embeddings.shape)
def infer_test(self, text):
encoded_input = self.tokenizer(
text, padding="max_length", return_tensors="pt", max_length=512
)
encoded_input = encoded_input.to(device)
with torch.no_grad():
start_time = time.time()
model_output = self.model(**encoded_input)
end_time = time.time()
sentence_embeddings = model_output[0][:, 0]
sentence_embeddings = torch.nn.functional.normalize(
sentence_embeddings, p=2, dim=1
)
time_cost = end_time - start_time
logging.info("Sentence embeddings: %s", sentence_embeddings)
logging.info("Sentence embeddings.shape: %s", sentence_embeddings.shape)
logging.info("generate cost %g ms", time_cost * 1000)
return sentence_embeddings
if __name__ == "__main__":
MODEL_PATH = "/data1/models/BAAI/bge-large-zh-v1.5"
sentences = ["样例数据-1", "样例数据-2"]
model_inference = ModelInference(MODEL_PATH)
model_inference.infer_test(sentences)
model_inference.infer_test(sentences)

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,10 @@
{
"black-list": {
"to-add": [
"Add",
"Sub",
"Mul",
"SoftmaxV2"
]
}
}

View File

@ -0,0 +1,3 @@
optimum==1.18.0
onnx==1.16.0
onnxruntime==1.17.1

View File

@ -0,0 +1,251 @@
# README
# Feature Matrix
- The matrix below lists the features supported by the bge-reranker-large models
| Model & Size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quantization | W8A16 quantization | W4A16 quantization | KV cache quantization | Sparse quantization | MoE quantization | MindIE Service | TGI | Long sequence |
|--------------|-------------------------|---------------------------| ---- |-----| --------------- | --------------- | -------- | --------- | --------- | ------------ | -------------------------- | ---- | ------ | ---- |-----|
| bge-reranker-large | Supported (world size 1) | Supported (world size 1) | √ | × | × | × | × | × | × | × | × | × | × | × | × |
# bge-reranker-large Model - Inference Guide
- [Overview](#overview)
- [Input/Output Data](#inputoutput-data)
- [Inference Environment](#inference-environment)
- [Quick Start](#quick-start)
  - [Get the Source Code](#get-the-source-code)
  - [Model Conversion](#model-conversion)
  - [Model Inference](#model-inference)
- [Inference Performance & Accuracy](#inference-performance--accuracy)
  - [Inference Performance](#inference-performance)
  - [Accuracy](#inference-accuracy)
## Overview
### Model Introduction
`bge-reranker-large` is a cross-encoder re-ranking model developed by BAAI. It computes relevance scores for query/answer pairs on the fly, which is more accurate than an embedding (bi-encoder) model but also more time-consuming.
### Open-Source Model Address
```text
url=https://huggingface.co/BAAI/bge-reranker-large
commit_id=bc0c7056d15eaea221616887bf15da63743d19e1
model_name=bge-reranker-large
```
### Path Variable Explanation
```text
{cur_dir}
├─ .cache
│ ├─ huggingface
│ │ └─ datasets
│ │ └─ C-MTEB
│ │ └─ T2Reranking
│ │ └─ dev-00000-of-00001-65d96bde8023d9b9.parquet
├─ models
│ ├─ om
│ │ ├─ bge-reranker-large_{soc_version}_{precision_mode}_linux_aarch64.om
│ ├─ onnx
│ │ ├─ model.onnx
│ └─ pytorch
│ └─ pytorch_model.bin
├─ eval_performance.py
├─ eval_precision.py
└─ run.py
```
| Variable | Meaning |
|----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| soc_version | version of the NPU processor; query it with `npu-smi info` |
| precision_mode | precision mode of the converted OM model; see [ATC tool parameters](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/devaids/auxiliarydevtool/atlasatc_16_0099.html) |
### Input/Output Data
**Input data**
| Input | Dtype | Size | Format |
|----------------|-------|----------------------|--------|
| input_ids | INT64 | batch_size * seq_len | ND |
| attention_mask | INT64 | batch_size * seq_len | ND |
**Output data**
| Output | Dtype | Size | Format |
|--------|---------|--------------------|--------|
| output | FLOAT32 | batch_size * class | ND |
## Inference Environment
**This model requires the following plugins and drivers**
| Dependency | Version | Setup Guide |
|---------|----------|---------------------------------------------------------------------------------------------------------------|
| Firmware & driver | 23.0.RC3 | [PyTorch framework inference environment setup](https://www.hiascend.com/document/detail/zh/ModelZoo/pytorchframework/pies/pies_00001.html) |
| CANN | 7.0.RC1 | - |
| Python | 3.10 | - |
| Pytorch | 2.1.0 | - |
Note: for the Atlas 300I Duo inference card, pick the actual firmware and driver versions matching the CANN version.
## Quick Start
### Get the Source Code
1. Get this project's source code
```shell
git clone https://gitee.com/ascend/MindIE-LLM.git # clone this repository
cd MindIE-LLM
git checkout master # switch to the target branch
cd examples/atb_models/pytorch/examples/BAAI/bge-reranker-large # enter the working (current) directory {cur_dir}
```
2. Install dependencies
Install the Python dependencies
```shell
pip install -r requirements.txt
```
Download and install the `ais_bench` inference tool
[ais_bench inference tool user guide](https://gitee.com/ascend/tools/blob/master/ais-bench_workload/tool/ais_bench/README.md)
```shell
pip install ./aclruntime-{version}-{python_version}-linux_{arch}.whl
pip install ./ais_bench-{version}-py3-none-any.whl
# {version} is the package version, {python_version} the Python version, {arch} the CPU architecture
```
3. Get the open-source model
```shell
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/BAAI/bge-reranker-large
```
4. Prepare the dataset
Download the [C-MTEB/T2Reranking](https://huggingface.co/datasets/C-MTEB/T2Reranking) dataset
```shell
mkdir -p .cache/huggingface/datasets/C-MTEB/
cd .cache/huggingface/datasets/C-MTEB/
git clone https://huggingface.co/datasets/C-MTEB/T2Reranking
mv T2Reranking/data/dev-00000-of-00001-65d96bde8023d9b9.parquet T2Reranking/
```
### Model Conversion
1. Get the open-source PyTorch weight file [pytorch_model.bin](https://huggingface.co/BAAI/bge-reranker-large/blob/main/pytorch_model.bin) and place it in the `models/pytorch` directory
2. Get the ONNX weight file model.onnx exported from the open-source [bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) model and place it in the `models/onnx` directory
3. Run the conversion script
```shell
bash ${cur_dir}/convert.sh ${onnx} ${om} ${precision_mode}
```
- Parameter notes; see [ATC tool parameters](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC1alpha003/devaids/auxiliarydevtool/atlasatc_16_0039.html)
- `onnx`: path of the ONNX model file to convert
- `om`: path of the converted OM model file
- `precision_mode`: precision mode; accuracy high to low `origin>mixed_float16>fp16`, performance fast to slow `fp16>=mixed_float16>origin`. `mixed_float16` (the default) is recommended for maximum performance while preserving accuracy
### Model Inference
1. Run inference
```shell
python run.py \
--model_type_or_path=${model_type} or ${model_path} \
--device=${device}
```
- Parameter notes
- `model_type_or_path`: model type, or path of the model file, to run
- `device`: chip id on which to load the model
2. Performance test
```shell
python eval_performance.py \
--model_type_or_path=${model_type} or ${model_path} \
--input_shape=${batch_size},${seq_len} \
--device=${device} \
--loop=${loop}
```
- Parameter notes
- `model_type_or_path`: model type, or path of the model file, to run
- `batch_size`: number of dataset entries loaded per inference
- `seq_len`: text length loaded per inference
- `device`: chip id on which to load the model
- `loop`: number of evaluation loops
3. Accuracy test
```shell
python eval_precision.py \
--model_type_or_path=${model_type} or ${model_path} \
--batch_size=${batch_size} \
--device=${device}
```
- Parameter notes
- `model_type_or_path`: model type, or path of the model file, to run
- `batch_size`: number of dataset entries loaded per inference
- `device`: chip id on which to load the model
## 模型推理性能&精度
### 模型推理性能
吞吐率1000 * batch_size / compute_time
| Env | Chip | batch_size | seq_len | Throughput (fps) |
|-----|-------------|------------|---------|----------|
| NPU | Ascend310P3 | 20 | 512 | 43.84 |
| NPU | Ascend310P3 | 50 | 512 | 44.23 |
| GPU | NVIDIA A10 | 20 | 512 | 46.43 |
| GPU | NVIDIA A10 | 50 | 512 | 49.16 |
Note: the Atlas 300I Duo inference card has two chips per card; multiply throughput by 2 when comparing
| Env | Chip | batch_size | seq_len | Throughput (fps) |
|-----|-------------|------------|---------|----------|
| NPU | Ascend910B4 | 20 | 512 | 144.02 |
| NPU | Ascend910B4 | 50 | 512 | 135.82 |
| GPU | NVIDIA L40S | 20 | 512 | 119.75 |
| GPU | NVIDIA L40S | 50 | 512 | 113.42 |
### Inference Accuracy
Accuracy validation uses the `OM` model on NPU and the `ONNX` model on GPU
With-dataset accuracy validation uses the [C-MTEB/T2Reranking](https://huggingface.co/datasets/C-MTEB/T2Reranking) task; the open-source model scores MAP 67.28 on this task
| Env | Chip | MAP% | MRR@10% | Runtime (s) |
|-----|-------------|--------|-----------|---------|
| NPU | Ascend310P3 | 67.60 | 77.68 | 4496.25 |
| GPU | Nvidia A10 | 67.61 | 77.66 | 2216.56 |
| Env | Chip | MAP% | MRR@10% | Runtime (s) |
|-----|-------------|--------|-----------|---------|
| NPU | Ascend910B4 | 67.60 | 77.66 | 985.30 |
| GPU | Nvidia L40S | 67.61 | 77.66 | 991.57 |
说明:
- MAP平均精度均值Mean Average Precision$MAP = \frac{1}{|U|} \sum_{i=1}^{|U|} hit(i) \times \frac{1}{P_i}$
- MRR平均倒数排名Mean Reciprocal Rank$MRR = \frac{1}{N} \sum_{i=1}^N \frac{1}{p_i}$
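A toy illustration of the two metrics for a single query (hypothetical scores and relevance labels; the real evaluation uses the mteb/C_MTEB implementations):
```python
import numpy as np

def ap_and_rr(scores, is_relevant, mrr_at_k=10):
    """Per-query Average Precision and Reciprocal Rank over one ranked candidate list."""
    order = np.argsort(-np.asarray(scores))  # rank candidates by descending score
    ranked = [is_relevant[i] for i in order]
    # reciprocal rank of the first relevant document within the top k
    rr = next((1.0 / (rank + 1) for rank, rel in enumerate(ranked[:mrr_at_k]) if rel), 0.0)
    # mean of precision@rank taken at each relevant document
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked):
        if rel:
            hits += 1
            precisions.append(hits / (rank + 1))
    ap = sum(precisions) / max(sum(is_relevant), 1)
    return ap, rr

print(ap_and_rr([7.52, 3.10, 1.36], [True, False, True]))  # (0.8333..., 1.0)
```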
Dataset-free validation feeds the `[[query, positive], [query, negative]]` text pairs; NPU and GPU outputs agree within a 1% tolerance (`torch.allclose`).

| Env | Chip | Output |
|-----|-------------|--------------------------|
| NPU | Ascend310P3 | tensor([7.5195, 1.3613]) |
| GPU | NVIDIA A10 | tensor([7.5152, 1.3654]) |

| Env | Chip | Output |
|-----|-------------|--------------------------|
| NPU | Ascend910B4 | tensor([7.5195, 1.3779]) |
| GPU | NVIDIA L40S | tensor([7.5140, 1.3697]) |
View File

@@ -0,0 +1,8 @@
{
"default_path": {
"tokenizer_path": "models/pytorch",
"pytorch_model_path": "models/pytorch",
"onnx_model_path": "models/onnx",
"om_model_path": "models/om/bge-reranker-large_Ascend910B4_allow_mix_precision_linux_aarch64.om"
}
}
View File

@@ -0,0 +1,39 @@
#!/bin/bash
# Define the model input and output directories
onnx_directory="$1"
om_directory="$2"
soc_version=$(python -c "import torch;import torch_npu;print(torch.npu.get_device_name())")
# Check whether a precision mode argument was supplied
if [ -z "$3" ]; then
precision_mode=mixed_float16
else
precision_mode="$3"
fi
# Check that the ONNX model exists
if [ -f "$onnx_directory/model.onnx" ]; then
echo "ONNX model found at $onnx_directory/model.onnx"
else
echo "Error: Unable to find ONNX model."
exit 1
fi
# Convert/optimize the ONNX model with the ATC tool
atc --model="$onnx_directory/model.onnx" \
--framework=5 \
--output="$om_directory/bge-reranker-large_${soc_version}_${precision_mode}" \
--soc_version="$soc_version" \
--input_shape="input_ids:-1,-1;attention_mask:-1,-1" \
--precision_mode_v2="$precision_mode" \
--modify_mixlist="$om_directory/ops_info.json"
# Check whether the ATC command succeeded
# shellcheck disable=SC2181
if [ $? -eq 0 ]; then
echo "Model conversion with ATC successful."
else
echo "Error: Failed to convert model with ATC."
exit 1
fi
View File

@@ -0,0 +1,299 @@
# Copyright Huawei Technologies Co., Ltd. 2024-2024. All rights reserved.
import argparse
import json
import logging
import os
import time
from typing import Any, List, Union, Tuple
import datasets
import numpy as np
import torch
import transformers.tokenization_utils_base
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification
from tqdm import tqdm as progressbar
from atb_llm.utils.file_utils import safe_open
def get_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description='Evaluate LLM.')
parser.add_argument(
'--model_type_or_path',
type=str,
required=True,
help='Specify a model type to load the default model, or a path to the directory containing the model file.'
)
parser.add_argument(
'--input_shape',
type=str,
required=True,
help='Shape of input tensors.'
)
parser.add_argument(
'--device',
type=int,
default=6,
choices=list(range(8)),
help='Adapt model on device id x.'
)
parser.add_argument(
'--loop',
type=int,
default=50,
help='Evaluation loops.'
)
return parser.parse_args()
class Model:
def __init__(self, tokenizer_path: str, device_id: int) -> None:
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
self.device, self.runtime = self.init_runtime(device_id)
def init_runtime(self, device_id: int) -> Tuple[Union[str, int], Any]:
if self.__class__.__name__.startswith(('PyTorch', 'ONNX')):
try:
import torch_npu
except ImportError:
device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
else:
device = 'npu:{}'.format(device_id)
torch_npu.npu.set_device(device_id)
torch.npu.set_compile_mode(jit_compile=False)
return device, 0
elif self.__class__.__name__.startswith('OM'):
from ais_bench.infer.interface import InferSession
return device_id, InferSession
else:
raise RuntimeError
def tokenize(
self,
sentences_batch: List[List[str]],
seq_len: int
) -> transformers.tokenization_utils_base.BatchEncoding:
encoded_inputs = self.tokenizer(
sentences_batch,
padding='max_length',
truncation=True,
return_tensors='pt',
max_length=seq_len
).to(self.device)
return encoded_inputs
def encode(self, encoded_inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
# Compute token embedding time
computing_time = self._encode_batched(encoded_inputs)
return computing_time
def compute_scores(self, pairs: List[List[str]], batch_size: int, seq_len: int, loop: int) -> dict:
# Tokenize sentences
encoded_inputs = self.tokenize(pairs, seq_len)
all_computing_time = []
for _ in progressbar(range(loop), 'Evaluating...'):
computing_time = self.encode(encoded_inputs)
all_computing_time.append(computing_time)
try:
throughput = 1000 * batch_size / np.mean(all_computing_time)
except ZeroDivisionError as e:
raise RuntimeError('{} because no evaluation results'.format(e)) from e
scores = {
'compute_time': {
'min': np.min(all_computing_time),
'max': np.max(all_computing_time),
'mean': np.mean(all_computing_time),
'median': np.median(all_computing_time),
'percentile(99%)': np.percentile(all_computing_time, 99)
},
'throughput': throughput
}
return scores
def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
""" Returns a list of embeddings for the given sentences.
Args:
inputs (`BatchEncoding`): List of sentences to encode
Returns:
`float`: Computing time of embeddings for the given sentences
"""
# Workaround for Huawei Python coding guideline G.CLS.07: a method that does not
# access the instance should be a staticmethod or classmethod
_ = self
return 0.0
class PyTorchModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, device_id: int):
super(PyTorchModel, self).__init__(tokenizer_path, device_id)
self.model = AutoModelForSequenceClassification.from_pretrained(
model_path,
local_files_only=True,
trust_remote_code=True
).half().to(self.device)
self.model.eval()
def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
tick = time.time()
with torch.no_grad():
self.model(**inputs, return_dict=True).logits.view(-1, ).float().cpu()
tock = time.time()
return 1000 * (tock - tick)
class ONNXModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, device_id: int):
super(ONNXModel, self).__init__(tokenizer_path, device_id)
self.ort = ORTModelForSequenceClassification.from_pretrained(model_path).to(self.device)
def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
tick = time.time()
with torch.inference_mode():
self.ort(**inputs, return_dict=True).logits.view(-1, ).float().cpu()
tock = time.time()
return 1000 * (tock - tick)
class OMModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, device_id: int) -> None:
super(OMModel, self).__init__(tokenizer_path, device_id)
self.session = self.runtime(device_id, model_path)
def _encode_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> float:
input_ids = inputs.data['input_ids'].numpy().astype(np.int64)
attention_mask = inputs.data['attention_mask'].numpy().astype(np.int64)
tick = time.time()
self.session.infer(feeds=[input_ids, attention_mask], mode='dymshape', custom_sizes=10000000)
tock = time.time()
return 1000 * (tock - tick)
class PerformanceEvaluator:
def __init__(self, metadata: dict) -> None:
self.metadata = metadata
# load dataset from HuggingFace hub
self.dataset = datasets.load_dataset(
self.metadata['dataset']['path'].split('.')[-1],
data_files={self.metadata['eval_splits'][0]: self.metadata['dataset']['path']}
)
self.samples = self.dataset[self.metadata['eval_splits'][0]]
def __call__(
self,
model: Model,
input_shape: Union[Tuple, List],
loop: int) -> dict:
"""This is called during training to evaluate the model.
It returns scores.
Args:
model (`Model`): the model to evaluate
input_shape (`Union[Tuple[int, int], List[int, int]]`): shape of input tensors
loop (`int`): evaluation loops
"""
return self.compute_performance(model, input_shape, loop)
def compute_performance(
self,
model: Model,
input_shape: Union[Tuple, List],
loop: int) -> dict:
batch_size, seq_len = input_shape
pairs = []
for sample in self.samples:
query = sample['query']
docs = []
docs.extend(sample['positive'])
docs.extend(sample['negative'])
for doc in docs:
pairs.append([query, doc])
pairs = pairs[:batch_size]
scores = model.compute_scores(pairs, batch_size, seq_len, loop)
return scores
class Evaluation:
def __init__(self, eval_args: argparse.Namespace):
self.input_shape = tuple(map(int, eval_args.input_shape.split(',')))
self.device_id = eval_args.device
self.loop = eval_args.loop
# dataset metadata
self.metadata = {
'name': 'T2RerankingLocal',
'description': 'T2Ranking: A large-scale Chinese Benchmark for Passage Ranking',
'reference': 'https://arxiv.org/abs/2304.03679',
'dataset': {
'path': '.cache/huggingface/datasets/C-MTEB/T2Reranking/dev-00000-of-00001-65d96bde8023d9b9.parquet',
'revision': '76631901a18387f85eaa53e5450019b87ad58ef9',
},
'type': 'Reranking',
'category': 's2p',
'eval_splits': ['test'],
'eval_langs': ['zh'],
'main_score': 'map'
}
# default model path
with safe_open('config.json', 'r', encoding='utf-8') as reader:
text = reader.read()
default_path = json.loads(text)['default_path']
pytorch_model_path = self.tokenizer_path = os.path.abspath(default_path['tokenizer_path'])
onnx_model_path = os.path.abspath(default_path['onnx_model_path'])
om_model_path = os.path.abspath(default_path['om_model_path'])
model_path_map = {'pytorch': pytorch_model_path, 'onnx': onnx_model_path, 'om': om_model_path}
self.model_type = eval_args.model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]
default_model_path = model_path_map.get(self.model_type, 'not exist')
if default_model_path != 'not exist':
self.model_path = (
eval_args.model_type_or_path
if os.path.isdir(eval_args.model_type_or_path) or os.path.isfile(eval_args.model_type_or_path)
else default_model_path
)
else:
raise RuntimeError(
'load model failed because '
'\'{}\' is not a valid model type or path'.format(eval_args.model_type_or_path)
)
def load_model(self) -> Model:
model_map = {'pytorch': PyTorchModel, 'onnx': ONNXModel, 'om': OMModel}
try:
model = model_map[self.model_type](
tokenizer_path=self.tokenizer_path,
model_path=self.model_path,
device_id=self.device_id
)
except KeyError as e:
raise RuntimeError('load {} model failed because {}'.format(self.model_type, e)) from e
return model
def run(self) -> dict:
model = self.load_model()
evaluator = PerformanceEvaluator(self.metadata)
eval_results = evaluator(model, self.input_shape, self.loop)
return eval_results
if __name__ == '__main__':
logger = logging.getLogger(__name__)
logging.basicConfig(format='[%(levelname)s] %(message)s', level=logging.INFO)
args = get_args()
evaluation = Evaluation(args)
results = evaluation.run()
logging.info(results)
View File

@@ -0,0 +1,351 @@
# Copyright Huawei Technologies Co., Ltd. 2024-2024. All rights reserved.
import argparse
import json
import logging
import os
from typing import Any, List, Union, Tuple
import datasets
import numpy as np
import torch
import transformers.tokenization_utils_base
from mteb import MTEB, AbsTaskReranking
from C_MTEB.tasks import ChineseRerankingEvaluator
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification
from tqdm import tqdm as progressbar
from atb_llm.utils.file_utils import safe_open
def get_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description='Evaluate LLM.')
parser.add_argument(
'--model_type_or_path',
type=str,
required=True,
help='Specify a model type to load the default model, or a path to the directory containing the model file.'
)
parser.add_argument(
'--batch_size',
type=int,
default=20,
help='Batch size of dataset for computing.'
)
parser.add_argument(
'--device',
type=int,
default=6,
choices=list(range(8)),
help='Adapt model on device id x.'
)
return parser.parse_args()
# copied from mteb.evaluation.evaluators.utils.cos_sim
def cos_sim(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
"""Computes the cosine similarity cos_sim(a[i], b[j]) for all i and j.
Returns:
Matrix with res[i][j] = cos_sim(a[i], b[j])
"""
if not isinstance(a, torch.Tensor):
a = torch.tensor(a)
if not isinstance(b, torch.Tensor):
b = torch.tensor(b)
if len(a.shape) == 1:
a = a.unsqueeze(0)
if len(b.shape) == 1:
b = b.unsqueeze(0)
a_norm = torch.nn.functional.normalize(a, p=2, dim=1)
b_norm = torch.nn.functional.normalize(b, p=2, dim=1)
# transpose will cause RuntimeError in C_MTEB.tasks.ChineseRerankingEvaluator.compute_metrics_from_biencoder():
# mat1 and mat2 shapes cannot be multiplied
try:
similarity = torch.mm(a_norm, b_norm.transpose(0, 1))
except RuntimeError:
similarity = torch.mm(a_norm, b_norm)
return similarity
class ChineseRerankingEvaluatorTweaked(ChineseRerankingEvaluator):
# copied from mteb.evaluation.evaluators.RerankingEvaluator._compute_metrics_instance with similarity_fct->cos_sim
def _compute_metrics_instance(
self,
query_emb: torch.Tensor,
docs_emb: torch.Tensor,
is_relevant: List[bool]
) -> dict[str, float]:
"""Computes metrics for a single instance = (query, positives, negatives)
Args:
query_emb (`torch.Tensor` of shape `(num_queries, hidden_size)`): Query embedding
if `num_queries` > 0: we take the closest document to any of the queries
docs_emb (`torch.Tensor` of shape `(num_pos+num_neg, hidden_size)`): Candidates documents embeddings
is_relevant (`List[bool]` of length `num_pos+num_neg`): True if the document is relevant
Returns:
scores (`Dict[str, float]`):
- `mrr`: Mean Reciprocal Rank @ `self.mrr_at_k`
- `ap`: Average Precision
"""
pred_scores = cos_sim(query_emb, docs_emb)
if len(pred_scores.shape) > 1:
pred_scores = torch.amax(pred_scores, dim=0)
pred_scores_argsort = torch.argsort(-pred_scores) # Sort in decreasing order
mrr = self.mrr_at_k_score(is_relevant, pred_scores_argsort, self.mrr_at_k)
ap = self.ap_score(is_relevant, pred_scores.cpu().tolist())
return {'mrr': mrr, 'ap': ap}
# copied from C_MTEB.tasks.Reranking.evaluate
def evaluate(self, model_for_eval, split: str = 'test', **kwargs: Any) -> dict[str, float]:
if not self.data_loaded:
self.load_data()
data_split = self.dataset[split]
evaluator = ChineseRerankingEvaluatorTweaked(data_split, **kwargs)
scores = evaluator(model_for_eval)
return dict(scores)
# rewrite
AbsTaskReranking.evaluate = evaluate
# custom task
class T2RerankingLocal(AbsTaskReranking):
# Workaround for Huawei Python coding guideline G.CLS.08: avoid defining instance attributes outside __init__
def __init__(self, **kwargs):
super().__init__(**kwargs)
self.dataset = None
self.data_loaded = None
@property
def description(self) -> dict:
return {
'name': 'T2RerankingLocal',
'description': 'T2Ranking: A large-scale Chinese Benchmark for Passage Ranking',
'reference': "https://arxiv.org/abs/2304.03679",
'dataset': {
'path': '.cache/huggingface/datasets/C-MTEB/T2Reranking/dev-00000-of-00001-65d96bde8023d9b9.parquet',
'revision': '76631901a18387f85eaa53e5450019b87ad58ef9',
},
'type': 'Reranking',
'category': 's2p',
'eval_splits': ['test'],
'eval_langs': ['zh'],
'main_score': 'map',
}
def load_data(self, **kwargs) -> None:
if self.data_loaded:
return
try:
self.dataset = datasets.load_dataset(
'parquet',
data_files={self.description['eval_splits'][0]: self.description['dataset']['path']}
)
except KeyError as e:
raise RuntimeError('load dataset failed because {}'.format(e)) from e
else:
self.data_loaded = True
# custom model
class Model:
def __init__(self, tokenizer_path: str, batch_size: int, device_id: int) -> None:
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
self.device, self.runtime = self.init_runtime(device_id)
self.batch_size = batch_size
def init_runtime(self, device_id: int) -> Tuple[Union[str, int], Any]:
if self.__class__.__name__.startswith(('PyTorch', 'ONNX')):
try:
import torch_npu
except ImportError:
device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
else:
device = 'npu:{}'.format(device_id)
torch_npu.npu.set_device(device_id)
torch.npu.set_compile_mode(jit_compile=False)
return device, 0
elif self.__class__.__name__.startswith('OM'):
from ais_bench.infer.interface import InferSession
return device_id, InferSession
else:
raise RuntimeError
def encode(self, sentences: List[str]) -> torch.Tensor:
""" Returns a list of embeddings for the given sentences.
Args:
sentences (`List[str]`): List of sentences to encode
Returns:
`torch.Tensor`: Tensor of embeddings for the given sentences
"""
all_embeddings = []
for start_index in progressbar(range(0, len(sentences), self.batch_size)):
sentences_batch = sentences[start_index:start_index + self.batch_size]
# Tokenize sentences
encoded_inputs = self.tokenizer(
sentences_batch,
padding='max_length',
truncation=True,
return_tensors='pt',
max_length=512
)
# Compute token embeddings
embeddings = self._encode_or_compute_batched(encoded_inputs)
all_embeddings.extend(embeddings)
if all_embeddings:
if isinstance(all_embeddings, np.ndarray):
all_embeddings = torch.from_numpy(all_embeddings)
else:
all_embeddings = torch.stack(all_embeddings)
else:
all_embeddings = torch.Tensor()
return all_embeddings
def compute_score(self, sentence_pairs: Union[List[List[str]], Tuple[str, str]]) -> List[float]:
""" Returns a list of scores for the given sentence pairs.
Args:
sentence_pairs (`Union[List[List[str]], Tuple[str, str]]`): List of sentences pairs to compute score
Returns:
`List[float]`: List of scores for the given sentence pairs
"""
if not isinstance(sentence_pairs, list):
raise TypeError('type of `sentence_pairs` is not `list`')
if isinstance(sentence_pairs[0], str):
sentence_pairs = [sentence_pairs]
all_scores = []
for start_index in progressbar(range(0, len(sentence_pairs), self.batch_size), 'Computing'):
pairs_batch = sentence_pairs[start_index:start_index + self.batch_size]
# Tokenize sentences
encoded_inputs = self.tokenizer(
pairs_batch,
padding='max_length',
truncation=True,
return_tensors='pt',
max_length=512
).to(self.device)
scores = self._encode_or_compute_batched(encoded_inputs)
all_scores.extend(scores.numpy().tolist())
return all_scores[0] if len(all_scores) == 1 else all_scores
def _encode_or_compute_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> torch.Tensor:
""" Returns a list of embeddings for the given sentences.
Args:
inputs (`BatchEncoding`): List of sentences to encode
Returns:
`torch.Tensor`: Tensor of embeddings for the given sentences
"""
# Workaround for Huawei Python coding guideline G.CLS.07: a method that does not access the instance should be a staticmethod or classmethod
_ = self
return torch.tensor(0)
class PyTorchModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int):
super(PyTorchModel, self).__init__(tokenizer_path, batch_size, device_id)
self.model = AutoModelForSequenceClassification.from_pretrained(
model_path,
local_files_only=True,
trust_remote_code=True
).half().to(self.device)
self.model.eval()
def _encode_or_compute_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> torch.Tensor:
with torch.no_grad():
outputs = self.model(**inputs, return_dict=True).logits.view(-1, ).float().cpu()
return outputs
class ONNXModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int):
super(ONNXModel, self).__init__(tokenizer_path, batch_size, device_id)
self.ort = ORTModelForSequenceClassification.from_pretrained(model_path).to(self.device)
def _encode_or_compute_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> torch.Tensor:
with torch.inference_mode():
outputs = self.ort(**inputs, return_dict=True).logits.view(-1, ).float().cpu()
return outputs
class OMModel(Model):
def __init__(self, tokenizer_path: str, model_path: str, batch_size: int, device_id: int) -> None:
super(OMModel, self).__init__(tokenizer_path, batch_size, device_id)
self.session = self.runtime(device_id, model_path)
def _encode_or_compute_batched(self, inputs: transformers.tokenization_utils_base.BatchEncoding) -> torch.Tensor:
input_ids = inputs.data['input_ids'].numpy().astype(np.int64)
attention_mask = inputs.data['attention_mask'].numpy().astype(np.int64)
session_outputs = self.session.infer(feeds=[input_ids, attention_mask], mode='dymshape', custom_sizes=10000000)
outputs = torch.from_numpy(session_outputs[0][:, 0]).view(-1, ).float()
return outputs
def load_model(model_args: argparse.Namespace) -> Model:
# default model path
with safe_open('config.json', 'r', encoding='utf-8') as reader:
text = reader.read()
default_path = json.loads(text)['default_path']
pytorch_model_path = tokenizer_path = os.path.abspath(default_path['tokenizer_path'])
onnx_model_path = os.path.abspath(default_path['onnx_model_path'])
om_model_path = os.path.abspath(default_path['om_model_path'])
model_path_map = {'pytorch': pytorch_model_path, 'onnx': onnx_model_path, 'om': om_model_path}
model_map = {'pytorch': PyTorchModel, 'onnx': ONNXModel, 'om': OMModel}
model_type = model_args.model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]
default_model_path = model_path_map.get(model_type, 'not exist')
if default_model_path != 'not exist':
model_path = (
model_args.model_type_or_path
if os.path.isdir(model_args.model_type_or_path) or os.path.isfile(model_args.model_type_or_path)
else default_model_path
)
else:
raise RuntimeError(
'load model failed because '
'\'{}\' is not a valid model type or path'.format(model_args.model_type_or_path)
)
try:
model_for_eval = model_map[model_type](
tokenizer_path=tokenizer_path,
model_path=model_path,
batch_size=model_args.batch_size,
device_id=model_args.device
)
except KeyError as e:
raise RuntimeError('load {} model failed because {}'.format(model_type, e)) from e
return model_for_eval
if __name__ == '__main__':
logger = logging.getLogger(__name__)
logging.basicConfig(format='[%(levelname)s] %(message)s', level=logging.INFO)
args = get_args()
model = load_model(args)
task = ['T2RerankingLocal']
evaluation = MTEB(tasks=task, task_langs=['zh'])
results = evaluation.run(model)
logging.info(results)
View File

@@ -0,0 +1,16 @@
{
"black-list": {
"to-remove": [],
"to-add": []
},
"white-list": {
"to-remove": [],
"to-add": [
"Cast",
"FlattenV2",
"LayerNorm",
"GatherShapes",
"GatherV2"
]
}
}
View File

@@ -0,0 +1,39 @@
{
"_name_or_path": "models/pytorch",
"architectures": [
"XLMRobertaForSequenceClassification"
],
"auto_map": {
"AutoConfig": "models/pytorch--configuration_xlm_roberta.XLMRobertaConfig",
"AutoModel": "models/pytorch--modeling_xlm_roberta_ascend.XLMRobertaModel",
"AutoModelForSequenceClassification": "models/pytorch--modeling_xlm_roberta_ascend.XLMRobertaForSequenceClassification"
},
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"classifier_dropout": null,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"id2label": {
"0": "LABEL_0"
},
"initializer_range": 0.02,
"intermediate_size": 4096,
"label2id": {
"LABEL_0": 0
},
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "xlm-roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"output_past": true,
"pad_token_id": 1,
"position_embedding_type": "absolute",
"torch_dtype": "float32",
"transformers_version": "4.30.0",
"type_vocab_size": 1,
"use_cache": true,
"vocab_size": 250002
}
View File

@@ -0,0 +1,170 @@
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" XLM-RoBERTa configuration"""
from collections import OrderedDict
from typing import Mapping
from transformers.configuration_utils import PretrainedConfig
from transformers.onnx import OnnxConfig
from transformers.utils import logging
logger = logging.get_logger(__name__)
XLM_ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"xlm-roberta-base": "https://huggingface.co/xlm-roberta-base/resolve/main/config.json",
"xlm-roberta-large": "https://huggingface.co/xlm-roberta-large/resolve/main/config.json",
"xlm-roberta-large-finetuned-conll02-dutch": (
"https://huggingface.co/xlm-roberta-large-finetuned-conll02-dutch/resolve/main/config.json"
),
"xlm-roberta-large-finetuned-conll02-spanish": (
"https://huggingface.co/xlm-roberta-large-finetuned-conll02-spanish/resolve/main/config.json"
),
"xlm-roberta-large-finetuned-conll03-english": (
"https://huggingface.co/xlm-roberta-large-finetuned-conll03-english/resolve/main/config.json"
),
"xlm-roberta-large-finetuned-conll03-german": (
"https://huggingface.co/xlm-roberta-large-finetuned-conll03-german/resolve/main/config.json"
),
}
class XLMRobertaConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`XLMRobertaModel`] or a [`TFXLMRobertaModel`]. It
is used to instantiate a XLM-RoBERTa model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the XLMRoBERTa
[xlm-roberta-base](https://huggingface.co/xlm-roberta-base) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 30522):
Vocabulary size of the XLM-RoBERTa model. Defines the number of different tokens that can be represented by
the `inputs_ids` passed when calling [`XLMRobertaModel`] or [`TFXLMRobertaModel`].
hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (`int`, *optional*, defaults to 3072):
Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"silu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (`int`, *optional*, defaults to 2):
The vocabulary size of the `token_type_ids` passed when calling [`XLMRobertaModel`] or
[`TFXLMRobertaModel`].
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For
positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
[Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
is_decoder (`bool`, *optional*, defaults to `False`):
Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if `config.is_decoder=True`.
classifier_dropout (`float`, *optional*):
The dropout ratio for the classification head.
Examples:
```python
>>> from transformers import XLMRobertaConfig, XLMRobertaModel
>>> # Initializing a XLM-RoBERTa xlm-roberta-base style configuration
>>> configuration = XLMRobertaConfig()
>>> # Initializing a model (with random weights) from the xlm-roberta-base style configuration
>>> model = XLMRobertaModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "xlm-roberta"
def __init__(
self,
vocab_size=30522,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=2,
initializer_range=0.02,
layer_norm_eps=1e-12,
pad_token_id=1,
bos_token_id=0,
eos_token_id=2,
position_embedding_type="absolute",
use_cache=True,
classifier_dropout=None,
**kwargs,
):
super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.hidden_act = hidden_act
self.intermediate_size = intermediate_size
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.max_position_embeddings = max_position_embeddings
self.type_vocab_size = type_vocab_size
self.initializer_range = initializer_range
self.layer_norm_eps = layer_norm_eps
self.position_embedding_type = position_embedding_type
self.use_cache = use_cache
self.classifier_dropout = classifier_dropout
# Copied from transformers.models.roberta.configuration_roberta.RobertaOnnxConfig with Roberta->XLMRoberta
class XLMRobertaOnnxConfig(OnnxConfig):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task == "multiple-choice":
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
else:
dynamic_axis = {0: "batch", 1: "sequence"}
return OrderedDict(
[
("input_ids", dynamic_axis),
("attention_mask", dynamic_axis),
]
)
View File

@@ -0,0 +1,4 @@
optimum==1.18.0
onnx==1.16.0
onnxruntime==1.17.1
transformers==4.33.0
View File

@@ -0,0 +1,181 @@
# Copyright Huawei Technologies Co., Ltd. 2024-2024. All rights reserved.
import argparse
import logging
import json
import os
import time
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification
from atb_llm.utils.file_utils import safe_open
logger = logging.getLogger(__name__)
logging.basicConfig(format='[%(levelname)s] %(message)s', level=logging.DEBUG)
parser = argparse.ArgumentParser(description='Adapting LLM on Ascend.')
parser.add_argument(
'--model_type_or_path',
type=str,
required=True,
help='Specify a model type to load the default model, or a path to the directory containing the model file.'
)
parser.add_argument(
'--device',
type=int,
default=6,
choices=list(range(8)),
help='Adapt model on device id x.'
)
# Default model path
with safe_open('config.json', 'r', encoding='utf-8') as reader:
text = reader.read()
default_path = json.loads(text)['default_path']
pytorch_model_path = tokenizer_path = os.path.abspath(default_path['tokenizer_path'])
onnx_model_path = os.path.abspath(default_path['onnx_model_path'])
om_model_path = os.path.abspath(default_path['om_model_path'])
# Query and passage we want sentence embeddings for
QUERY = '什么是大熊猫?'
POSITIVE = '大熊猫Ailuropoda melanoleuca属于食肉目熊科的一种哺乳动物体色为黑白两色。是中国特有物种'
NEGATIVE = '比熊犬法语Bichon Frisébichon à poil frisé意指“白色卷毛的玩赏用小狗”是一种小型犬品种'
pairs = [[QUERY, POSITIVE], [QUERY, NEGATIVE]]
logger.info('query and passage for inference: %s', pairs)
# Load local tokenizer
tokenizer = AutoTokenizer.from_pretrained(pytorch_model_path)
# Tokenize sentences
encoded_input = tokenizer(pairs, padding='max_length', return_tensors='pt', max_length=512)
def infer_pytorch(model_path: str, device_id: int) -> None:
# Set device
try:
import torch_npu
except ImportError:
device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
else:
device = 'npu:{}'.format(device_id)
torch_npu.npu.set_device(device_id)
torch.npu.set_compile_mode(jit_compile=False)
# Load model from local
model = AutoModelForSequenceClassification.from_pretrained(
model_path,
local_files_only=True,
trust_remote_code=True
).half().to(device)
model.eval()
encoded_input_to_device = encoded_input.to(device)
# Compute similarity scores
for iters in range(2):
with torch.no_grad():
start_time = time.time()
scores = model(**encoded_input_to_device, return_dict=True).logits.view(-1, ).float()
exec_time = time.time() - start_time
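# 'tsnrhtdd'[start::4] selects the English ordinal suffix (th/st/nd/rd) for iters + 1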
logger.info(
'%s%s inference time: %.2f ms',
iters + 1,
'tsnrhtdd'[(iters + 1) % 5 * ((iters + 1) % 100 ^ 15 > 4 > (iters + 1) % 10)::4],
exec_time * 1000
)
logger.info('scores [positive, negative]: %s', scores.cpu())
# Free resource
if device.startswith('npu'):
try:
torch.npu.empty_cache()
except AttributeError:
pass
elif device.startswith('cuda'):
torch.cuda.empty_cache()
def infer_onnx(model_path: str, device_id: int) -> None:
# Set device
try:
import torch_npu
except ImportError:
device = 'cuda:{}'.format(device_id) if torch.cuda.is_available() else 'cpu'
else:
device = 'npu:{}'.format(device_id)
torch_npu.npu.set_device(device_id)
torch.npu.set_compile_mode(jit_compile=False)
# Load model from local
ort = ORTModelForSequenceClassification.from_pretrained(model_path).to(device)
encoded_input_to_device = encoded_input.to(device)
# Compute similarity scores
for iters in range(2):
with torch.inference_mode():
start_time = time.time()
scores = ort(**encoded_input_to_device, return_dict=True).logits.view(-1, ).float()
exec_time = time.time() - start_time
logger.info(
'%s%s inference time: %.2f ms',
iters + 1,
'tsnrhtdd'[(iters + 1) % 5 * ((iters + 1) % 100 ^ 15 > 4 > (iters + 1) % 10)::4],
exec_time * 1000
)
logger.info('scores [positive, negative]: %s', scores.cpu())
# Free resource
if device.startswith('npu'):
try:
torch.npu.empty_cache()
except AttributeError:
pass
elif device.startswith('cuda'):
torch.cuda.empty_cache()
def infer_om(model_path: str, device_id: int) -> None:
# Tokenize sentences
input_ids = encoded_input.data['input_ids'].numpy().astype(np.int64)
attention_mask = encoded_input.data['attention_mask'].numpy().astype(np.int64)
# Load model from local
from ais_bench.infer.interface import InferSession
session = InferSession(device_id, model_path)
# Compute similarity scores
for iters in range(2):
output = session.infer(feeds=[input_ids, attention_mask], mode='dymshape', custom_sizes=10000000)
scores = torch.from_numpy(output[0][:, 0]).view(-1, ).float()
exec_time = session.summary().exec_time_list[-1]
logger.info(
'%s%s inference time: %.2f ms',
iters + 1,
'tsnrhtdd'[(iters + 1) % 5 * ((iters + 1) % 100 ^ 15 > 4 > (iters + 1) % 10)::4],
exec_time[1] - exec_time[0]
)
logger.info('scores [positive, negative]: %s', scores)
# Free resource
session.free_resource()
def infer(model_type_or_path: str = None, device_id: int = 0) -> None:
model_path_map = {'pytorch': pytorch_model_path, 'onnx': onnx_model_path, 'om': om_model_path}
model_map = {'pytorch': infer_pytorch, 'onnx': infer_onnx, 'om': infer_om}
model_type = model_type_or_path.removesuffix('/').split('.')[-1].split('/')[-1]
default_model_path = model_path_map.get(model_type, 'not exist')
if default_model_path != 'not exist':
model_path = (
model_type_or_path
if os.path.isdir(model_type_or_path) or os.path.isfile(model_type_or_path)
else default_model_path
)
else:
raise RuntimeError(
'load model failed because '
'\'{}\' is not a valid model type or path'.format(model_type_or_path)
)
try:
model_map[model_type](model_path, device_id)
except KeyError as e:
raise RuntimeError('load {} model failed because {}'.format(model_type, e)) from e
if __name__ == '__main__':
args = parser.parse_args()
infer(args.model_type_or_path, args.device)
View File

@@ -0,0 +1,138 @@
# BLOOM
* [BLOOM](https://huggingface.co/bigscience/bloom) (BigScience Large Open-science Open-access Multilingual Language Model)
* This repository implements BLOOM inference models for NPU hardware.
## Feature matrix
- Features supported by each BLOOM model:

| Model & size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quant | W8A16 quant | KV cache quant | Sparse quant | MOE quant | MindIE Service | TGI | Long sequence |
|-------------|----------------------------|-----------------------------|------|------|-----------------|-----------------|------------|-------------|----------------|--------------|-----------|----------------|-----|---------------|
| bloom (176B) | Yes (world size 8) | No | Yes | No | No | Yes | No | Yes | No | No | No | No | No | No |
| bloom-7b1 | Yes (world size 1,2,4,8) | Yes (world size 1,2,4) | Yes | No | No | Yes | No | No | No | No | No | No | No | No |
| bloomz-7b1-mt | Yes (world size 1,2,4,8) | Yes (world size 1,2,4) | Yes | No | No | Yes | No | No | No | No | No | No | No | No |
## Inference usage
### Path variables
| Variable | Meaning |
| ------------- | ------------------------------------------------------------ |
| `working_dir` | directory where the acceleration library and model repo are placed after download |
| `llm_path` | model repo path: `${working_dir}/MindIE-LLM/` when using the prebuilt package, or `${working_dir}/MindIE-LLM/examples/atb_models` when using code downloaded from gitee |
| `script_path` | script path; the working scripts for the BLOOM family live in `{llm_path}/examples/models/bloom` |
| `weight_path` | path to the original HF model weights (`.safetensors` format) |

Weight download links:
* bloom (176b): https://huggingface.co/bigscience/bloom
* bloomz-7b1-mt: https://huggingface.co/bigscience/bloomz-7b1-mt
* bloom-7b1: https://huggingface.co/bigscience/bloom-7b1
> There is no need to download `pytorch_model.bin.index.json` or the `.bin` files.

When loading weights, the framework reads `torch_dtype` from the downloaded `config.json`, so `"torch_dtype": "float16"` must be added to `config.json` manually.
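For example (a sketch; every other field comes from the downloaded `config.json` unchanged, only the `torch_dtype` entry is added):
```json
{
  "architectures": [
    "BloomForCausalLM"
  ],
  "torch_dtype": "float16"
}
```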
### Environment setup
1. Install a CANN 8.0 environment and `source /path/to/cann/set_env.sh`;
2. Use Python 3.9 or newer;
3. Use torch 2.0 or newer together with the matching torch_npu;
4. Install dependencies:
```shell
pip install transformers==4.34.0
pip install accelerate
```
5. Install `atb_llm`:
```shell
cd $llm_path
python setup.py bdist_wheel
python -m pip install dist/*.whl --force-reinstall
```
## BLOOMZ-7B1-MT
### Weight preparation
Download the model weights from Hugging Face (prefer `.safetensors`; `.bin` files need to be converted to `.safetensors`); the weight path is `weight_path`.
### PagedAttention model
Enter the `modeltest` directory:
```shell
cd tests/modeltest
```
Set a few environment variables before testing:
```shell
export HCCL_BUFFSIZE=110
export PYTHONWARNINGS="ignore"
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_USE_TILING_COPY_STREAM=1
export ATB_CONTEXT_WORKSPACE_RING=1
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
```
#### Performance test
> `$weight_path` can be the original HuggingFace weight path or a quantized weight path (same below).
```shell
bash run.sh pa_fp16 performance [[seq_in,seq_out],[seq_in,seq_out]] $batch_size bloom $weight_path $tp
```
For example, with `TP = 8` and `batch_size = 1`:
```shell
bash run.sh pa_fp16 performance [[256,256],[512,512],[1024,1024],[2048,2048]] 1 bloom /path/to/model 8
```
```
#### Downstream-task accuracy test
```shell
bash run.sh pa_fp16 full_CEval $n_shot $batch_size bloom $weight_path $tp
```
For example, with `TP = 8`, `batch_size = 1`, `CEval 5-shot`:
```shell
bash run.sh pa_fp16 full_CEval 5 1 bloom /path/to/model 8
```
For more detailed configuration options see `examples/atb_models/tests/modeltest/README.md`.
## BLOOM-7B1
### PagedAttention model
Tested the same way as the BLOOMZ-7B1-MT PagedAttention model.
## BLOOM-176B
### Weight preparation
Because of its large weights (about 328 GB), BLOOM-176B only supports TP8 W8A16 inference on 800I A2 machines, so the original HuggingFace weights must first be quantized:
```shell
# source the CANN environment
source /path/to/cann/set_env.sh
# enter the model repo path (see *Path variables* / llm_path above)
cd $llm_path
# {float_weight_path} is the original weight path downloaded from HuggingFace
python examples/models/bloom/convert_quant_weights.py --model_path {float_weight_path} --save_directory {w8a16_weight_path} --w_bit 8 --a_bit 16 --act_method 3 --calib_file ""
```
### PagedAttention model
Tested the same way as the BLOOMZ-7B1-MT PagedAttention model; just pass `{w8a16_weight_path}` as `$weight_path`.
View File

@@ -0,0 +1,76 @@
# Copyright Huawei Technologies Co., Ltd. 2024. All rights reserved.
import os
from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import QuantConfig
from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig
from transformers import BloomConfig
from examples.convert.model_slim.get_calibration_dataset import load_jsonl
from examples.convert.model_slim.quantifier import parse_arguments, Quantifier
from examples.convert.convert_utils import copy_tokenizer_files, modify_config
if __name__ == "__main__":
args = parse_arguments()
rank = int(os.getenv("RANK", "0"))
config = BloomConfig.from_pretrained(args.model_path)
disable_names = []
if args.a_bit != 16:
# W8A16 and W4A16 need no rollback layers; rollback layers are only configured for other modes
num_layers = config.num_hidden_layers
disable_names = [f"model.layers.{layer}.mlp.down_proj" for layer in range(num_layers)]
disable_names.append("lm_head")
anti_outlier_config = None
if args.anti_method:
anti_outlier_config = AntiOutlierConfig(anti_method=args.anti_method)
quant_config = QuantConfig(
a_bit=args.a_bit,
w_bit=args.w_bit,
disable_names=disable_names,
act_method=args.act_method,
w_sym=args.w_sym,
mm_tensor=False,
dev_type=args.device_type,
dev_id=rank,
pr=1.0,
fraction=args.fraction,
co_sparse=args.co_sparse,
do_smooth=args.do_smooth,
use_sigma=args.use_sigma,
sigma_factor=args.sigma_factor,
is_lowbit=args.is_lowbit,
use_kvcache_quant=args.use_kvcache_quant,
)
# No calibration dataset by default
calibration_dataset = None
# If calib_file is given, use it as the calibration dataset
if args.calib_file:
calibration_dataset = load_jsonl(args.calib_file)
quant_weight_generator = Quantifier(args.model_path, quant_config, anti_outlier_config, args.device_type)
quant_weight_generator.tokenizer.pad_token_id = 0
tokenized_data = None
if calibration_dataset is not None:
tokenized_data = quant_weight_generator.get_tokenized_data(calibration_dataset)
quant_weight_generator.convert(tokenized_data, args.save_directory, args.disable_level)
# The sparse-quantization tool expects w_bit=4, a_bit=8; temporarily adjust quant_type to match
quant_type = f"w{args.w_bit}a{args.a_bit}" + ("s" if (args.co_sparse or args.is_lowbit) else "")
is_sparse_compress = args.w_bit == 4 and args.a_bit == 8 and (args.co_sparse or args.is_lowbit)
if is_sparse_compress:
quant_type = "w8a8s"
modify_config(
args.model_path, args.save_directory, config.torch_dtype,
quant_type,
args.use_kvcache_quant
)
copy_tokenizer_files(args.model_path, args.save_directory)
View File

@@ -0,0 +1,37 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
# See the README in this directory for parameter settings and launch instructions
export MAX_MEMORY_GB=29
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MASTER_PORT=20030
# The following environment variables relate to performance and memory tuning; normally no change is needed
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export HCCL_BUFFSIZE=120
export HCCL_WHITELIST_DISABLE=1
export ATB_CONTEXT_WORKSPACE_RING=1
export ATB_CONTEXT_WORKSPACE_SIZE=2629145600
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
export ATB_LAUNCH_KERNEL_WITH_TILING=0
export ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=1
export ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=0
# Workaround for num_blocks < 0 / free_memory < 0
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export RESERVED_MEMORY_GB=0
export ATB_CONTEXT_WORKSPACE_SIZE=0
extra_param=""
world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
if [ "$TP_WORLD_SIZE" == "1" ]; then
python -m examples.run_fa --model_path $1 $extra_param
else
torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_pa --model_path $1 $extra_param
fi
# --input_text "Common sense questions and answers\n\nQuestion: Why do we need to learn a new language\nFactual answer:" --max_output_length 32
View File

@@ -0,0 +1,36 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
# See the README in this directory for parameter settings and launch instructions
export IS_QUANT=0
export MAX_MEMORY_GB=29
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export world_size=8
export MASTER_PORT=20030
export IS_BF16=false
# The following environment variables relate to performance and memory tuning; normally no change is needed
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export RESERVED_MEMORY_GB=0
export ATB_CONTEXT_WORKSPACE_SIZE=0
export INT8_FORMAT_NZ_ENABLE=1
extra_param=""
world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
if [ "$IS_BF16" = true ]; then
extra_param="${extra_param} --is_bf16"
fi
if [ "$TP_WORLD_SIZE" == "1" ]; then
python -m examples.run_pa --model_path $1 $extra_param
else
torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_pa --model_path $1 $extra_param
fi
View File

@@ -0,0 +1,231 @@
# ChatGLM2-6B Model Inference Guide <!-- omit in toc -->
# Overview
- [ChatGLM2-6B](https://github.com/THUDM/ChatGLM2-6B/) is the second-generation version of the open-source bilingual (Chinese-English) chat model [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B). It retains the smooth conversation flow and low deployment threshold of the first generation while offering stronger performance, longer context, more efficient inference, and a more open license.
- This repository implements a ChatGLM2 inference model for NPU hardware, designed to deliver peak inference performance on NPU together with the acceleration library.
# Feature matrix
- Features supported by the ChatGLM2-6B model:

| Model & size | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8 quant | W8A16 quant | W4A16 quant | KV cache quant | Sparse quant (300I DUO only) | MOE | MindIE | TGI | Long sequence |
|-------------|-------------------------|-------------------------|------|------|-----------------|-----------------|------------|-------------|-------------|----------------|------------------------------|-----|--------|-----|---------------|
| ChatGLM2-6B | Yes (world size 1,2,4,8) | Yes (world size 1,2,4) | Yes | No | No | Yes | Yes | No | No | No | Yes | No | Yes | Yes | No |
- Model version adapted by this repo:
- [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b/tree/main)
# Usage
## Path variables
| Variable | Meaning |
|--------|--------------------------------------------------|
| working_dir | directory where the acceleration library and model repo are placed after download |
| llm_path | model repo path: `${working_dir}/MindIE-LLM/` when using the prebuilt package, or `${working_dir}/MindIE-LLM/examples/atb_models` when using code downloaded from gitee |
| script_path | script path: ${llm_path}/examples/models/chatglm/v2_6b |
| weight_path | model weight path |
## Weight conversion
- See [this README](../../../README.md)
## Exporting quantized weights
Quantized weights can be produced with msmodelslim (the Ascend compression and acceleration tool).
### Environment setup
For environment configuration see the msmodelslim documentation: https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/devtools/auxiliarydevtool/modelslim_0002.html
### Exporting W8A8 quantized weights
Export the model's quantized weights with `${llm_path}/examples/models/chatglm/v2_6b/quant_chatglm_w8a8.py` (do not place the quantized weights in the same directory as the floating-point weights):
```shell
# This thread count must be set
export OMP_NUM_THREADS=48
python quant_chatglm_w8a8.py --model_path {float_weight_path} --save_path {quant_weight_save_path} --dataset_path {calib_dataset_path}
```
Download the calibration dataset from [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/e84444333b6d434ea7b0/); after unpacking, use `CEval/val/Other/civil_servant.jsonl` from the extracted directory as the calibration set.
The export should produce two files: `quant_model_weight_w8a8.safetensors` and `quant_model_description_w8a8.json`.
### Exporting W4A16 quantized weights
Export the model's quantized weights with `${llm_path}/examples/models/chatglm/v2_6b/quant_chatglm_w4a16.py` (do not place the quantized weights in the same directory as the floating-point weights):
```shell
python quant_chatglm_w4a16.py --model_path {float_weight_path} --save_path {quant_weight_save_path} --dataset_path {calib_dataset_path}
```
Download the calibration dataset from [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/e84444333b6d434ea7b0/); after unpacking, use `CEval/val/Social_Science/teacher_qualification.jsonl` from the extracted directory as the calibration set.
The export should produce two files: `quant_model_weight_w4a16.safetensors` and `quant_model_description_w4a16.json`.
Notes:
1. quant_chatglm_w8a8.py and quant_chatglm_w4a16.py come preconfigured with well-tuned quantization strategies; use them as-is when exporting, or switch to another strategy.
2. When generating quantized weights, the scripts add (or modify) the `quantize` field in the config.json under the generated weight path; only `w8a8` and `w4a16` are currently supported.
3. After the steps above, running the quantized model only requires switching the weight path.
4. If weight generation hits `OpenBLAS Warning: Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP = 1 option`, set `export OMP_NUM_THREADS=1` to disable multithreading as a workaround.
### Exporting sparse quantized weights
Run generate_sparse.sh to export the sparse quantized weights (do not place them in the same directory as the floating-point weights):
```shell
bash generate_sparse.sh {float_weight_path} {sparse_weight_save_path} ${llm_path}/examples/models/chatglm/v2_6b/calib_data.jsonl {tp_size}
```
Afterwards a compress directory is generated under `{sparse_weight_save_path}`; use `{sparse_weight_save_path}/compress` as the weight directory for inference.
Notes:
1. generate_sparse.sh comes preconfigured with a well-tuned quantization strategy; use it as-is when exporting, or switch to another strategy.
2. After the steps above, running the quantized model only requires switching the weight path to `{sparse_weight_save_path}/compress`.
3. When generating sparse quantized weights on NPU (i.e. --device_type npu), remember to comment out the @torch.jit.script decorator at line 168 of {float_weight_path}/modeling_chatglm.py.
## Running on 300I DUO
- The CPU Performance mode can be enabled to improve inference performance:
```
cpupower frequency-set -g performance
```
### Chat test
- Run the launch script
- Execute the following in the \${llm_path} directory
```shell
bash ${script_path}/run_300i_duo.sh ${weight_path}
```
- Environment variable notes
- `export BIND_CPU=1`
- Toggle for binding CPU cores
- Core binding is enabled by default
- If the machine has no NUMA configuration or binding fails, set BIND_CPU to 0
- `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3`
- Logical NPU cores visible on the current machine, comma-separated
- See the launch-script environment variables section of [this README](../../../README.md) for how to look up core IDs
- `export TP_WORLD_SIZE=2`
- Tensor-parallel size (world size) at runtime
- Defaults to one card / two chips
- See the feature matrix for the TP sizes each model supports
- For "one card, two chips", set `TP_WORLD_SIZE` to `2`
- `export MASTER_PORT=20030`
- Port for inter-card communication
- Defaults to port 20030
- Avoids communication conflicts when several multi-card models run on the same machine at once
- Recommended port range: 20000-20050
- `export PYTHONPATH=${llm_path}:$PYTHONPATH`
- Adds the model repo path to Python's module and package search path
- Replace ${llm_path} with the actual path
- `export INT8_FORMAT_NZ_ENABLE=1`
- Enable for quantized serving scenarios
- The following environment variables relate to performance and memory tuning; normally no change is needed
```shell
# memory
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
# performance
export HCCL_OP_BASE_FFTS_MODE_ENABLE=TRUE
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export HCCL_BUFFSIZE=110
```
## Running on 800I A2
- The CPU Performance mode can be enabled to improve inference performance:
```
cpupower frequency-set -g performance
```
### Chat test
**Running Paged Attention FP16**
- Run the launch script
- Execute the following in the \${llm_path} directory
```shell
bash ${script_path}/run_800i_a2_pa.sh ${weight_path}
```
- Environment variable notes
- `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
- Logical NPU cores visible on the current machine, comma-separated
- See the launch-script environment variables section of [this README](../../../README.md) for how to look up core IDs
- `export TP_WORLD_SIZE=1`
- Tensor-parallel size (world size) at runtime
- Defaults to a single card
- See the feature matrix for the TP sizes each model supports
- `export MASTER_PORT=20030`
- Port for inter-card communication
- Defaults to port 20030
- Avoids communication conflicts when several multi-card models run on the same machine at once
- Recommended port range: 20000-20050
- `export PYTHONPATH=${llm_path}:$PYTHONPATH`
- Adds the model repo path to Python's module and package search path
- Replace ${llm_path} with the actual path
- `export IS_BF16=false`
- Whether to run inference in BF16 precision
- Defaults to FP16
- The following environment variables relate to performance and memory tuning; normally no change is needed
```shell
# memory
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
# performance
export HCCL_OP_BASE_FFTS_MODE_ENABLE=TRUE
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export LCCL_ENABLE_FALLBACK=1
```
**Running Paged Attention BF16**
- Not supported yet
**Running Paged Attention W8A8 quantization**
- Run the launch script
- Same launch procedure as "Running Paged Attention FP16"
- `${weight_path}` is the path to the W8A8 quantized weights
- Environment variable notes
- See the notes under "Running Paged Attention FP16"
- Unlike FP16, quantized runs require the `quantize` field in the W8A8 weights' `${weight_path}/config.json` to be set to `w8a8`
- Add the field to config.json if it is missing
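For example, the quantized weights' `config.json` would then contain (a sketch; all other fields stay unchanged):
```json
{
  "model_type": "chatglm",
  "torch_dtype": "float16",
  "quantize": "w8a8"
}
```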
**Running KV cache quantization**
- Not supported yet
**Running Paged Attention sparse quantization**
- Run the launch script
- Same launch procedure as "Running Paged Attention FP16"
- `${weight_path}` is the path to the sparse quantized weights
- Environment variable notes
- See the notes under "Running Paged Attention FP16"
- Unlike FP16, sparse quantized runs require the `quantize` field in the sparse weights' `${weight_path}/config.json` to be set to `w8a8sc`
- Add the field to config.json if it is missing
- Note: the compression algorithm is strongly hardware-dependent; currently only 300I DUO cards support sparse quantization
## Accuracy test
- See [this README](../../../../tests/modeltest/README.md)
## Performance test
- See [this README](../../../../tests/modeltest/README.md)
## Web interaction
- Bring up the MindIE Service backend
- Bring up the web backend
```shell
# install dependencies
pip install -r web_requirements.txt
# download the GitHub repository
git clone https://github.com/THUDM/ChatGLM2-6B.git
cd ChatGLM2-6B
git reset --hard 921d7e9adc69020a19169d1ba4f76c2675a2dd29
# apply the adaptation patch
git apply ../web_demo.patch
cd ..
python3 ChatGLM2-6B/web_demo.py --model_path ${weight_path}
```
- Access from a browser using the IP and port shown by the backend
## FAQ
- If `import torch_npu` fails with `xxx/libgomp.so.1: cannot allocate memory in static TLS block`, configure `LD_PRELOAD` to work around it.
- Example: `export LD_PRELOAD=/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD`
View File

@@ -0,0 +1,15 @@
{"id": 0, "inputs_pretokenized": "编写中小学教科书的直接依据是____。\nA. 《中华人民共和国教育法》\nB. 课程计划\nC. 课程标准\nD. 课程表", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 1, "inputs_pretokenized": "下列关于课程的三种文本表现形式说法正确的是____\nA. 课程计划是由当地教育主管部门制订的\nB. 课程标准是依据课程计划制定的\nC. 课程标准的核心是实施建议\nD. 教材编写的基本方式有直线式、螺旋式、交叉式", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 2, "inputs_pretokenized": "悦悦是一名右耳失聪的残疾儿童活动课上有时会听不清楚周老师所讲的内容因此经常提问题。对此周老师应当采取的措施是____。\nA. 给予悦悦更多的帮助和指导\nB. 指导家长带悦悦回家自学\nC. 建议家长将悦悦转到特殊幼儿园\nD. 照顾大多数幼儿,不理会悦悦", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 3, "inputs_pretokenized": "内流河也称“内陆河”是指没有流入海洋的河流大多分布在大陆内部干燥地区上游降水或冰雪融水为其主要补给水源最终消失于沙漠或注入内陆湖泊。下列中国内流河中最长的是____。\nA. 塔里木河\nB. 柴达木河\nC. 尼雅河\nD. 疏勒河", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 4, "inputs_pretokenized": "学校规定学生不能烫染头发但是小文为了彰显个性在假期把头发染成了棕色。面对小文的情况教师应该怎样处理____\nA. 年轻人追求个性是合情合理的,应该宽容对待\nB. 违反学校的校规,应该严格处分\nC. 强制要求小文将头发颜色染回来才可以进校门\nD. 探明小文违反校规的原因,并对其进行劝导和教育", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 5, "inputs_pretokenized": "张老师根据自己班级的情况为解决班级内部班干部的人际关系问题建立和谐融洽的班级氛围自主开发了“和谐人际”的班级课程这体现了教师____。\nA. 是教育教学的研究者\nB. 是课程的建设者和开发者\nC. 是学生学习的促进者\nD. 是社区型的开放教师", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 6, "inputs_pretokenized": "刘老师工作很负责学生在学校出现一点问题他就会与家长联系在与家长沟通时他经常以前辈的姿态对待家长对家长的教育方式指指点点。刘老师的做法____。\nA. 正确,老师就应该与家长经常沟通\nB. 正确,老师的经验比家长丰富,应该多指导家长\nC. 不正确,教师没有权利指导家长\nD. 不正确,教师应该与家长建立平等的沟通关系,尊重家长的人格", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 7, "inputs_pretokenized": "在古代印度有一户人家经营一家棉布店销售自己手工制作的衣服。你认为这户人家属于哪个等级____\nA. 婆罗门\nB. 刹帝利\nC. 吠舍\nD. 首陀罗", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 8, "inputs_pretokenized": "“小型分散便于开展多种多样的活动满足学生不同的兴趣、爱好发展学生的才能使学生得到更多的学习和锻炼的机会。”这种课外活动的形式是____。\nA. 科技活动\nB. 学科活动\nC. 个人活动\nD. 小组活动", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
{"id": 9, "inputs_pretokenized": "小红每天晚上临睡前都要多次反复检查自己的书包确保带齐了第二天需要用的教材和文具。她明知道没有这个必要但就是控制不住。她可能出现了____。\nA. 抑郁症\nB. 焦虑症\nC. 强迫症\nD. 恐惧症", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 10, "inputs_pretokenized": "国家管理和评价课程的基础是____。\nA. 课程计划\nB. 课程标准\nC. 教学目标\nD. 教育目的", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 11, "inputs_pretokenized": "儿童坚持性发生明显质变的年龄约在____\nA. 34岁\nB. 45岁\nC. 56岁\nD. 6岁以后", "choices_pretokenized": [" A", " B", " C", " D"], "label": 1, "targets_pretokenized": ["B"]}
{"id": 12, "inputs_pretokenized": "《红楼梦》中人物众多、关系繁杂。为了帮助读者阅读许多红学爱好者都在网络上发布了自己整理制作的主要人物关系图。这属于____。\nA. 纲要策略\nB. 精细加工策略\nC. 资源管理策略\nD. 监控策略", "choices_pretokenized": [" A", " B", " C", " D"], "label": 0, "targets_pretokenized": ["A"]}
{"id": 13, "inputs_pretokenized": "学期结束时班主任王老师会对学生思想品德的发展变化情况进行评价。这项工作属于____。\nA. 工作总结\nB. 工作计划\nC. 操行评定\nD. 建立学生档案", "choices_pretokenized": [" A", " B", " C", " D"], "label": 2, "targets_pretokenized": ["C"]}
{"id": 14, "inputs_pretokenized": "人们常说“教学有法而教无定法。”这反映了教师的劳动具有____。\nA. 连续性\nB. 示范性\nC. 长期性\nD. 创造性", "choices_pretokenized": [" A", " B", " C", " D"], "label": 3, "targets_pretokenized": ["D"]}
View File

@@ -0,0 +1,17 @@
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:False
disable_names="transformer.encoder.layers.0.mlp.dense_4h_to_h transformer.encoder.layers.1.self_attention.query_key_value transformer.encoder.layers.1.self_attention.dense transformer.encoder.layers.1.mlp.dense_h_to_4h transformer.encoder.layers.1.mlp.dense_4h_to_h transformer.encoder.layers.2.self_attention.query_key_value transformer.encoder.layers.2.self_attention.dense transformer.encoder.layers.2.mlp.dense_h_to_4h transformer.encoder.layers.2.mlp.dense_4h_to_h transformer.encoder.layers.3.self_attention.query_key_value transformer.encoder.layers.3.self_attention.dense transformer.encoder.layers.4.self_attention.query_key_value transformer.encoder.layers.4.self_attention.dense transformer.encoder.layers.5.self_attention.query_key_value transformer.encoder.layers.5.self_attention.dense transformer.encoder.layers.6.self_attention.query_key_value transformer.encoder.layers.6.self_attention.dense transformer.encoder.layers.7.self_attention.query_key_value transformer.encoder.layers.7.self_attention.dense transformer.encoder.layers.8.self_attention.query_key_value transformer.encoder.layers.8.self_attention.dense transformer.encoder.layers.9.self_attention.query_key_value transformer.encoder.layers.9.self_attention.dense transformer.encoder.layers.11.self_attention.query_key_value transformer.encoder.layers.11.self_attention.dense transformer.encoder.layers.14.self_attention.query_key_value transformer.encoder.layers.14.self_attention.dense transformer.encoder.layers.19.self_attention.query_key_value transformer.encoder.layers.19.self_attention.dense transformer.encoder.layers.20.mlp.dense_4h_to_h transformer.encoder.layers.27.mlp.dense_4h_to_h transformer.output_layer"
weight_path=$1
w8a8s_weight_path=$2
w8a8sc_weight_path=${w8a8s_weight_path}/compress
calib_data=$3
tp_size=$4
cd ${ATB_SPEED_HOME_PATH}
python -m examples.convert.model_slim.quantifier --model_path ${weight_path} --save_directory ${w8a8s_weight_path} --calib_file ${calib_data} --disable_names ${disable_names} --device_type npu --is_lowbit True --w_bit 4 --a_bit 8
torchrun --nproc_per_node $tp_size -m examples.convert.model_slim.sparse_compressor --model_path ${w8a8s_weight_path} --save_directory ${w8a8sc_weight_path}
cp $weight_path/modeling_chatglm.py $w8a8sc_weight_path/

View File

@ -0,0 +1,50 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig, AntiOutlier
from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
from examples.models.chatglm.v2_6b.quant_utils \
    import parse_args, get_model_and_tokenizer, get_calib_dataset, copy_config_files, read_dataset

disable_names = [
    'transformer.encoder.layers.0.mlp.dense_4h_to_h',
    'transformer.encoder.layers.1.mlp.dense_4h_to_h',
    'transformer.encoder.layers.2.self_attention.query_key_value',
    'transformer.encoder.layers.2.mlp.dense_4h_to_h',
    'transformer.output_layer'
]


def main():
    args = parse_args()
    fp16_path = args.model_path  # 原始浮点模型路径
    model, tokenizer = get_model_and_tokenizer(fp16_path)
    calib_set = read_dataset(args.dataset_path)
    dataset_calib = get_calib_dataset(tokenizer, calib_set[:1])
    w_sym = True
    anti_config = AntiOutlierConfig(a_bit=16, w_bit=4, anti_method="m3", dev_type="cpu", w_sym=w_sym)
    anti_outlier = AntiOutlier(model, calib_data=dataset_calib, cfg=anti_config)
    anti_outlier.process()
    quant_config = QuantConfig(
        a_bit=16,
        w_bit=4,
        disable_names=disable_names,
        dev_type='cpu',
        w_sym=w_sym,
        mm_tensor=False,
        is_lowbit=True,
        open_outlier=False,
        group_size=args.group_size
    )
    calibrator = Calibrator(model, quant_config, calib_data=[], disable_level='L0')
    calibrator.run()  # 执行PTQ量化校准
    calibrator.save(args.save_path, save_type=["safe_tensor"])  # "safe_tensor"对应safetensors格式权重
    copy_config_files(fp16_path, args.save_path, 'w4a16')


if __name__ == '__main__':
    main()

View File

@ -0,0 +1,59 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
from examples.models.chatglm.v2_6b.quant_utils \
    import parse_args, get_model_and_tokenizer, get_calib_dataset, copy_config_files, read_dataset

disable_names = [
    'transformer.encoder.layers.0.self_attention.query_key_value',
    'transformer.encoder.layers.0.mlp.dense_4h_to_h',
    'transformer.encoder.layers.1.self_attention.query_key_value',
    'transformer.encoder.layers.1.mlp.dense_h_to_4h',
    'transformer.encoder.layers.1.mlp.dense_4h_to_h',
    'transformer.encoder.layers.2.self_attention.query_key_value',
    'transformer.encoder.layers.2.mlp.dense_h_to_4h',
    'transformer.encoder.layers.2.mlp.dense_4h_to_h',
    'transformer.encoder.layers.3.self_attention.query_key_value',
    'transformer.encoder.layers.4.self_attention.query_key_value',
    'transformer.encoder.layers.5.self_attention.query_key_value',
    'transformer.encoder.layers.6.self_attention.query_key_value',
    'transformer.encoder.layers.7.self_attention.query_key_value',
    'transformer.encoder.layers.8.self_attention.query_key_value',
    'transformer.encoder.layers.9.self_attention.query_key_value',
    'transformer.encoder.layers.11.self_attention.query_key_value',
    'transformer.encoder.layers.14.self_attention.query_key_value',
    'transformer.encoder.layers.19.self_attention.query_key_value',
    'transformer.encoder.layers.20.mlp.dense_4h_to_h',
    'transformer.encoder.layers.27.mlp.dense_4h_to_h',
    'transformer.output_layer'
]

quant_config = QuantConfig(
    a_bit=8,
    w_bit=8,
    disable_names=disable_names,
    dev_type='cpu',
    act_method=1,
    pr=1.0,
    w_sym=True,
    mm_tensor=False
)


def main():
    args = parse_args()
    fp16_path = args.model_path  # 原始浮点模型路径
    model, tokenizer = get_model_and_tokenizer(fp16_path)
    calib_set = read_dataset(args.dataset_path)
    dataset_calib = get_calib_dataset(tokenizer, calib_set)
    calibrator = Calibrator(model, quant_config, calib_data=dataset_calib, disable_level='L0')
    calibrator.run()  # 执行PTQ量化校准
    calibrator.save(args.save_path, save_type=["safe_tensor"])  # "safe_tensor"对应safetensors格式权重
    copy_config_files(fp16_path, args.save_path, 'w8a8')


if __name__ == '__main__':
    main()

View File

@ -0,0 +1,62 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
import os
import json
import shutil
import argparse
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from atb_llm.utils.file_utils import safe_open
def parse_args():
    parser = argparse.ArgumentParser(description="Creating quant weights for ChatGLM2-6B or ChatGLM3-6B")
    parser.add_argument("--model_path", type=str, required=True, help="The path to model float weights")
    parser.add_argument("--save_path", type=str, default="./quant_weight_glm", help="The path to save quant weights")
    parser.add_argument("--dataset_path", type=str, required=True, help="The dataset path")
    parser.add_argument("--group_size", type=int, default=128, help="The group size for w4a16")
    return parser.parse_args()


def get_model_and_tokenizer(model_path):
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_path, torch_dtype=torch.float32,
                                                 trust_remote_code=True).cpu()
    model.eval()
    return model, tokenizer


def read_dataset(dataset_path):
    calib_set = []
    with safe_open(dataset_path, encoding='utf-8') as file:
        for line in file:
            calib_set.append(json.loads(line))
    return calib_set


# 获取校准数据函数定义
def get_calib_dataset(tokenizer, calib_list, device="cpu"):  # device="npu:0" 如果需要使用npu进行量化
    calib_dataset = []
    for calib_data in calib_list:
        text = calib_data['inputs_pretokenized']
        inputs = tokenizer([text], return_tensors='pt')
        calib_dataset.append([
            inputs.data['input_ids'].to(device),
            inputs.data['position_ids'].to(device),
            inputs.data['attention_mask'].to(device)
        ])
    return calib_dataset


def copy_config_files(fp16_path, quant_path, quant_type):
    model_files = [f for f in os.listdir(fp16_path) if f.startswith(("config", "tokeniz", "modeling_chatglm.py"))]
    for f in model_files:
        shutil.copy2(os.path.join(fp16_path, f), os.path.join(quant_path, f))
    with safe_open(os.path.join(quant_path, "config.json"), 'r+', encoding='utf-8') as f:
        config = json.load(f)
        config['quantize'] = quant_type
        f.seek(0)
        json.dump(config, f, indent=4)
        f.truncate()

View File

@ -0,0 +1,18 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
# 参数配置以及启动指令的说明见同级目录下的README.md文件
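# 用法bash <本脚本> ${weight_path}$1为模型权重路径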
export BIND_CPU=1
export ASCEND_RT_VISIBLE_DEVICES=0,1
export TP_WORLD_SIZE=2
export MASTER_PORT=20030
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export HCCL_OP_BASE_FFTS_MODE_ENABLE=TRUE
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export HCCL_BUFFSIZE=110
export INT8_FORMAT_NZ_ENABLE=1
export PYTHONPATH=${llm_path}:$PYTHONPATH
torchrun --nproc_per_node $TP_WORLD_SIZE --master_port $MASTER_PORT -m examples.run_pa --model_path $1

View File

@ -0,0 +1,27 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
# 参数配置以及启动指令的说明见同级目录下的README.md文件
export ASCEND_RT_VISIBLE_DEVICES=0
export TP_WORLD_SIZE=1
export MASTER_PORT=20030
export PYTHONPATH=${llm_path}:$PYTHONPATH
export IS_BF16=false
# 以下环境变量与性能和内存优化相关,通常情况下无需修改
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export HCCL_OP_BASE_FFTS_MODE_ENABLE=TRUE
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export LCCL_ENABLE_FALLBACK=1
extra_param=""
# if [ "$IS_BF16" = true ]; then
# extra_param="${extra_param} --is_bf16"
# fi
if [ "$TP_WORLD_SIZE" == "1" ]; then python -m examples.run_pa --model_path $1 $extra_param
else
torchrun --nproc_per_node $TP_WORLD_SIZE --master_port $MASTER_PORT -m examples.run_pa --model_path $1 $extra_param
fi

View File

@ -0,0 +1,109 @@
diff --git a/web_demo.py b/web_demo.py
index 1af24c9..8c0e765 100644
--- a/web_demo.py
+++ b/web_demo.py
@@ -1,14 +1,23 @@
-from transformers import AutoModel, AutoTokenizer
+import json
+import argparse
+import requests
+from transformers import AutoTokenizer
 import gradio as gr
 import mdtex2html
-from utils import load_model_on_gpus
 
-tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
-model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).cuda()
-# 多显卡支持使用下面两行代替上面一行将num_gpus改为你实际的显卡数量
-# from utils import load_model_on_gpus
-# model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2)
-model = model.eval()
+
+def parse_args():
+    parser = argparse.ArgumentParser(description="ChatGLM2-6B/ChatGLM3-6b web demo")
+    parser.add_argument("--model_path", type=str, required=True, help="The path to model weights")
+    parser.add_argument("--mindie_server_ip", type=str, default="127.0.0.1", help="The IP address of mindie server")
+    parser.add_argument("--mindie_server_port", type=int, default=1025, help="The port of mindie server")
+    parser.add_argument("--max_new_tokens", type=int, default=512, help="Max new tokens to generate")
+    parser.add_argument("--concurrency", type=int, default=10, help="Concurrency count of web demo")
+
+    return parser.parse_args()
+
+
+args = parse_args()
+tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
 
 """Override Chatbot.postprocess"""
@@ -71,6 +80,49 @@ def predict(input, chatbot, max_length, top_p, temperature, history, past_key_va
         yield chatbot, history, past_key_values
 
 
+def build_inputs(tokenizer, query: str):
+    # history由服务化内部自行处理
+    prompt = tokenizer.build_prompt(query, history=None)
+    return prompt
+
+
+def request(input, chatbot, max_length, top_p, temperature, history, past_key_values):
+    chatbot.append((parse_text(input), ""))
+
+    # 添加prompt格式以支持chat
+    prompt = build_inputs(tokenizer, input)
+
+    response = requests.post(
+        f"http://{args.mindie_server_ip}:{args.mindie_server_port}/generate_stream",
+        json={
+            "inputs": prompt,
+            "parameters": {
+                "max_new_tokens": max_length,
+                "do_sample": True,
+                "repetition_penalty": 1.05,
+                "seed": None,
+                "temperature": temperature,
+                # "top_k": 1,
+                "top_p": top_p,
+                "batch_size": 1
+            },
+        },
+        verify=False, stream=True
+    )
+
+    generate_text = ""
+    for line in response.iter_lines():
+        if not line:
+            continue
+        # 删除字符串开头的'data: '
+        res = line.decode('utf-8')[6:]
+        # 获取流式生成的文本内容
+        res_text = json.loads(res).get('token').get('text')
+        generate_text += res_text
+        chatbot[-1] = (parse_text(input), parse_text(generate_text))
+        yield chatbot, history, past_key_values
+
+
 def reset_user_input():
     return gr.update(value='')
@@ -92,17 +144,17 @@ with gr.Blocks() as demo:
             submitBtn = gr.Button("Submit", variant="primary")
         with gr.Column(scale=1):
             emptyBtn = gr.Button("Clear History")
-            max_length = gr.Slider(0, 32768, value=8192, step=1.0, label="Maximum length", interactive=True)
-            top_p = gr.Slider(0, 1, value=0.8, step=0.01, label="Top P", interactive=True)
-            temperature = gr.Slider(0, 1, value=0.95, step=0.01, label="Temperature", interactive=True)
+            max_length = gr.Slider(1, args.max_new_tokens, value=args.max_new_tokens, step=1.0, label="Maximum New Tokens", interactive=True)
+            top_p = gr.Slider(0.01, 0.99, value=0.01, step=0.01, label="Top P", interactive=True)
+            temperature = gr.Slider(0.01, 1, value=0.01, step=0.01, label="Temperature", interactive=True)
 
     history = gr.State([])
     past_key_values = gr.State(None)
 
-    submitBtn.click(predict, [user_input, chatbot, max_length, top_p, temperature, history, past_key_values],
+    submitBtn.click(request, [user_input, chatbot, max_length, top_p, temperature, history, past_key_values],
                     [chatbot, history, past_key_values], show_progress=True)
     submitBtn.click(reset_user_input, [], [user_input])
 
     emptyBtn.click(reset_state, outputs=[chatbot, history, past_key_values], show_progress=True)
 
-demo.queue().launch(share=False, inbrowser=True)
+demo.queue(concurrency_count=args.concurrency).launch(server_name='0.0.0.0', share=False, inbrowser=True)
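
上述补丁将web demo的推理后端切换为MindIE服务化的`/generate_stream`流式接口。以下为一个最小的接口连通性自检示例假设服务已在本机默认端口启动IP与端口取自上方parse_args中的默认值inputs为占位文本仅供参考

```shell
curl -s -N http://127.0.0.1:1025/generate_stream \
  -H "Content-Type: application/json" \
  -d '{"inputs": "此处为占位问题文本", "parameters": {"max_new_tokens": 64, "do_sample": true, "repetition_penalty": 1.05, "temperature": 0.7, "top_p": 0.9, "batch_size": 1}}'
# 每行流式返回形如data: {"token": {"text": "..."}}的JSON
# 与上方request()中按'data: '前缀截取并解析token.text的逻辑对应
```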

View File

@ -0,0 +1,3 @@
gradio==3.39
mdtex2html
streamlit

View File

@ -0,0 +1,33 @@
# ChatGLM3-6B 模型推理指导 <!-- omit in toc -->
# 概述
- ChatGLM3 是智谱AI和清华大学 KEG 实验室联合发布的对话预训练模型。ChatGLM3-6B 是 [ChatGLM3](https://github.com/THUDM/ChatGLM3) 系列中的开源模型在保留了前两代模型对话流畅、部署门槛低等众多优秀特性的基础上ChatGLM3-6B 具有更强大的基础模型、更完整的功能支持和更全面的开源序列。
- 此代码仓中实现了一套基于NPU硬件的ChatGLM3-6B推理模型。配合加速库使用旨在NPU上获得极致的推理性能。
# 特性矩阵
- 此矩阵罗列了ChatGLM3-6B模型支持的特性
| 模型及参数量 | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8量化 | W8A16量化 | W4A16量化 | KV cache量化 | 稀疏量化 | MOE量化 | MindIE | TGI | 长序列 |
|-------------|-------------------------|-------------------------|------|------|-----------------|-----------------|---------|--------------|----------|--------|--------|-----|-----|-----|-----|
| ChatGLM3-6B | 支持world size 1,2,4,8 | 支持world size 1,2,4 | 是 | 否 | 否 | 是 | 否 | 否 | 否 | 否 | 否 | 否 | 是 | 否 | 否 |
- 此模型仓已适配的模型版本
- [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b)
- [ChatGLM3-6B-32K](https://huggingface.co/THUDM/chatglm3-6b-32k)
- 注ChatGLM3-6B 推荐使用commit id为 `a5ba5501eb873d40d48bd0983bd2a8dd006bb838` 的模型仓版本
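- 示例按推荐commit id获取权重仓假设已安装git与git-lfs且可访问hugging face
  ```shell
  git lfs install
  git clone https://huggingface.co/THUDM/chatglm3-6b
  cd chatglm3-6b
  git checkout a5ba5501eb873d40d48bd0983bd2a8dd006bb838
  ```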
# 使用说明
- 参考[此README文件](../../chatglm/v2_6b/README.md)
## 精度测试
- 参考[此README文件](../../../../tests/modeltest/README.md)
## 性能测试
- 参考[此README文件](../../../../tests/modeltest/README.md)
## FAQ
- `import torch_npu`遇到`xxx/libgomp.so.1: cannot allocate memory in static TLS block`报错,可通过配置`LD_PRELOAD`解决。
- 示例:`export LD_PRELOAD=/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD`

View File

@ -0,0 +1,99 @@
# README
[Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) 项目开源了中文LLaMA模型和指令精调的Alpaca大模型以进一步促进大模型在中文NLP社区的开放研究。这些模型在原版LLaMA的基础上扩充了中文词表并使用了中文数据进行二次预训练进一步提升了中文基础语义理解能力。同时中文Alpaca模型进一步使用了中文指令数据进行精调显著提升了模型对指令的理解和执行能力。
- 此代码仓中实现了一套基于NPU硬件的Chinese-LLaMA-Alpaca系列模型。配合加速库使用旨在NPU上获得极致的推理性能。
# 特性矩阵
- 此矩阵罗列了各Chinese-LLaMA-Alpaca模型支持的特性
| 模型及参数量 | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8量化 | W8A16量化 | KV cache量化 | 稀疏量化 | MOE量化 | MindIE | TGI |
|-------------|-------------------------|-------------------------|------|------|-----------------|-----------------|---------|---------|--------------|----------|--------|--------|-----|
| Chinese-Alpaca-13B | 支持world size 1,2,4,8 | 支持world size 1,2,4 | 是 | 否 | 否 | 是 | 否 | 否 | 否 | 否 | 否 | 否 | 否 |
# 使用说明
## 路径变量解释
| 变量名 | 含义 |
|--------|--------------------------------------------------|
| working_dir | 加速库及模型库下载后放置的目录 |
| llm_path | 模型仓所在路径;若使用编译好的包,则路径为`${working_dir}/`若使用gitee下载的代码则路径为`${working_dir}/ModelLink/mindie_ref/mindie_llm/atb_models` |
| script_path | 脚本所在路径Chinese-Alpaca-13B的工作脚本所在路径为`${llm_path}/examples/models/chinese_alpaca` |
| weight_path | 模型权重路径 |
## 权重
**权重下载**
- lora权重: [Chinese-Alpaca-Lora-13B](https://pan.baidu.com/s/1wYoSF58SnU9k0Lndd5VEYg?pwd=mm8i)
- 原模型权重: [LLaMA-13B](https://huggingface.co/huggyllama/llama-13b)
> 下载后务必检查压缩包中模型文件的SHA256是否一致请查看[SHA256.md](https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/SHA256.md)
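> 校验示例(压缩包文件名为占位,请以实际下载的文件为准):
```shell
sha256sum chinese-alpaca-lora-13b.zip
# 将输出与SHA256.md中的对应条目比对一致后再进行后续合并
```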
**lora权重合并**
- 合并lora权重和原模型权重请参考[合并教程](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/%E6%89%8B%E5%8A%A8%E6%A8%A1%E5%9E%8B%E5%90%88%E5%B9%B6%E4%B8%8E%E8%BD%AC%E6%8D%A2#%E5%A4%9Alora%E6%9D%83%E9%87%8D%E5%90%88%E5%B9%B6%E9%80%82%E7%94%A8%E4%BA%8Echinese-alpaca-plus)
**权重转换**
> 若权重中不包含safetensors格式则执行权重转换步骤否则跳过
- 参考[此README文件](../../README.md)
**基础环境变量**
- 参考[此README文件](../../../README.md)
## 推理
### 对话测试
**运行Paged Attention FP16**
- 运行启动脚本
- 将`${llm_path}`加入`PYTHONPATH`搜索目录
```shell
export PYTHONPATH=${llm_path}:${PYTHONPATH}
```
- 在\${llm_path}目录下执行以下指令
```shell
bash ${script_path}/run_pa.sh ${weight_path}
```
- 环境变量说明
- `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
- 指定当前机器上可用的逻辑NPU核心多个核心间使用逗号相连
- 核心ID查阅方式见[此README文件](../../README.md)的【启动脚本相关环境变量】章节
- 对于300I DUO卡而言若要使用单卡双芯请指定至少两个可见核心若要使用双卡四芯请指定至少四个可见核心
- 各模型支持的核心数参考“特性矩阵”
- `export MASTER_PORT=20030`
- 设置卡间通信端口
- 默认使用20030端口
- 目的是为了避免同一台机器同时运行多个多卡模型时出现通信冲突
- 设置时端口建议范围为20000-20050
- 以下环境变量与性能和内存优化相关,通常情况下无需修改
```shell
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=0
```
## 精度测试
- 参考[此README文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
bash run.sh pa_fp16 full_CEval 1 llama ${Chinese-Alpaca-13B权重路径} 8
```
## 性能测试
- 参考[此README文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 llama ${Chinese-Alpaca-13B权重路径} 8
```
## FAQ
- 更多环境变量见[此README文件](../../README.md)
- 对话测试实际执行的Python文件为`${llm_path}/examples/run_pa.py`;该文件的参数说明见[此README文件](../../README.md)
- 运行时,需要通过指令`pip list | grep protobuf`确认protobuf版本如果版本高于3.20.x请运行指令`pip install protobuf==3.20.0`进行更新

View File

@ -0,0 +1,23 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
# 参数配置以及启动指令的说明见同级目录下的README.md文件
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MASTER_PORT=20030
# 以下环境变量与性能和内存优化相关,通常情况下无需修改
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=0
export INT8_FORMAT_NZ_ENABLE=1
world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
if [ "$TP_WORLD_SIZE" == "1" ]; then
python -m examples.run_pa --model_path $1
else
torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_pa --model_path $1
fi

View File

@ -0,0 +1,57 @@
# CodeGeeX2-6B 模型推理指导 <!-- omit in toc -->
# 概述
- [CodeGeeX2-6B](https://github.com/THUDM/CodeGeeX2) 是多语言代码生成模型 [CodeGeeX](https://github.com/THUDM/CodeGeeX) ([KDD23](https://arxiv.org/abs/2303.17568)) 的第二代模型。不同于一代 CodeGeeX完全在国产华为昇腾芯片平台训练CodeGeeX2 是基于 [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) 架构加入代码预训练实现;得益于 ChatGLM2 的更优性能CodeGeeX2 在多项指标上取得性能提升+107% > CodeGeeX仅60亿参数即超过150亿参数的 StarCoder-15B 近10%)。
- 此代码仓中实现了一套基于NPU硬件的CodeGeeX2推理模型。配合加速库使用旨在NPU上获得极致的推理性能。
# 特性矩阵
- 此矩阵罗列了CodeGeeX2-6B模型支持的特性
| 模型及参数量 | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8量化 | W8A16量化 | W4A16量化 | KV cache量化 | 稀疏量化 | MOE量化 | MindIE | TGI | 长序列 |
|-------------|-------------------------|-------------------------|------|------|-----------------|-----------------|---------|--------------|----------|--------|--------|-----|-----|-----|-----|
| CodeGeeX2-6B | 支持world size 1,2,4,8 | 支持world size 1,2,4 | 是 | 否 | 否 | 是 | 是 | 否 | 否 | 否 | 否 | 否 | 是 | 是 | 否 |
- 此模型仓已适配的模型版本
- [CodeGeeX2-6B](https://huggingface.co/THUDM/codegeex2-6b/tree/main)
# 使用说明
- 执行推理前需要将权重目录下的config.json中的`torch_dtype`改为`"float16"`
- 除了“量化权重导出”章节,其余均参考[此README文件](../../chatglm/v2_6b/README.md)
## 量化权重导出
量化权重可通过msmodelslim昇腾压缩加速工具实现。
### 环境准备
环境配置可参考[msmodelslim官方文档](https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/devtools/auxiliarydevtool/modelslim_0002.html)。
### 导出量化权重
通过`${llm_path}/examples/models/codegeex/v2_6b/quant_codegeex2_6b_w8a8.py`文件导出模型的量化权重(注意量化权重不要和浮点权重放在同一个目录下):
```shell
python quant_codegeex2_6b_w8a8.py --model_path ${浮点权重路径} --save_path ${量化权重保存路径} --dataset_path ${校准数据集路径}
```
校准数据集采用 `${llm_path}/tests/modeltest/dataset/full/BoolQ/dev.jsonl`
导出量化权重后应生成`quant_model_weight_w8a8.safetensors`和`quant_model_description_w8a8.json`两个文件。
注:
1. quant_codegeex2_6b_w8a8.py文件中已配置好较优的量化策略导出量化权重时可直接使用也可修改为其它策略。
2. 执行脚本生成量化权重时,会在生成的权重路径的config.json文件中添加或修改`quantize`字段,值为相应量化方式,当前仅支持`w8a8`。
3. 执行完以上步骤后,执行量化模型只需要替换权重路径。
4. 如果生成权重时遇到`OpenBLAS Warning: Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP = 1 option`,可通过设置`export OMP_NUM_THREADS=1`关闭多线程以规避。
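导出完成后,可按如下方式快速自检(路径为占位,预期文件名与`quantize`字段取自上文说明):
```shell
ls ${量化权重保存路径}
# 预期包含quant_model_weight_w8a8.safetensors与quant_model_description_w8a8.json
grep '"quantize"' ${量化权重保存路径}/config.json
# 预期输出"quantize": "w8a8"
```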
## 精度测试
- 参考[此README文件](../../../../tests/modeltest/README.md)
## 性能测试
- 参考[此README文件](../../../../tests/modeltest/README.md)
## FAQ
- `import torch_npu`遇到`xxx/libgomp.so.1: cannot allocate memory in static TLS block`报错,可通过配置`LD_PRELOAD`解决。
- 示例:`export LD_PRELOAD=/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD`

View File

@ -0,0 +1,93 @@
import os
import json
import shutil
import argparse
from transformers import AutoTokenizer, AutoModelForCausalLM
from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import Calibrator, QuantConfig
from atb_llm.utils.file_utils import safe_open
def parse_args():
    parser = argparse.ArgumentParser(description="Creating quant weights for CodeGeex2-6B")
    parser.add_argument("--model_path", type=str, required=True, help="The path to model float weights")
    parser.add_argument("--save_path", type=str, default="./quant_weight_geex", help="The path to save quant weights")
    parser.add_argument("--dataset_path", type=str, required=True, help="The dataset path")
    return parser.parse_args()


# 获取校准数据函数定义
def get_calib_dataset(tokenizer, calib_list, device="cpu"):  # device="npu:0" 如果需要使用npu进行量化
    calib_dataset = []
    for calib_data in calib_list:
        inputs = tokenizer(calib_data, return_tensors='pt')
        calib_dataset.append([
            inputs.data['input_ids'].to(device),
            inputs.data['position_ids'].to(device),
            inputs.data['attention_mask'].to(device)
        ])
    return calib_dataset


disable_names = ['transformer.encoder.layers.0.self_attention.query_key_value',
                 'transformer.encoder.layers.0.mlp.dense_4h_to_h',
                 'transformer.encoder.layers.1.self_attention.query_key_value',
                 'transformer.encoder.layers.1.mlp.dense_h_to_4h',
                 'transformer.encoder.layers.1.mlp.dense_4h_to_h',
                 'transformer.encoder.layers.2.self_attention.query_key_value',
                 'transformer.encoder.layers.2.mlp.dense_h_to_4h',
                 'transformer.encoder.layers.2.mlp.dense_4h_to_h',
                 'transformer.encoder.layers.3.self_attention.query_key_value',
                 'transformer.encoder.layers.4.self_attention.query_key_value',
                 'transformer.encoder.layers.5.self_attention.query_key_value',
                 'transformer.encoder.layers.6.self_attention.query_key_value',
                 'transformer.encoder.layers.7.self_attention.query_key_value',
                 'transformer.encoder.layers.8.self_attention.query_key_value',
                 'transformer.encoder.layers.9.self_attention.query_key_value',
                 'transformer.encoder.layers.11.self_attention.query_key_value',
                 'transformer.encoder.layers.17.mlp.dense_4h_to_h',
                 'transformer.encoder.layers.23.mlp.dense_4h_to_h',
                 'transformer.encoder.layers.27.mlp.dense_4h_to_h',
                 'transformer.output_layer']

quant_config = QuantConfig(
    a_bit=8,
    w_bit=8,
    disable_names=disable_names,
    dev_type='cpu',  # dev_type="npu", dev_id=0 如果需要使用npu进行量化
    act_method=3,
    pr=1.0,
    w_sym=True,
    mm_tensor=False
)


def main():
    args = parse_args()
    fp16_path = args.model_path  # 原始浮点模型路径
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=fp16_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=fp16_path, trust_remote_code=True).float().cpu()
    calib_set = []
    with safe_open(args.dataset_path, 'r', encoding='utf-8') as file:
        calib_set = file.readlines()
    dataset_calib = get_calib_dataset(tokenizer, calib_set[:5])
    calibrator = Calibrator(model, quant_config, calib_data=dataset_calib, disable_level='L0')
    calibrator.run()  # 执行PTQ量化校准
    calibrator.save(args.save_path, save_type=["safe_tensor"])  # "safe_tensor"对应safetensors格式权重"numpy"对应npy格式权重
    model_files = [f for f in os.listdir(args.model_path) if f.startswith(("config", "tokeniz", "modeling_chatglm.py"))]
    for f in model_files:
        shutil.copy2(os.path.join(args.model_path, f), os.path.join(args.save_path, f))
    with safe_open(os.path.join(args.save_path, "config.json"), 'r+', encoding='utf-8') as f:
        config = json.load(f)
        config['quantize'] = 'w8a8'
        f.seek(0)
        json.dump(config, f, indent=4)
        f.truncate()


if __name__ == '__main__':
    main()

View File

@ -0,0 +1,172 @@
# README
- [Code Llama](https://github.com/Meta-Llama/codellama) 是Meta发布的代码生成类大语言模型在编程任务上具备填充、0-shot指令跟随能力并支持长序列文本输入在开源模型中拥有先进的性能。Code Llama 是 Llama 2 的代码专用版本,它是通过在代码数据集上对 Llama 2 进行进一步训练并在同一数据集上长时间采样更多数据而创建的。从本质上讲Code Llama 具有更强的编码能力。它可以根据代码和自然语言提示(例如,"给我写一个输出斐波那契数列的函数")生成代码和有关代码的自然语言。它还可用于代码补全和调试。它支持许多当今最流行的编程语言,包括 Python、C++、Java、PHP、Typescript (Javascript)、C#、Bash 等。
- 此代码仓中实现了一套基于NPU硬件的Code Llama推理模型。配合加速库使用旨在NPU上获得极致的推理性能。
# 特性矩阵
- 此矩阵罗列了各CodeLlama模型支持的特性
| 模型及参数量 | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8量化 | W8A16量化 | KV cache量化 | 稀疏量化 | MOE量化 | MindIE Service | TGI | 长序列 |
|-------------|----------------------------|-----------------------------|------|----------------------|-----------------|-----------------|---------|-----------|--------------|--------------------------|-----|--------|-----|-----|
| CodeLlama-7B | 支持world size 1,2,4,8 | 否 | 是 | 否 | 否 | 是 | 否 | 否 | 否 | 否 | 否 | 否 | 否 | 否 |
| CodeLlama-13B | 支持world size 1,2,4,8 | 否 | 是 | 是 | 否 | 是 | 否 | 否 | 否 | 否 | 否 | 是 | 否 | 否 |
| CodeLlama-34B | 支持world size 4,8 | 支持world size 2,4,8 | 是 | 是 | 否 | 是 | 是 | 否 | 否 | 是 | 否 | 是 | 否 | 否 |
| CodeLlama-70B | 支持world size 4,8 | 否 | 是 | 是 | 否 | 是 | 否 | 否 | 否 | 否 | 否 | 否 | 否 | 否 |
# 使用说明
## 路径变量解释
| 变量名 | 含义 |
|--------|--------------------------------------------------|
| working_dir | 加速库及模型库下载后放置的目录 |
| llm_path | ATB_Models模型仓所在路径若使用编译好的包则路径为`${working_dir}/`若使用gitee下载的代码则路径为`${working_dir}/MindIE-LLM/examples/atb_models/` |
| script_path | 脚本所在路径CodeLlama的工作脚本所在路径为`${llm_path}/examples/models/codellama` |
| weight_path | 模型权重路径 |
## 权重
**权重下载**
- [CodeLlama-7B](https://huggingface.co/codellama/CodeLlama-7b-hf)
- [CodeLlama-13B](https://huggingface.co/codellama/CodeLlama-13b-hf)
- [CodeLlama-34B](https://huggingface.co/codellama/CodeLlama-34b-hf)
- [CodeLlama-70B](https://huggingface.co/codellama/CodeLlama-70b-hf)
**权重转换**
> 若权重中不包含safetensors格式则执行权重转换步骤否则跳过
- 参考[此README文件](../../README.md)
**量化权重生成**
> 基于原始的浮点权重,生成量化权重
- 设置环境变量
```shell
# 设置CANN包的环境变量
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# 推荐使用transformers 4.33.0版本进行量化权重转换执行模型推理时transformers版本需大于等于4.33.0
pip uninstall transformers -y
pip install transformers=={指定版本}
# NPU多卡量化时关闭虚拟内存
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:False
# 指定当前机器上可用的逻辑NPU核心
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# 将`${llm_path}`加入`PYTHONPATH`搜索目录
export PYTHONPATH=${llm_path}:${PYTHONPATH}
```
- W8A8量化权重请使用以下指令生成
- Step 1
- 修改模型权重config.json中`torch_dtype`字段为`float16`
- Step 2 W8A8量化权重生成
```shell
cd ${llm_path}/examples/models/codellama
python convert_quant_weights.py --model_path {浮点权重路径} --save_directory {W8A8量化权重路径} --w_bit 8 --a_bit 8 --act_method 3 --anti_method m2 --device_type npu --calib_file ./humaneval_python.json
```
> NPU多卡量化注意事项和环境要求见[此README中的【NPU多卡量化】章节](../../README.md)
- 稀疏量化权重请使用以下指令生成
> 稀疏量化方式生成的权重只支持在300I DUO硬件上推理
- Step 1
- 修改模型权重config.json中`torch_dtype`字段为`float16`
- Step 2 稀疏量化权重生成
```shell
cd ${llm_path}/examples/models/codellama
python convert_quant_weights.py --model_path {浮点权重路径} --save_directory {W8A8S量化权重路径} --w_bit 4 --a_bit 8 --act_method 2 --do_smooth True --use_sigma True --is_lowbit True --device_type npu --calib_file ./humaneval_python.json
```
- Step 3量化权重切分及压缩
> 运行前需要确保压缩工具编译过
>
> `cd /usr/local/Ascend/ascend-toolkit/latest/python/site-packages/msmodelslim/pytorch/weight_compression/compress_graph`
>
> `bash build.sh /usr/local/Ascend/ascend-toolkit/latest`
```shell
torchrun --nproc_per_node {TP数} -m examples.convert.model_slim.sparse_compressor --model_path {W8A8S量化权重路径} --save_directory {W8A8SC量化权重路径}
```
> TP数为tensor parallel并行个数
> 注意若权重生成时以TP=4进行切分则运行时也需以TP=4运行
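- 示例假设以TP=4切分路径为占位请替换为实际权重路径
  ```shell
  torchrun --nproc_per_node 4 -m examples.convert.model_slim.sparse_compressor --model_path /data/weights/codellama-34b_w8a8s --save_directory /data/weights/codellama-34b_w8a8sc
  ```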
**基础环境变量**
- 参考[此README文件](../../../README.md)
## 推理
### 对话测试
**运行Paged Attention BF16**
- 运行启动脚本
- 将`${llm_path}`加入`PYTHONPATH`搜索目录
```shell
export PYTHONPATH=${llm_path}:${PYTHONPATH}
```
- 在${llm_path}目录下执行以下指令
```shell
bash ${script_path}/run_pa.sh ${weight_path}
```
- 环境变量说明
- `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
- 指定当前机器上可用的逻辑NPU核心多个核心间使用逗号相连
- 核心ID查阅方式见[此README文件](../../README.md)的【启动脚本相关环境变量】章节
- 对于300I DUO卡而言若要使用单卡双芯请指定至少两个可见核心若要使用双卡四芯请指定至少四个可见核心
- 各模型支持的核心数参考“特性矩阵”
- `export MASTER_PORT=20030`
- 设置卡间通信端口
- 默认使用20030端口
- 目的是为了避免同一台机器同时运行多个多卡模型时出现通信冲突
- 设置时端口建议范围为20000-20050
- 以下环境变量与性能和内存优化相关,通常情况下无需修改
```shell
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=0
export INT8_FORMAT_NZ_ENABLE=1
```
- 300I DUO卡不支持BF16数据类型
**运行Paged Attention FP16**
- 运行启动脚本
- 与“运行Paged Attention BF16”的启动方式相同
- 环境变量说明
- 参见“运行Paged Attention BF16”中的环境变量说明
- 相比于BF16运行FP16时需修改${weight_path}/config.json中的`torch_dtype`字段,将此字段对应的值修改为`float16`
**运行Paged Attention W8A8**
- W8A8量化权重生成
- 运行启动脚本
- 与“运行Paged Attention BF16”的启动方式相同
- `${weight_path}`为W8A8量化权重的路径
- 环境变量说明
- 参见“运行Paged Attention BF16”中的环境变量说明
- 相比于BF16运行量化时需修改W8A8量化权重`${weight_path}/config.json`中的`quantize`字段,将此字段对应的值修改为`w8a8`
- 若config.json中无此字段则新增
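- 可通过以下指令确认该字段已正确设置:
  ```shell
  grep '"quantize"' ${weight_path}/config.json
  # 预期输出"quantize": "w8a8";若无输出,按上文说明手动新增
  ```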
## 精度测试
- 参考[此README文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# 运行Paged Attention BF16
bash run.sh pa_bf16 full_HumanEval 1 codellama ${weight_path} 8
# 运行Paged Attention FP16
bash run.sh pa_fp16 full_HumanEval 1 codellama ${weight_path} 8
```
## 性能测试
- 参考[此README文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# 运行Paged Attention BF16
bash run.sh pa_bf16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 codellama ${weight_path} 8
# 运行Paged Attention FP16
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 codellama ${weight_path} 8
```
## FAQ
- 更多环境变量见[此README文件](../../README.md)
- 对话测试实际执行的Python文件为`${llm_path}/examples/run_fa.py`和`${llm_path}/examples/run_pa.py`;这两个文件的参数说明见[此README文件](../../README.md)
- 运行时,需要通过指令`pip list | grep protobuf`确认protobuf版本如果版本高于3.20.x请运行指令`pip install protobuf==3.20.0`进行更新

View File

@ -0,0 +1,84 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
import os
import torch
from msmodelslim.pytorch.llm_ptq.llm_ptq_tools import QuantConfig
from msmodelslim.pytorch.llm_ptq.anti_outlier import AntiOutlierConfig
from atb_llm.models.llama.modeling_llama import LlamaConfig
from atb_llm.utils.log import logger, print_log
from examples.convert.convert_utils import copy_tokenizer_files, modify_config
from examples.convert.model_slim.get_calibration_dataset import load_jsonl
from examples.convert.model_slim.quantifier import parse_arguments, Quantifier
if __name__ == "__main__":
args = parse_arguments()
rank = int(os.getenv("RANK", "0"))
config = LlamaConfig.from_pretrained(args.model_path)
disable_names = []
if args.a_bit != 16:
# W8A16, W4A16没有回退层
num_layers = config.num_hidden_layers
disable_names = [f"model.layers.{layer}.mlp.down_proj" for layer in range(num_layers)]
disable_names.append("lm_head")
anti_outlier_config = None
if args.anti_method:
anti_outlier_config = AntiOutlierConfig(anti_method=args.anti_method, dev_type=args.device_type, dev_id=rank)
quant_config = QuantConfig(
a_bit=args.a_bit,
w_bit=args.w_bit,
disable_names=disable_names,
act_method=args.act_method,
w_sym=args.w_sym,
mm_tensor=False,
dev_type=args.device_type,
dev_id=rank,
pr=1.0,
fraction=args.fraction,
co_sparse=args.co_sparse,
do_smooth=args.do_smooth,
use_sigma=args.use_sigma,
sigma_factor=args.sigma_factor,
is_lowbit=args.is_lowbit,
)
# 默认无校准数据集
calibration_dataset = None
# 若存在calib_file则使用calib_file作为校准数据集
if args.calib_file:
calibration_dataset = load_jsonl(args.calib_file, key_name='prompt')
if args.calib_dataset_length <= len(calibration_dataset):
calibration_dataset = calibration_dataset[:args.calib_dataset_length]
print_log(rank, logger.info, f"calib_dataset_length: {args.calib_dataset_length}")
else:
print_log(rank, logger.warning,
f"calib_dataset_length is too large, use default {len(calibration_dataset)}")
quant_weight_generator = Quantifier(
args.model_path, quant_config, anti_outlier_config,
device_type=args.device_type, tokenizer_args={"padding_side": "left"}
)
quant_weight_generator.tokenizer.pad_token_id = 2
tokenized_data = None
if calibration_dataset is not None:
dataloader = torch.utils.data.DataLoader(calibration_dataset, batch_size=4)
tokenized_data = quant_weight_generator.get_tokenized_data(dataloader)
quant_weight_generator.convert(tokenized_data, args.save_directory, args.disable_level)
#为适配工具稀疏量化传入w_bit=4,a_bit=8暂时修改quant_type
quant_type = f"w{args.w_bit}a{args.a_bit}" + ("s" if (args.co_sparse or args.is_lowbit) else "")
is_sparseCompress = args.w_bit == 4 and args.a_bit == 8 and (args.co_sparse or args.is_lowbit)
if is_sparseCompress:
quant_type = "w8a8s"
modify_config(
args.model_path, args.save_directory, config.torch_dtype,
quant_type
)
copy_tokenizer_files(args.model_path, args.save_directory)

View File

@ -0,0 +1,7 @@
{"task_id": "HumanEval/0", "prompt": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n given threshold.\n >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n False\n >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n True\n \"\"\"\n", "entry_point": "has_close_elements", "canonical_solution": " for idx, elem in enumerate(numbers):\n for idx2, elem2 in enumerate(numbers):\n if idx != idx2:\n distance = abs(elem - elem2)\n if distance < threshold:\n return True\n\n return False\n", "test": "\n\nMETADATA = {\n 'author': 'jt',\n 'dataset': 'test'\n}\n\n\ndef check(candidate):\n assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\n assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\n assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\n assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\n assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\n assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\n\n"}
{"task_id": "HumanEval/1", "prompt": "from typing import List\n\n\ndef separate_paren_groups(paren_string: str) -> List[str]:\n \"\"\" Input to this function is a string containing multiple groups of nested parentheses. Your goal is to\n separate those group into separate strings and return the list of those.\n Separate groups are balanced (each open brace is properly closed) and not nested within each other\n Ignore any spaces in the input string.\n >>> separate_paren_groups('( ) (( )) (( )( ))')\n ['()', '(())', '(()())']\n \"\"\"\n", "entry_point": "separate_paren_groups", "canonical_solution": " result = []\n current_string = []\n current_depth = 0\n\n for c in paren_string:\n if c == '(':\n current_depth += 1\n current_string.append(c)\n elif c == ')':\n current_depth -= 1\n current_string.append(c)\n\n if current_depth == 0:\n result.append(''.join(current_string))\n current_string.clear()\n\n return result\n", "test": "\n\nMETADATA = {\n 'author': 'jt',\n 'dataset': 'test'\n}\n\n\ndef check(candidate):\n assert candidate('(()()) ((())) () ((())()())') == [\n '(()())', '((()))', '()', '((())()())'\n ]\n assert candidate('() (()) ((())) (((())))') == [\n '()', '(())', '((()))', '(((())))'\n ]\n assert candidate('(()(())((())))') == [\n '(()(())((())))'\n ]\n assert candidate('( ) (( )) (( )( ))') == ['()', '(())', '(()())']\n"}
{"task_id": "HumanEval/2", "prompt": "\n\ndef truncate_number(number: float) -> float:\n \"\"\" Given a positive floating point number, it can be decomposed into\n and integer part (largest integer smaller than given number) and decimals\n (leftover part always smaller than 1).\n\n Return the decimal part of the number.\n >>> truncate_number(3.5)\n 0.5\n \"\"\"\n", "entry_point": "truncate_number", "canonical_solution": " return number % 1.0\n", "test": "\n\nMETADATA = {\n 'author': 'jt',\n 'dataset': 'test'\n}\n\n\ndef check(candidate):\n assert candidate(3.5) == 0.5\n assert abs(candidate(1.33) - 0.33) < 1e-6\n assert abs(candidate(123.456) - 0.456) < 1e-6\n"}
{"task_id": "HumanEval/3", "prompt": "from typing import List\n\n\ndef below_zero(operations: List[int]) -> bool:\n \"\"\" You're given a list of deposit and withdrawal operations on a bank account that starts with\n zero balance. Your task is to detect if at any point the balance of account fallls below zero, and\n at that point function should return True. Otherwise it should return False.\n >>> below_zero([1, 2, 3])\n False\n >>> below_zero([1, 2, -4, 5])\n True\n \"\"\"\n", "entry_point": "below_zero", "canonical_solution": " balance = 0\n\n for op in operations:\n balance += op\n if balance < 0:\n return True\n\n return False\n", "test": "\n\nMETADATA = {\n 'author': 'jt',\n 'dataset': 'test'\n}\n\n\ndef check(candidate):\n assert candidate([]) == False\n assert candidate([1, 2, -3, 1, 2, -3]) == False\n assert candidate([1, 2, -4, 5, 6]) == True\n assert candidate([1, -1, 2, -2, 5, -5, 4, -4]) == False\n assert candidate([1, -1, 2, -2, 5, -5, 4, -5]) == True\n assert candidate([1, -2, 2, -2, 5, -5, 4, -4]) == True\n"}
{"task_id": "HumanEval/7", "prompt": "from typing import List\n\n\ndef filter_by_substring(strings: List[str], substring: str) -> List[str]:\n \"\"\" Filter an input list of strings only for ones that contain given substring\n >>> filter_by_substring([], 'a')\n []\n >>> filter_by_substring(['abc', 'bacd', 'cde', 'array'], 'a')\n ['abc', 'bacd', 'array']\n \"\"\"\n", "entry_point": "filter_by_substring", "canonical_solution": " return [x for x in strings if substring in x]\n", "test": "\n\nMETADATA = {\n 'author': 'jt',\n 'dataset': 'test'\n}\n\n\ndef check(candidate):\n assert candidate([], 'john') == []\n assert candidate(['xxx', 'asd', 'xxy', 'john doe', 'xxxAAA', 'xxx'], 'xxx') == ['xxx', 'xxxAAA', 'xxx']\n assert candidate(['xxx', 'asd', 'aaaxxy', 'john doe', 'xxxAAA', 'xxx'], 'xx') == ['xxx', 'aaaxxy', 'xxxAAA', 'xxx']\n assert candidate(['grunt', 'trumpet', 'prune', 'gruesome'], 'run') == ['grunt', 'prune']\n"}
{"task_id": "HumanEval/65", "prompt": "\ndef circular_shift(x, shift):\n \"\"\"Circular shift the digits of the integer x, shift the digits right by shift\n and return the result as a string.\n If shift > number of digits, return digits reversed.\n >>> circular_shift(12, 1)\n \"21\"\n >>> circular_shift(12, 2)\n \"12\"\n \"\"\"\n", "entry_point": "circular_shift", "canonical_solution": " s = str(x)\n if shift > len(s):\n return s[::-1]\n else:\n return s[len(s) - shift:] + s[:len(s) - shift]\n", "test": "def check(candidate):\n\n # Check some simple cases\n assert candidate(100, 2) == \"001\"\n assert candidate(12, 2) == \"12\"\n assert candidate(97, 8) == \"79\"\n assert candidate(12, 1) == \"21\", \"This prints if this assert fails 1 (good for debugging!)\"\n\n # Check some edge cases that are easy to work out by hand.\n assert candidate(11, 101) == \"11\", \"This prints if this assert fails 2 (also good for debugging!)\"\n\n"}
{"task_id": "HumanEval/79", "prompt": "\ndef decimal_to_binary(decimal):\n \"\"\"You will be given a number in decimal form and your task is to convert it to\n binary format. The function should return a string, with each character representing a binary\n number. Each character in the string will be '0' or '1'.\n\n There will be an extra couple of characters 'db' at the beginning and at the end of the string.\n The extra characters are there to help with the format.\n\n Examples:\n decimal_to_binary(15) # returns \"db1111db\"\n decimal_to_binary(32) # returns \"db100000db\"\n \"\"\"\n", "entry_point": "decimal_to_binary", "canonical_solution": " return \"db\" + bin(decimal)[2:] + \"db\"\n", "test": "def check(candidate):\n\n # Check some simple cases\n assert candidate(0) == \"db0db\"\n assert candidate(32) == \"db100000db\"\n assert candidate(103) == \"db1100111db\"\n assert candidate(15) == \"db1111db\", \"This prints if this assert fails 1 (good for debugging!)\"\n\n # Check some edge cases that are easy to work out by hand.\n assert True, \"This prints if this assert fails 2 (also good for debugging!)\"\n\n"}

View File

@ -0,0 +1,23 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
# 参数配置以及启动指令的说明见同级目录下的README.md文件
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MASTER_PORT=20030
# 以下环境变量与性能和内存优化相关,通常情况下无需修改
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=0
export INT8_FORMAT_NZ_ENABLE=1
world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
if [ "$TP_WORLD_SIZE" == "1" ]; then
python -m examples.run_pa --model_path $1
else
torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_pa --model_path $1
fi

View File

@ -0,0 +1,33 @@
# CodeShell-7B 模型推理指导 <!-- omit in toc -->
# 概述
- [CodeShell-7B](https://github.com/WisdomShell/codeshell)是北京大学知识计算实验室联合四川天府银行AI团队研发的多语言代码大模型基座。它拥有70亿参数经过对五千亿Tokens的训练并具有8192的上下文窗口长度。CodeShell在权威的代码评估基准HumanEval与MBPP上取得了同等规模最好的性能。这个项目为多语言代码处理和理解提供了有力的工具。
- 此代码仓中实现了一套基于NPU硬件的CodeShell推理模型。配合加速库使用旨在NPU上获得极致的推理性能。
# 特性矩阵
- 此矩阵罗列了CodeShell-7B模型支持的特性
| 模型及参数量 | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16 | Flash Attention | Paged Attention | W8A8量化 | W8A16量化 | W4A16量化 | KV cache量化 | 稀疏量化 | MOE量化 | MindIE | TGI | 长序列 |
|-------------|-------------------------|-------------------------|------|------|-----------------|-----------------|---------|--------------|----------|--------|--------|-----|-----|-----|-----|
| CodeShell-7B | 支持world size 1,2,4,8 | 支持world size 1,2,4 | 是 | 否 | 否 | 是 | 否 | 否 | 否 | 否 | 否 | 否 | 否 | 否 | 否 |
- 此模型仓已适配的模型版本
- [CodeShell-7B](https://huggingface.co/WisdomShell/CodeShell)
# 使用说明
- 执行推理前需要将权重目录下的config.json中的`torch_dtype`改为`"float16"`
- 将config.json中的`model_type`改为`"codeshell"`(修改示例见下文)
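一个最小的修改示例基于Python的json读写字段名取自上文权重路径为占位

```shell
python - <<'EOF'
import json

path = "/path/to/CodeShell/config.json"  # 占位路径,请替换为实际权重路径
with open(path, "r+", encoding="utf-8") as f:
    cfg = json.load(f)
    cfg["torch_dtype"] = "float16"
    cfg["model_type"] = "codeshell"
    f.seek(0)
    json.dump(cfg, f, indent=4)
    f.truncate()
EOF
```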
## 精度测试
- 参考[此README文件](../../../../tests/modeltest/README.md)
## 性能测试
- 参考[此README文件](../../../../tests/modeltest/README.md)
## FAQ
- `import torch_npu`遇到`xxx/libgomp.so.1: cannot allocate memory in static TLS block`报错,可通过配置`LD_PRELOAD`解决。
- 示例:`export LD_PRELOAD=/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD`

View File

@ -0,0 +1,112 @@
# README
- [DeepSeek-Coder](https://github.com/deepseek-ai/DeepSeek-Coder)由一系列代码语言模型组成,提供 1.3B、6.7B、7B 和 33B 多种规格使用者能够选择最适合其需求的版本。当前脚本支持1.3B、6.7B、7B和33B
- 此代码仓中实现了一套基于NPU硬件的Deepseek-Coder模型。配合加速库使用旨在NPU上获得极致的推理性能。
# 特性矩阵
- 此矩阵罗列了各DeepSeek-Coder模型支持的特性
| 模型及参数量 | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16仅800I A2支持 | Flash Attention | Paged Attention | W8A8量化 | W8A16量化 | KV cache量化 | 稀疏量化仅300I DUO支持 | MOE | MindIE | TGI | 长序列 |
|-------------|----------------------------|-----------------------------|------|----------------------|-----------------|-----------------|---------|-----------|--------------|--------------------------|-----|--------|-----|-----|
| DeepSeek-Coder-1.3B | 支持world size 1,2,4,8 | × | × | √ | √ | √ | × | × | × | × | × | × | × |×|
| DeepSeek-Coder-6.7B | 支持world size 1,2,4,8 | 支持world size 2,4 | √ | √ | √ | √ | × | × | × | × | × | × | × |×|
| DeepSeek-Coder-7B | 支持world size 1,2,4,8 | 支持world size 2,4 | √ | √ | √ | √ | × | × | × | × | × | × | × |×|
| DeepSeek-Coder-33B | 支持world size 4,8 | × | × | √ | √ | √ | × | × | × | × | × | × | × |×|
- 此模型仓已适配的模型版本
- [DeepSeek-Coder系列](https://github.com/deepseek-ai/DeepSeek-Coder)
# 使用说明
## 路径变量解释
| 变量名 | 含义 |
|--------|--------------------------------------------------|
| working_dir | 加速库及模型库下载后放置的目录 |
| llm_path | 模型仓所在路径。若使用编译好的包,则路径为`${working_dir}/MindIE-LLM/`若使用gitee下载的代码则路径为`${working_dir}/MindIE-LLM/examples/atb_models` |
| script_path | 脚本所在路径Deepseek-Coder的工作脚本所在路径为`${llm_path}/examples/models/deepseek` |
| weight_path | 模型权重路径 |
## 权重
**权重下载**
- [Deepseek-Coder-1.3B](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct)
- [Deepseek-Coder-6.7B](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
- [Deepseek-Coder-7B](https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5)
- [Deepseek-Coder-33B](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)
**基础环境变量**
- 参考[此README文件](../../../README.md)
**权重转换**
- 参考[此README文件](../../README.md)
**量化权重生成**
- 暂不支持
## 推理
### 对话测试
**运行Paged Attention FP16**
- 运行启动脚本使用chat_template接口transformers版本需求4.34.0
- 在\${llm_path}目录下执行以下指令
```shell
bash ${script_path}/run_pa.sh ${weight_path}
```
- 启动脚本中可设置自定义问题具体在input_text后面修改即可
- 环境变量说明
- `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
- 指定当前机器上可用的逻辑NPU核心多个核心间使用逗号相连
- 核心ID查阅方式见[此README文件](../../README.md)的【启动脚本相关环境变量】章节
- 对于300I DUO卡而言若要使用单卡双芯请指定至少两个可见核心若要使用双卡四芯请指定至少四个可见核心
- 各模型支持的核心数参考“特性矩阵”
- `export MASTER_PORT=20030`
- 设置卡间通信端口
- 默认使用20030端口
- 目的是为了避免同一台机器同时运行多个多卡模型时出现通信冲突
- 设置时端口建议范围为20000-20050
- 以下环境变量与性能和内存优化相关,通常情况下无需修改
```shell
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=0
```
## 精度测试
- 参考[此README文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MAX_MEMORY_GB=29
bash run.sh pa_fp16 full_BoolQ 1 deepseek_coder ${deepseek-coder-1.3b权重路径} 8
bash run.sh pa_fp16 full_BoolQ 1 deepseek_coder ${deepseek-coder-6.7b权重路径} 8
bash run.sh pa_fp16 full_BoolQ 1 deepseek_coder ${deepseek-coder-7b权重路径} 8
bash run.sh pa_fp16 full_BoolQ 1 deepseek_coder ${deepseek-coder-33b权重路径} 8
```
- 运行量化权重和BF16时需注意`${weight_path}/config.json`中的`quantize`字段和`torch_dtype`字段是否与权重匹配,参考[此README文件](../../README.md)
## 性能测试
- 参考[此README文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MAX_MEMORY_GB=29
export ATB_LLM_BENCHMARK_ENABLE=1
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_coder ${deepseek-coder-1.3b权重路径} 8
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_coder ${deepseek-coder-6.7b权重路径} 8
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_coder ${deepseek-coder-7b权重路径} 8
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_coder ${deepseek-coder-33b权重路径} 8
```
- 运行量化权重和BF16时需注意`${weight_path}/config.json`中的`quantize`字段和`torch_dtype`字段是否与权重匹配,参考[此README文件](../../README.md)
## FAQ
- 更多环境变量见[此README文件](../../README.md)
- 对话测试实际执行的Python文件为`${llm_path}/examples/run_pa.py`;这个文件的参数说明见[此README文件](../../README.md)
- 运行时,需要通过指令`pip list | grep protobuf`确认protobuf版本如果版本高于3.20.x请运行指令`pip install protobuf==3.20.0`进行更新

View File

@ -0,0 +1,101 @@
# README
- [DeepSeek-LLM](https://github.com/deepseek-ai/deepseek-LLM)从包含2T token的中英文混合数据集中训练得到提供7B Base、7B Chat、67B Base与67B Chat四种模型
# 支持特性
| 模型及参数量 | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16仅800I A2支持 | Flash Attention | Paged Attention | W8A8量化 | W8A16量化 | KV cache量化 | 稀疏量化仅300I DUO支持 | MOE | MindIE | TGI |长序列|
|------------------|----------------------------|-----------------------------|------|---------------------|-----------------|-----------------|---------|-----------|--------------|------------------------|-----|--------|-----|-----|
| DeepSeek-LLM-7B | 支持world size 1,2,4,8 | 支持world size 1,2,4,8 | √ | × | × | √ | × | × | × | × | × | × | × |× |
| DeepSeek-LLM-67B | 支持world size 8 | × | √ | × | × | √ | × | × | × | × | × | × | × |× |
# 使用说明
## 路径变量解释
| 变量名 | 含义 |
| --------------| --------------------------------|
| `working_dir` | 加速库及模型库下载后放置的目录 |
| `llm_path` | 模型仓所在路径。若使用编译好的包,则路径为`${working_dir}/MindIE-LLM/`;若使用 gitee 下载的代码,则路径为`${working_dir}/MindIE-LLM/examples/atb_models` |
| `script_path` | 脚本所在路径Deepseek-LLM的工作脚本所在路径为`${llm_path}/examples/models/deepseek` |
| `weight_path` | 模型权重路径 |
## 权重
### 权重下载
- [Deepseek-LLM-7B-Base](https://huggingface.co/deepseek-ai/deepseek-llm-7b-base)
- [Deepseek-LLM-7B-Chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat)
- [Deepseek-LLM-67B-Base](https://huggingface.co/deepseek-ai/deepseek-llm-67b-base)
- [Deepseek-LLM-67B-Chat](https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat)
### 权重转换
- 当前仅支持加载safetensor格式的权重文件若权重文件为bin格式请参考[此README文件](../../README.md)
## 基础环境变量
- 参考[此 README 文件](../../../README.md)
## 推理
### 对话测试
**运行 Paged Attention FP16**
- 运行启动脚本(`transformers`版本需求:>=4.35.0
- 在`${llm_path}`目录下执行以下指令
```shell
bash ${script_path}/run_pa.sh ${weight_path}
```
- 启动脚本中可设置自定义问题,具体在 input_text 后面修改即可 (默认问题为"Who is the CEO of Google?")
- 启动脚本中可设置自定义输出长度,具体在 max_output_length 后面修改即可(默认长度为 10
- 若当前所用权重版本为"chat"版本,请将"--is_chat_model"赋值给 extra_param若当前所用权重版本为"base"版本,可以将空字符串赋值给 extra_param默认为 chat_model修改示例见下方环境变量代码块之后
- 环境变量说明
- `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
- 指定当前机器上可用的逻辑 NPU 核心,多个核心间使用逗号相连
- 核心 ID 查阅方式见[此 README 文件](../../README.md)的【启动脚本相关环境变量】章节
- 对于 300I DUO 卡而言,若要使用单卡双芯,请指定至少两个可见核心;若要使用双卡四芯,请指定至少四个可见核心
- 各模型支持的核心数参考“特性矩阵”
- `export MASTER_PORT=20030`
- 设置卡间通信端口
- 默认使用 20030 端口
- 目的是为了避免同一台机器同时运行多个多卡模型时出现通信冲突
- 设置时端口建议范围为20000-20050
- 以下环境变量与性能和内存优化相关,通常情况下无需修改
```shell
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=1
export INT8_FORMAT_NZ_ENABLE=1
```
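- 一个假设性的自定义示例(变量名取自上文说明,具体以`${script_path}/run_pa.sh`实际内容为准):
  ```shell
  # 在${script_path}/run_pa.sh中按需修改以下变量
  extra_param=""                        # base版本权重置空chat版本保留"--is_chat_model"
  input_text="What is deep learning?"   # 自定义问题
  max_output_length=64                  # 自定义输出长度
  ```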
## 精度测试
- 参考[此 README 文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MAX_MEMORY_GB=29
bash run.sh pa_fp16 full_BoolQ 1 deepseek_llm ${deepseek-llm-7b-base权重路径} 2
bash run.sh pa_fp16 full_BoolQ 1 deepseek_llm ${deepseek-llm-67b-base权重路径} 8
```
## 性能测试
- 参考[此 README 文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MAX_MEMORY_GB=29
export ATB_LLM_BENCHMARK_ENABLE=1
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_llm ${deepseek-llm-7b-base权重路径} 2
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek_llm ${deepseek-llm-67b-base权重路径} 8
```
## FAQ
- 更多环境变量见[此 README 文件](../../README.md)
- 对话测试实际执行的 Python 文件为`${llm_path}/examples/run_pa.py`;这个文件的参数说明见[此 README 文件](../../README.md)
- 运行时,需要通过指令`pip list | grep protobuf`确认`protobuf`版本,如果版本高于 3.20.x请运行指令`pip install protobuf==3.20.0`进行更新

View File

@ -0,0 +1,103 @@
# README
- DeepSeekMoE 16B 是具有 16.4B 参数的混合专家MoE语言模型主要采用专家细分和共享专家两个创新策略。此模型仅需 DeepSeek 7B 和 Llama2 7B 约40%的计算量,即可取得与二者相当的精度。(当前脚本支持 16B-Base 和 16B-Chat
- 此代码仓中实现了一套基于 NPU 硬件的 Deepseek-MoE 模型。配合加速库使用,旨在 NPU 上获得极致的推理性能。
# 特性矩阵
- 此矩阵罗列了各DeepSeek-MoE模型支持的特性
| 模型及参数量 | 800I A2 Tensor Parallelism | 300I DUO Tensor Parallelism | FP16 | BF16仅800I A2支持 | Flash Attention | Paged Attention | W8A8量化 | W8A16量化 | W4A16量化 |KV cache量化 | 稀疏量化仅300I DUO支持 | MindIE | TGI | 长序列 |
|-------------|----------------------------|-----------------------------|------|----------------------|-----------------|-----------------|---------|-----------|-----------|--------------|--------------------------|--------|-----|-----|
| DeepSeek-MoE-16B-Chat | 支持world size 4,8 | × | √ | × | √ | √ | × | × | × | × | × | √ | × | × |
# 使用说明
## 路径变量解释
| 变量名 | 含义 |
| ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| working_dir | 加速库及模型库下载后放置的目录 |
| llm_path | 模型仓所在路径。若使用编译好的包,则路径为`${working_dir}/MindIE-LLM/`;若使用 gitee 下载的代码,则路径为`${working_dir}/MindIE-LLM/examples/atb_models` |
| script_path | 脚本所在路径Deepseek-MoE 的工作脚本所在路径为`${llm_path}/examples/models/deepseek` |
| weight_path | 模型权重路径 |
## 权重
**权重下载**
- [Deepseek-MoE-16B-Base](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base)
- [Deepseek-MoE-16B-Chat](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat)
**基础环境变量**
- 参考[此 README 文件](../../../README.md)
## 推理
### 对话测试
**运行 Paged Attention FP16**
- 运行启动脚本transformers版本需求4.36.2
- 在\${llm_path}目录下执行以下指令
```shell
bash ${script_path}/run_pa_deepseek_moe.sh ${weight_path}
```
- 启动脚本中可设置自定义问题,具体在 input_text 后面修改即可 (默认问题为"Who is the CEO of Google?")
- 启动脚本中可设置自定义输出长度,具体在 max_output_length 后面修改即可(默认长度为 10
- 若当前所用权重版本为"chat"版本,请将"--is_chat_model"赋值给 extra_param若当前所用权重版本为"base"版本,可以将空字符串赋值给 extra_param默认为 chat_model
- 环境变量说明
- `export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`
- 指定当前机器上可用的逻辑 NPU 核心,多个核心间使用逗号相连
- 核心 ID 查阅方式见[此 README 文件](../../README.md)的【启动脚本相关环境变量】章节
- 对于 300I DUO 卡而言,若要使用单卡双芯,请指定至少两个可见核心;若要使用双卡四芯,请指定至少四个可见核心
- 各模型支持的核心数参考“特性矩阵”
- `export MASTER_PORT=20030`
- 设置卡间通信端口
- 默认使用 20030 端口
- 目的是为了避免同一台机器同时运行多个多卡模型时出现通信冲突
- 设置时端口建议范围为20000-20050
- 以下环境变量与性能和内存优化相关,通常情况下无需修改
```shell
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=1
export INT8_FORMAT_NZ_ENABLE=1
export ATB_LLM_ENABLE_AUTO_TRANSPOSE=0
```
## 精度测试
- 参考[此 README 文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MAX_MEMORY_GB=29
bash run.sh pa_fp16 full_BoolQ 1 deepseek ${deepseek-moe-16b-base权重路径} 8
bash run.sh pa_fp16 full_BoolQ 1 deepseek ${deepseek-moe-16b-chat权重路径} 8
```
## 性能测试
- 参考[此 README 文件](../../../tests/modeltest/README.md)
- 示例
```shell
cd ${llm_path}/tests/modeltest
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MAX_MEMORY_GB=29
export ATB_LLM_BENCHMARK_ENABLE=1
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek ${deepseek-moe-16b-base权重路径} 8
bash run.sh pa_fp16 performance [[2048,2048],[1024,1024],[512,512],[256,256]] 1 deepseek ${deepseek-moe-16b-chat权重路径} 8
```
## FAQ
- 更多环境变量见[此 README 文件](../../README.md)
- 对话测试实际执行的 Python 文件为`${llm_path}/examples/run_pa.py`;这个文件的参数说明见[此 README 文件](../../README.md)
- 运行时,需要通过指令 `pip list | grep protobuf` 确认 protobuf 版本,如果版本高于 3.20.x请运行指令 `pip install protobuf==3.20.0` 进行更新

View File

@ -0,0 +1,26 @@
# Copyright Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
# 参数配置以及启动指令的说明见同级目录下的README.md文件
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MASTER_PORT=20030
# 以下环境变量与性能和内存优化相关,通常情况下无需修改
export ATB_LAYER_INTERNAL_TENSOR_REUSE=1
export INF_NAN_MODE_ENABLE=0
export ATB_OPERATION_EXECUTE_ASYNC=1
export TASK_QUEUE_ENABLE=1
export ATB_CONVERT_NCHW_TO_ND=1
export LCCL_ENABLE_FALLBACK=1
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
export ATB_CONTEXT_WORKSPACE_SIZE=1
export INT8_FORMAT_NZ_ENABLE=1
extra_param="--is_chat_model"
world_size=$(($(echo "${ASCEND_RT_VISIBLE_DEVICES}" | grep -o , | wc -l) +1))
if [ "$TP_WORLD_SIZE" == "1" ]; then
python -m examples.run_pa --model_path $1 $extra_param
else
torchrun --nproc_per_node $world_size --master_port $MASTER_PORT -m examples.run_pa --model_path $1 $extra_param
fi
