train: qwen

wql 2024-09-19 15:11:35 +08:00
parent e99fab5e4c
commit 5a306611bc
30 changed files with 307468 additions and 0 deletions


@@ -0,0 +1,66 @@
---
base_model: ../../../models/qwen
library_name: peft
license: other
tags:
- llama-factory
- lora
- generated_from_trainer
model-index:
- name: lora_sft_Qwen-7B_8_gpu_500_step_20240919142813
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# lora_sft_Qwen-7B_8_gpu_500_step_20240919142813
This model is a fine-tuned version of the local checkpoint at `../../../models/qwen` (Qwen-7B) on the belle_1m dataset.
It achieves the following results on the evaluation set:
- Loss: 1.2722
- Num Input Tokens Seen: 1321104
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 500
- mixed_precision_training: Native AMP
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 1.23 | 0.8889 | 500 | 1.2722 | 1321104 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.43.4
- Pytorch 2.1.0
- Datasets 2.20.0
- Tokenizers 0.19.1


@@ -0,0 +1,31 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "../../../models/qwen",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0.0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "c_proj",
    "w1",
    "c_attn",
    "w2"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
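
For reference — a minimal sketch, not taken from this commit's scripts — the same adapter settings expressed with the public `peft` API:

```python
from peft import LoraConfig

# LoRA settings mirroring adapter_config.json above.
lora_config = LoraConfig(
    r=8,                     # LoRA rank
    lora_alpha=16,           # scaling numerator (effective scale alpha/r = 2)
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["c_proj", "w1", "c_attn", "w2"],  # Qwen-7B attention + MLP projections
)
```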


@@ -0,0 +1,14 @@
{
  "epoch": 0.8888888888888888,
  "eval_loss": 1.272219181060791,
  "eval_runtime": 56.8054,
  "eval_samples_per_second": 17.604,
  "eval_steps_per_second": 8.802,
  "num_input_tokens_seen": 1321104,
  "total_flos": 5.641287949968998e+16,
  "train_loss": 1.3309755539894104,
  "train_runtime": 1830.2469,
  "train_samples_per_second": 4.371,
  "train_steps_per_second": 0.273,
  "train_tokens_per_second": 1084.007
}
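
These numbers are internally consistent with the training configuration later in this commit: with max_samples 10000 and val_size 0.1, training sees 9000 examples and evaluation 1000; 500 steps at an effective batch of 16 (per_device_train_batch_size 2 × gradient_accumulation_steps 8) cover 8000 examples, i.e. 8000 / 9000 ≈ 0.8889 of an epoch, matching the epoch field. Likewise, eval_samples_per_second × eval_runtime ≈ 17.604 × 56.8 ≈ 1000 recovers the evaluation set size.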


@@ -0,0 +1,202 @@
---
base_model: ../../../models/qwen
library_name: peft
---
# Model Card for Model ID
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Downstream Use [optional]
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
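A minimal loading sketch, assuming the base-model path and the training `output_dir` used elsewhere in this commit (both paths are assumptions; adjust to your checkout):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "../../../models/qwen"  # local Qwen-7B base checkpoint
adapter = "./results/910b/lora_sft_Qwen-7B_8_gpu_500_step_20240919142813"  # training output_dir

tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.float16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter)  # attach the LoRA weights
model.eval()
```

For deployment, `model.merge_and_unload()` can bake the adapter into the base weights.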
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[More Information Needed]
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing [optional]
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
#### Speeds, Sizes, Times [optional]
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
[More Information Needed]
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
[More Information Needed]
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
[More Information Needed]
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
#### Summary
## Model Examination [optional]
<!-- Relevant interpretability work for the model goes here -->
[More Information Needed]
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications [optional]
### Model Architecture and Objective
[More Information Needed]
### Compute Infrastructure
[More Information Needed]
#### Hardware
[More Information Needed]
#### Software
[More Information Needed]
## Citation [optional]
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
[More Information Needed]
### Framework versions
- PEFT 0.12.0


@@ -0,0 +1,31 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "../../../models/qwen",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0.0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "c_proj",
    "w1",
    "c_attn",
    "w2"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}


@@ -0,0 +1,10 @@
{
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|im_end|>"
}


@@ -0,0 +1,276 @@
# Copyright (c) Alibaba Cloud.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

"""Tokenization classes for QWen."""

import base64
import logging
import os
import unicodedata
from typing import Collection, Dict, List, Set, Tuple, Union

import tiktoken
from transformers import PreTrainedTokenizer, AddedToken

logger = logging.getLogger(__name__)


VOCAB_FILES_NAMES = {"vocab_file": "qwen.tiktoken"}

PAT_STR = r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
ENDOFTEXT = "<|endoftext|>"
IMSTART = "<|im_start|>"
IMEND = "<|im_end|>"
# as the default behavior is changed to allow special tokens in
# regular texts, the surface forms of special tokens need to be
# as different as possible to minimize the impact
EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
# changed to use actual index to avoid misconfiguration with vocabulary expansion
SPECIAL_START_ID = 151643
SPECIAL_TOKENS = tuple(
    enumerate(
        (
            (
                ENDOFTEXT,
                IMSTART,
                IMEND,
            )
            + EXTRAS
        ),
        start=SPECIAL_START_ID,
    )
)
SPECIAL_TOKENS_SET = set(t for i, t in SPECIAL_TOKENS)


def _load_tiktoken_bpe(tiktoken_bpe_file: str) -> Dict[bytes, int]:
    with open(tiktoken_bpe_file, "rb") as f:
        contents = f.read()
    return {
        base64.b64decode(token): int(rank)
        for token, rank in (line.split() for line in contents.splitlines() if line)
    }


class QWenTokenizer(PreTrainedTokenizer):
    """QWen tokenizer."""

    vocab_files_names = VOCAB_FILES_NAMES

    def __init__(
        self,
        vocab_file,
        errors="replace",
        extra_vocab_file=None,
        **kwargs,
    ):
        super().__init__(**kwargs)

        # how to handle errors in decoding UTF-8 byte sequences
        # use ignore if you are in streaming inference
        self.errors = errors

        self.mergeable_ranks = _load_tiktoken_bpe(vocab_file)  # type: Dict[bytes, int]
        self.special_tokens = {
            token: index
            for index, token in SPECIAL_TOKENS
        }

        # try load extra vocab from file
        if extra_vocab_file is not None:
            used_ids = set(self.mergeable_ranks.values()) | set(self.special_tokens.values())
            extra_mergeable_ranks = _load_tiktoken_bpe(extra_vocab_file)
            for token, index in extra_mergeable_ranks.items():
                if token in self.mergeable_ranks:
                    logger.info(f"extra token {token} exists, skipping")
                    continue
                if index in used_ids:
                    logger.info(f'the index {index} for extra token {token} exists, skipping')
                    continue
                self.mergeable_ranks[token] = index
            # the index may be sparse after this, but don't worry tiktoken.Encoding will handle this

        enc = tiktoken.Encoding(
            "Qwen",
            pat_str=PAT_STR,
            mergeable_ranks=self.mergeable_ranks,
            special_tokens=self.special_tokens,
        )
        assert (
            len(self.mergeable_ranks) + len(self.special_tokens) == enc.n_vocab
        ), f"{len(self.mergeable_ranks) + len(self.special_tokens)} != {enc.n_vocab} in encoding"

        self.decoder = {
            v: k for k, v in self.mergeable_ranks.items()
        }  # type: dict[int, bytes|str]
        self.decoder.update({v: k for k, v in self.special_tokens.items()})

        self.tokenizer = enc  # type: tiktoken.Encoding

        self.eod_id = self.tokenizer.eot_token
        self.im_start_id = self.special_tokens[IMSTART]
        self.im_end_id = self.special_tokens[IMEND]

    def __getstate__(self):
        # for pickle lovers
        state = self.__dict__.copy()
        del state["tokenizer"]
        return state

    def __setstate__(self, state):
        # tokenizer is not python native; don't pass it; rebuild it
        self.__dict__.update(state)
        enc = tiktoken.Encoding(
            "Qwen",
            pat_str=PAT_STR,
            mergeable_ranks=self.mergeable_ranks,
            special_tokens=self.special_tokens,
        )
        self.tokenizer = enc

    def __len__(self) -> int:
        return self.tokenizer.n_vocab

    def get_vocab(self) -> Dict[bytes, int]:
        return self.mergeable_ranks

    def convert_tokens_to_ids(
        self, tokens: Union[bytes, str, List[Union[bytes, str]]]
    ) -> List[int]:
        ids = []
        if isinstance(tokens, (str, bytes)):
            if tokens in self.special_tokens:
                return self.special_tokens[tokens]
            else:
                return self.mergeable_ranks.get(tokens)
        for token in tokens:
            if token in self.special_tokens:
                ids.append(self.special_tokens[token])
            else:
                ids.append(self.mergeable_ranks.get(token))
        return ids

    def _add_tokens(
        self,
        new_tokens: Union[List[str], List[AddedToken]],
        special_tokens: bool = False,
    ) -> int:
        if not special_tokens and new_tokens:
            raise ValueError("Adding regular tokens is not supported")
        for token in new_tokens:
            surface_form = token.content if isinstance(token, AddedToken) else token
            if surface_form not in SPECIAL_TOKENS_SET:
                raise ValueError("Adding unknown special tokens is not supported")
        return 0

    def save_vocabulary(self, save_directory: str, **kwargs) -> Tuple[str]:
        """
        Save only the vocabulary of the tokenizer (vocabulary).

        Returns:
            `Tuple(str)`: Paths to the files saved.
        """
        file_path = os.path.join(save_directory, "qwen.tiktoken")
        with open(file_path, "w", encoding="utf8") as w:
            for k, v in self.mergeable_ranks.items():
                line = base64.b64encode(k).decode("utf8") + " " + str(v) + "\n"
                w.write(line)
        return (file_path,)

    def tokenize(
        self,
        text: str,
        allowed_special: Union[Set, str] = "all",
        disallowed_special: Union[Collection, str] = (),
        **kwargs,
    ) -> List[Union[bytes, str]]:
        """
        Converts a string into a sequence of tokens.

        Args:
            text (`str`):
                The sequence to be encoded.
            allowed_special (`Literal["all"]` or `set`):
                The surface forms of the tokens to be encoded as special tokens in regular texts.
                Defaults to "all".
            disallowed_special (`Literal["all"]` or `Collection`):
                The surface forms of the tokens that should not be in regular texts and trigger errors.
                Defaults to an empty tuple.
            kwargs (additional keyword arguments, *optional*):
                Will be passed to the underlying model specific encode method.

        Returns:
            `List[bytes|str]`: The list of tokens.
        """
        tokens = []
        text = unicodedata.normalize("NFC", text)

        # this implementation takes a detour: text -> token id -> token surface forms
        for t in self.tokenizer.encode(
            text, allowed_special=allowed_special, disallowed_special=disallowed_special
        ):
            tokens.append(self.decoder[t])
        return tokens

    def convert_tokens_to_string(self, tokens: List[Union[bytes, str]]) -> str:
        """
        Converts a sequence of tokens into a single string.
        """
        text = ""
        temp = b""
        for t in tokens:
            if isinstance(t, str):
                if temp:
                    text += temp.decode("utf-8", errors=self.errors)
                    temp = b""
                text += t
            elif isinstance(t, bytes):
                temp += t
            else:
                raise TypeError("token should only be of type bytes or str")
        if temp:
            text += temp.decode("utf-8", errors=self.errors)
        return text

    @property
    def vocab_size(self):
        return self.tokenizer.n_vocab

    def _convert_id_to_token(self, index: int) -> Union[bytes, str]:
        """Converts an id to a token, special tokens included"""
        if index in self.decoder:
            return self.decoder[index]
        raise ValueError("unknown ids")

    def _convert_token_to_id(self, token: Union[bytes, str]) -> int:
        """Converts a token to an id using the vocab, special tokens included"""
        if token in self.special_tokens:
            return self.special_tokens[token]
        if token in self.mergeable_ranks:
            return self.mergeable_ranks[token]
        raise ValueError("unknown token")

    def _tokenize(self, text: str, **kwargs):
        """
        Converts a string into a sequence of tokens (string), using the tokenizer. Split in words for word-based
        vocabulary or sub-words for sub-word-based vocabularies (BPE/SentencePieces/WordPieces).

        Do NOT take care of added tokens.
        """
        raise NotImplementedError

    def _decode(
        self,
        token_ids: Union[int, List[int]],
        skip_special_tokens: bool = False,
        errors: str = None,
        **kwargs,
    ) -> str:
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        if skip_special_tokens:
            token_ids = [i for i in token_ids if i < self.eod_id]
        return self.tokenizer.decode(token_ids, errors=errors or self.errors)
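
For orientation, a minimal round-trip with this tokenizer — a sketch assuming `qwen.tiktoken` sits next to the module and `tiktoken` is installed:

```python
from tokenization_qwen import QWenTokenizer  # module name per tokenizer_config.json

tok = QWenTokenizer(vocab_file="qwen.tiktoken")
tokens = tok.tokenize("hello world")      # token surface forms (bytes), via tiktoken
ids = tok.convert_tokens_to_ids(tokens)   # integer ids
assert tok.decode(ids) == "hello world"
print(tok.im_start_id, tok.im_end_id)     # 151644 151645, per SPECIAL_START_ID
```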


@@ -0,0 +1,17 @@
{
  "added_tokens_decoder": {},
  "auto_map": {
    "AutoTokenizer": [
      "tokenization_qwen.QWenTokenizer",
      null
    ]
  },
  "chat_template": "{% set system_message = 'You are a helpful assistant.' %}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% endif %}{% if system_message is defined %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{% endif %}{% for message in loop_messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user\n' + content + '<|im_end|>\n<|im_start|>assistant\n' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '\n' }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|im_end|>",
  "model_max_length": 32768,
  "pad_token": "<|im_end|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "QWenTokenizer"
}
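
The `chat_template` above is Qwen's ChatML format; note that `<|im_start|>assistant\n` is emitted as part of each user turn, so no separate generation prompt is needed. A minimal rendering sketch (the path is an assumption):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("../../../models/qwen", trust_remote_code=True)

messages = [{"role": "user", "content": "Hello!"}]
text = tok.apply_chat_template(messages, tokenize=False)
# <|im_start|>system\nYou are a helpful assistant.<|im_end|>\n
# <|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n
print(text)
```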


@@ -0,0 +1,8 @@
{
  "epoch": 0.8888888888888888,
  "eval_loss": 1.272219181060791,
  "eval_runtime": 56.8054,
  "eval_samples_per_second": 17.604,
  "eval_steps_per_second": 8.802,
  "num_input_tokens_seen": 1321104
}


@@ -0,0 +1,231 @@
[2024-09-19 14:28:31,296] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to npu (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
09/19/2024 14:28:36 - INFO - llamafactory.hparams.parser - Process rank: 0, device: npu:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
09/19/2024 14:28:37 - INFO - llamafactory.data.template - Add eos token: <|im_end|>
09/19/2024 14:28:37 - INFO - llamafactory.data.template - Add pad token: <|im_end|>
09/19/2024 14:28:37 - INFO - llamafactory.data.loader - Loading dataset AI-ModelScope/train_1M_CN...
training example:
input_ids:
[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 104317, 89012, 22382, 106096, 64471, 101137, 72881, 102648, 46448, 1773, 62244, 107132, 37945, 99553, 25177, 101898, 8997, 100431, 99639, 113773, 9370, 111749, 25, 330, 100012, 105435, 99487, 100220, 3837, 104817, 44063, 99553, 102322, 20074, 33108, 116993, 3837, 23031, 104022, 100147, 101313, 1773, 698, 151645, 198, 151644, 77091, 198, 99487, 111749, 101137, 72881, 102648, 46448, 1773, 151645]
inputs:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
判断给定的文章是否符合语法规则。如果不符合,请提供修改建议。
下面是一篇文章的开头: "为了探讨这个主题,本文将提供一系列数据和实例,以证明这一观点。"
<|im_end|>
<|im_start|>assistant
这个开头符合语法规则。<|im_end|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 99487, 111749, 101137, 72881, 102648, 46448, 1773, 151645]
labels:
这个开头符合语法规则。<|im_end|>
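
In the example above, the user asks (in Chinese) whether a sample opening sentence is grammatical and the assistant answers that it is. Every prompt position in label_ids (the system turn, the user turn, and the assistant header) is -100, PyTorch's default ignore_index for cross-entropy, so the loss is computed only on the assistant reply and its closing <|im_end|>. A minimal sketch of that masking, assuming prompt/response id lists like those printed here:

```python
IGNORE_INDEX = -100  # skipped by torch.nn.CrossEntropyLoss by default

def build_labels(prompt_ids, response_ids):
    # Supervise only the response: every prompt token contributes no loss.
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
```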
09/19/2024 14:33:15 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
09/19/2024 14:33:15 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
09/19/2024 14:33:15 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
09/19/2024 14:33:15 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
09/19/2024 14:33:15 - INFO - llamafactory.model.model_utils.misc - Found linear modules: c_proj,w1,c_attn,w2
09/19/2024 14:33:16 - INFO - llamafactory.model.loader - trainable params: 17,891,328 || all params: 7,739,215,872 || trainable%: 0.2312
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 32768.0
{'loss': 1.5189, 'grad_norm': 0.8999722599983215, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.01, 'num_input_tokens_seen': 9808}
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 16384.0
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 8192.0
{'loss': 1.5574, 'grad_norm': nan, 'learning_rate': 6e-06, 'epoch': 0.01, 'num_input_tokens_seen': 19312}
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 4096.0
{'loss': 1.5909, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.02, 'num_input_tokens_seen': 29232}
{'loss': 1.8082, 'grad_norm': 0.6828662753105164, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.02, 'num_input_tokens_seen': 37984}
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 2048.0
{'loss': 1.7089, 'grad_norm': 1.1571168899536133, 'learning_rate': 2e-05, 'epoch': 0.03, 'num_input_tokens_seen': 44592}
{'loss': 1.7715, 'grad_norm': 6.552746772766113, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.03, 'num_input_tokens_seen': 52400}
{'loss': 1.5624, 'grad_norm': 1.3311208486557007, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.04, 'num_input_tokens_seen': 60320}
{'loss': 1.9382, 'grad_norm': 1.941733479499817, 'learning_rate': 3.8e-05, 'epoch': 0.04, 'num_input_tokens_seen': 67024}
{'loss': 1.5455, 'grad_norm': 1.579136848449707, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.05, 'num_input_tokens_seen': 73776}
{'loss': 1.6186, 'grad_norm': 1.887698769569397, 'learning_rate': 5e-05, 'epoch': 0.05, 'num_input_tokens_seen': 82592}
{'loss': 1.5182, 'grad_norm': 1.30510413646698, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.06, 'num_input_tokens_seen': 90512}
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 1024.0
{'loss': 1.4462, 'grad_norm': 1.3481446504592896, 'learning_rate': 6e-05, 'epoch': 0.06, 'num_input_tokens_seen': 96848}
{'loss': 1.3399, 'grad_norm': 1.5753204822540283, 'learning_rate': 6.6e-05, 'epoch': 0.07, 'num_input_tokens_seen': 103728}
{'loss': 1.3188, 'grad_norm': 0.7255751490592957, 'learning_rate': 7.2e-05, 'epoch': 0.07, 'num_input_tokens_seen': 112160}
{'loss': 1.535, 'grad_norm': 0.7835249900817871, 'learning_rate': 7.800000000000001e-05, 'epoch': 0.08, 'num_input_tokens_seen': 117984}
{'loss': 1.363, 'grad_norm': 0.6611946225166321, 'learning_rate': 8.4e-05, 'epoch': 0.09, 'num_input_tokens_seen': 126624}
{'loss': 1.5687, 'grad_norm': 0.9039588570594788, 'learning_rate': 9e-05, 'epoch': 0.09, 'num_input_tokens_seen': 134496}
{'loss': 1.6227, 'grad_norm': 0.9094035029411316, 'learning_rate': 9.6e-05, 'epoch': 0.1, 'num_input_tokens_seen': 141664}
{'loss': 1.3263, 'grad_norm': 0.5215123891830444, 'learning_rate': 9.999878153526974e-05, 'epoch': 0.1, 'num_input_tokens_seen': 149360}
{'loss': 1.3676, 'grad_norm': 0.5413862466812134, 'learning_rate': 9.998050575201771e-05, 'epoch': 0.11, 'num_input_tokens_seen': 157776}
{'loss': 1.4657, 'grad_norm': 1.1176000833511353, 'learning_rate': 9.99403068670717e-05, 'epoch': 0.11, 'num_input_tokens_seen': 165360}
{'loss': 1.3857, 'grad_norm': 1.301885962486267, 'learning_rate': 9.987820251299122e-05, 'epoch': 0.12, 'num_input_tokens_seen': 173152}
{'loss': 1.549, 'grad_norm': 0.8806378841400146, 'learning_rate': 9.979421993079852e-05, 'epoch': 0.12, 'num_input_tokens_seen': 181504}
{'loss': 1.4207, 'grad_norm': 1.1323336362838745, 'learning_rate': 9.968839595802982e-05, 'epoch': 0.13, 'num_input_tokens_seen': 188880}
{'loss': 1.4148, 'grad_norm': 0.5949046015739441, 'learning_rate': 9.956077701257709e-05, 'epoch': 0.13, 'num_input_tokens_seen': 195952}
{'loss': 1.2252, 'grad_norm': 2.3640127182006836, 'learning_rate': 9.941141907232765e-05, 'epoch': 0.14, 'num_input_tokens_seen': 203744}
{'loss': 1.1623, 'grad_norm': 0.46419885754585266, 'learning_rate': 9.924038765061042e-05, 'epoch': 0.14, 'num_input_tokens_seen': 212624}
{'loss': 1.5017, 'grad_norm': 2.0805838108062744, 'learning_rate': 9.904775776745958e-05, 'epoch': 0.15, 'num_input_tokens_seen': 219968}
{'loss': 1.2643, 'grad_norm': 0.38017725944519043, 'learning_rate': 9.88336139167084e-05, 'epoch': 0.15, 'num_input_tokens_seen': 229024}
{'loss': 1.2085, 'grad_norm': 0.5102754831314087, 'learning_rate': 9.859805002892732e-05, 'epoch': 0.16, 'num_input_tokens_seen': 236768}
{'loss': 1.333, 'grad_norm': 0.4705878496170044, 'learning_rate': 9.834116943022298e-05, 'epoch': 0.17, 'num_input_tokens_seen': 245152}
{'loss': 1.5669, 'grad_norm': 0.8111830353736877, 'learning_rate': 9.806308479691595e-05, 'epoch': 0.17, 'num_input_tokens_seen': 251728}
{'loss': 1.317, 'grad_norm': 0.5702469348907471, 'learning_rate': 9.776391810611718e-05, 'epoch': 0.18, 'num_input_tokens_seen': 260832}
{'loss': 1.3829, 'grad_norm': 0.9620650410652161, 'learning_rate': 9.744380058222483e-05, 'epoch': 0.18, 'num_input_tokens_seen': 268448}
{'loss': 1.2588, 'grad_norm': 0.8848989605903625, 'learning_rate': 9.710287263936484e-05, 'epoch': 0.19, 'num_input_tokens_seen': 276016}
{'loss': 1.3142, 'grad_norm': 0.7803770899772644, 'learning_rate': 9.674128381980072e-05, 'epoch': 0.19, 'num_input_tokens_seen': 282416}
{'loss': 1.4468, 'grad_norm': 0.6126624941825867, 'learning_rate': 9.635919272833938e-05, 'epoch': 0.2, 'num_input_tokens_seen': 290016}
{'loss': 1.5577, 'grad_norm': 1.1495459079742432, 'learning_rate': 9.595676696276172e-05, 'epoch': 0.2, 'num_input_tokens_seen': 297952}
{'loss': 1.3545, 'grad_norm': 0.4600121080875397, 'learning_rate': 9.553418304030886e-05, 'epoch': 0.21, 'num_input_tokens_seen': 305152}
{'loss': 1.2359, 'grad_norm': 0.6911833882331848, 'learning_rate': 9.50916263202557e-05, 'epoch': 0.21, 'num_input_tokens_seen': 312592}
{'loss': 1.2377, 'grad_norm': 0.800264298915863, 'learning_rate': 9.462929092260628e-05, 'epoch': 0.22, 'num_input_tokens_seen': 319792}
{'loss': 1.402, 'grad_norm': 0.37946560978889465, 'learning_rate': 9.414737964294636e-05, 'epoch': 0.22, 'num_input_tokens_seen': 327776}
{'loss': 1.3244, 'grad_norm': 0.6147738695144653, 'learning_rate': 9.364610386349049e-05, 'epoch': 0.23, 'num_input_tokens_seen': 336432}
{'loss': 1.3333, 'grad_norm': 0.4380050301551819, 'learning_rate': 9.312568346036288e-05, 'epoch': 0.23, 'num_input_tokens_seen': 343216}
{'loss': 1.2563, 'grad_norm': 0.8881791234016418, 'learning_rate': 9.258634670715238e-05, 'epoch': 0.24, 'num_input_tokens_seen': 350096}
{'loss': 1.3305, 'grad_norm': 0.39159095287323, 'learning_rate': 9.202833017478422e-05, 'epoch': 0.25, 'num_input_tokens_seen': 359040}
{'loss': 1.2371, 'grad_norm': 0.4823763966560364, 'learning_rate': 9.145187862775209e-05, 'epoch': 0.25, 'num_input_tokens_seen': 367568}
{'loss': 1.2544, 'grad_norm': 0.7489269971847534, 'learning_rate': 9.085724491675642e-05, 'epoch': 0.26, 'num_input_tokens_seen': 375552}
{'loss': 1.3928, 'grad_norm': 0.7432783842086792, 'learning_rate': 9.02446898677957e-05, 'epoch': 0.26, 'num_input_tokens_seen': 385712}
{'loss': 1.1703, 'grad_norm': 0.6810922622680664, 'learning_rate': 8.961448216775954e-05, 'epoch': 0.27, 'num_input_tokens_seen': 393344}
{'loss': 1.3133, 'grad_norm': 0.5105797648429871, 'learning_rate': 8.896689824657372e-05, 'epoch': 0.27, 'num_input_tokens_seen': 400720}
{'loss': 1.2321, 'grad_norm': 0.7406936287879944, 'learning_rate': 8.83022221559489e-05, 'epoch': 0.28, 'num_input_tokens_seen': 409056}
{'loss': 1.3816, 'grad_norm': 0.7281469106674194, 'learning_rate': 8.762074544478623e-05, 'epoch': 0.28, 'num_input_tokens_seen': 416144}
{'loss': 1.1577, 'grad_norm': 1.008775234222412, 'learning_rate': 8.692276703129421e-05, 'epoch': 0.29, 'num_input_tokens_seen': 422880}
{'loss': 1.2287, 'grad_norm': 0.9613437652587891, 'learning_rate': 8.620859307187339e-05, 'epoch': 0.29, 'num_input_tokens_seen': 429440}
{'loss': 1.2672, 'grad_norm': 0.556431233882904, 'learning_rate': 8.547853682682604e-05, 'epoch': 0.3, 'num_input_tokens_seen': 437200}
{'loss': 1.299, 'grad_norm': 0.6057170629501343, 'learning_rate': 8.473291852294987e-05, 'epoch': 0.3, 'num_input_tokens_seen': 446416}
{'loss': 1.4358, 'grad_norm': 0.772670567035675, 'learning_rate': 8.397206521307584e-05, 'epoch': 0.31, 'num_input_tokens_seen': 454672}
{'loss': 1.2488, 'grad_norm': 0.5777968168258667, 'learning_rate': 8.319631063261209e-05, 'epoch': 0.31, 'num_input_tokens_seen': 463216}
{'loss': 1.2692, 'grad_norm': 0.7657284140586853, 'learning_rate': 8.240599505315655e-05, 'epoch': 0.32, 'num_input_tokens_seen': 472432}
{'loss': 1.2339, 'grad_norm': 0.7620649337768555, 'learning_rate': 8.160146513324254e-05, 'epoch': 0.33, 'num_input_tokens_seen': 479968}
{'loss': 1.4274, 'grad_norm': 0.50833660364151, 'learning_rate': 8.07830737662829e-05, 'epoch': 0.33, 'num_input_tokens_seen': 488112}
{'loss': 1.3689, 'grad_norm': 0.692903459072113, 'learning_rate': 7.99511799257793e-05, 'epoch': 0.34, 'num_input_tokens_seen': 495808}
{'loss': 1.344, 'grad_norm': 0.8289609551429749, 'learning_rate': 7.910614850786448e-05, 'epoch': 0.34, 'num_input_tokens_seen': 503216}
{'loss': 1.2793, 'grad_norm': 0.5638831853866577, 'learning_rate': 7.82483501712469e-05, 'epoch': 0.35, 'num_input_tokens_seen': 511568}
{'loss': 1.5863, 'grad_norm': 0.5944852828979492, 'learning_rate': 7.737816117462752e-05, 'epoch': 0.35, 'num_input_tokens_seen': 519536}
{'loss': 1.5119, 'grad_norm': 0.6509004831314087, 'learning_rate': 7.649596321166024e-05, 'epoch': 0.36, 'num_input_tokens_seen': 526576}
{'loss': 1.335, 'grad_norm': 0.5436453819274902, 'learning_rate': 7.560214324352858e-05, 'epoch': 0.36, 'num_input_tokens_seen': 533824}
{'loss': 1.2377, 'grad_norm': 0.45730966329574585, 'learning_rate': 7.469709332921155e-05, 'epoch': 0.37, 'num_input_tokens_seen': 541472}
{'loss': 1.1963, 'grad_norm': 0.5066884160041809, 'learning_rate': 7.378121045351378e-05, 'epoch': 0.37, 'num_input_tokens_seen': 550144}
{'loss': 1.3623, 'grad_norm': 0.43720710277557373, 'learning_rate': 7.285489635293472e-05, 'epoch': 0.38, 'num_input_tokens_seen': 557872}
{'loss': 1.2955, 'grad_norm': 0.6945408582687378, 'learning_rate': 7.191855733945387e-05, 'epoch': 0.38, 'num_input_tokens_seen': 563728}
{'loss': 1.2986, 'grad_norm': 1.6284866333007812, 'learning_rate': 7.097260412230886e-05, 'epoch': 0.39, 'num_input_tokens_seen': 571056}
{'loss': 1.0992, 'grad_norm': 0.6772762537002563, 'learning_rate': 7.001745162784477e-05, 'epoch': 0.39, 'num_input_tokens_seen': 579184}
{'loss': 1.437, 'grad_norm': 1.3077731132507324, 'learning_rate': 6.905351881751372e-05, 'epoch': 0.4, 'num_input_tokens_seen': 586128}
{'loss': 1.431, 'grad_norm': 0.4619450271129608, 'learning_rate': 6.808122850410461e-05, 'epoch': 0.41, 'num_input_tokens_seen': 594848}
{'loss': 1.3132, 'grad_norm': 0.432878702878952, 'learning_rate': 6.710100716628344e-05, 'epoch': 0.41, 'num_input_tokens_seen': 602544}
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 512.0
{'loss': 1.3838, 'grad_norm': 15.175246238708496, 'learning_rate': 6.644333233692916e-05, 'epoch': 0.42, 'num_input_tokens_seen': 609904}
{'loss': 1.1309, 'grad_norm': 0.43043747544288635, 'learning_rate': 6.545084971874738e-05, 'epoch': 0.42, 'num_input_tokens_seen': 618576}
{'loss': 1.3572, 'grad_norm': 0.432679146528244, 'learning_rate': 6.445158984722358e-05, 'epoch': 0.43, 'num_input_tokens_seen': 626960}
{'loss': 1.3563, 'grad_norm': 0.49087584018707275, 'learning_rate': 6.344599103076329e-05, 'epoch': 0.43, 'num_input_tokens_seen': 634896}
{'loss': 1.2947, 'grad_norm': 0.4489469528198242, 'learning_rate': 6.243449435824276e-05, 'epoch': 0.44, 'num_input_tokens_seen': 642640}
{'loss': 1.206, 'grad_norm': 0.5883502960205078, 'learning_rate': 6.141754350553279e-05, 'epoch': 0.44, 'num_input_tokens_seen': 650816}
{'loss': 1.2874, 'grad_norm': 0.4804786443710327, 'learning_rate': 6.0395584540887963e-05, 'epoch': 0.45, 'num_input_tokens_seen': 658736}
{'loss': 1.3168, 'grad_norm': 0.6784992814064026, 'learning_rate': 5.9369065729286245e-05, 'epoch': 0.45, 'num_input_tokens_seen': 666416}
{'loss': 1.3401, 'grad_norm': 0.6340084671974182, 'learning_rate': 5.833843733580512e-05, 'epoch': 0.46, 'num_input_tokens_seen': 673312}
{'loss': 1.3921, 'grad_norm': 1.0485939979553223, 'learning_rate': 5.730415142812059e-05, 'epoch': 0.46, 'num_input_tokens_seen': 681408}
{'loss': 1.2268, 'grad_norm': 0.5049784779548645, 'learning_rate': 5.6266661678215216e-05, 'epoch': 0.47, 'num_input_tokens_seen': 690608}
{'loss': 1.3181, 'grad_norm': 0.8696343302726746, 'learning_rate': 5.522642316338268e-05, 'epoch': 0.47, 'num_input_tokens_seen': 698576}
{'loss': 1.1667, 'grad_norm': 0.5481306910514832, 'learning_rate': 5.418389216661579e-05, 'epoch': 0.48, 'num_input_tokens_seen': 707808}
{'loss': 1.2205, 'grad_norm': 0.6927640438079834, 'learning_rate': 5.313952597646568e-05, 'epoch': 0.49, 'num_input_tokens_seen': 714016}
{'loss': 1.2018, 'grad_norm': 0.5713353753089905, 'learning_rate': 5.209378268645998e-05, 'epoch': 0.49, 'num_input_tokens_seen': 721552}
{'loss': 1.3723, 'grad_norm': 0.6889858245849609, 'learning_rate': 5.104712099416785e-05, 'epoch': 0.5, 'num_input_tokens_seen': 729104}
{'loss': 1.3686, 'grad_norm': 0.4804179072380066, 'learning_rate': 5e-05, 'epoch': 0.5, 'num_input_tokens_seen': 736496}
{'loss': 1.2406, 'grad_norm': 0.5674819350242615, 'learning_rate': 4.895287900583216e-05, 'epoch': 0.51, 'num_input_tokens_seen': 745184}
{'loss': 1.2107, 'grad_norm': 0.5671128630638123, 'learning_rate': 4.790621731354003e-05, 'epoch': 0.51, 'num_input_tokens_seen': 752048}
{'loss': 1.2136, 'grad_norm': 0.4124850630760193, 'learning_rate': 4.6860474023534335e-05, 'epoch': 0.52, 'num_input_tokens_seen': 761088}
{'loss': 1.2939, 'grad_norm': 0.8833000063896179, 'learning_rate': 4.5816107833384234e-05, 'epoch': 0.52, 'num_input_tokens_seen': 768080}
{'loss': 1.2621, 'grad_norm': 0.43334656953811646, 'learning_rate': 4.477357683661734e-05, 'epoch': 0.53, 'num_input_tokens_seen': 777808}
{'loss': 1.3968, 'grad_norm': 0.5995087027549744, 'learning_rate': 4.373333832178478e-05, 'epoch': 0.53, 'num_input_tokens_seen': 786032}
{'loss': 1.2069, 'grad_norm': 0.4385344684123993, 'learning_rate': 4.269584857187943e-05, 'epoch': 0.54, 'num_input_tokens_seen': 793424}
{'loss': 1.1725, 'grad_norm': 0.6815104484558105, 'learning_rate': 4.166156266419489e-05, 'epoch': 0.54, 'num_input_tokens_seen': 801392}
{'loss': 1.1476, 'grad_norm': 0.5028893351554871, 'learning_rate': 4.063093427071376e-05, 'epoch': 0.55, 'num_input_tokens_seen': 809184}
{'loss': 1.3471, 'grad_norm': 0.6175994873046875, 'learning_rate': 3.960441545911204e-05, 'epoch': 0.55, 'num_input_tokens_seen': 816896}
{'loss': 1.3197, 'grad_norm': 0.3137647211551666, 'learning_rate': 3.858245649446721e-05, 'epoch': 0.56, 'num_input_tokens_seen': 825952}
{'loss': 1.2391, 'grad_norm': 0.42882341146469116, 'learning_rate': 3.756550564175727e-05, 'epoch': 0.57, 'num_input_tokens_seen': 835664}
{'loss': 1.2011, 'grad_norm': 0.37835580110549927, 'learning_rate': 3.655400896923672e-05, 'epoch': 0.57, 'num_input_tokens_seen': 844288}
{'loss': 1.2983, 'grad_norm': 0.43478235602378845, 'learning_rate': 3.554841015277641e-05, 'epoch': 0.58, 'num_input_tokens_seen': 851584}
{'loss': 1.3713, 'grad_norm': 0.69629967212677, 'learning_rate': 3.4549150281252636e-05, 'epoch': 0.58, 'num_input_tokens_seen': 861216}
{'loss': 1.3763, 'grad_norm': 0.3036434054374695, 'learning_rate': 3.355666766307084e-05, 'epoch': 0.59, 'num_input_tokens_seen': 869792}
{'loss': 1.1632, 'grad_norm': 0.4899413287639618, 'learning_rate': 3.257139763390925e-05, 'epoch': 0.59, 'num_input_tokens_seen': 877216}
{'loss': 1.2382, 'grad_norm': 0.5674645900726318, 'learning_rate': 3.1593772365766105e-05, 'epoch': 0.6, 'num_input_tokens_seen': 883488}
{'loss': 1.1966, 'grad_norm': 0.7481414079666138, 'learning_rate': 3.062422067739485e-05, 'epoch': 0.6, 'num_input_tokens_seen': 890816}
{'loss': 1.2934, 'grad_norm': 0.3711884617805481, 'learning_rate': 2.9663167846209998e-05, 'epoch': 0.61, 'num_input_tokens_seen': 900352}
{'loss': 1.184, 'grad_norm': 0.3552614152431488, 'learning_rate': 2.8711035421746367e-05, 'epoch': 0.61, 'num_input_tokens_seen': 909312}
{'loss': 1.1655, 'grad_norm': 0.46287208795547485, 'learning_rate': 2.776824104075364e-05, 'epoch': 0.62, 'num_input_tokens_seen': 917168}
{'loss': 1.2241, 'grad_norm': 0.40405702590942383, 'learning_rate': 2.6835198244006927e-05, 'epoch': 0.62, 'num_input_tokens_seen': 923424}
{'loss': 1.275, 'grad_norm': 0.625882625579834, 'learning_rate': 2.591231629491423e-05, 'epoch': 0.63, 'num_input_tokens_seen': 931152}
{'loss': 1.2231, 'grad_norm': 0.4761885404586792, 'learning_rate': 2.500000000000001e-05, 'epoch': 0.63, 'num_input_tokens_seen': 940992}
{'loss': 1.3608, 'grad_norm': 0.34844455122947693, 'learning_rate': 2.4098649531343497e-05, 'epoch': 0.64, 'num_input_tokens_seen': 949936}
{'loss': 1.3481, 'grad_norm': 0.4715827405452728, 'learning_rate': 2.3208660251050158e-05, 'epoch': 0.65, 'num_input_tokens_seen': 958432}
{'loss': 1.2009, 'grad_norm': 1.3269503116607666, 'learning_rate': 2.23304225378328e-05, 'epoch': 0.65, 'num_input_tokens_seen': 966688}
{'loss': 1.1758, 'grad_norm': 0.850872278213501, 'learning_rate': 2.1464321615778422e-05, 'epoch': 0.66, 'num_input_tokens_seen': 973872}
{'loss': 1.3557, 'grad_norm': 0.9200695753097534, 'learning_rate': 2.061073738537635e-05, 'epoch': 0.66, 'num_input_tokens_seen': 981264}
{'loss': 1.4066, 'grad_norm': 0.47994673252105713, 'learning_rate': 1.977004425688126e-05, 'epoch': 0.67, 'num_input_tokens_seen': 988800}
{'loss': 1.4266, 'grad_norm': 0.4955289661884308, 'learning_rate': 1.8942610986084486e-05, 'epoch': 0.67, 'num_input_tokens_seen': 996656}
{'loss': 1.2348, 'grad_norm': 0.5885555744171143, 'learning_rate': 1.8128800512565513e-05, 'epoch': 0.68, 'num_input_tokens_seen': 1004672}
{'loss': 1.2, 'grad_norm': 0.43341994285583496, 'learning_rate': 1.7328969800494726e-05, 'epoch': 0.68, 'num_input_tokens_seen': 1012400}
{'loss': 1.2074, 'grad_norm': 0.5209198594093323, 'learning_rate': 1.6543469682057106e-05, 'epoch': 0.69, 'num_input_tokens_seen': 1018768}
{'loss': 1.3071, 'grad_norm': 0.7264676690101624, 'learning_rate': 1.5772644703565565e-05, 'epoch': 0.69, 'num_input_tokens_seen': 1026368}
{'loss': 1.3392, 'grad_norm': 0.5016859769821167, 'learning_rate': 1.5016832974331724e-05, 'epoch': 0.7, 'num_input_tokens_seen': 1033664}
{'loss': 1.2203, 'grad_norm': 0.5449459552764893, 'learning_rate': 1.4276366018359844e-05, 'epoch': 0.7, 'num_input_tokens_seen': 1042560}
{'loss': 1.3089, 'grad_norm': 0.5243913531303406, 'learning_rate': 1.3551568628929434e-05, 'epoch': 0.71, 'num_input_tokens_seen': 1050336}
{'loss': 1.3856, 'grad_norm': 0.7503966689109802, 'learning_rate': 1.2842758726130283e-05, 'epoch': 0.71, 'num_input_tokens_seen': 1057824}
{'loss': 1.4345, 'grad_norm': 0.8323341012001038, 'learning_rate': 1.2150247217412186e-05, 'epoch': 0.72, 'num_input_tokens_seen': 1065184}
{'loss': 1.3059, 'grad_norm': 0.5052825808525085, 'learning_rate': 1.1474337861210543e-05, 'epoch': 0.73, 'num_input_tokens_seen': 1072016}
{'loss': 1.1952, 'grad_norm': 0.5355105400085449, 'learning_rate': 1.0815327133708015e-05, 'epoch': 0.73, 'num_input_tokens_seen': 1079584}
{'loss': 1.1284, 'grad_norm': 0.4663732051849365, 'learning_rate': 1.0173504098790187e-05, 'epoch': 0.74, 'num_input_tokens_seen': 1087984}
{'loss': 1.1887, 'grad_norm': 0.500197172164917, 'learning_rate': 9.549150281252633e-06, 'epoch': 0.74, 'num_input_tokens_seen': 1096032}
{'loss': 1.1508, 'grad_norm': 0.3888009488582611, 'learning_rate': 8.9425395433148e-06, 'epoch': 0.75, 'num_input_tokens_seen': 1105696}
{'loss': 1.2795, 'grad_norm': 0.5471282601356506, 'learning_rate': 8.353937964495029e-06, 'epoch': 0.75, 'num_input_tokens_seen': 1114048}
{'loss': 1.4123, 'grad_norm': 0.5487935543060303, 'learning_rate': 7.783603724899257e-06, 'epoch': 0.76, 'num_input_tokens_seen': 1122528}
{'loss': 1.2925, 'grad_norm': 0.3951992988586426, 'learning_rate': 7.2317869919746705e-06, 'epoch': 0.76, 'num_input_tokens_seen': 1131328}
{'loss': 1.1505, 'grad_norm': 0.3872891068458557, 'learning_rate': 6.698729810778065e-06, 'epoch': 0.77, 'num_input_tokens_seen': 1138992}
{'loss': 1.3429, 'grad_norm': 0.5767130851745605, 'learning_rate': 6.184665997806832e-06, 'epoch': 0.77, 'num_input_tokens_seen': 1147776}
{'loss': 1.2439, 'grad_norm': 0.7298781275749207, 'learning_rate': 5.689821038439263e-06, 'epoch': 0.78, 'num_input_tokens_seen': 1157360}
{'loss': 1.2647, 'grad_norm': 0.8098281621932983, 'learning_rate': 5.214411988029355e-06, 'epoch': 0.78, 'num_input_tokens_seen': 1166256}
{'loss': 1.3183, 'grad_norm': 0.4596332609653473, 'learning_rate': 4.758647376699032e-06, 'epoch': 0.79, 'num_input_tokens_seen': 1173840}
{'loss': 1.3921, 'grad_norm': 0.7599104642868042, 'learning_rate': 4.322727117869951e-06, 'epoch': 0.79, 'num_input_tokens_seen': 1180848}
{'loss': 1.3069, 'grad_norm': 0.5496035218238831, 'learning_rate': 3.90684242057498e-06, 'epoch': 0.8, 'num_input_tokens_seen': 1189952}
{'loss': 1.228, 'grad_norm': 2.8328826427459717, 'learning_rate': 3.511175705587433e-06, 'epoch': 0.81, 'num_input_tokens_seen': 1198336}
{'loss': 1.264, 'grad_norm': 0.40324077010154724, 'learning_rate': 3.1359005254054273e-06, 'epoch': 0.81, 'num_input_tokens_seen': 1206304}
{'loss': 1.3767, 'grad_norm': 0.7104987502098083, 'learning_rate': 2.7811814881259503e-06, 'epoch': 0.82, 'num_input_tokens_seen': 1214480}
{'loss': 1.2163, 'grad_norm': 0.29078155755996704, 'learning_rate': 2.4471741852423237e-06, 'epoch': 0.82, 'num_input_tokens_seen': 1222608}
{'loss': 1.3609, 'grad_norm': 0.8542816638946533, 'learning_rate': 2.134025123396638e-06, 'epoch': 0.83, 'num_input_tokens_seen': 1229568}
{'loss': 1.2252, 'grad_norm': 0.34084662795066833, 'learning_rate': 1.841871660117095e-06, 'epoch': 0.83, 'num_input_tokens_seen': 1236656}
{'loss': 1.3446, 'grad_norm': 0.4465295374393463, 'learning_rate': 1.5708419435684462e-06, 'epoch': 0.84, 'num_input_tokens_seen': 1243776}
{'loss': 1.1298, 'grad_norm': 0.3369986414909363, 'learning_rate': 1.3210548563419856e-06, 'epoch': 0.84, 'num_input_tokens_seen': 1253296}
{'loss': 1.4383, 'grad_norm': 0.6114615797996521, 'learning_rate': 1.0926199633097157e-06, 'epoch': 0.85, 'num_input_tokens_seen': 1261440}
{'loss': 1.4809, 'grad_norm': 0.4665069282054901, 'learning_rate': 8.856374635655695e-07, 'epoch': 0.85, 'num_input_tokens_seen': 1268336}
{'loss': 1.4316, 'grad_norm': 0.640594482421875, 'learning_rate': 7.001981464747565e-07, 'epoch': 0.86, 'num_input_tokens_seen': 1276224}
{'loss': 1.1178, 'grad_norm': 0.7955052852630615, 'learning_rate': 5.363833518505834e-07, 'epoch': 0.86, 'num_input_tokens_seen': 1284112}
{'loss': 1.3764, 'grad_norm': 0.7011861801147461, 'learning_rate': 3.9426493427611177e-07, 'epoch': 0.87, 'num_input_tokens_seen': 1292800}
{'loss': 1.4021, 'grad_norm': 1.2664375305175781, 'learning_rate': 2.7390523158633554e-07, 'epoch': 0.87, 'num_input_tokens_seen': 1300656}
{'loss': 1.3095, 'grad_norm': 0.46517035365104675, 'learning_rate': 1.753570375247815e-07, 'epoch': 0.88, 'num_input_tokens_seen': 1309792}
{'loss': 1.23, 'grad_norm': 0.7953115105628967, 'learning_rate': 9.866357858642205e-08, 'epoch': 0.89, 'num_input_tokens_seen': 1316000}
{'eval_loss': 1.272219181060791, 'eval_runtime': 56.3441, 'eval_samples_per_second': 17.748, 'eval_steps_per_second': 8.874, 'epoch': 0.89, 'num_input_tokens_seen': 1321104}
{'train_runtime': 1830.2469, 'train_samples_per_second': 4.371, 'train_steps_per_second': 0.273, 'train_tokens_per_second': 1084.007, 'train_loss': 1.3309755539894104, 'epoch': 0.89, 'num_input_tokens_seen': 1321104}
***** train metrics *****
epoch = 0.8889
num_input_tokens_seen = 1321104
total_flos = 52538588GF
train_loss = 1.331
train_runtime = 0:30:30.24
train_samples_per_second = 4.371
train_steps_per_second = 0.273
train_tokens_per_second = 1084.007
Figure saved at: ./results/910b/lora_sft_Qwen-7B_8_gpu_500_step_20240919142813/training_loss.png
Figure saved at: ./results/910b/lora_sft_Qwen-7B_8_gpu_500_step_20240919142813/training_eval_loss.png
09/19/2024 15:03:48 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
***** eval metrics *****
epoch = 0.8889
eval_loss = 1.2722
eval_runtime = 0:00:56.80
eval_samples_per_second = 17.604
eval_steps_per_second = 8.802
num_input_tokens_seen = 1321104


@@ -0,0 +1,31 @@
cutoff_len: 1024
dataset: belle_1m
ddp_timeout: 180000000
do_train: true
eval_steps: 500
eval_strategy: steps
finetuning_type: lora
fp16: true
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
include_tokens_per_second: true
learning_rate: 0.0001
logging_steps: 3
lora_target: all
lr_scheduler_type: cosine
max_samples: 10000
max_steps: 500
model_name_or_path: ../../../models/qwen
num_train_epochs: 10.0
output_dir: ./results/910b/lora_sft_Qwen-7B_8_gpu_500_step_20240919142813
overwrite_cache: true
overwrite_output_dir: true
per_device_eval_batch_size: 2
per_device_train_batch_size: 2
plot_loss: true
preprocessing_num_workers: 16
save_steps: 500
stage: sft
template: qwen
val_size: 0.1
warmup_ratio: 0.1
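
These arguments form a LLaMA-Factory SFT configuration; assuming a standard LLaMA-Factory installation, a YAML like this is typically launched with `llamafactory-cli train <config>.yaml`. The `output_dir` above matches the log and plot paths seen earlier in this commit.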


@@ -0,0 +1,33 @@
{"cur_time": "2024-09-19 14:28:24", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 85.3}, {"npu_id": 1, "power_dissipation": 89.5}, {"npu_id": 2, "power_dissipation": 92.8}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.7}, {"npu_id": 7, "power_dissipation": 90.6}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 6}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:29:34", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 85.5}, {"npu_id": 1, "power_dissipation": 89.6}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.4}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.6}, {"npu_id": 6, "power_dissipation": 93.1}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 6}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:30:44", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 85.8}, {"npu_id": 1, "power_dissipation": 89.2}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 90.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 6}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:31:54", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 85.5}, {"npu_id": 1, "power_dissipation": 89.7}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.2}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 15}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:33:04", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 85.7}, {"npu_id": 1, "power_dissipation": 89.7}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.5}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.7}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 25}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:34:14", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 130.6}, {"npu_id": 1, "power_dissipation": 89.6}, {"npu_id": 2, "power_dissipation": 92.9}, {"npu_id": 3, "power_dissipation": 88.4}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 94.2}, {"npu_id": 6, "power_dissipation": 93.1}, {"npu_id": 7, "power_dissipation": 90.7}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:35:24", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 163.1}, {"npu_id": 1, "power_dissipation": 89.5}, {"npu_id": 2, "power_dissipation": 93.3}, {"npu_id": 3, "power_dissipation": 88.5}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 93.2}, {"npu_id": 7, "power_dissipation": 90.3}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:36:34", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 127.7}, {"npu_id": 1, "power_dissipation": 90.0}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.4}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.7}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:37:44", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 199.2}, {"npu_id": 1, "power_dissipation": 88.5}, {"npu_id": 2, "power_dissipation": 93.0}, {"npu_id": 3, "power_dissipation": 88.8}, {"npu_id": 4, "power_dissipation": 93.1}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.4}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:38:54", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 163.4}, {"npu_id": 1, "power_dissipation": 89.0}, {"npu_id": 2, "power_dissipation": 93.3}, {"npu_id": 3, "power_dissipation": 89.1}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 93.2}, {"npu_id": 7, "power_dissipation": 90.4}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:40:04", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 181.9}, {"npu_id": 1, "power_dissipation": 90.9}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.5}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 91.5}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:41:14", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 143.0}, {"npu_id": 1, "power_dissipation": 90.3}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.1}, {"npu_id": 4, "power_dissipation": 93.2}, {"npu_id": 5, "power_dissipation": 94.0}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:42:24", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 131.2}, {"npu_id": 1, "power_dissipation": 90.5}, {"npu_id": 2, "power_dissipation": 93.0}, {"npu_id": 3, "power_dissipation": 88.2}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.6}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 91.3}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:43:34", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 161.1}, {"npu_id": 1, "power_dissipation": 90.0}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.9}, {"npu_id": 6, "power_dissipation": 93.1}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:44:44", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 154.2}, {"npu_id": 1, "power_dissipation": 90.2}, {"npu_id": 2, "power_dissipation": 93.3}, {"npu_id": 3, "power_dissipation": 88.6}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 94.0}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.5}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:45:54", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 126.5}, {"npu_id": 1, "power_dissipation": 88.9}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.6}, {"npu_id": 4, "power_dissipation": 93.1}, {"npu_id": 5, "power_dissipation": 94.0}, {"npu_id": 6, "power_dissipation": 93.2}, {"npu_id": 7, "power_dissipation": 91.2}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:47:04", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 144.7}, {"npu_id": 1, "power_dissipation": 91.3}, {"npu_id": 2, "power_dissipation": 93.4}, {"npu_id": 3, "power_dissipation": 88.8}, {"npu_id": 4, "power_dissipation": 93.1}, {"npu_id": 5, "power_dissipation": 94.3}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 91.3}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:48:14", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 135.1}, {"npu_id": 1, "power_dissipation": 90.0}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.4}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.9}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.3}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:49:24", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 178.7}, {"npu_id": 1, "power_dissipation": 89.4}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.7}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.9}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.6}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:50:34", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 157.0}, {"npu_id": 1, "power_dissipation": 90.3}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 89.0}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:51:44", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 142.0}, {"npu_id": 1, "power_dissipation": 90.2}, {"npu_id": 2, "power_dissipation": 93.0}, {"npu_id": 3, "power_dissipation": 88.5}, {"npu_id": 4, "power_dissipation": 92.6}, {"npu_id": 5, "power_dissipation": 93.5}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:52:54", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 172.2}, {"npu_id": 1, "power_dissipation": 89.6}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.7}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:54:04", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 172.0}, {"npu_id": 1, "power_dissipation": 91.2}, {"npu_id": 2, "power_dissipation": 92.9}, {"npu_id": 3, "power_dissipation": 89.0}, {"npu_id": 4, "power_dissipation": 92.7}, {"npu_id": 5, "power_dissipation": 94.3}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 91.4}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:55:14", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 126.0}, {"npu_id": 1, "power_dissipation": 90.7}, {"npu_id": 2, "power_dissipation": 93.3}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.7}, {"npu_id": 5, "power_dissipation": 93.9}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 89.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:56:24", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 167.3}, {"npu_id": 1, "power_dissipation": 92.1}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:57:34", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 151.7}, {"npu_id": 1, "power_dissipation": 90.5}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.2}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 94.1}, {"npu_id": 6, "power_dissipation": 92.7}, {"npu_id": 7, "power_dissipation": 90.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:58:43", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 137.0}, {"npu_id": 1, "power_dissipation": 90.7}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.6}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 94.1}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.7}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:59:53", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 160.1}, {"npu_id": 1, "power_dissipation": 89.3}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.9}, {"npu_id": 4, "power_dissipation": 93.1}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 90.4}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 15:01:03", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 135.5}, {"npu_id": 1, "power_dissipation": 91.7}, {"npu_id": 2, "power_dissipation": 93.5}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.4}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 90.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 15:02:13", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 140.5}, {"npu_id": 1, "power_dissipation": 89.7}, {"npu_id": 2, "power_dissipation": 92.9}, {"npu_id": 3, "power_dissipation": 88.6}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.9}, {"npu_id": 6, "power_dissipation": 92.7}, {"npu_id": 7, "power_dissipation": 90.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 15:03:23", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 230.1}, {"npu_id": 1, "power_dissipation": 89.3}, {"npu_id": 2, "power_dissipation": 93.4}, {"npu_id": 3, "power_dissipation": 89.1}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 91.0}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 15:04:33", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 133.3}, {"npu_id": 1, "power_dissipation": 90.2}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 94.0}, {"npu_id": 6, "power_dissipation": 92.7}, {"npu_id": 7, "power_dissipation": 91.1}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 15:05:43", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 86.2}, {"npu_id": 1, "power_dissipation": 90.1}, {"npu_id": 2, "power_dissipation": 93.4}, {"npu_id": 3, "power_dissipation": 88.2}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 91.0}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 6}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}

File diff suppressed because it is too large

View File

@ -0,0 +1,10 @@
{
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|im_end|>"
}
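This mapping reuses <|im_end|> as both the end-of-sequence and the padding token, a common choice when the base vocabulary ships no dedicated pad token. A quick check after loading the checkpoint (the path is illustrative; trust_remote_code is needed because the tokenizer class is defined in tokenization_qwen.py below):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/checkpoint", trust_remote_code=True)
print(tok.eos_token)  # <|im_end|>
print(tok.pad_token)  # <|im_end|>  (padding reuses the end-of-turn token)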

View File

@ -0,0 +1,276 @@
# Copyright (c) Alibaba Cloud.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

"""Tokenization classes for QWen."""

import base64
import logging
import os
import unicodedata
from typing import Collection, Dict, List, Set, Tuple, Union

import tiktoken
from transformers import PreTrainedTokenizer, AddedToken

logger = logging.getLogger(__name__)

VOCAB_FILES_NAMES = {"vocab_file": "qwen.tiktoken"}

PAT_STR = r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
ENDOFTEXT = "<|endoftext|>"
IMSTART = "<|im_start|>"
IMEND = "<|im_end|>"
# as the default behavior is changed to allow special tokens in
# regular texts, the surface forms of special tokens need to be
# as different as possible to minimize the impact
EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
# changed to use actual index to avoid misconfiguration with vocabulary expansion
SPECIAL_START_ID = 151643
SPECIAL_TOKENS = tuple(
    enumerate(
        (
            (
                ENDOFTEXT,
                IMSTART,
                IMEND,
            )
            + EXTRAS
        ),
        start=SPECIAL_START_ID,
    )
)
SPECIAL_TOKENS_SET = set(t for i, t in SPECIAL_TOKENS)


def _load_tiktoken_bpe(tiktoken_bpe_file: str) -> Dict[bytes, int]:
    # each line of the vocab file is "<base64-encoded token> <rank>"
    with open(tiktoken_bpe_file, "rb") as f:
        contents = f.read()
    return {
        base64.b64decode(token): int(rank)
        for token, rank in (line.split() for line in contents.splitlines() if line)
    }


class QWenTokenizer(PreTrainedTokenizer):
    """QWen tokenizer."""

    vocab_files_names = VOCAB_FILES_NAMES

    def __init__(
        self,
        vocab_file,
        errors="replace",
        extra_vocab_file=None,
        **kwargs,
    ):
        super().__init__(**kwargs)

        # how to handle errors in decoding UTF-8 byte sequences;
        # use "ignore" if you are in streaming inference
        self.errors = errors

        self.mergeable_ranks = _load_tiktoken_bpe(vocab_file)  # type: Dict[bytes, int]
        self.special_tokens = {
            token: index
            for index, token in SPECIAL_TOKENS
        }

        # try to load extra vocab from file
        if extra_vocab_file is not None:
            used_ids = set(self.mergeable_ranks.values()) | set(self.special_tokens.values())
            extra_mergeable_ranks = _load_tiktoken_bpe(extra_vocab_file)
            for token, index in extra_mergeable_ranks.items():
                if token in self.mergeable_ranks:
                    logger.info(f"extra token {token} exists, skipping")
                    continue
                if index in used_ids:
                    logger.info(f"the index {index} for extra token {token} exists, skipping")
                    continue
                self.mergeable_ranks[token] = index
            # the index may be sparse after this, but don't worry, tiktoken.Encoding will handle it

        enc = tiktoken.Encoding(
            "Qwen",
            pat_str=PAT_STR,
            mergeable_ranks=self.mergeable_ranks,
            special_tokens=self.special_tokens,
        )
        assert (
            len(self.mergeable_ranks) + len(self.special_tokens) == enc.n_vocab
        ), f"{len(self.mergeable_ranks) + len(self.special_tokens)} != {enc.n_vocab} in encoding"

        self.decoder = {
            v: k for k, v in self.mergeable_ranks.items()
        }  # type: dict[int, bytes|str]
        self.decoder.update({v: k for k, v in self.special_tokens.items()})

        self.tokenizer = enc  # type: tiktoken.Encoding

        self.eod_id = self.tokenizer.eot_token
        self.im_start_id = self.special_tokens[IMSTART]
        self.im_end_id = self.special_tokens[IMEND]

    def __getstate__(self):
        # for pickle lovers
        state = self.__dict__.copy()
        del state["tokenizer"]
        return state

    def __setstate__(self, state):
        # the tokenizer is not Python-native; don't pickle it, rebuild it
        self.__dict__.update(state)
        enc = tiktoken.Encoding(
            "Qwen",
            pat_str=PAT_STR,
            mergeable_ranks=self.mergeable_ranks,
            special_tokens=self.special_tokens,
        )
        self.tokenizer = enc

    def __len__(self) -> int:
        return self.tokenizer.n_vocab

    def get_vocab(self) -> Dict[bytes, int]:
        return self.mergeable_ranks

    def convert_tokens_to_ids(
        self, tokens: Union[bytes, str, List[Union[bytes, str]]]
    ) -> List[int]:
        ids = []
        if isinstance(tokens, (str, bytes)):
            if tokens in self.special_tokens:
                return self.special_tokens[tokens]
            else:
                return self.mergeable_ranks.get(tokens)
        for token in tokens:
            if token in self.special_tokens:
                ids.append(self.special_tokens[token])
            else:
                ids.append(self.mergeable_ranks.get(token))
        return ids

    def _add_tokens(
        self,
        new_tokens: Union[List[str], List[AddedToken]],
        special_tokens: bool = False,
    ) -> int:
        if not special_tokens and new_tokens:
            raise ValueError("Adding regular tokens is not supported")
        for token in new_tokens:
            surface_form = token.content if isinstance(token, AddedToken) else token
            if surface_form not in SPECIAL_TOKENS_SET:
                raise ValueError("Adding unknown special tokens is not supported")
        return 0

    def save_vocabulary(self, save_directory: str, **kwargs) -> Tuple[str]:
        """
        Save only the vocabulary of the tokenizer.

        Returns:
            `Tuple(str)`: Paths to the files saved.
        """
        file_path = os.path.join(save_directory, "qwen.tiktoken")
        with open(file_path, "w", encoding="utf8") as w:
            for k, v in self.mergeable_ranks.items():
                line = base64.b64encode(k).decode("utf8") + " " + str(v) + "\n"
                w.write(line)
        return (file_path,)

    def tokenize(
        self,
        text: str,
        allowed_special: Union[Set, str] = "all",
        disallowed_special: Union[Collection, str] = (),
        **kwargs,
    ) -> List[Union[bytes, str]]:
        """
        Converts a string into a sequence of tokens.

        Args:
            text (`str`):
                The sequence to be encoded.
            allowed_special (`Literal["all"]` or `set`):
                The surface forms of the tokens to be encoded as special tokens in regular texts.
                Defaults to "all".
            disallowed_special (`Literal["all"]` or `Collection`):
                The surface forms of the tokens that should not be in regular texts and trigger errors.
                Defaults to an empty tuple.
            kwargs (additional keyword arguments, *optional*):
                Will be passed to the underlying model specific encode method.

        Returns:
            `List[bytes|str]`: The list of tokens.
        """
        tokens = []
        text = unicodedata.normalize("NFC", text)

        # this implementation takes a detour: text -> token id -> token surface forms
        for t in self.tokenizer.encode(
            text, allowed_special=allowed_special, disallowed_special=disallowed_special
        ):
            tokens.append(self.decoder[t])
        return tokens

    def convert_tokens_to_string(self, tokens: List[Union[bytes, str]]) -> str:
        """
        Converts a sequence of tokens into a single string.
        """
        text = ""
        temp = b""
        for t in tokens:
            if isinstance(t, str):
                if temp:
                    text += temp.decode("utf-8", errors=self.errors)
                    temp = b""
                text += t
            elif isinstance(t, bytes):
                temp += t
            else:
                raise TypeError("token should only be of type bytes or str")
        if temp:
            text += temp.decode("utf-8", errors=self.errors)
        return text

    @property
    def vocab_size(self):
        return self.tokenizer.n_vocab

    def _convert_id_to_token(self, index: int) -> Union[bytes, str]:
        """Converts an id to a token, special tokens included"""
        if index in self.decoder:
            return self.decoder[index]
        raise ValueError("unknown ids")

    def _convert_token_to_id(self, token: Union[bytes, str]) -> int:
        """Converts a token to an id using the vocab, special tokens included"""
        if token in self.special_tokens:
            return self.special_tokens[token]
        if token in self.mergeable_ranks:
            return self.mergeable_ranks[token]
        raise ValueError("unknown token")

    def _tokenize(self, text: str, **kwargs):
        """
        Converts a string into a sequence of tokens (string), using the tokenizer. Split into words
        for word-based vocabulary or sub-words for sub-word-based vocabularies (BPE/SentencePiece/WordPiece).

        Do NOT take care of added tokens.
        """
        raise NotImplementedError

    def _decode(
        self,
        token_ids: Union[int, List[int]],
        skip_special_tokens: bool = False,
        errors: str = None,
        **kwargs,
    ) -> str:
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        if skip_special_tokens:
            token_ids = [i for i in token_ids if i < self.eod_id]
        return self.tokenizer.decode(token_ids, errors=errors or self.errors)
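A minimal round-trip sketch of the tokenizer above, assuming the qwen.tiktoken vocabulary file from the base model sits in the working directory (paths are illustrative, and behavior may vary across transformers versions):

from tokenization_qwen import QWenTokenizer

tok = QWenTokenizer(vocab_file="qwen.tiktoken")

tokens = tok.tokenize("你好, Qwen!")   # bytes/str surface forms
ids = tok.convert_tokens_to_ids(tokens)
assert tok.decode(ids) == "你好, Qwen!"

# special token ids are pinned starting at SPECIAL_START_ID (151643)
print(tok.im_start_id, tok.im_end_id)  # 151644 151645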

View File

@ -0,0 +1,17 @@
{
  "added_tokens_decoder": {},
  "auto_map": {
    "AutoTokenizer": [
      "tokenization_qwen.QWenTokenizer",
      null
    ]
  },
  "chat_template": "{% set system_message = 'You are a helpful assistant.' %}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% endif %}{% if system_message is defined %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{% endif %}{% for message in loop_messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user\n' + content + '<|im_end|>\n<|im_start|>assistant\n' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '\n' }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|im_end|>",
  "model_max_length": 32768,
  "pad_token": "<|im_end|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "QWenTokenizer"
}
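The chat_template above is the ChatML format used by Qwen: an optional system turn (defaulting to "You are a helpful assistant.") followed by user/assistant turns, each wrapped in <|im_start|>/<|im_end|>. A sketch of rendering it through the standard transformers API (checkpoint path illustrative):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/checkpoint", trust_remote_code=True)
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What is LoRA?"}],
    tokenize=False,
)
print(prompt)
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# What is LoRA?<|im_end|>
# <|im_start|>assistant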

View File

@ -0,0 +1,10 @@
{
  "epoch": 0.8888888888888888,
  "num_input_tokens_seen": 1321104,
  "total_flos": 5.641287949968998e+16,
  "train_loss": 1.3309755539894104,
  "train_runtime": 1830.2469,
  "train_samples_per_second": 4.371,
  "train_steps_per_second": 0.273,
  "train_tokens_per_second": 1084.007
}
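The aggregates above are mutually consistent: train_samples_per_second times train_runtime gives 8,000 samples, i.e. 16 samples per optimizer step over the 500 steps of the trainer log below, and 8,000 samples at epoch 0.889 implies a training set of about 9,000 examples. A quick arithmetic check:

# values copied from train_results.json above
train_runtime = 1830.2469         # seconds
train_samples_per_second = 4.371
epoch = 0.8888888888888888
total_steps = 500                 # from the trainer log below

samples = round(train_samples_per_second * train_runtime)
print(samples)                    # 8000
print(samples / total_steps)      # 16.0 -> effective batch size per step
print(samples / epoch)            # 9000.0 -> implied dataset size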

View File

@ -0,0 +1,168 @@
{"current_steps": 3, "total_steps": 500, "loss": 1.5189, "learning_rate": 4.000000000000001e-06, "epoch": 0.005333333333333333, "percentage": 0.6, "cur_time": "2024-09-19 14:33:30", "elapsed_time": "0:00:12", "remaining_time": "0:35:14", "throughput": 768.4, "total_tokens": 9808}
{"current_steps": 6, "total_steps": 500, "loss": 1.5574, "learning_rate": 6e-06, "epoch": 0.010666666666666666, "percentage": 1.2, "cur_time": "2024-09-19 14:33:40", "elapsed_time": "0:00:23", "remaining_time": "0:32:00", "throughput": 827.9, "total_tokens": 19312}
{"current_steps": 9, "total_steps": 500, "loss": 1.5909, "learning_rate": 1e-05, "epoch": 0.016, "percentage": 1.8, "cur_time": "2024-09-19 14:33:51", "elapsed_time": "0:00:34", "remaining_time": "0:31:14", "throughput": 850.73, "total_tokens": 29232}
{"current_steps": 12, "total_steps": 500, "loss": 1.8082, "learning_rate": 1.6000000000000003e-05, "epoch": 0.021333333333333333, "percentage": 2.4, "cur_time": "2024-09-19 14:34:02", "elapsed_time": "0:00:44", "remaining_time": "0:30:20", "throughput": 848.54, "total_tokens": 37984}
{"current_steps": 15, "total_steps": 500, "loss": 1.7089, "learning_rate": 2e-05, "epoch": 0.02666666666666667, "percentage": 3.0, "cur_time": "2024-09-19 14:34:12", "elapsed_time": "0:00:55", "remaining_time": "0:29:47", "throughput": 806.72, "total_tokens": 44592}
{"current_steps": 18, "total_steps": 500, "loss": 1.7715, "learning_rate": 2.6000000000000002e-05, "epoch": 0.032, "percentage": 3.6, "cur_time": "2024-09-19 14:34:23", "elapsed_time": "0:01:06", "remaining_time": "0:29:33", "throughput": 791.08, "total_tokens": 52400}
{"current_steps": 21, "total_steps": 500, "loss": 1.5624, "learning_rate": 3.2000000000000005e-05, "epoch": 0.037333333333333336, "percentage": 4.2, "cur_time": "2024-09-19 14:34:34", "elapsed_time": "0:01:16", "remaining_time": "0:29:16", "throughput": 783.5, "total_tokens": 60320}
{"current_steps": 24, "total_steps": 500, "loss": 1.9382, "learning_rate": 3.8e-05, "epoch": 0.042666666666666665, "percentage": 4.8, "cur_time": "2024-09-19 14:34:45", "elapsed_time": "0:01:27", "remaining_time": "0:29:03", "throughput": 762.52, "total_tokens": 67024}
{"current_steps": 27, "total_steps": 500, "loss": 1.5455, "learning_rate": 4.4000000000000006e-05, "epoch": 0.048, "percentage": 5.4, "cur_time": "2024-09-19 14:34:56", "elapsed_time": "0:01:38", "remaining_time": "0:28:50", "throughput": 746.98, "total_tokens": 73776}
{"current_steps": 30, "total_steps": 500, "loss": 1.6186, "learning_rate": 5e-05, "epoch": 0.05333333333333334, "percentage": 6.0, "cur_time": "2024-09-19 14:35:07", "elapsed_time": "0:01:49", "remaining_time": "0:28:42", "throughput": 751.4, "total_tokens": 82592}
{"current_steps": 33, "total_steps": 500, "loss": 1.5182, "learning_rate": 5.6000000000000006e-05, "epoch": 0.058666666666666666, "percentage": 6.6, "cur_time": "2024-09-19 14:35:18", "elapsed_time": "0:02:00", "remaining_time": "0:28:30", "throughput": 748.85, "total_tokens": 90512}
{"current_steps": 36, "total_steps": 500, "loss": 1.4462, "learning_rate": 6e-05, "epoch": 0.064, "percentage": 7.2, "cur_time": "2024-09-19 14:35:29", "elapsed_time": "0:02:11", "remaining_time": "0:28:16", "throughput": 735.9, "total_tokens": 96848}
{"current_steps": 39, "total_steps": 500, "loss": 1.3399, "learning_rate": 6.6e-05, "epoch": 0.06933333333333333, "percentage": 7.8, "cur_time": "2024-09-19 14:35:39", "elapsed_time": "0:02:22", "remaining_time": "0:28:01", "throughput": 729.05, "total_tokens": 103728}
{"current_steps": 42, "total_steps": 500, "loss": 1.3188, "learning_rate": 7.2e-05, "epoch": 0.07466666666666667, "percentage": 8.4, "cur_time": "2024-09-19 14:35:50", "elapsed_time": "0:02:33", "remaining_time": "0:27:50", "throughput": 732.22, "total_tokens": 112160}
{"current_steps": 45, "total_steps": 500, "loss": 1.535, "learning_rate": 7.800000000000001e-05, "epoch": 0.08, "percentage": 9.0, "cur_time": "2024-09-19 14:36:01", "elapsed_time": "0:02:43", "remaining_time": "0:27:37", "throughput": 719.58, "total_tokens": 117984}
{"current_steps": 48, "total_steps": 500, "loss": 1.363, "learning_rate": 8.4e-05, "epoch": 0.08533333333333333, "percentage": 9.6, "cur_time": "2024-09-19 14:36:11", "elapsed_time": "0:02:54", "remaining_time": "0:27:21", "throughput": 726.53, "total_tokens": 126624}
{"current_steps": 51, "total_steps": 500, "loss": 1.5687, "learning_rate": 9e-05, "epoch": 0.09066666666666667, "percentage": 10.2, "cur_time": "2024-09-19 14:36:22", "elapsed_time": "0:03:04", "remaining_time": "0:27:07", "throughput": 727.42, "total_tokens": 134496}
{"current_steps": 54, "total_steps": 500, "loss": 1.6227, "learning_rate": 9.6e-05, "epoch": 0.096, "percentage": 10.8, "cur_time": "2024-09-19 14:36:33", "elapsed_time": "0:03:15", "remaining_time": "0:26:55", "throughput": 724.07, "total_tokens": 141664}
{"current_steps": 57, "total_steps": 500, "loss": 1.3263, "learning_rate": 9.999878153526974e-05, "epoch": 0.10133333333333333, "percentage": 11.4, "cur_time": "2024-09-19 14:36:43", "elapsed_time": "0:03:26", "remaining_time": "0:26:44", "throughput": 723.3, "total_tokens": 149360}
{"current_steps": 60, "total_steps": 500, "loss": 1.3676, "learning_rate": 9.998050575201771e-05, "epoch": 0.10666666666666667, "percentage": 12.0, "cur_time": "2024-09-19 14:36:54", "elapsed_time": "0:03:37", "remaining_time": "0:26:33", "throughput": 725.97, "total_tokens": 157776}
{"current_steps": 63, "total_steps": 500, "loss": 1.4657, "learning_rate": 9.99403068670717e-05, "epoch": 0.112, "percentage": 12.6, "cur_time": "2024-09-19 14:37:05", "elapsed_time": "0:03:47", "remaining_time": "0:26:19", "throughput": 726.29, "total_tokens": 165360}
{"current_steps": 66, "total_steps": 500, "loss": 1.3857, "learning_rate": 9.987820251299122e-05, "epoch": 0.11733333333333333, "percentage": 13.2, "cur_time": "2024-09-19 14:37:15", "elapsed_time": "0:03:58", "remaining_time": "0:26:05", "throughput": 727.34, "total_tokens": 173152}
{"current_steps": 69, "total_steps": 500, "loss": 1.549, "learning_rate": 9.979421993079852e-05, "epoch": 0.12266666666666666, "percentage": 13.8, "cur_time": "2024-09-19 14:37:25", "elapsed_time": "0:04:08", "remaining_time": "0:25:52", "throughput": 730.38, "total_tokens": 181504}
{"current_steps": 72, "total_steps": 500, "loss": 1.4207, "learning_rate": 9.968839595802982e-05, "epoch": 0.128, "percentage": 14.4, "cur_time": "2024-09-19 14:37:36", "elapsed_time": "0:04:19", "remaining_time": "0:25:43", "throughput": 727.6, "total_tokens": 188880}
{"current_steps": 75, "total_steps": 500, "loss": 1.4148, "learning_rate": 9.956077701257709e-05, "epoch": 0.13333333333333333, "percentage": 15.0, "cur_time": "2024-09-19 14:37:48", "elapsed_time": "0:04:30", "remaining_time": "0:25:33", "throughput": 724.13, "total_tokens": 195952}
{"current_steps": 78, "total_steps": 500, "loss": 1.2252, "learning_rate": 9.941141907232765e-05, "epoch": 0.13866666666666666, "percentage": 15.6, "cur_time": "2024-09-19 14:37:59", "elapsed_time": "0:04:41", "remaining_time": "0:25:23", "throughput": 723.32, "total_tokens": 203744}
{"current_steps": 81, "total_steps": 500, "loss": 1.1623, "learning_rate": 9.924038765061042e-05, "epoch": 0.144, "percentage": 16.2, "cur_time": "2024-09-19 14:38:10", "elapsed_time": "0:04:52", "remaining_time": "0:25:14", "throughput": 726.3, "total_tokens": 212624}
{"current_steps": 84, "total_steps": 500, "loss": 1.5017, "learning_rate": 9.904775776745958e-05, "epoch": 0.14933333333333335, "percentage": 16.8, "cur_time": "2024-09-19 14:38:21", "elapsed_time": "0:05:03", "remaining_time": "0:25:04", "throughput": 724.25, "total_tokens": 219968}
{"current_steps": 87, "total_steps": 500, "loss": 1.2643, "learning_rate": 9.88336139167084e-05, "epoch": 0.15466666666666667, "percentage": 17.4, "cur_time": "2024-09-19 14:38:31", "elapsed_time": "0:05:14", "remaining_time": "0:24:50", "throughput": 729.2, "total_tokens": 229024}
{"current_steps": 90, "total_steps": 500, "loss": 1.2085, "learning_rate": 9.859805002892732e-05, "epoch": 0.16, "percentage": 18.0, "cur_time": "2024-09-19 14:38:41", "elapsed_time": "0:05:24", "remaining_time": "0:24:37", "throughput": 729.81, "total_tokens": 236768}
{"current_steps": 93, "total_steps": 500, "loss": 1.333, "learning_rate": 9.834116943022298e-05, "epoch": 0.16533333333333333, "percentage": 18.6, "cur_time": "2024-09-19 14:38:52", "elapsed_time": "0:05:34", "remaining_time": "0:24:24", "throughput": 732.33, "total_tokens": 245152}
{"current_steps": 96, "total_steps": 500, "loss": 1.5669, "learning_rate": 9.806308479691595e-05, "epoch": 0.17066666666666666, "percentage": 19.2, "cur_time": "2024-09-19 14:39:02", "elapsed_time": "0:05:45", "remaining_time": "0:24:12", "throughput": 729.35, "total_tokens": 251728}
{"current_steps": 99, "total_steps": 500, "loss": 1.317, "learning_rate": 9.776391810611718e-05, "epoch": 0.176, "percentage": 19.8, "cur_time": "2024-09-19 14:39:13", "elapsed_time": "0:05:55", "remaining_time": "0:24:00", "throughput": 733.35, "total_tokens": 260832}
{"current_steps": 102, "total_steps": 500, "loss": 1.3829, "learning_rate": 9.744380058222483e-05, "epoch": 0.18133333333333335, "percentage": 20.4, "cur_time": "2024-09-19 14:39:23", "elapsed_time": "0:06:06", "remaining_time": "0:23:50", "throughput": 732.38, "total_tokens": 268448}
{"current_steps": 105, "total_steps": 500, "loss": 1.2588, "learning_rate": 9.710287263936484e-05, "epoch": 0.18666666666666668, "percentage": 21.0, "cur_time": "2024-09-19 14:39:34", "elapsed_time": "0:06:17", "remaining_time": "0:23:39", "throughput": 731.63, "total_tokens": 276016}
{"current_steps": 108, "total_steps": 500, "loss": 1.3142, "learning_rate": 9.674128381980072e-05, "epoch": 0.192, "percentage": 21.6, "cur_time": "2024-09-19 14:39:45", "elapsed_time": "0:06:28", "remaining_time": "0:23:28", "throughput": 727.67, "total_tokens": 282416}
{"current_steps": 111, "total_steps": 500, "loss": 1.4468, "learning_rate": 9.635919272833938e-05, "epoch": 0.19733333333333333, "percentage": 22.2, "cur_time": "2024-09-19 14:39:56", "elapsed_time": "0:06:38", "remaining_time": "0:23:18", "throughput": 726.96, "total_tokens": 290016}
{"current_steps": 114, "total_steps": 500, "loss": 1.5577, "learning_rate": 9.595676696276172e-05, "epoch": 0.20266666666666666, "percentage": 22.8, "cur_time": "2024-09-19 14:40:07", "elapsed_time": "0:06:49", "remaining_time": "0:23:07", "throughput": 727.14, "total_tokens": 297952}
{"current_steps": 117, "total_steps": 500, "loss": 1.3545, "learning_rate": 9.553418304030886e-05, "epoch": 0.208, "percentage": 23.4, "cur_time": "2024-09-19 14:40:18", "elapsed_time": "0:07:00", "remaining_time": "0:22:57", "throughput": 725.15, "total_tokens": 305152}
{"current_steps": 120, "total_steps": 500, "loss": 1.2359, "learning_rate": 9.50916263202557e-05, "epoch": 0.21333333333333335, "percentage": 24.0, "cur_time": "2024-09-19 14:40:29", "elapsed_time": "0:07:11", "remaining_time": "0:22:47", "throughput": 723.79, "total_tokens": 312592}
{"current_steps": 123, "total_steps": 500, "loss": 1.2377, "learning_rate": 9.462929092260628e-05, "epoch": 0.21866666666666668, "percentage": 24.6, "cur_time": "2024-09-19 14:40:39", "elapsed_time": "0:07:22", "remaining_time": "0:22:35", "throughput": 723.22, "total_tokens": 319792}
{"current_steps": 126, "total_steps": 500, "loss": 1.402, "learning_rate": 9.414737964294636e-05, "epoch": 0.224, "percentage": 25.2, "cur_time": "2024-09-19 14:40:49", "elapsed_time": "0:07:32", "remaining_time": "0:22:23", "throughput": 724.28, "total_tokens": 327776}
{"current_steps": 129, "total_steps": 500, "loss": 1.3244, "learning_rate": 9.364610386349049e-05, "epoch": 0.22933333333333333, "percentage": 25.8, "cur_time": "2024-09-19 14:41:00", "elapsed_time": "0:07:43", "remaining_time": "0:22:11", "throughput": 726.49, "total_tokens": 336432}
{"current_steps": 132, "total_steps": 500, "loss": 1.3333, "learning_rate": 9.312568346036288e-05, "epoch": 0.23466666666666666, "percentage": 26.4, "cur_time": "2024-09-19 14:41:11", "elapsed_time": "0:07:53", "remaining_time": "0:22:01", "throughput": 724.2, "total_tokens": 343216}
{"current_steps": 135, "total_steps": 500, "loss": 1.2563, "learning_rate": 9.258634670715238e-05, "epoch": 0.24, "percentage": 27.0, "cur_time": "2024-09-19 14:41:21", "elapsed_time": "0:08:04", "remaining_time": "0:21:49", "throughput": 722.89, "total_tokens": 350096}
{"current_steps": 138, "total_steps": 500, "loss": 1.3305, "learning_rate": 9.202833017478422e-05, "epoch": 0.24533333333333332, "percentage": 27.6, "cur_time": "2024-09-19 14:41:32", "elapsed_time": "0:08:14", "remaining_time": "0:21:37", "throughput": 725.77, "total_tokens": 359040}
{"current_steps": 141, "total_steps": 500, "loss": 1.2371, "learning_rate": 9.145187862775209e-05, "epoch": 0.25066666666666665, "percentage": 28.2, "cur_time": "2024-09-19 14:41:42", "elapsed_time": "0:08:25", "remaining_time": "0:21:26", "throughput": 727.69, "total_tokens": 367568}
{"current_steps": 144, "total_steps": 500, "loss": 1.2544, "learning_rate": 9.085724491675642e-05, "epoch": 0.256, "percentage": 28.8, "cur_time": "2024-09-19 14:41:52", "elapsed_time": "0:08:35", "remaining_time": "0:21:14", "throughput": 728.61, "total_tokens": 375552}
{"current_steps": 147, "total_steps": 500, "loss": 1.3928, "learning_rate": 9.02446898677957e-05, "epoch": 0.2613333333333333, "percentage": 29.4, "cur_time": "2024-09-19 14:42:03", "elapsed_time": "0:08:45", "remaining_time": "0:21:02", "throughput": 733.4, "total_tokens": 385712}
{"current_steps": 150, "total_steps": 500, "loss": 1.1703, "learning_rate": 8.961448216775954e-05, "epoch": 0.26666666666666666, "percentage": 30.0, "cur_time": "2024-09-19 14:42:13", "elapsed_time": "0:08:56", "remaining_time": "0:20:50", "throughput": 733.69, "total_tokens": 393344}
{"current_steps": 153, "total_steps": 500, "loss": 1.3133, "learning_rate": 8.896689824657372e-05, "epoch": 0.272, "percentage": 30.6, "cur_time": "2024-09-19 14:42:23", "elapsed_time": "0:09:06", "remaining_time": "0:20:39", "throughput": 733.38, "total_tokens": 400720}
{"current_steps": 156, "total_steps": 500, "loss": 1.2321, "learning_rate": 8.83022221559489e-05, "epoch": 0.2773333333333333, "percentage": 31.2, "cur_time": "2024-09-19 14:42:34", "elapsed_time": "0:09:17", "remaining_time": "0:20:28", "throughput": 734.04, "total_tokens": 409056}
{"current_steps": 159, "total_steps": 500, "loss": 1.3816, "learning_rate": 8.762074544478623e-05, "epoch": 0.2826666666666667, "percentage": 31.8, "cur_time": "2024-09-19 14:42:45", "elapsed_time": "0:09:27", "remaining_time": "0:20:17", "throughput": 732.97, "total_tokens": 416144}
{"current_steps": 162, "total_steps": 500, "loss": 1.1577, "learning_rate": 8.692276703129421e-05, "epoch": 0.288, "percentage": 32.4, "cur_time": "2024-09-19 14:42:56", "elapsed_time": "0:09:38", "remaining_time": "0:20:07", "throughput": 730.57, "total_tokens": 422880}
{"current_steps": 165, "total_steps": 500, "loss": 1.2287, "learning_rate": 8.620859307187339e-05, "epoch": 0.29333333333333333, "percentage": 33.0, "cur_time": "2024-09-19 14:43:07", "elapsed_time": "0:09:49", "remaining_time": "0:19:57", "throughput": 728.02, "total_tokens": 429440}
{"current_steps": 168, "total_steps": 500, "loss": 1.2672, "learning_rate": 8.547853682682604e-05, "epoch": 0.2986666666666667, "percentage": 33.6, "cur_time": "2024-09-19 14:43:18", "elapsed_time": "0:10:00", "remaining_time": "0:19:47", "throughput": 727.46, "total_tokens": 437200}
{"current_steps": 171, "total_steps": 500, "loss": 1.299, "learning_rate": 8.473291852294987e-05, "epoch": 0.304, "percentage": 34.2, "cur_time": "2024-09-19 14:43:29", "elapsed_time": "0:10:11", "remaining_time": "0:19:36", "throughput": 729.8, "total_tokens": 446416}
{"current_steps": 174, "total_steps": 500, "loss": 1.4358, "learning_rate": 8.397206521307584e-05, "epoch": 0.30933333333333335, "percentage": 34.8, "cur_time": "2024-09-19 14:43:39", "elapsed_time": "0:10:22", "remaining_time": "0:19:26", "throughput": 730.48, "total_tokens": 454672}
{"current_steps": 177, "total_steps": 500, "loss": 1.2488, "learning_rate": 8.319631063261209e-05, "epoch": 0.31466666666666665, "percentage": 35.4, "cur_time": "2024-09-19 14:43:50", "elapsed_time": "0:10:33", "remaining_time": "0:19:15", "throughput": 731.53, "total_tokens": 463216}
{"current_steps": 180, "total_steps": 500, "loss": 1.2692, "learning_rate": 8.240599505315655e-05, "epoch": 0.32, "percentage": 36.0, "cur_time": "2024-09-19 14:44:01", "elapsed_time": "0:10:43", "remaining_time": "0:19:04", "throughput": 733.8, "total_tokens": 472432}
{"current_steps": 183, "total_steps": 500, "loss": 1.2339, "learning_rate": 8.160146513324254e-05, "epoch": 0.3253333333333333, "percentage": 36.6, "cur_time": "2024-09-19 14:44:11", "elapsed_time": "0:10:54", "remaining_time": "0:18:53", "throughput": 733.66, "total_tokens": 479968}
{"current_steps": 186, "total_steps": 500, "loss": 1.4274, "learning_rate": 8.07830737662829e-05, "epoch": 0.33066666666666666, "percentage": 37.2, "cur_time": "2024-09-19 14:44:22", "elapsed_time": "0:11:04", "remaining_time": "0:18:42", "throughput": 734.36, "total_tokens": 488112}
{"current_steps": 189, "total_steps": 500, "loss": 1.3689, "learning_rate": 7.99511799257793e-05, "epoch": 0.336, "percentage": 37.8, "cur_time": "2024-09-19 14:44:32", "elapsed_time": "0:11:15", "remaining_time": "0:18:31", "throughput": 734.04, "total_tokens": 495808}
{"current_steps": 192, "total_steps": 500, "loss": 1.344, "learning_rate": 7.910614850786448e-05, "epoch": 0.3413333333333333, "percentage": 38.4, "cur_time": "2024-09-19 14:44:43", "elapsed_time": "0:11:26", "remaining_time": "0:18:20", "throughput": 733.19, "total_tokens": 503216}
{"current_steps": 195, "total_steps": 500, "loss": 1.2793, "learning_rate": 7.82483501712469e-05, "epoch": 0.3466666666666667, "percentage": 39.0, "cur_time": "2024-09-19 14:44:54", "elapsed_time": "0:11:37", "remaining_time": "0:18:10", "throughput": 733.77, "total_tokens": 511568}
{"current_steps": 198, "total_steps": 500, "loss": 1.5863, "learning_rate": 7.737816117462752e-05, "epoch": 0.352, "percentage": 39.6, "cur_time": "2024-09-19 14:45:05", "elapsed_time": "0:11:47", "remaining_time": "0:17:59", "throughput": 734.08, "total_tokens": 519536}
{"current_steps": 201, "total_steps": 500, "loss": 1.5119, "learning_rate": 7.649596321166024e-05, "epoch": 0.35733333333333334, "percentage": 40.2, "cur_time": "2024-09-19 14:45:15", "elapsed_time": "0:11:58", "remaining_time": "0:17:48", "throughput": 732.99, "total_tokens": 526576}
{"current_steps": 204, "total_steps": 500, "loss": 1.335, "learning_rate": 7.560214324352858e-05, "epoch": 0.3626666666666667, "percentage": 40.8, "cur_time": "2024-09-19 14:45:26", "elapsed_time": "0:12:09", "remaining_time": "0:17:37", "throughput": 732.15, "total_tokens": 533824}
{"current_steps": 207, "total_steps": 500, "loss": 1.2377, "learning_rate": 7.469709332921155e-05, "epoch": 0.368, "percentage": 41.4, "cur_time": "2024-09-19 14:45:37", "elapsed_time": "0:12:19", "remaining_time": "0:17:27", "throughput": 731.85, "total_tokens": 541472}
{"current_steps": 210, "total_steps": 500, "loss": 1.1963, "learning_rate": 7.378121045351378e-05, "epoch": 0.37333333333333335, "percentage": 42.0, "cur_time": "2024-09-19 14:45:47", "elapsed_time": "0:12:30", "remaining_time": "0:17:16", "throughput": 733.12, "total_tokens": 550144}
{"current_steps": 213, "total_steps": 500, "loss": 1.3623, "learning_rate": 7.285489635293472e-05, "epoch": 0.37866666666666665, "percentage": 42.6, "cur_time": "2024-09-19 14:45:58", "elapsed_time": "0:12:40", "remaining_time": "0:17:05", "throughput": 733.12, "total_tokens": 557872}
{"current_steps": 216, "total_steps": 500, "loss": 1.2955, "learning_rate": 7.191855733945387e-05, "epoch": 0.384, "percentage": 43.2, "cur_time": "2024-09-19 14:46:08", "elapsed_time": "0:12:51", "remaining_time": "0:16:54", "throughput": 730.69, "total_tokens": 563728}
{"current_steps": 219, "total_steps": 500, "loss": 1.2986, "learning_rate": 7.097260412230886e-05, "epoch": 0.3893333333333333, "percentage": 43.8, "cur_time": "2024-09-19 14:46:19", "elapsed_time": "0:13:02", "remaining_time": "0:16:43", "throughput": 730.21, "total_tokens": 571056}
{"current_steps": 222, "total_steps": 500, "loss": 1.0992, "learning_rate": 7.001745162784477e-05, "epoch": 0.39466666666666667, "percentage": 44.4, "cur_time": "2024-09-19 14:46:29", "elapsed_time": "0:13:12", "remaining_time": "0:16:32", "throughput": 730.81, "total_tokens": 579184}
{"current_steps": 225, "total_steps": 500, "loss": 1.437, "learning_rate": 6.905351881751372e-05, "epoch": 0.4, "percentage": 45.0, "cur_time": "2024-09-19 14:46:40", "elapsed_time": "0:13:22", "remaining_time": "0:16:21", "throughput": 730.11, "total_tokens": 586128}
{"current_steps": 228, "total_steps": 500, "loss": 1.431, "learning_rate": 6.808122850410461e-05, "epoch": 0.4053333333333333, "percentage": 45.6, "cur_time": "2024-09-19 14:46:50", "elapsed_time": "0:13:33", "remaining_time": "0:16:10", "throughput": 731.52, "total_tokens": 594848}
{"current_steps": 231, "total_steps": 500, "loss": 1.3132, "learning_rate": 6.710100716628344e-05, "epoch": 0.4106666666666667, "percentage": 46.2, "cur_time": "2024-09-19 14:47:00", "elapsed_time": "0:13:43", "remaining_time": "0:15:59", "throughput": 731.65, "total_tokens": 602544}
{"current_steps": 234, "total_steps": 500, "loss": 1.3838, "learning_rate": 6.644333233692916e-05, "epoch": 0.416, "percentage": 46.8, "cur_time": "2024-09-19 14:47:11", "elapsed_time": "0:13:53", "remaining_time": "0:15:47", "throughput": 731.42, "total_tokens": 609904}
{"current_steps": 237, "total_steps": 500, "loss": 1.1309, "learning_rate": 6.545084971874738e-05, "epoch": 0.42133333333333334, "percentage": 47.4, "cur_time": "2024-09-19 14:47:21", "elapsed_time": "0:14:04", "remaining_time": "0:15:36", "throughput": 732.82, "total_tokens": 618576}
{"current_steps": 240, "total_steps": 500, "loss": 1.3572, "learning_rate": 6.445158984722358e-05, "epoch": 0.4266666666666667, "percentage": 48.0, "cur_time": "2024-09-19 14:47:31", "elapsed_time": "0:14:14", "remaining_time": "0:15:25", "throughput": 733.87, "total_tokens": 626960}
{"current_steps": 243, "total_steps": 500, "loss": 1.3563, "learning_rate": 6.344599103076329e-05, "epoch": 0.432, "percentage": 48.6, "cur_time": "2024-09-19 14:47:41", "elapsed_time": "0:14:24", "remaining_time": "0:15:14", "throughput": 734.35, "total_tokens": 634896}
{"current_steps": 246, "total_steps": 500, "loss": 1.2947, "learning_rate": 6.243449435824276e-05, "epoch": 0.43733333333333335, "percentage": 49.2, "cur_time": "2024-09-19 14:47:52", "elapsed_time": "0:14:34", "remaining_time": "0:15:03", "throughput": 734.59, "total_tokens": 642640}
{"current_steps": 249, "total_steps": 500, "loss": 1.206, "learning_rate": 6.141754350553279e-05, "epoch": 0.44266666666666665, "percentage": 49.8, "cur_time": "2024-09-19 14:48:02", "elapsed_time": "0:14:45", "remaining_time": "0:14:52", "throughput": 735.33, "total_tokens": 650816}
{"current_steps": 252, "total_steps": 500, "loss": 1.2874, "learning_rate": 6.0395584540887963e-05, "epoch": 0.448, "percentage": 50.4, "cur_time": "2024-09-19 14:48:12", "elapsed_time": "0:14:55", "remaining_time": "0:14:41", "throughput": 735.72, "total_tokens": 658736}
{"current_steps": 255, "total_steps": 500, "loss": 1.3168, "learning_rate": 5.9369065729286245e-05, "epoch": 0.4533333333333333, "percentage": 51.0, "cur_time": "2024-09-19 14:48:23", "elapsed_time": "0:15:05", "remaining_time": "0:14:30", "throughput": 735.88, "total_tokens": 666416}
{"current_steps": 258, "total_steps": 500, "loss": 1.3401, "learning_rate": 5.833843733580512e-05, "epoch": 0.45866666666666667, "percentage": 51.6, "cur_time": "2024-09-19 14:48:33", "elapsed_time": "0:15:15", "remaining_time": "0:14:19", "throughput": 735.18, "total_tokens": 673312}
{"current_steps": 261, "total_steps": 500, "loss": 1.3921, "learning_rate": 5.730415142812059e-05, "epoch": 0.464, "percentage": 52.2, "cur_time": "2024-09-19 14:48:43", "elapsed_time": "0:15:26", "remaining_time": "0:14:08", "throughput": 735.78, "total_tokens": 681408}
{"current_steps": 264, "total_steps": 500, "loss": 1.2268, "learning_rate": 5.6266661678215216e-05, "epoch": 0.4693333333333333, "percentage": 52.8, "cur_time": "2024-09-19 14:48:53", "elapsed_time": "0:15:36", "remaining_time": "0:13:57", "throughput": 737.55, "total_tokens": 690608}
{"current_steps": 267, "total_steps": 500, "loss": 1.3181, "learning_rate": 5.522642316338268e-05, "epoch": 0.4746666666666667, "percentage": 53.4, "cur_time": "2024-09-19 14:49:03", "elapsed_time": "0:15:46", "remaining_time": "0:13:46", "throughput": 738.01, "total_tokens": 698576}
{"current_steps": 270, "total_steps": 500, "loss": 1.1667, "learning_rate": 5.418389216661579e-05, "epoch": 0.48, "percentage": 54.0, "cur_time": "2024-09-19 14:49:14", "elapsed_time": "0:15:57", "remaining_time": "0:13:35", "throughput": 739.49, "total_tokens": 707808}
{"current_steps": 273, "total_steps": 500, "loss": 1.2205, "learning_rate": 5.313952597646568e-05, "epoch": 0.48533333333333334, "percentage": 54.6, "cur_time": "2024-09-19 14:49:25", "elapsed_time": "0:16:07", "remaining_time": "0:13:24", "throughput": 737.69, "total_tokens": 714016}
{"current_steps": 276, "total_steps": 500, "loss": 1.2018, "learning_rate": 5.209378268645998e-05, "epoch": 0.49066666666666664, "percentage": 55.2, "cur_time": "2024-09-19 14:49:35", "elapsed_time": "0:16:18", "remaining_time": "0:13:14", "throughput": 737.36, "total_tokens": 721552}
{"current_steps": 279, "total_steps": 500, "loss": 1.3723, "learning_rate": 5.104712099416785e-05, "epoch": 0.496, "percentage": 55.8, "cur_time": "2024-09-19 14:49:46", "elapsed_time": "0:16:29", "remaining_time": "0:13:03", "throughput": 736.87, "total_tokens": 729104}
{"current_steps": 282, "total_steps": 500, "loss": 1.3686, "learning_rate": 5e-05, "epoch": 0.5013333333333333, "percentage": 56.4, "cur_time": "2024-09-19 14:49:57", "elapsed_time": "0:16:39", "remaining_time": "0:12:52", "throughput": 736.57, "total_tokens": 736496}
{"current_steps": 285, "total_steps": 500, "loss": 1.2406, "learning_rate": 4.895287900583216e-05, "epoch": 0.5066666666666667, "percentage": 57.0, "cur_time": "2024-09-19 14:50:08", "elapsed_time": "0:16:50", "remaining_time": "0:12:42", "throughput": 737.33, "total_tokens": 745184}
{"current_steps": 288, "total_steps": 500, "loss": 1.2107, "learning_rate": 4.790621731354003e-05, "epoch": 0.512, "percentage": 57.6, "cur_time": "2024-09-19 14:50:18", "elapsed_time": "0:17:01", "remaining_time": "0:12:31", "throughput": 736.43, "total_tokens": 752048}
{"current_steps": 291, "total_steps": 500, "loss": 1.2136, "learning_rate": 4.6860474023534335e-05, "epoch": 0.5173333333333333, "percentage": 58.2, "cur_time": "2024-09-19 14:50:29", "elapsed_time": "0:17:12", "remaining_time": "0:12:21", "throughput": 737.36, "total_tokens": 761088}
{"current_steps": 294, "total_steps": 500, "loss": 1.2939, "learning_rate": 4.5816107833384234e-05, "epoch": 0.5226666666666666, "percentage": 58.8, "cur_time": "2024-09-19 14:50:40", "elapsed_time": "0:17:22", "remaining_time": "0:12:10", "throughput": 736.53, "total_tokens": 768080}
{"current_steps": 297, "total_steps": 500, "loss": 1.2621, "learning_rate": 4.477357683661734e-05, "epoch": 0.528, "percentage": 59.4, "cur_time": "2024-09-19 14:50:51", "elapsed_time": "0:17:33", "remaining_time": "0:12:00", "throughput": 738.06, "total_tokens": 777808}
{"current_steps": 300, "total_steps": 500, "loss": 1.3968, "learning_rate": 4.373333832178478e-05, "epoch": 0.5333333333333333, "percentage": 60.0, "cur_time": "2024-09-19 14:51:02", "elapsed_time": "0:17:44", "remaining_time": "0:11:49", "throughput": 738.26, "total_tokens": 786032}
{"current_steps": 303, "total_steps": 500, "loss": 1.2069, "learning_rate": 4.269584857187943e-05, "epoch": 0.5386666666666666, "percentage": 60.6, "cur_time": "2024-09-19 14:51:12", "elapsed_time": "0:17:55", "remaining_time": "0:11:39", "throughput": 737.67, "total_tokens": 793424}
{"current_steps": 306, "total_steps": 500, "loss": 1.1725, "learning_rate": 4.166156266419489e-05, "epoch": 0.544, "percentage": 61.2, "cur_time": "2024-09-19 14:51:23", "elapsed_time": "0:18:06", "remaining_time": "0:11:28", "throughput": 737.78, "total_tokens": 801392}
{"current_steps": 309, "total_steps": 500, "loss": 1.1476, "learning_rate": 4.063093427071376e-05, "epoch": 0.5493333333333333, "percentage": 61.8, "cur_time": "2024-09-19 14:51:34", "elapsed_time": "0:18:16", "remaining_time": "0:11:17", "throughput": 737.79, "total_tokens": 809184}
{"current_steps": 312, "total_steps": 500, "loss": 1.3471, "learning_rate": 3.960441545911204e-05, "epoch": 0.5546666666666666, "percentage": 62.4, "cur_time": "2024-09-19 14:51:44", "elapsed_time": "0:18:27", "remaining_time": "0:11:07", "throughput": 737.77, "total_tokens": 816896}
{"current_steps": 315, "total_steps": 500, "loss": 1.3197, "learning_rate": 3.858245649446721e-05, "epoch": 0.56, "percentage": 63.0, "cur_time": "2024-09-19 14:51:55", "elapsed_time": "0:18:38", "remaining_time": "0:10:56", "throughput": 738.67, "total_tokens": 825952}
{"current_steps": 318, "total_steps": 500, "loss": 1.2391, "learning_rate": 3.756550564175727e-05, "epoch": 0.5653333333333334, "percentage": 63.6, "cur_time": "2024-09-19 14:52:06", "elapsed_time": "0:18:49", "remaining_time": "0:10:46", "throughput": 739.98, "total_tokens": 835664}
{"current_steps": 321, "total_steps": 500, "loss": 1.2011, "learning_rate": 3.655400896923672e-05, "epoch": 0.5706666666666667, "percentage": 64.2, "cur_time": "2024-09-19 14:52:17", "elapsed_time": "0:19:00", "remaining_time": "0:10:35", "throughput": 740.33, "total_tokens": 844288}
{"current_steps": 324, "total_steps": 500, "loss": 1.2983, "learning_rate": 3.554841015277641e-05, "epoch": 0.576, "percentage": 64.8, "cur_time": "2024-09-19 14:52:28", "elapsed_time": "0:19:11", "remaining_time": "0:10:25", "throughput": 739.48, "total_tokens": 851584}
{"current_steps": 327, "total_steps": 500, "loss": 1.3713, "learning_rate": 3.4549150281252636e-05, "epoch": 0.5813333333333334, "percentage": 65.4, "cur_time": "2024-09-19 14:52:40", "elapsed_time": "0:19:22", "remaining_time": "0:10:15", "throughput": 740.68, "total_tokens": 861216}
{"current_steps": 330, "total_steps": 500, "loss": 1.3763, "learning_rate": 3.355666766307084e-05, "epoch": 0.5866666666666667, "percentage": 66.0, "cur_time": "2024-09-19 14:52:51", "elapsed_time": "0:19:33", "remaining_time": "0:10:04", "throughput": 740.95, "total_tokens": 869792}
{"current_steps": 333, "total_steps": 500, "loss": 1.1632, "learning_rate": 3.257139763390925e-05, "epoch": 0.592, "percentage": 66.6, "cur_time": "2024-09-19 14:53:02", "elapsed_time": "0:19:44", "remaining_time": "0:09:54", "throughput": 740.41, "total_tokens": 877216}
{"current_steps": 336, "total_steps": 500, "loss": 1.2382, "learning_rate": 3.1593772365766105e-05, "epoch": 0.5973333333333334, "percentage": 67.2, "cur_time": "2024-09-19 14:53:12", "elapsed_time": "0:19:55", "remaining_time": "0:09:43", "throughput": 739.28, "total_tokens": 883488}
{"current_steps": 339, "total_steps": 500, "loss": 1.1966, "learning_rate": 3.062422067739485e-05, "epoch": 0.6026666666666667, "percentage": 67.8, "cur_time": "2024-09-19 14:53:23", "elapsed_time": "0:20:05", "remaining_time": "0:09:32", "throughput": 738.88, "total_tokens": 890816}
{"current_steps": 342, "total_steps": 500, "loss": 1.2934, "learning_rate": 2.9663167846209998e-05, "epoch": 0.608, "percentage": 68.4, "cur_time": "2024-09-19 14:53:33", "elapsed_time": "0:20:16", "remaining_time": "0:09:21", "throughput": 740.21, "total_tokens": 900352}
{"current_steps": 345, "total_steps": 500, "loss": 1.184, "learning_rate": 2.8711035421746367e-05, "epoch": 0.6133333333333333, "percentage": 69.0, "cur_time": "2024-09-19 14:53:44", "elapsed_time": "0:20:26", "remaining_time": "0:09:11", "throughput": 741.26, "total_tokens": 909312}
{"current_steps": 348, "total_steps": 500, "loss": 1.1655, "learning_rate": 2.776824104075364e-05, "epoch": 0.6186666666666667, "percentage": 69.6, "cur_time": "2024-09-19 14:53:54", "elapsed_time": "0:20:36", "remaining_time": "0:09:00", "throughput": 741.49, "total_tokens": 917168}
{"current_steps": 351, "total_steps": 500, "loss": 1.2241, "learning_rate": 2.6835198244006927e-05, "epoch": 0.624, "percentage": 70.2, "cur_time": "2024-09-19 14:54:04", "elapsed_time": "0:20:47", "remaining_time": "0:08:49", "throughput": 740.4, "total_tokens": 923424}
{"current_steps": 354, "total_steps": 500, "loss": 1.275, "learning_rate": 2.591231629491423e-05, "epoch": 0.6293333333333333, "percentage": 70.8, "cur_time": "2024-09-19 14:54:15", "elapsed_time": "0:20:57", "remaining_time": "0:08:38", "throughput": 740.37, "total_tokens": 931152}
{"current_steps": 357, "total_steps": 500, "loss": 1.2231, "learning_rate": 2.500000000000001e-05, "epoch": 0.6346666666666667, "percentage": 71.4, "cur_time": "2024-09-19 14:54:25", "elapsed_time": "0:21:08", "remaining_time": "0:08:28", "throughput": 741.9, "total_tokens": 940992}
{"current_steps": 360, "total_steps": 500, "loss": 1.3608, "learning_rate": 2.4098649531343497e-05, "epoch": 0.64, "percentage": 72.0, "cur_time": "2024-09-19 14:54:36", "elapsed_time": "0:21:18", "remaining_time": "0:08:17", "throughput": 742.73, "total_tokens": 949936}
{"current_steps": 363, "total_steps": 500, "loss": 1.3481, "learning_rate": 2.3208660251050158e-05, "epoch": 0.6453333333333333, "percentage": 72.6, "cur_time": "2024-09-19 14:54:47", "elapsed_time": "0:21:29", "remaining_time": "0:08:06", "throughput": 743.06, "total_tokens": 958432}
{"current_steps": 366, "total_steps": 500, "loss": 1.2009, "learning_rate": 2.23304225378328e-05, "epoch": 0.6506666666666666, "percentage": 73.2, "cur_time": "2024-09-19 14:54:58", "elapsed_time": "0:21:40", "remaining_time": "0:07:56", "throughput": 743.06, "total_tokens": 966688}
{"current_steps": 369, "total_steps": 500, "loss": 1.1758, "learning_rate": 2.1464321615778422e-05, "epoch": 0.656, "percentage": 73.8, "cur_time": "2024-09-19 14:55:09", "elapsed_time": "0:21:52", "remaining_time": "0:07:45", "throughput": 742.21, "total_tokens": 973872}
{"current_steps": 372, "total_steps": 500, "loss": 1.3557, "learning_rate": 2.061073738537635e-05, "epoch": 0.6613333333333333, "percentage": 74.4, "cur_time": "2024-09-19 14:55:20", "elapsed_time": "0:22:03", "remaining_time": "0:07:35", "throughput": 741.58, "total_tokens": 981264}
{"current_steps": 375, "total_steps": 500, "loss": 1.4066, "learning_rate": 1.977004425688126e-05, "epoch": 0.6666666666666666, "percentage": 75.0, "cur_time": "2024-09-19 14:55:31", "elapsed_time": "0:22:14", "remaining_time": "0:07:24", "throughput": 741.2, "total_tokens": 988800}
{"current_steps": 378, "total_steps": 500, "loss": 1.4266, "learning_rate": 1.8942610986084486e-05, "epoch": 0.672, "percentage": 75.6, "cur_time": "2024-09-19 14:55:41", "elapsed_time": "0:22:24", "remaining_time": "0:07:13", "throughput": 741.3, "total_tokens": 996656}
{"current_steps": 381, "total_steps": 500, "loss": 1.2348, "learning_rate": 1.8128800512565513e-05, "epoch": 0.6773333333333333, "percentage": 76.2, "cur_time": "2024-09-19 14:55:52", "elapsed_time": "0:22:34", "remaining_time": "0:07:03", "throughput": 741.62, "total_tokens": 1004672}
{"current_steps": 384, "total_steps": 500, "loss": 1.2, "learning_rate": 1.7328969800494726e-05, "epoch": 0.6826666666666666, "percentage": 76.8, "cur_time": "2024-09-19 14:56:02", "elapsed_time": "0:22:44", "remaining_time": "0:06:52", "throughput": 741.71, "total_tokens": 1012400}
{"current_steps": 387, "total_steps": 500, "loss": 1.2074, "learning_rate": 1.6543469682057106e-05, "epoch": 0.688, "percentage": 77.4, "cur_time": "2024-09-19 14:56:12", "elapsed_time": "0:22:55", "remaining_time": "0:06:41", "throughput": 740.89, "total_tokens": 1018768}
{"current_steps": 390, "total_steps": 500, "loss": 1.3071, "learning_rate": 1.5772644703565565e-05, "epoch": 0.6933333333333334, "percentage": 78.0, "cur_time": "2024-09-19 14:56:22", "elapsed_time": "0:23:05", "remaining_time": "0:06:30", "throughput": 740.9, "total_tokens": 1026368}
{"current_steps": 393, "total_steps": 500, "loss": 1.3392, "learning_rate": 1.5016832974331724e-05, "epoch": 0.6986666666666667, "percentage": 78.6, "cur_time": "2024-09-19 14:56:32", "elapsed_time": "0:23:15", "remaining_time": "0:06:19", "throughput": 740.7, "total_tokens": 1033664}
{"current_steps": 396, "total_steps": 500, "loss": 1.2203, "learning_rate": 1.4276366018359844e-05, "epoch": 0.704, "percentage": 79.2, "cur_time": "2024-09-19 14:56:43", "elapsed_time": "0:23:25", "remaining_time": "0:06:09", "throughput": 741.61, "total_tokens": 1042560}
{"current_steps": 399, "total_steps": 500, "loss": 1.3089, "learning_rate": 1.3551568628929434e-05, "epoch": 0.7093333333333334, "percentage": 79.8, "cur_time": "2024-09-19 14:56:53", "elapsed_time": "0:23:36", "remaining_time": "0:05:58", "throughput": 741.57, "total_tokens": 1050336}
{"current_steps": 402, "total_steps": 500, "loss": 1.3856, "learning_rate": 1.2842758726130283e-05, "epoch": 0.7146666666666667, "percentage": 80.4, "cur_time": "2024-09-19 14:57:04", "elapsed_time": "0:23:47", "remaining_time": "0:05:47", "throughput": 741.16, "total_tokens": 1057824}
{"current_steps": 405, "total_steps": 500, "loss": 1.4345, "learning_rate": 1.2150247217412186e-05, "epoch": 0.72, "percentage": 81.0, "cur_time": "2024-09-19 14:57:15", "elapsed_time": "0:23:58", "remaining_time": "0:05:37", "throughput": 740.73, "total_tokens": 1065184}
{"current_steps": 408, "total_steps": 500, "loss": 1.3059, "learning_rate": 1.1474337861210543e-05, "epoch": 0.7253333333333334, "percentage": 81.6, "cur_time": "2024-09-19 14:57:25", "elapsed_time": "0:24:08", "remaining_time": "0:05:26", "throughput": 740.11, "total_tokens": 1072016}
{"current_steps": 411, "total_steps": 500, "loss": 1.1952, "learning_rate": 1.0815327133708015e-05, "epoch": 0.7306666666666667, "percentage": 82.2, "cur_time": "2024-09-19 14:57:36", "elapsed_time": "0:24:19", "remaining_time": "0:05:15", "throughput": 739.9, "total_tokens": 1079584}
{"current_steps": 414, "total_steps": 500, "loss": 1.1284, "learning_rate": 1.0173504098790187e-05, "epoch": 0.736, "percentage": 82.8, "cur_time": "2024-09-19 14:57:46", "elapsed_time": "0:24:29", "remaining_time": "0:05:05", "throughput": 740.43, "total_tokens": 1087984}
{"current_steps": 417, "total_steps": 500, "loss": 1.1887, "learning_rate": 9.549150281252633e-06, "epoch": 0.7413333333333333, "percentage": 83.4, "cur_time": "2024-09-19 14:57:57", "elapsed_time": "0:24:39", "remaining_time": "0:04:54", "throughput": 740.7, "total_tokens": 1096032}
{"current_steps": 420, "total_steps": 500, "loss": 1.1508, "learning_rate": 8.9425395433148e-06, "epoch": 0.7466666666666667, "percentage": 84.0, "cur_time": "2024-09-19 14:58:07", "elapsed_time": "0:24:50", "remaining_time": "0:04:43", "throughput": 742.05, "total_tokens": 1105696}
{"current_steps": 423, "total_steps": 500, "loss": 1.2795, "learning_rate": 8.353937964495029e-06, "epoch": 0.752, "percentage": 84.6, "cur_time": "2024-09-19 14:58:17", "elapsed_time": "0:25:00", "remaining_time": "0:04:33", "throughput": 742.46, "total_tokens": 1114048}
{"current_steps": 426, "total_steps": 500, "loss": 1.4123, "learning_rate": 7.783603724899257e-06, "epoch": 0.7573333333333333, "percentage": 85.2, "cur_time": "2024-09-19 14:58:28", "elapsed_time": "0:25:11", "remaining_time": "0:04:22", "throughput": 742.87, "total_tokens": 1122528}
{"current_steps": 429, "total_steps": 500, "loss": 1.2925, "learning_rate": 7.2317869919746705e-06, "epoch": 0.7626666666666667, "percentage": 85.8, "cur_time": "2024-09-19 14:58:39", "elapsed_time": "0:25:21", "remaining_time": "0:04:11", "throughput": 743.46, "total_tokens": 1131328}
{"current_steps": 432, "total_steps": 500, "loss": 1.1505, "learning_rate": 6.698729810778065e-06, "epoch": 0.768, "percentage": 86.4, "cur_time": "2024-09-19 14:58:50", "elapsed_time": "0:25:32", "remaining_time": "0:04:01", "throughput": 743.16, "total_tokens": 1138992}
{"current_steps": 435, "total_steps": 500, "loss": 1.3429, "learning_rate": 6.184665997806832e-06, "epoch": 0.7733333333333333, "percentage": 87.0, "cur_time": "2024-09-19 14:59:00", "elapsed_time": "0:25:42", "remaining_time": "0:03:50", "throughput": 743.91, "total_tokens": 1147776}
{"current_steps": 438, "total_steps": 500, "loss": 1.2439, "learning_rate": 5.689821038439263e-06, "epoch": 0.7786666666666666, "percentage": 87.6, "cur_time": "2024-09-19 14:59:11", "elapsed_time": "0:25:53", "remaining_time": "0:03:39", "throughput": 744.84, "total_tokens": 1157360}
{"current_steps": 441, "total_steps": 500, "loss": 1.2647, "learning_rate": 5.214411988029355e-06, "epoch": 0.784, "percentage": 88.2, "cur_time": "2024-09-19 14:59:21", "elapsed_time": "0:26:04", "remaining_time": "0:03:29", "throughput": 745.41, "total_tokens": 1166256}
{"current_steps": 444, "total_steps": 500, "loss": 1.3183, "learning_rate": 4.758647376699032e-06, "epoch": 0.7893333333333333, "percentage": 88.8, "cur_time": "2024-09-19 14:59:32", "elapsed_time": "0:26:15", "remaining_time": "0:03:18", "throughput": 745.27, "total_tokens": 1173840}
{"current_steps": 447, "total_steps": 500, "loss": 1.3921, "learning_rate": 4.322727117869951e-06, "epoch": 0.7946666666666666, "percentage": 89.4, "cur_time": "2024-09-19 14:59:42", "elapsed_time": "0:26:25", "remaining_time": "0:03:07", "throughput": 744.81, "total_tokens": 1180848}
{"current_steps": 450, "total_steps": 500, "loss": 1.3069, "learning_rate": 3.90684242057498e-06, "epoch": 0.8, "percentage": 90.0, "cur_time": "2024-09-19 14:59:53", "elapsed_time": "0:26:36", "remaining_time": "0:02:57", "throughput": 745.37, "total_tokens": 1189952}
{"current_steps": 453, "total_steps": 500, "loss": 1.228, "learning_rate": 3.511175705587433e-06, "epoch": 0.8053333333333333, "percentage": 90.6, "cur_time": "2024-09-19 15:00:04", "elapsed_time": "0:26:47", "remaining_time": "0:02:46", "throughput": 745.54, "total_tokens": 1198336}
{"current_steps": 456, "total_steps": 500, "loss": 1.264, "learning_rate": 3.1359005254054273e-06, "epoch": 0.8106666666666666, "percentage": 91.2, "cur_time": "2024-09-19 15:00:15", "elapsed_time": "0:26:57", "remaining_time": "0:02:36", "throughput": 745.65, "total_tokens": 1206304}
{"current_steps": 459, "total_steps": 500, "loss": 1.3767, "learning_rate": 2.7811814881259503e-06, "epoch": 0.816, "percentage": 91.8, "cur_time": "2024-09-19 15:00:25", "elapsed_time": "0:27:08", "remaining_time": "0:02:25", "throughput": 745.73, "total_tokens": 1214480}
{"current_steps": 462, "total_steps": 500, "loss": 1.2163, "learning_rate": 2.4471741852423237e-06, "epoch": 0.8213333333333334, "percentage": 92.4, "cur_time": "2024-09-19 15:00:36", "elapsed_time": "0:27:19", "remaining_time": "0:02:14", "throughput": 745.84, "total_tokens": 1222608}
{"current_steps": 465, "total_steps": 500, "loss": 1.3609, "learning_rate": 2.134025123396638e-06, "epoch": 0.8266666666666667, "percentage": 93.0, "cur_time": "2024-09-19 15:00:47", "elapsed_time": "0:27:29", "remaining_time": "0:02:04", "throughput": 745.32, "total_tokens": 1229568}
{"current_steps": 468, "total_steps": 500, "loss": 1.2252, "learning_rate": 1.841871660117095e-06, "epoch": 0.832, "percentage": 93.6, "cur_time": "2024-09-19 15:00:57", "elapsed_time": "0:27:40", "remaining_time": "0:01:53", "throughput": 744.92, "total_tokens": 1236656}
{"current_steps": 471, "total_steps": 500, "loss": 1.3446, "learning_rate": 1.5708419435684462e-06, "epoch": 0.8373333333333334, "percentage": 94.2, "cur_time": "2024-09-19 15:01:08", "elapsed_time": "0:27:51", "remaining_time": "0:01:42", "throughput": 744.33, "total_tokens": 1243776}
{"current_steps": 474, "total_steps": 500, "loss": 1.1298, "learning_rate": 1.3210548563419856e-06, "epoch": 0.8426666666666667, "percentage": 94.8, "cur_time": "2024-09-19 15:01:19", "elapsed_time": "0:28:02", "remaining_time": "0:01:32", "throughput": 745.02, "total_tokens": 1253296}
{"current_steps": 477, "total_steps": 500, "loss": 1.4383, "learning_rate": 1.0926199633097157e-06, "epoch": 0.848, "percentage": 95.4, "cur_time": "2024-09-19 15:01:30", "elapsed_time": "0:28:12", "remaining_time": "0:01:21", "throughput": 745.14, "total_tokens": 1261440}
{"current_steps": 480, "total_steps": 500, "loss": 1.4809, "learning_rate": 8.856374635655695e-07, "epoch": 0.8533333333333334, "percentage": 96.0, "cur_time": "2024-09-19 15:01:41", "elapsed_time": "0:28:23", "remaining_time": "0:01:10", "throughput": 744.5, "total_tokens": 1268336}
{"current_steps": 483, "total_steps": 500, "loss": 1.4316, "learning_rate": 7.001981464747565e-07, "epoch": 0.8586666666666667, "percentage": 96.6, "cur_time": "2024-09-19 15:01:51", "elapsed_time": "0:28:33", "remaining_time": "0:01:00", "throughput": 744.62, "total_tokens": 1276224}
{"current_steps": 486, "total_steps": 500, "loss": 1.1178, "learning_rate": 5.363833518505834e-07, "epoch": 0.864, "percentage": 97.2, "cur_time": "2024-09-19 15:02:01", "elapsed_time": "0:28:44", "remaining_time": "0:00:49", "throughput": 744.67, "total_tokens": 1284112}
{"current_steps": 489, "total_steps": 500, "loss": 1.3764, "learning_rate": 3.9426493427611177e-07, "epoch": 0.8693333333333333, "percentage": 97.8, "cur_time": "2024-09-19 15:02:12", "elapsed_time": "0:28:54", "remaining_time": "0:00:39", "throughput": 745.29, "total_tokens": 1292800}
{"current_steps": 492, "total_steps": 500, "loss": 1.4021, "learning_rate": 2.7390523158633554e-07, "epoch": 0.8746666666666667, "percentage": 98.4, "cur_time": "2024-09-19 15:02:22", "elapsed_time": "0:29:04", "remaining_time": "0:00:28", "throughput": 745.42, "total_tokens": 1300656}
{"current_steps": 495, "total_steps": 500, "loss": 1.3095, "learning_rate": 1.753570375247815e-07, "epoch": 0.88, "percentage": 99.0, "cur_time": "2024-09-19 15:02:32", "elapsed_time": "0:29:15", "remaining_time": "0:00:17", "throughput": 746.25, "total_tokens": 1309792}
{"current_steps": 498, "total_steps": 500, "loss": 1.23, "learning_rate": 9.866357858642205e-08, "epoch": 0.8853333333333333, "percentage": 99.6, "cur_time": "2024-09-19 15:02:43", "elapsed_time": "0:29:25", "remaining_time": "0:00:07", "throughput": 745.22, "total_tokens": 1316000}
{"current_steps": 500, "total_steps": 500, "eval_loss": 1.272219181060791, "epoch": 0.8888888888888888, "percentage": 100.0, "cur_time": "2024-09-19 15:03:46", "elapsed_time": "0:30:29", "remaining_time": "0:00:00", "throughput": 722.22, "total_tokens": 1321104}
{"current_steps": 500, "total_steps": 500, "epoch": 0.8888888888888888, "percentage": 100.0, "cur_time": "2024-09-19 15:03:47", "elapsed_time": "0:30:30", "remaining_time": "0:00:00", "throughput": 721.82, "total_tokens": 1321104}

(Diff suppressed by the viewer: file too large to display.)

(Binary image file added, not shown by the viewer: 25 KiB.)

(Binary image file added, not shown by the viewer: 60 KiB.)