train: done test1

wql 2024-08-22 06:46:44 +00:00
parent 7f5b10d654
commit fdae778fa7
55 changed files with 288643 additions and 0 deletions

@ -0,0 +1,70 @@
---
base_model: /home/user/.cache/modelscope/hub/modelscope/Llama-2-7b-ms
library_name: peft
license: other
tags:
- llama-factory
- lora
- generated_from_trainer
model-index:
- name: test1
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# test1
This model is a fine-tuned version of `/home/user/.cache/modelscope/hub/modelscope/Llama-2-7b-ms` (a local ModelScope cache path, not a Hugging Face Hub repository) on the belle_1m dataset.
It achieves the following results on the evaluation set:
- Loss: nan
- Num Input Tokens Seen: 37262192
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 7
- gradient_accumulation_steps: 8
- total_train_batch_size: 112
- total_eval_batch_size: 14
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 1000
- mixed_precision_training: Native AMP
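The total batch sizes above follow from the per-device values, and the scheduler settings fix the warmup length. A minimal sketch of that arithmetic, assuming the Hugging Face Transformers conventions `warmup_steps = ceil(warmup_ratio * max_steps)` and the linear-warmup plus half-cosine formula of `get_cosine_schedule_with_warmup`; the helper names are hypothetical:

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 7
GRAD_ACCUM_STEPS = 8
BASE_LR = 1e-4
MAX_STEPS = 1000
WARMUP_RATIO = 0.1


def total_train_batch_size() -> int:
    # Samples consumed per optimizer step across all GPUs.
    return PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS


def total_eval_batch_size() -> int:
    # Evaluation does not accumulate gradients.
    return PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def warmup_steps() -> int:
    # Assumed convention: ceil(warmup_ratio * max_steps).
    return math.ceil(WARMUP_RATIO * MAX_STEPS)


def cosine_lr(step: int) -> float:
    """LR after `step` scheduler steps: linear warmup, then half-cosine decay to 0."""
    warmup = warmup_steps()
    if step < warmup:
        return BASE_LR * step / max(1, warmup)
    progress = (step - warmup) / max(1, MAX_STEPS - warmup)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(total_train_batch_size())  # 112
    print(total_eval_batch_size())   # 14
    print(warmup_steps())            # 100
    print(f"{cosine_lr(141):.3e}")
```

Under this formula the rate is 0.0 only at scheduler step 0 and reaches the 1e-4 peak at step 100.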
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-------:|:----:|:---------------:|:-----------------:|
| 6.5026 | 6.2208 | 500 | nan | 18659952 |
| 11.689 | 12.4417 | 1000 | nan | 37262192 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.43.4
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1

@ -0,0 +1,34 @@
{
"alpha_pattern": {},
"auto_mapping": null,
"base_model_name_or_path": "/home/user/.cache/modelscope/hub/modelscope/Llama-2-7b-ms",
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layer_replication": null,
"layers_pattern": null,
"layers_to_transform": null,
"loftq_config": {},
"lora_alpha": 16,
"lora_dropout": 0.0,
"megatron_config": null,
"megatron_core": "megatron.core",
"modules_to_save": null,
"peft_type": "LORA",
"r": 8,
"rank_pattern": {},
"revision": null,
"target_modules": [
"gate_proj",
"q_proj",
"o_proj",
"down_proj",
"v_proj",
"k_proj",
"up_proj"
],
"task_type": "CAUSAL_LM",
"use_dora": false,
"use_rslora": false
}
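With `r: 8` on all seven projection types, the adapter size can be worked out by hand: each adapted `Linear(in, out)` gains an `A` matrix of shape `r x in` and a `B` matrix of shape `out x r`. The sketch below assumes the published Llama-2-7B shapes (32 decoder layers, hidden size 4096, MLP intermediate size 11008), which this config does not state. It also computes the factor applied to the LoRA update, `lora_alpha / r`, since `use_rslora` is false.

```python
# Assumed Llama-2-7B dimensions (from the base model's config, not this file).
NUM_LAYERS = 32
HIDDEN = 4096
INTERMEDIATE = 11008

# From adapter_config.json above.
R = 8
LORA_ALPHA = 16
USE_RSLORA = False

# (in_features, out_features) of each targeted module in one decoder layer.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_module(in_features: int, out_features: int, r: int) -> int:
    # A has shape (r, in_features); B has shape (out_features, r).
    return r * in_features + out_features * r


def total_lora_params() -> int:
    per_layer = sum(lora_params_per_module(i, o, R) for i, o in MODULE_SHAPES.values())
    return per_layer * NUM_LAYERS


def lora_scaling() -> float:
    # rsLoRA would use alpha / sqrt(r); plain LoRA uses alpha / r.
    return LORA_ALPHA / (R ** 0.5) if USE_RSLORA else LORA_ALPHA / R


print(total_lora_params())  # 19988480
print(lora_scaling())       # 2.0
```

About 20M trainable parameters, roughly 0.3% of the base model, which is why the adapter checkpoint is small relative to the 7B weights.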

@ -0,0 +1,3 @@
{
"<pad>": 32000
}

@ -0,0 +1,14 @@
{
"epoch": 12.441679626749611,
"eval_loss": NaN,
"eval_runtime": 11.4932,
"eval_samples_per_second": 87.008,
"eval_steps_per_second": 6.265,
"num_input_tokens_seen": 37262192,
"total_flos": 1.4816935215166915e+18,
"train_loss": 12.00993843150139,
"train_runtime": 2977.5083,
"train_samples_per_second": 37.615,
"train_steps_per_second": 0.336,
"train_tokens_per_second": 2751.294
}
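The throughput fields above are consistent with the hyperparameters: multiplying each rate by its runtime recovers the step and sample counts. A small check using only the numbers in this file and the total batch sizes from the model card; the rates are rounded to three decimals, so the comparison uses a relative tolerance (an assumption chosen for that rounding):

```python
import math

# Figures copied from all_results.json above.
RESULTS = {
    "eval_runtime": 11.4932,
    "eval_samples_per_second": 87.008,
    "eval_steps_per_second": 6.265,
    "train_runtime": 2977.5083,
    "train_samples_per_second": 37.615,
    "train_steps_per_second": 0.336,
}
TOTAL_TRAIN_BATCH_SIZE = 112  # from the model card
TOTAL_EVAL_BATCH_SIZE = 14    # from the model card
MAX_STEPS = 1000


def implied(rate_key: str, runtime_key: str) -> float:
    """Items processed = reported rate * reported runtime."""
    return RESULTS[rate_key] * RESULTS[runtime_key]


def close(a: float, b: float, rel: float = 1e-3) -> bool:
    return math.isclose(a, b, rel_tol=rel)


train_samples = implied("train_samples_per_second", "train_runtime")
train_steps = implied("train_steps_per_second", "train_runtime")
eval_samples = implied("eval_samples_per_second", "eval_runtime")
eval_steps = implied("eval_steps_per_second", "eval_runtime")

print(close(train_samples, MAX_STEPS * TOTAL_TRAIN_BATCH_SIZE))  # True
print(close(train_steps, MAX_STEPS))                              # True
# Implied eval set size, and the batches of 14 needed to cover it.
print(round(eval_samples), math.ceil(round(eval_samples) / TOTAL_EVAL_BATCH_SIZE))  # 1000 72
```

The evaluation figures imply roughly 1000 evaluation samples, processed in 72 batches of up to 14.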

@ -0,0 +1,202 @@
---
base_model: /home/user/.cache/modelscope/hub/modelscope/Llama-2-7b-ms
library_name: peft
---
# Model Card for Model ID
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Downstream Use [optional]
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[More Information Needed]
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing [optional]
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
#### Speeds, Sizes, Times [optional]
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
[More Information Needed]
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
[More Information Needed]
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
[More Information Needed]
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
#### Summary
## Model Examination [optional]
<!-- Relevant interpretability work for the model goes here -->
[More Information Needed]
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications [optional]
### Model Architecture and Objective
[More Information Needed]
### Compute Infrastructure
[More Information Needed]
#### Hardware
[More Information Needed]
#### Software
[More Information Needed]
## Citation [optional]
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
[More Information Needed]
### Framework versions
- PEFT 0.12.0

@ -0,0 +1,34 @@
{
"alpha_pattern": {},
"auto_mapping": null,
"base_model_name_or_path": "/home/user/.cache/modelscope/hub/modelscope/Llama-2-7b-ms",
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layer_replication": null,
"layers_pattern": null,
"layers_to_transform": null,
"loftq_config": {},
"lora_alpha": 16,
"lora_dropout": 0.0,
"megatron_config": null,
"megatron_core": "megatron.core",
"modules_to_save": null,
"peft_type": "LORA",
"r": 8,
"rank_pattern": {},
"revision": null,
"target_modules": [
"gate_proj",
"q_proj",
"o_proj",
"down_proj",
"v_proj",
"k_proj",
"up_proj"
],
"task_type": "CAUSAL_LM",
"use_dora": false,
"use_rslora": false
}

@ -0,0 +1,3 @@
{
"<pad>": 32000
}

@ -0,0 +1,30 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}
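`special_tokens_map.json` sets `pad_token` to `<unk>`, even though `added_tokens.json` registers a separate `<pad>` token at id 32000. The sketch below resolves each special-token role to an id using trimmed copies of these files and of the `added_tokens_decoder` block in `tokenizer_config.json`; `resolve_special_ids` is a hypothetical helper, not a Transformers API.

```python
import json

# Trimmed copies of the JSON files in this commit (only the fields used here).
SPECIAL_TOKENS_MAP = """{
  "bos_token": {"content": "<s>"},
  "eos_token": {"content": "</s>"},
  "pad_token": {"content": "<unk>"},
  "unk_token": {"content": "<unk>"}
}"""
ADDED_TOKENS_DECODER = """{
  "0": {"content": "<unk>"},
  "1": {"content": "<s>"},
  "2": {"content": "</s>"},
  "32000": {"content": "<pad>"}
}"""


def resolve_special_ids(special_map_json: str, decoder_json: str) -> dict:
    """Map each special-token role (bos/eos/pad/unk) to its integer id."""
    content_to_id = {tok["content"]: int(idx) for idx, tok in json.loads(decoder_json).items()}
    roles = json.loads(special_map_json)
    return {role: content_to_id[spec["content"]] for role, spec in roles.items()}


ids = resolve_special_ids(SPECIAL_TOKENS_MAP, ADDED_TOKENS_DECODER)
print(ids)  # {'bos_token': 1, 'eos_token': 2, 'pad_token': 0, 'unk_token': 0}
```

So padded positions use id 0 (`<unk>`), appended on the right given `padding_side: right`, and id 32000 is not used for padding. Llama-2's base vocabulary covers ids 0 to 31999, so id 32000 only has an embedding if the matrix was resized, which these files do not show.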

File diff suppressed because it is too large

@ -0,0 +1,52 @@
{
"add_bos_token": true,
"add_eos_token": false,
"add_prefix_space": null,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"32000": {
"content": "<pad>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
}
},
"bos_token": "<s>",
"chat_template": "{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if loop.index0 == 0 and system_message is defined %}{% set content = '<<SYS>>\n' + system_message + '\n<</SYS>>\n\n' + message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>' + '[INST] ' + content + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ content + '</s>' }}{% endif %}{% endfor %}",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": false,
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<unk>",
"padding_side": "right",
"sp_model_kwargs": {},
"split_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": false
}
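The `chat_template` above is Jinja. A plain-Python transcription of the same logic makes the prompt layout easier to read; it mirrors the template step by step and is a sketch, not an official renderer (`render_prompt` is a hypothetical name). One consequence of the template as written: the `<<SYS>>` wrapper is applied to message 0, which is the system message itself, and system-role messages emit nothing, so a leading system message does not appear in the rendered prompt.

```python
def render_prompt(messages: list) -> str:
    """Mirror of the chat_template in tokenizer_config.json."""
    system_message = None
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"]

    parts = []
    for index, message in enumerate(messages):
        content = message["content"]
        # Template: loop.index0 == 0 and system_message is defined.
        if index == 0 and system_message is not None:
            content = "<<SYS>>\n" + system_message + "\n<</SYS>>\n\n" + message["content"]
        if message["role"] == "user":
            parts.append("<s>" + "[INST] " + content + " [/INST]")
        elif message["role"] == "assistant":
            parts.append(content + "</s>")
        # Any other role (including "system") produces no output.
    return "".join(parts)


print(render_prompt([
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
]))  # <s>[INST] Hi [/INST]Hello</s>
```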

File diff suppressed because it is too large

@ -0,0 +1,202 @@
---
base_model: /home/user/.cache/modelscope/hub/modelscope/Llama-2-7b-ms
library_name: peft
---
# Model Card for Model ID
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Downstream Use [optional]
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[More Information Needed]
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing [optional]
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
#### Speeds, Sizes, Times [optional]
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
[More Information Needed]
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
[More Information Needed]
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
[More Information Needed]
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
#### Summary
## Model Examination [optional]
<!-- Relevant interpretability work for the model goes here -->
[More Information Needed]
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications [optional]
### Model Architecture and Objective
[More Information Needed]
### Compute Infrastructure
[More Information Needed]
#### Hardware
[More Information Needed]
#### Software
[More Information Needed]
## Citation [optional]
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
[More Information Needed]
### Framework versions
- PEFT 0.12.0

@ -0,0 +1,34 @@
{
"alpha_pattern": {},
"auto_mapping": null,
"base_model_name_or_path": "/home/user/.cache/modelscope/hub/modelscope/Llama-2-7b-ms",
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layer_replication": null,
"layers_pattern": null,
"layers_to_transform": null,
"loftq_config": {},
"lora_alpha": 16,
"lora_dropout": 0.0,
"megatron_config": null,
"megatron_core": "megatron.core",
"modules_to_save": null,
"peft_type": "LORA",
"r": 8,
"rank_pattern": {},
"revision": null,
"target_modules": [
"gate_proj",
"q_proj",
"o_proj",
"down_proj",
"v_proj",
"k_proj",
"up_proj"
],
"task_type": "CAUSAL_LM",
"use_dora": false,
"use_rslora": false
}

@ -0,0 +1,3 @@
{
"<pad>": 32000
}

@ -0,0 +1,30 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

File diff suppressed because it is too large

@ -0,0 +1,52 @@
{
"add_bos_token": true,
"add_eos_token": false,
"add_prefix_space": null,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"32000": {
"content": "<pad>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
}
},
"bos_token": "<s>",
"chat_template": "{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if loop.index0 == 0 and system_message is defined %}{% set content = '<<SYS>>\n' + system_message + '\n<</SYS>>\n\n' + message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>' + '[INST] ' + content + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ content + '</s>' }}{% endif %}{% endfor %}",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": false,
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<unk>",
"padding_side": "right",
"sp_model_kwargs": {},
"split_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": false
}

File diff suppressed because it is too large

@ -0,0 +1,8 @@
{
"epoch": 12.441679626749611,
"eval_loss": NaN,
"eval_runtime": 11.4932,
"eval_samples_per_second": 87.008,
"eval_steps_per_second": 6.265,
"num_input_tokens_seen": 37262192
}
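`eval_loss` is serialized as a bare `NaN`, which is not valid under the JSON specification (RFC 8259) but is accepted by Python's `json` module by default. A minimal sketch for detecting it, with an opt-in strict mode through the `parse_constant` hook, which `json.loads` calls for the literals `NaN`, `Infinity`, and `-Infinity`; the helper names are hypothetical:

```python
import json
import math

# Trimmed copy of eval_results.json above.
EVAL_RESULTS = '{"epoch": 12.441679626749611, "eval_loss": NaN, "eval_runtime": 11.4932}'


def load_metrics(text: str, strict: bool = False) -> dict:
    """Parse a metrics file; with strict=True, reject NaN/Infinity literals."""
    if not strict:
        return json.loads(text)

    def reject(constant: str):
        raise ValueError(f"non-finite literal {constant!r} in metrics")

    return json.loads(text, parse_constant=reject)


def has_nan_loss(metrics: dict) -> bool:
    # Flags any key ending in "loss" whose value is a float NaN.
    return any(
        key.endswith("loss") and isinstance(value, float) and math.isnan(value)
        for key, value in metrics.items()
    )


print(has_nan_loss(load_metrics(EVAL_RESULTS)))  # True
try:
    load_metrics(EVAL_RESULTS, strict=True)
except ValueError as err:
    print(err)  # non-finite literal 'NaN' in metrics
```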

@ -0,0 +1,30 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

File diff suppressed because it is too large

Binary file not shown.

@ -0,0 +1,52 @@
{
"add_bos_token": true,
"add_eos_token": false,
"add_prefix_space": null,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"32000": {
"content": "<pad>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
}
},
"bos_token": "<s>",
"chat_template": "{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if loop.index0 == 0 and system_message is defined %}{% set content = '<<SYS>>\n' + system_message + '\n<</SYS>>\n\n' + message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>' + '[INST] ' + content + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ content + '</s>' }}{% endif %}{% endfor %}",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": false,
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<unk>",
"padding_side": "right",
"sp_model_kwargs": {},
"split_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": false
}

@ -0,0 +1,10 @@
{
"epoch": 12.441679626749611,
"num_input_tokens_seen": 37262192,
"total_flos": 1.4816935215166915e+18,
"train_loss": 12.00993843150139,
"train_runtime": 2977.5083,
"train_samples_per_second": 37.615,
"train_steps_per_second": 0.336,
"train_tokens_per_second": 2751.294
}

@ -0,0 +1,336 @@
{"current_steps": 3, "total_steps": 1000, "loss": 9.2998, "learning_rate": 0.0, "epoch": 0.03732503888024884, "percentage": 0.3, "cur_time": "2024-08-22 05:35:38", "elapsed_time": "0:00:07", "remaining_time": "0:44:08", "throughput": "14913.59", "total_tokens": 118832}
{"current_steps": 6, "total_steps": 1000, "loss": 12.0759, "learning_rate": 0.0, "epoch": 0.07465007776049767, "percentage": 0.6, "cur_time": "2024-08-22 05:35:45", "elapsed_time": "0:00:14", "remaining_time": "0:40:29", "throughput": "15122.18", "total_tokens": 221744}
{"current_steps": 9, "total_steps": 1000, "loss": 19.6855, "learning_rate": 0.0, "epoch": 0.1119751166407465, "percentage": 0.9, "cur_time": "2024-08-22 05:35:52", "elapsed_time": "0:00:21", "remaining_time": "0:39:01", "throughput": "15544.57", "total_tokens": 330496}
{"current_steps": 12, "total_steps": 1000, "loss": 13.653, "learning_rate": 0.0, "epoch": 0.14930015552099535, "percentage": 1.2, "cur_time": "2024-08-22 05:35:58", "elapsed_time": "0:00:28", "remaining_time": "0:38:25", "throughput": "15778.40", "total_tokens": 441856}
{"current_steps": 15, "total_steps": 1000, "loss": 11.6583, "learning_rate": 0.0, "epoch": 0.18662519440124417, "percentage": 1.5, "cur_time": "2024-08-22 05:36:05", "elapsed_time": "0:00:34", "remaining_time": "0:37:48", "throughput": "16088.25", "total_tokens": 555712}
{"current_steps": 18, "total_steps": 1000, "loss": 6.0712, "learning_rate": 0.0, "epoch": 0.223950233281493, "percentage": 1.8, "cur_time": "2024-08-22 05:36:12", "elapsed_time": "0:00:41", "remaining_time": "0:37:23", "throughput": "16065.89", "total_tokens": 660672}
{"current_steps": 21, "total_steps": 1000, "loss": 14.2505, "learning_rate": 0.0, "epoch": 0.26127527216174184, "percentage": 2.1, "cur_time": "2024-08-22 05:36:18", "elapsed_time": "0:00:47", "remaining_time": "0:37:16", "throughput": "16186.32", "total_tokens": 776560}
{"current_steps": 24, "total_steps": 1000, "loss": 8.3573, "learning_rate": 0.0, "epoch": 0.2986003110419907, "percentage": 2.4, "cur_time": "2024-08-22 05:36:25", "elapsed_time": "0:00:54", "remaining_time": "0:36:41", "throughput": "16220.75", "total_tokens": 878160}
{"current_steps": 27, "total_steps": 1000, "loss": 9.2897, "learning_rate": 0.0, "epoch": 0.3359253499222395, "percentage": 2.7, "cur_time": "2024-08-22 05:36:32", "elapsed_time": "0:01:01", "remaining_time": "0:36:44", "throughput": "16277.65", "total_tokens": 995808}
{"current_steps": 30, "total_steps": 1000, "loss": 11.2486, "learning_rate": 0.0, "epoch": 0.37325038880248834, "percentage": 3.0, "cur_time": "2024-08-22 05:36:39", "elapsed_time": "0:01:08", "remaining_time": "0:36:45", "throughput": "16223.91", "total_tokens": 1106496}
{"current_steps": 33, "total_steps": 1000, "loss": 13.309, "learning_rate": 0.0, "epoch": 0.4105754276827372, "percentage": 3.3, "cur_time": "2024-08-22 05:36:45", "elapsed_time": "0:01:14", "remaining_time": "0:36:36", "throughput": "16145.37", "total_tokens": 1210208}
{"current_steps": 36, "total_steps": 1000, "loss": 16.2104, "learning_rate": 0.0, "epoch": 0.447900466562986, "percentage": 3.6, "cur_time": "2024-08-22 05:36:52", "elapsed_time": "0:01:21", "remaining_time": "0:36:23", "throughput": "16242.75", "total_tokens": 1324352}
{"current_steps": 39, "total_steps": 1000, "loss": 5.2954, "learning_rate": 0.0, "epoch": 0.48522550544323484, "percentage": 3.9, "cur_time": "2024-08-22 05:36:59", "elapsed_time": "0:01:28", "remaining_time": "0:36:15", "throughput": "16240.09", "total_tokens": 1433792}
{"current_steps": 42, "total_steps": 1000, "loss": 10.5451, "learning_rate": 0.0, "epoch": 0.5225505443234837, "percentage": 4.2, "cur_time": "2024-08-22 05:37:05", "elapsed_time": "0:01:34", "remaining_time": "0:36:03", "throughput": "16237.46", "total_tokens": 1539888}
{"current_steps": 45, "total_steps": 1000, "loss": 19.6801, "learning_rate": 0.0, "epoch": 0.5598755832037325, "percentage": 4.5, "cur_time": "2024-08-22 05:37:13", "elapsed_time": "0:01:42", "remaining_time": "0:36:06", "throughput": "16243.16", "total_tokens": 1658080}
{"current_steps": 48, "total_steps": 1000, "loss": 8.8696, "learning_rate": 0.0, "epoch": 0.5972006220839814, "percentage": 4.8, "cur_time": "2024-08-22 05:37:20", "elapsed_time": "0:01:49", "remaining_time": "0:36:11", "throughput": "16210.51", "total_tokens": 1774576}
{"current_steps": 51, "total_steps": 1000, "loss": 23.9679, "learning_rate": 0.0, "epoch": 0.6345256609642301, "percentage": 5.1, "cur_time": "2024-08-22 05:37:27", "elapsed_time": "0:01:56", "remaining_time": "0:36:05", "throughput": "16223.73", "total_tokens": 1888096}
{"current_steps": 54, "total_steps": 1000, "loss": 8.9261, "learning_rate": 0.0, "epoch": 0.671850699844479, "percentage": 5.4, "cur_time": "2024-08-22 05:37:34", "elapsed_time": "0:02:03", "remaining_time": "0:35:59", "throughput": "16235.34", "total_tokens": 2001680}
{"current_steps": 57, "total_steps": 1000, "loss": 6.7411, "learning_rate": 0.0, "epoch": 0.7091757387247278, "percentage": 5.7, "cur_time": "2024-08-22 05:37:40", "elapsed_time": "0:02:09", "remaining_time": "0:35:46", "throughput": "16228.02", "total_tokens": 2105744}
{"current_steps": 60, "total_steps": 1000, "loss": 9.0461, "learning_rate": 0.0, "epoch": 0.7465007776049767, "percentage": 6.0, "cur_time": "2024-08-22 05:37:47", "elapsed_time": "0:02:16", "remaining_time": "0:35:39", "throughput": "16223.42", "total_tokens": 2215056}
{"current_steps": 63, "total_steps": 1000, "loss": 11.6066, "learning_rate": 0.0, "epoch": 0.7838258164852255, "percentage": 6.3, "cur_time": "2024-08-22 05:37:54", "elapsed_time": "0:02:23", "remaining_time": "0:35:36", "throughput": "16274.74", "total_tokens": 2337728}
{"current_steps": 66, "total_steps": 1000, "loss": 12.4327, "learning_rate": 0.0, "epoch": 0.8211508553654744, "percentage": 6.6, "cur_time": "2024-08-22 05:38:01", "elapsed_time": "0:02:30", "remaining_time": "0:35:34", "throughput": "16257.12", "total_tokens": 2451904}
{"current_steps": 69, "total_steps": 1000, "loss": 8.7956, "learning_rate": 0.0, "epoch": 0.8584758942457231, "percentage": 6.9, "cur_time": "2024-08-22 05:38:08", "elapsed_time": "0:02:37", "remaining_time": "0:35:29", "throughput": "16207.46", "total_tokens": 2558544}
{"current_steps": 72, "total_steps": 1000, "loss": 15.0004, "learning_rate": 0.0, "epoch": 0.895800933125972, "percentage": 7.2, "cur_time": "2024-08-22 05:38:16", "elapsed_time": "0:02:45", "remaining_time": "0:35:27", "throughput": "16190.00", "total_tokens": 2672672}
{"current_steps": 75, "total_steps": 1000, "loss": 17.6262, "learning_rate": 0.0, "epoch": 0.9331259720062208, "percentage": 7.5, "cur_time": "2024-08-22 05:38:23", "elapsed_time": "0:02:52", "remaining_time": "0:35:22", "throughput": "16181.41", "total_tokens": 2784704}
{"current_steps": 78, "total_steps": 1000, "loss": 14.3673, "learning_rate": 0.0, "epoch": 0.9704510108864697, "percentage": 7.8, "cur_time": "2024-08-22 05:38:30", "elapsed_time": "0:02:59", "remaining_time": "0:35:20", "throughput": "16145.32", "total_tokens": 2895952}
{"current_steps": 81, "total_steps": 1000, "loss": 10.5809, "learning_rate": 0.0, "epoch": 1.0077760497667185, "percentage": 8.1, "cur_time": "2024-08-22 05:38:37", "elapsed_time": "0:03:06", "remaining_time": "0:35:13", "throughput": "16136.33", "total_tokens": 3005872}
{"current_steps": 84, "total_steps": 1000, "loss": 11.7681, "learning_rate": 0.0, "epoch": 1.0451010886469674, "percentage": 8.4, "cur_time": "2024-08-22 05:38:44", "elapsed_time": "0:03:13", "remaining_time": "0:35:06", "throughput": "16157.65", "total_tokens": 3121712}
{"current_steps": 87, "total_steps": 1000, "loss": 10.2302, "learning_rate": 0.0, "epoch": 1.0824261275272162, "percentage": 8.7, "cur_time": "2024-08-22 05:38:50", "elapsed_time": "0:03:19", "remaining_time": "0:34:56", "throughput": "16160.99", "total_tokens": 3228464}
{"current_steps": 90, "total_steps": 1000, "loss": 7.6525, "learning_rate": 0.0, "epoch": 1.119751166407465, "percentage": 9.0, "cur_time": "2024-08-22 05:38:57", "elapsed_time": "0:03:26", "remaining_time": "0:34:49", "throughput": "16166.31", "total_tokens": 3341504}
{"current_steps": 93, "total_steps": 1000, "loss": 14.1201, "learning_rate": 0.0, "epoch": 1.157076205287714, "percentage": 9.3, "cur_time": "2024-08-22 05:39:05", "elapsed_time": "0:03:34", "remaining_time": "0:34:51", "throughput": "16168.39", "total_tokens": 3466688}
{"current_steps": 96, "total_steps": 1000, "loss": 11.1454, "learning_rate": 0.0, "epoch": 1.1944012441679628, "percentage": 9.6, "cur_time": "2024-08-22 05:39:12", "elapsed_time": "0:03:41", "remaining_time": "0:34:45", "throughput": "16136.77", "total_tokens": 3573200}
{"current_steps": 99, "total_steps": 1000, "loss": 16.2573, "learning_rate": 0.0, "epoch": 1.2317262830482114, "percentage": 9.9, "cur_time": "2024-08-22 05:39:18", "elapsed_time": "0:03:48", "remaining_time": "0:34:35", "throughput": "16138.64", "total_tokens": 3679728}
{"current_steps": 102, "total_steps": 1000, "loss": 8.3276, "learning_rate": 0.0, "epoch": 1.2690513219284603, "percentage": 10.2, "cur_time": "2024-08-22 05:39:26", "elapsed_time": "0:03:55", "remaining_time": "0:34:32", "throughput": "16130.77", "total_tokens": 3796640}
{"current_steps": 105, "total_steps": 1000, "loss": 8.4286, "learning_rate": 0.0, "epoch": 1.3063763608087091, "percentage": 10.5, "cur_time": "2024-08-22 05:39:33", "elapsed_time": "0:04:02", "remaining_time": "0:34:24", "throughput": "16117.51", "total_tokens": 3902912}
{"current_steps": 108, "total_steps": 1000, "loss": 9.6022, "learning_rate": 0.0, "epoch": 1.343701399688958, "percentage": 10.8, "cur_time": "2024-08-22 05:39:39", "elapsed_time": "0:04:08", "remaining_time": "0:34:15", "throughput": "16102.63", "total_tokens": 4008000}
{"current_steps": 111, "total_steps": 1000, "loss": 6.8597, "learning_rate": 0.0, "epoch": 1.3810264385692068, "percentage": 11.1, "cur_time": "2024-08-22 05:39:46", "elapsed_time": "0:04:15", "remaining_time": "0:34:05", "throughput": "16113.39", "total_tokens": 4115280}
{"current_steps": 114, "total_steps": 1000, "loss": 12.0886, "learning_rate": 0.0, "epoch": 1.4183514774494557, "percentage": 11.4, "cur_time": "2024-08-22 05:39:53", "elapsed_time": "0:04:22", "remaining_time": "0:33:58", "throughput": "16123.99", "total_tokens": 4229648}
{"current_steps": 117, "total_steps": 1000, "loss": 13.8743, "learning_rate": 0.0, "epoch": 1.4556765163297045, "percentage": 11.7, "cur_time": "2024-08-22 05:40:00", "elapsed_time": "0:04:29", "remaining_time": "0:33:55", "throughput": "16112.58", "total_tokens": 4345392}
{"current_steps": 120, "total_steps": 1000, "loss": 15.7739, "learning_rate": 0.0, "epoch": 1.4930015552099534, "percentage": 12.0, "cur_time": "2024-08-22 05:40:07", "elapsed_time": "0:04:36", "remaining_time": "0:33:49", "throughput": "16106.95", "total_tokens": 4456880}
{"current_steps": 123, "total_steps": 1000, "loss": 7.1596, "learning_rate": 0.0, "epoch": 1.5303265940902022, "percentage": 12.3, "cur_time": "2024-08-22 05:40:14", "elapsed_time": "0:04:43", "remaining_time": "0:33:43", "throughput": "16116.23", "total_tokens": 4574288}
{"current_steps": 126, "total_steps": 1000, "loss": 8.1217, "learning_rate": 0.0, "epoch": 1.5676516329704508, "percentage": 12.6, "cur_time": "2024-08-22 05:40:21", "elapsed_time": "0:04:50", "remaining_time": "0:33:33", "throughput": "16112.71", "total_tokens": 4676256}
{"current_steps": 129, "total_steps": 1000, "loss": 28.1282, "learning_rate": 0.0, "epoch": 1.6049766718506997, "percentage": 12.9, "cur_time": "2024-08-22 05:40:28", "elapsed_time": "0:04:57", "remaining_time": "0:33:26", "throughput": "16122.00", "total_tokens": 4791120}
{"current_steps": 132, "total_steps": 1000, "loss": 6.5505, "learning_rate": 0.0, "epoch": 1.6423017107309485, "percentage": 13.2, "cur_time": "2024-08-22 05:40:34", "elapsed_time": "0:05:03", "remaining_time": "0:33:15", "throughput": "16153.46", "total_tokens": 4902480}
{"current_steps": 135, "total_steps": 1000, "loss": 11.5594, "learning_rate": 0.0, "epoch": 1.6796267496111974, "percentage": 13.5, "cur_time": "2024-08-22 05:40:41", "elapsed_time": "0:05:10", "remaining_time": "0:33:10", "throughput": "16126.15", "total_tokens": 5010624}
{"current_steps": 138, "total_steps": 1000, "loss": 11.3297, "learning_rate": 0.0, "epoch": 1.7169517884914463, "percentage": 13.8, "cur_time": "2024-08-22 05:40:48", "elapsed_time": "0:05:17", "remaining_time": "0:33:01", "throughput": "16122.05", "total_tokens": 5115552}
{"current_steps": 141, "total_steps": 1000, "loss": 13.8845, "learning_rate": 0.0, "epoch": 1.754276827371695, "percentage": 14.1, "cur_time": "2024-08-22 05:40:54", "elapsed_time": "0:05:23", "remaining_time": "0:32:53", "throughput": "16145.60", "total_tokens": 5230896}
{"current_steps": 144, "total_steps": 1000, "loss": 16.5376, "learning_rate": 0.0, "epoch": 1.791601866251944, "percentage": 14.4, "cur_time": "2024-08-22 05:41:01", "elapsed_time": "0:05:30", "remaining_time": "0:32:44", "throughput": "16168.78", "total_tokens": 5342896}
{"current_steps": 147, "total_steps": 1000, "loss": 7.6679, "learning_rate": 0.0, "epoch": 1.8289269051321928, "percentage": 14.7, "cur_time": "2024-08-22 05:41:08", "elapsed_time": "0:05:37", "remaining_time": "0:32:39", "throughput": "16163.07", "total_tokens": 5457904}
{"current_steps": 150, "total_steps": 1000, "loss": 15.0238, "learning_rate": 0.0, "epoch": 1.8662519440124417, "percentage": 15.0, "cur_time": "2024-08-22 05:41:15", "elapsed_time": "0:05:44", "remaining_time": "0:32:34", "throughput": "16161.78", "total_tokens": 5573360}
{"current_steps": 153, "total_steps": 1000, "loss": 7.4082, "learning_rate": 0.0, "epoch": 1.9035769828926905, "percentage": 15.3, "cur_time": "2024-08-22 05:41:22", "elapsed_time": "0:05:51", "remaining_time": "0:32:28", "throughput": "16157.29", "total_tokens": 5686432}
{"current_steps": 156, "total_steps": 1000, "loss": 7.2681, "learning_rate": 0.0, "epoch": 1.9409020217729394, "percentage": 15.6, "cur_time": "2024-08-22 05:41:29", "elapsed_time": "0:05:58", "remaining_time": "0:32:20", "throughput": "16177.32", "total_tokens": 5803248}
{"current_steps": 159, "total_steps": 1000, "loss": 17.5133, "learning_rate": 0.0, "epoch": 1.9782270606531882, "percentage": 15.9, "cur_time": "2024-08-22 05:41:36", "elapsed_time": "0:06:05", "remaining_time": "0:32:13", "throughput": "16172.52", "total_tokens": 5911424}
{"current_steps": 162, "total_steps": 1000, "loss": 21.3751, "learning_rate": 0.0, "epoch": 2.015552099533437, "percentage": 16.2, "cur_time": "2024-08-22 05:41:43", "elapsed_time": "0:06:12", "remaining_time": "0:32:08", "throughput": "16171.98", "total_tokens": 6030352}
{"current_steps": 165, "total_steps": 1000, "loss": 14.4481, "learning_rate": 0.0, "epoch": 2.052877138413686, "percentage": 16.5, "cur_time": "2024-08-22 05:41:50", "elapsed_time": "0:06:19", "remaining_time": "0:32:00", "throughput": "16171.49", "total_tokens": 6137888}
{"current_steps": 168, "total_steps": 1000, "loss": 8.3918, "learning_rate": 0.0, "epoch": 2.0902021772939348, "percentage": 16.8, "cur_time": "2024-08-22 05:41:57", "elapsed_time": "0:06:26", "remaining_time": "0:31:55", "throughput": "16158.56", "total_tokens": 6248704}
{"current_steps": 171, "total_steps": 1000, "loss": 21.2405, "learning_rate": 0.0, "epoch": 2.1275272161741836, "percentage": 17.1, "cur_time": "2024-08-22 05:42:04", "elapsed_time": "0:06:33", "remaining_time": "0:31:46", "throughput": "16156.28", "total_tokens": 6354144}
{"current_steps": 174, "total_steps": 1000, "loss": 8.2517, "learning_rate": 0.0, "epoch": 2.1648522550544325, "percentage": 17.4, "cur_time": "2024-08-22 05:42:11", "elapsed_time": "0:06:40", "remaining_time": "0:31:39", "throughput": "16164.81", "total_tokens": 6467936}
{"current_steps": 177, "total_steps": 1000, "loss": 7.6873, "learning_rate": 0.0, "epoch": 2.2021772939346813, "percentage": 17.7, "cur_time": "2024-08-22 05:42:18", "elapsed_time": "0:06:47", "remaining_time": "0:31:32", "throughput": "16163.07", "total_tokens": 6579504}
{"current_steps": 180, "total_steps": 1000, "loss": 12.5826, "learning_rate": 0.0, "epoch": 2.23950233281493, "percentage": 18.0, "cur_time": "2024-08-22 05:42:24", "elapsed_time": "0:06:54", "remaining_time": "0:31:26", "throughput": "16163.05", "total_tokens": 6691920}
{"current_steps": 183, "total_steps": 1000, "loss": 14.0724, "learning_rate": 0.0, "epoch": 2.276827371695179, "percentage": 18.3, "cur_time": "2024-08-22 05:42:32", "elapsed_time": "0:07:01", "remaining_time": "0:31:19", "throughput": "16167.45", "total_tokens": 6808080}
{"current_steps": 186, "total_steps": 1000, "loss": 9.5351, "learning_rate": 0.0, "epoch": 2.314152410575428, "percentage": 18.6, "cur_time": "2024-08-22 05:42:39", "elapsed_time": "0:07:08", "remaining_time": "0:31:16", "throughput": "16156.66", "total_tokens": 6925936}
{"current_steps": 189, "total_steps": 1000, "loss": 11.3624, "learning_rate": 0.0, "epoch": 2.3514774494556763, "percentage": 18.9, "cur_time": "2024-08-22 05:42:46", "elapsed_time": "0:07:15", "remaining_time": "0:31:07", "throughput": "16163.25", "total_tokens": 7032576}
{"current_steps": 192, "total_steps": 1000, "loss": 23.8997, "learning_rate": 0.0, "epoch": 2.3888024883359256, "percentage": 19.2, "cur_time": "2024-08-22 05:42:52", "elapsed_time": "0:07:21", "remaining_time": "0:30:58", "throughput": "16166.11", "total_tokens": 7140304}
{"current_steps": 195, "total_steps": 1000, "loss": 7.9994, "learning_rate": 0.0, "epoch": 2.426127527216174, "percentage": 19.5, "cur_time": "2024-08-22 05:42:59", "elapsed_time": "0:07:28", "remaining_time": "0:30:50", "throughput": "16160.23", "total_tokens": 7244320}
{"current_steps": 198, "total_steps": 1000, "loss": 12.7368, "learning_rate": 0.0, "epoch": 2.463452566096423, "percentage": 19.8, "cur_time": "2024-08-22 05:43:05", "elapsed_time": "0:07:34", "remaining_time": "0:30:42", "throughput": "16158.83", "total_tokens": 7351408}
{"current_steps": 201, "total_steps": 1000, "loss": 13.655, "learning_rate": 0.0, "epoch": 2.5007776049766717, "percentage": 20.1, "cur_time": "2024-08-22 05:43:12", "elapsed_time": "0:07:42", "remaining_time": "0:30:36", "throughput": "16154.31", "total_tokens": 7463792}
{"current_steps": 204, "total_steps": 1000, "loss": 9.2975, "learning_rate": 0.0, "epoch": 2.5381026438569205, "percentage": 20.4, "cur_time": "2024-08-22 05:43:19", "elapsed_time": "0:07:48", "remaining_time": "0:30:28", "throughput": "16160.81", "total_tokens": 7572176}
{"current_steps": 207, "total_steps": 1000, "loss": 8.1766, "learning_rate": 0.0, "epoch": 2.5754276827371694, "percentage": 20.7, "cur_time": "2024-08-22 05:43:26", "elapsed_time": "0:07:55", "remaining_time": "0:30:23", "throughput": "16153.58", "total_tokens": 7688464}
{"current_steps": 210, "total_steps": 1000, "loss": 12.0825, "learning_rate": 0.0, "epoch": 2.6127527216174182, "percentage": 21.0, "cur_time": "2024-08-22 05:43:33", "elapsed_time": "0:08:02", "remaining_time": "0:30:15", "throughput": "16159.74", "total_tokens": 7799760}
{"current_steps": 213, "total_steps": 1000, "loss": 5.9591, "learning_rate": 0.0, "epoch": 2.650077760497667, "percentage": 21.3, "cur_time": "2024-08-22 05:43:40", "elapsed_time": "0:08:09", "remaining_time": "0:30:10", "throughput": "16166.25", "total_tokens": 7920016}
{"current_steps": 216, "total_steps": 1000, "loss": 17.4673, "learning_rate": 0.0, "epoch": 2.687402799377916, "percentage": 21.6, "cur_time": "2024-08-22 05:43:48", "elapsed_time": "0:08:17", "remaining_time": "0:30:04", "throughput": "16160.67", "total_tokens": 8032896}
{"current_steps": 219, "total_steps": 1000, "loss": 6.1845, "learning_rate": 0.0, "epoch": 2.724727838258165, "percentage": 21.9, "cur_time": "2024-08-22 05:43:54", "elapsed_time": "0:08:23", "remaining_time": "0:29:56", "throughput": "16167.07", "total_tokens": 8143360}
{"current_steps": 222, "total_steps": 1000, "loss": 8.8712, "learning_rate": 0.0, "epoch": 2.7620528771384136, "percentage": 22.2, "cur_time": "2024-08-22 05:44:01", "elapsed_time": "0:08:30", "remaining_time": "0:29:47", "throughput": "16162.42", "total_tokens": 8244608}
{"current_steps": 225, "total_steps": 1000, "loss": 9.6746, "learning_rate": 0.0, "epoch": 2.7993779160186625, "percentage": 22.5, "cur_time": "2024-08-22 05:44:08", "elapsed_time": "0:08:37", "remaining_time": "0:29:41", "throughput": "16165.43", "total_tokens": 8362592}
{"current_steps": 228, "total_steps": 1000, "loss": 15.1383, "learning_rate": 0.0, "epoch": 2.8367029548989113, "percentage": 22.8, "cur_time": "2024-08-22 05:44:15", "elapsed_time": "0:08:44", "remaining_time": "0:29:35", "throughput": "16176.82", "total_tokens": 8480496}
{"current_steps": 231, "total_steps": 1000, "loss": 8.4034, "learning_rate": 0.0, "epoch": 2.87402799377916, "percentage": 23.1, "cur_time": "2024-08-22 05:44:22", "elapsed_time": "0:08:51", "remaining_time": "0:29:28", "throughput": "16182.48", "total_tokens": 8594912}
{"current_steps": 234, "total_steps": 1000, "loss": 20.3951, "learning_rate": 0.0, "epoch": 2.911353032659409, "percentage": 23.4, "cur_time": "2024-08-22 05:44:29", "elapsed_time": "0:08:58", "remaining_time": "0:29:22", "throughput": "16175.24", "total_tokens": 8707728}
{"current_steps": 237, "total_steps": 1000, "loss": 6.4257, "learning_rate": 0.0, "epoch": 2.948678071539658, "percentage": 23.7, "cur_time": "2024-08-22 05:44:36", "elapsed_time": "0:09:05", "remaining_time": "0:29:15", "throughput": "16169.49", "total_tokens": 8815456}
{"current_steps": 240, "total_steps": 1000, "loss": 13.9254, "learning_rate": 0.0, "epoch": 2.9860031104199067, "percentage": 24.0, "cur_time": "2024-08-22 05:44:43", "elapsed_time": "0:09:12", "remaining_time": "0:29:09", "throughput": "16172.20", "total_tokens": 8932288}
{"current_steps": 243, "total_steps": 1000, "loss": 12.1414, "learning_rate": 0.0, "epoch": 3.0233281493001556, "percentage": 24.3, "cur_time": "2024-08-22 05:44:50", "elapsed_time": "0:09:19", "remaining_time": "0:29:03", "throughput": "16168.29", "total_tokens": 9048592}
{"current_steps": 246, "total_steps": 1000, "loss": 11.2167, "learning_rate": 0.0, "epoch": 3.0606531881804044, "percentage": 24.6, "cur_time": "2024-08-22 05:44:56", "elapsed_time": "0:09:25", "remaining_time": "0:28:54", "throughput": "16184.05", "total_tokens": 9158576}
{"current_steps": 249, "total_steps": 1000, "loss": 16.1835, "learning_rate": 0.0, "epoch": 3.0979782270606533, "percentage": 24.9, "cur_time": "2024-08-22 05:45:03", "elapsed_time": "0:09:32", "remaining_time": "0:28:46", "throughput": "16181.83", "total_tokens": 9264336}
{"current_steps": 252, "total_steps": 1000, "loss": 16.5755, "learning_rate": 0.0, "epoch": 3.135303265940902, "percentage": 25.2, "cur_time": "2024-08-22 05:45:10", "elapsed_time": "0:09:39", "remaining_time": "0:28:39", "throughput": "16189.44", "total_tokens": 9379136}
{"current_steps": 255, "total_steps": 1000, "loss": 15.7423, "learning_rate": 0.0, "epoch": 3.172628304821151, "percentage": 25.5, "cur_time": "2024-08-22 05:45:16", "elapsed_time": "0:09:45", "remaining_time": "0:28:31", "throughput": "16184.56", "total_tokens": 9483568}
{"current_steps": 258, "total_steps": 1000, "loss": 11.4227, "learning_rate": 0.0, "epoch": 3.2099533437014, "percentage": 25.8, "cur_time": "2024-08-22 05:45:23", "elapsed_time": "0:09:52", "remaining_time": "0:28:24", "throughput": "16187.29", "total_tokens": 9596304}
{"current_steps": 261, "total_steps": 1000, "loss": 14.9299, "learning_rate": 0.0, "epoch": 3.2472783825816487, "percentage": 26.1, "cur_time": "2024-08-22 05:45:30", "elapsed_time": "0:09:59", "remaining_time": "0:28:18", "throughput": "16180.28", "total_tokens": 9706592}
{"current_steps": 264, "total_steps": 1000, "loss": 7.2115, "learning_rate": 0.0, "epoch": 3.2846034214618975, "percentage": 26.4, "cur_time": "2024-08-22 05:45:37", "elapsed_time": "0:10:06", "remaining_time": "0:28:10", "throughput": "16178.24", "total_tokens": 9807376}
{"current_steps": 267, "total_steps": 1000, "loss": 16.8579, "learning_rate": 0.0, "epoch": 3.3219284603421464, "percentage": 26.7, "cur_time": "2024-08-22 05:45:43", "elapsed_time": "0:10:12", "remaining_time": "0:28:02", "throughput": "16182.00", "total_tokens": 9918144}
{"current_steps": 270, "total_steps": 1000, "loss": 9.2991, "learning_rate": 0.0, "epoch": 3.359253499222395, "percentage": 27.0, "cur_time": "2024-08-22 05:45:50", "elapsed_time": "0:10:19", "remaining_time": "0:27:55", "throughput": "16181.58", "total_tokens": 10027360}
{"current_steps": 273, "total_steps": 1000, "loss": 10.1769, "learning_rate": 0.0, "epoch": 3.396578538102644, "percentage": 27.3, "cur_time": "2024-08-22 05:45:57", "elapsed_time": "0:10:26", "remaining_time": "0:27:48", "throughput": "16186.04", "total_tokens": 10141792}
{"current_steps": 276, "total_steps": 1000, "loss": 27.4338, "learning_rate": 0.0, "epoch": 3.4339035769828925, "percentage": 27.6, "cur_time": "2024-08-22 05:46:04", "elapsed_time": "0:10:33", "remaining_time": "0:27:42", "throughput": "16182.67", "total_tokens": 10254096}
{"current_steps": 279, "total_steps": 1000, "loss": 6.8982, "learning_rate": 0.0, "epoch": 3.4712286158631414, "percentage": 27.9, "cur_time": "2024-08-22 05:46:11", "elapsed_time": "0:10:40", "remaining_time": "0:27:36", "throughput": "16179.48", "total_tokens": 10368672}
{"current_steps": 282, "total_steps": 1000, "loss": 7.7729, "learning_rate": 0.0, "epoch": 3.50855365474339, "percentage": 28.2, "cur_time": "2024-08-22 05:46:17", "elapsed_time": "0:10:46", "remaining_time": "0:27:26", "throughput": "16186.39", "total_tokens": 10467568}
{"current_steps": 285, "total_steps": 1000, "loss": 13.1635, "learning_rate": 0.0, "epoch": 3.545878693623639, "percentage": 28.5, "cur_time": "2024-08-22 05:46:25", "elapsed_time": "0:10:54", "remaining_time": "0:27:22", "throughput": "16178.37", "total_tokens": 10590848}
{"current_steps": 288, "total_steps": 1000, "loss": 11.636, "learning_rate": 0.0, "epoch": 3.583203732503888, "percentage": 28.8, "cur_time": "2024-08-22 05:46:32", "elapsed_time": "0:11:01", "remaining_time": "0:27:15", "throughput": "16175.79", "total_tokens": 10702448}
{"current_steps": 291, "total_steps": 1000, "loss": 12.8655, "learning_rate": 0.0, "epoch": 3.6205287713841368, "percentage": 29.1, "cur_time": "2024-08-22 05:46:39", "elapsed_time": "0:11:08", "remaining_time": "0:27:09", "throughput": "16186.37", "total_tokens": 10827376}
{"current_steps": 294, "total_steps": 1000, "loss": 8.1184, "learning_rate": 0.0, "epoch": 3.6578538102643856, "percentage": 29.4, "cur_time": "2024-08-22 05:46:46", "elapsed_time": "0:11:15", "remaining_time": "0:27:02", "throughput": "16195.37", "total_tokens": 10940560}
{"current_steps": 297, "total_steps": 1000, "loss": 12.5074, "learning_rate": 0.0, "epoch": 3.6951788491446345, "percentage": 29.7, "cur_time": "2024-08-22 05:46:53", "elapsed_time": "0:11:22", "remaining_time": "0:26:54", "throughput": "16192.19", "total_tokens": 11045168}
{"current_steps": 300, "total_steps": 1000, "loss": 6.7851, "learning_rate": 0.0, "epoch": 3.7325038880248833, "percentage": 30.0, "cur_time": "2024-08-22 05:46:59", "elapsed_time": "0:11:28", "remaining_time": "0:26:46", "throughput": "16198.61", "total_tokens": 11154736}
{"current_steps": 303, "total_steps": 1000, "loss": 12.5286, "learning_rate": 0.0, "epoch": 3.769828926905132, "percentage": 30.3, "cur_time": "2024-08-22 05:47:06", "elapsed_time": "0:11:35", "remaining_time": "0:26:39", "throughput": "16205.18", "total_tokens": 11265440}
{"current_steps": 306, "total_steps": 1000, "loss": 9.2964, "learning_rate": 0.0, "epoch": 3.807153965785381, "percentage": 30.6, "cur_time": "2024-08-22 05:47:12", "elapsed_time": "0:11:41", "remaining_time": "0:26:31", "throughput": "16211.77", "total_tokens": 11377728}
{"current_steps": 309, "total_steps": 1000, "loss": 12.8742, "learning_rate": 0.0, "epoch": 3.84447900466563, "percentage": 30.9, "cur_time": "2024-08-22 05:47:19", "elapsed_time": "0:11:48", "remaining_time": "0:26:23", "throughput": "16212.70", "total_tokens": 11482944}
{"current_steps": 312, "total_steps": 1000, "loss": 10.1718, "learning_rate": 0.0, "epoch": 3.8818040435458787, "percentage": 31.2, "cur_time": "2024-08-22 05:47:26", "elapsed_time": "0:11:55", "remaining_time": "0:26:18", "throughput": "16214.93", "total_tokens": 11604304}
{"current_steps": 315, "total_steps": 1000, "loss": 9.0186, "learning_rate": 0.0, "epoch": 3.9191290824261276, "percentage": 31.5, "cur_time": "2024-08-22 05:47:33", "elapsed_time": "0:12:02", "remaining_time": "0:26:11", "throughput": "16213.00", "total_tokens": 11717840}
{"current_steps": 318, "total_steps": 1000, "loss": 12.1146, "learning_rate": 0.0, "epoch": 3.9564541213063764, "percentage": 31.8, "cur_time": "2024-08-22 05:47:40", "elapsed_time": "0:12:09", "remaining_time": "0:26:04", "throughput": "16225.00", "total_tokens": 11838736}
{"current_steps": 321, "total_steps": 1000, "loss": 8.2904, "learning_rate": 0.0, "epoch": 3.9937791601866253, "percentage": 32.1, "cur_time": "2024-08-22 05:47:47", "elapsed_time": "0:12:16", "remaining_time": "0:25:58", "throughput": "16219.34", "total_tokens": 11951984}
{"current_steps": 324, "total_steps": 1000, "loss": 14.7016, "learning_rate": 0.0, "epoch": 4.031104199066874, "percentage": 32.4, "cur_time": "2024-08-22 05:47:54", "elapsed_time": "0:12:23", "remaining_time": "0:25:52", "throughput": "16222.01", "total_tokens": 12068128}
{"current_steps": 327, "total_steps": 1000, "loss": 12.1103, "learning_rate": 0.0, "epoch": 4.0684292379471225, "percentage": 32.7, "cur_time": "2024-08-22 05:48:01", "elapsed_time": "0:12:30", "remaining_time": "0:25:44", "throughput": "16215.59", "total_tokens": 12168768}
{"current_steps": 330, "total_steps": 1000, "loss": 13.9338, "learning_rate": 0.0, "epoch": 4.105754276827372, "percentage": 33.0, "cur_time": "2024-08-22 05:48:08", "elapsed_time": "0:12:37", "remaining_time": "0:25:38", "throughput": "16216.60", "total_tokens": 12286624}
{"current_steps": 333, "total_steps": 1000, "loss": 5.3579, "learning_rate": 0.0, "epoch": 4.14307931570762, "percentage": 33.3, "cur_time": "2024-08-22 05:48:14", "elapsed_time": "0:12:44", "remaining_time": "0:25:30", "throughput": "16218.41", "total_tokens": 12391296}
{"current_steps": 336, "total_steps": 1000, "loss": 9.147, "learning_rate": 0.0, "epoch": 4.1804043545878695, "percentage": 33.6, "cur_time": "2024-08-22 05:48:21", "elapsed_time": "0:12:50", "remaining_time": "0:25:23", "throughput": "16232.38", "total_tokens": 12512656}
{"current_steps": 339, "total_steps": 1000, "loss": 12.5259, "learning_rate": 0.0, "epoch": 4.217729393468118, "percentage": 33.9, "cur_time": "2024-08-22 05:48:28", "elapsed_time": "0:12:57", "remaining_time": "0:25:15", "throughput": "16235.58", "total_tokens": 12617696}
{"current_steps": 342, "total_steps": 1000, "loss": 17.4818, "learning_rate": 0.0, "epoch": 4.255054432348367, "percentage": 34.2, "cur_time": "2024-08-22 05:48:35", "elapsed_time": "0:13:04", "remaining_time": "0:25:08", "throughput": "16238.86", "total_tokens": 12735136}
{"current_steps": 345, "total_steps": 1000, "loss": 20.5624, "learning_rate": 0.0, "epoch": 4.292379471228616, "percentage": 34.5, "cur_time": "2024-08-22 05:48:42", "elapsed_time": "0:13:11", "remaining_time": "0:25:02", "throughput": "16241.26", "total_tokens": 12849008}
{"current_steps": 348, "total_steps": 1000, "loss": 13.0519, "learning_rate": 0.0, "epoch": 4.329704510108865, "percentage": 34.8, "cur_time": "2024-08-22 05:48:48", "elapsed_time": "0:13:17", "remaining_time": "0:24:54", "throughput": "16246.92", "total_tokens": 12962592}
{"current_steps": 351, "total_steps": 1000, "loss": 8.2046, "learning_rate": 0.0, "epoch": 4.367029548989113, "percentage": 35.1, "cur_time": "2024-08-22 05:48:55", "elapsed_time": "0:13:24", "remaining_time": "0:24:46", "throughput": "16258.19", "total_tokens": 13074688}
{"current_steps": 354, "total_steps": 1000, "loss": 7.3577, "learning_rate": 0.0, "epoch": 4.404354587869363, "percentage": 35.4, "cur_time": "2024-08-22 05:49:02", "elapsed_time": "0:13:31", "remaining_time": "0:24:40", "throughput": "16257.83", "total_tokens": 13186240}
{"current_steps": 357, "total_steps": 1000, "loss": 7.2161, "learning_rate": 0.0, "epoch": 4.441679626749611, "percentage": 35.7, "cur_time": "2024-08-22 05:49:08", "elapsed_time": "0:13:37", "remaining_time": "0:24:32", "throughput": "16261.12", "total_tokens": 13290736}
{"current_steps": 360, "total_steps": 1000, "loss": 6.7259, "learning_rate": 0.0, "epoch": 4.47900466562986, "percentage": 36.0, "cur_time": "2024-08-22 05:49:14", "elapsed_time": "0:13:44", "remaining_time": "0:24:24", "throughput": "16258.66", "total_tokens": 13397344}
{"current_steps": 363, "total_steps": 1000, "loss": 26.3404, "learning_rate": 0.0, "epoch": 4.516329704510109, "percentage": 36.3, "cur_time": "2024-08-22 05:49:21", "elapsed_time": "0:13:50", "remaining_time": "0:24:16", "throughput": "16259.62", "total_tokens": 13499776}
{"current_steps": 366, "total_steps": 1000, "loss": 12.21, "learning_rate": 0.0, "epoch": 4.553654743390358, "percentage": 36.6, "cur_time": "2024-08-22 05:49:28", "elapsed_time": "0:13:57", "remaining_time": "0:24:10", "throughput": "16252.72", "total_tokens": 13613312}
{"current_steps": 369, "total_steps": 1000, "loss": 11.6263, "learning_rate": 0.0, "epoch": 4.590979782270606, "percentage": 36.9, "cur_time": "2024-08-22 05:49:35", "elapsed_time": "0:14:04", "remaining_time": "0:24:04", "throughput": "16251.99", "total_tokens": 13724848}
{"current_steps": 372, "total_steps": 1000, "loss": 5.3601, "learning_rate": 0.0, "epoch": 4.628304821150856, "percentage": 37.2, "cur_time": "2024-08-22 05:49:42", "elapsed_time": "0:14:11", "remaining_time": "0:23:57", "throughput": "16247.79", "total_tokens": 13832560}
{"current_steps": 375, "total_steps": 1000, "loss": 5.7389, "learning_rate": 0.0, "epoch": 4.665629860031104, "percentage": 37.5, "cur_time": "2024-08-22 05:49:49", "elapsed_time": "0:14:18", "remaining_time": "0:23:51", "throughput": "16243.50", "total_tokens": 13947488}
{"current_steps": 378, "total_steps": 1000, "loss": 10.5194, "learning_rate": 0.0, "epoch": 4.7029548989113525, "percentage": 37.8, "cur_time": "2024-08-22 05:49:57", "elapsed_time": "0:14:26", "remaining_time": "0:23:45", "throughput": "16244.24", "total_tokens": 14068672}
{"current_steps": 381, "total_steps": 1000, "loss": 9.6571, "learning_rate": 0.0, "epoch": 4.740279937791602, "percentage": 38.1, "cur_time": "2024-08-22 05:50:04", "elapsed_time": "0:14:33", "remaining_time": "0:23:38", "throughput": "16242.15", "total_tokens": 14184800}
{"current_steps": 384, "total_steps": 1000, "loss": 19.7807, "learning_rate": 0.0, "epoch": 4.777604976671851, "percentage": 38.4, "cur_time": "2024-08-22 05:50:11", "elapsed_time": "0:14:40", "remaining_time": "0:23:32", "throughput": "16246.83", "total_tokens": 14301888}
{"current_steps": 387, "total_steps": 1000, "loss": 11.351, "learning_rate": 0.0, "epoch": 4.8149300155520995, "percentage": 38.7, "cur_time": "2024-08-22 05:50:18", "elapsed_time": "0:14:47", "remaining_time": "0:23:25", "throughput": "16257.00", "total_tokens": 14423984}
{"current_steps": 390, "total_steps": 1000, "loss": 7.8255, "learning_rate": 0.0, "epoch": 4.852255054432348, "percentage": 39.0, "cur_time": "2024-08-22 05:50:24", "elapsed_time": "0:14:53", "remaining_time": "0:23:18", "throughput": "16258.56", "total_tokens": 14532336}
{"current_steps": 393, "total_steps": 1000, "loss": 13.489, "learning_rate": 0.0, "epoch": 4.889580093312597, "percentage": 39.3, "cur_time": "2024-08-22 05:50:31", "elapsed_time": "0:15:00", "remaining_time": "0:23:10", "throughput": "16257.40", "total_tokens": 14634784}
{"current_steps": 396, "total_steps": 1000, "loss": 12.5538, "learning_rate": 0.0, "epoch": 4.926905132192846, "percentage": 39.6, "cur_time": "2024-08-22 05:50:37", "elapsed_time": "0:15:07", "remaining_time": "0:23:03", "throughput": "16253.40", "total_tokens": 14741952}
{"current_steps": 399, "total_steps": 1000, "loss": 17.8018, "learning_rate": 0.0, "epoch": 4.964230171073095, "percentage": 39.9, "cur_time": "2024-08-22 05:50:45", "elapsed_time": "0:15:14", "remaining_time": "0:22:56", "throughput": "16254.49", "total_tokens": 14859328}
{"current_steps": 402, "total_steps": 1000, "loss": 8.9342, "learning_rate": 0.0, "epoch": 5.001555209953343, "percentage": 40.2, "cur_time": "2024-08-22 05:50:52", "elapsed_time": "0:15:21", "remaining_time": "0:22:50", "throughput": "16253.35", "total_tokens": 14976672}
{"current_steps": 405, "total_steps": 1000, "loss": 15.353, "learning_rate": 0.0, "epoch": 5.038880248833593, "percentage": 40.5, "cur_time": "2024-08-22 05:50:59", "elapsed_time": "0:15:28", "remaining_time": "0:22:44", "throughput": "16245.09", "total_tokens": 15084176}
{"current_steps": 408, "total_steps": 1000, "loss": 6.0025, "learning_rate": 0.0, "epoch": 5.076205287713841, "percentage": 40.8, "cur_time": "2024-08-22 05:51:06", "elapsed_time": "0:15:35", "remaining_time": "0:22:37", "throughput": "16249.30", "total_tokens": 15199840}
{"current_steps": 411, "total_steps": 1000, "loss": 18.3302, "learning_rate": 0.0, "epoch": 5.11353032659409, "percentage": 41.1, "cur_time": "2024-08-22 05:51:13", "elapsed_time": "0:15:42", "remaining_time": "0:22:30", "throughput": "16251.20", "total_tokens": 15311664}
{"current_steps": 414, "total_steps": 1000, "loss": 10.1281, "learning_rate": 0.0, "epoch": 5.150855365474339, "percentage": 41.4, "cur_time": "2024-08-22 05:51:19", "elapsed_time": "0:15:48", "remaining_time": "0:22:22", "throughput": "16251.36", "total_tokens": 15417552}
{"current_steps": 417, "total_steps": 1000, "loss": 19.8608, "learning_rate": 0.0, "epoch": 5.188180404354588, "percentage": 41.7, "cur_time": "2024-08-22 05:51:26", "elapsed_time": "0:15:55", "remaining_time": "0:22:16", "throughput": "16254.93", "total_tokens": 15533648}
{"current_steps": 420, "total_steps": 1000, "loss": 10.9191, "learning_rate": 0.0, "epoch": 5.2255054432348365, "percentage": 42.0, "cur_time": "2024-08-22 05:51:33", "elapsed_time": "0:16:02", "remaining_time": "0:22:09", "throughput": "16260.54", "total_tokens": 15653504}
{"current_steps": 423, "total_steps": 1000, "loss": 7.1656, "learning_rate": 0.0, "epoch": 5.262830482115086, "percentage": 42.3, "cur_time": "2024-08-22 05:51:40", "elapsed_time": "0:16:09", "remaining_time": "0:22:02", "throughput": "16261.31", "total_tokens": 15770336}
{"current_steps": 426, "total_steps": 1000, "loss": 8.2608, "learning_rate": 0.0, "epoch": 5.300155520995334, "percentage": 42.6, "cur_time": "2024-08-22 05:51:47", "elapsed_time": "0:16:16", "remaining_time": "0:21:55", "throughput": "16262.62", "total_tokens": 15882608}
{"current_steps": 429, "total_steps": 1000, "loss": 7.1175, "learning_rate": 0.0, "epoch": 5.3374805598755835, "percentage": 42.9, "cur_time": "2024-08-22 05:51:54", "elapsed_time": "0:16:23", "remaining_time": "0:21:49", "throughput": "16264.07", "total_tokens": 16002000}
{"current_steps": 432, "total_steps": 1000, "loss": 7.9221, "learning_rate": 0.0, "epoch": 5.374805598755832, "percentage": 43.2, "cur_time": "2024-08-22 05:52:01", "elapsed_time": "0:16:30", "remaining_time": "0:21:42", "throughput": "16271.26", "total_tokens": 16115504}
{"current_steps": 435, "total_steps": 1000, "loss": 7.5912, "learning_rate": 0.0, "epoch": 5.412130637636081, "percentage": 43.5, "cur_time": "2024-08-22 05:52:08", "elapsed_time": "0:16:37", "remaining_time": "0:21:35", "throughput": "16264.76", "total_tokens": 16223232}
{"current_steps": 438, "total_steps": 1000, "loss": 15.3158, "learning_rate": 0.0, "epoch": 5.44945567651633, "percentage": 43.8, "cur_time": "2024-08-22 05:52:15", "elapsed_time": "0:16:44", "remaining_time": "0:21:29", "throughput": "16260.69", "total_tokens": 16338720}
{"current_steps": 441, "total_steps": 1000, "loss": 7.991, "learning_rate": 0.0, "epoch": 5.486780715396579, "percentage": 44.1, "cur_time": "2024-08-22 05:52:22", "elapsed_time": "0:16:51", "remaining_time": "0:21:22", "throughput": "16258.12", "total_tokens": 16444768}
{"current_steps": 444, "total_steps": 1000, "loss": 12.7983, "learning_rate": 0.0, "epoch": 5.524105754276827, "percentage": 44.4, "cur_time": "2024-08-22 05:52:29", "elapsed_time": "0:16:58", "remaining_time": "0:21:15", "throughput": "16257.90", "total_tokens": 16558672}
{"current_steps": 447, "total_steps": 1000, "loss": 14.9909, "learning_rate": 0.0, "epoch": 5.561430793157077, "percentage": 44.7, "cur_time": "2024-08-22 05:52:35", "elapsed_time": "0:17:04", "remaining_time": "0:21:08", "throughput": "16259.92", "total_tokens": 16665760}
{"current_steps": 450, "total_steps": 1000, "loss": 8.7115, "learning_rate": 0.0, "epoch": 5.598755832037325, "percentage": 45.0, "cur_time": "2024-08-22 05:52:42", "elapsed_time": "0:17:11", "remaining_time": "0:21:00", "throughput": "16257.32", "total_tokens": 16772560}
{"current_steps": 453, "total_steps": 1000, "loss": 9.2608, "learning_rate": 0.0, "epoch": 5.636080870917574, "percentage": 45.3, "cur_time": "2024-08-22 05:52:49", "elapsed_time": "0:17:18", "remaining_time": "0:20:54", "throughput": "16265.35", "total_tokens": 16892544}
{"current_steps": 456, "total_steps": 1000, "loss": 12.3677, "learning_rate": 0.0, "epoch": 5.673405909797823, "percentage": 45.6, "cur_time": "2024-08-22 05:52:56", "elapsed_time": "0:17:25", "remaining_time": "0:20:47", "throughput": "16265.61", "total_tokens": 17004288}
{"current_steps": 459, "total_steps": 1000, "loss": 5.9306, "learning_rate": 0.0, "epoch": 5.710730948678071, "percentage": 45.9, "cur_time": "2024-08-22 05:53:03", "elapsed_time": "0:17:32", "remaining_time": "0:20:40", "throughput": "16254.23", "total_tokens": 17110288}
{"current_steps": 462, "total_steps": 1000, "loss": 9.6708, "learning_rate": 0.0, "epoch": 5.74805598755832, "percentage": 46.2, "cur_time": "2024-08-22 05:53:10", "elapsed_time": "0:17:39", "remaining_time": "0:20:33", "throughput": "16256.21", "total_tokens": 17220896}
{"current_steps": 465, "total_steps": 1000, "loss": 8.0083, "learning_rate": 0.0, "epoch": 5.78538102643857, "percentage": 46.5, "cur_time": "2024-08-22 05:53:17", "elapsed_time": "0:17:46", "remaining_time": "0:20:26", "throughput": "16262.16", "total_tokens": 17340752}
{"current_steps": 468, "total_steps": 1000, "loss": 10.2094, "learning_rate": 0.0, "epoch": 5.822706065318818, "percentage": 46.8, "cur_time": "2024-08-22 05:53:23", "elapsed_time": "0:17:52", "remaining_time": "0:20:19", "throughput": "16258.99", "total_tokens": 17445744}
{"current_steps": 471, "total_steps": 1000, "loss": 9.8845, "learning_rate": 0.0, "epoch": 5.8600311041990665, "percentage": 47.1, "cur_time": "2024-08-22 05:53:30", "elapsed_time": "0:17:59", "remaining_time": "0:20:12", "throughput": "16265.89", "total_tokens": 17561664}
{"current_steps": 474, "total_steps": 1000, "loss": 15.0664, "learning_rate": 0.0, "epoch": 5.897356143079316, "percentage": 47.4, "cur_time": "2024-08-22 05:53:37", "elapsed_time": "0:18:06", "remaining_time": "0:20:05", "throughput": "16266.39", "total_tokens": 17672976}
{"current_steps": 477, "total_steps": 1000, "loss": 17.8165, "learning_rate": 0.0, "epoch": 5.934681181959564, "percentage": 47.7, "cur_time": "2024-08-22 05:53:44", "elapsed_time": "0:18:13", "remaining_time": "0:19:58", "throughput": "16266.64", "total_tokens": 17780224}
{"current_steps": 480, "total_steps": 1000, "loss": 12.124, "learning_rate": 0.0, "epoch": 5.9720062208398135, "percentage": 48.0, "cur_time": "2024-08-22 05:53:50", "elapsed_time": "0:18:19", "remaining_time": "0:19:51", "throughput": "16269.93", "total_tokens": 17889984}
{"current_steps": 483, "total_steps": 1000, "loss": 13.9358, "learning_rate": 0.0, "epoch": 6.009331259720062, "percentage": 48.3, "cur_time": "2024-08-22 05:53:57", "elapsed_time": "0:18:26", "remaining_time": "0:19:44", "throughput": "16270.31", "total_tokens": 18004224}
{"current_steps": 486, "total_steps": 1000, "loss": 8.6936, "learning_rate": 0.0, "epoch": 6.046656298600311, "percentage": 48.6, "cur_time": "2024-08-22 05:54:04", "elapsed_time": "0:18:33", "remaining_time": "0:19:37", "throughput": "16272.81", "total_tokens": 18123616}
{"current_steps": 489, "total_steps": 1000, "loss": 9.3147, "learning_rate": 0.0, "epoch": 6.08398133748056, "percentage": 48.9, "cur_time": "2024-08-22 05:54:11", "elapsed_time": "0:18:40", "remaining_time": "0:19:30", "throughput": "16272.94", "total_tokens": 18231584}
{"current_steps": 492, "total_steps": 1000, "loss": 17.2013, "learning_rate": 0.0, "epoch": 6.121306376360809, "percentage": 49.2, "cur_time": "2024-08-22 05:54:18", "elapsed_time": "0:18:47", "remaining_time": "0:19:23", "throughput": "16277.39", "total_tokens": 18346080}
{"current_steps": 495, "total_steps": 1000, "loss": 15.7279, "learning_rate": 0.0, "epoch": 6.158631415241057, "percentage": 49.5, "cur_time": "2024-08-22 05:54:24", "elapsed_time": "0:18:53", "remaining_time": "0:19:16", "throughput": "16280.96", "total_tokens": 18461776}
{"current_steps": 498, "total_steps": 1000, "loss": 6.5026, "learning_rate": 0.0, "epoch": 6.195956454121307, "percentage": 49.8, "cur_time": "2024-08-22 05:54:32", "elapsed_time": "0:19:01", "remaining_time": "0:19:10", "throughput": "16277.18", "total_tokens": 18575904}
{"current_steps": 500, "total_steps": 1000, "eval_loss": NaN, "epoch": 6.2208398133748055, "percentage": 50.0, "cur_time": "2024-08-22 05:54:43", "elapsed_time": "0:19:12", "remaining_time": "0:19:12", "throughput": "16194.23", "total_tokens": 18659952}
{"current_steps": 501, "total_steps": 1000, "loss": 12.0386, "learning_rate": 0.0, "epoch": 6.233281493001555, "percentage": 50.1, "cur_time": "2024-08-22 05:54:45", "elapsed_time": "0:19:14", "remaining_time": "0:19:09", "throughput": "16194.15", "total_tokens": 18695344}
{"current_steps": 504, "total_steps": 1000, "loss": 8.9403, "learning_rate": 0.0, "epoch": 6.270606531881804, "percentage": 50.4, "cur_time": "2024-08-22 05:54:51", "elapsed_time": "0:19:20", "remaining_time": "0:19:02", "throughput": "16193.94", "total_tokens": 18796944}
{"current_steps": 507, "total_steps": 1000, "loss": 8.8669, "learning_rate": 0.0, "epoch": 6.307931570762053, "percentage": 50.7, "cur_time": "2024-08-22 05:54:58", "elapsed_time": "0:19:27", "remaining_time": "0:18:55", "throughput": "16190.23", "total_tokens": 18905296}
{"current_steps": 510, "total_steps": 1000, "loss": 9.6763, "learning_rate": 0.0, "epoch": 6.345256609642302, "percentage": 51.0, "cur_time": "2024-08-22 05:55:05", "elapsed_time": "0:19:34", "remaining_time": "0:18:48", "throughput": "16193.91", "total_tokens": 19021664}
{"current_steps": 513, "total_steps": 1000, "loss": 15.275, "learning_rate": 0.0, "epoch": 6.38258164852255, "percentage": 51.3, "cur_time": "2024-08-22 05:55:12", "elapsed_time": "0:19:41", "remaining_time": "0:18:41", "throughput": "16194.62", "total_tokens": 19137824}
{"current_steps": 516, "total_steps": 1000, "loss": 36.5206, "learning_rate": 0.0, "epoch": 6.4199066874028, "percentage": 51.6, "cur_time": "2024-08-22 05:55:19", "elapsed_time": "0:19:48", "remaining_time": "0:18:35", "throughput": "16192.00", "total_tokens": 19250000}
{"current_steps": 519, "total_steps": 1000, "loss": 14.6615, "learning_rate": 0.0, "epoch": 6.457231726283048, "percentage": 51.9, "cur_time": "2024-08-22 05:55:26", "elapsed_time": "0:19:55", "remaining_time": "0:18:28", "throughput": "16189.67", "total_tokens": 19361152}
{"current_steps": 522, "total_steps": 1000, "loss": 15.7849, "learning_rate": 0.0, "epoch": 6.494556765163297, "percentage": 52.2, "cur_time": "2024-08-22 05:55:33", "elapsed_time": "0:20:02", "remaining_time": "0:18:21", "throughput": "16191.48", "total_tokens": 19474288}
{"current_steps": 525, "total_steps": 1000, "loss": 7.2328, "learning_rate": 0.0, "epoch": 6.531881804043546, "percentage": 52.5, "cur_time": "2024-08-22 05:55:41", "elapsed_time": "0:20:10", "remaining_time": "0:18:14", "throughput": "16186.14", "total_tokens": 19587008}
{"current_steps": 528, "total_steps": 1000, "loss": 10.6908, "learning_rate": 0.0, "epoch": 6.569206842923795, "percentage": 52.8, "cur_time": "2024-08-22 05:55:47", "elapsed_time": "0:20:16", "remaining_time": "0:18:07", "throughput": "16190.05", "total_tokens": 19696112}
{"current_steps": 531, "total_steps": 1000, "loss": 15.1283, "learning_rate": 0.0, "epoch": 6.6065318818040435, "percentage": 53.1, "cur_time": "2024-08-22 05:55:54", "elapsed_time": "0:20:23", "remaining_time": "0:18:00", "throughput": "16189.44", "total_tokens": 19807648}
{"current_steps": 534, "total_steps": 1000, "loss": 20.95, "learning_rate": 0.0, "epoch": 6.643856920684293, "percentage": 53.4, "cur_time": "2024-08-22 05:56:01", "elapsed_time": "0:20:30", "remaining_time": "0:17:53", "throughput": "16186.52", "total_tokens": 19916992}
{"current_steps": 537, "total_steps": 1000, "loss": 14.2251, "learning_rate": 0.0, "epoch": 6.681181959564541, "percentage": 53.7, "cur_time": "2024-08-22 05:56:08", "elapsed_time": "0:20:37", "remaining_time": "0:17:46", "throughput": "16183.90", "total_tokens": 20022672}
{"current_steps": 540, "total_steps": 1000, "loss": 17.42, "learning_rate": 0.0, "epoch": 6.71850699844479, "percentage": 54.0, "cur_time": "2024-08-22 05:56:14", "elapsed_time": "0:20:43", "remaining_time": "0:17:39", "throughput": "16189.45", "total_tokens": 20138768}
{"current_steps": 543, "total_steps": 1000, "loss": 10.8675, "learning_rate": 0.0, "epoch": 6.755832037325039, "percentage": 54.3, "cur_time": "2024-08-22 05:56:21", "elapsed_time": "0:20:50", "remaining_time": "0:17:32", "throughput": "16192.35", "total_tokens": 20247824}
{"current_steps": 546, "total_steps": 1000, "loss": 8.586, "learning_rate": 0.0, "epoch": 6.793157076205288, "percentage": 54.6, "cur_time": "2024-08-22 05:56:27", "elapsed_time": "0:20:56", "remaining_time": "0:17:25", "throughput": "16195.05", "total_tokens": 20357072}
{"current_steps": 549, "total_steps": 1000, "loss": 8.1799, "learning_rate": 0.0, "epoch": 6.830482115085537, "percentage": 54.9, "cur_time": "2024-08-22 05:56:34", "elapsed_time": "0:21:03", "remaining_time": "0:17:18", "throughput": "16192.67", "total_tokens": 20466928}
{"current_steps": 552, "total_steps": 1000, "loss": 15.5518, "learning_rate": 0.0, "epoch": 6.867807153965785, "percentage": 55.2, "cur_time": "2024-08-22 05:56:41", "elapsed_time": "0:21:10", "remaining_time": "0:17:11", "throughput": "16189.01", "total_tokens": 20571168}
{"current_steps": 555, "total_steps": 1000, "loss": 10.5345, "learning_rate": 0.0, "epoch": 6.905132192846034, "percentage": 55.5, "cur_time": "2024-08-22 05:56:48", "elapsed_time": "0:21:17", "remaining_time": "0:17:04", "throughput": "16195.70", "total_tokens": 20691152}
{"current_steps": 558, "total_steps": 1000, "loss": 6.7281, "learning_rate": 0.0, "epoch": 6.942457231726283, "percentage": 55.8, "cur_time": "2024-08-22 05:56:55", "elapsed_time": "0:21:24", "remaining_time": "0:16:57", "throughput": "16196.60", "total_tokens": 20804944}
{"current_steps": 561, "total_steps": 1000, "loss": 9.2801, "learning_rate": 0.0, "epoch": 6.979782270606532, "percentage": 56.1, "cur_time": "2024-08-22 05:57:02", "elapsed_time": "0:21:31", "remaining_time": "0:16:50", "throughput": "16197.51", "total_tokens": 20916128}
{"current_steps": 564, "total_steps": 1000, "loss": 5.9323, "learning_rate": 0.0, "epoch": 7.01710730948678, "percentage": 56.4, "cur_time": "2024-08-22 05:57:09", "elapsed_time": "0:21:38", "remaining_time": "0:16:43", "throughput": "16199.09", "total_tokens": 21031664}
{"current_steps": 567, "total_steps": 1000, "loss": 9.9459, "learning_rate": 0.0, "epoch": 7.05443234836703, "percentage": 56.7, "cur_time": "2024-08-22 05:57:15", "elapsed_time": "0:21:44", "remaining_time": "0:16:36", "throughput": "16194.46", "total_tokens": 21132288}
{"current_steps": 570, "total_steps": 1000, "loss": 12.6263, "learning_rate": 0.0, "epoch": 7.091757387247278, "percentage": 57.0, "cur_time": "2024-08-22 05:57:22", "elapsed_time": "0:21:51", "remaining_time": "0:16:29", "throughput": "16195.40", "total_tokens": 21247024}
{"current_steps": 573, "total_steps": 1000, "loss": 11.8039, "learning_rate": 0.0, "epoch": 7.129082426127527, "percentage": 57.3, "cur_time": "2024-08-22 05:57:29", "elapsed_time": "0:21:58", "remaining_time": "0:16:22", "throughput": "16199.12", "total_tokens": 21363776}
{"current_steps": 576, "total_steps": 1000, "loss": 12.5097, "learning_rate": 0.0, "epoch": 7.166407465007776, "percentage": 57.6, "cur_time": "2024-08-22 05:57:36", "elapsed_time": "0:22:05", "remaining_time": "0:16:15", "throughput": "16198.68", "total_tokens": 21477024}
{"current_steps": 579, "total_steps": 1000, "loss": 12.6844, "learning_rate": 0.0, "epoch": 7.203732503888025, "percentage": 57.9, "cur_time": "2024-08-22 05:57:43", "elapsed_time": "0:22:12", "remaining_time": "0:16:08", "throughput": "16200.23", "total_tokens": 21588896}
{"current_steps": 582, "total_steps": 1000, "loss": 13.2153, "learning_rate": 0.0, "epoch": 7.2410575427682735, "percentage": 58.2, "cur_time": "2024-08-22 05:57:50", "elapsed_time": "0:22:19", "remaining_time": "0:16:02", "throughput": "16199.88", "total_tokens": 21705216}
{"current_steps": 585, "total_steps": 1000, "loss": 15.364, "learning_rate": 0.0, "epoch": 7.278382581648523, "percentage": 58.5, "cur_time": "2024-08-22 05:57:58", "elapsed_time": "0:22:27", "remaining_time": "0:15:55", "throughput": "16198.51", "total_tokens": 21820000}
{"current_steps": 588, "total_steps": 1000, "loss": 6.5364, "learning_rate": 0.0, "epoch": 7.315707620528771, "percentage": 58.8, "cur_time": "2024-08-22 05:58:05", "elapsed_time": "0:22:34", "remaining_time": "0:15:49", "throughput": "16190.43", "total_tokens": 21930352}
{"current_steps": 591, "total_steps": 1000, "loss": 8.7861, "learning_rate": 0.0, "epoch": 7.3530326594090205, "percentage": 59.1, "cur_time": "2024-08-22 05:58:13", "elapsed_time": "0:22:42", "remaining_time": "0:15:43", "throughput": "16169.35", "total_tokens": 22035424}
{"current_steps": 594, "total_steps": 1000, "loss": 9.4291, "learning_rate": 0.0, "epoch": 7.390357698289269, "percentage": 59.4, "cur_time": "2024-08-22 05:58:23", "elapsed_time": "0:22:52", "remaining_time": "0:15:37", "throughput": "16145.53", "total_tokens": 22154624}
{"current_steps": 597, "total_steps": 1000, "loss": 8.9286, "learning_rate": 0.0, "epoch": 7.427682737169518, "percentage": 59.7, "cur_time": "2024-08-22 05:58:31", "elapsed_time": "0:23:00", "remaining_time": "0:15:32", "throughput": "16125.75", "total_tokens": 22267792}
{"current_steps": 600, "total_steps": 1000, "loss": 6.2194, "learning_rate": 0.0, "epoch": 7.465007776049767, "percentage": 60.0, "cur_time": "2024-08-22 05:58:40", "elapsed_time": "0:23:09", "remaining_time": "0:15:26", "throughput": "16104.93", "total_tokens": 22377184}
{"current_steps": 603, "total_steps": 1000, "loss": 12.7506, "learning_rate": 0.0, "epoch": 7.502332814930016, "percentage": 60.3, "cur_time": "2024-08-22 05:58:49", "elapsed_time": "0:23:18", "remaining_time": "0:15:20", "throughput": "16085.06", "total_tokens": 22496112}
{"current_steps": 606, "total_steps": 1000, "loss": 16.7138, "learning_rate": 0.0, "epoch": 7.539657853810264, "percentage": 60.6, "cur_time": "2024-08-22 05:59:00", "elapsed_time": "0:23:29", "remaining_time": "0:15:16", "throughput": "16042.54", "total_tokens": 22613600}
{"current_steps": 609, "total_steps": 1000, "loss": 15.042, "learning_rate": 0.0, "epoch": 7.576982892690513, "percentage": 60.9, "cur_time": "2024-08-22 05:59:08", "elapsed_time": "0:23:37", "remaining_time": "0:15:10", "throughput": "16028.10", "total_tokens": 22726320}
{"current_steps": 612, "total_steps": 1000, "loss": 8.3775, "learning_rate": 0.0, "epoch": 7.614307931570762, "percentage": 61.2, "cur_time": "2024-08-22 05:59:17", "elapsed_time": "0:23:46", "remaining_time": "0:15:04", "throughput": "16007.04", "total_tokens": 22830224}
{"current_steps": 615, "total_steps": 1000, "loss": 11.2836, "learning_rate": 0.0, "epoch": 7.651632970451011, "percentage": 61.5, "cur_time": "2024-08-22 05:59:27", "elapsed_time": "0:23:56", "remaining_time": "0:14:59", "throughput": "15974.73", "total_tokens": 22944096}
{"current_steps": 618, "total_steps": 1000, "loss": 9.3435, "learning_rate": 0.0, "epoch": 7.68895800933126, "percentage": 61.8, "cur_time": "2024-08-22 05:59:36", "elapsed_time": "0:24:05", "remaining_time": "0:14:53", "throughput": "15958.68", "total_tokens": 23062688}
{"current_steps": 621, "total_steps": 1000, "loss": 10.2463, "learning_rate": 0.0, "epoch": 7.726283048211508, "percentage": 62.1, "cur_time": "2024-08-22 05:59:44", "elapsed_time": "0:24:13", "remaining_time": "0:14:46", "throughput": "15943.50", "total_tokens": 23170864}
{"current_steps": 624, "total_steps": 1000, "loss": 9.0613, "learning_rate": 0.0, "epoch": 7.763608087091757, "percentage": 62.4, "cur_time": "2024-08-22 05:59:52", "elapsed_time": "0:24:21", "remaining_time": "0:14:40", "throughput": "15922.84", "total_tokens": 23276464}
{"current_steps": 627, "total_steps": 1000, "loss": 8.1291, "learning_rate": 0.0, "epoch": 7.800933125972006, "percentage": 62.7, "cur_time": "2024-08-22 06:00:01", "elapsed_time": "0:24:30", "remaining_time": "0:14:34", "throughput": "15908.56", "total_tokens": 23392416}
{"current_steps": 630, "total_steps": 1000, "loss": 7.621, "learning_rate": 0.0, "epoch": 7.838258164852255, "percentage": 63.0, "cur_time": "2024-08-22 06:00:10", "elapsed_time": "0:24:39", "remaining_time": "0:14:28", "throughput": "15880.07", "total_tokens": 23496272}
{"current_steps": 633, "total_steps": 1000, "loss": 13.155, "learning_rate": 0.0, "epoch": 7.8755832037325035, "percentage": 63.3, "cur_time": "2024-08-22 06:00:19", "elapsed_time": "0:24:48", "remaining_time": "0:14:23", "throughput": "15862.75", "total_tokens": 23612064}
{"current_steps": 636, "total_steps": 1000, "loss": 9.4481, "learning_rate": 0.0, "epoch": 7.912908242612753, "percentage": 63.6, "cur_time": "2024-08-22 06:00:27", "elapsed_time": "0:24:56", "remaining_time": "0:14:16", "throughput": "15849.02", "total_tokens": 23720240}
{"current_steps": 639, "total_steps": 1000, "loss": 14.2044, "learning_rate": 0.0, "epoch": 7.950233281493001, "percentage": 63.9, "cur_time": "2024-08-22 06:00:37", "elapsed_time": "0:25:06", "remaining_time": "0:14:10", "throughput": "15821.08", "total_tokens": 23828768}
{"current_steps": 642, "total_steps": 1000, "loss": 12.8233, "learning_rate": 0.0, "epoch": 7.9875583203732505, "percentage": 64.2, "cur_time": "2024-08-22 06:00:44", "elapsed_time": "0:25:13", "remaining_time": "0:14:04", "throughput": "15815.95", "total_tokens": 23942672}
{"current_steps": 645, "total_steps": 1000, "loss": 21.3516, "learning_rate": 0.0, "epoch": 8.024883359253499, "percentage": 64.5, "cur_time": "2024-08-22 06:00:53", "elapsed_time": "0:25:22", "remaining_time": "0:13:57", "throughput": "15800.02", "total_tokens": 24051120}
{"current_steps": 648, "total_steps": 1000, "loss": 10.1635, "learning_rate": 0.0, "epoch": 8.062208398133748, "percentage": 64.8, "cur_time": "2024-08-22 06:01:01", "elapsed_time": "0:25:30", "remaining_time": "0:13:51", "throughput": "15782.15", "total_tokens": 24158848}
{"current_steps": 651, "total_steps": 1000, "loss": 15.1394, "learning_rate": 0.0, "epoch": 8.099533437013998, "percentage": 65.1, "cur_time": "2024-08-22 06:01:09", "elapsed_time": "0:25:38", "remaining_time": "0:13:45", "throughput": "15773.24", "total_tokens": 24273760}
{"current_steps": 654, "total_steps": 1000, "loss": 11.1823, "learning_rate": 0.0, "epoch": 8.136858475894245, "percentage": 65.4, "cur_time": "2024-08-22 06:01:18", "elapsed_time": "0:25:47", "remaining_time": "0:13:38", "throughput": "15761.93", "total_tokens": 24389040}
{"current_steps": 657, "total_steps": 1000, "loss": 12.2783, "learning_rate": 0.0, "epoch": 8.174183514774494, "percentage": 65.7, "cur_time": "2024-08-22 06:01:27", "elapsed_time": "0:25:56", "remaining_time": "0:13:32", "throughput": "15741.46", "total_tokens": 24497872}
{"current_steps": 660, "total_steps": 1000, "loss": 16.2704, "learning_rate": 0.0, "epoch": 8.211508553654744, "percentage": 66.0, "cur_time": "2024-08-22 06:01:35", "elapsed_time": "0:26:04", "remaining_time": "0:13:26", "throughput": "15723.67", "total_tokens": 24604576}
{"current_steps": 663, "total_steps": 1000, "loss": 7.936, "learning_rate": 0.0, "epoch": 8.248833592534993, "percentage": 66.3, "cur_time": "2024-08-22 06:01:46", "elapsed_time": "0:26:15", "remaining_time": "0:13:20", "throughput": "15689.40", "total_tokens": 24719728}
{"current_steps": 666, "total_steps": 1000, "loss": 9.9501, "learning_rate": 0.0, "epoch": 8.28615863141524, "percentage": 66.6, "cur_time": "2024-08-22 06:01:56", "elapsed_time": "0:26:25", "remaining_time": "0:13:15", "throughput": "15662.14", "total_tokens": 24839888}
{"current_steps": 669, "total_steps": 1000, "loss": 12.1488, "learning_rate": 0.0, "epoch": 8.32348367029549, "percentage": 66.9, "cur_time": "2024-08-22 06:02:07", "elapsed_time": "0:26:36", "remaining_time": "0:13:09", "throughput": "15633.28", "total_tokens": 24955312}
{"current_steps": 672, "total_steps": 1000, "loss": 17.915, "learning_rate": 0.0, "epoch": 8.360808709175739, "percentage": 67.2, "cur_time": "2024-08-22 06:02:15", "elapsed_time": "0:26:45", "remaining_time": "0:13:03", "throughput": "15622.91", "total_tokens": 25075152}
{"current_steps": 675, "total_steps": 1000, "loss": 14.0293, "learning_rate": 0.0, "epoch": 8.398133748055988, "percentage": 67.5, "cur_time": "2024-08-22 06:02:26", "elapsed_time": "0:26:55", "remaining_time": "0:12:57", "throughput": "15595.62", "total_tokens": 25188944}
{"current_steps": 678, "total_steps": 1000, "loss": 14.9981, "learning_rate": 0.0, "epoch": 8.435458786936236, "percentage": 67.8, "cur_time": "2024-08-22 06:02:35", "elapsed_time": "0:27:04", "remaining_time": "0:12:51", "throughput": "15572.97", "total_tokens": 25297920}
{"current_steps": 681, "total_steps": 1000, "loss": 12.4305, "learning_rate": 0.0, "epoch": 8.472783825816485, "percentage": 68.1, "cur_time": "2024-08-22 06:02:44", "elapsed_time": "0:27:13", "remaining_time": "0:12:45", "throughput": "15548.28", "total_tokens": 25405760}
{"current_steps": 684, "total_steps": 1000, "loss": 14.0933, "learning_rate": 0.0, "epoch": 8.510108864696734, "percentage": 68.4, "cur_time": "2024-08-22 06:02:55", "elapsed_time": "0:27:24", "remaining_time": "0:12:39", "throughput": "15525.25", "total_tokens": 25527152}
{"current_steps": 687, "total_steps": 1000, "loss": 5.9321, "learning_rate": 0.0, "epoch": 8.547433903576984, "percentage": 68.7, "cur_time": "2024-08-22 06:03:06", "elapsed_time": "0:27:35", "remaining_time": "0:12:34", "throughput": "15492.82", "total_tokens": 25643344}
{"current_steps": 690, "total_steps": 1000, "loss": 12.6201, "learning_rate": 0.0, "epoch": 8.584758942457231, "percentage": 69.0, "cur_time": "2024-08-22 06:03:16", "elapsed_time": "0:27:45", "remaining_time": "0:12:28", "throughput": "15468.84", "total_tokens": 25758096}
{"current_steps": 693, "total_steps": 1000, "loss": 14.1967, "learning_rate": 0.0, "epoch": 8.62208398133748, "percentage": 69.3, "cur_time": "2024-08-22 06:03:24", "elapsed_time": "0:27:53", "remaining_time": "0:12:21", "throughput": "15456.49", "total_tokens": 25864848}
{"current_steps": 696, "total_steps": 1000, "loss": 8.8063, "learning_rate": 0.0, "epoch": 8.65940902021773, "percentage": 69.6, "cur_time": "2024-08-22 06:03:36", "elapsed_time": "0:28:05", "remaining_time": "0:12:16", "throughput": "15412.74", "total_tokens": 25971872}
{"current_steps": 699, "total_steps": 1000, "loss": 12.9953, "learning_rate": 0.0, "epoch": 8.696734059097977, "percentage": 69.9, "cur_time": "2024-08-22 06:03:46", "elapsed_time": "0:28:15", "remaining_time": "0:12:10", "throughput": "15379.16", "total_tokens": 26080384}
{"current_steps": 702, "total_steps": 1000, "loss": 16.3832, "learning_rate": 0.0, "epoch": 8.734059097978227, "percentage": 70.2, "cur_time": "2024-08-22 06:03:55", "elapsed_time": "0:28:24", "remaining_time": "0:12:03", "throughput": "15363.66", "total_tokens": 26187648}
{"current_steps": 705, "total_steps": 1000, "loss": 8.5016, "learning_rate": 0.0, "epoch": 8.771384136858476, "percentage": 70.5, "cur_time": "2024-08-22 06:04:05", "elapsed_time": "0:28:34", "remaining_time": "0:11:57", "throughput": "15339.13", "total_tokens": 26298352}
{"current_steps": 708, "total_steps": 1000, "loss": 12.978, "learning_rate": 0.0, "epoch": 8.808709175738725, "percentage": 70.8, "cur_time": "2024-08-22 06:04:16", "elapsed_time": "0:28:46", "remaining_time": "0:11:51", "throughput": "15304.87", "total_tokens": 26416656}
{"current_steps": 711, "total_steps": 1000, "loss": 10.8072, "learning_rate": 0.0, "epoch": 8.846034214618973, "percentage": 71.1, "cur_time": "2024-08-22 06:04:26", "elapsed_time": "0:28:55", "remaining_time": "0:11:45", "throughput": "15283.40", "total_tokens": 26521456}
{"current_steps": 714, "total_steps": 1000, "loss": 11.6907, "learning_rate": 0.0, "epoch": 8.883359253499222, "percentage": 71.4, "cur_time": "2024-08-22 06:04:35", "elapsed_time": "0:29:04", "remaining_time": "0:11:38", "throughput": "15263.89", "total_tokens": 26630544}
{"current_steps": 717, "total_steps": 1000, "loss": 14.1724, "learning_rate": 0.0, "epoch": 8.920684292379471, "percentage": 71.7, "cur_time": "2024-08-22 06:04:46", "elapsed_time": "0:29:15", "remaining_time": "0:11:33", "throughput": "15230.05", "total_tokens": 26741824}
{"current_steps": 720, "total_steps": 1000, "loss": 11.2749, "learning_rate": 0.0, "epoch": 8.95800933125972, "percentage": 72.0, "cur_time": "2024-08-22 06:04:58", "elapsed_time": "0:29:27", "remaining_time": "0:11:27", "throughput": "15193.64", "total_tokens": 26850240}
{"current_steps": 723, "total_steps": 1000, "loss": 10.3783, "learning_rate": 0.0, "epoch": 8.995334370139968, "percentage": 72.3, "cur_time": "2024-08-22 06:05:09", "elapsed_time": "0:29:38", "remaining_time": "0:11:21", "throughput": "15154.38", "total_tokens": 26959600}
{"current_steps": 726, "total_steps": 1000, "loss": 10.993, "learning_rate": 0.0, "epoch": 9.032659409020217, "percentage": 72.6, "cur_time": "2024-08-22 06:05:20", "elapsed_time": "0:29:49", "remaining_time": "0:11:15", "throughput": "15132.10", "total_tokens": 27077136}
{"current_steps": 729, "total_steps": 1000, "loss": 15.6579, "learning_rate": 0.0, "epoch": 9.069984447900467, "percentage": 72.9, "cur_time": "2024-08-22 06:05:31", "elapsed_time": "0:30:00", "remaining_time": "0:11:09", "throughput": "15101.27", "total_tokens": 27183328}
{"current_steps": 732, "total_steps": 1000, "loss": 12.3566, "learning_rate": 0.0, "epoch": 9.107309486780716, "percentage": 73.2, "cur_time": "2024-08-22 06:05:41", "elapsed_time": "0:30:11", "remaining_time": "0:11:03", "throughput": "15073.75", "total_tokens": 27298640}
{"current_steps": 735, "total_steps": 1000, "loss": 9.5611, "learning_rate": 0.0, "epoch": 9.144634525660964, "percentage": 73.5, "cur_time": "2024-08-22 06:05:53", "elapsed_time": "0:30:22", "remaining_time": "0:10:57", "throughput": "15046.02", "total_tokens": 27420240}
{"current_steps": 738, "total_steps": 1000, "loss": 10.838, "learning_rate": 0.0, "epoch": 9.181959564541213, "percentage": 73.8, "cur_time": "2024-08-22 06:06:04", "elapsed_time": "0:30:33", "remaining_time": "0:10:50", "throughput": "15017.57", "total_tokens": 27531440}
{"current_steps": 741, "total_steps": 1000, "loss": 17.1524, "learning_rate": 0.0, "epoch": 9.219284603421462, "percentage": 74.1, "cur_time": "2024-08-22 06:06:15", "elapsed_time": "0:30:44", "remaining_time": "0:10:44", "throughput": "14982.83", "total_tokens": 27642768}
{"current_steps": 744, "total_steps": 1000, "loss": 21.5314, "learning_rate": 0.0, "epoch": 9.256609642301711, "percentage": 74.4, "cur_time": "2024-08-22 06:06:28", "elapsed_time": "0:30:57", "remaining_time": "0:10:39", "throughput": "14950.70", "total_tokens": 27765712}
{"current_steps": 747, "total_steps": 1000, "loss": 6.8498, "learning_rate": 0.0, "epoch": 9.293934681181959, "percentage": 74.7, "cur_time": "2024-08-22 06:06:38", "elapsed_time": "0:31:07", "remaining_time": "0:10:32", "throughput": "14930.68", "total_tokens": 27884240}
{"current_steps": 750, "total_steps": 1000, "loss": 7.8165, "learning_rate": 0.0, "epoch": 9.331259720062208, "percentage": 75.0, "cur_time": "2024-08-22 06:06:49", "elapsed_time": "0:31:18", "remaining_time": "0:10:26", "throughput": "14901.69", "total_tokens": 27993664}
{"current_steps": 753, "total_steps": 1000, "loss": 25.8428, "learning_rate": 0.0, "epoch": 9.368584758942458, "percentage": 75.3, "cur_time": "2024-08-22 06:07:00", "elapsed_time": "0:31:29", "remaining_time": "0:10:19", "throughput": "14873.49", "total_tokens": 28102224}
{"current_steps": 756, "total_steps": 1000, "loss": 12.2624, "learning_rate": 0.0, "epoch": 9.405909797822707, "percentage": 75.6, "cur_time": "2024-08-22 06:07:11", "elapsed_time": "0:31:40", "remaining_time": "0:10:13", "throughput": "14846.30", "total_tokens": 28219120}
{"current_steps": 759, "total_steps": 1000, "loss": 4.8092, "learning_rate": 0.0, "epoch": 9.443234836702954, "percentage": 75.9, "cur_time": "2024-08-22 06:07:22", "elapsed_time": "0:31:51", "remaining_time": "0:10:06", "throughput": "14824.68", "total_tokens": 28330768}
{"current_steps": 762, "total_steps": 1000, "loss": 20.5568, "learning_rate": 0.0, "epoch": 9.480559875583204, "percentage": 76.2, "cur_time": "2024-08-22 06:07:35", "elapsed_time": "0:32:04", "remaining_time": "0:10:01", "throughput": "14779.86", "total_tokens": 28444352}
{"current_steps": 765, "total_steps": 1000, "loss": 10.6764, "learning_rate": 0.0, "epoch": 9.517884914463453, "percentage": 76.5, "cur_time": "2024-08-22 06:07:48", "elapsed_time": "0:32:17", "remaining_time": "0:09:55", "throughput": "14735.09", "total_tokens": 28553904}
{"current_steps": 768, "total_steps": 1000, "loss": 13.5784, "learning_rate": 0.0, "epoch": 9.555209953343702, "percentage": 76.8, "cur_time": "2024-08-22 06:08:02", "elapsed_time": "0:32:31", "remaining_time": "0:09:49", "throughput": "14687.79", "total_tokens": 28662592}
{"current_steps": 771, "total_steps": 1000, "loss": 11.8902, "learning_rate": 0.0, "epoch": 9.59253499222395, "percentage": 77.1, "cur_time": "2024-08-22 06:08:14", "elapsed_time": "0:32:43", "remaining_time": "0:09:43", "throughput": "14650.56", "total_tokens": 28769040}
{"current_steps": 774, "total_steps": 1000, "loss": 11.5366, "learning_rate": 0.0, "epoch": 9.629860031104199, "percentage": 77.4, "cur_time": "2024-08-22 06:08:26", "elapsed_time": "0:32:55", "remaining_time": "0:09:36", "throughput": "14616.91", "total_tokens": 28869760}
{"current_steps": 777, "total_steps": 1000, "loss": 11.4389, "learning_rate": 0.0, "epoch": 9.667185069984448, "percentage": 77.7, "cur_time": "2024-08-22 06:08:40", "elapsed_time": "0:33:09", "remaining_time": "0:09:30", "throughput": "14570.21", "total_tokens": 28985152}
{"current_steps": 780, "total_steps": 1000, "loss": 8.7927, "learning_rate": 0.0, "epoch": 9.704510108864696, "percentage": 78.0, "cur_time": "2024-08-22 06:08:52", "elapsed_time": "0:33:21", "remaining_time": "0:09:24", "throughput": "14528.55", "total_tokens": 29085568}
{"current_steps": 783, "total_steps": 1000, "loss": 11.7756, "learning_rate": 0.0, "epoch": 9.741835147744945, "percentage": 78.3, "cur_time": "2024-08-22 06:09:06", "elapsed_time": "0:33:35", "remaining_time": "0:09:18", "throughput": "14485.86", "total_tokens": 29193920}
{"current_steps": 786, "total_steps": 1000, "loss": 12.465, "learning_rate": 0.0, "epoch": 9.779160186625194, "percentage": 78.6, "cur_time": "2024-08-22 06:09:19", "elapsed_time": "0:33:48", "remaining_time": "0:09:12", "throughput": "14443.90", "total_tokens": 29304656}
{"current_steps": 789, "total_steps": 1000, "loss": 6.525, "learning_rate": 0.0, "epoch": 9.816485225505444, "percentage": 78.9, "cur_time": "2024-08-22 06:09:33", "elapsed_time": "0:34:02", "remaining_time": "0:09:06", "throughput": "14407.23", "total_tokens": 29422464}
{"current_steps": 792, "total_steps": 1000, "loss": 7.5254, "learning_rate": 0.0, "epoch": 9.853810264385691, "percentage": 79.2, "cur_time": "2024-08-22 06:09:44", "elapsed_time": "0:34:13", "remaining_time": "0:08:59", "throughput": "14377.09", "total_tokens": 29529488}
{"current_steps": 795, "total_steps": 1000, "loss": 22.6713, "learning_rate": 0.0, "epoch": 9.89113530326594, "percentage": 79.5, "cur_time": "2024-08-22 06:09:58", "elapsed_time": "0:34:27", "remaining_time": "0:08:53", "throughput": "14338.88", "total_tokens": 29645280}
{"current_steps": 798, "total_steps": 1000, "loss": 17.2835, "learning_rate": 0.0, "epoch": 9.92846034214619, "percentage": 79.8, "cur_time": "2024-08-22 06:10:13", "elapsed_time": "0:34:42", "remaining_time": "0:08:47", "throughput": "14291.06", "total_tokens": 29754944}
{"current_steps": 801, "total_steps": 1000, "loss": 10.2831, "learning_rate": 0.0, "epoch": 9.96578538102644, "percentage": 80.1, "cur_time": "2024-08-22 06:10:24", "elapsed_time": "0:34:53", "remaining_time": "0:08:39", "throughput": "14265.73", "total_tokens": 29858736}
{"current_steps": 804, "total_steps": 1000, "loss": 9.022, "learning_rate": 0.0, "epoch": 10.003110419906687, "percentage": 80.4, "cur_time": "2024-08-22 06:10:37", "elapsed_time": "0:35:06", "remaining_time": "0:08:33", "throughput": "14231.83", "total_tokens": 29973344}
{"current_steps": 807, "total_steps": 1000, "loss": 17.1575, "learning_rate": 0.0, "epoch": 10.040435458786936, "percentage": 80.7, "cur_time": "2024-08-22 06:10:50", "elapsed_time": "0:35:19", "remaining_time": "0:08:26", "throughput": "14195.36", "total_tokens": 30092320}
{"current_steps": 810, "total_steps": 1000, "loss": 14.3203, "learning_rate": 0.0, "epoch": 10.077760497667185, "percentage": 81.0, "cur_time": "2024-08-22 06:11:02", "elapsed_time": "0:35:31", "remaining_time": "0:08:19", "throughput": "14169.29", "total_tokens": 30196208}
{"current_steps": 813, "total_steps": 1000, "loss": 9.3238, "learning_rate": 0.0, "epoch": 10.115085536547435, "percentage": 81.3, "cur_time": "2024-08-22 06:11:13", "elapsed_time": "0:35:42", "remaining_time": "0:08:12", "throughput": "14143.71", "total_tokens": 30300992}
{"current_steps": 816, "total_steps": 1000, "loss": 10.5851, "learning_rate": 0.0, "epoch": 10.152410575427682, "percentage": 81.6, "cur_time": "2024-08-22 06:11:26", "elapsed_time": "0:35:55", "remaining_time": "0:08:05", "throughput": "14107.43", "total_tokens": 30403824}
{"current_steps": 819, "total_steps": 1000, "loss": 9.3195, "learning_rate": 0.0, "epoch": 10.189735614307931, "percentage": 81.9, "cur_time": "2024-08-22 06:11:39", "elapsed_time": "0:36:08", "remaining_time": "0:07:59", "throughput": "14073.94", "total_tokens": 30515120}
{"current_steps": 822, "total_steps": 1000, "loss": 7.8101, "learning_rate": 0.0, "epoch": 10.22706065318818, "percentage": 82.2, "cur_time": "2024-08-22 06:11:52", "elapsed_time": "0:36:21", "remaining_time": "0:07:52", "throughput": "14039.31", "total_tokens": 30627680}
{"current_steps": 825, "total_steps": 1000, "loss": 11.4029, "learning_rate": 0.0, "epoch": 10.26438569206843, "percentage": 82.5, "cur_time": "2024-08-22 06:12:06", "elapsed_time": "0:36:35", "remaining_time": "0:07:45", "throughput": "14004.91", "total_tokens": 30742624}
{"current_steps": 828, "total_steps": 1000, "loss": 15.2435, "learning_rate": 0.0, "epoch": 10.301710730948678, "percentage": 82.8, "cur_time": "2024-08-22 06:12:20", "elapsed_time": "0:36:49", "remaining_time": "0:07:38", "throughput": "13966.50", "total_tokens": 30856448}
{"current_steps": 831, "total_steps": 1000, "loss": 10.4408, "learning_rate": 0.0, "epoch": 10.339035769828927, "percentage": 83.1, "cur_time": "2024-08-22 06:12:33", "elapsed_time": "0:37:02", "remaining_time": "0:07:31", "throughput": "13937.62", "total_tokens": 30973232}
{"current_steps": 834, "total_steps": 1000, "loss": 24.5362, "learning_rate": 0.0, "epoch": 10.376360808709176, "percentage": 83.4, "cur_time": "2024-08-22 06:12:45", "elapsed_time": "0:37:14", "remaining_time": "0:07:24", "throughput": "13910.19", "total_tokens": 31078448}
{"current_steps": 837, "total_steps": 1000, "loss": 15.8166, "learning_rate": 0.0, "epoch": 10.413685847589425, "percentage": 83.7, "cur_time": "2024-08-22 06:12:57", "elapsed_time": "0:37:26", "remaining_time": "0:07:17", "throughput": "13881.89", "total_tokens": 31187696}
{"current_steps": 840, "total_steps": 1000, "loss": 12.3537, "learning_rate": 0.0, "epoch": 10.451010886469673, "percentage": 84.0, "cur_time": "2024-08-22 06:13:11", "elapsed_time": "0:37:40", "remaining_time": "0:07:10", "throughput": "13849.09", "total_tokens": 31302464}
{"current_steps": 843, "total_steps": 1000, "loss": 8.8279, "learning_rate": 0.0, "epoch": 10.488335925349922, "percentage": 84.3, "cur_time": "2024-08-22 06:13:23", "elapsed_time": "0:37:52", "remaining_time": "0:07:03", "throughput": "13821.13", "total_tokens": 31413136}
{"current_steps": 846, "total_steps": 1000, "loss": 16.2256, "learning_rate": 0.0, "epoch": 10.525660964230172, "percentage": 84.6, "cur_time": "2024-08-22 06:13:36", "elapsed_time": "0:38:05", "remaining_time": "0:06:55", "throughput": "13796.47", "total_tokens": 31527120}
{"current_steps": 849, "total_steps": 1000, "loss": 8.6758, "learning_rate": 0.0, "epoch": 10.56298600311042, "percentage": 84.9, "cur_time": "2024-08-22 06:13:48", "elapsed_time": "0:38:17", "remaining_time": "0:06:48", "throughput": "13769.97", "total_tokens": 31637568}
{"current_steps": 852, "total_steps": 1000, "loss": 7.5267, "learning_rate": 0.0, "epoch": 10.600311041990668, "percentage": 85.2, "cur_time": "2024-08-22 06:14:02", "elapsed_time": "0:38:31", "remaining_time": "0:06:41", "throughput": "13731.06", "total_tokens": 31739616}
{"current_steps": 855, "total_steps": 1000, "loss": 15.6083, "learning_rate": 0.0, "epoch": 10.637636080870918, "percentage": 85.5, "cur_time": "2024-08-22 06:14:15", "elapsed_time": "0:38:44", "remaining_time": "0:06:34", "throughput": "13700.18", "total_tokens": 31851104}
{"current_steps": 858, "total_steps": 1000, "loss": 13.0695, "learning_rate": 0.0, "epoch": 10.674961119751167, "percentage": 85.8, "cur_time": "2024-08-22 06:14:28", "elapsed_time": "0:38:57", "remaining_time": "0:06:26", "throughput": "13675.46", "total_tokens": 31967616}
{"current_steps": 861, "total_steps": 1000, "loss": 19.6401, "learning_rate": 0.0, "epoch": 10.712286158631414, "percentage": 86.1, "cur_time": "2024-08-22 06:14:41", "elapsed_time": "0:39:10", "remaining_time": "0:06:19", "throughput": "13649.07", "total_tokens": 32082000}
{"current_steps": 864, "total_steps": 1000, "loss": 8.2957, "learning_rate": 0.0, "epoch": 10.749611197511664, "percentage": 86.4, "cur_time": "2024-08-22 06:14:53", "elapsed_time": "0:39:22", "remaining_time": "0:06:11", "throughput": "13624.52", "total_tokens": 32193776}
{"current_steps": 867, "total_steps": 1000, "loss": 11.6462, "learning_rate": 0.0, "epoch": 10.786936236391913, "percentage": 86.7, "cur_time": "2024-08-22 06:15:07", "elapsed_time": "0:39:36", "remaining_time": "0:06:04", "throughput": "13591.58", "total_tokens": 32304176}
{"current_steps": 870, "total_steps": 1000, "loss": 14.5142, "learning_rate": 0.0, "epoch": 10.824261275272162, "percentage": 87.0, "cur_time": "2024-08-22 06:15:20", "elapsed_time": "0:39:49", "remaining_time": "0:05:57", "throughput": "13566.74", "total_tokens": 32415904}
{"current_steps": 873, "total_steps": 1000, "loss": 9.4163, "learning_rate": 0.0, "epoch": 10.86158631415241, "percentage": 87.3, "cur_time": "2024-08-22 06:15:34", "elapsed_time": "0:40:03", "remaining_time": "0:05:49", "throughput": "13537.13", "total_tokens": 32536800}
{"current_steps": 876, "total_steps": 1000, "loss": 13.5533, "learning_rate": 0.0, "epoch": 10.89891135303266, "percentage": 87.6, "cur_time": "2024-08-22 06:15:48", "elapsed_time": "0:40:17", "remaining_time": "0:05:42", "throughput": "13505.80", "total_tokens": 32646368}
{"current_steps": 879, "total_steps": 1000, "loss": 9.3812, "learning_rate": 0.0, "epoch": 10.936236391912908, "percentage": 87.9, "cur_time": "2024-08-22 06:16:01", "elapsed_time": "0:40:30", "remaining_time": "0:05:34", "throughput": "13479.47", "total_tokens": 32758496}
{"current_steps": 882, "total_steps": 1000, "loss": 8.6979, "learning_rate": 0.0, "epoch": 10.973561430793158, "percentage": 88.2, "cur_time": "2024-08-22 06:16:13", "elapsed_time": "0:40:42", "remaining_time": "0:05:26", "throughput": "13458.11", "total_tokens": 32870528}
{"current_steps": 885, "total_steps": 1000, "loss": 7.2537, "learning_rate": 0.0, "epoch": 11.010886469673405, "percentage": 88.5, "cur_time": "2024-08-22 06:16:25", "elapsed_time": "0:40:54", "remaining_time": "0:05:18", "throughput": "13435.50", "total_tokens": 32981664}
{"current_steps": 888, "total_steps": 1000, "loss": 19.0666, "learning_rate": 0.0, "epoch": 11.048211508553655, "percentage": 88.8, "cur_time": "2024-08-22 06:16:38", "elapsed_time": "0:41:07", "remaining_time": "0:05:11", "throughput": "13412.31", "total_tokens": 33099424}
{"current_steps": 891, "total_steps": 1000, "loss": 8.0133, "learning_rate": 0.0, "epoch": 11.085536547433904, "percentage": 89.1, "cur_time": "2024-08-22 06:16:51", "elapsed_time": "0:41:21", "remaining_time": "0:05:03", "throughput": "13385.29", "total_tokens": 33209120}
{"current_steps": 894, "total_steps": 1000, "loss": 8.6026, "learning_rate": 0.0, "epoch": 11.122861586314153, "percentage": 89.4, "cur_time": "2024-08-22 06:17:04", "elapsed_time": "0:41:33", "remaining_time": "0:04:55", "throughput": "13362.89", "total_tokens": 33316720}
{"current_steps": 897, "total_steps": 1000, "loss": 10.6685, "learning_rate": 0.0, "epoch": 11.1601866251944, "percentage": 89.7, "cur_time": "2024-08-22 06:17:16", "elapsed_time": "0:41:45", "remaining_time": "0:04:47", "throughput": "13341.47", "total_tokens": 33429712}
{"current_steps": 900, "total_steps": 1000, "loss": 8.8793, "learning_rate": 0.0, "epoch": 11.19751166407465, "percentage": 90.0, "cur_time": "2024-08-22 06:17:30", "elapsed_time": "0:41:59", "remaining_time": "0:04:39", "throughput": "13309.80", "total_tokens": 33533616}
{"current_steps": 903, "total_steps": 1000, "loss": 8.1571, "learning_rate": 0.0, "epoch": 11.2348367029549, "percentage": 90.3, "cur_time": "2024-08-22 06:17:42", "elapsed_time": "0:42:11", "remaining_time": "0:04:31", "throughput": "13290.15", "total_tokens": 33645904}
{"current_steps": 906, "total_steps": 1000, "loss": 10.4339, "learning_rate": 0.0, "epoch": 11.272161741835149, "percentage": 90.6, "cur_time": "2024-08-22 06:17:54", "elapsed_time": "0:42:23", "remaining_time": "0:04:23", "throughput": "13268.78", "total_tokens": 33753728}
{"current_steps": 909, "total_steps": 1000, "loss": 17.3575, "learning_rate": 0.0, "epoch": 11.309486780715396, "percentage": 90.9, "cur_time": "2024-08-22 06:18:06", "elapsed_time": "0:42:35", "remaining_time": "0:04:15", "throughput": "13247.01", "total_tokens": 33858560}
{"current_steps": 912, "total_steps": 1000, "loss": 17.4937, "learning_rate": 0.0, "epoch": 11.346811819595645, "percentage": 91.2, "cur_time": "2024-08-22 06:18:21", "elapsed_time": "0:42:50", "remaining_time": "0:04:08", "throughput": "13219.19", "total_tokens": 33982384}
{"current_steps": 915, "total_steps": 1000, "loss": 17.59, "learning_rate": 0.0, "epoch": 11.384136858475895, "percentage": 91.5, "cur_time": "2024-08-22 06:18:32", "elapsed_time": "0:43:01", "remaining_time": "0:03:59", "throughput": "13203.65", "total_tokens": 34088640}
{"current_steps": 918, "total_steps": 1000, "loss": 7.6542, "learning_rate": 0.0, "epoch": 11.421461897356144, "percentage": 91.8, "cur_time": "2024-08-22 06:18:45", "elapsed_time": "0:43:14", "remaining_time": "0:03:51", "throughput": "13180.54", "total_tokens": 34197808}
{"current_steps": 921, "total_steps": 1000, "loss": 8.1183, "learning_rate": 0.0, "epoch": 11.458786936236391, "percentage": 92.1, "cur_time": "2024-08-22 06:18:58", "elapsed_time": "0:43:27", "remaining_time": "0:03:43", "throughput": "13157.96", "total_tokens": 34308832}
{"current_steps": 924, "total_steps": 1000, "loss": 8.557, "learning_rate": 0.0, "epoch": 11.49611197511664, "percentage": 92.4, "cur_time": "2024-08-22 06:19:13", "elapsed_time": "0:43:42", "remaining_time": "0:03:35", "throughput": "13127.10", "total_tokens": 34425072}
{"current_steps": 927, "total_steps": 1000, "loss": 12.6485, "learning_rate": 0.0, "epoch": 11.53343701399689, "percentage": 92.7, "cur_time": "2024-08-22 06:19:26", "elapsed_time": "0:43:55", "remaining_time": "0:03:27", "throughput": "13103.31", "total_tokens": 34539056}
{"current_steps": 930, "total_steps": 1000, "loss": 11.9814, "learning_rate": 0.0, "epoch": 11.57076205287714, "percentage": 93.0, "cur_time": "2024-08-22 06:19:41", "elapsed_time": "0:44:10", "remaining_time": "0:03:19", "throughput": "13075.65", "total_tokens": 34655216}
{"current_steps": 933, "total_steps": 1000, "loss": 15.5236, "learning_rate": 0.0, "epoch": 11.608087091757387, "percentage": 93.3, "cur_time": "2024-08-22 06:19:55", "elapsed_time": "0:44:24", "remaining_time": "0:03:11", "throughput": "13050.35", "total_tokens": 34773264}
{"current_steps": 936, "total_steps": 1000, "loss": 13.5778, "learning_rate": 0.0, "epoch": 11.645412130637636, "percentage": 93.6, "cur_time": "2024-08-22 06:20:08", "elapsed_time": "0:44:37", "remaining_time": "0:03:03", "throughput": "13027.35", "total_tokens": 34874688}
{"current_steps": 939, "total_steps": 1000, "loss": 13.0034, "learning_rate": 0.0, "epoch": 11.682737169517885, "percentage": 93.9, "cur_time": "2024-08-22 06:20:23", "elapsed_time": "0:44:52", "remaining_time": "0:02:54", "throughput": "12995.92", "total_tokens": 34990672}
{"current_steps": 942, "total_steps": 1000, "loss": 14.2124, "learning_rate": 0.0, "epoch": 11.720062208398133, "percentage": 94.2, "cur_time": "2024-08-22 06:20:37", "elapsed_time": "0:45:06", "remaining_time": "0:02:46", "throughput": "12973.79", "total_tokens": 35108560}
{"current_steps": 945, "total_steps": 1000, "loss": 7.7588, "learning_rate": 0.0, "epoch": 11.757387247278382, "percentage": 94.5, "cur_time": "2024-08-22 06:20:51", "elapsed_time": "0:45:20", "remaining_time": "0:02:38", "throughput": "12944.78", "total_tokens": 35222432}
{"current_steps": 948, "total_steps": 1000, "loss": 10.0868, "learning_rate": 0.0, "epoch": 11.794712286158632, "percentage": 94.8, "cur_time": "2024-08-22 06:21:04", "elapsed_time": "0:45:33", "remaining_time": "0:02:29", "throughput": "12919.64", "total_tokens": 35319344}
{"current_steps": 951, "total_steps": 1000, "loss": 15.1566, "learning_rate": 0.0, "epoch": 11.83203732503888, "percentage": 95.1, "cur_time": "2024-08-22 06:21:18", "elapsed_time": "0:45:47", "remaining_time": "0:02:21", "throughput": "12894.88", "total_tokens": 35429904}
{"current_steps": 954, "total_steps": 1000, "loss": 17.3247, "learning_rate": 0.0, "epoch": 11.869362363919128, "percentage": 95.4, "cur_time": "2024-08-22 06:21:32", "elapsed_time": "0:46:01", "remaining_time": "0:02:13", "throughput": "12869.74", "total_tokens": 35538176}
{"current_steps": 957, "total_steps": 1000, "loss": 10.6989, "learning_rate": 0.0, "epoch": 11.906687402799378, "percentage": 95.7, "cur_time": "2024-08-22 06:21:45", "elapsed_time": "0:46:14", "remaining_time": "0:02:04", "throughput": "12849.88", "total_tokens": 35650240}
{"current_steps": 960, "total_steps": 1000, "loss": 8.0925, "learning_rate": 0.0, "epoch": 11.944012441679627, "percentage": 96.0, "cur_time": "2024-08-22 06:21:59", "elapsed_time": "0:46:28", "remaining_time": "0:01:56", "throughput": "12827.26", "total_tokens": 35764208}
{"current_steps": 963, "total_steps": 1000, "loss": 8.5649, "learning_rate": 0.0, "epoch": 11.981337480559876, "percentage": 96.3, "cur_time": "2024-08-22 06:22:14", "elapsed_time": "0:46:43", "remaining_time": "0:01:47", "throughput": "12801.78", "total_tokens": 35884432}
{"current_steps": 966, "total_steps": 1000, "loss": 11.7754, "learning_rate": 0.0, "epoch": 12.018662519440124, "percentage": 96.6, "cur_time": "2024-08-22 06:22:28", "elapsed_time": "0:46:57", "remaining_time": "0:01:39", "throughput": "12779.29", "total_tokens": 36000592}
{"current_steps": 969, "total_steps": 1000, "loss": 10.2937, "learning_rate": 0.0, "epoch": 12.055987558320373, "percentage": 96.9, "cur_time": "2024-08-22 06:22:41", "elapsed_time": "0:47:10", "remaining_time": "0:01:30", "throughput": "12759.67", "total_tokens": 36116816}
{"current_steps": 972, "total_steps": 1000, "loss": 23.8059, "learning_rate": 0.0, "epoch": 12.093312597200622, "percentage": 97.2, "cur_time": "2024-08-22 06:22:54", "elapsed_time": "0:47:23", "remaining_time": "0:01:21", "throughput": "12740.74", "total_tokens": 36224528}
{"current_steps": 975, "total_steps": 1000, "loss": 8.6336, "learning_rate": 0.0, "epoch": 12.130637636080872, "percentage": 97.5, "cur_time": "2024-08-22 06:23:08", "elapsed_time": "0:47:37", "remaining_time": "0:01:13", "throughput": "12717.19", "total_tokens": 36342320}
{"current_steps": 978, "total_steps": 1000, "loss": 12.6936, "learning_rate": 0.0, "epoch": 12.16796267496112, "percentage": 97.8, "cur_time": "2024-08-22 06:23:21", "elapsed_time": "0:47:50", "remaining_time": "0:01:04", "throughput": "12697.40", "total_tokens": 36450064}
{"current_steps": 981, "total_steps": 1000, "loss": 5.388, "learning_rate": 0.0, "epoch": 12.205287713841368, "percentage": 98.1, "cur_time": "2024-08-22 06:23:34", "elapsed_time": "0:48:03", "remaining_time": "0:00:55", "throughput": "12680.89", "total_tokens": 36567520}
{"current_steps": 984, "total_steps": 1000, "loss": 12.5749, "learning_rate": 0.0, "epoch": 12.242612752721618, "percentage": 98.4, "cur_time": "2024-08-22 06:23:49", "elapsed_time": "0:48:18", "remaining_time": "0:00:47", "throughput": "12654.47", "total_tokens": 36681280}
{"current_steps": 987, "total_steps": 1000, "loss": 10.7256, "learning_rate": 0.0, "epoch": 12.279937791601867, "percentage": 98.7, "cur_time": "2024-08-22 06:24:02", "elapsed_time": "0:48:31", "remaining_time": "0:00:38", "throughput": "12635.07", "total_tokens": 36781744}
{"current_steps": 990, "total_steps": 1000, "loss": 14.4794, "learning_rate": 0.0, "epoch": 12.317262830482115, "percentage": 99.0, "cur_time": "2024-08-22 06:24:15", "elapsed_time": "0:48:44", "remaining_time": "0:00:29", "throughput": "12615.19", "total_tokens": 36887776}
{"current_steps": 993, "total_steps": 1000, "loss": 9.934, "learning_rate": 0.0, "epoch": 12.354587869362364, "percentage": 99.3, "cur_time": "2024-08-22 06:24:28", "elapsed_time": "0:48:57", "remaining_time": "0:00:20", "throughput": "12596.36", "total_tokens": 37003184}
{"current_steps": 996, "total_steps": 1000, "loss": 12.3277, "learning_rate": 0.0, "epoch": 12.391912908242613, "percentage": 99.6, "cur_time": "2024-08-22 06:24:40", "elapsed_time": "0:49:09", "remaining_time": "0:00:11", "throughput": "12582.80", "total_tokens": 37118576}
{"current_steps": 999, "total_steps": 1000, "loss": 11.689, "learning_rate": 0.0, "epoch": 12.42923794712286, "percentage": 99.9, "cur_time": "2024-08-22 06:24:52", "elapsed_time": "0:49:21", "remaining_time": "0:00:02", "throughput": "12569.59", "total_tokens": 37228288}
{"current_steps": 1000, "total_steps": 1000, "eval_loss": NaN, "epoch": 12.441679626749611, "percentage": 100.0, "cur_time": "2024-08-22 06:25:08", "elapsed_time": "0:49:37", "remaining_time": "0:00:00", "throughput": "12515.23", "total_tokens": 37262192}
{"current_steps": 1000, "total_steps": 1000, "epoch": 12.441679626749611, "percentage": 100.0, "cur_time": "2024-08-22 06:25:08", "elapsed_time": "0:49:37", "remaining_time": "0:00:00", "throughput": "12514.66", "total_tokens": 37262192}
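The per-step records above use one JSON object per line, so they can be checked mechanically. The sketch below is a minimal, hypothetical helper. The name `summarize_trainer_log` is illustrative and not part of LLaMA-Factory, and only the field names come from the log. It relies on Python's `json` module accepting the non-standard `NaN` token that this log emits for `eval_loss`. It reports the last step, the mean logged loss, whether every logged learning rate is zero, and whether the evaluation loss is finite.

```python
import json
import math


def summarize_trainer_log(lines):
    """Summarize JSON-lines training records like the ones above.

    Hypothetical helper: only the field names ("current_steps", "loss",
    "learning_rate", "eval_loss") are taken from the log itself.
    """
    losses, lrs, eval_losses = [], [], []
    last_step = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # json.loads maps the bare NaN token to float("nan") by default.
        record = json.loads(line)
        last_step = max(last_step, record.get("current_steps", 0))
        if "loss" in record:
            losses.append(record["loss"])
        if "learning_rate" in record:
            lrs.append(record["learning_rate"])
        if "eval_loss" in record:
            eval_losses.append(record["eval_loss"])
    return {
        "last_step": last_step,
        "num_loss_records": len(losses),
        "mean_loss": sum(losses) / len(losses) if losses else math.nan,
        # True only if at least one LR was logged and every logged LR is 0.0
        "all_lr_zero": bool(lrs) and all(lr == 0.0 for lr in lrs),
        "eval_loss_finite": all(math.isfinite(x) for x in eval_losses),
    }


sample = [
    '{"current_steps": 996, "total_steps": 1000, "loss": 12.3277, "learning_rate": 0.0}',
    '{"current_steps": 999, "total_steps": 1000, "loss": 11.689, "learning_rate": 0.0}',
    '{"current_steps": 1000, "total_steps": 1000, "eval_loss": NaN}',
]
print(summarize_trainer_log(sample))
```

The sample reuses the last three records above, trimmed to the relevant fields. For that input the summary reports `all_lr_zero` as `True` and `eval_loss_finite` as `False`.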

File diff suppressed because it is too large

Binary file not shown.

Binary file not shown.

Binary file not shown.

@@ -0,0 +1,430 @@
08/22/2024 05:32:46 - INFO - llamafactory.cli - Initializing distributed tasks at: 127.0.0.1:25273
08/22/2024 05:32:53 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/22/2024 05:32:53 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.float16
08/22/2024 05:32:54 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/22/2024 05:32:54 - INFO - llamafactory.hparams.parser - Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, compute dtype: torch.float16
08/22/2024 05:32:54 - INFO - llamafactory.data.loader - Loading dataset AI-ModelScope/train_1M_CN...
08/22/2024 05:32:54 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/22/2024 05:32:54 - INFO - llamafactory.hparams.parser - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.float16
08/22/2024 05:32:54 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/22/2024 05:32:54 - INFO - llamafactory.hparams.parser - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, compute dtype: torch.float16
08/22/2024 05:32:54 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/22/2024 05:32:54 - INFO - llamafactory.hparams.parser - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, compute dtype: torch.float16
08/22/2024 05:32:54 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/22/2024 05:32:54 - INFO - llamafactory.hparams.parser - Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, compute dtype: torch.float16
08/22/2024 05:32:54 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/22/2024 05:32:54 - INFO - llamafactory.hparams.parser - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.float16
08/22/2024 05:33:49 - INFO - llamafactory.data.loader - Loading dataset AI-ModelScope/train_1M_CN...
08/22/2024 05:33:49 - INFO - llamafactory.data.loader - Loading dataset AI-ModelScope/train_1M_CN...
08/22/2024 05:33:49 - INFO - llamafactory.data.loader - Loading dataset AI-ModelScope/train_1M_CN...
08/22/2024 05:33:49 - INFO - llamafactory.data.loader - Loading dataset AI-ModelScope/train_1M_CN...
08/22/2024 05:33:49 - INFO - llamafactory.data.loader - Loading dataset AI-ModelScope/train_1M_CN...
08/22/2024 05:33:49 - INFO - llamafactory.data.loader - Loading dataset AI-ModelScope/train_1M_CN...
training example:
input_ids:
[1, 518, 25580, 29962, 29871, 31791, 31683, 31999, 30495, 30210, 30333, 31374, 30392, 31191, 31277, 30733, 31505, 30545, 235, 170, 135, 31403, 30267, 30847, 30801, 30413, 31277, 30733, 30214, 31088, 31302, 231, 193, 158, 31273, 31264, 30886, 235, 177, 177, 30267, 13, 30557, 30806, 30392, 30287, 234, 178, 138, 30333, 31374, 30210, 31026, 31584, 29901, 376, 30573, 30743, 233, 145, 165, 235, 177, 171, 30810, 30502, 30888, 31596, 30214, 30346, 30333, 30998, 31302, 231, 193, 158, 30287, 31185, 31025, 30354, 30763, 30503, 31195, 31507, 30214, 30651, 235, 178, 132, 30592, 30810, 30287, 235, 170, 133, 30940, 30267, 29908, 13, 518, 29914, 25580, 29962, 29871, 30810, 30502, 31026, 31584, 31277, 30733, 31505, 30545, 235, 170, 135, 31403, 30267, 2]
inputs:
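<s> [INST] Determine whether the given article follows grammatical rules. If it does not, please suggest revisions.
Below is the opening of an article: "To explore this topic, this article will provide a series of data and examples to demonstrate this point."
[/INST] This opening follows grammatical rules.</s>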
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 29871, 30810, 30502, 31026, 31584, 31277, 30733, 31505, 30545, 235, 170, 135, 31403, 30267, 2]
labels:
This opening follows grammatical rules.</s>
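In the training example above, every prompt position in `label_ids` is `-100`. Only the response tokens (from `29871` through the closing `2`) are kept, so the loss is computed on the answer alone. The block below is a minimal sketch of that masking, assuming the usual `-100` ignore index. The function name `build_supervised_example` is hypothetical and is not LLaMA-Factory's own code. Hugging Face causal-LM models shift labels by one position internally, which is why the labels here stay position-aligned with `input_ids`.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_supervised_example(prompt_ids, response_ids):
    """Concatenate prompt and response and mask the prompt out of the labels.

    Sketch only: real pipelines also handle truncation and padding.
    """
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Toy prompt: BOS, the "[INST]" ids, then the "[/INST]" ids from the example above
prompt = [1, 518, 25580, 29962, 518, 29914, 25580, 29962]
# Truncated response: its first three ids from the example, plus EOS (2)
response = [29871, 30810, 30502, 2]
input_ids, labels = build_supervised_example(prompt, response)
print(labels)  # -> [-100, -100, -100, -100, -100, -100, -100, -100, 29871, 30810, 30502, 2]
```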
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
08/22/2024 05:35:27 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/22/2024 05:35:27 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.misc - Found linear modules: gate_proj,k_proj,v_proj,up_proj,o_proj,down_proj,q_proj
08/22/2024 05:35:27 - INFO - llamafactory.model.loader - trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2958
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
08/22/2024 05:35:27 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/22/2024 05:35:27 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.misc - Found linear modules: q_proj,down_proj,k_proj,o_proj,up_proj,gate_proj,v_proj
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
08/22/2024 05:35:27 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/22/2024 05:35:27 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.misc - Found linear modules: gate_proj,q_proj,o_proj,down_proj,v_proj,k_proj,up_proj
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
08/22/2024 05:35:27 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/22/2024 05:35:27 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/22/2024 05:35:27 - INFO - llamafactory.model.model_utils.misc - Found linear modules: q_proj,o_proj,up_proj,k_proj,gate_proj,down_proj,v_proj
08/22/2024 05:35:28 - INFO - llamafactory.model.loader - trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2958
08/22/2024 05:35:28 - INFO - llamafactory.model.loader - trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2958
08/22/2024 05:35:28 - INFO - llamafactory.model.loader - trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2958
08/22/2024 05:35:28 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/22/2024 05:35:28 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
08/22/2024 05:35:28 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/22/2024 05:35:28 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/22/2024 05:35:28 - INFO - llamafactory.model.model_utils.misc - Found linear modules: v_proj,gate_proj,o_proj,down_proj,q_proj,k_proj,up_proj
08/22/2024 05:35:29 - INFO - llamafactory.model.loader - trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2958
08/22/2024 05:35:30 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/22/2024 05:35:30 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
08/22/2024 05:35:30 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/22/2024 05:35:30 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/22/2024 05:35:30 - INFO - llamafactory.model.model_utils.misc - Found linear modules: gate_proj,k_proj,down_proj,q_proj,o_proj,v_proj,up_proj
08/22/2024 05:35:30 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/22/2024 05:35:30 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
08/22/2024 05:35:30 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/22/2024 05:35:30 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/22/2024 05:35:30 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,gate_proj,down_proj,q_proj,up_proj,k_proj,v_proj
08/22/2024 05:35:30 - INFO - llamafactory.model.loader - trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2958
08/22/2024 05:35:30 - INFO - llamafactory.model.loader - trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2958
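The `trainable params` figure can be reproduced by hand. Each LoRA adapter on a linear layer of shape `in -> out` adds `r * (in + out)` weights: matrix A is `r x in`, matrix B is `out x r`, and there is no bias. The sketch below assumes the public Llama-2-7B dimensions: 32 decoder layers, hidden size 4096, and MLP size 11008. It also assumes rank `r = 8`. The log does not print the rank, so that value is an assumption, chosen because it reproduces 19,988,480 exactly. Subtracting the adapters from `all params` leaves 6,738,415,616, the commonly cited parameter count of Llama-2-7B.

```python
# Llama-2-7B shape assumptions (public model config, not printed in this log)
NUM_LAYERS = 32
HIDDEN = 4096
INTERMEDIATE = 11008
RANK = 8  # assumed LoRA rank; the log does not state it

# (in_features, out_features) for each module listed under "Found linear modules"
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, shapes, num_layers):
    """LoRA adds A (rank x in) and B (out x rank) to every adapted module."""
    per_layer = sum(rank * (fin + fout) for fin, fout in shapes.values())
    return per_layer * num_layers


trainable = lora_params(RANK, MODULE_SHAPES, NUM_LAYERS)
all_params = 6_758_404_096  # "all params" from the log (base model + adapters)
print(f"trainable params: {trainable:,} || all params: {all_params:,} "
      f"|| trainable%: {100 * trainable / all_params:.4f}")
print(f"implied base-model params: {all_params - trainable:,}")
```

The first line printed matches the log: `trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2958`.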
{'loss': 9.2998, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.04, 'num_input_tokens_seen': 118832}
{'loss': 12.0759, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.07, 'num_input_tokens_seen': 221744}
{'loss': 19.6855, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.11, 'num_input_tokens_seen': 330496}
{'loss': 13.653, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.15, 'num_input_tokens_seen': 441856}
{'loss': 11.6583, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.19, 'num_input_tokens_seen': 555712}
{'loss': 6.0712, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.22, 'num_input_tokens_seen': 660672}
{'loss': 14.2505, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.26, 'num_input_tokens_seen': 776560}
{'loss': 8.3573, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.3, 'num_input_tokens_seen': 878160}
{'loss': 9.2897, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.34, 'num_input_tokens_seen': 995808}
{'loss': 11.2486, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.37, 'num_input_tokens_seen': 1106496}
{'loss': 13.309, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.41, 'num_input_tokens_seen': 1210208}
{'loss': 16.2104, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.45, 'num_input_tokens_seen': 1324352}
{'loss': 5.2954, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.49, 'num_input_tokens_seen': 1433792}
{'loss': 10.5451, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.52, 'num_input_tokens_seen': 1539888}
{'loss': 19.6801, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.56, 'num_input_tokens_seen': 1658080}
{'loss': 8.8696, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.6, 'num_input_tokens_seen': 1774576}
{'loss': 23.9679, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.63, 'num_input_tokens_seen': 1888096}
{'loss': 8.9261, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.67, 'num_input_tokens_seen': 2001680}
{'loss': 6.7411, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.71, 'num_input_tokens_seen': 2105744}
{'loss': 9.0461, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.75, 'num_input_tokens_seen': 2215056}
{'loss': 11.6066, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.78, 'num_input_tokens_seen': 2337728}
{'loss': 12.4327, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.82, 'num_input_tokens_seen': 2451904}
{'loss': 8.7956, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.86, 'num_input_tokens_seen': 2558544}
{'loss': 15.0004, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.9, 'num_input_tokens_seen': 2672672}
{'loss': 17.6262, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.93, 'num_input_tokens_seen': 2784704}
{'loss': 14.3673, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.97, 'num_input_tokens_seen': 2895952}
{'loss': 10.5809, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.01, 'num_input_tokens_seen': 3005872}
{'loss': 11.7681, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.05, 'num_input_tokens_seen': 3121712}
{'loss': 10.2302, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.08, 'num_input_tokens_seen': 3228464}
{'loss': 7.6525, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.12, 'num_input_tokens_seen': 3341504}
{'loss': 14.1201, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.16, 'num_input_tokens_seen': 3466688}
{'loss': 11.1454, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.19, 'num_input_tokens_seen': 3573200}
{'loss': 16.2573, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.23, 'num_input_tokens_seen': 3679728}
{'loss': 8.3276, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.27, 'num_input_tokens_seen': 3796640}
{'loss': 8.4286, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.31, 'num_input_tokens_seen': 3902912}
{'loss': 9.6022, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.34, 'num_input_tokens_seen': 4008000}
{'loss': 6.8597, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.38, 'num_input_tokens_seen': 4115280}
{'loss': 12.0886, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.42, 'num_input_tokens_seen': 4229648}
{'loss': 13.8743, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.46, 'num_input_tokens_seen': 4345392}
{'loss': 15.7739, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.49, 'num_input_tokens_seen': 4456880}
{'loss': 7.1596, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.53, 'num_input_tokens_seen': 4574288}
{'loss': 8.1217, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.57, 'num_input_tokens_seen': 4676256}
{'loss': 28.1282, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.6, 'num_input_tokens_seen': 4791120}
{'loss': 6.5505, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.64, 'num_input_tokens_seen': 4902480}
{'loss': 11.5594, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.68, 'num_input_tokens_seen': 5010624}
{'loss': 11.3297, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.72, 'num_input_tokens_seen': 5115552}
{'loss': 13.8845, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.75, 'num_input_tokens_seen': 5230896}
{'loss': 16.5376, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.79, 'num_input_tokens_seen': 5342896}
{'loss': 7.6679, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.83, 'num_input_tokens_seen': 5457904}
{'loss': 15.0238, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.87, 'num_input_tokens_seen': 5573360}
{'loss': 7.4082, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.9, 'num_input_tokens_seen': 5686432}
{'loss': 7.2681, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.94, 'num_input_tokens_seen': 5803248}
{'loss': 17.5133, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 1.98, 'num_input_tokens_seen': 5911424}
{'loss': 21.3751, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.02, 'num_input_tokens_seen': 6030352}
{'loss': 14.4481, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.05, 'num_input_tokens_seen': 6137888}
{'loss': 8.3918, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.09, 'num_input_tokens_seen': 6248704}
{'loss': 21.2405, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.13, 'num_input_tokens_seen': 6354144}
{'loss': 8.2517, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.16, 'num_input_tokens_seen': 6467936}
{'loss': 7.6873, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.2, 'num_input_tokens_seen': 6579504}
{'loss': 12.5826, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.24, 'num_input_tokens_seen': 6691920}
{'loss': 14.0724, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.28, 'num_input_tokens_seen': 6808080}
{'loss': 9.5351, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.31, 'num_input_tokens_seen': 6925936}
{'loss': 11.3624, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.35, 'num_input_tokens_seen': 7032576}
{'loss': 23.8997, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.39, 'num_input_tokens_seen': 7140304}
{'loss': 7.9994, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.43, 'num_input_tokens_seen': 7244320}
{'loss': 12.7368, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.46, 'num_input_tokens_seen': 7351408}
{'loss': 13.655, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.5, 'num_input_tokens_seen': 7463792}
{'loss': 9.2975, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.54, 'num_input_tokens_seen': 7572176}
{'loss': 8.1766, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.58, 'num_input_tokens_seen': 7688464}
{'loss': 12.0825, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.61, 'num_input_tokens_seen': 7799760}
{'loss': 5.9591, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.65, 'num_input_tokens_seen': 7920016}
{'loss': 17.4673, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.69, 'num_input_tokens_seen': 8032896}
{'loss': 6.1845, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.72, 'num_input_tokens_seen': 8143360}
{'loss': 8.8712, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.76, 'num_input_tokens_seen': 8244608}
{'loss': 9.6746, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.8, 'num_input_tokens_seen': 8362592}
{'loss': 15.1383, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.84, 'num_input_tokens_seen': 8480496}
{'loss': 8.4034, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.87, 'num_input_tokens_seen': 8594912}
{'loss': 20.3951, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.91, 'num_input_tokens_seen': 8707728}
{'loss': 6.4257, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.95, 'num_input_tokens_seen': 8815456}
{'loss': 13.9254, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 2.99, 'num_input_tokens_seen': 8932288}
{'loss': 12.1414, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.02, 'num_input_tokens_seen': 9048592}
{'loss': 11.2167, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.06, 'num_input_tokens_seen': 9158576}
{'loss': 16.1835, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.1, 'num_input_tokens_seen': 9264336}
{'loss': 16.5755, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.14, 'num_input_tokens_seen': 9379136}
{'loss': 15.7423, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.17, 'num_input_tokens_seen': 9483568}
{'loss': 11.4227, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.21, 'num_input_tokens_seen': 9596304}
{'loss': 14.9299, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.25, 'num_input_tokens_seen': 9706592}
{'loss': 7.2115, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.28, 'num_input_tokens_seen': 9807376}
{'loss': 16.8579, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.32, 'num_input_tokens_seen': 9918144}
{'loss': 9.2991, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.36, 'num_input_tokens_seen': 10027360}
{'loss': 10.1769, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.4, 'num_input_tokens_seen': 10141792}
{'loss': 27.4338, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.43, 'num_input_tokens_seen': 10254096}
{'loss': 6.8982, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.47, 'num_input_tokens_seen': 10368672}
{'loss': 7.7729, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.51, 'num_input_tokens_seen': 10467568}
{'loss': 13.1635, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.55, 'num_input_tokens_seen': 10590848}
{'loss': 11.636, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.58, 'num_input_tokens_seen': 10702448}
{'loss': 12.8655, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.62, 'num_input_tokens_seen': 10827376}
{'loss': 8.1184, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.66, 'num_input_tokens_seen': 10940560}
{'loss': 12.5074, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.7, 'num_input_tokens_seen': 11045168}
{'loss': 6.7851, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.73, 'num_input_tokens_seen': 11154736}
{'loss': 12.5286, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.77, 'num_input_tokens_seen': 11265440}
{'loss': 9.2964, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.81, 'num_input_tokens_seen': 11377728}
{'loss': 12.8742, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.84, 'num_input_tokens_seen': 11482944}
{'loss': 10.1718, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.88, 'num_input_tokens_seen': 11604304}
{'loss': 9.0186, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.92, 'num_input_tokens_seen': 11717840}
{'loss': 12.1146, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.96, 'num_input_tokens_seen': 11838736}
{'loss': 8.2904, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 3.99, 'num_input_tokens_seen': 11951984}
{'loss': 14.7016, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.03, 'num_input_tokens_seen': 12068128}
{'loss': 12.1103, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.07, 'num_input_tokens_seen': 12168768}
{'loss': 13.9338, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.11, 'num_input_tokens_seen': 12286624}
{'loss': 5.3579, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.14, 'num_input_tokens_seen': 12391296}
{'loss': 9.147, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.18, 'num_input_tokens_seen': 12512656}
{'loss': 12.5259, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.22, 'num_input_tokens_seen': 12617696}
{'loss': 17.4818, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.26, 'num_input_tokens_seen': 12735136}
{'loss': 20.5624, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.29, 'num_input_tokens_seen': 12849008}
{'loss': 13.0519, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.33, 'num_input_tokens_seen': 12962592}
{'loss': 8.2046, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.37, 'num_input_tokens_seen': 13074688}
{'loss': 7.3577, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.4, 'num_input_tokens_seen': 13186240}
{'loss': 7.2161, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.44, 'num_input_tokens_seen': 13290736}
{'loss': 6.7259, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.48, 'num_input_tokens_seen': 13397344}
{'loss': 26.3404, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.52, 'num_input_tokens_seen': 13499776}
{'loss': 12.21, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.55, 'num_input_tokens_seen': 13613312}
{'loss': 11.6263, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.59, 'num_input_tokens_seen': 13724848}
{'loss': 5.3601, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.63, 'num_input_tokens_seen': 13832560}
{'loss': 5.7389, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.67, 'num_input_tokens_seen': 13947488}
{'loss': 10.5194, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.7, 'num_input_tokens_seen': 14068672}
{'loss': 9.6571, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.74, 'num_input_tokens_seen': 14184800}
{'loss': 19.7807, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.78, 'num_input_tokens_seen': 14301888}
{'loss': 11.351, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.81, 'num_input_tokens_seen': 14423984}
{'loss': 7.8255, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.85, 'num_input_tokens_seen': 14532336}
{'loss': 13.489, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.89, 'num_input_tokens_seen': 14634784}
{'loss': 12.5538, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.93, 'num_input_tokens_seen': 14741952}
{'loss': 17.8018, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 4.96, 'num_input_tokens_seen': 14859328}
{'loss': 8.9342, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.0, 'num_input_tokens_seen': 14976672}
{'loss': 15.353, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.04, 'num_input_tokens_seen': 15084176}
{'loss': 6.0025, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.08, 'num_input_tokens_seen': 15199840}
{'loss': 18.3302, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.11, 'num_input_tokens_seen': 15311664}
{'loss': 10.1281, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.15, 'num_input_tokens_seen': 15417552}
{'loss': 19.8608, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.19, 'num_input_tokens_seen': 15533648}
{'loss': 10.9191, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.23, 'num_input_tokens_seen': 15653504}
{'loss': 7.1656, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.26, 'num_input_tokens_seen': 15770336}
{'loss': 8.2608, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.3, 'num_input_tokens_seen': 15882608}
{'loss': 7.1175, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.34, 'num_input_tokens_seen': 16002000}
{'loss': 7.9221, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.37, 'num_input_tokens_seen': 16115504}
{'loss': 7.5912, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.41, 'num_input_tokens_seen': 16223232}
{'loss': 15.3158, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.45, 'num_input_tokens_seen': 16338720}
{'loss': 7.991, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.49, 'num_input_tokens_seen': 16444768}
{'loss': 12.7983, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.52, 'num_input_tokens_seen': 16558672}
{'loss': 14.9909, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.56, 'num_input_tokens_seen': 16665760}
{'loss': 8.7115, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.6, 'num_input_tokens_seen': 16772560}
{'loss': 9.2608, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.64, 'num_input_tokens_seen': 16892544}
{'loss': 12.3677, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.67, 'num_input_tokens_seen': 17004288}
{'loss': 5.9306, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.71, 'num_input_tokens_seen': 17110288}
{'loss': 9.6708, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.75, 'num_input_tokens_seen': 17220896}
{'loss': 8.0083, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.79, 'num_input_tokens_seen': 17340752}
{'loss': 10.2094, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.82, 'num_input_tokens_seen': 17445744}
{'loss': 9.8845, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.86, 'num_input_tokens_seen': 17561664}
{'loss': 15.0664, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.9, 'num_input_tokens_seen': 17672976}
{'loss': 17.8165, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.93, 'num_input_tokens_seen': 17780224}
{'loss': 12.124, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.97, 'num_input_tokens_seen': 17889984}
{'loss': 13.9358, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.01, 'num_input_tokens_seen': 18004224}
{'loss': 8.6936, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.05, 'num_input_tokens_seen': 18123616}
{'loss': 9.3147, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.08, 'num_input_tokens_seen': 18231584}
{'loss': 17.2013, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.12, 'num_input_tokens_seen': 18346080}
{'loss': 15.7279, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.16, 'num_input_tokens_seen': 18461776}
{'loss': 6.5026, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.2, 'num_input_tokens_seen': 18575904}
{'eval_loss': nan, 'eval_runtime': 6.0791, 'eval_samples_per_second': 164.498, 'eval_steps_per_second': 11.844, 'epoch': 6.22, 'num_input_tokens_seen': 18659952}
{'loss': 12.0386, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.23, 'num_input_tokens_seen': 18695344}
{'loss': 8.9403, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.27, 'num_input_tokens_seen': 18796944}
{'loss': 8.8669, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.31, 'num_input_tokens_seen': 18905296}
{'loss': 9.6763, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.35, 'num_input_tokens_seen': 19021664}
{'loss': 15.275, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.38, 'num_input_tokens_seen': 19137824}
{'loss': 36.5206, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.42, 'num_input_tokens_seen': 19250000}
{'loss': 14.6615, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.46, 'num_input_tokens_seen': 19361152}
{'loss': 15.7849, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.49, 'num_input_tokens_seen': 19474288}
{'loss': 7.2328, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.53, 'num_input_tokens_seen': 19587008}
{'loss': 10.6908, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.57, 'num_input_tokens_seen': 19696112}
{'loss': 15.1283, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.61, 'num_input_tokens_seen': 19807648}
{'loss': 20.95, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.64, 'num_input_tokens_seen': 19916992}
{'loss': 14.2251, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.68, 'num_input_tokens_seen': 20022672}
{'loss': 17.42, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.72, 'num_input_tokens_seen': 20138768}
{'loss': 10.8675, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.76, 'num_input_tokens_seen': 20247824}
{'loss': 8.586, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.79, 'num_input_tokens_seen': 20357072}
{'loss': 8.1799, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.83, 'num_input_tokens_seen': 20466928}
{'loss': 15.5518, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.87, 'num_input_tokens_seen': 20571168}
{'loss': 10.5345, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.91, 'num_input_tokens_seen': 20691152}
{'loss': 6.7281, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.94, 'num_input_tokens_seen': 20804944}
{'loss': 9.2801, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 6.98, 'num_input_tokens_seen': 20916128}
{'loss': 5.9323, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.02, 'num_input_tokens_seen': 21031664}
{'loss': 9.9459, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.05, 'num_input_tokens_seen': 21132288}
{'loss': 12.6263, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.09, 'num_input_tokens_seen': 21247024}
{'loss': 11.8039, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.13, 'num_input_tokens_seen': 21363776}
{'loss': 12.5097, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.17, 'num_input_tokens_seen': 21477024}
{'loss': 12.6844, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.2, 'num_input_tokens_seen': 21588896}
{'loss': 13.2153, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.24, 'num_input_tokens_seen': 21705216}
{'loss': 15.364, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.28, 'num_input_tokens_seen': 21820000}
{'loss': 6.5364, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.32, 'num_input_tokens_seen': 21930352}
{'loss': 8.7861, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.35, 'num_input_tokens_seen': 22035424}
{'loss': 9.4291, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.39, 'num_input_tokens_seen': 22154624}
{'loss': 8.9286, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.43, 'num_input_tokens_seen': 22267792}
{'loss': 6.2194, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.47, 'num_input_tokens_seen': 22377184}
{'loss': 12.7506, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.5, 'num_input_tokens_seen': 22496112}
{'loss': 16.7138, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.54, 'num_input_tokens_seen': 22613600}
{'loss': 15.042, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.58, 'num_input_tokens_seen': 22726320}
{'loss': 8.3775, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.61, 'num_input_tokens_seen': 22830224}
{'loss': 11.2836, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.65, 'num_input_tokens_seen': 22944096}
{'loss': 9.3435, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.69, 'num_input_tokens_seen': 23062688}
{'loss': 10.2463, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.73, 'num_input_tokens_seen': 23170864}
{'loss': 9.0613, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.76, 'num_input_tokens_seen': 23276464}
{'loss': 8.1291, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.8, 'num_input_tokens_seen': 23392416}
{'loss': 7.621, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.84, 'num_input_tokens_seen': 23496272}
{'loss': 13.155, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.88, 'num_input_tokens_seen': 23612064}
{'loss': 9.4481, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.91, 'num_input_tokens_seen': 23720240}
{'loss': 14.2044, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.95, 'num_input_tokens_seen': 23828768}
{'loss': 12.8233, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 7.99, 'num_input_tokens_seen': 23942672}
{'loss': 21.3516, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.02, 'num_input_tokens_seen': 24051120}
{'loss': 10.1635, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.06, 'num_input_tokens_seen': 24158848}
{'loss': 15.1394, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.1, 'num_input_tokens_seen': 24273760}
{'loss': 11.1823, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.14, 'num_input_tokens_seen': 24389040}
{'loss': 12.2783, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.17, 'num_input_tokens_seen': 24497872}
{'loss': 16.2704, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.21, 'num_input_tokens_seen': 24604576}
{'loss': 7.936, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.25, 'num_input_tokens_seen': 24719728}
{'loss': 9.9501, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.29, 'num_input_tokens_seen': 24839888}
{'loss': 12.1488, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.32, 'num_input_tokens_seen': 24955312}
{'loss': 17.915, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.36, 'num_input_tokens_seen': 25075152}
{'loss': 14.0293, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.4, 'num_input_tokens_seen': 25188944}
{'loss': 14.9981, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.44, 'num_input_tokens_seen': 25297920}
{'loss': 12.4305, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.47, 'num_input_tokens_seen': 25405760}
{'loss': 14.0933, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.51, 'num_input_tokens_seen': 25527152}
{'loss': 5.9321, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.55, 'num_input_tokens_seen': 25643344}
{'loss': 12.6201, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.58, 'num_input_tokens_seen': 25758096}
{'loss': 14.1967, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.62, 'num_input_tokens_seen': 25864848}
{'loss': 8.8063, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.66, 'num_input_tokens_seen': 25971872}
{'loss': 12.9953, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.7, 'num_input_tokens_seen': 26080384}
{'loss': 16.3832, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.73, 'num_input_tokens_seen': 26187648}
{'loss': 8.5016, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.77, 'num_input_tokens_seen': 26298352}
{'loss': 12.978, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.81, 'num_input_tokens_seen': 26416656}
{'loss': 10.8072, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.85, 'num_input_tokens_seen': 26521456}
{'loss': 11.6907, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.88, 'num_input_tokens_seen': 26630544}
{'loss': 14.1724, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.92, 'num_input_tokens_seen': 26741824}
{'loss': 11.2749, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 8.96, 'num_input_tokens_seen': 26850240}
{'loss': 10.3783, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.0, 'num_input_tokens_seen': 26959600}
{'loss': 10.993, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.03, 'num_input_tokens_seen': 27077136}
{'loss': 15.6579, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.07, 'num_input_tokens_seen': 27183328}
{'loss': 12.3566, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.11, 'num_input_tokens_seen': 27298640}
{'loss': 9.5611, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.14, 'num_input_tokens_seen': 27420240}
{'loss': 10.838, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.18, 'num_input_tokens_seen': 27531440}
{'loss': 17.1524, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.22, 'num_input_tokens_seen': 27642768}
{'loss': 21.5314, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.26, 'num_input_tokens_seen': 27765712}
{'loss': 6.8498, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.29, 'num_input_tokens_seen': 27884240}
{'loss': 7.8165, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.33, 'num_input_tokens_seen': 27993664}
{'loss': 25.8428, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.37, 'num_input_tokens_seen': 28102224}
{'loss': 12.2624, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.41, 'num_input_tokens_seen': 28219120}
{'loss': 4.8092, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.44, 'num_input_tokens_seen': 28330768}
{'loss': 20.5568, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.48, 'num_input_tokens_seen': 28444352}
{'loss': 10.6764, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.52, 'num_input_tokens_seen': 28553904}
{'loss': 13.5784, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.56, 'num_input_tokens_seen': 28662592}
{'loss': 11.8902, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.59, 'num_input_tokens_seen': 28769040}
{'loss': 11.5366, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.63, 'num_input_tokens_seen': 28869760}
{'loss': 11.4389, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.67, 'num_input_tokens_seen': 28985152}
{'loss': 8.7927, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.7, 'num_input_tokens_seen': 29085568}
{'loss': 11.7756, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.74, 'num_input_tokens_seen': 29193920}
{'loss': 12.465, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.78, 'num_input_tokens_seen': 29304656}
{'loss': 6.525, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.82, 'num_input_tokens_seen': 29422464}
{'loss': 7.5254, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.85, 'num_input_tokens_seen': 29529488}
{'loss': 22.6713, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.89, 'num_input_tokens_seen': 29645280}
{'loss': 17.2835, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.93, 'num_input_tokens_seen': 29754944}
{'loss': 10.2831, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 9.97, 'num_input_tokens_seen': 29858736}
{'loss': 9.022, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.0, 'num_input_tokens_seen': 29973344}
{'loss': 17.1575, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.04, 'num_input_tokens_seen': 30092320}
{'loss': 14.3203, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.08, 'num_input_tokens_seen': 30196208}
{'loss': 9.3238, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.12, 'num_input_tokens_seen': 30300992}
{'loss': 10.5851, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.15, 'num_input_tokens_seen': 30403824}
{'loss': 9.3195, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.19, 'num_input_tokens_seen': 30515120}
{'loss': 7.8101, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.23, 'num_input_tokens_seen': 30627680}
{'loss': 11.4029, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.26, 'num_input_tokens_seen': 30742624}
{'loss': 15.2435, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.3, 'num_input_tokens_seen': 30856448}
{'loss': 10.4408, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.34, 'num_input_tokens_seen': 30973232}
{'loss': 24.5362, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.38, 'num_input_tokens_seen': 31078448}
{'loss': 15.8166, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.41, 'num_input_tokens_seen': 31187696}
{'loss': 12.3537, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.45, 'num_input_tokens_seen': 31302464}
{'loss': 8.8279, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.49, 'num_input_tokens_seen': 31413136}
{'loss': 16.2256, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.53, 'num_input_tokens_seen': 31527120}
{'loss': 8.6758, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.56, 'num_input_tokens_seen': 31637568}
{'loss': 7.5267, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.6, 'num_input_tokens_seen': 31739616}
{'loss': 15.6083, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.64, 'num_input_tokens_seen': 31851104}
{'loss': 13.0695, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.67, 'num_input_tokens_seen': 31967616}
{'loss': 19.6401, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.71, 'num_input_tokens_seen': 32082000}
{'loss': 8.2957, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.75, 'num_input_tokens_seen': 32193776}
{'loss': 11.6462, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.79, 'num_input_tokens_seen': 32304176}
{'loss': 14.5142, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.82, 'num_input_tokens_seen': 32415904}
{'loss': 9.4163, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.86, 'num_input_tokens_seen': 32536800}
{'loss': 13.5533, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.9, 'num_input_tokens_seen': 32646368}
{'loss': 9.3812, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.94, 'num_input_tokens_seen': 32758496}
{'loss': 8.6979, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 10.97, 'num_input_tokens_seen': 32870528}
{'loss': 7.2537, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.01, 'num_input_tokens_seen': 32981664}
{'loss': 19.0666, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.05, 'num_input_tokens_seen': 33099424}
{'loss': 8.0133, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.09, 'num_input_tokens_seen': 33209120}
{'loss': 8.6026, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.12, 'num_input_tokens_seen': 33316720}
{'loss': 10.6685, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.16, 'num_input_tokens_seen': 33429712}
{'loss': 8.8793, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.2, 'num_input_tokens_seen': 33533616}
{'loss': 8.1571, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.23, 'num_input_tokens_seen': 33645904}
{'loss': 10.4339, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.27, 'num_input_tokens_seen': 33753728}
{'loss': 17.3575, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.31, 'num_input_tokens_seen': 33858560}
{'loss': 17.4937, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.35, 'num_input_tokens_seen': 33982384}
{'loss': 17.59, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.38, 'num_input_tokens_seen': 34088640}
{'loss': 7.6542, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.42, 'num_input_tokens_seen': 34197808}
{'loss': 8.1183, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.46, 'num_input_tokens_seen': 34308832}
{'loss': 8.557, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.5, 'num_input_tokens_seen': 34425072}
{'loss': 12.6485, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.53, 'num_input_tokens_seen': 34539056}
{'loss': 11.9814, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.57, 'num_input_tokens_seen': 34655216}
{'loss': 15.5236, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.61, 'num_input_tokens_seen': 34773264}
{'loss': 13.5778, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.65, 'num_input_tokens_seen': 34874688}
{'loss': 13.0034, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.68, 'num_input_tokens_seen': 34990672}
{'loss': 14.2124, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.72, 'num_input_tokens_seen': 35108560}
{'loss': 7.7588, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.76, 'num_input_tokens_seen': 35222432}
{'loss': 10.0868, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.79, 'num_input_tokens_seen': 35319344}
{'loss': 15.1566, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.83, 'num_input_tokens_seen': 35429904}
{'loss': 17.3247, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.87, 'num_input_tokens_seen': 35538176}
{'loss': 10.6989, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.91, 'num_input_tokens_seen': 35650240}
{'loss': 8.0925, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.94, 'num_input_tokens_seen': 35764208}
{'loss': 8.5649, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 11.98, 'num_input_tokens_seen': 35884432}
{'loss': 11.7754, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.02, 'num_input_tokens_seen': 36000592}
{'loss': 10.2937, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.06, 'num_input_tokens_seen': 36116816}
{'loss': 23.8059, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.09, 'num_input_tokens_seen': 36224528}
{'loss': 8.6336, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.13, 'num_input_tokens_seen': 36342320}
{'loss': 12.6936, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.17, 'num_input_tokens_seen': 36450064}
{'loss': 5.388, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.21, 'num_input_tokens_seen': 36567520}
{'loss': 12.5749, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.24, 'num_input_tokens_seen': 36681280}
{'loss': 10.7256, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.28, 'num_input_tokens_seen': 36781744}
{'loss': 14.4794, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.32, 'num_input_tokens_seen': 36887776}
{'loss': 9.934, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.35, 'num_input_tokens_seen': 37003184}
{'loss': 12.3277, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.39, 'num_input_tokens_seen': 37118576}
{'loss': 11.689, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 12.43, 'num_input_tokens_seen': 37228288}
{'eval_loss': nan, 'eval_runtime': 11.5479, 'eval_samples_per_second': 86.596, 'eval_steps_per_second': 6.235, 'epoch': 12.44, 'num_input_tokens_seen': 37262192}
{'train_runtime': 2977.5083, 'train_samples_per_second': 37.615, 'train_steps_per_second': 0.336, 'train_tokens_per_second': 2751.294, 'train_loss': 12.00993843150139, 'epoch': 12.44, 'num_input_tokens_seen': 37262192}
***** train metrics *****
  epoch                    =      12.4417
  num_input_tokens_seen    =     37262192
  total_flos               = 1379934625GF
  train_loss               =      12.0099
  train_runtime            =   0:49:37.50
  train_samples_per_second =       37.615
  train_steps_per_second   =        0.336
  train_tokens_per_second  =     2751.294
Figure saved at: ./results/lora_sft_2/test/test1/training_loss.png
Figure saved at: ./results/lora_sft_2/test/test1/training_eval_loss.png
08/22/2024 06:25:08 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
***** eval metrics *****
  epoch                   =    12.4417
  eval_loss               =        nan
  eval_runtime            = 0:00:11.49
  eval_samples_per_second =     87.008
  eval_steps_per_second   =      6.265
  num_input_tokens_seen   =   37262192