train: qwen

commit 5a306611bc
parent e99fab5e4c

@@ -0,0 +1,66 @@
---
base_model: ../../../models/qwen
library_name: peft
license: other
tags:
- llama-factory
- lora
- generated_from_trainer
model-index:
- name: lora_sft_Qwen-7B_8_gpu_500_step_20240919142813
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# lora_sft_Qwen-7B_8_gpu_500_step_20240919142813

This model is a fine-tuned version of [../../../models/qwen](https://huggingface.co/../../../models/qwen) on the belle_1m dataset.
It achieves the following results on the evaluation set:
- Loss: 1.2722
- Num Input Tokens Seen: 1321104

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 500
- mixed_precision_training: Native AMP
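
For readers reproducing the run outside LLaMA-Factory, the list above corresponds roughly to the following `transformers` `TrainingArguments` (a minimal sketch under that assumption; the original run was driven by the LLaMA-Factory launcher, so these are the generic argument names rather than the exact launcher flags):

```python
from transformers import TrainingArguments

# Sketch only: the hyperparameters listed above expressed as plain TrainingArguments.
args = TrainingArguments(
    output_dir="lora_sft_Qwen-7B_8_gpu_500_step_20240919142813",
    learning_rate=1e-4,
    per_device_train_batch_size=2,   # train_batch_size: 2
    per_device_eval_batch_size=2,    # eval_batch_size: 2
    seed=42,
    gradient_accumulation_steps=8,   # 2 x 8 x 1 device = total_train_batch_size 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=500,                   # training_steps: 500
    fp16=True,                       # mixed_precision_training: Native AMP
)
```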

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 1.23          | 0.8889 | 500  | 1.2722          | 1321104           |

### Framework versions

- PEFT 0.12.0
- Transformers 4.43.4
- Pytorch 2.1.0
- Datasets 2.20.0
- Tokenizers 0.19.1

@@ -0,0 +1,31 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "../../../models/qwen",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0.0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "c_proj",
    "w1",
    "c_attn",
    "w2"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
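
For orientation, the JSON above corresponds roughly to this PEFT `LoraConfig` (a sketch; only the load-bearing fields are shown, the rest are PEFT defaults):

```python
from peft import LoraConfig

# Sketch: the key fields of adapter_config.json expressed as a LoraConfig.
config = LoraConfig(
    r=8,
    lora_alpha=16,          # LoRA scaling factor lora_alpha / r = 2.0
    lora_dropout=0.0,
    bias="none",
    target_modules=["c_proj", "w1", "c_attn", "w2"],  # Qwen-7B attention and MLP projections
    task_type="CAUSAL_LM",
)
```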

Binary file not shown.

@@ -0,0 +1,14 @@
{
  "epoch": 0.8888888888888888,
  "eval_loss": 1.272219181060791,
  "eval_runtime": 56.8054,
  "eval_samples_per_second": 17.604,
  "eval_steps_per_second": 8.802,
  "num_input_tokens_seen": 1321104,
  "total_flos": 5.641287949968998e+16,
  "train_loss": 1.3309755539894104,
  "train_runtime": 1830.2469,
  "train_samples_per_second": 4.371,
  "train_steps_per_second": 0.273,
  "train_tokens_per_second": 1084.007
}

@@ -0,0 +1,202 @@
---
base_model: ../../../models/qwen
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
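
Until the official snippet is filled in, a minimal sketch for loading this LoRA adapter with PEFT could look like the following (the base-model path is the relative one recorded in `adapter_config.json`, and `path/to/adapter` is a hypothetical placeholder for this repository's checkpoint directory):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Sketch only: load the Qwen base model, then attach the LoRA adapter on top.
base_path = "../../../models/qwen"   # relative path recorded in adapter_config.json
adapter_path = "path/to/adapter"     # hypothetical: this repo's checkpoint directory

tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter_path)

inputs = tokenizer("Hello", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```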

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.12.0

@@ -0,0 +1,31 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "../../../models/qwen",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0.0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "c_proj",
    "w1",
    "c_attn",
    "w2"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}

Binary file not shown.
Binary file not shown.
File diff suppressed because it is too large
Binary file not shown.
Binary file not shown.

@@ -0,0 +1,10 @@
{
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|im_end|>"
}
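
Mapping `pad_token` onto the existing `<|im_end|>` rather than adding a new token is deliberate: Qwen ships no dedicated pad token, and for causal-LM fine-tuning the choice is harmless because padded positions receive label -100 and are excluded from the loss, as the `label_ids` in the training log below illustrate. In code, this mapping amounts to (a sketch, assuming `tokenizer` is the loaded QWenTokenizer):

```python
# Sketch: what this special_tokens_map amounts to once the tokenizer is loaded.
tokenizer.pad_token = tokenizer.eos_token  # both resolve to "<|im_end|>"
```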

@@ -0,0 +1,276 @@
# Copyright (c) Alibaba Cloud.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

"""Tokenization classes for QWen."""

import base64
import logging
import os
import unicodedata
from typing import Collection, Dict, List, Set, Tuple, Union

import tiktoken
from transformers import PreTrainedTokenizer, AddedToken

logger = logging.getLogger(__name__)


VOCAB_FILES_NAMES = {"vocab_file": "qwen.tiktoken"}

PAT_STR = r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
ENDOFTEXT = "<|endoftext|>"
IMSTART = "<|im_start|>"
IMEND = "<|im_end|>"
# as the default behavior is changed to allow special tokens in
# regular texts, the surface forms of special tokens need to be
# as different as possible to minimize the impact
EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
# changed to use actual index to avoid misconfiguration with vocabulary expansion
SPECIAL_START_ID = 151643
SPECIAL_TOKENS = tuple(
    enumerate(
        (
            (
                ENDOFTEXT,
                IMSTART,
                IMEND,
            )
            + EXTRAS
        ),
        start=SPECIAL_START_ID,
    )
)
SPECIAL_TOKENS_SET = set(t for i, t in SPECIAL_TOKENS)


def _load_tiktoken_bpe(tiktoken_bpe_file: str) -> Dict[bytes, int]:
    with open(tiktoken_bpe_file, "rb") as f:
        contents = f.read()
    return {
        base64.b64decode(token): int(rank)
        for token, rank in (line.split() for line in contents.splitlines() if line)
    }


class QWenTokenizer(PreTrainedTokenizer):
    """QWen tokenizer."""

    vocab_files_names = VOCAB_FILES_NAMES

    def __init__(
        self,
        vocab_file,
        errors="replace",
        extra_vocab_file=None,
        **kwargs,
    ):
        super().__init__(**kwargs)

        # how to handle errors in decoding UTF-8 byte sequences
        # use ignore if you are in streaming inference
        self.errors = errors

        self.mergeable_ranks = _load_tiktoken_bpe(vocab_file)  # type: Dict[bytes, int]
        self.special_tokens = {
            token: index
            for index, token in SPECIAL_TOKENS
        }

        # try to load extra vocab from file
        if extra_vocab_file is not None:
            used_ids = set(self.mergeable_ranks.values()) | set(self.special_tokens.values())
            extra_mergeable_ranks = _load_tiktoken_bpe(extra_vocab_file)
            for token, index in extra_mergeable_ranks.items():
                if token in self.mergeable_ranks:
                    logger.info(f"extra token {token} exists, skipping")
                    continue
                if index in used_ids:
                    logger.info(f"the index {index} for extra token {token} exists, skipping")
                    continue
                self.mergeable_ranks[token] = index
            # the index may be sparse after this, but don't worry: tiktoken.Encoding will handle it

        enc = tiktoken.Encoding(
            "Qwen",
            pat_str=PAT_STR,
            mergeable_ranks=self.mergeable_ranks,
            special_tokens=self.special_tokens,
        )
        assert (
            len(self.mergeable_ranks) + len(self.special_tokens) == enc.n_vocab
        ), f"{len(self.mergeable_ranks) + len(self.special_tokens)} != {enc.n_vocab} in encoding"

        self.decoder = {
            v: k for k, v in self.mergeable_ranks.items()
        }  # type: dict[int, bytes|str]
        self.decoder.update({v: k for k, v in self.special_tokens.items()})

        self.tokenizer = enc  # type: tiktoken.Encoding

        self.eod_id = self.tokenizer.eot_token
        self.im_start_id = self.special_tokens[IMSTART]
        self.im_end_id = self.special_tokens[IMEND]

    def __getstate__(self):
        # for pickle lovers
        state = self.__dict__.copy()
        del state["tokenizer"]
        return state

    def __setstate__(self, state):
        # tokenizer is not python native; don't pass it; rebuild it
        self.__dict__.update(state)
        enc = tiktoken.Encoding(
            "Qwen",
            pat_str=PAT_STR,
            mergeable_ranks=self.mergeable_ranks,
            special_tokens=self.special_tokens,
        )
        self.tokenizer = enc

    def __len__(self) -> int:
        return self.tokenizer.n_vocab

    def get_vocab(self) -> Dict[bytes, int]:
        return self.mergeable_ranks

    def convert_tokens_to_ids(
        self, tokens: Union[bytes, str, List[Union[bytes, str]]]
    ) -> List[int]:
        ids = []
        if isinstance(tokens, (str, bytes)):
            if tokens in self.special_tokens:
                return self.special_tokens[tokens]
            else:
                return self.mergeable_ranks.get(tokens)
        for token in tokens:
            if token in self.special_tokens:
                ids.append(self.special_tokens[token])
            else:
                ids.append(self.mergeable_ranks.get(token))
        return ids

    def _add_tokens(
        self,
        new_tokens: Union[List[str], List[AddedToken]],
        special_tokens: bool = False,
    ) -> int:
        if not special_tokens and new_tokens:
            raise ValueError("Adding regular tokens is not supported")
        for token in new_tokens:
            surface_form = token.content if isinstance(token, AddedToken) else token
            if surface_form not in SPECIAL_TOKENS_SET:
                raise ValueError("Adding unknown special tokens is not supported")
        return 0

    def save_vocabulary(self, save_directory: str, **kwargs) -> Tuple[str]:
        """
        Save only the vocabulary of the tokenizer (vocabulary).

        Returns:
            `Tuple(str)`: Paths to the files saved.
        """
        file_path = os.path.join(save_directory, "qwen.tiktoken")
        with open(file_path, "w", encoding="utf8") as w:
            for k, v in self.mergeable_ranks.items():
                line = base64.b64encode(k).decode("utf8") + " " + str(v) + "\n"
                w.write(line)
        return (file_path,)

    def tokenize(
        self,
        text: str,
        allowed_special: Union[Set, str] = "all",
        disallowed_special: Union[Collection, str] = (),
        **kwargs,
    ) -> List[Union[bytes, str]]:
        """
        Converts a string into a sequence of tokens.

        Args:
            text (`str`):
                The sequence to be encoded.
            allowed_special (`Literal["all"]` or `set`):
                The surface forms of the tokens to be encoded as special tokens in regular texts.
                Default to "all".
            disallowed_special (`Literal["all"]` or `Collection`):
                The surface forms of the tokens that should not be in regular texts and trigger errors.
                Default to an empty tuple.

            kwargs (additional keyword arguments, *optional*):
                Will be passed to the underlying model specific encode method.

        Returns:
            `List[bytes|str]`: The list of tokens.
        """
        tokens = []
        text = unicodedata.normalize("NFC", text)

        # this implementation takes a detour: text -> token id -> token surface forms
        for t in self.tokenizer.encode(
            text, allowed_special=allowed_special, disallowed_special=disallowed_special
        ):
            tokens.append(self.decoder[t])
        return tokens

    def convert_tokens_to_string(self, tokens: List[Union[bytes, str]]) -> str:
        """
        Converts a sequence of tokens into a single string.
        """
        text = ""
        temp = b""
        for t in tokens:
            if isinstance(t, str):
                if temp:
                    text += temp.decode("utf-8", errors=self.errors)
                    temp = b""
                text += t
            elif isinstance(t, bytes):
                temp += t
            else:
                raise TypeError("token should only be of type bytes or str")
        if temp:
            text += temp.decode("utf-8", errors=self.errors)
        return text

    @property
    def vocab_size(self):
        return self.tokenizer.n_vocab

    def _convert_id_to_token(self, index: int) -> Union[bytes, str]:
        """Converts an id to a token, special tokens included"""
        if index in self.decoder:
            return self.decoder[index]
        raise ValueError("unknown ids")

    def _convert_token_to_id(self, token: Union[bytes, str]) -> int:
        """Converts a token to an id using the vocab, special tokens included"""
        if token in self.special_tokens:
            return self.special_tokens[token]
        if token in self.mergeable_ranks:
            return self.mergeable_ranks[token]
        raise ValueError("unknown token")

    def _tokenize(self, text: str, **kwargs):
        """
        Converts a string into a sequence of tokens (string), using the tokenizer. Split in words for word-based
        vocabulary or sub-words for sub-word-based vocabularies (BPE/SentencePieces/WordPieces).

        Does NOT take care of added tokens.
        """
        raise NotImplementedError

    def _decode(
        self,
        token_ids: Union[int, List[int]],
        skip_special_tokens: bool = False,
        errors: str = None,
        **kwargs,
    ) -> str:
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        if skip_special_tokens:
            token_ids = [i for i in token_ids if i < self.eod_id]
        return self.tokenizer.decode(token_ids, errors=errors or self.errors)
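
A short usage sketch for the class above (assuming a `qwen.tiktoken` vocabulary file in the working directory; note that by default `tokenize` permits special-token surface forms in the input):

```python
# Sketch: exercising QWenTokenizer directly; assumes qwen.tiktoken is available.
tok = QWenTokenizer(vocab_file="qwen.tiktoken")

ids = tok.encode("Hello world<|im_end|>")
print(ids[-1] == tok.im_end_id)  # True: the trailing special token was encoded intact
print(tok.decode(ids))           # round-trips to "Hello world<|im_end|>"
```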

@@ -0,0 +1,17 @@
{
  "added_tokens_decoder": {},
  "auto_map": {
    "AutoTokenizer": [
      "tokenization_qwen.QWenTokenizer",
      null
    ]
  },
  "chat_template": "{% set system_message = 'You are a helpful assistant.' %}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% endif %}{% if system_message is defined %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{% endif %}{% for message in loop_messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user\n' + content + '<|im_end|>\n<|im_start|>assistant\n' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '\n' }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|im_end|>",
  "model_max_length": 32768,
  "pad_token": "<|im_end|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "QWenTokenizer"
}
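
The `chat_template` above is ChatML: a default system prompt is injected when the first message is not a `system` turn, each turn is wrapped in `<|im_start|>`/`<|im_end|>`, and the rendering ends with an open assistant turn. A sketch of what it produces (assuming `tokenizer` has been loaded with this config):

```python
# Sketch: rendering the ChatML template defined above.
messages = [{"role": "user", "content": "Hello"}]
text = tokenizer.apply_chat_template(messages, tokenize=False)
# Expected shape of the rendered string:
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# Hello<|im_end|>
# <|im_start|>assistant
```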

File diff suppressed because it is too large
Binary file not shown.

@@ -0,0 +1,8 @@
{
  "epoch": 0.8888888888888888,
  "eval_loss": 1.272219181060791,
  "eval_runtime": 56.8054,
  "eval_samples_per_second": 17.604,
  "eval_steps_per_second": 8.802,
  "num_input_tokens_seen": 1321104
}
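
These numbers are internally consistent: `eval_runtime * eval_samples_per_second` recovers the evaluation set size, which matches the 10% validation split of the 10,000-sample cap configured below.

```python
# Sanity check: implied evaluation set size from the metrics above.
print(round(56.8054 * 17.604))  # ~1000 samples = val_size 0.1 of max_samples 10000
```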

@@ -0,0 +1,231 @@
[2024-09-19 14:28:31,296] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to npu (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
09/19/2024 14:28:36 - INFO - llamafactory.hparams.parser - Process rank: 0, device: npu:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
09/19/2024 14:28:37 - INFO - llamafactory.data.template - Add eos token: <|im_end|>
09/19/2024 14:28:37 - INFO - llamafactory.data.template - Add pad token: <|im_end|>
09/19/2024 14:28:37 - INFO - llamafactory.data.loader - Loading dataset AI-ModelScope/train_1M_CN...
training example:
input_ids:
[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 104317, 89012, 22382, 106096, 64471, 101137, 72881, 102648, 46448, 1773, 62244, 107132, 37945, 99553, 25177, 101898, 8997, 100431, 99639, 113773, 9370, 111749, 25, 330, 100012, 105435, 99487, 100220, 3837, 104817, 44063, 99553, 102322, 20074, 33108, 116993, 3837, 23031, 104022, 100147, 101313, 1773, 698, 151645, 198, 151644, 77091, 198, 99487, 111749, 101137, 72881, 102648, 46448, 1773, 151645]
inputs:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
判断给定的文章是否符合语法规则。如果不符合,请提供修改建议。
下面是一篇文章的开头: "为了探讨这个主题,本文将提供一系列数据和实例,以证明这一观点。"
<|im_end|>
<|im_start|>assistant
这个开头符合语法规则。<|im_end|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 99487, 111749, 101137, 72881, 102648, 46448, 1773, 151645]
labels:
这个开头符合语法规则。<|im_end|>
09/19/2024 14:33:15 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
09/19/2024 14:33:15 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
09/19/2024 14:33:15 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
09/19/2024 14:33:15 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
09/19/2024 14:33:15 - INFO - llamafactory.model.model_utils.misc - Found linear modules: c_proj,w1,c_attn,w2
09/19/2024 14:33:16 - INFO - llamafactory.model.loader - trainable params: 17,891,328 || all params: 7,739,215,872 || trainable%: 0.2312
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 32768.0
{'loss': 1.5189, 'grad_norm': 0.8999722599983215, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.01, 'num_input_tokens_seen': 9808}
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 16384.0
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 8192.0
{'loss': 1.5574, 'grad_norm': nan, 'learning_rate': 6e-06, 'epoch': 0.01, 'num_input_tokens_seen': 19312}
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 4096.0
{'loss': 1.5909, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.02, 'num_input_tokens_seen': 29232}
{'loss': 1.8082, 'grad_norm': 0.6828662753105164, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.02, 'num_input_tokens_seen': 37984}
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 2048.0
{'loss': 1.7089, 'grad_norm': 1.1571168899536133, 'learning_rate': 2e-05, 'epoch': 0.03, 'num_input_tokens_seen': 44592}
{'loss': 1.7715, 'grad_norm': 6.552746772766113, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.03, 'num_input_tokens_seen': 52400}
{'loss': 1.5624, 'grad_norm': 1.3311208486557007, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.04, 'num_input_tokens_seen': 60320}
{'loss': 1.9382, 'grad_norm': 1.941733479499817, 'learning_rate': 3.8e-05, 'epoch': 0.04, 'num_input_tokens_seen': 67024}
{'loss': 1.5455, 'grad_norm': 1.579136848449707, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.05, 'num_input_tokens_seen': 73776}
{'loss': 1.6186, 'grad_norm': 1.887698769569397, 'learning_rate': 5e-05, 'epoch': 0.05, 'num_input_tokens_seen': 82592}
{'loss': 1.5182, 'grad_norm': 1.30510413646698, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.06, 'num_input_tokens_seen': 90512}
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 1024.0
{'loss': 1.4462, 'grad_norm': 1.3481446504592896, 'learning_rate': 6e-05, 'epoch': 0.06, 'num_input_tokens_seen': 96848}
{'loss': 1.3399, 'grad_norm': 1.5753204822540283, 'learning_rate': 6.6e-05, 'epoch': 0.07, 'num_input_tokens_seen': 103728}
{'loss': 1.3188, 'grad_norm': 0.7255751490592957, 'learning_rate': 7.2e-05, 'epoch': 0.07, 'num_input_tokens_seen': 112160}
{'loss': 1.535, 'grad_norm': 0.7835249900817871, 'learning_rate': 7.800000000000001e-05, 'epoch': 0.08, 'num_input_tokens_seen': 117984}
{'loss': 1.363, 'grad_norm': 0.6611946225166321, 'learning_rate': 8.4e-05, 'epoch': 0.09, 'num_input_tokens_seen': 126624}
{'loss': 1.5687, 'grad_norm': 0.9039588570594788, 'learning_rate': 9e-05, 'epoch': 0.09, 'num_input_tokens_seen': 134496}
{'loss': 1.6227, 'grad_norm': 0.9094035029411316, 'learning_rate': 9.6e-05, 'epoch': 0.1, 'num_input_tokens_seen': 141664}
{'loss': 1.3263, 'grad_norm': 0.5215123891830444, 'learning_rate': 9.999878153526974e-05, 'epoch': 0.1, 'num_input_tokens_seen': 149360}
{'loss': 1.3676, 'grad_norm': 0.5413862466812134, 'learning_rate': 9.998050575201771e-05, 'epoch': 0.11, 'num_input_tokens_seen': 157776}
{'loss': 1.4657, 'grad_norm': 1.1176000833511353, 'learning_rate': 9.99403068670717e-05, 'epoch': 0.11, 'num_input_tokens_seen': 165360}
{'loss': 1.3857, 'grad_norm': 1.301885962486267, 'learning_rate': 9.987820251299122e-05, 'epoch': 0.12, 'num_input_tokens_seen': 173152}
{'loss': 1.549, 'grad_norm': 0.8806378841400146, 'learning_rate': 9.979421993079852e-05, 'epoch': 0.12, 'num_input_tokens_seen': 181504}
{'loss': 1.4207, 'grad_norm': 1.1323336362838745, 'learning_rate': 9.968839595802982e-05, 'epoch': 0.13, 'num_input_tokens_seen': 188880}
{'loss': 1.4148, 'grad_norm': 0.5949046015739441, 'learning_rate': 9.956077701257709e-05, 'epoch': 0.13, 'num_input_tokens_seen': 195952}
{'loss': 1.2252, 'grad_norm': 2.3640127182006836, 'learning_rate': 9.941141907232765e-05, 'epoch': 0.14, 'num_input_tokens_seen': 203744}
{'loss': 1.1623, 'grad_norm': 0.46419885754585266, 'learning_rate': 9.924038765061042e-05, 'epoch': 0.14, 'num_input_tokens_seen': 212624}
{'loss': 1.5017, 'grad_norm': 2.0805838108062744, 'learning_rate': 9.904775776745958e-05, 'epoch': 0.15, 'num_input_tokens_seen': 219968}
{'loss': 1.2643, 'grad_norm': 0.38017725944519043, 'learning_rate': 9.88336139167084e-05, 'epoch': 0.15, 'num_input_tokens_seen': 229024}
{'loss': 1.2085, 'grad_norm': 0.5102754831314087, 'learning_rate': 9.859805002892732e-05, 'epoch': 0.16, 'num_input_tokens_seen': 236768}
{'loss': 1.333, 'grad_norm': 0.4705878496170044, 'learning_rate': 9.834116943022298e-05, 'epoch': 0.17, 'num_input_tokens_seen': 245152}
{'loss': 1.5669, 'grad_norm': 0.8111830353736877, 'learning_rate': 9.806308479691595e-05, 'epoch': 0.17, 'num_input_tokens_seen': 251728}
{'loss': 1.317, 'grad_norm': 0.5702469348907471, 'learning_rate': 9.776391810611718e-05, 'epoch': 0.18, 'num_input_tokens_seen': 260832}
{'loss': 1.3829, 'grad_norm': 0.9620650410652161, 'learning_rate': 9.744380058222483e-05, 'epoch': 0.18, 'num_input_tokens_seen': 268448}
{'loss': 1.2588, 'grad_norm': 0.8848989605903625, 'learning_rate': 9.710287263936484e-05, 'epoch': 0.19, 'num_input_tokens_seen': 276016}
{'loss': 1.3142, 'grad_norm': 0.7803770899772644, 'learning_rate': 9.674128381980072e-05, 'epoch': 0.19, 'num_input_tokens_seen': 282416}
{'loss': 1.4468, 'grad_norm': 0.6126624941825867, 'learning_rate': 9.635919272833938e-05, 'epoch': 0.2, 'num_input_tokens_seen': 290016}
{'loss': 1.5577, 'grad_norm': 1.1495459079742432, 'learning_rate': 9.595676696276172e-05, 'epoch': 0.2, 'num_input_tokens_seen': 297952}
{'loss': 1.3545, 'grad_norm': 0.4600121080875397, 'learning_rate': 9.553418304030886e-05, 'epoch': 0.21, 'num_input_tokens_seen': 305152}
{'loss': 1.2359, 'grad_norm': 0.6911833882331848, 'learning_rate': 9.50916263202557e-05, 'epoch': 0.21, 'num_input_tokens_seen': 312592}
{'loss': 1.2377, 'grad_norm': 0.800264298915863, 'learning_rate': 9.462929092260628e-05, 'epoch': 0.22, 'num_input_tokens_seen': 319792}
{'loss': 1.402, 'grad_norm': 0.37946560978889465, 'learning_rate': 9.414737964294636e-05, 'epoch': 0.22, 'num_input_tokens_seen': 327776}
{'loss': 1.3244, 'grad_norm': 0.6147738695144653, 'learning_rate': 9.364610386349049e-05, 'epoch': 0.23, 'num_input_tokens_seen': 336432}
{'loss': 1.3333, 'grad_norm': 0.4380050301551819, 'learning_rate': 9.312568346036288e-05, 'epoch': 0.23, 'num_input_tokens_seen': 343216}
{'loss': 1.2563, 'grad_norm': 0.8881791234016418, 'learning_rate': 9.258634670715238e-05, 'epoch': 0.24, 'num_input_tokens_seen': 350096}
{'loss': 1.3305, 'grad_norm': 0.39159095287323, 'learning_rate': 9.202833017478422e-05, 'epoch': 0.25, 'num_input_tokens_seen': 359040}
{'loss': 1.2371, 'grad_norm': 0.4823763966560364, 'learning_rate': 9.145187862775209e-05, 'epoch': 0.25, 'num_input_tokens_seen': 367568}
{'loss': 1.2544, 'grad_norm': 0.7489269971847534, 'learning_rate': 9.085724491675642e-05, 'epoch': 0.26, 'num_input_tokens_seen': 375552}
{'loss': 1.3928, 'grad_norm': 0.7432783842086792, 'learning_rate': 9.02446898677957e-05, 'epoch': 0.26, 'num_input_tokens_seen': 385712}
{'loss': 1.1703, 'grad_norm': 0.6810922622680664, 'learning_rate': 8.961448216775954e-05, 'epoch': 0.27, 'num_input_tokens_seen': 393344}
{'loss': 1.3133, 'grad_norm': 0.5105797648429871, 'learning_rate': 8.896689824657372e-05, 'epoch': 0.27, 'num_input_tokens_seen': 400720}
{'loss': 1.2321, 'grad_norm': 0.7406936287879944, 'learning_rate': 8.83022221559489e-05, 'epoch': 0.28, 'num_input_tokens_seen': 409056}
{'loss': 1.3816, 'grad_norm': 0.7281469106674194, 'learning_rate': 8.762074544478623e-05, 'epoch': 0.28, 'num_input_tokens_seen': 416144}
{'loss': 1.1577, 'grad_norm': 1.008775234222412, 'learning_rate': 8.692276703129421e-05, 'epoch': 0.29, 'num_input_tokens_seen': 422880}
{'loss': 1.2287, 'grad_norm': 0.9613437652587891, 'learning_rate': 8.620859307187339e-05, 'epoch': 0.29, 'num_input_tokens_seen': 429440}
{'loss': 1.2672, 'grad_norm': 0.556431233882904, 'learning_rate': 8.547853682682604e-05, 'epoch': 0.3, 'num_input_tokens_seen': 437200}
{'loss': 1.299, 'grad_norm': 0.6057170629501343, 'learning_rate': 8.473291852294987e-05, 'epoch': 0.3, 'num_input_tokens_seen': 446416}
{'loss': 1.4358, 'grad_norm': 0.772670567035675, 'learning_rate': 8.397206521307584e-05, 'epoch': 0.31, 'num_input_tokens_seen': 454672}
{'loss': 1.2488, 'grad_norm': 0.5777968168258667, 'learning_rate': 8.319631063261209e-05, 'epoch': 0.31, 'num_input_tokens_seen': 463216}
{'loss': 1.2692, 'grad_norm': 0.7657284140586853, 'learning_rate': 8.240599505315655e-05, 'epoch': 0.32, 'num_input_tokens_seen': 472432}
{'loss': 1.2339, 'grad_norm': 0.7620649337768555, 'learning_rate': 8.160146513324254e-05, 'epoch': 0.33, 'num_input_tokens_seen': 479968}
{'loss': 1.4274, 'grad_norm': 0.50833660364151, 'learning_rate': 8.07830737662829e-05, 'epoch': 0.33, 'num_input_tokens_seen': 488112}
{'loss': 1.3689, 'grad_norm': 0.692903459072113, 'learning_rate': 7.99511799257793e-05, 'epoch': 0.34, 'num_input_tokens_seen': 495808}
{'loss': 1.344, 'grad_norm': 0.8289609551429749, 'learning_rate': 7.910614850786448e-05, 'epoch': 0.34, 'num_input_tokens_seen': 503216}
{'loss': 1.2793, 'grad_norm': 0.5638831853866577, 'learning_rate': 7.82483501712469e-05, 'epoch': 0.35, 'num_input_tokens_seen': 511568}
{'loss': 1.5863, 'grad_norm': 0.5944852828979492, 'learning_rate': 7.737816117462752e-05, 'epoch': 0.35, 'num_input_tokens_seen': 519536}
{'loss': 1.5119, 'grad_norm': 0.6509004831314087, 'learning_rate': 7.649596321166024e-05, 'epoch': 0.36, 'num_input_tokens_seen': 526576}
{'loss': 1.335, 'grad_norm': 0.5436453819274902, 'learning_rate': 7.560214324352858e-05, 'epoch': 0.36, 'num_input_tokens_seen': 533824}
{'loss': 1.2377, 'grad_norm': 0.45730966329574585, 'learning_rate': 7.469709332921155e-05, 'epoch': 0.37, 'num_input_tokens_seen': 541472}
{'loss': 1.1963, 'grad_norm': 0.5066884160041809, 'learning_rate': 7.378121045351378e-05, 'epoch': 0.37, 'num_input_tokens_seen': 550144}
{'loss': 1.3623, 'grad_norm': 0.43720710277557373, 'learning_rate': 7.285489635293472e-05, 'epoch': 0.38, 'num_input_tokens_seen': 557872}
{'loss': 1.2955, 'grad_norm': 0.6945408582687378, 'learning_rate': 7.191855733945387e-05, 'epoch': 0.38, 'num_input_tokens_seen': 563728}
{'loss': 1.2986, 'grad_norm': 1.6284866333007812, 'learning_rate': 7.097260412230886e-05, 'epoch': 0.39, 'num_input_tokens_seen': 571056}
{'loss': 1.0992, 'grad_norm': 0.6772762537002563, 'learning_rate': 7.001745162784477e-05, 'epoch': 0.39, 'num_input_tokens_seen': 579184}
{'loss': 1.437, 'grad_norm': 1.3077731132507324, 'learning_rate': 6.905351881751372e-05, 'epoch': 0.4, 'num_input_tokens_seen': 586128}
{'loss': 1.431, 'grad_norm': 0.4619450271129608, 'learning_rate': 6.808122850410461e-05, 'epoch': 0.41, 'num_input_tokens_seen': 594848}
{'loss': 1.3132, 'grad_norm': 0.432878702878952, 'learning_rate': 6.710100716628344e-05, 'epoch': 0.41, 'num_input_tokens_seen': 602544}
Gradient overflow. Skipping step
Loss scaler reducing loss scale to 512.0
{'loss': 1.3838, 'grad_norm': 15.175246238708496, 'learning_rate': 6.644333233692916e-05, 'epoch': 0.42, 'num_input_tokens_seen': 609904}
{'loss': 1.1309, 'grad_norm': 0.43043747544288635, 'learning_rate': 6.545084971874738e-05, 'epoch': 0.42, 'num_input_tokens_seen': 618576}
{'loss': 1.3572, 'grad_norm': 0.432679146528244, 'learning_rate': 6.445158984722358e-05, 'epoch': 0.43, 'num_input_tokens_seen': 626960}
{'loss': 1.3563, 'grad_norm': 0.49087584018707275, 'learning_rate': 6.344599103076329e-05, 'epoch': 0.43, 'num_input_tokens_seen': 634896}
{'loss': 1.2947, 'grad_norm': 0.4489469528198242, 'learning_rate': 6.243449435824276e-05, 'epoch': 0.44, 'num_input_tokens_seen': 642640}
{'loss': 1.206, 'grad_norm': 0.5883502960205078, 'learning_rate': 6.141754350553279e-05, 'epoch': 0.44, 'num_input_tokens_seen': 650816}
{'loss': 1.2874, 'grad_norm': 0.4804786443710327, 'learning_rate': 6.0395584540887963e-05, 'epoch': 0.45, 'num_input_tokens_seen': 658736}
{'loss': 1.3168, 'grad_norm': 0.6784992814064026, 'learning_rate': 5.9369065729286245e-05, 'epoch': 0.45, 'num_input_tokens_seen': 666416}
{'loss': 1.3401, 'grad_norm': 0.6340084671974182, 'learning_rate': 5.833843733580512e-05, 'epoch': 0.46, 'num_input_tokens_seen': 673312}
{'loss': 1.3921, 'grad_norm': 1.0485939979553223, 'learning_rate': 5.730415142812059e-05, 'epoch': 0.46, 'num_input_tokens_seen': 681408}
{'loss': 1.2268, 'grad_norm': 0.5049784779548645, 'learning_rate': 5.6266661678215216e-05, 'epoch': 0.47, 'num_input_tokens_seen': 690608}
{'loss': 1.3181, 'grad_norm': 0.8696343302726746, 'learning_rate': 5.522642316338268e-05, 'epoch': 0.47, 'num_input_tokens_seen': 698576}
{'loss': 1.1667, 'grad_norm': 0.5481306910514832, 'learning_rate': 5.418389216661579e-05, 'epoch': 0.48, 'num_input_tokens_seen': 707808}
{'loss': 1.2205, 'grad_norm': 0.6927640438079834, 'learning_rate': 5.313952597646568e-05, 'epoch': 0.49, 'num_input_tokens_seen': 714016}
{'loss': 1.2018, 'grad_norm': 0.5713353753089905, 'learning_rate': 5.209378268645998e-05, 'epoch': 0.49, 'num_input_tokens_seen': 721552}
{'loss': 1.3723, 'grad_norm': 0.6889858245849609, 'learning_rate': 5.104712099416785e-05, 'epoch': 0.5, 'num_input_tokens_seen': 729104}
{'loss': 1.3686, 'grad_norm': 0.4804179072380066, 'learning_rate': 5e-05, 'epoch': 0.5, 'num_input_tokens_seen': 736496}
{'loss': 1.2406, 'grad_norm': 0.5674819350242615, 'learning_rate': 4.895287900583216e-05, 'epoch': 0.51, 'num_input_tokens_seen': 745184}
{'loss': 1.2107, 'grad_norm': 0.5671128630638123, 'learning_rate': 4.790621731354003e-05, 'epoch': 0.51, 'num_input_tokens_seen': 752048}
{'loss': 1.2136, 'grad_norm': 0.4124850630760193, 'learning_rate': 4.6860474023534335e-05, 'epoch': 0.52, 'num_input_tokens_seen': 761088}
{'loss': 1.2939, 'grad_norm': 0.8833000063896179, 'learning_rate': 4.5816107833384234e-05, 'epoch': 0.52, 'num_input_tokens_seen': 768080}
{'loss': 1.2621, 'grad_norm': 0.43334656953811646, 'learning_rate': 4.477357683661734e-05, 'epoch': 0.53, 'num_input_tokens_seen': 777808}
{'loss': 1.3968, 'grad_norm': 0.5995087027549744, 'learning_rate': 4.373333832178478e-05, 'epoch': 0.53, 'num_input_tokens_seen': 786032}
{'loss': 1.2069, 'grad_norm': 0.4385344684123993, 'learning_rate': 4.269584857187943e-05, 'epoch': 0.54, 'num_input_tokens_seen': 793424}
{'loss': 1.1725, 'grad_norm': 0.6815104484558105, 'learning_rate': 4.166156266419489e-05, 'epoch': 0.54, 'num_input_tokens_seen': 801392}
{'loss': 1.1476, 'grad_norm': 0.5028893351554871, 'learning_rate': 4.063093427071376e-05, 'epoch': 0.55, 'num_input_tokens_seen': 809184}
{'loss': 1.3471, 'grad_norm': 0.6175994873046875, 'learning_rate': 3.960441545911204e-05, 'epoch': 0.55, 'num_input_tokens_seen': 816896}
{'loss': 1.3197, 'grad_norm': 0.3137647211551666, 'learning_rate': 3.858245649446721e-05, 'epoch': 0.56, 'num_input_tokens_seen': 825952}
{'loss': 1.2391, 'grad_norm': 0.42882341146469116, 'learning_rate': 3.756550564175727e-05, 'epoch': 0.57, 'num_input_tokens_seen': 835664}
{'loss': 1.2011, 'grad_norm': 0.37835580110549927, 'learning_rate': 3.655400896923672e-05, 'epoch': 0.57, 'num_input_tokens_seen': 844288}
{'loss': 1.2983, 'grad_norm': 0.43478235602378845, 'learning_rate': 3.554841015277641e-05, 'epoch': 0.58, 'num_input_tokens_seen': 851584}
{'loss': 1.3713, 'grad_norm': 0.69629967212677, 'learning_rate': 3.4549150281252636e-05, 'epoch': 0.58, 'num_input_tokens_seen': 861216}
{'loss': 1.3763, 'grad_norm': 0.3036434054374695, 'learning_rate': 3.355666766307084e-05, 'epoch': 0.59, 'num_input_tokens_seen': 869792}
{'loss': 1.1632, 'grad_norm': 0.4899413287639618, 'learning_rate': 3.257139763390925e-05, 'epoch': 0.59, 'num_input_tokens_seen': 877216}
{'loss': 1.2382, 'grad_norm': 0.5674645900726318, 'learning_rate': 3.1593772365766105e-05, 'epoch': 0.6, 'num_input_tokens_seen': 883488}
{'loss': 1.1966, 'grad_norm': 0.7481414079666138, 'learning_rate': 3.062422067739485e-05, 'epoch': 0.6, 'num_input_tokens_seen': 890816}
{'loss': 1.2934, 'grad_norm': 0.3711884617805481, 'learning_rate': 2.9663167846209998e-05, 'epoch': 0.61, 'num_input_tokens_seen': 900352}
{'loss': 1.184, 'grad_norm': 0.3552614152431488, 'learning_rate': 2.8711035421746367e-05, 'epoch': 0.61, 'num_input_tokens_seen': 909312}
{'loss': 1.1655, 'grad_norm': 0.46287208795547485, 'learning_rate': 2.776824104075364e-05, 'epoch': 0.62, 'num_input_tokens_seen': 917168}
{'loss': 1.2241, 'grad_norm': 0.40405702590942383, 'learning_rate': 2.6835198244006927e-05, 'epoch': 0.62, 'num_input_tokens_seen': 923424}
{'loss': 1.275, 'grad_norm': 0.625882625579834, 'learning_rate': 2.591231629491423e-05, 'epoch': 0.63, 'num_input_tokens_seen': 931152}
{'loss': 1.2231, 'grad_norm': 0.4761885404586792, 'learning_rate': 2.500000000000001e-05, 'epoch': 0.63, 'num_input_tokens_seen': 940992}
{'loss': 1.3608, 'grad_norm': 0.34844455122947693, 'learning_rate': 2.4098649531343497e-05, 'epoch': 0.64, 'num_input_tokens_seen': 949936}
{'loss': 1.3481, 'grad_norm': 0.4715827405452728, 'learning_rate': 2.3208660251050158e-05, 'epoch': 0.65, 'num_input_tokens_seen': 958432}
{'loss': 1.2009, 'grad_norm': 1.3269503116607666, 'learning_rate': 2.23304225378328e-05, 'epoch': 0.65, 'num_input_tokens_seen': 966688}
{'loss': 1.1758, 'grad_norm': 0.850872278213501, 'learning_rate': 2.1464321615778422e-05, 'epoch': 0.66, 'num_input_tokens_seen': 973872}
{'loss': 1.3557, 'grad_norm': 0.9200695753097534, 'learning_rate': 2.061073738537635e-05, 'epoch': 0.66, 'num_input_tokens_seen': 981264}
{'loss': 1.4066, 'grad_norm': 0.47994673252105713, 'learning_rate': 1.977004425688126e-05, 'epoch': 0.67, 'num_input_tokens_seen': 988800}
{'loss': 1.4266, 'grad_norm': 0.4955289661884308, 'learning_rate': 1.8942610986084486e-05, 'epoch': 0.67, 'num_input_tokens_seen': 996656}
{'loss': 1.2348, 'grad_norm': 0.5885555744171143, 'learning_rate': 1.8128800512565513e-05, 'epoch': 0.68, 'num_input_tokens_seen': 1004672}
{'loss': 1.2, 'grad_norm': 0.43341994285583496, 'learning_rate': 1.7328969800494726e-05, 'epoch': 0.68, 'num_input_tokens_seen': 1012400}
{'loss': 1.2074, 'grad_norm': 0.5209198594093323, 'learning_rate': 1.6543469682057106e-05, 'epoch': 0.69, 'num_input_tokens_seen': 1018768}
{'loss': 1.3071, 'grad_norm': 0.7264676690101624, 'learning_rate': 1.5772644703565565e-05, 'epoch': 0.69, 'num_input_tokens_seen': 1026368}
{'loss': 1.3392, 'grad_norm': 0.5016859769821167, 'learning_rate': 1.5016832974331724e-05, 'epoch': 0.7, 'num_input_tokens_seen': 1033664}
{'loss': 1.2203, 'grad_norm': 0.5449459552764893, 'learning_rate': 1.4276366018359844e-05, 'epoch': 0.7, 'num_input_tokens_seen': 1042560}
{'loss': 1.3089, 'grad_norm': 0.5243913531303406, 'learning_rate': 1.3551568628929434e-05, 'epoch': 0.71, 'num_input_tokens_seen': 1050336}
{'loss': 1.3856, 'grad_norm': 0.7503966689109802, 'learning_rate': 1.2842758726130283e-05, 'epoch': 0.71, 'num_input_tokens_seen': 1057824}
{'loss': 1.4345, 'grad_norm': 0.8323341012001038, 'learning_rate': 1.2150247217412186e-05, 'epoch': 0.72, 'num_input_tokens_seen': 1065184}
{'loss': 1.3059, 'grad_norm': 0.5052825808525085, 'learning_rate': 1.1474337861210543e-05, 'epoch': 0.73, 'num_input_tokens_seen': 1072016}
{'loss': 1.1952, 'grad_norm': 0.5355105400085449, 'learning_rate': 1.0815327133708015e-05, 'epoch': 0.73, 'num_input_tokens_seen': 1079584}
{'loss': 1.1284, 'grad_norm': 0.4663732051849365, 'learning_rate': 1.0173504098790187e-05, 'epoch': 0.74, 'num_input_tokens_seen': 1087984}
{'loss': 1.1887, 'grad_norm': 0.500197172164917, 'learning_rate': 9.549150281252633e-06, 'epoch': 0.74, 'num_input_tokens_seen': 1096032}
{'loss': 1.1508, 'grad_norm': 0.3888009488582611, 'learning_rate': 8.9425395433148e-06, 'epoch': 0.75, 'num_input_tokens_seen': 1105696}
{'loss': 1.2795, 'grad_norm': 0.5471282601356506, 'learning_rate': 8.353937964495029e-06, 'epoch': 0.75, 'num_input_tokens_seen': 1114048}
{'loss': 1.4123, 'grad_norm': 0.5487935543060303, 'learning_rate': 7.783603724899257e-06, 'epoch': 0.76, 'num_input_tokens_seen': 1122528}
{'loss': 1.2925, 'grad_norm': 0.3951992988586426, 'learning_rate': 7.2317869919746705e-06, 'epoch': 0.76, 'num_input_tokens_seen': 1131328}
{'loss': 1.1505, 'grad_norm': 0.3872891068458557, 'learning_rate': 6.698729810778065e-06, 'epoch': 0.77, 'num_input_tokens_seen': 1138992}
{'loss': 1.3429, 'grad_norm': 0.5767130851745605, 'learning_rate': 6.184665997806832e-06, 'epoch': 0.77, 'num_input_tokens_seen': 1147776}
{'loss': 1.2439, 'grad_norm': 0.7298781275749207, 'learning_rate': 5.689821038439263e-06, 'epoch': 0.78, 'num_input_tokens_seen': 1157360}
{'loss': 1.2647, 'grad_norm': 0.8098281621932983, 'learning_rate': 5.214411988029355e-06, 'epoch': 0.78, 'num_input_tokens_seen': 1166256}
{'loss': 1.3183, 'grad_norm': 0.4596332609653473, 'learning_rate': 4.758647376699032e-06, 'epoch': 0.79, 'num_input_tokens_seen': 1173840}
{'loss': 1.3921, 'grad_norm': 0.7599104642868042, 'learning_rate': 4.322727117869951e-06, 'epoch': 0.79, 'num_input_tokens_seen': 1180848}
{'loss': 1.3069, 'grad_norm': 0.5496035218238831, 'learning_rate': 3.90684242057498e-06, 'epoch': 0.8, 'num_input_tokens_seen': 1189952}
{'loss': 1.228, 'grad_norm': 2.8328826427459717, 'learning_rate': 3.511175705587433e-06, 'epoch': 0.81, 'num_input_tokens_seen': 1198336}
{'loss': 1.264, 'grad_norm': 0.40324077010154724, 'learning_rate': 3.1359005254054273e-06, 'epoch': 0.81, 'num_input_tokens_seen': 1206304}
{'loss': 1.3767, 'grad_norm': 0.7104987502098083, 'learning_rate': 2.7811814881259503e-06, 'epoch': 0.82, 'num_input_tokens_seen': 1214480}
{'loss': 1.2163, 'grad_norm': 0.29078155755996704, 'learning_rate': 2.4471741852423237e-06, 'epoch': 0.82, 'num_input_tokens_seen': 1222608}
{'loss': 1.3609, 'grad_norm': 0.8542816638946533, 'learning_rate': 2.134025123396638e-06, 'epoch': 0.83, 'num_input_tokens_seen': 1229568}
{'loss': 1.2252, 'grad_norm': 0.34084662795066833, 'learning_rate': 1.841871660117095e-06, 'epoch': 0.83, 'num_input_tokens_seen': 1236656}
{'loss': 1.3446, 'grad_norm': 0.4465295374393463, 'learning_rate': 1.5708419435684462e-06, 'epoch': 0.84, 'num_input_tokens_seen': 1243776}
{'loss': 1.1298, 'grad_norm': 0.3369986414909363, 'learning_rate': 1.3210548563419856e-06, 'epoch': 0.84, 'num_input_tokens_seen': 1253296}
{'loss': 1.4383, 'grad_norm': 0.6114615797996521, 'learning_rate': 1.0926199633097157e-06, 'epoch': 0.85, 'num_input_tokens_seen': 1261440}
{'loss': 1.4809, 'grad_norm': 0.4665069282054901, 'learning_rate': 8.856374635655695e-07, 'epoch': 0.85, 'num_input_tokens_seen': 1268336}
{'loss': 1.4316, 'grad_norm': 0.640594482421875, 'learning_rate': 7.001981464747565e-07, 'epoch': 0.86, 'num_input_tokens_seen': 1276224}
{'loss': 1.1178, 'grad_norm': 0.7955052852630615, 'learning_rate': 5.363833518505834e-07, 'epoch': 0.86, 'num_input_tokens_seen': 1284112}
{'loss': 1.3764, 'grad_norm': 0.7011861801147461, 'learning_rate': 3.9426493427611177e-07, 'epoch': 0.87, 'num_input_tokens_seen': 1292800}
{'loss': 1.4021, 'grad_norm': 1.2664375305175781, 'learning_rate': 2.7390523158633554e-07, 'epoch': 0.87, 'num_input_tokens_seen': 1300656}
{'loss': 1.3095, 'grad_norm': 0.46517035365104675, 'learning_rate': 1.753570375247815e-07, 'epoch': 0.88, 'num_input_tokens_seen': 1309792}
{'loss': 1.23, 'grad_norm': 0.7953115105628967, 'learning_rate': 9.866357858642205e-08, 'epoch': 0.89, 'num_input_tokens_seen': 1316000}
{'eval_loss': 1.272219181060791, 'eval_runtime': 56.3441, 'eval_samples_per_second': 17.748, 'eval_steps_per_second': 8.874, 'epoch': 0.89, 'num_input_tokens_seen': 1321104}
{'train_runtime': 1830.2469, 'train_samples_per_second': 4.371, 'train_steps_per_second': 0.273, 'train_tokens_per_second': 1084.007, 'train_loss': 1.3309755539894104, 'epoch': 0.89, 'num_input_tokens_seen': 1321104}
***** train metrics *****
  epoch                    =     0.8889
  num_input_tokens_seen    =    1321104
  total_flos               = 52538588GF
  train_loss               =      1.331
  train_runtime            = 0:30:30.24
  train_samples_per_second =      4.371
  train_steps_per_second   =      0.273
  train_tokens_per_second  =   1084.007
Figure saved at: ./results/910b/lora_sft_Qwen-7B_8_gpu_500_step_20240919142813/training_loss.png
Figure saved at: ./results/910b/lora_sft_Qwen-7B_8_gpu_500_step_20240919142813/training_eval_loss.png
09/19/2024 15:03:48 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
***** eval metrics *****
  epoch                   =     0.8889
  eval_loss               =     1.2722
  eval_runtime            = 0:00:56.80
  eval_samples_per_second =     17.604
  eval_steps_per_second   =      8.802
  num_input_tokens_seen   =    1321104
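
Because the per-step records in this log are Python dict literals, the loss and learning-rate curves can be re-extracted without the saved figures (a sketch; `train.log` is a hypothetical path for this output, and the `nan` grad-norm lines are skipped since `nan` is not a parseable literal):

```python
import ast

# Sketch: recover (epoch, loss, learning_rate) points from the log above.
points = []
with open("train.log", encoding="utf-8") as f:  # hypothetical log path
    for line in f:
        line = line.strip()
        if not line.startswith("{'loss'"):
            continue
        try:
            rec = ast.literal_eval(line)
        except ValueError:
            continue  # lines with 'grad_norm': nan cannot be parsed as literals
        points.append((rec["epoch"], rec["loss"], rec["learning_rate"]))

print(points[0])   # (0.01, 1.5189, 4.000000000000001e-06)
print(points[-1])  # (0.89, 1.23, 9.866357858642205e-08)
```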

@@ -0,0 +1,31 @@
cutoff_len: 1024
dataset: belle_1m
ddp_timeout: 180000000
do_train: true
eval_steps: 500
eval_strategy: steps
finetuning_type: lora
fp16: true
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
include_tokens_per_second: true
learning_rate: 0.0001
logging_steps: 3
lora_target: all
lr_scheduler_type: cosine
max_samples: 10000
max_steps: 500
model_name_or_path: ../../../models/qwen
num_train_epochs: 10.0
output_dir: ./results/910b/lora_sft_Qwen-7B_8_gpu_500_step_20240919142813
overwrite_cache: true
overwrite_output_dir: true
per_device_eval_batch_size: 2
per_device_train_batch_size: 2
plot_loss: true
preprocessing_num_workers: 16
save_steps: 500
stage: sft
template: qwen
val_size: 0.1
warmup_ratio: 0.1
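
In LLaMA-Factory, a config like this is typically launched with `llamafactory-cli train <config>.yaml`. Two derived quantities tie it back to the results above (a sketch of the arithmetic; the device count of 1 comes from the `n_gpu: 1` line in the training log):

```python
# Sketch: derived quantities from the YAML above.
effective_batch = 2 * 8 * 1    # per_device_train_batch_size x grad accumulation x devices
print(effective_batch)         # 16, the total_train_batch_size reported in the model card

train_samples = int(10000 * (1 - 0.1))          # max_samples minus the val_size split
print(500 / (train_samples / effective_batch))  # ~0.89 epochs at step 500, as logged
```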

@@ -0,0 +1,33 @@
{"cur_time": "2024-09-19 14:28:24", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 85.3}, {"npu_id": 1, "power_dissipation": 89.5}, {"npu_id": 2, "power_dissipation": 92.8}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.7}, {"npu_id": 7, "power_dissipation": 90.6}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 6}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:29:34", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 85.5}, {"npu_id": 1, "power_dissipation": 89.6}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.4}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.6}, {"npu_id": 6, "power_dissipation": 93.1}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 6}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:30:44", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 85.8}, {"npu_id": 1, "power_dissipation": 89.2}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 90.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 6}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:31:54", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 85.5}, {"npu_id": 1, "power_dissipation": 89.7}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.2}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 15}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:33:04", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 85.7}, {"npu_id": 1, "power_dissipation": 89.7}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.5}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.7}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 25}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:34:14", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 130.6}, {"npu_id": 1, "power_dissipation": 89.6}, {"npu_id": 2, "power_dissipation": 92.9}, {"npu_id": 3, "power_dissipation": 88.4}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 94.2}, {"npu_id": 6, "power_dissipation": 93.1}, {"npu_id": 7, "power_dissipation": 90.7}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:35:24", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 163.1}, {"npu_id": 1, "power_dissipation": 89.5}, {"npu_id": 2, "power_dissipation": 93.3}, {"npu_id": 3, "power_dissipation": 88.5}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 93.2}, {"npu_id": 7, "power_dissipation": 90.3}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:36:34", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 127.7}, {"npu_id": 1, "power_dissipation": 90.0}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.4}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.7}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:37:44", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 199.2}, {"npu_id": 1, "power_dissipation": 88.5}, {"npu_id": 2, "power_dissipation": 93.0}, {"npu_id": 3, "power_dissipation": 88.8}, {"npu_id": 4, "power_dissipation": 93.1}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.4}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:38:54", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 163.4}, {"npu_id": 1, "power_dissipation": 89.0}, {"npu_id": 2, "power_dissipation": 93.3}, {"npu_id": 3, "power_dissipation": 89.1}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 93.2}, {"npu_id": 7, "power_dissipation": 90.4}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:40:04", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 181.9}, {"npu_id": 1, "power_dissipation": 90.9}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.5}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 91.5}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:41:14", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 143.0}, {"npu_id": 1, "power_dissipation": 90.3}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.1}, {"npu_id": 4, "power_dissipation": 93.2}, {"npu_id": 5, "power_dissipation": 94.0}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:42:24", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 131.2}, {"npu_id": 1, "power_dissipation": 90.5}, {"npu_id": 2, "power_dissipation": 93.0}, {"npu_id": 3, "power_dissipation": 88.2}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.6}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 91.3}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:43:34", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 161.1}, {"npu_id": 1, "power_dissipation": 90.0}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.9}, {"npu_id": 6, "power_dissipation": 93.1}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:44:44", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 154.2}, {"npu_id": 1, "power_dissipation": 90.2}, {"npu_id": 2, "power_dissipation": 93.3}, {"npu_id": 3, "power_dissipation": 88.6}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 94.0}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.5}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:45:54", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 126.5}, {"npu_id": 1, "power_dissipation": 88.9}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.6}, {"npu_id": 4, "power_dissipation": 93.1}, {"npu_id": 5, "power_dissipation": 94.0}, {"npu_id": 6, "power_dissipation": 93.2}, {"npu_id": 7, "power_dissipation": 91.2}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:47:04", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 144.7}, {"npu_id": 1, "power_dissipation": 91.3}, {"npu_id": 2, "power_dissipation": 93.4}, {"npu_id": 3, "power_dissipation": 88.8}, {"npu_id": 4, "power_dissipation": 93.1}, {"npu_id": 5, "power_dissipation": 94.3}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 91.3}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:48:14", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 135.1}, {"npu_id": 1, "power_dissipation": 90.0}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.4}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.9}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.3}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:49:24", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 178.7}, {"npu_id": 1, "power_dissipation": 89.4}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.7}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.9}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.6}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:50:34", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 157.0}, {"npu_id": 1, "power_dissipation": 90.3}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 89.0}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:51:44", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 142.0}, {"npu_id": 1, "power_dissipation": 90.2}, {"npu_id": 2, "power_dissipation": 93.0}, {"npu_id": 3, "power_dissipation": 88.5}, {"npu_id": 4, "power_dissipation": 92.6}, {"npu_id": 5, "power_dissipation": 93.5}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:52:54", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 172.2}, {"npu_id": 1, "power_dissipation": 89.6}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.7}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 43}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:54:04", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 172.0}, {"npu_id": 1, "power_dissipation": 91.2}, {"npu_id": 2, "power_dissipation": 92.9}, {"npu_id": 3, "power_dissipation": 89.0}, {"npu_id": 4, "power_dissipation": 92.7}, {"npu_id": 5, "power_dissipation": 94.3}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 91.4}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:55:14", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 126.0}, {"npu_id": 1, "power_dissipation": 90.7}, {"npu_id": 2, "power_dissipation": 93.3}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.7}, {"npu_id": 5, "power_dissipation": 93.9}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 89.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:56:24", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 167.3}, {"npu_id": 1, "power_dissipation": 92.1}, {"npu_id": 2, "power_dissipation": 93.1}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.7}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 90.8}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:57:34", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 151.7}, {"npu_id": 1, "power_dissipation": 90.5}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.2}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 94.1}, {"npu_id": 6, "power_dissipation": 92.7}, {"npu_id": 7, "power_dissipation": 90.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:58:43", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 137.0}, {"npu_id": 1, "power_dissipation": 90.7}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.6}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 94.1}, {"npu_id": 6, "power_dissipation": 92.8}, {"npu_id": 7, "power_dissipation": 90.7}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 14:59:53", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 160.1}, {"npu_id": 1, "power_dissipation": 89.3}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.9}, {"npu_id": 4, "power_dissipation": 93.1}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 90.4}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 15:01:03", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 135.5}, {"npu_id": 1, "power_dissipation": 91.7}, {"npu_id": 2, "power_dissipation": 93.5}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 93.4}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 90.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 15:02:13", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 140.5}, {"npu_id": 1, "power_dissipation": 89.7}, {"npu_id": 2, "power_dissipation": 92.9}, {"npu_id": 3, "power_dissipation": 88.6}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.9}, {"npu_id": 6, "power_dissipation": 92.7}, {"npu_id": 7, "power_dissipation": 90.9}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 15:03:23", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 230.1}, {"npu_id": 1, "power_dissipation": 89.3}, {"npu_id": 2, "power_dissipation": 93.4}, {"npu_id": 3, "power_dissipation": 89.1}, {"npu_id": 4, "power_dissipation": 92.8}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 92.9}, {"npu_id": 7, "power_dissipation": 91.0}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 15:04:33", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 133.3}, {"npu_id": 1, "power_dissipation": 90.2}, {"npu_id": 2, "power_dissipation": 93.2}, {"npu_id": 3, "power_dissipation": 88.3}, {"npu_id": 4, "power_dissipation": 92.9}, {"npu_id": 5, "power_dissipation": 94.0}, {"npu_id": 6, "power_dissipation": 92.7}, {"npu_id": 7, "power_dissipation": 91.1}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 46}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
{"cur_time": "2024-09-19 15:05:43", "npu_power_dissipation": [{"npu_id": 0, "power_dissipation": 86.2}, {"npu_id": 1, "power_dissipation": 90.1}, {"npu_id": 2, "power_dissipation": 93.4}, {"npu_id": 3, "power_dissipation": 88.2}, {"npu_id": 4, "power_dissipation": 93.0}, {"npu_id": 5, "power_dissipation": 93.8}, {"npu_id": 6, "power_dissipation": 93.0}, {"npu_id": 7, "power_dissipation": 91.0}], "device_mem_usage": [{"npu_id": 0, "mem_usage_percent": 6}, {"npu_id": 1, "mem_usage_percent": 6}, {"npu_id": 2, "mem_usage_percent": 6}, {"npu_id": 3, "mem_usage_percent": 6}, {"npu_id": 4, "mem_usage_percent": 6}, {"npu_id": 5, "mem_usage_percent": 6}, {"npu_id": 6, "mem_usage_percent": 6}, {"npu_id": 7, "mem_usage_percent": 6}]}
File diff suppressed because it is too large
@@ -0,0 +1,10 @@
{
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|im_end|>"
}
@@ -0,0 +1,276 @@
# Copyright (c) Alibaba Cloud.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

"""Tokenization classes for QWen."""

import base64
import logging
import os
import unicodedata
from typing import Collection, Dict, List, Set, Tuple, Union

import tiktoken
from transformers import PreTrainedTokenizer, AddedToken

logger = logging.getLogger(__name__)


VOCAB_FILES_NAMES = {"vocab_file": "qwen.tiktoken"}

PAT_STR = r"""(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"""
ENDOFTEXT = "<|endoftext|>"
IMSTART = "<|im_start|>"
IMEND = "<|im_end|>"
# as the default behavior is changed to allow special tokens in
# regular texts, the surface forms of special tokens need to be
# as different as possible to minimize the impact
EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
# changed to use actual index to avoid misconfiguration with vocabulary expansion
SPECIAL_START_ID = 151643
SPECIAL_TOKENS = tuple(
    enumerate(
        (
            (
                ENDOFTEXT,
                IMSTART,
                IMEND,
            )
            + EXTRAS
        ),
        start=SPECIAL_START_ID,
    )
)
SPECIAL_TOKENS_SET = set(t for i, t in SPECIAL_TOKENS)


def _load_tiktoken_bpe(tiktoken_bpe_file: str) -> Dict[bytes, int]:
    with open(tiktoken_bpe_file, "rb") as f:
        contents = f.read()
    return {
        base64.b64decode(token): int(rank)
        for token, rank in (line.split() for line in contents.splitlines() if line)
    }


class QWenTokenizer(PreTrainedTokenizer):
    """QWen tokenizer."""

    vocab_files_names = VOCAB_FILES_NAMES

    def __init__(
        self,
        vocab_file,
        errors="replace",
        extra_vocab_file=None,
        **kwargs,
    ):
        super().__init__(**kwargs)

        # how to handle errors in decoding UTF-8 byte sequences
        # use "ignore" if you are doing streaming inference
        self.errors = errors

        self.mergeable_ranks = _load_tiktoken_bpe(vocab_file)  # type: Dict[bytes, int]
        self.special_tokens = {
            token: index
            for index, token in SPECIAL_TOKENS
        }

        # try to load extra vocab from file
        if extra_vocab_file is not None:
            used_ids = set(self.mergeable_ranks.values()) | set(self.special_tokens.values())
            extra_mergeable_ranks = _load_tiktoken_bpe(extra_vocab_file)
            for token, index in extra_mergeable_ranks.items():
                if token in self.mergeable_ranks:
                    logger.info(f"extra token {token} exists, skipping")
                    continue
                if index in used_ids:
                    logger.info(f"the index {index} for extra token {token} exists, skipping")
                    continue
                self.mergeable_ranks[token] = index
            # the index may be sparse after this, but don't worry: tiktoken.Encoding will handle it

        enc = tiktoken.Encoding(
            "Qwen",
            pat_str=PAT_STR,
            mergeable_ranks=self.mergeable_ranks,
            special_tokens=self.special_tokens,
        )
        assert (
            len(self.mergeable_ranks) + len(self.special_tokens) == enc.n_vocab
        ), f"{len(self.mergeable_ranks) + len(self.special_tokens)} != {enc.n_vocab} in encoding"

        self.decoder = {
            v: k for k, v in self.mergeable_ranks.items()
        }  # type: dict[int, bytes|str]
        self.decoder.update({v: k for k, v in self.special_tokens.items()})

        self.tokenizer = enc  # type: tiktoken.Encoding

        self.eod_id = self.tokenizer.eot_token
        self.im_start_id = self.special_tokens[IMSTART]
        self.im_end_id = self.special_tokens[IMEND]

    def __getstate__(self):
        # for pickle lovers
        state = self.__dict__.copy()
        del state["tokenizer"]
        return state

    def __setstate__(self, state):
        # the tokenizer is not Python-native; don't pickle it; rebuild it instead
        self.__dict__.update(state)
        enc = tiktoken.Encoding(
            "Qwen",
            pat_str=PAT_STR,
            mergeable_ranks=self.mergeable_ranks,
            special_tokens=self.special_tokens,
        )
        self.tokenizer = enc

    def __len__(self) -> int:
        return self.tokenizer.n_vocab

    def get_vocab(self) -> Dict[bytes, int]:
        return self.mergeable_ranks

    def convert_tokens_to_ids(
        self, tokens: Union[bytes, str, List[Union[bytes, str]]]
    ) -> List[int]:
        ids = []
        if isinstance(tokens, (str, bytes)):
            if tokens in self.special_tokens:
                return self.special_tokens[tokens]
            else:
                return self.mergeable_ranks.get(tokens)
        for token in tokens:
            if token in self.special_tokens:
                ids.append(self.special_tokens[token])
            else:
                ids.append(self.mergeable_ranks.get(token))
        return ids

    def _add_tokens(
        self,
        new_tokens: Union[List[str], List[AddedToken]],
        special_tokens: bool = False,
    ) -> int:
        if not special_tokens and new_tokens:
            raise ValueError("Adding regular tokens is not supported")
        for token in new_tokens:
            surface_form = token.content if isinstance(token, AddedToken) else token
            if surface_form not in SPECIAL_TOKENS_SET:
                raise ValueError("Adding unknown special tokens is not supported")
        return 0

    def save_vocabulary(self, save_directory: str, **kwargs) -> Tuple[str]:
        """
        Save only the vocabulary of the tokenizer (the BPE ranks).

        Returns:
            `Tuple[str]`: Paths to the files saved.
        """
        file_path = os.path.join(save_directory, "qwen.tiktoken")
        with open(file_path, "w", encoding="utf8") as w:
            for k, v in self.mergeable_ranks.items():
                line = base64.b64encode(k).decode("utf8") + " " + str(v) + "\n"
                w.write(line)
        return (file_path,)

    def tokenize(
        self,
        text: str,
        allowed_special: Union[Set, str] = "all",
        disallowed_special: Union[Collection, str] = (),
        **kwargs,
    ) -> List[Union[bytes, str]]:
        """
        Converts a string into a sequence of tokens.

        Args:
            text (`str`):
                The sequence to be encoded.
            allowed_special (`Literal["all"]` or `set`):
                The surface forms of the tokens to be encoded as special tokens in regular texts.
                Defaults to "all".
            disallowed_special (`Literal["all"]` or `Collection`):
                The surface forms of the tokens that should not appear in regular texts and that trigger errors.
                Defaults to an empty tuple.

            kwargs (additional keyword arguments, *optional*):
                Will be passed to the underlying model-specific encode method.

        Returns:
            `List[bytes|str]`: The list of tokens.
        """
        tokens = []
        text = unicodedata.normalize("NFC", text)

        # this implementation takes a detour: text -> token id -> token surface forms
        for t in self.tokenizer.encode(
            text, allowed_special=allowed_special, disallowed_special=disallowed_special
        ):
            tokens.append(self.decoder[t])
        return tokens

    def convert_tokens_to_string(self, tokens: List[Union[bytes, str]]) -> str:
        """
        Converts a sequence of tokens into a single string.
        """
        text = ""
        temp = b""
        for t in tokens:
            if isinstance(t, str):
                if temp:
                    text += temp.decode("utf-8", errors=self.errors)
                    temp = b""
                text += t
            elif isinstance(t, bytes):
                temp += t
            else:
                raise TypeError("token should only be of type bytes or str")
        if temp:
            text += temp.decode("utf-8", errors=self.errors)
        return text

    @property
    def vocab_size(self):
        return self.tokenizer.n_vocab

    def _convert_id_to_token(self, index: int) -> Union[bytes, str]:
        """Converts an id to a token, special tokens included"""
        if index in self.decoder:
            return self.decoder[index]
        raise ValueError("unknown id")

    def _convert_token_to_id(self, token: Union[bytes, str]) -> int:
        """Converts a token to an id using the vocab, special tokens included"""
        if token in self.special_tokens:
            return self.special_tokens[token]
        if token in self.mergeable_ranks:
            return self.mergeable_ranks[token]
        raise ValueError("unknown token")

    def _tokenize(self, text: str, **kwargs):
        """
        Converts a string into a sequence of tokens (strings), using the tokenizer. Splits into words for word-based
        vocabularies or sub-words for sub-word-based vocabularies (BPE/SentencePiece/WordPiece).

        Does NOT take care of added tokens.
        """
        raise NotImplementedError

    def _decode(
        self,
        token_ids: Union[int, List[int]],
        skip_special_tokens: bool = False,
        errors: str = None,
        **kwargs,
    ) -> str:
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        if skip_special_tokens:
            token_ids = [i for i in token_ids if i < self.eod_id]
        return self.tokenizer.decode(token_ids, errors=errors or self.errors)
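Because tokenizer_config.json (next hunk) maps AutoTokenizer to tokenization_qwen.QWenTokenizer via auto_map, the class above is loaded with trust_remote_code. A minimal usage sketch, assuming a checkpoint directory containing this file together with qwen.tiktoken and the config below; "path/to/qwen" is a placeholder:

# Minimal sketch: load the custom tokenizer above through auto_map.
# "path/to/qwen" is a placeholder for a directory holding
# tokenization_qwen.py, qwen.tiktoken, and tokenizer_config.json.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/qwen", trust_remote_code=True)
ids = tok("Hello, world!")["input_ids"]   # goes through tokenize() above
print(ids)
print(tok.decode(ids))                    # round-trips via _decode()
print(tok.im_start_id, tok.im_end_id)     # ChatML special-token ids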
@@ -0,0 +1,17 @@
{
  "added_tokens_decoder": {},
  "auto_map": {
    "AutoTokenizer": [
      "tokenization_qwen.QWenTokenizer",
      null
    ]
  },
  "chat_template": "{% set system_message = 'You are a helpful assistant.' %}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% endif %}{% if system_message is defined %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{% endif %}{% for message in loop_messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user\n' + content + '<|im_end|>\n<|im_start|>assistant\n' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '\n' }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|im_end|>",
  "model_max_length": 32768,
  "pad_token": "<|im_end|>",
  "padding_side": "right",
  "split_special_tokens": false,
  "tokenizer_class": "QWenTokenizer"
}
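The chat_template above emits ChatML: a system block (defaulting to 'You are a helpful assistant.' when no system message is supplied), then <|im_start|>/<|im_end|>-delimited turns, with an assistant header opened after each user turn. A sketch of the rendered form, with illustrative message content and the same placeholder path as above:

# Sketch: render one turn with the ChatML template above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/qwen", trust_remote_code=True)
messages = [{"role": "user", "content": "What is 1+1?"}]
print(tok.apply_chat_template(messages, tokenize=False))
# Expected rendering per the template:
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# What is 1+1?<|im_end|>
# <|im_start|>assistant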
@@ -0,0 +1,10 @@
{
  "epoch": 0.8888888888888888,
  "num_input_tokens_seen": 1321104,
  "total_flos": 5.641287949968998e+16,
  "train_loss": 1.3309755539894104,
  "train_runtime": 1830.2469,
  "train_samples_per_second": 4.371,
  "train_steps_per_second": 0.273,
  "train_tokens_per_second": 1084.007
}
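The derived rates in train_results.json are internally consistent: runtime × samples/s recovers roughly 8,000 samples and runtime × steps/s recovers roughly 500 optimizer steps. A quick check:

# Quick consistency check on the train_results.json above.
r = {
    "train_runtime": 1830.2469,
    "train_samples_per_second": 4.371,
    "train_steps_per_second": 0.273,
    "train_tokens_per_second": 1084.007,
}
print(round(r["train_runtime"] * r["train_samples_per_second"]))  # ~8000 samples
print(round(r["train_runtime"] * r["train_steps_per_second"]))    # ~500 optimizer steps
print(round(r["train_runtime"] * r["train_tokens_per_second"]))   # ~1.98M tokens
# Note: the last figure exceeds num_input_tokens_seen (1,321,104); the two
# counters are defined differently, so they need not match.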
@@ -0,0 +1,168 @@
{"current_steps": 3, "total_steps": 500, "loss": 1.5189, "learning_rate": 4.000000000000001e-06, "epoch": 0.005333333333333333, "percentage": 0.6, "cur_time": "2024-09-19 14:33:30", "elapsed_time": "0:00:12", "remaining_time": "0:35:14", "throughput": 768.4, "total_tokens": 9808}
{"current_steps": 6, "total_steps": 500, "loss": 1.5574, "learning_rate": 6e-06, "epoch": 0.010666666666666666, "percentage": 1.2, "cur_time": "2024-09-19 14:33:40", "elapsed_time": "0:00:23", "remaining_time": "0:32:00", "throughput": 827.9, "total_tokens": 19312}
{"current_steps": 9, "total_steps": 500, "loss": 1.5909, "learning_rate": 1e-05, "epoch": 0.016, "percentage": 1.8, "cur_time": "2024-09-19 14:33:51", "elapsed_time": "0:00:34", "remaining_time": "0:31:14", "throughput": 850.73, "total_tokens": 29232}
{"current_steps": 12, "total_steps": 500, "loss": 1.8082, "learning_rate": 1.6000000000000003e-05, "epoch": 0.021333333333333333, "percentage": 2.4, "cur_time": "2024-09-19 14:34:02", "elapsed_time": "0:00:44", "remaining_time": "0:30:20", "throughput": 848.54, "total_tokens": 37984}
{"current_steps": 15, "total_steps": 500, "loss": 1.7089, "learning_rate": 2e-05, "epoch": 0.02666666666666667, "percentage": 3.0, "cur_time": "2024-09-19 14:34:12", "elapsed_time": "0:00:55", "remaining_time": "0:29:47", "throughput": 806.72, "total_tokens": 44592}
{"current_steps": 18, "total_steps": 500, "loss": 1.7715, "learning_rate": 2.6000000000000002e-05, "epoch": 0.032, "percentage": 3.6, "cur_time": "2024-09-19 14:34:23", "elapsed_time": "0:01:06", "remaining_time": "0:29:33", "throughput": 791.08, "total_tokens": 52400}
{"current_steps": 21, "total_steps": 500, "loss": 1.5624, "learning_rate": 3.2000000000000005e-05, "epoch": 0.037333333333333336, "percentage": 4.2, "cur_time": "2024-09-19 14:34:34", "elapsed_time": "0:01:16", "remaining_time": "0:29:16", "throughput": 783.5, "total_tokens": 60320}
{"current_steps": 24, "total_steps": 500, "loss": 1.9382, "learning_rate": 3.8e-05, "epoch": 0.042666666666666665, "percentage": 4.8, "cur_time": "2024-09-19 14:34:45", "elapsed_time": "0:01:27", "remaining_time": "0:29:03", "throughput": 762.52, "total_tokens": 67024}
{"current_steps": 27, "total_steps": 500, "loss": 1.5455, "learning_rate": 4.4000000000000006e-05, "epoch": 0.048, "percentage": 5.4, "cur_time": "2024-09-19 14:34:56", "elapsed_time": "0:01:38", "remaining_time": "0:28:50", "throughput": 746.98, "total_tokens": 73776}
{"current_steps": 30, "total_steps": 500, "loss": 1.6186, "learning_rate": 5e-05, "epoch": 0.05333333333333334, "percentage": 6.0, "cur_time": "2024-09-19 14:35:07", "elapsed_time": "0:01:49", "remaining_time": "0:28:42", "throughput": 751.4, "total_tokens": 82592}
{"current_steps": 33, "total_steps": 500, "loss": 1.5182, "learning_rate": 5.6000000000000006e-05, "epoch": 0.058666666666666666, "percentage": 6.6, "cur_time": "2024-09-19 14:35:18", "elapsed_time": "0:02:00", "remaining_time": "0:28:30", "throughput": 748.85, "total_tokens": 90512}
{"current_steps": 36, "total_steps": 500, "loss": 1.4462, "learning_rate": 6e-05, "epoch": 0.064, "percentage": 7.2, "cur_time": "2024-09-19 14:35:29", "elapsed_time": "0:02:11", "remaining_time": "0:28:16", "throughput": 735.9, "total_tokens": 96848}
{"current_steps": 39, "total_steps": 500, "loss": 1.3399, "learning_rate": 6.6e-05, "epoch": 0.06933333333333333, "percentage": 7.8, "cur_time": "2024-09-19 14:35:39", "elapsed_time": "0:02:22", "remaining_time": "0:28:01", "throughput": 729.05, "total_tokens": 103728}
{"current_steps": 42, "total_steps": 500, "loss": 1.3188, "learning_rate": 7.2e-05, "epoch": 0.07466666666666667, "percentage": 8.4, "cur_time": "2024-09-19 14:35:50", "elapsed_time": "0:02:33", "remaining_time": "0:27:50", "throughput": 732.22, "total_tokens": 112160}
{"current_steps": 45, "total_steps": 500, "loss": 1.535, "learning_rate": 7.800000000000001e-05, "epoch": 0.08, "percentage": 9.0, "cur_time": "2024-09-19 14:36:01", "elapsed_time": "0:02:43", "remaining_time": "0:27:37", "throughput": 719.58, "total_tokens": 117984}
{"current_steps": 48, "total_steps": 500, "loss": 1.363, "learning_rate": 8.4e-05, "epoch": 0.08533333333333333, "percentage": 9.6, "cur_time": "2024-09-19 14:36:11", "elapsed_time": "0:02:54", "remaining_time": "0:27:21", "throughput": 726.53, "total_tokens": 126624}
{"current_steps": 51, "total_steps": 500, "loss": 1.5687, "learning_rate": 9e-05, "epoch": 0.09066666666666667, "percentage": 10.2, "cur_time": "2024-09-19 14:36:22", "elapsed_time": "0:03:04", "remaining_time": "0:27:07", "throughput": 727.42, "total_tokens": 134496}
{"current_steps": 54, "total_steps": 500, "loss": 1.6227, "learning_rate": 9.6e-05, "epoch": 0.096, "percentage": 10.8, "cur_time": "2024-09-19 14:36:33", "elapsed_time": "0:03:15", "remaining_time": "0:26:55", "throughput": 724.07, "total_tokens": 141664}
{"current_steps": 57, "total_steps": 500, "loss": 1.3263, "learning_rate": 9.999878153526974e-05, "epoch": 0.10133333333333333, "percentage": 11.4, "cur_time": "2024-09-19 14:36:43", "elapsed_time": "0:03:26", "remaining_time": "0:26:44", "throughput": 723.3, "total_tokens": 149360}
{"current_steps": 60, "total_steps": 500, "loss": 1.3676, "learning_rate": 9.998050575201771e-05, "epoch": 0.10666666666666667, "percentage": 12.0, "cur_time": "2024-09-19 14:36:54", "elapsed_time": "0:03:37", "remaining_time": "0:26:33", "throughput": 725.97, "total_tokens": 157776}
{"current_steps": 63, "total_steps": 500, "loss": 1.4657, "learning_rate": 9.99403068670717e-05, "epoch": 0.112, "percentage": 12.6, "cur_time": "2024-09-19 14:37:05", "elapsed_time": "0:03:47", "remaining_time": "0:26:19", "throughput": 726.29, "total_tokens": 165360}
{"current_steps": 66, "total_steps": 500, "loss": 1.3857, "learning_rate": 9.987820251299122e-05, "epoch": 0.11733333333333333, "percentage": 13.2, "cur_time": "2024-09-19 14:37:15", "elapsed_time": "0:03:58", "remaining_time": "0:26:05", "throughput": 727.34, "total_tokens": 173152}
{"current_steps": 69, "total_steps": 500, "loss": 1.549, "learning_rate": 9.979421993079852e-05, "epoch": 0.12266666666666666, "percentage": 13.8, "cur_time": "2024-09-19 14:37:25", "elapsed_time": "0:04:08", "remaining_time": "0:25:52", "throughput": 730.38, "total_tokens": 181504}
{"current_steps": 72, "total_steps": 500, "loss": 1.4207, "learning_rate": 9.968839595802982e-05, "epoch": 0.128, "percentage": 14.4, "cur_time": "2024-09-19 14:37:36", "elapsed_time": "0:04:19", "remaining_time": "0:25:43", "throughput": 727.6, "total_tokens": 188880}
{"current_steps": 75, "total_steps": 500, "loss": 1.4148, "learning_rate": 9.956077701257709e-05, "epoch": 0.13333333333333333, "percentage": 15.0, "cur_time": "2024-09-19 14:37:48", "elapsed_time": "0:04:30", "remaining_time": "0:25:33", "throughput": 724.13, "total_tokens": 195952}
{"current_steps": 78, "total_steps": 500, "loss": 1.2252, "learning_rate": 9.941141907232765e-05, "epoch": 0.13866666666666666, "percentage": 15.6, "cur_time": "2024-09-19 14:37:59", "elapsed_time": "0:04:41", "remaining_time": "0:25:23", "throughput": 723.32, "total_tokens": 203744}
{"current_steps": 81, "total_steps": 500, "loss": 1.1623, "learning_rate": 9.924038765061042e-05, "epoch": 0.144, "percentage": 16.2, "cur_time": "2024-09-19 14:38:10", "elapsed_time": "0:04:52", "remaining_time": "0:25:14", "throughput": 726.3, "total_tokens": 212624}
{"current_steps": 84, "total_steps": 500, "loss": 1.5017, "learning_rate": 9.904775776745958e-05, "epoch": 0.14933333333333335, "percentage": 16.8, "cur_time": "2024-09-19 14:38:21", "elapsed_time": "0:05:03", "remaining_time": "0:25:04", "throughput": 724.25, "total_tokens": 219968}
{"current_steps": 87, "total_steps": 500, "loss": 1.2643, "learning_rate": 9.88336139167084e-05, "epoch": 0.15466666666666667, "percentage": 17.4, "cur_time": "2024-09-19 14:38:31", "elapsed_time": "0:05:14", "remaining_time": "0:24:50", "throughput": 729.2, "total_tokens": 229024}
{"current_steps": 90, "total_steps": 500, "loss": 1.2085, "learning_rate": 9.859805002892732e-05, "epoch": 0.16, "percentage": 18.0, "cur_time": "2024-09-19 14:38:41", "elapsed_time": "0:05:24", "remaining_time": "0:24:37", "throughput": 729.81, "total_tokens": 236768}
{"current_steps": 93, "total_steps": 500, "loss": 1.333, "learning_rate": 9.834116943022298e-05, "epoch": 0.16533333333333333, "percentage": 18.6, "cur_time": "2024-09-19 14:38:52", "elapsed_time": "0:05:34", "remaining_time": "0:24:24", "throughput": 732.33, "total_tokens": 245152}
{"current_steps": 96, "total_steps": 500, "loss": 1.5669, "learning_rate": 9.806308479691595e-05, "epoch": 0.17066666666666666, "percentage": 19.2, "cur_time": "2024-09-19 14:39:02", "elapsed_time": "0:05:45", "remaining_time": "0:24:12", "throughput": 729.35, "total_tokens": 251728}
{"current_steps": 99, "total_steps": 500, "loss": 1.317, "learning_rate": 9.776391810611718e-05, "epoch": 0.176, "percentage": 19.8, "cur_time": "2024-09-19 14:39:13", "elapsed_time": "0:05:55", "remaining_time": "0:24:00", "throughput": 733.35, "total_tokens": 260832}
{"current_steps": 102, "total_steps": 500, "loss": 1.3829, "learning_rate": 9.744380058222483e-05, "epoch": 0.18133333333333335, "percentage": 20.4, "cur_time": "2024-09-19 14:39:23", "elapsed_time": "0:06:06", "remaining_time": "0:23:50", "throughput": 732.38, "total_tokens": 268448}
{"current_steps": 105, "total_steps": 500, "loss": 1.2588, "learning_rate": 9.710287263936484e-05, "epoch": 0.18666666666666668, "percentage": 21.0, "cur_time": "2024-09-19 14:39:34", "elapsed_time": "0:06:17", "remaining_time": "0:23:39", "throughput": 731.63, "total_tokens": 276016}
{"current_steps": 108, "total_steps": 500, "loss": 1.3142, "learning_rate": 9.674128381980072e-05, "epoch": 0.192, "percentage": 21.6, "cur_time": "2024-09-19 14:39:45", "elapsed_time": "0:06:28", "remaining_time": "0:23:28", "throughput": 727.67, "total_tokens": 282416}
{"current_steps": 111, "total_steps": 500, "loss": 1.4468, "learning_rate": 9.635919272833938e-05, "epoch": 0.19733333333333333, "percentage": 22.2, "cur_time": "2024-09-19 14:39:56", "elapsed_time": "0:06:38", "remaining_time": "0:23:18", "throughput": 726.96, "total_tokens": 290016}
{"current_steps": 114, "total_steps": 500, "loss": 1.5577, "learning_rate": 9.595676696276172e-05, "epoch": 0.20266666666666666, "percentage": 22.8, "cur_time": "2024-09-19 14:40:07", "elapsed_time": "0:06:49", "remaining_time": "0:23:07", "throughput": 727.14, "total_tokens": 297952}
{"current_steps": 117, "total_steps": 500, "loss": 1.3545, "learning_rate": 9.553418304030886e-05, "epoch": 0.208, "percentage": 23.4, "cur_time": "2024-09-19 14:40:18", "elapsed_time": "0:07:00", "remaining_time": "0:22:57", "throughput": 725.15, "total_tokens": 305152}
{"current_steps": 120, "total_steps": 500, "loss": 1.2359, "learning_rate": 9.50916263202557e-05, "epoch": 0.21333333333333335, "percentage": 24.0, "cur_time": "2024-09-19 14:40:29", "elapsed_time": "0:07:11", "remaining_time": "0:22:47", "throughput": 723.79, "total_tokens": 312592}
{"current_steps": 123, "total_steps": 500, "loss": 1.2377, "learning_rate": 9.462929092260628e-05, "epoch": 0.21866666666666668, "percentage": 24.6, "cur_time": "2024-09-19 14:40:39", "elapsed_time": "0:07:22", "remaining_time": "0:22:35", "throughput": 723.22, "total_tokens": 319792}
{"current_steps": 126, "total_steps": 500, "loss": 1.402, "learning_rate": 9.414737964294636e-05, "epoch": 0.224, "percentage": 25.2, "cur_time": "2024-09-19 14:40:49", "elapsed_time": "0:07:32", "remaining_time": "0:22:23", "throughput": 724.28, "total_tokens": 327776}
{"current_steps": 129, "total_steps": 500, "loss": 1.3244, "learning_rate": 9.364610386349049e-05, "epoch": 0.22933333333333333, "percentage": 25.8, "cur_time": "2024-09-19 14:41:00", "elapsed_time": "0:07:43", "remaining_time": "0:22:11", "throughput": 726.49, "total_tokens": 336432}
{"current_steps": 132, "total_steps": 500, "loss": 1.3333, "learning_rate": 9.312568346036288e-05, "epoch": 0.23466666666666666, "percentage": 26.4, "cur_time": "2024-09-19 14:41:11", "elapsed_time": "0:07:53", "remaining_time": "0:22:01", "throughput": 724.2, "total_tokens": 343216}
{"current_steps": 135, "total_steps": 500, "loss": 1.2563, "learning_rate": 9.258634670715238e-05, "epoch": 0.24, "percentage": 27.0, "cur_time": "2024-09-19 14:41:21", "elapsed_time": "0:08:04", "remaining_time": "0:21:49", "throughput": 722.89, "total_tokens": 350096}
{"current_steps": 138, "total_steps": 500, "loss": 1.3305, "learning_rate": 9.202833017478422e-05, "epoch": 0.24533333333333332, "percentage": 27.6, "cur_time": "2024-09-19 14:41:32", "elapsed_time": "0:08:14", "remaining_time": "0:21:37", "throughput": 725.77, "total_tokens": 359040}
{"current_steps": 141, "total_steps": 500, "loss": 1.2371, "learning_rate": 9.145187862775209e-05, "epoch": 0.25066666666666665, "percentage": 28.2, "cur_time": "2024-09-19 14:41:42", "elapsed_time": "0:08:25", "remaining_time": "0:21:26", "throughput": 727.69, "total_tokens": 367568}
{"current_steps": 144, "total_steps": 500, "loss": 1.2544, "learning_rate": 9.085724491675642e-05, "epoch": 0.256, "percentage": 28.8, "cur_time": "2024-09-19 14:41:52", "elapsed_time": "0:08:35", "remaining_time": "0:21:14", "throughput": 728.61, "total_tokens": 375552}
{"current_steps": 147, "total_steps": 500, "loss": 1.3928, "learning_rate": 9.02446898677957e-05, "epoch": 0.2613333333333333, "percentage": 29.4, "cur_time": "2024-09-19 14:42:03", "elapsed_time": "0:08:45", "remaining_time": "0:21:02", "throughput": 733.4, "total_tokens": 385712}
{"current_steps": 150, "total_steps": 500, "loss": 1.1703, "learning_rate": 8.961448216775954e-05, "epoch": 0.26666666666666666, "percentage": 30.0, "cur_time": "2024-09-19 14:42:13", "elapsed_time": "0:08:56", "remaining_time": "0:20:50", "throughput": 733.69, "total_tokens": 393344}
{"current_steps": 153, "total_steps": 500, "loss": 1.3133, "learning_rate": 8.896689824657372e-05, "epoch": 0.272, "percentage": 30.6, "cur_time": "2024-09-19 14:42:23", "elapsed_time": "0:09:06", "remaining_time": "0:20:39", "throughput": 733.38, "total_tokens": 400720}
{"current_steps": 156, "total_steps": 500, "loss": 1.2321, "learning_rate": 8.83022221559489e-05, "epoch": 0.2773333333333333, "percentage": 31.2, "cur_time": "2024-09-19 14:42:34", "elapsed_time": "0:09:17", "remaining_time": "0:20:28", "throughput": 734.04, "total_tokens": 409056}
{"current_steps": 159, "total_steps": 500, "loss": 1.3816, "learning_rate": 8.762074544478623e-05, "epoch": 0.2826666666666667, "percentage": 31.8, "cur_time": "2024-09-19 14:42:45", "elapsed_time": "0:09:27", "remaining_time": "0:20:17", "throughput": 732.97, "total_tokens": 416144}
{"current_steps": 162, "total_steps": 500, "loss": 1.1577, "learning_rate": 8.692276703129421e-05, "epoch": 0.288, "percentage": 32.4, "cur_time": "2024-09-19 14:42:56", "elapsed_time": "0:09:38", "remaining_time": "0:20:07", "throughput": 730.57, "total_tokens": 422880}
{"current_steps": 165, "total_steps": 500, "loss": 1.2287, "learning_rate": 8.620859307187339e-05, "epoch": 0.29333333333333333, "percentage": 33.0, "cur_time": "2024-09-19 14:43:07", "elapsed_time": "0:09:49", "remaining_time": "0:19:57", "throughput": 728.02, "total_tokens": 429440}
{"current_steps": 168, "total_steps": 500, "loss": 1.2672, "learning_rate": 8.547853682682604e-05, "epoch": 0.2986666666666667, "percentage": 33.6, "cur_time": "2024-09-19 14:43:18", "elapsed_time": "0:10:00", "remaining_time": "0:19:47", "throughput": 727.46, "total_tokens": 437200}
{"current_steps": 171, "total_steps": 500, "loss": 1.299, "learning_rate": 8.473291852294987e-05, "epoch": 0.304, "percentage": 34.2, "cur_time": "2024-09-19 14:43:29", "elapsed_time": "0:10:11", "remaining_time": "0:19:36", "throughput": 729.8, "total_tokens": 446416}
{"current_steps": 174, "total_steps": 500, "loss": 1.4358, "learning_rate": 8.397206521307584e-05, "epoch": 0.30933333333333335, "percentage": 34.8, "cur_time": "2024-09-19 14:43:39", "elapsed_time": "0:10:22", "remaining_time": "0:19:26", "throughput": 730.48, "total_tokens": 454672}
{"current_steps": 177, "total_steps": 500, "loss": 1.2488, "learning_rate": 8.319631063261209e-05, "epoch": 0.31466666666666665, "percentage": 35.4, "cur_time": "2024-09-19 14:43:50", "elapsed_time": "0:10:33", "remaining_time": "0:19:15", "throughput": 731.53, "total_tokens": 463216}
{"current_steps": 180, "total_steps": 500, "loss": 1.2692, "learning_rate": 8.240599505315655e-05, "epoch": 0.32, "percentage": 36.0, "cur_time": "2024-09-19 14:44:01", "elapsed_time": "0:10:43", "remaining_time": "0:19:04", "throughput": 733.8, "total_tokens": 472432}
{"current_steps": 183, "total_steps": 500, "loss": 1.2339, "learning_rate": 8.160146513324254e-05, "epoch": 0.3253333333333333, "percentage": 36.6, "cur_time": "2024-09-19 14:44:11", "elapsed_time": "0:10:54", "remaining_time": "0:18:53", "throughput": 733.66, "total_tokens": 479968}
{"current_steps": 186, "total_steps": 500, "loss": 1.4274, "learning_rate": 8.07830737662829e-05, "epoch": 0.33066666666666666, "percentage": 37.2, "cur_time": "2024-09-19 14:44:22", "elapsed_time": "0:11:04", "remaining_time": "0:18:42", "throughput": 734.36, "total_tokens": 488112}
{"current_steps": 189, "total_steps": 500, "loss": 1.3689, "learning_rate": 7.99511799257793e-05, "epoch": 0.336, "percentage": 37.8, "cur_time": "2024-09-19 14:44:32", "elapsed_time": "0:11:15", "remaining_time": "0:18:31", "throughput": 734.04, "total_tokens": 495808}
{"current_steps": 192, "total_steps": 500, "loss": 1.344, "learning_rate": 7.910614850786448e-05, "epoch": 0.3413333333333333, "percentage": 38.4, "cur_time": "2024-09-19 14:44:43", "elapsed_time": "0:11:26", "remaining_time": "0:18:20", "throughput": 733.19, "total_tokens": 503216}
{"current_steps": 195, "total_steps": 500, "loss": 1.2793, "learning_rate": 7.82483501712469e-05, "epoch": 0.3466666666666667, "percentage": 39.0, "cur_time": "2024-09-19 14:44:54", "elapsed_time": "0:11:37", "remaining_time": "0:18:10", "throughput": 733.77, "total_tokens": 511568}
{"current_steps": 198, "total_steps": 500, "loss": 1.5863, "learning_rate": 7.737816117462752e-05, "epoch": 0.352, "percentage": 39.6, "cur_time": "2024-09-19 14:45:05", "elapsed_time": "0:11:47", "remaining_time": "0:17:59", "throughput": 734.08, "total_tokens": 519536}
{"current_steps": 201, "total_steps": 500, "loss": 1.5119, "learning_rate": 7.649596321166024e-05, "epoch": 0.35733333333333334, "percentage": 40.2, "cur_time": "2024-09-19 14:45:15", "elapsed_time": "0:11:58", "remaining_time": "0:17:48", "throughput": 732.99, "total_tokens": 526576}
{"current_steps": 204, "total_steps": 500, "loss": 1.335, "learning_rate": 7.560214324352858e-05, "epoch": 0.3626666666666667, "percentage": 40.8, "cur_time": "2024-09-19 14:45:26", "elapsed_time": "0:12:09", "remaining_time": "0:17:37", "throughput": 732.15, "total_tokens": 533824}
{"current_steps": 207, "total_steps": 500, "loss": 1.2377, "learning_rate": 7.469709332921155e-05, "epoch": 0.368, "percentage": 41.4, "cur_time": "2024-09-19 14:45:37", "elapsed_time": "0:12:19", "remaining_time": "0:17:27", "throughput": 731.85, "total_tokens": 541472}
{"current_steps": 210, "total_steps": 500, "loss": 1.1963, "learning_rate": 7.378121045351378e-05, "epoch": 0.37333333333333335, "percentage": 42.0, "cur_time": "2024-09-19 14:45:47", "elapsed_time": "0:12:30", "remaining_time": "0:17:16", "throughput": 733.12, "total_tokens": 550144}
{"current_steps": 213, "total_steps": 500, "loss": 1.3623, "learning_rate": 7.285489635293472e-05, "epoch": 0.37866666666666665, "percentage": 42.6, "cur_time": "2024-09-19 14:45:58", "elapsed_time": "0:12:40", "remaining_time": "0:17:05", "throughput": 733.12, "total_tokens": 557872}
{"current_steps": 216, "total_steps": 500, "loss": 1.2955, "learning_rate": 7.191855733945387e-05, "epoch": 0.384, "percentage": 43.2, "cur_time": "2024-09-19 14:46:08", "elapsed_time": "0:12:51", "remaining_time": "0:16:54", "throughput": 730.69, "total_tokens": 563728}
{"current_steps": 219, "total_steps": 500, "loss": 1.2986, "learning_rate": 7.097260412230886e-05, "epoch": 0.3893333333333333, "percentage": 43.8, "cur_time": "2024-09-19 14:46:19", "elapsed_time": "0:13:02", "remaining_time": "0:16:43", "throughput": 730.21, "total_tokens": 571056}
{"current_steps": 222, "total_steps": 500, "loss": 1.0992, "learning_rate": 7.001745162784477e-05, "epoch": 0.39466666666666667, "percentage": 44.4, "cur_time": "2024-09-19 14:46:29", "elapsed_time": "0:13:12", "remaining_time": "0:16:32", "throughput": 730.81, "total_tokens": 579184}
{"current_steps": 225, "total_steps": 500, "loss": 1.437, "learning_rate": 6.905351881751372e-05, "epoch": 0.4, "percentage": 45.0, "cur_time": "2024-09-19 14:46:40", "elapsed_time": "0:13:22", "remaining_time": "0:16:21", "throughput": 730.11, "total_tokens": 586128}
{"current_steps": 228, "total_steps": 500, "loss": 1.431, "learning_rate": 6.808122850410461e-05, "epoch": 0.4053333333333333, "percentage": 45.6, "cur_time": "2024-09-19 14:46:50", "elapsed_time": "0:13:33", "remaining_time": "0:16:10", "throughput": 731.52, "total_tokens": 594848}
{"current_steps": 231, "total_steps": 500, "loss": 1.3132, "learning_rate": 6.710100716628344e-05, "epoch": 0.4106666666666667, "percentage": 46.2, "cur_time": "2024-09-19 14:47:00", "elapsed_time": "0:13:43", "remaining_time": "0:15:59", "throughput": 731.65, "total_tokens": 602544}
{"current_steps": 234, "total_steps": 500, "loss": 1.3838, "learning_rate": 6.644333233692916e-05, "epoch": 0.416, "percentage": 46.8, "cur_time": "2024-09-19 14:47:11", "elapsed_time": "0:13:53", "remaining_time": "0:15:47", "throughput": 731.42, "total_tokens": 609904}
{"current_steps": 237, "total_steps": 500, "loss": 1.1309, "learning_rate": 6.545084971874738e-05, "epoch": 0.42133333333333334, "percentage": 47.4, "cur_time": "2024-09-19 14:47:21", "elapsed_time": "0:14:04", "remaining_time": "0:15:36", "throughput": 732.82, "total_tokens": 618576}
{"current_steps": 240, "total_steps": 500, "loss": 1.3572, "learning_rate": 6.445158984722358e-05, "epoch": 0.4266666666666667, "percentage": 48.0, "cur_time": "2024-09-19 14:47:31", "elapsed_time": "0:14:14", "remaining_time": "0:15:25", "throughput": 733.87, "total_tokens": 626960}
{"current_steps": 243, "total_steps": 500, "loss": 1.3563, "learning_rate": 6.344599103076329e-05, "epoch": 0.432, "percentage": 48.6, "cur_time": "2024-09-19 14:47:41", "elapsed_time": "0:14:24", "remaining_time": "0:15:14", "throughput": 734.35, "total_tokens": 634896}
{"current_steps": 246, "total_steps": 500, "loss": 1.2947, "learning_rate": 6.243449435824276e-05, "epoch": 0.43733333333333335, "percentage": 49.2, "cur_time": "2024-09-19 14:47:52", "elapsed_time": "0:14:34", "remaining_time": "0:15:03", "throughput": 734.59, "total_tokens": 642640}
{"current_steps": 249, "total_steps": 500, "loss": 1.206, "learning_rate": 6.141754350553279e-05, "epoch": 0.44266666666666665, "percentage": 49.8, "cur_time": "2024-09-19 14:48:02", "elapsed_time": "0:14:45", "remaining_time": "0:14:52", "throughput": 735.33, "total_tokens": 650816}
{"current_steps": 252, "total_steps": 500, "loss": 1.2874, "learning_rate": 6.0395584540887963e-05, "epoch": 0.448, "percentage": 50.4, "cur_time": "2024-09-19 14:48:12", "elapsed_time": "0:14:55", "remaining_time": "0:14:41", "throughput": 735.72, "total_tokens": 658736}
{"current_steps": 255, "total_steps": 500, "loss": 1.3168, "learning_rate": 5.9369065729286245e-05, "epoch": 0.4533333333333333, "percentage": 51.0, "cur_time": "2024-09-19 14:48:23", "elapsed_time": "0:15:05", "remaining_time": "0:14:30", "throughput": 735.88, "total_tokens": 666416}
{"current_steps": 258, "total_steps": 500, "loss": 1.3401, "learning_rate": 5.833843733580512e-05, "epoch": 0.45866666666666667, "percentage": 51.6, "cur_time": "2024-09-19 14:48:33", "elapsed_time": "0:15:15", "remaining_time": "0:14:19", "throughput": 735.18, "total_tokens": 673312}
{"current_steps": 261, "total_steps": 500, "loss": 1.3921, "learning_rate": 5.730415142812059e-05, "epoch": 0.464, "percentage": 52.2, "cur_time": "2024-09-19 14:48:43", "elapsed_time": "0:15:26", "remaining_time": "0:14:08", "throughput": 735.78, "total_tokens": 681408}
{"current_steps": 264, "total_steps": 500, "loss": 1.2268, "learning_rate": 5.6266661678215216e-05, "epoch": 0.4693333333333333, "percentage": 52.8, "cur_time": "2024-09-19 14:48:53", "elapsed_time": "0:15:36", "remaining_time": "0:13:57", "throughput": 737.55, "total_tokens": 690608}
{"current_steps": 267, "total_steps": 500, "loss": 1.3181, "learning_rate": 5.522642316338268e-05, "epoch": 0.4746666666666667, "percentage": 53.4, "cur_time": "2024-09-19 14:49:03", "elapsed_time": "0:15:46", "remaining_time": "0:13:46", "throughput": 738.01, "total_tokens": 698576}
{"current_steps": 270, "total_steps": 500, "loss": 1.1667, "learning_rate": 5.418389216661579e-05, "epoch": 0.48, "percentage": 54.0, "cur_time": "2024-09-19 14:49:14", "elapsed_time": "0:15:57", "remaining_time": "0:13:35", "throughput": 739.49, "total_tokens": 707808}
{"current_steps": 273, "total_steps": 500, "loss": 1.2205, "learning_rate": 5.313952597646568e-05, "epoch": 0.48533333333333334, "percentage": 54.6, "cur_time": "2024-09-19 14:49:25", "elapsed_time": "0:16:07", "remaining_time": "0:13:24", "throughput": 737.69, "total_tokens": 714016}
{"current_steps": 276, "total_steps": 500, "loss": 1.2018, "learning_rate": 5.209378268645998e-05, "epoch": 0.49066666666666664, "percentage": 55.2, "cur_time": "2024-09-19 14:49:35", "elapsed_time": "0:16:18", "remaining_time": "0:13:14", "throughput": 737.36, "total_tokens": 721552}
{"current_steps": 279, "total_steps": 500, "loss": 1.3723, "learning_rate": 5.104712099416785e-05, "epoch": 0.496, "percentage": 55.8, "cur_time": "2024-09-19 14:49:46", "elapsed_time": "0:16:29", "remaining_time": "0:13:03", "throughput": 736.87, "total_tokens": 729104}
{"current_steps": 282, "total_steps": 500, "loss": 1.3686, "learning_rate": 5e-05, "epoch": 0.5013333333333333, "percentage": 56.4, "cur_time": "2024-09-19 14:49:57", "elapsed_time": "0:16:39", "remaining_time": "0:12:52", "throughput": 736.57, "total_tokens": 736496}
{"current_steps": 285, "total_steps": 500, "loss": 1.2406, "learning_rate": 4.895287900583216e-05, "epoch": 0.5066666666666667, "percentage": 57.0, "cur_time": "2024-09-19 14:50:08", "elapsed_time": "0:16:50", "remaining_time": "0:12:42", "throughput": 737.33, "total_tokens": 745184}
{"current_steps": 288, "total_steps": 500, "loss": 1.2107, "learning_rate": 4.790621731354003e-05, "epoch": 0.512, "percentage": 57.6, "cur_time": "2024-09-19 14:50:18", "elapsed_time": "0:17:01", "remaining_time": "0:12:31", "throughput": 736.43, "total_tokens": 752048}
{"current_steps": 291, "total_steps": 500, "loss": 1.2136, "learning_rate": 4.6860474023534335e-05, "epoch": 0.5173333333333333, "percentage": 58.2, "cur_time": "2024-09-19 14:50:29", "elapsed_time": "0:17:12", "remaining_time": "0:12:21", "throughput": 737.36, "total_tokens": 761088}
{"current_steps": 294, "total_steps": 500, "loss": 1.2939, "learning_rate": 4.5816107833384234e-05, "epoch": 0.5226666666666666, "percentage": 58.8, "cur_time": "2024-09-19 14:50:40", "elapsed_time": "0:17:22", "remaining_time": "0:12:10", "throughput": 736.53, "total_tokens": 768080}
{"current_steps": 297, "total_steps": 500, "loss": 1.2621, "learning_rate": 4.477357683661734e-05, "epoch": 0.528, "percentage": 59.4, "cur_time": "2024-09-19 14:50:51", "elapsed_time": "0:17:33", "remaining_time": "0:12:00", "throughput": 738.06, "total_tokens": 777808}
{"current_steps": 300, "total_steps": 500, "loss": 1.3968, "learning_rate": 4.373333832178478e-05, "epoch": 0.5333333333333333, "percentage": 60.0, "cur_time": "2024-09-19 14:51:02", "elapsed_time": "0:17:44", "remaining_time": "0:11:49", "throughput": 738.26, "total_tokens": 786032}
{"current_steps": 303, "total_steps": 500, "loss": 1.2069, "learning_rate": 4.269584857187943e-05, "epoch": 0.5386666666666666, "percentage": 60.6, "cur_time": "2024-09-19 14:51:12", "elapsed_time": "0:17:55", "remaining_time": "0:11:39", "throughput": 737.67, "total_tokens": 793424}
{"current_steps": 306, "total_steps": 500, "loss": 1.1725, "learning_rate": 4.166156266419489e-05, "epoch": 0.544, "percentage": 61.2, "cur_time": "2024-09-19 14:51:23", "elapsed_time": "0:18:06", "remaining_time": "0:11:28", "throughput": 737.78, "total_tokens": 801392}
{"current_steps": 309, "total_steps": 500, "loss": 1.1476, "learning_rate": 4.063093427071376e-05, "epoch": 0.5493333333333333, "percentage": 61.8, "cur_time": "2024-09-19 14:51:34", "elapsed_time": "0:18:16", "remaining_time": "0:11:17", "throughput": 737.79, "total_tokens": 809184}
{"current_steps": 312, "total_steps": 500, "loss": 1.3471, "learning_rate": 3.960441545911204e-05, "epoch": 0.5546666666666666, "percentage": 62.4, "cur_time": "2024-09-19 14:51:44", "elapsed_time": "0:18:27", "remaining_time": "0:11:07", "throughput": 737.77, "total_tokens": 816896}
{"current_steps": 315, "total_steps": 500, "loss": 1.3197, "learning_rate": 3.858245649446721e-05, "epoch": 0.56, "percentage": 63.0, "cur_time": "2024-09-19 14:51:55", "elapsed_time": "0:18:38", "remaining_time": "0:10:56", "throughput": 738.67, "total_tokens": 825952}
{"current_steps": 318, "total_steps": 500, "loss": 1.2391, "learning_rate": 3.756550564175727e-05, "epoch": 0.5653333333333334, "percentage": 63.6, "cur_time": "2024-09-19 14:52:06", "elapsed_time": "0:18:49", "remaining_time": "0:10:46", "throughput": 739.98, "total_tokens": 835664}
{"current_steps": 321, "total_steps": 500, "loss": 1.2011, "learning_rate": 3.655400896923672e-05, "epoch": 0.5706666666666667, "percentage": 64.2, "cur_time": "2024-09-19 14:52:17", "elapsed_time": "0:19:00", "remaining_time": "0:10:35", "throughput": 740.33, "total_tokens": 844288}
{"current_steps": 324, "total_steps": 500, "loss": 1.2983, "learning_rate": 3.554841015277641e-05, "epoch": 0.576, "percentage": 64.8, "cur_time": "2024-09-19 14:52:28", "elapsed_time": "0:19:11", "remaining_time": "0:10:25", "throughput": 739.48, "total_tokens": 851584}
{"current_steps": 327, "total_steps": 500, "loss": 1.3713, "learning_rate": 3.4549150281252636e-05, "epoch": 0.5813333333333334, "percentage": 65.4, "cur_time": "2024-09-19 14:52:40", "elapsed_time": "0:19:22", "remaining_time": "0:10:15", "throughput": 740.68, "total_tokens": 861216}
{"current_steps": 330, "total_steps": 500, "loss": 1.3763, "learning_rate": 3.355666766307084e-05, "epoch": 0.5866666666666667, "percentage": 66.0, "cur_time": "2024-09-19 14:52:51", "elapsed_time": "0:19:33", "remaining_time": "0:10:04", "throughput": 740.95, "total_tokens": 869792}
{"current_steps": 333, "total_steps": 500, "loss": 1.1632, "learning_rate": 3.257139763390925e-05, "epoch": 0.592, "percentage": 66.6, "cur_time": "2024-09-19 14:53:02", "elapsed_time": "0:19:44", "remaining_time": "0:09:54", "throughput": 740.41, "total_tokens": 877216}
{"current_steps": 336, "total_steps": 500, "loss": 1.2382, "learning_rate": 3.1593772365766105e-05, "epoch": 0.5973333333333334, "percentage": 67.2, "cur_time": "2024-09-19 14:53:12", "elapsed_time": "0:19:55", "remaining_time": "0:09:43", "throughput": 739.28, "total_tokens": 883488}
{"current_steps": 339, "total_steps": 500, "loss": 1.1966, "learning_rate": 3.062422067739485e-05, "epoch": 0.6026666666666667, "percentage": 67.8, "cur_time": "2024-09-19 14:53:23", "elapsed_time": "0:20:05", "remaining_time": "0:09:32", "throughput": 738.88, "total_tokens": 890816}
{"current_steps": 342, "total_steps": 500, "loss": 1.2934, "learning_rate": 2.9663167846209998e-05, "epoch": 0.608, "percentage": 68.4, "cur_time": "2024-09-19 14:53:33", "elapsed_time": "0:20:16", "remaining_time": "0:09:21", "throughput": 740.21, "total_tokens": 900352}
{"current_steps": 345, "total_steps": 500, "loss": 1.184, "learning_rate": 2.8711035421746367e-05, "epoch": 0.6133333333333333, "percentage": 69.0, "cur_time": "2024-09-19 14:53:44", "elapsed_time": "0:20:26", "remaining_time": "0:09:11", "throughput": 741.26, "total_tokens": 909312}
{"current_steps": 348, "total_steps": 500, "loss": 1.1655, "learning_rate": 2.776824104075364e-05, "epoch": 0.6186666666666667, "percentage": 69.6, "cur_time": "2024-09-19 14:53:54", "elapsed_time": "0:20:36", "remaining_time": "0:09:00", "throughput": 741.49, "total_tokens": 917168}
{"current_steps": 351, "total_steps": 500, "loss": 1.2241, "learning_rate": 2.6835198244006927e-05, "epoch": 0.624, "percentage": 70.2, "cur_time": "2024-09-19 14:54:04", "elapsed_time": "0:20:47", "remaining_time": "0:08:49", "throughput": 740.4, "total_tokens": 923424}
{"current_steps": 354, "total_steps": 500, "loss": 1.275, "learning_rate": 2.591231629491423e-05, "epoch": 0.6293333333333333, "percentage": 70.8, "cur_time": "2024-09-19 14:54:15", "elapsed_time": "0:20:57", "remaining_time": "0:08:38", "throughput": 740.37, "total_tokens": 931152}
{"current_steps": 357, "total_steps": 500, "loss": 1.2231, "learning_rate": 2.500000000000001e-05, "epoch": 0.6346666666666667, "percentage": 71.4, "cur_time": "2024-09-19 14:54:25", "elapsed_time": "0:21:08", "remaining_time": "0:08:28", "throughput": 741.9, "total_tokens": 940992}
{"current_steps": 360, "total_steps": 500, "loss": 1.3608, "learning_rate": 2.4098649531343497e-05, "epoch": 0.64, "percentage": 72.0, "cur_time": "2024-09-19 14:54:36", "elapsed_time": "0:21:18", "remaining_time": "0:08:17", "throughput": 742.73, "total_tokens": 949936}
{"current_steps": 363, "total_steps": 500, "loss": 1.3481, "learning_rate": 2.3208660251050158e-05, "epoch": 0.6453333333333333, "percentage": 72.6, "cur_time": "2024-09-19 14:54:47", "elapsed_time": "0:21:29", "remaining_time": "0:08:06", "throughput": 743.06, "total_tokens": 958432}
{"current_steps": 366, "total_steps": 500, "loss": 1.2009, "learning_rate": 2.23304225378328e-05, "epoch": 0.6506666666666666, "percentage": 73.2, "cur_time": "2024-09-19 14:54:58", "elapsed_time": "0:21:40", "remaining_time": "0:07:56", "throughput": 743.06, "total_tokens": 966688}
{"current_steps": 369, "total_steps": 500, "loss": 1.1758, "learning_rate": 2.1464321615778422e-05, "epoch": 0.656, "percentage": 73.8, "cur_time": "2024-09-19 14:55:09", "elapsed_time": "0:21:52", "remaining_time": "0:07:45", "throughput": 742.21, "total_tokens": 973872}
{"current_steps": 372, "total_steps": 500, "loss": 1.3557, "learning_rate": 2.061073738537635e-05, "epoch": 0.6613333333333333, "percentage": 74.4, "cur_time": "2024-09-19 14:55:20", "elapsed_time": "0:22:03", "remaining_time": "0:07:35", "throughput": 741.58, "total_tokens": 981264}
{"current_steps": 375, "total_steps": 500, "loss": 1.4066, "learning_rate": 1.977004425688126e-05, "epoch": 0.6666666666666666, "percentage": 75.0, "cur_time": "2024-09-19 14:55:31", "elapsed_time": "0:22:14", "remaining_time": "0:07:24", "throughput": 741.2, "total_tokens": 988800}
{"current_steps": 378, "total_steps": 500, "loss": 1.4266, "learning_rate": 1.8942610986084486e-05, "epoch": 0.672, "percentage": 75.6, "cur_time": "2024-09-19 14:55:41", "elapsed_time": "0:22:24", "remaining_time": "0:07:13", "throughput": 741.3, "total_tokens": 996656}
{"current_steps": 381, "total_steps": 500, "loss": 1.2348, "learning_rate": 1.8128800512565513e-05, "epoch": 0.6773333333333333, "percentage": 76.2, "cur_time": "2024-09-19 14:55:52", "elapsed_time": "0:22:34", "remaining_time": "0:07:03", "throughput": 741.62, "total_tokens": 1004672}
{"current_steps": 384, "total_steps": 500, "loss": 1.2, "learning_rate": 1.7328969800494726e-05, "epoch": 0.6826666666666666, "percentage": 76.8, "cur_time": "2024-09-19 14:56:02", "elapsed_time": "0:22:44", "remaining_time": "0:06:52", "throughput": 741.71, "total_tokens": 1012400}
{"current_steps": 387, "total_steps": 500, "loss": 1.2074, "learning_rate": 1.6543469682057106e-05, "epoch": 0.688, "percentage": 77.4, "cur_time": "2024-09-19 14:56:12", "elapsed_time": "0:22:55", "remaining_time": "0:06:41", "throughput": 740.89, "total_tokens": 1018768}
{"current_steps": 390, "total_steps": 500, "loss": 1.3071, "learning_rate": 1.5772644703565565e-05, "epoch": 0.6933333333333334, "percentage": 78.0, "cur_time": "2024-09-19 14:56:22", "elapsed_time": "0:23:05", "remaining_time": "0:06:30", "throughput": 740.9, "total_tokens": 1026368}
{"current_steps": 393, "total_steps": 500, "loss": 1.3392, "learning_rate": 1.5016832974331724e-05, "epoch": 0.6986666666666667, "percentage": 78.6, "cur_time": "2024-09-19 14:56:32", "elapsed_time": "0:23:15", "remaining_time": "0:06:19", "throughput": 740.7, "total_tokens": 1033664}
{"current_steps": 396, "total_steps": 500, "loss": 1.2203, "learning_rate": 1.4276366018359844e-05, "epoch": 0.704, "percentage": 79.2, "cur_time": "2024-09-19 14:56:43", "elapsed_time": "0:23:25", "remaining_time": "0:06:09", "throughput": 741.61, "total_tokens": 1042560}
{"current_steps": 399, "total_steps": 500, "loss": 1.3089, "learning_rate": 1.3551568628929434e-05, "epoch": 0.7093333333333334, "percentage": 79.8, "cur_time": "2024-09-19 14:56:53", "elapsed_time": "0:23:36", "remaining_time": "0:05:58", "throughput": 741.57, "total_tokens": 1050336}
{"current_steps": 402, "total_steps": 500, "loss": 1.3856, "learning_rate": 1.2842758726130283e-05, "epoch": 0.7146666666666667, "percentage": 80.4, "cur_time": "2024-09-19 14:57:04", "elapsed_time": "0:23:47", "remaining_time": "0:05:47", "throughput": 741.16, "total_tokens": 1057824}
{"current_steps": 405, "total_steps": 500, "loss": 1.4345, "learning_rate": 1.2150247217412186e-05, "epoch": 0.72, "percentage": 81.0, "cur_time": "2024-09-19 14:57:15", "elapsed_time": "0:23:58", "remaining_time": "0:05:37", "throughput": 740.73, "total_tokens": 1065184}
{"current_steps": 408, "total_steps": 500, "loss": 1.3059, "learning_rate": 1.1474337861210543e-05, "epoch": 0.7253333333333334, "percentage": 81.6, "cur_time": "2024-09-19 14:57:25", "elapsed_time": "0:24:08", "remaining_time": "0:05:26", "throughput": 740.11, "total_tokens": 1072016}
{"current_steps": 411, "total_steps": 500, "loss": 1.1952, "learning_rate": 1.0815327133708015e-05, "epoch": 0.7306666666666667, "percentage": 82.2, "cur_time": "2024-09-19 14:57:36", "elapsed_time": "0:24:19", "remaining_time": "0:05:15", "throughput": 739.9, "total_tokens": 1079584}
{"current_steps": 414, "total_steps": 500, "loss": 1.1284, "learning_rate": 1.0173504098790187e-05, "epoch": 0.736, "percentage": 82.8, "cur_time": "2024-09-19 14:57:46", "elapsed_time": "0:24:29", "remaining_time": "0:05:05", "throughput": 740.43, "total_tokens": 1087984}
{"current_steps": 417, "total_steps": 500, "loss": 1.1887, "learning_rate": 9.549150281252633e-06, "epoch": 0.7413333333333333, "percentage": 83.4, "cur_time": "2024-09-19 14:57:57", "elapsed_time": "0:24:39", "remaining_time": "0:04:54", "throughput": 740.7, "total_tokens": 1096032}
{"current_steps": 420, "total_steps": 500, "loss": 1.1508, "learning_rate": 8.9425395433148e-06, "epoch": 0.7466666666666667, "percentage": 84.0, "cur_time": "2024-09-19 14:58:07", "elapsed_time": "0:24:50", "remaining_time": "0:04:43", "throughput": 742.05, "total_tokens": 1105696}
{"current_steps": 423, "total_steps": 500, "loss": 1.2795, "learning_rate": 8.353937964495029e-06, "epoch": 0.752, "percentage": 84.6, "cur_time": "2024-09-19 14:58:17", "elapsed_time": "0:25:00", "remaining_time": "0:04:33", "throughput": 742.46, "total_tokens": 1114048}
{"current_steps": 426, "total_steps": 500, "loss": 1.4123, "learning_rate": 7.783603724899257e-06, "epoch": 0.7573333333333333, "percentage": 85.2, "cur_time": "2024-09-19 14:58:28", "elapsed_time": "0:25:11", "remaining_time": "0:04:22", "throughput": 742.87, "total_tokens": 1122528}
{"current_steps": 429, "total_steps": 500, "loss": 1.2925, "learning_rate": 7.2317869919746705e-06, "epoch": 0.7626666666666667, "percentage": 85.8, "cur_time": "2024-09-19 14:58:39", "elapsed_time": "0:25:21", "remaining_time": "0:04:11", "throughput": 743.46, "total_tokens": 1131328}
{"current_steps": 432, "total_steps": 500, "loss": 1.1505, "learning_rate": 6.698729810778065e-06, "epoch": 0.768, "percentage": 86.4, "cur_time": "2024-09-19 14:58:50", "elapsed_time": "0:25:32", "remaining_time": "0:04:01", "throughput": 743.16, "total_tokens": 1138992}
{"current_steps": 435, "total_steps": 500, "loss": 1.3429, "learning_rate": 6.184665997806832e-06, "epoch": 0.7733333333333333, "percentage": 87.0, "cur_time": "2024-09-19 14:59:00", "elapsed_time": "0:25:42", "remaining_time": "0:03:50", "throughput": 743.91, "total_tokens": 1147776}
{"current_steps": 438, "total_steps": 500, "loss": 1.2439, "learning_rate": 5.689821038439263e-06, "epoch": 0.7786666666666666, "percentage": 87.6, "cur_time": "2024-09-19 14:59:11", "elapsed_time": "0:25:53", "remaining_time": "0:03:39", "throughput": 744.84, "total_tokens": 1157360}
{"current_steps": 441, "total_steps": 500, "loss": 1.2647, "learning_rate": 5.214411988029355e-06, "epoch": 0.784, "percentage": 88.2, "cur_time": "2024-09-19 14:59:21", "elapsed_time": "0:26:04", "remaining_time": "0:03:29", "throughput": 745.41, "total_tokens": 1166256}
{"current_steps": 444, "total_steps": 500, "loss": 1.3183, "learning_rate": 4.758647376699032e-06, "epoch": 0.7893333333333333, "percentage": 88.8, "cur_time": "2024-09-19 14:59:32", "elapsed_time": "0:26:15", "remaining_time": "0:03:18", "throughput": 745.27, "total_tokens": 1173840}
{"current_steps": 447, "total_steps": 500, "loss": 1.3921, "learning_rate": 4.322727117869951e-06, "epoch": 0.7946666666666666, "percentage": 89.4, "cur_time": "2024-09-19 14:59:42", "elapsed_time": "0:26:25", "remaining_time": "0:03:07", "throughput": 744.81, "total_tokens": 1180848}
{"current_steps": 450, "total_steps": 500, "loss": 1.3069, "learning_rate": 3.90684242057498e-06, "epoch": 0.8, "percentage": 90.0, "cur_time": "2024-09-19 14:59:53", "elapsed_time": "0:26:36", "remaining_time": "0:02:57", "throughput": 745.37, "total_tokens": 1189952}
{"current_steps": 453, "total_steps": 500, "loss": 1.228, "learning_rate": 3.511175705587433e-06, "epoch": 0.8053333333333333, "percentage": 90.6, "cur_time": "2024-09-19 15:00:04", "elapsed_time": "0:26:47", "remaining_time": "0:02:46", "throughput": 745.54, "total_tokens": 1198336}
{"current_steps": 456, "total_steps": 500, "loss": 1.264, "learning_rate": 3.1359005254054273e-06, "epoch": 0.8106666666666666, "percentage": 91.2, "cur_time": "2024-09-19 15:00:15", "elapsed_time": "0:26:57", "remaining_time": "0:02:36", "throughput": 745.65, "total_tokens": 1206304}
{"current_steps": 459, "total_steps": 500, "loss": 1.3767, "learning_rate": 2.7811814881259503e-06, "epoch": 0.816, "percentage": 91.8, "cur_time": "2024-09-19 15:00:25", "elapsed_time": "0:27:08", "remaining_time": "0:02:25", "throughput": 745.73, "total_tokens": 1214480}
{"current_steps": 462, "total_steps": 500, "loss": 1.2163, "learning_rate": 2.4471741852423237e-06, "epoch": 0.8213333333333334, "percentage": 92.4, "cur_time": "2024-09-19 15:00:36", "elapsed_time": "0:27:19", "remaining_time": "0:02:14", "throughput": 745.84, "total_tokens": 1222608}
{"current_steps": 465, "total_steps": 500, "loss": 1.3609, "learning_rate": 2.134025123396638e-06, "epoch": 0.8266666666666667, "percentage": 93.0, "cur_time": "2024-09-19 15:00:47", "elapsed_time": "0:27:29", "remaining_time": "0:02:04", "throughput": 745.32, "total_tokens": 1229568}
{"current_steps": 468, "total_steps": 500, "loss": 1.2252, "learning_rate": 1.841871660117095e-06, "epoch": 0.832, "percentage": 93.6, "cur_time": "2024-09-19 15:00:57", "elapsed_time": "0:27:40", "remaining_time": "0:01:53", "throughput": 744.92, "total_tokens": 1236656}
{"current_steps": 471, "total_steps": 500, "loss": 1.3446, "learning_rate": 1.5708419435684462e-06, "epoch": 0.8373333333333334, "percentage": 94.2, "cur_time": "2024-09-19 15:01:08", "elapsed_time": "0:27:51", "remaining_time": "0:01:42", "throughput": 744.33, "total_tokens": 1243776}
{"current_steps": 474, "total_steps": 500, "loss": 1.1298, "learning_rate": 1.3210548563419856e-06, "epoch": 0.8426666666666667, "percentage": 94.8, "cur_time": "2024-09-19 15:01:19", "elapsed_time": "0:28:02", "remaining_time": "0:01:32", "throughput": 745.02, "total_tokens": 1253296}
{"current_steps": 477, "total_steps": 500, "loss": 1.4383, "learning_rate": 1.0926199633097157e-06, "epoch": 0.848, "percentage": 95.4, "cur_time": "2024-09-19 15:01:30", "elapsed_time": "0:28:12", "remaining_time": "0:01:21", "throughput": 745.14, "total_tokens": 1261440}
{"current_steps": 480, "total_steps": 500, "loss": 1.4809, "learning_rate": 8.856374635655695e-07, "epoch": 0.8533333333333334, "percentage": 96.0, "cur_time": "2024-09-19 15:01:41", "elapsed_time": "0:28:23", "remaining_time": "0:01:10", "throughput": 744.5, "total_tokens": 1268336}
{"current_steps": 483, "total_steps": 500, "loss": 1.4316, "learning_rate": 7.001981464747565e-07, "epoch": 0.8586666666666667, "percentage": 96.6, "cur_time": "2024-09-19 15:01:51", "elapsed_time": "0:28:33", "remaining_time": "0:01:00", "throughput": 744.62, "total_tokens": 1276224}
{"current_steps": 486, "total_steps": 500, "loss": 1.1178, "learning_rate": 5.363833518505834e-07, "epoch": 0.864, "percentage": 97.2, "cur_time": "2024-09-19 15:02:01", "elapsed_time": "0:28:44", "remaining_time": "0:00:49", "throughput": 744.67, "total_tokens": 1284112}
{"current_steps": 489, "total_steps": 500, "loss": 1.3764, "learning_rate": 3.9426493427611177e-07, "epoch": 0.8693333333333333, "percentage": 97.8, "cur_time": "2024-09-19 15:02:12", "elapsed_time": "0:28:54", "remaining_time": "0:00:39", "throughput": 745.29, "total_tokens": 1292800}
{"current_steps": 492, "total_steps": 500, "loss": 1.4021, "learning_rate": 2.7390523158633554e-07, "epoch": 0.8746666666666667, "percentage": 98.4, "cur_time": "2024-09-19 15:02:22", "elapsed_time": "0:29:04", "remaining_time": "0:00:28", "throughput": 745.42, "total_tokens": 1300656}
{"current_steps": 495, "total_steps": 500, "loss": 1.3095, "learning_rate": 1.753570375247815e-07, "epoch": 0.88, "percentage": 99.0, "cur_time": "2024-09-19 15:02:32", "elapsed_time": "0:29:15", "remaining_time": "0:00:17", "throughput": 746.25, "total_tokens": 1309792}
{"current_steps": 498, "total_steps": 500, "loss": 1.23, "learning_rate": 9.866357858642205e-08, "epoch": 0.8853333333333333, "percentage": 99.6, "cur_time": "2024-09-19 15:02:43", "elapsed_time": "0:29:25", "remaining_time": "0:00:07", "throughput": 745.22, "total_tokens": 1316000}
{"current_steps": 500, "total_steps": 500, "eval_loss": 1.272219181060791, "epoch": 0.8888888888888888, "percentage": 100.0, "cur_time": "2024-09-19 15:03:46", "elapsed_time": "0:30:29", "remaining_time": "0:00:00", "throughput": 722.22, "total_tokens": 1321104}
{"current_steps": 500, "total_steps": 500, "epoch": 0.8888888888888888, "percentage": 100.0, "cur_time": "2024-09-19 15:03:47", "elapsed_time": "0:30:30", "remaining_time": "0:00:00", "throughput": 721.82, "total_tokens": 1321104}
[Remaining diffs omitted: one large file diff suppressed; binary files not shown, including two images (25 KiB and 60 KiB).]