diff --git a/README.md b/README.md
index 80ab347f..14af3f46 100644
--- a/README.md
+++ b/README.md
@@ -200,6 +200,9 @@ You also can add a custom chat template to [template.py](src/llamafactory/data/t
 | ORPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
 | SimPO Training | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
 
+> [!TIP]
+> The implementation details of PPO can be found in [this blog](https://newfacade.github.io/notes-on-reinforcement-learning/17-ppo-trl.html).
+
 ## Provided Datasets
 
 <details><summary>Pre-training datasets</summary>
diff --git a/README_zh.md b/README_zh.md
index 962dcf43..578d2960 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -200,6 +200,9 @@ https://github.com/user-attachments/assets/e6ce34b0-52d5-4f3e-a830-592106c4c272
 | ORPO 训练 | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
 | SimPO 训练 | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
 
+> [!TIP]
+> 有关 PPO 的实现细节,请参考[此博客](https://newfacade.github.io/notes-on-reinforcement-learning/17-ppo-trl.html)。
+
 ## 数据集
 
 <details><summary>预训练数据集</summary>
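For context on what the new tip points at: PPO training in LLaMA-Factory is launched through its CLI with a YAML config. Below is a minimal sketch, assuming the example config shipped in the repository; the config path is illustrative and may differ between versions.

```bash
# Minimal sketch: launching PPO training via the LLaMA-Factory CLI.
# The config path below is the repo's example at the time of writing;
# substitute your own YAML if it has moved or been renamed.
llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
```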