punk02/logs/logs.txt

12477 lines
3.3 MiB

/data02/users/lz/miniconda3/envs/cpm/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/data02/users/lz/miniconda3/envs/cpm/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/data02/users/lz/miniconda3/envs/cpm/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/data02/users/lz/miniconda3/envs/cpm/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Generating train split: 61458 examples [00:00, 381522.15 examples/s]
Generating train split: 200 examples [00:00, 82467.64 examples/s]
Map (num_proc=16): 100%|██████████| 61458/61458 [00:06<00:00, 10712.26 examples/s]
Map (num_proc=16): 100%|██████████| 61458/61458 [00:05<00:00, 8955.10 examples/s]
Map (num_proc=16): 100%|██████████| 61458/61458 [00:06<00:00, 9808.04 examples/s]
Map (num_proc=16): 100%|██████████| 61458/61458 [00:06<00:00, 9909.86 examples/s]
Map (num_proc=16): 100%|██████████| 61458/61458 [00:06<00:00, 9633.20 examples/s]
Map (num_proc=16): 100%|██████████| 61458/61458 [00:06<00:00, 9690.66 examples/s]
Map (num_proc=16): 100%|██████████| 200/200 [00:02<00:00, 86.60 examples/s]
Map (num_proc=16): 94%|█████████▍| 188/200 [00:01<00:00, 140.09 examples/s] Map (num_proc=16): 100%|██████████| 200/200 [00:02<00:00, 122.15 examples/s] Map (num_proc=16): 100%|██████████| 200/200 [00:01<00:00, 133.87 examples/s]{'input': "Write a python function to generate a bot response for a user message using random choice.\nassert generate_bot_response('hello') in ['Hi', 'Hi there', 'Hello, how are you?', 'Hey there']\nassert generate_bot_response('exit') == None", 'output': 'import random\n\ndef generate_bot_response(user_message):\n responses = {\n \'hello\': [\'Hi\', \'Hi there\', \'Hello, how are you?\', \'Hey there\'],\n \'exit\': None\n }\n \n if user_message in responses:\n if responses[user_message] is None:\n return None\n return random.choice(responses[user_message])\n return "I\'m not sure how to respond to that."', 'input_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1786, 4194, 95388, 10643, 1348, 7467, 1914, 1385, 7514, 1348, 14827, 3455, 1421, 1348, 3060, 4290, 2067, 4663, 6344, 72, 5, 3313, 7514, 
95360, 6730, 95360, 5570, 2249, 17751, 5589, 1377, 11465, 24252, 2342, 1772, 24252, 1887, 2342, 1772, 15934, 95342, 1980, 1502, 1449, 74, 2342, 1772, 36969, 1887, 7965, 5, 3313, 7514, 95360, 6730, 95360, 5570, 2249, 10002, 5589, 95320, 64, 64, 4090, 95396, 10850, 95388, 1724, 4663, 5, 5, 1962, 7514, 95360, 6730, 95360, 5570, 95348, 1836, 95360, 4635, 3307, 5, 1354, 50371, 95320, 64, 1504, 5, 1395, 95361, 17751, 5704, 11465, 24252, 2342, 1772, 24252, 1887, 2342, 1772, 15934, 95342, 1980, 1502, 1449, 74, 2342, 1772, 36969, 1887, 24779, 5, 1395, 95361, 10002, 5704, 4090, 5, 1354, 95369, 5, 1354, 5, 1354, 1436, 3060, 95360, 4635, 1377, 7579, 95358, 5, 1395, 1436, 7579, 95399, 1836, 95360, 4635, 95400, 1410, 4090, 95358, 5, 1535, 1720, 4090, 5, 1395, 1720, 4663, 72, 24045, 95348, 50371, 95399, 1836, 95360, 4635, 5049, 5, 1354, 1720, 1496, 95355, 95361, 95335, 1573, 3221, 1980, 1385, 9800, 1385, 1457, 72, 95354, 2], 'attention_mask': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 95388, 10643, 1348, 7467, 1914, 1385, 7514, 1348, 14827, 3455, 1421, 1348, 3060, 4290, 2067, 4663, 6344, 72, 5, 3313, 7514, 95360, 6730, 95360, 5570, 2249, 17751, 5589, 1377, 11465, 24252, 2342, 1772, 24252, 1887, 2342, 1772, 15934, 95342, 1980, 1502, 1449, 74, 2342, 1772, 36969, 1887, 7965, 5, 3313, 7514, 95360, 6730, 95360, 5570, 2249, 10002, 5589, 95320, 64, 64, 4090, 95396, 10850, 95388, 1724, 4663, 5, 5, 1962, 7514, 95360, 6730, 95360, 5570, 95348, 1836, 95360, 4635, 3307, 5, 1354, 50371, 95320, 64, 1504, 5, 1395, 95361, 17751, 5704, 11465, 24252, 2342, 1772, 24252, 1887, 2342, 1772, 15934, 95342, 1980, 1502, 1449, 74, 2342, 1772, 36969, 1887, 24779, 5, 1395, 95361, 10002, 5704, 4090, 5, 1354, 95369, 5, 1354, 5, 1354, 1436, 3060, 95360, 4635, 1377, 7579, 95358, 5, 1395, 1436, 7579, 95399, 1836, 95360, 4635, 95400, 1410, 4090, 95358, 5, 1535, 1720, 4090, 5, 1395, 1720, 4663, 72, 24045, 95348, 50371, 95399, 1836, 95360, 4635, 5049, 5, 1354, 1720, 1496, 95355, 95361, 95335, 1573, 3221, 1980, 1385, 9800, 1385, 1457, 72, 95354, 2]}
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Map (num_proc=16): 100%|██████████| 200/200 [00:02<00:00, 86.10 examples/s]
Map (num_proc=16): 100%|██████████| 200/200 [00:02<00:00, 98.31 examples/s]
Map (num_proc=16): 100%|██████████| 200/200 [00:02<00:00, 92.23 examples/s]
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Tracking run with wandb version 0.18.6
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
0%| | 0/11526 [00:00<?, ?it/s]
[W reducer.cpp:1346] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1346] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1346] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1346] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
0%| | 1/11526 [00:01<5:15:28, 1.64s/it] {'loss': 1.0949, 'grad_norm': 3.4261324405670166, 'learning_rate': 8.673026886383348e-09, 'epoch': 0.0}
0%| | 2/11526 [00:02<3:12:09, 1.00s/it] {'loss': 1.0156, 'grad_norm': 2.793147563934326, 'learning_rate': 1.7346053772766696e-08, 'epoch': 0.0}
0%| | 3/11526 [00:02<2:38:27, 1.21it/s] {'loss': 0.8763, 'grad_norm': 2.143695116043091, 'learning_rate': 2.6019080659150047e-08, 'epoch': 0.0}
0%| | 4/11526 [00:03<2:22:34, 1.35it/s] {'loss': 0.8775, 'grad_norm': 2.126295804977417, 'learning_rate': 3.469210754553339e-08, 'epoch': 0.0}
0%| | 5/11526 [00:04<2:13:52, 1.43it/s] {'loss': 0.9212, 'grad_norm': 2.863893747329712, 'learning_rate': 4.336513443191674e-08, 'epoch': 0.0}
0%| | 6/11526 [00:04<2:08:42, 1.49it/s] {'loss': 0.8643, 'grad_norm': 1.941997766494751, 'learning_rate': 5.203816131830009e-08, 'epoch': 0.0}
0%| | 7/11526 [00:05<2:05:15, 1.53it/s] {'loss': 0.9857, 'grad_norm': 2.772730588912964, 'learning_rate': 6.071118820468345e-08, 'epoch': 0.0}
0%| | 8/11526 [00:05<2:03:10, 1.56it/s] {'loss': 0.8731, 'grad_norm': 2.11067533493042, 'learning_rate': 6.938421509106678e-08, 'epoch': 0.0}
0%| | 9/11526 [00:06<2:01:50, 1.58it/s] {'loss': 0.9733, 'grad_norm': 2.9080772399902344, 'learning_rate': 7.805724197745014e-08, 'epoch': 0.0}
0%| | 10/11526 [00:07<2:00:47, 1.59it/s] {'loss': 0.855, 'grad_norm': 2.449317216873169, 'learning_rate': 8.673026886383348e-08, 'epoch': 0.0}
0%| | 11/11526 [00:07<2:00:09, 1.60it/s] {'loss': 0.9987, 'grad_norm': 2.7982800006866455, 'learning_rate': 9.540329575021684e-08, 'epoch': 0.0}
0%| | 12/11526 [00:08<1:59:40, 1.60it/s] {'loss': 1.0395, 'grad_norm': 3.0237698554992676, 'learning_rate': 1.0407632263660019e-07, 'epoch': 0.0}
0%| | 13/11526 [00:08<1:59:19, 1.61it/s] {'loss': 0.9886, 'grad_norm': 2.75026798248291, 'learning_rate': 1.1274934952298352e-07, 'epoch': 0.0}
0%| | 14/11526 [00:09<1:59:16, 1.61it/s] {'loss': 1.0184, 'grad_norm': 2.9731454849243164, 'learning_rate': 1.214223764093669e-07, 'epoch': 0.0}
0%| | 15/11526 [00:10<1:59:02, 1.61it/s] {'loss': 1.078, 'grad_norm': 3.5550379753112793, 'learning_rate': 1.3009540329575022e-07, 'epoch': 0.0}
0%| | 16/11526 [00:10<1:58:52, 1.61it/s] {'loss': 0.893, 'grad_norm': 3.8476381301879883, 'learning_rate': 1.3876843018213356e-07, 'epoch': 0.0}
0%| | 17/11526 [00:11<1:58:50, 1.61it/s] {'loss': 0.9109, 'grad_norm': 2.2631473541259766, 'learning_rate': 1.4744145706851694e-07, 'epoch': 0.0}
0%| | 18/11526 [00:12<1:58:41, 1.62it/s] {'loss': 0.9527, 'grad_norm': 3.2427501678466797, 'learning_rate': 1.5611448395490029e-07, 'epoch': 0.0}
0%| | 19/11526 [00:12<1:58:50, 1.61it/s] {'loss': 1.0022, 'grad_norm': 2.5299832820892334, 'learning_rate': 1.647875108412836e-07, 'epoch': 0.0}
0%| | 20/11526 [00:13<1:58:49, 1.61it/s] {'loss': 0.9819, 'grad_norm': 2.721240520477295, 'learning_rate': 1.7346053772766696e-07, 'epoch': 0.01}
0%| | 21/11526 [00:13<1:58:42, 1.62it/s] {'loss': 0.9274, 'grad_norm': 2.557347059249878, 'learning_rate': 1.821335646140503e-07, 'epoch': 0.01}
0%| | 22/11526 [00:14<1:58:29, 1.62it/s] {'loss': 0.9244, 'grad_norm': 2.724199056625366, 'learning_rate': 1.9080659150043368e-07, 'epoch': 0.01}
0%| | 23/11526 [00:15<1:58:15, 1.62it/s] {'loss': 0.8573, 'grad_norm': 2.356158494949341, 'learning_rate': 1.9947961838681703e-07, 'epoch': 0.01}
0%| | 24/11526 [00:15<1:58:25, 1.62it/s] {'loss': 0.8442, 'grad_norm': 2.3353495597839355, 'learning_rate': 2.0815264527320037e-07, 'epoch': 0.01}
0%| | 25/11526 [00:16<1:58:15, 1.62it/s] {'loss': 0.9984, 'grad_norm': 2.8589794635772705, 'learning_rate': 2.1682567215958372e-07, 'epoch': 0.01}
0%| | 26/11526 [00:17<1:58:12, 1.62it/s] {'loss': 1.0249, 'grad_norm': 2.319955348968506, 'learning_rate': 2.2549869904596704e-07, 'epoch': 0.01}
0%| | 27/11526 [00:17<1:58:19, 1.62it/s] {'loss': 0.9097, 'grad_norm': 2.331475257873535, 'learning_rate': 2.341717259323504e-07, 'epoch': 0.01}
0%| | 28/11526 [00:18<1:58:18, 1.62it/s] {'loss': 0.9434, 'grad_norm': 2.422861099243164, 'learning_rate': 2.428447528187338e-07, 'epoch': 0.01}
0%| | 29/11526 [00:18<1:58:23, 1.62it/s] {'loss': 0.8909, 'grad_norm': 2.466123580932617, 'learning_rate': 2.5151777970511714e-07, 'epoch': 0.01}
0%| | 30/11526 [00:19<1:58:19, 1.62it/s] {'loss': 0.9338, 'grad_norm': 2.343005657196045, 'learning_rate': 2.6019080659150043e-07, 'epoch': 0.01}
0%| | 31/11526 [00:20<1:58:19, 1.62it/s] {'loss': 0.9823, 'grad_norm': 2.707864999771118, 'learning_rate': 2.688638334778838e-07, 'epoch': 0.01}
0%| | 32/11526 [00:20<1:58:22, 1.62it/s] {'loss': 1.0908, 'grad_norm': 3.574810028076172, 'learning_rate': 2.7753686036426713e-07, 'epoch': 0.01}
0%| | 33/11526 [00:21<1:58:20, 1.62it/s] {'loss': 1.1014, 'grad_norm': 3.069289207458496, 'learning_rate': 2.862098872506505e-07, 'epoch': 0.01}
0%| | 34/11526 [00:21<1:58:29, 1.62it/s] {'loss': 0.9573, 'grad_norm': 2.73120379447937, 'learning_rate': 2.948829141370339e-07, 'epoch': 0.01}
0%| | 35/11526 [00:22<1:58:27, 1.62it/s] {'loss': 0.9543, 'grad_norm': 2.78043270111084, 'learning_rate': 3.035559410234172e-07, 'epoch': 0.01}
0%| | 36/11526 [00:23<1:58:25, 1.62it/s] {'loss': 0.9428, 'grad_norm': 3.141796827316284, 'learning_rate': 3.1222896790980057e-07, 'epoch': 0.01}
0%| | 37/11526 [00:23<1:58:22, 1.62it/s] {'loss': 1.0225, 'grad_norm': 3.2343151569366455, 'learning_rate': 3.2090199479618387e-07, 'epoch': 0.01}
0%| | 38/11526 [00:24<1:58:19, 1.62it/s] {'loss': 0.8858, 'grad_norm': 2.3611724376678467, 'learning_rate': 3.295750216825672e-07, 'epoch': 0.01}
0%| | 39/11526 [00:25<1:58:22, 1.62it/s] {'loss': 0.9635, 'grad_norm': 2.764070510864258, 'learning_rate': 3.3824804856895056e-07, 'epoch': 0.01}
0%| | 40/11526 [00:25<1:58:23, 1.62it/s] {'loss': 0.9833, 'grad_norm': 2.382430076599121, 'learning_rate': 3.469210754553339e-07, 'epoch': 0.01}
0%| | 41/11526 [00:26<1:58:18, 1.62it/s] {'loss': 0.9648, 'grad_norm': 2.3069510459899902, 'learning_rate': 3.5559410234171726e-07, 'epoch': 0.01}
0%| | 42/11526 [00:26<1:58:18, 1.62it/s] {'loss': 1.0278, 'grad_norm': 2.188366413116455, 'learning_rate': 3.642671292281006e-07, 'epoch': 0.01}
0%| | 43/11526 [00:27<1:58:15, 1.62it/s] {'loss': 0.8541, 'grad_norm': 2.3554527759552, 'learning_rate': 3.72940156114484e-07, 'epoch': 0.01}
0%| | 44/11526 [00:28<1:58:35, 1.61it/s] {'loss': 0.9811, 'grad_norm': 2.698211193084717, 'learning_rate': 3.8161318300086735e-07, 'epoch': 0.01}
0%| | 45/11526 [00:28<1:58:28, 1.62it/s] {'loss': 0.9443, 'grad_norm': 2.33135724067688, 'learning_rate': 3.902862098872507e-07, 'epoch': 0.01}
0%| | 46/11526 [00:29<1:58:25, 1.62it/s] {'loss': 0.9664, 'grad_norm': 2.3470664024353027, 'learning_rate': 3.9895923677363405e-07, 'epoch': 0.01}
0%| | 47/11526 [00:30<1:58:23, 1.62it/s] {'loss': 1.0037, 'grad_norm': 2.3067004680633545, 'learning_rate': 4.076322636600174e-07, 'epoch': 0.01}
0%| | 48/11526 [00:30<1:58:21, 1.62it/s] {'loss': 0.8985, 'grad_norm': 2.5476326942443848, 'learning_rate': 4.1630529054640075e-07, 'epoch': 0.01}
0%| | 49/11526 [00:31<1:58:27, 1.61it/s] {'loss': 0.8556, 'grad_norm': 2.472865104675293, 'learning_rate': 4.249783174327841e-07, 'epoch': 0.01}
0%| | 50/11526 [00:31<1:58:21, 1.62it/s] {'loss': 1.0094, 'grad_norm': 2.755929708480835, 'learning_rate': 4.3365134431916744e-07, 'epoch': 0.01}
0%| | 51/11526 [00:32<1:58:18, 1.62it/s] {'loss': 0.9995, 'grad_norm': 3.1631948947906494, 'learning_rate': 4.4232437120555074e-07, 'epoch': 0.01}
0%| | 52/11526 [00:33<1:58:13, 1.62it/s] {'loss': 0.831, 'grad_norm': 2.4676876068115234, 'learning_rate': 4.509973980919341e-07, 'epoch': 0.01}
0%| | 53/11526 [00:33<1:58:11, 1.62it/s] {'loss': 0.8687, 'grad_norm': 2.0569050312042236, 'learning_rate': 4.5967042497831743e-07, 'epoch': 0.01}
0%| | 54/11526 [00:34<1:58:13, 1.62it/s] {'loss': 0.8908, 'grad_norm': 2.7526803016662598, 'learning_rate': 4.683434518647008e-07, 'epoch': 0.01}
0%| | 55/11526 [00:34<1:58:09, 1.62it/s] {'loss': 0.9405, 'grad_norm': 2.6636526584625244, 'learning_rate': 4.770164787510842e-07, 'epoch': 0.01}
0%| | 56/11526 [00:35<1:58:08, 1.62it/s] {'loss': 0.8945, 'grad_norm': 2.3059258460998535, 'learning_rate': 4.856895056374676e-07, 'epoch': 0.01}
0%| | 57/11526 [00:36<1:58:05, 1.62it/s] {'loss': 0.9525, 'grad_norm': 2.3337178230285645, 'learning_rate': 4.943625325238509e-07, 'epoch': 0.01}
0%| | 57/11526 [00:36<1:58:05, 1.62it/s] 1%| | 58/11526 [00:36<1:58:07, 1.62it/s] {'loss': 0.8883, 'grad_norm': 2.5158843994140625, 'learning_rate': 5.030355594102343e-07, 'epoch': 0.02}
1%| | 58/11526 [00:36<1:58:07, 1.62it/s] 1%| | 59/11526 [00:37<1:58:15, 1.62it/s] {'loss': 0.9122, 'grad_norm': 1.9929068088531494, 'learning_rate': 5.117085862966176e-07, 'epoch': 0.02}
1%| | 59/11526 [00:37<1:58:15, 1.62it/s] 1%| | 60/11526 [00:38<1:58:11, 1.62it/s] {'loss': 0.7592, 'grad_norm': 2.1772091388702393, 'learning_rate': 5.203816131830009e-07, 'epoch': 0.02}
1%| | 60/11526 [00:38<1:58:11, 1.62it/s] 1%| | 61/11526 [00:38<1:58:06, 1.62it/s] {'loss': 1.0441, 'grad_norm': 2.5863544940948486, 'learning_rate': 5.290546400693842e-07, 'epoch': 0.02}
1%| | 61/11526 [00:38<1:58:06, 1.62it/s] 1%| | 62/11526 [00:39<1:58:00, 1.62it/s] {'loss': 0.9181, 'grad_norm': 2.332345485687256, 'learning_rate': 5.377276669557676e-07, 'epoch': 0.02}
1%| | 62/11526 [00:39<1:58:00, 1.62it/s] 1%| | 63/11526 [00:39<1:57:55, 1.62it/s] {'loss': 0.9614, 'grad_norm': 2.590165615081787, 'learning_rate': 5.464006938421509e-07, 'epoch': 0.02}
1%| | 63/11526 [00:40<1:57:55, 1.62it/s] 1%| | 64/11526 [00:40<1:58:05, 1.62it/s] {'loss': 0.9756, 'grad_norm': 3.0666537284851074, 'learning_rate': 5.550737207285343e-07, 'epoch': 0.02}
1%| | 64/11526 [00:40<1:58:05, 1.62it/s] 1%| | 65/11526 [00:41<1:58:00, 1.62it/s] {'loss': 0.8731, 'grad_norm': 2.46872615814209, 'learning_rate': 5.637467476149176e-07, 'epoch': 0.02}
1%| | 65/11526 [00:41<1:58:00, 1.62it/s] 1%| | 66/11526 [00:41<1:57:52, 1.62it/s] {'loss': 0.9034, 'grad_norm': 2.2258975505828857, 'learning_rate': 5.72419774501301e-07, 'epoch': 0.02}
1%| | 66/11526 [00:41<1:57:52, 1.62it/s] 1%| | 67/11526 [00:42<1:57:54, 1.62it/s] {'loss': 0.9931, 'grad_norm': 3.302924871444702, 'learning_rate': 5.810928013876844e-07, 'epoch': 0.02}
1%| | 67/11526 [00:42<1:57:54, 1.62it/s] 1%| | 68/11526 [00:42<1:57:58, 1.62it/s] {'loss': 0.8728, 'grad_norm': 2.612285614013672, 'learning_rate': 5.897658282740678e-07, 'epoch': 0.02}
1%| | 68/11526 [00:43<1:57:58, 1.62it/s] 1%| | 69/11526 [00:43<1:58:19, 1.61it/s] {'loss': 0.8563, 'grad_norm': 2.46860933303833, 'learning_rate': 5.984388551604511e-07, 'epoch': 0.02}
1%| | 69/11526 [00:43<1:58:19, 1.61it/s] 1%| | 70/11526 [00:44<1:58:13, 1.62it/s] {'loss': 0.9464, 'grad_norm': 2.5882294178009033, 'learning_rate': 6.071118820468344e-07, 'epoch': 0.02}
1%| | 70/11526 [00:44<1:58:13, 1.62it/s] 1%| | 71/11526 [00:44<1:57:57, 1.62it/s] {'loss': 1.0264, 'grad_norm': 2.3509042263031006, 'learning_rate': 6.157849089332178e-07, 'epoch': 0.02}
1%| | 71/11526 [00:44<1:57:57, 1.62it/s] 1%| | 72/11526 [00:45<1:57:41, 1.62it/s] {'loss': 0.8318, 'grad_norm': 2.2289910316467285, 'learning_rate': 6.244579358196011e-07, 'epoch': 0.02}
1%| | 72/11526 [00:45<1:57:41, 1.62it/s] 1%| | 73/11526 [00:46<1:57:34, 1.62it/s] {'loss': 0.9499, 'grad_norm': 2.844564199447632, 'learning_rate': 6.331309627059845e-07, 'epoch': 0.02}
1%| | 73/11526 [00:46<1:57:34, 1.62it/s] 1%| | 74/11526 [00:46<1:57:33, 1.62it/s] {'loss': 0.9871, 'grad_norm': 2.558464765548706, 'learning_rate': 6.418039895923677e-07, 'epoch': 0.02}
1%| | 74/11526 [00:46<1:57:33, 1.62it/s] 1%| | 75/11526 [00:47<1:57:39, 1.62it/s] {'loss': 0.8232, 'grad_norm': 2.770019054412842, 'learning_rate': 6.504770164787512e-07, 'epoch': 0.02}
1%| | 75/11526 [00:47<1:57:39, 1.62it/s] 1%| | 76/11526 [00:47<1:57:43, 1.62it/s] {'loss': 0.9688, 'grad_norm': 2.527897357940674, 'learning_rate': 6.591500433651344e-07, 'epoch': 0.02}
1%| | 76/11526 [00:48<1:57:43, 1.62it/s] 1%| | 77/11526 [00:48<1:57:46, 1.62it/s] {'loss': 0.8528, 'grad_norm': 2.155005931854248, 'learning_rate': 6.678230702515179e-07, 'epoch': 0.02}
1%| | 77/11526 [00:48<1:57:46, 1.62it/s] 1%| | 78/11526 [00:49<1:57:51, 1.62it/s] {'loss': 0.9494, 'grad_norm': 2.6538939476013184, 'learning_rate': 6.764960971379011e-07, 'epoch': 0.02}
1%| | 78/11526 [00:49<1:57:51, 1.62it/s] 1%| | 79/11526 [00:49<1:57:58, 1.62it/s] {'loss': 0.936, 'grad_norm': 2.350085735321045, 'learning_rate': 6.851691240242846e-07, 'epoch': 0.02}
1%| | 79/11526 [00:49<1:57:58, 1.62it/s] 1%| | 80/11526 [00:50<1:57:55, 1.62it/s] {'loss': 0.8373, 'grad_norm': 2.1617846488952637, 'learning_rate': 6.938421509106678e-07, 'epoch': 0.02}
1%| | 80/11526 [00:50<1:57:55, 1.62it/s] 1%| | 81/11526 [00:51<1:57:51, 1.62it/s] {'loss': 1.014, 'grad_norm': 2.7454488277435303, 'learning_rate': 7.025151777970513e-07, 'epoch': 0.02}
1%| | 81/11526 [00:51<1:57:51, 1.62it/s] 1%| | 82/11526 [00:51<1:57:47, 1.62it/s] {'loss': 0.9313, 'grad_norm': 2.5120420455932617, 'learning_rate': 7.111882046834345e-07, 'epoch': 0.02}
1%| | 82/11526 [00:51<1:57:47, 1.62it/s] 1%| | 83/11526 [00:52<1:57:49, 1.62it/s] {'loss': 1.0062, 'grad_norm': 2.82900071144104, 'learning_rate': 7.19861231569818e-07, 'epoch': 0.02}
1%| | 83/11526 [00:52<1:57:49, 1.62it/s] 1%| | 84/11526 [00:52<1:57:46, 1.62it/s] {'loss': 1.0068, 'grad_norm': 2.580829381942749, 'learning_rate': 7.285342584562012e-07, 'epoch': 0.02}
1%| | 84/11526 [00:52<1:57:46, 1.62it/s] 1%| | 85/11526 [00:53<1:57:45, 1.62it/s] {'loss': 0.8582, 'grad_norm': 2.6254355907440186, 'learning_rate': 7.372072853425847e-07, 'epoch': 0.02}
1%| | 85/11526 [00:53<1:57:45, 1.62it/s] 1%| | 86/11526 [00:54<1:57:41, 1.62it/s] {'loss': 0.914, 'grad_norm': 2.748262643814087, 'learning_rate': 7.45880312228968e-07, 'epoch': 0.02}
1%| | 86/11526 [00:54<1:57:41, 1.62it/s] 1%| | 87/11526 [00:54<1:57:35, 1.62it/s] {'loss': 0.9415, 'grad_norm': 2.7541556358337402, 'learning_rate': 7.545533391153513e-07, 'epoch': 0.02}
1%| | 87/11526 [00:54<1:57:35, 1.62it/s] 1%| | 88/11526 [00:55<1:57:32, 1.62it/s] {'loss': 0.7448, 'grad_norm': 2.3519718647003174, 'learning_rate': 7.632263660017347e-07, 'epoch': 0.02}
1%| | 88/11526 [00:55<1:57:32, 1.62it/s] 1%| | 89/11526 [00:55<1:57:35, 1.62it/s] {'loss': 0.7378, 'grad_norm': 1.6259915828704834, 'learning_rate': 7.71899392888118e-07, 'epoch': 0.02}
1%| | 89/11526 [00:56<1:57:35, 1.62it/s] 1%| | 90/11526 [00:56<1:57:29, 1.62it/s] {'loss': 0.939, 'grad_norm': 2.368450880050659, 'learning_rate': 7.805724197745014e-07, 'epoch': 0.02}
1%| | 90/11526 [00:56<1:57:29, 1.62it/s] 1%| | 91/11526 [00:57<1:57:25, 1.62it/s] {'loss': 0.7851, 'grad_norm': 1.825242042541504, 'learning_rate': 7.892454466608846e-07, 'epoch': 0.02}
1%| | 91/11526 [00:57<1:57:25, 1.62it/s] 1%| | 92/11526 [00:57<1:57:18, 1.62it/s] {'loss': 0.827, 'grad_norm': 2.2322847843170166, 'learning_rate': 7.979184735472681e-07, 'epoch': 0.02}
1%| | 92/11526 [00:57<1:57:18, 1.62it/s] 1%| | 93/11526 [00:58<1:57:20, 1.62it/s] {'loss': 0.8678, 'grad_norm': 2.20609450340271, 'learning_rate': 8.065915004336513e-07, 'epoch': 0.02}
1%| | 93/11526 [00:58<1:57:20, 1.62it/s] 1%| | 94/11526 [00:59<1:57:33, 1.62it/s] {'loss': 0.8721, 'grad_norm': 1.9949091672897339, 'learning_rate': 8.152645273200348e-07, 'epoch': 0.02}
1%| | 94/11526 [00:59<1:57:33, 1.62it/s] 1%| | 95/11526 [00:59<1:57:33, 1.62it/s] {'loss': 0.7532, 'grad_norm': 1.8752204179763794, 'learning_rate': 8.23937554206418e-07, 'epoch': 0.02}
1%| | 95/11526 [00:59<1:57:33, 1.62it/s] 1%| | 96/11526 [01:00<1:57:35, 1.62it/s] {'loss': 0.8847, 'grad_norm': 2.1880412101745605, 'learning_rate': 8.326105810928015e-07, 'epoch': 0.02}
1%| | 96/11526 [01:00<1:57:35, 1.62it/s] 1%| | 97/11526 [01:00<1:57:34, 1.62it/s] {'loss': 0.7596, 'grad_norm': 2.0729541778564453, 'learning_rate': 8.412836079791847e-07, 'epoch': 0.03}
1%| | 97/11526 [01:01<1:57:34, 1.62it/s] 1%| | 98/11526 [01:01<1:57:52, 1.62it/s] {'loss': 0.916, 'grad_norm': 2.4603145122528076, 'learning_rate': 8.499566348655682e-07, 'epoch': 0.03}
1%| | 98/11526 [01:01<1:57:52, 1.62it/s] 1%| | 99/11526 [01:02<1:57:55, 1.61it/s] {'loss': 0.9586, 'grad_norm': 2.156376361846924, 'learning_rate': 8.586296617519515e-07, 'epoch': 0.03}
1%| | 99/11526 [01:02<1:57:55, 1.61it/s] 1%| | 100/11526 [01:02<1:57:42, 1.62it/s] {'loss': 0.7507, 'grad_norm': 1.7791136503219604, 'learning_rate': 8.673026886383349e-07, 'epoch': 0.03}
1%| | 100/11526 [01:02<1:57:42, 1.62it/s] 1%| | 101/11526 [01:03<1:57:36, 1.62it/s] {'loss': 0.8453, 'grad_norm': 1.928830623626709, 'learning_rate': 8.759757155247182e-07, 'epoch': 0.03}
1%| | 101/11526 [01:03<1:57:36, 1.62it/s] 1%| | 102/11526 [01:03<1:57:25, 1.62it/s] {'loss': 0.8484, 'grad_norm': 1.8511621952056885, 'learning_rate': 8.846487424111015e-07, 'epoch': 0.03}
1%| | 102/11526 [01:04<1:57:25, 1.62it/s] 1%| | 103/11526 [01:04<1:57:18, 1.62it/s] {'loss': 0.7887, 'grad_norm': 1.6576415300369263, 'learning_rate': 8.933217692974849e-07, 'epoch': 0.03}
1%| | 103/11526 [01:04<1:57:18, 1.62it/s] 1%| | 104/11526 [01:05<1:57:22, 1.62it/s] {'loss': 0.7807, 'grad_norm': 1.695311188697815, 'learning_rate': 9.019947961838682e-07, 'epoch': 0.03}
1%| | 104/11526 [01:05<1:57:22, 1.62it/s] 1%| | 105/11526 [01:05<1:57:22, 1.62it/s] {'loss': 0.6844, 'grad_norm': 1.711766242980957, 'learning_rate': 9.106678230702516e-07, 'epoch': 0.03}
1%| | 105/11526 [01:05<1:57:22, 1.62it/s] 1%| | 106/11526 [01:06<1:57:18, 1.62it/s] {'loss': 0.7641, 'grad_norm': 1.687597632408142, 'learning_rate': 9.193408499566349e-07, 'epoch': 0.03}
1%| | 106/11526 [01:06<1:57:18, 1.62it/s] 1%| | 107/11526 [01:07<1:57:14, 1.62it/s] {'loss': 0.8026, 'grad_norm': 1.937798023223877, 'learning_rate': 9.280138768430183e-07, 'epoch': 0.03}
1%| | 107/11526 [01:07<1:57:14, 1.62it/s] 1%| | 108/11526 [01:07<1:57:07, 1.62it/s] {'loss': 0.838, 'grad_norm': 1.926873803138733, 'learning_rate': 9.366869037294016e-07, 'epoch': 0.03}
1%| | 108/11526 [01:07<1:57:07, 1.62it/s] 1%| | 109/11526 [01:08<1:57:08, 1.62it/s] {'loss': 0.8744, 'grad_norm': 1.6707262992858887, 'learning_rate': 9.45359930615785e-07, 'epoch': 0.03}
1%| | 109/11526 [01:08<1:57:08, 1.62it/s] 1%| | 110/11526 [01:08<1:57:07, 1.62it/s] {'loss': 0.7772, 'grad_norm': 1.5839877128601074, 'learning_rate': 9.540329575021685e-07, 'epoch': 0.03}
1%| | 110/11526 [01:09<1:57:07, 1.62it/s] 1%| | 111/11526 [01:09<1:57:15, 1.62it/s] {'loss': 0.7224, 'grad_norm': 1.5069698095321655, 'learning_rate': 9.627059843885516e-07, 'epoch': 0.03}
1%| | 111/11526 [01:09<1:57:15, 1.62it/s] 1%| | 112/11526 [01:10<1:57:10, 1.62it/s] {'loss': 0.7786, 'grad_norm': 2.007721424102783, 'learning_rate': 9.713790112749352e-07, 'epoch': 0.03}
1%| | 112/11526 [01:10<1:57:10, 1.62it/s] 1%| | 113/11526 [01:10<1:57:19, 1.62it/s] {'loss': 0.8187, 'grad_norm': 1.7204419374465942, 'learning_rate': 9.800520381613183e-07, 'epoch': 0.03}
1%| | 113/11526 [01:10<1:57:19, 1.62it/s] 1%| | 114/11526 [01:11<1:57:33, 1.62it/s] {'loss': 0.7963, 'grad_norm': 1.8152350187301636, 'learning_rate': 9.887250650477019e-07, 'epoch': 0.03}
1%| | 114/11526 [01:11<1:57:33, 1.62it/s] 1%| | 115/11526 [01:11<1:57:39, 1.62it/s] {'loss': 0.7744, 'grad_norm': 1.5332870483398438, 'learning_rate': 9.97398091934085e-07, 'epoch': 0.03}
1%| | 115/11526 [01:12<1:57:39, 1.62it/s] 1%| | 116/11526 [01:12<1:57:40, 1.62it/s] {'loss': 0.7205, 'grad_norm': 1.463029146194458, 'learning_rate': 1.0060711188204686e-06, 'epoch': 0.03}
1%| | 116/11526 [01:12<1:57:40, 1.62it/s] 1%| | 117/11526 [01:13<1:57:35, 1.62it/s] {'loss': 0.7717, 'grad_norm': 1.7677960395812988, 'learning_rate': 1.0147441457068517e-06, 'epoch': 0.03}
1%| | 117/11526 [01:13<1:57:35, 1.62it/s] 1%| | 118/11526 [01:13<1:57:25, 1.62it/s] {'loss': 0.7577, 'grad_norm': 1.7238428592681885, 'learning_rate': 1.0234171725932352e-06, 'epoch': 0.03}
1%| | 118/11526 [01:13<1:57:25, 1.62it/s] 1%| | 119/11526 [01:14<1:57:26, 1.62it/s] {'loss': 0.6389, 'grad_norm': 1.4990946054458618, 'learning_rate': 1.0320901994796184e-06, 'epoch': 0.03}
1%| | 119/11526 [01:14<1:57:26, 1.62it/s] 1%| | 120/11526 [01:15<1:57:20, 1.62it/s] {'loss': 0.892, 'grad_norm': 1.716744065284729, 'learning_rate': 1.0407632263660017e-06, 'epoch': 0.03}
1%| | 120/11526 [01:15<1:57:20, 1.62it/s] 1%| | 121/11526 [01:15<1:57:20, 1.62it/s] {'loss': 0.7709, 'grad_norm': 1.7419546842575073, 'learning_rate': 1.049436253252385e-06, 'epoch': 0.03}
1%| | 121/11526 [01:15<1:57:20, 1.62it/s] 1%| | 122/11526 [01:16<1:57:18, 1.62it/s] {'loss': 0.5891, 'grad_norm': 1.463056206703186, 'learning_rate': 1.0581092801387684e-06, 'epoch': 0.03}
1%| | 122/11526 [01:16<1:57:18, 1.62it/s] 1%| | 123/11526 [01:16<1:57:14, 1.62it/s] {'loss': 0.7855, 'grad_norm': 1.9125365018844604, 'learning_rate': 1.066782307025152e-06, 'epoch': 0.03}
1%| | 123/11526 [01:17<1:57:14, 1.62it/s] 1%| | 124/11526 [01:17<1:57:14, 1.62it/s] {'loss': 0.6292, 'grad_norm': 1.425355076789856, 'learning_rate': 1.0754553339115351e-06, 'epoch': 0.03}
1%| | 124/11526 [01:17<1:57:14, 1.62it/s] 1%| | 125/11526 [01:18<1:57:13, 1.62it/s] {'loss': 0.5834, 'grad_norm': 1.3620693683624268, 'learning_rate': 1.0841283607979187e-06, 'epoch': 0.03}
1%| | 125/11526 [01:18<1:57:13, 1.62it/s] 1%| | 126/11526 [01:18<1:57:12, 1.62it/s] {'loss': 0.6492, 'grad_norm': 1.4669655561447144, 'learning_rate': 1.0928013876843018e-06, 'epoch': 0.03}
1%| | 126/11526 [01:18<1:57:12, 1.62it/s] 1%| | 127/11526 [01:19<1:57:12, 1.62it/s] {'loss': 0.5672, 'grad_norm': 1.475087285041809, 'learning_rate': 1.1014744145706854e-06, 'epoch': 0.03}
1%| | 127/11526 [01:19<1:57:12, 1.62it/s] 1%| | 128/11526 [01:20<1:57:11, 1.62it/s] {'loss': 0.7576, 'grad_norm': 1.5403342247009277, 'learning_rate': 1.1101474414570685e-06, 'epoch': 0.03}
1%| | 128/11526 [01:20<1:57:11, 1.62it/s] 1%| | 129/11526 [01:20<1:57:11, 1.62it/s] {'loss': 0.9373, 'grad_norm': 2.1037890911102295, 'learning_rate': 1.118820468343452e-06, 'epoch': 0.03}
1%| | 129/11526 [01:20<1:57:11, 1.62it/s] 1%| | 130/11526 [01:21<1:57:12, 1.62it/s] {'loss': 0.6499, 'grad_norm': 1.4111831188201904, 'learning_rate': 1.1274934952298352e-06, 'epoch': 0.03}
1%| | 130/11526 [01:21<1:57:12, 1.62it/s] 1%| | 131/11526 [01:21<1:57:11, 1.62it/s] {'loss': 0.7308, 'grad_norm': 1.4123345613479614, 'learning_rate': 1.1361665221162188e-06, 'epoch': 0.03}
1%| | 131/11526 [01:21<1:57:11, 1.62it/s] 1%| | 132/11526 [01:22<1:57:10, 1.62it/s] {'loss': 0.7433, 'grad_norm': 1.7135119438171387, 'learning_rate': 1.144839549002602e-06, 'epoch': 0.03}
1%| | 132/11526 [01:22<1:57:10, 1.62it/s] 1%| | 133/11526 [01:23<1:57:12, 1.62it/s] {'loss': 0.697, 'grad_norm': 1.3512945175170898, 'learning_rate': 1.1535125758889853e-06, 'epoch': 0.03}
1%| | 133/11526 [01:23<1:57:12, 1.62it/s] 1%| | 134/11526 [01:23<1:57:15, 1.62it/s] {'loss': 0.6869, 'grad_norm': 1.3782225847244263, 'learning_rate': 1.1621856027753688e-06, 'epoch': 0.03}
1%| | 134/11526 [01:23<1:57:15, 1.62it/s] 1%| | 135/11526 [01:24<1:57:10, 1.62it/s] {'loss': 0.6985, 'grad_norm': 1.438422441482544, 'learning_rate': 1.170858629661752e-06, 'epoch': 0.04}
1%| | 135/11526 [01:24<1:57:10, 1.62it/s] 1%| | 136/11526 [01:24<1:57:06, 1.62it/s] {'loss': 0.5912, 'grad_norm': 1.3065789937973022, 'learning_rate': 1.1795316565481355e-06, 'epoch': 0.04}
1%| | 136/11526 [01:25<1:57:06, 1.62it/s] 1%| | 137/11526 [01:25<1:57:04, 1.62it/s] {'loss': 0.7323, 'grad_norm': 1.4421719312667847, 'learning_rate': 1.1882046834345186e-06, 'epoch': 0.04}
1%| | 137/11526 [01:25<1:57:04, 1.62it/s] 1%| | 138/11526 [01:26<1:57:03, 1.62it/s] {'loss': 0.5965, 'grad_norm': 1.286468267440796, 'learning_rate': 1.1968777103209022e-06, 'epoch': 0.04}
1%| | 138/11526 [01:26<1:57:03, 1.62it/s] 1%| | 139/11526 [01:26<1:57:02, 1.62it/s] {'loss': 0.5876, 'grad_norm': 1.443925380706787, 'learning_rate': 1.2055507372072853e-06, 'epoch': 0.04}
1%| | 139/11526 [01:26<1:57:02, 1.62it/s] 1%| | 140/11526 [01:27<1:56:53, 1.62it/s] {'loss': 0.6765, 'grad_norm': 1.211211085319519, 'learning_rate': 1.214223764093669e-06, 'epoch': 0.04}
1%| | 140/11526 [01:27<1:56:53, 1.62it/s] 1%| | 141/11526 [01:28<1:56:57, 1.62it/s] {'loss': 0.7977, 'grad_norm': 1.4931507110595703, 'learning_rate': 1.222896790980052e-06, 'epoch': 0.04}
1%| | 141/11526 [01:28<1:56:57, 1.62it/s] 1%| | 142/11526 [01:28<1:56:53, 1.62it/s] {'loss': 0.6237, 'grad_norm': 1.568094253540039, 'learning_rate': 1.2315698178664356e-06, 'epoch': 0.04}
1%| | 142/11526 [01:28<1:56:53, 1.62it/s] 1%| | 143/11526 [01:29<1:57:05, 1.62it/s] {'loss': 0.6116, 'grad_norm': 1.2825233936309814, 'learning_rate': 1.2402428447528187e-06, 'epoch': 0.04}
1%| | 143/11526 [01:29<1:57:05, 1.62it/s] 1%| | 144/11526 [01:29<1:57:17, 1.62it/s] {'loss': 0.5838, 'grad_norm': 1.1763852834701538, 'learning_rate': 1.2489158716392023e-06, 'epoch': 0.04}
1%| | 144/11526 [01:30<1:57:17, 1.62it/s] 1%|▏ | 145/11526 [01:30<1:57:12, 1.62it/s] {'loss': 0.5777, 'grad_norm': 1.1497303247451782, 'learning_rate': 1.2575888985255854e-06, 'epoch': 0.04}
1%|▏ | 145/11526 [01:30<1:57:12, 1.62it/s] 1%|▏ | 146/11526 [01:31<1:57:14, 1.62it/s] {'loss': 0.5924, 'grad_norm': 1.107702374458313, 'learning_rate': 1.266261925411969e-06, 'epoch': 0.04}
1%|▏ | 146/11526 [01:31<1:57:14, 1.62it/s] 1%|▏ | 147/11526 [01:31<1:57:09, 1.62it/s] {'loss': 0.5243, 'grad_norm': 1.0762262344360352, 'learning_rate': 1.2749349522983523e-06, 'epoch': 0.04}
1%|▏ | 147/11526 [01:31<1:57:09, 1.62it/s] 1%|▏ | 148/11526 [01:32<1:57:13, 1.62it/s] {'loss': 0.6526, 'grad_norm': 1.1643974781036377, 'learning_rate': 1.2836079791847355e-06, 'epoch': 0.04}
1%|▏ | 148/11526 [01:32<1:57:13, 1.62it/s] 1%|▏ | 149/11526 [01:32<1:57:23, 1.62it/s] {'loss': 0.7247, 'grad_norm': 1.3989418745040894, 'learning_rate': 1.2922810060711188e-06, 'epoch': 0.04}
1%|▏ | 149/11526 [01:33<1:57:23, 1.62it/s] 1%|▏ | 150/11526 [01:33<1:57:32, 1.61it/s] {'loss': 0.5651, 'grad_norm': 1.062746524810791, 'learning_rate': 1.3009540329575024e-06, 'epoch': 0.04}
1%|▏ | 150/11526 [01:33<1:57:32, 1.61it/s] 1%|▏ | 151/11526 [01:34<1:57:27, 1.61it/s] {'loss': 0.589, 'grad_norm': 1.09380042552948, 'learning_rate': 1.3096270598438857e-06, 'epoch': 0.04}
1%|▏ | 151/11526 [01:34<1:57:27, 1.61it/s] 1%|▏ | 152/11526 [01:34<1:57:20, 1.62it/s] {'loss': 0.5278, 'grad_norm': 0.929856538772583, 'learning_rate': 1.3183000867302689e-06, 'epoch': 0.04}
1%|▏ | 152/11526 [01:34<1:57:20, 1.62it/s] 1%|▏ | 153/11526 [01:35<1:57:23, 1.61it/s] {'loss': 0.6685, 'grad_norm': 1.1692864894866943, 'learning_rate': 1.3269731136166522e-06, 'epoch': 0.04}
1%|▏ | 153/11526 [01:35<1:57:23, 1.61it/s] 1%|▏ | 154/11526 [01:36<1:57:19, 1.62it/s] {'loss': 0.6129, 'grad_norm': 1.1693429946899414, 'learning_rate': 1.3356461405030358e-06, 'epoch': 0.04}
1%|▏ | 154/11526 [01:36<1:57:19, 1.62it/s] 1%|▏ | 155/11526 [01:36<1:57:05, 1.62it/s] {'loss': 0.617, 'grad_norm': 1.1112942695617676, 'learning_rate': 1.3443191673894191e-06, 'epoch': 0.04}
1%|▏ | 155/11526 [01:36<1:57:05, 1.62it/s] 1%|▏ | 156/11526 [01:37<1:57:14, 1.62it/s] {'loss': 0.6157, 'grad_norm': 1.1956267356872559, 'learning_rate': 1.3529921942758023e-06, 'epoch': 0.04}
1%|▏ | 156/11526 [01:37<1:57:14, 1.62it/s] 1%|▏ | 157/11526 [01:37<1:57:05, 1.62it/s] {'loss': 0.6358, 'grad_norm': 1.1118966341018677, 'learning_rate': 1.3616652211621856e-06, 'epoch': 0.04}
1%|▏ | 157/11526 [01:38<1:57:05, 1.62it/s] 1%|▏ | 158/11526 [01:38<1:57:04, 1.62it/s] {'loss': 0.7017, 'grad_norm': 1.222934365272522, 'learning_rate': 1.3703382480485692e-06, 'epoch': 0.04}
1%|▏ | 158/11526 [01:38<1:57:04, 1.62it/s] 1%|▏ | 159/11526 [01:39<1:57:11, 1.62it/s] {'loss': 0.5604, 'grad_norm': 1.0738201141357422, 'learning_rate': 1.3790112749349525e-06, 'epoch': 0.04}
1%|▏ | 159/11526 [01:39<1:57:11, 1.62it/s] 1%|▏ | 160/11526 [01:39<1:57:15, 1.62it/s] {'loss': 0.5611, 'grad_norm': 1.056372880935669, 'learning_rate': 1.3876843018213356e-06, 'epoch': 0.04}
1%|▏ | 160/11526 [01:39<1:57:15, 1.62it/s] 1%|▏ | 161/11526 [01:40<1:57:01, 1.62it/s] {'loss': 0.5747, 'grad_norm': 1.0789778232574463, 'learning_rate': 1.3963573287077192e-06, 'epoch': 0.04}
1%|▏ | 161/11526 [01:40<1:57:01, 1.62it/s] 1%|▏ | 162/11526 [01:41<1:56:49, 1.62it/s] {'loss': 0.5926, 'grad_norm': 1.204740285873413, 'learning_rate': 1.4050303555941025e-06, 'epoch': 0.04}
1%|▏ | 162/11526 [01:41<1:56:49, 1.62it/s] 1%|▏ | 163/11526 [01:41<1:56:36, 1.62it/s] {'loss': 0.7206, 'grad_norm': 1.238403081893921, 'learning_rate': 1.4137033824804857e-06, 'epoch': 0.04}
1%|▏ | 163/11526 [01:41<1:56:36, 1.62it/s] 1%|▏ | 164/11526 [01:42<1:56:37, 1.62it/s] {'loss': 0.7112, 'grad_norm': 1.1109087467193604, 'learning_rate': 1.422376409366869e-06, 'epoch': 0.04}
1%|▏ | 164/11526 [01:42<1:56:37, 1.62it/s] 1%|▏ | 165/11526 [01:42<1:56:39, 1.62it/s] {'loss': 0.6053, 'grad_norm': 1.0813874006271362, 'learning_rate': 1.4310494362532526e-06, 'epoch': 0.04}
1%|▏ | 165/11526 [01:42<1:56:39, 1.62it/s] 1%|▏ | 166/11526 [01:43<1:56:36, 1.62it/s] {'loss': 0.5977, 'grad_norm': 1.1231296062469482, 'learning_rate': 1.439722463139636e-06, 'epoch': 0.04}
1%|▏ | 166/11526 [01:43<1:56:36, 1.62it/s] 1%|▏ | 167/11526 [01:44<1:56:24, 1.63it/s] {'loss': 0.4941, 'grad_norm': 0.9455352425575256, 'learning_rate': 1.448395490026019e-06, 'epoch': 0.04}
1%|▏ | 167/11526 [01:44<1:56:24, 1.63it/s] 1%|▏ | 168/11526 [01:44<1:56:20, 1.63it/s] {'loss': 0.5498, 'grad_norm': 0.9913843870162964, 'learning_rate': 1.4570685169124024e-06, 'epoch': 0.04}
1%|▏ | 168/11526 [01:44<1:56:20, 1.63it/s] 1%|▏ | 169/11526 [01:45<1:57:06, 1.62it/s] {'loss': 0.6178, 'grad_norm': 1.0535211563110352, 'learning_rate': 1.465741543798786e-06, 'epoch': 0.04}
1%|▏ | 169/11526 [01:45<1:57:06, 1.62it/s] 1%|▏ | 170/11526 [01:45<1:57:05, 1.62it/s] {'loss': 0.4798, 'grad_norm': 0.9131155610084534, 'learning_rate': 1.4744145706851693e-06, 'epoch': 0.04}
1%|▏ | 170/11526 [01:46<1:57:05, 1.62it/s] 1%|▏ | 171/11526 [01:46<1:56:46, 1.62it/s] {'loss': 0.4877, 'grad_norm': 1.1136891841888428, 'learning_rate': 1.4830875975715525e-06, 'epoch': 0.04}
1%|▏ | 171/11526 [01:46<1:56:46, 1.62it/s] 1%|▏ | 172/11526 [01:47<1:56:49, 1.62it/s] {'loss': 0.6241, 'grad_norm': 1.06000816822052, 'learning_rate': 1.491760624457936e-06, 'epoch': 0.04}
1%|▏ | 172/11526 [01:47<1:56:49, 1.62it/s] 2%|▏ | 173/11526 [01:47<1:56:53, 1.62it/s] {'loss': 0.5009, 'grad_norm': 0.9263389706611633, 'learning_rate': 1.5004336513443194e-06, 'epoch': 0.05}
2%|▏ | 173/11526 [01:47<1:56:53, 1.62it/s] 2%|▏ | 174/11526 [01:48<1:57:00, 1.62it/s] {'loss': 0.5841, 'grad_norm': 0.9428918361663818, 'learning_rate': 1.5091066782307025e-06, 'epoch': 0.05}
2%|▏ | 174/11526 [01:48<1:57:00, 1.62it/s] 2%|▏ | 175/11526 [01:49<1:57:03, 1.62it/s] {'loss': 0.507, 'grad_norm': 0.9835313558578491, 'learning_rate': 1.5177797051170859e-06, 'epoch': 0.05}
2%|▏ | 175/11526 [01:49<1:57:03, 1.62it/s] 2%|▏ | 176/11526 [01:49<1:56:52, 1.62it/s] {'loss': 0.6359, 'grad_norm': 0.9990827441215515, 'learning_rate': 1.5264527320034694e-06, 'epoch': 0.05}
2%|▏ | 176/11526 [01:49<1:56:52, 1.62it/s] 2%|▏ | 177/11526 [01:50<1:56:49, 1.62it/s] {'loss': 0.4714, 'grad_norm': 0.9614607691764832, 'learning_rate': 1.5351257588898528e-06, 'epoch': 0.05}
2%|▏ | 177/11526 [01:50<1:56:49, 1.62it/s] 2%|▏ | 178/11526 [01:50<1:56:41, 1.62it/s] {'loss': 0.6777, 'grad_norm': 1.112619400024414, 'learning_rate': 1.543798785776236e-06, 'epoch': 0.05}
2%|▏ | 178/11526 [01:51<1:56:41, 1.62it/s] 2%|▏ | 179/11526 [01:51<1:56:53, 1.62it/s] {'loss': 0.633, 'grad_norm': 1.1248114109039307, 'learning_rate': 1.5524718126626192e-06, 'epoch': 0.05}
2%|▏ | 179/11526 [01:51<1:56:53, 1.62it/s] 2%|▏ | 180/11526 [01:52<1:56:47, 1.62it/s] {'loss': 0.4786, 'grad_norm': 0.958987295627594, 'learning_rate': 1.5611448395490028e-06, 'epoch': 0.05}
2%|▏ | 180/11526 [01:52<1:56:47, 1.62it/s] 2%|▏ | 181/11526 [01:52<1:56:46, 1.62it/s] {'loss': 0.6358, 'grad_norm': 1.149208903312683, 'learning_rate': 1.5698178664353862e-06, 'epoch': 0.05}
2%|▏ | 181/11526 [01:52<1:56:46, 1.62it/s] 2%|▏ | 182/11526 [01:53<1:56:37, 1.62it/s] {'loss': 0.4743, 'grad_norm': 0.8846536874771118, 'learning_rate': 1.5784908933217693e-06, 'epoch': 0.05}
2%|▏ | 182/11526 [01:53<1:56:37, 1.62it/s] 2%|▏ | 183/11526 [01:53<1:56:40, 1.62it/s] {'loss': 0.5149, 'grad_norm': 0.909762978553772, 'learning_rate': 1.5871639202081529e-06, 'epoch': 0.05}
2%|▏ | 183/11526 [01:54<1:56:40, 1.62it/s] 2%|▏ | 184/11526 [01:54<1:56:44, 1.62it/s] {'loss': 0.6426, 'grad_norm': 1.0294567346572876, 'learning_rate': 1.5958369470945362e-06, 'epoch': 0.05}
2%|▏ | 184/11526 [01:54<1:56:44, 1.62it/s] 2%|▏ | 185/11526 [01:55<1:56:43, 1.62it/s] {'loss': 0.463, 'grad_norm': 0.902991533279419, 'learning_rate': 1.6045099739809195e-06, 'epoch': 0.05}
2%|▏ | 185/11526 [01:55<1:56:43, 1.62it/s] 2%|▏ | 186/11526 [01:55<1:56:34, 1.62it/s] {'loss': 0.6086, 'grad_norm': 1.1820343732833862, 'learning_rate': 1.6131830008673027e-06, 'epoch': 0.05}
2%|▏ | 186/11526 [01:55<1:56:34, 1.62it/s] 2%|▏ | 187/11526 [01:56<1:56:56, 1.62it/s] {'loss': 0.6342, 'grad_norm': 1.012710452079773, 'learning_rate': 1.6218560277536862e-06, 'epoch': 0.05}
2%|▏ | 187/11526 [01:56<1:56:56, 1.62it/s] 2%|▏ | 188/11526 [01:57<1:56:44, 1.62it/s] {'loss': 0.5113, 'grad_norm': 0.9933950901031494, 'learning_rate': 1.6305290546400696e-06, 'epoch': 0.05}
2%|▏ | 188/11526 [01:57<1:56:44, 1.62it/s] 2%|▏ | 189/11526 [01:57<1:56:47, 1.62it/s] {'loss': 0.4752, 'grad_norm': 1.0137450695037842, 'learning_rate': 1.6392020815264527e-06, 'epoch': 0.05}
2%|▏ | 189/11526 [01:57<1:56:47, 1.62it/s] 2%|▏ | 190/11526 [01:58<1:56:43, 1.62it/s] {'loss': 0.6238, 'grad_norm': 1.163692593574524, 'learning_rate': 1.647875108412836e-06, 'epoch': 0.05}
2%|▏ | 190/11526 [01:58<1:56:43, 1.62it/s] 2%|▏ | 191/11526 [01:58<1:56:51, 1.62it/s] {'loss': 0.6651, 'grad_norm': 1.009185552597046, 'learning_rate': 1.6565481352992196e-06, 'epoch': 0.05}
2%|▏ | 191/11526 [01:59<1:56:51, 1.62it/s] 2%|▏ | 192/11526 [01:59<1:57:06, 1.61it/s] {'loss': 0.438, 'grad_norm': 0.8828431963920593, 'learning_rate': 1.665221162185603e-06, 'epoch': 0.05}
2%|▏ | 192/11526 [01:59<1:57:06, 1.61it/s] 2%|▏ | 193/11526 [02:00<1:57:01, 1.61it/s] {'loss': 0.5476, 'grad_norm': 0.9766502380371094, 'learning_rate': 1.6738941890719861e-06, 'epoch': 0.05}
2%|▏ | 193/11526 [02:00<1:57:01, 1.61it/s] 2%|▏ | 194/11526 [02:00<1:56:57, 1.61it/s] {'loss': 0.5093, 'grad_norm': 0.9113374948501587, 'learning_rate': 1.6825672159583695e-06, 'epoch': 0.05}
2%|▏ | 194/11526 [02:00<1:56:57, 1.61it/s] 2%|▏ | 195/11526 [02:01<1:56:45, 1.62it/s] {'loss': 0.5294, 'grad_norm': 0.973028838634491, 'learning_rate': 1.691240242844753e-06, 'epoch': 0.05}
2%|▏ | 195/11526 [02:01<1:56:45, 1.62it/s] 2%|▏ | 196/11526 [02:02<1:56:42, 1.62it/s] {'loss': 0.624, 'grad_norm': 1.0550676584243774, 'learning_rate': 1.6999132697311364e-06, 'epoch': 0.05}
2%|▏ | 196/11526 [02:02<1:56:42, 1.62it/s] 2%|▏ | 197/11526 [02:02<1:56:36, 1.62it/s] {'loss': 0.5009, 'grad_norm': 0.968843936920166, 'learning_rate': 1.7085862966175195e-06, 'epoch': 0.05}
2%|▏ | 197/11526 [02:02<1:56:36, 1.62it/s] 2%|▏ | 198/11526 [02:03<1:56:35, 1.62it/s] {'loss': 0.5506, 'grad_norm': 0.9119936227798462, 'learning_rate': 1.717259323503903e-06, 'epoch': 0.05}
2%|▏ | 198/11526 [02:03<1:56:35, 1.62it/s] 2%|▏ | 199/11526 [02:03<1:56:41, 1.62it/s] {'loss': 0.6645, 'grad_norm': 1.021525502204895, 'learning_rate': 1.7259323503902864e-06, 'epoch': 0.05}
2%|▏ | 199/11526 [02:04<1:56:41, 1.62it/s] 2%|▏ | 200/11526 [02:04<1:56:36, 1.62it/s] {'loss': 0.4601, 'grad_norm': 0.8272081613540649, 'learning_rate': 1.7346053772766698e-06, 'epoch': 0.05}
2%|▏ | 200/11526 [02:04<1:56:36, 1.62it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.01it/s]
31%|███ | 4/13 [00:00<00:01, 8.28it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.70it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.34it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.12it/s]
62%|██████▏ | 8/13 [00:01<00:00, 6.97it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.87it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.81it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.75it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.72it/s]
100%|██████████| 13/13 [00:01<00:00, 6.74it/s]
{'eval_loss': 0.886620283126831, 'eval_runtime': 1.9655, 'eval_samples_per_second': 101.755, 'eval_steps_per_second': 6.614, 'epoch': 0.05}
2%|▏ | 200/11526 [02:06<1:56:36, 1.62it/s]
 2%|▏ | 201/11526 [02:07<3:48:08, 1.21s/it] {'loss': 0.6475, 'grad_norm': 1.1240336894989014, 'learning_rate': 1.743278404163053e-06, 'epoch': 0.05}
2%|▏ | 201/11526 [02:07<3:48:08, 1.21s/it] 2%|▏ | 202/11526 [02:07<3:14:35, 1.03s/it] {'loss': 0.5469, 'grad_norm': 0.8994268774986267, 'learning_rate': 1.7519514310494365e-06, 'epoch': 0.05}
2%|▏ | 202/11526 [02:07<3:14:35, 1.03s/it] 2%|▏ | 203/11526 [02:08<2:51:06, 1.10it/s] {'loss': 0.5461, 'grad_norm': 0.9215892553329468, 'learning_rate': 1.7606244579358198e-06, 'epoch': 0.05}
2%|▏ | 203/11526 [02:08<2:51:06, 1.10it/s] 2%|▏ | 204/11526 [02:08<2:34:42, 1.22it/s] {'loss': 0.3772, 'grad_norm': 0.8127555251121521, 'learning_rate': 1.769297484822203e-06, 'epoch': 0.05}
2%|▏ | 204/11526 [02:09<2:34:42, 1.22it/s] 2%|▏ | 205/11526 [02:09<2:23:14, 1.32it/s] {'loss': 0.6155, 'grad_norm': 0.9543675780296326, 'learning_rate': 1.7779705117085863e-06, 'epoch': 0.05}
2%|▏ | 205/11526 [02:09<2:23:14, 1.32it/s] 2%|▏ | 206/11526 [02:10<2:15:18, 1.39it/s] {'loss': 0.4677, 'grad_norm': 0.8139895796775818, 'learning_rate': 1.7866435385949699e-06, 'epoch': 0.05}
2%|▏ | 206/11526 [02:10<2:15:18, 1.39it/s] 2%|▏ | 207/11526 [02:10<2:09:42, 1.45it/s] {'loss': 0.5685, 'grad_norm': 1.0176928043365479, 'learning_rate': 1.7953165654813532e-06, 'epoch': 0.05}
2%|▏ | 207/11526 [02:10<2:09:42, 1.45it/s] 2%|▏ | 208/11526 [02:11<2:05:46, 1.50it/s] {'loss': 0.6238, 'grad_norm': 0.9643756747245789, 'learning_rate': 1.8039895923677363e-06, 'epoch': 0.05}
2%|▏ | 208/11526 [02:11<2:05:46, 1.50it/s] 2%|▏ | 209/11526 [02:12<2:02:55, 1.53it/s] {'loss': 0.5569, 'grad_norm': 0.9393540024757385, 'learning_rate': 1.81266261925412e-06, 'epoch': 0.05}
2%|▏ | 209/11526 [02:12<2:02:55, 1.53it/s] 2%|▏ | 210/11526 [02:12<2:00:50, 1.56it/s] {'loss': 0.5684, 'grad_norm': 1.027360439300537, 'learning_rate': 1.8213356461405032e-06, 'epoch': 0.05}
2%|▏ | 210/11526 [02:12<2:00:50, 1.56it/s] 2%|▏ | 211/11526 [02:13<1:59:29, 1.58it/s] {'loss': 0.4719, 'grad_norm': 0.8393529653549194, 'learning_rate': 1.8300086730268866e-06, 'epoch': 0.05}
2%|▏ | 211/11526 [02:13<1:59:29, 1.58it/s] 2%|▏ | 212/11526 [02:13<1:58:28, 1.59it/s] {'loss': 0.5277, 'grad_norm': 0.8615040183067322, 'learning_rate': 1.8386816999132697e-06, 'epoch': 0.06}
2%|▏ | 212/11526 [02:13<1:58:28, 1.59it/s] 2%|▏ | 213/11526 [02:14<1:57:51, 1.60it/s] {'loss': 0.4689, 'grad_norm': 0.8096368908882141, 'learning_rate': 1.8473547267996533e-06, 'epoch': 0.06}
2%|▏ | 213/11526 [02:14<1:57:51, 1.60it/s] 2%|▏ | 214/11526 [02:15<1:57:31, 1.60it/s] {'loss': 0.6827, 'grad_norm': 1.0990355014801025, 'learning_rate': 1.8560277536860366e-06, 'epoch': 0.06}
2%|▏ | 214/11526 [02:15<1:57:31, 1.60it/s] 2%|▏ | 215/11526 [02:15<1:57:11, 1.61it/s] {'loss': 0.4538, 'grad_norm': 1.069753646850586, 'learning_rate': 1.86470078057242e-06, 'epoch': 0.06}
2%|▏ | 215/11526 [02:15<1:57:11, 1.61it/s] 2%|▏ | 216/11526 [02:16<1:56:58, 1.61it/s] {'loss': 0.5351, 'grad_norm': 1.0479443073272705, 'learning_rate': 1.8733738074588031e-06, 'epoch': 0.06}
2%|▏ | 216/11526 [02:16<1:56:58, 1.61it/s] 2%|▏ | 217/11526 [02:16<1:56:49, 1.61it/s] {'loss': 0.6071, 'grad_norm': 1.0307600498199463, 'learning_rate': 1.8820468343451867e-06, 'epoch': 0.06}
2%|▏ | 217/11526 [02:17<1:56:49, 1.61it/s] 2%|▏ | 218/11526 [02:17<1:56:43, 1.61it/s] {'loss': 0.4931, 'grad_norm': 0.8304101824760437, 'learning_rate': 1.89071986123157e-06, 'epoch': 0.06}
2%|▏ | 218/11526 [02:17<1:56:43, 1.61it/s] 2%|▏ | 219/11526 [02:18<1:56:32, 1.62it/s] {'loss': 0.4616, 'grad_norm': 0.9603312611579895, 'learning_rate': 1.8993928881179532e-06, 'epoch': 0.06}
2%|▏ | 219/11526 [02:18<1:56:32, 1.62it/s] 2%|▏ | 220/11526 [02:18<1:56:23, 1.62it/s] {'loss': 0.4481, 'grad_norm': 0.7976512312889099, 'learning_rate': 1.908065915004337e-06, 'epoch': 0.06}
2%|▏ | 220/11526 [02:18<1:56:23, 1.62it/s] 2%|▏ | 221/11526 [02:19<1:56:24, 1.62it/s] {'loss': 0.4171, 'grad_norm': 0.76175856590271, 'learning_rate': 1.9167389418907203e-06, 'epoch': 0.06}
2%|▏ | 221/11526 [02:19<1:56:24, 1.62it/s] 2%|▏ | 222/11526 [02:20<1:56:18, 1.62it/s] {'loss': 0.5006, 'grad_norm': 0.8543665409088135, 'learning_rate': 1.925411968777103e-06, 'epoch': 0.06}
2%|▏ | 222/11526 [02:20<1:56:18, 1.62it/s] 2%|▏ | 223/11526 [02:20<1:56:26, 1.62it/s] {'loss': 0.4982, 'grad_norm': 0.7840791344642639, 'learning_rate': 1.9340849956634866e-06, 'epoch': 0.06}
2%|▏ | 223/11526 [02:20<1:56:26, 1.62it/s] 2%|▏ | 224/11526 [02:21<1:56:33, 1.62it/s] {'loss': 0.4925, 'grad_norm': 0.9489762783050537, 'learning_rate': 1.9427580225498703e-06, 'epoch': 0.06}
2%|▏ | 224/11526 [02:21<1:56:33, 1.62it/s] 2%|▏ | 225/11526 [02:21<1:56:34, 1.62it/s] {'loss': 0.5693, 'grad_norm': 0.9446815252304077, 'learning_rate': 1.9514310494362532e-06, 'epoch': 0.06}
2%|▏ | 225/11526 [02:22<1:56:34, 1.62it/s] 2%|▏ | 226/11526 [02:22<1:56:30, 1.62it/s] {'loss': 0.534, 'grad_norm': 0.9462544322013855, 'learning_rate': 1.9601040763226366e-06, 'epoch': 0.06}
2%|▏ | 226/11526 [02:22<1:56:30, 1.62it/s] 2%|▏ | 227/11526 [02:23<1:56:25, 1.62it/s] {'loss': 0.5015, 'grad_norm': 0.8612630367279053, 'learning_rate': 1.96877710320902e-06, 'epoch': 0.06}
2%|▏ | 227/11526 [02:23<1:56:25, 1.62it/s] 2%|▏ | 228/11526 [02:23<1:56:22, 1.62it/s] {'loss': 0.4809, 'grad_norm': 0.8276378512382507, 'learning_rate': 1.9774501300954037e-06, 'epoch': 0.06}
2%|▏ | 228/11526 [02:23<1:56:22, 1.62it/s] 2%|▏ | 229/11526 [02:24<1:56:28, 1.62it/s] {'loss': 0.5567, 'grad_norm': 0.855896532535553, 'learning_rate': 1.9861231569817866e-06, 'epoch': 0.06}
2%|▏ | 229/11526 [02:24<1:56:28, 1.62it/s] 2%|▏ | 230/11526 [02:24<1:56:21, 1.62it/s] {'loss': 0.4568, 'grad_norm': 0.7570719122886658, 'learning_rate': 1.99479618386817e-06, 'epoch': 0.06}
2%|▏ | 230/11526 [02:25<1:56:21, 1.62it/s] 2%|▏ | 231/11526 [02:25<1:56:20, 1.62it/s] {'loss': 0.4735, 'grad_norm': 0.866510272026062, 'learning_rate': 2.0034692107545538e-06, 'epoch': 0.06}
2%|▏ | 231/11526 [02:25<1:56:20, 1.62it/s] 2%|▏ | 232/11526 [02:26<1:56:16, 1.62it/s] {'loss': 0.4826, 'grad_norm': 0.7541313171386719, 'learning_rate': 2.012142237640937e-06, 'epoch': 0.06}
2%|▏ | 232/11526 [02:26<1:56:16, 1.62it/s] 2%|▏ | 233/11526 [02:26<1:56:21, 1.62it/s] {'loss': 0.5679, 'grad_norm': 0.8497040867805481, 'learning_rate': 2.02081526452732e-06, 'epoch': 0.06}
2%|▏ | 233/11526 [02:26<1:56:21, 1.62it/s] 2%|▏ | 234/11526 [02:27<1:56:21, 1.62it/s] {'loss': 0.507, 'grad_norm': 0.8653647899627686, 'learning_rate': 2.0294882914137034e-06, 'epoch': 0.06}
2%|▏ | 234/11526 [02:27<1:56:21, 1.62it/s] 2%|▏ | 235/11526 [02:28<1:56:17, 1.62it/s] {'loss': 0.4999, 'grad_norm': 0.8112255334854126, 'learning_rate': 2.038161318300087e-06, 'epoch': 0.06}
2%|▏ | 235/11526 [02:28<1:56:17, 1.62it/s] 2%|▏ | 236/11526 [02:28<1:56:16, 1.62it/s] {'loss': 0.5493, 'grad_norm': 0.8768890500068665, 'learning_rate': 2.0468343451864705e-06, 'epoch': 0.06}
2%|▏ | 236/11526 [02:28<1:56:16, 1.62it/s] 2%|▏ | 237/11526 [02:29<1:56:18, 1.62it/s] {'loss': 0.5352, 'grad_norm': 0.8988469839096069, 'learning_rate': 2.0555073720728534e-06, 'epoch': 0.06}
2%|▏ | 237/11526 [02:29<1:56:18, 1.62it/s] 2%|▏ | 238/11526 [02:29<1:56:10, 1.62it/s] {'loss': 0.4932, 'grad_norm': 0.7622236013412476, 'learning_rate': 2.0641803989592368e-06, 'epoch': 0.06}
2%|▏ | 238/11526 [02:30<1:56:10, 1.62it/s] 2%|▏ | 239/11526 [02:30<1:56:15, 1.62it/s] {'loss': 0.4915, 'grad_norm': 0.9700366258621216, 'learning_rate': 2.0728534258456205e-06, 'epoch': 0.06}
2%|▏ | 239/11526 [02:30<1:56:15, 1.62it/s] 2%|▏ | 240/11526 [02:31<1:56:06, 1.62it/s] {'loss': 0.59, 'grad_norm': 0.7398103475570679, 'learning_rate': 2.0815264527320035e-06, 'epoch': 0.06}
2%|▏ | 240/11526 [02:31<1:56:06, 1.62it/s] 2%|▏ | 241/11526 [02:31<1:56:03, 1.62it/s] {'loss': 0.4806, 'grad_norm': 0.8657262921333313, 'learning_rate': 2.090199479618387e-06, 'epoch': 0.06}
2%|▏ | 241/11526 [02:31<1:56:03, 1.62it/s] 2%|▏ | 242/11526 [02:32<1:56:05, 1.62it/s] {'loss': 0.4564, 'grad_norm': 0.8339256048202515, 'learning_rate': 2.09887250650477e-06, 'epoch': 0.06}
2%|▏ | 242/11526 [02:32<1:56:05, 1.62it/s] 2%|▏ | 243/11526 [02:33<1:56:03, 1.62it/s] {'loss': 0.6507, 'grad_norm': 0.9983787536621094, 'learning_rate': 2.107545533391154e-06, 'epoch': 0.06}
2%|▏ | 243/11526 [02:33<1:56:03, 1.62it/s] 2%|▏ | 244/11526 [02:33<1:56:10, 1.62it/s] {'loss': 0.5751, 'grad_norm': 1.0589375495910645, 'learning_rate': 2.116218560277537e-06, 'epoch': 0.06}
2%|▏ | 244/11526 [02:33<1:56:10, 1.62it/s] 2%|▏ | 245/11526 [02:34<1:56:11, 1.62it/s] {'loss': 0.4881, 'grad_norm': 0.9231506586074829, 'learning_rate': 2.12489158716392e-06, 'epoch': 0.06}
2%|▏ | 245/11526 [02:34<1:56:11, 1.62it/s] 2%|▏ | 246/11526 [02:34<1:56:08, 1.62it/s] {'loss': 0.477, 'grad_norm': 0.8600086569786072, 'learning_rate': 2.133564614050304e-06, 'epoch': 0.06}
2%|▏ | 246/11526 [02:35<1:56:08, 1.62it/s] 2%|▏ | 247/11526 [02:35<1:56:09, 1.62it/s] {'loss': 0.5333, 'grad_norm': 0.8610706925392151, 'learning_rate': 2.1422376409366873e-06, 'epoch': 0.06}
2%|▏ | 247/11526 [02:35<1:56:09, 1.62it/s] 2%|▏ | 248/11526 [02:36<1:56:06, 1.62it/s] {'loss': 0.5032, 'grad_norm': 0.8197236061096191, 'learning_rate': 2.1509106678230702e-06, 'epoch': 0.06}
2%|▏ | 248/11526 [02:36<1:56:06, 1.62it/s] 2%|▏ | 249/11526 [02:36<1:56:15, 1.62it/s] {'loss': 0.5402, 'grad_norm': 0.8955065011978149, 'learning_rate': 2.1595836947094536e-06, 'epoch': 0.06}
2%|▏ | 249/11526 [02:36<1:56:15, 1.62it/s] 2%|▏ | 250/11526 [02:37<1:56:02, 1.62it/s] {'loss': 0.5558, 'grad_norm': 0.8481603264808655, 'learning_rate': 2.1682567215958374e-06, 'epoch': 0.07}
2%|▏ | 250/11526 [02:37<1:56:02, 1.62it/s] 2%|▏ | 251/11526 [02:37<1:55:58, 1.62it/s] {'loss': 0.3769, 'grad_norm': 0.6754995584487915, 'learning_rate': 2.1769297484822203e-06, 'epoch': 0.07}
2%|▏ | 251/11526 [02:38<1:55:58, 1.62it/s] 2%|▏ | 252/11526 [02:38<1:55:56, 1.62it/s] {'loss': 0.4508, 'grad_norm': 0.7623719573020935, 'learning_rate': 2.1856027753686036e-06, 'epoch': 0.07}
2%|▏ | 252/11526 [02:38<1:55:56, 1.62it/s] 2%|▏ | 253/11526 [02:39<1:55:55, 1.62it/s] {'loss': 0.4412, 'grad_norm': 0.6883414387702942, 'learning_rate': 2.194275802254987e-06, 'epoch': 0.07}
2%|▏ | 253/11526 [02:39<1:55:55, 1.62it/s] 2%|▏ | 254/11526 [02:39<1:55:59, 1.62it/s] {'loss': 0.5021, 'grad_norm': 0.8785725235939026, 'learning_rate': 2.2029488291413708e-06, 'epoch': 0.07}
2%|▏ | 254/11526 [02:39<1:55:59, 1.62it/s] 2%|▏ | 255/11526 [02:40<1:55:51, 1.62it/s] {'loss': 0.4999, 'grad_norm': 0.9236719012260437, 'learning_rate': 2.2116218560277537e-06, 'epoch': 0.07}
2%|▏ | 255/11526 [02:40<1:55:51, 1.62it/s] 2%|▏ | 256/11526 [02:41<1:55:54, 1.62it/s] {'loss': 0.4236, 'grad_norm': 0.8259513974189758, 'learning_rate': 2.220294882914137e-06, 'epoch': 0.07}
2%|▏ | 256/11526 [02:41<1:55:54, 1.62it/s] 2%|▏ | 257/11526 [02:41<1:55:58, 1.62it/s] {'loss': 0.4464, 'grad_norm': 0.8425729274749756, 'learning_rate': 2.228967909800521e-06, 'epoch': 0.07}
2%|▏ | 257/11526 [02:41<1:55:58, 1.62it/s] 2%|▏ | 258/11526 [02:42<1:55:54, 1.62it/s] {'loss': 0.5109, 'grad_norm': 0.8704429864883423, 'learning_rate': 2.237640936686904e-06, 'epoch': 0.07}
2%|▏ | 258/11526 [02:42<1:55:54, 1.62it/s] 2%|▏ | 259/11526 [02:42<1:55:53, 1.62it/s] {'loss': 0.4504, 'grad_norm': 0.7601255774497986, 'learning_rate': 2.246313963573287e-06, 'epoch': 0.07}
2%|▏ | 259/11526 [02:43<1:55:53, 1.62it/s] 2%|▏ | 260/11526 [02:43<1:55:49, 1.62it/s] {'loss': 0.484, 'grad_norm': 0.9076888561248779, 'learning_rate': 2.2549869904596704e-06, 'epoch': 0.07}
2%|▏ | 260/11526 [02:43<1:55:49, 1.62it/s] 2%|▏ | 261/11526 [02:44<1:55:51, 1.62it/s] {'loss': 0.5263, 'grad_norm': 0.7861553430557251, 'learning_rate': 2.263660017346054e-06, 'epoch': 0.07}
2%|▏ | 261/11526 [02:44<1:55:51, 1.62it/s] 2%|▏ | 262/11526 [02:44<1:55:53, 1.62it/s] {'loss': 0.4669, 'grad_norm': 0.8085551857948303, 'learning_rate': 2.2723330442324375e-06, 'epoch': 0.07}
2%|▏ | 262/11526 [02:44<1:55:53, 1.62it/s] 2%|▏ | 263/11526 [02:45<1:55:55, 1.62it/s] {'loss': 0.5011, 'grad_norm': 0.7605535984039307, 'learning_rate': 2.2810060711188205e-06, 'epoch': 0.07}
2%|▏ | 263/11526 [02:45<1:55:55, 1.62it/s] 2%|▏ | 264/11526 [02:45<1:56:00, 1.62it/s] {'loss': 0.4594, 'grad_norm': 0.9364368319511414, 'learning_rate': 2.289679098005204e-06, 'epoch': 0.07}
2%|▏ | 264/11526 [02:46<1:56:00, 1.62it/s] 2%|▏ | 265/11526 [02:46<1:55:54, 1.62it/s] {'loss': 0.5565, 'grad_norm': 0.8769363164901733, 'learning_rate': 2.2983521248915876e-06, 'epoch': 0.07}
2%|▏ | 265/11526 [02:46<1:55:54, 1.62it/s] 2%|▏ | 266/11526 [02:47<1:55:58, 1.62it/s] {'loss': 0.5829, 'grad_norm': 0.9305436611175537, 'learning_rate': 2.3070251517779705e-06, 'epoch': 0.07}
2%|▏ | 266/11526 [02:47<1:55:58, 1.62it/s] 2%|▏ | 267/11526 [02:47<1:56:02, 1.62it/s] {'loss': 0.5152, 'grad_norm': 0.8357610106468201, 'learning_rate': 2.315698178664354e-06, 'epoch': 0.07}
2%|▏ | 267/11526 [02:47<1:56:02, 1.62it/s] 2%|▏ | 268/11526 [02:48<1:56:02, 1.62it/s] {'loss': 0.3704, 'grad_norm': 0.6150936484336853, 'learning_rate': 2.3243712055507376e-06, 'epoch': 0.07}
2%|▏ | 268/11526 [02:48<1:56:02, 1.62it/s] 2%|▏ | 269/11526 [02:49<1:56:02, 1.62it/s] {'loss': 0.5495, 'grad_norm': 0.8398849964141846, 'learning_rate': 2.333044232437121e-06, 'epoch': 0.07}
2%|▏ | 269/11526 [02:49<1:56:02, 1.62it/s] 2%|▏ | 270/11526 [02:49<1:56:03, 1.62it/s] {'loss': 0.5341, 'grad_norm': 0.919773280620575, 'learning_rate': 2.341717259323504e-06, 'epoch': 0.07}
2%|▏ | 270/11526 [02:49<1:56:03, 1.62it/s] 2%|▏ | 271/11526 [02:50<1:56:00, 1.62it/s] {'loss': 0.53, 'grad_norm': 0.9085887670516968, 'learning_rate': 2.3503902862098872e-06, 'epoch': 0.07}
2%|▏ | 271/11526 [02:50<1:56:00, 1.62it/s] 2%|▏ | 272/11526 [02:50<1:55:59, 1.62it/s] {'loss': 0.4364, 'grad_norm': 0.715894341468811, 'learning_rate': 2.359063313096271e-06, 'epoch': 0.07}
2%|▏ | 272/11526 [02:51<1:55:59, 1.62it/s] 2%|▏ | 273/11526 [02:51<1:55:59, 1.62it/s] {'loss': 0.4911, 'grad_norm': 0.9233723878860474, 'learning_rate': 2.3677363399826544e-06, 'epoch': 0.07}
2%|▏ | 273/11526 [02:51<1:55:59, 1.62it/s] 2%|▏ | 274/11526 [02:52<1:56:04, 1.62it/s] {'loss': 0.4436, 'grad_norm': 0.7409653067588806, 'learning_rate': 2.3764093668690373e-06, 'epoch': 0.07}
2%|▏ | 274/11526 [02:52<1:56:04, 1.62it/s] 2%|▏ | 275/11526 [02:52<1:55:59, 1.62it/s] {'loss': 0.5128, 'grad_norm': 0.8842470049858093, 'learning_rate': 2.3850823937554206e-06, 'epoch': 0.07}
2%|▏ | 275/11526 [02:52<1:55:59, 1.62it/s] 2%|▏ | 276/11526 [02:53<1:55:53, 1.62it/s] {'loss': 0.4791, 'grad_norm': 0.7243514657020569, 'learning_rate': 2.3937554206418044e-06, 'epoch': 0.07}
2%|▏ | 276/11526 [02:53<1:55:53, 1.62it/s] 2%|▏ | 277/11526 [02:54<1:55:50, 1.62it/s] {'loss': 0.3975, 'grad_norm': 1.13950777053833, 'learning_rate': 2.4024284475281878e-06, 'epoch': 0.07}
2%|▏ | 277/11526 [02:54<1:55:50, 1.62it/s] 2%|▏ | 278/11526 [02:54<1:55:54, 1.62it/s] {'loss': 0.4769, 'grad_norm': 0.8198620676994324, 'learning_rate': 2.4111014744145707e-06, 'epoch': 0.07}
2%|▏ | 278/11526 [02:54<1:55:54, 1.62it/s] 2%|▏ | 279/11526 [02:55<1:55:56, 1.62it/s] {'loss': 0.4653, 'grad_norm': 0.8040992021560669, 'learning_rate': 2.419774501300954e-06, 'epoch': 0.07}
2%|▏ | 279/11526 [02:55<1:55:56, 1.62it/s] 2%|▏ | 280/11526 [02:55<1:55:54, 1.62it/s] {'loss': 0.5751, 'grad_norm': 0.8148336410522461, 'learning_rate': 2.428447528187338e-06, 'epoch': 0.07}
2%|▏ | 280/11526 [02:56<1:55:54, 1.62it/s] 2%|▏ | 281/11526 [02:56<1:55:52, 1.62it/s] {'loss': 0.4751, 'grad_norm': 0.7908636927604675, 'learning_rate': 2.4371205550737207e-06, 'epoch': 0.07}
2%|▏ | 281/11526 [02:56<1:55:52, 1.62it/s] 2%|▏ | 282/11526 [02:57<1:55:43, 1.62it/s] {'loss': 0.6632, 'grad_norm': 0.92950838804245, 'learning_rate': 2.445793581960104e-06, 'epoch': 0.07}
2%|▏ | 282/11526 [02:57<1:55:43, 1.62it/s] 2%|▏ | 283/11526 [02:57<1:55:39, 1.62it/s] {'loss': 0.4736, 'grad_norm': 0.8463438749313354, 'learning_rate': 2.454466608846488e-06, 'epoch': 0.07}
2%|▏ | 283/11526 [02:57<1:55:39, 1.62it/s] 2%|▏ | 284/11526 [02:58<1:55:47, 1.62it/s] {'loss': 0.4028, 'grad_norm': 0.7083885073661804, 'learning_rate': 2.463139635732871e-06, 'epoch': 0.07}
2%|▏ | 284/11526 [02:58<1:55:47, 1.62it/s] 2%|▏ | 285/11526 [02:58<1:55:47, 1.62it/s] {'loss': 0.3887, 'grad_norm': 0.6592311859130859, 'learning_rate': 2.471812662619254e-06, 'epoch': 0.07}
2%|▏ | 285/11526 [02:59<1:55:47, 1.62it/s] 2%|▏ | 286/11526 [02:59<1:55:47, 1.62it/s] {'loss': 0.6346, 'grad_norm': 1.0245972871780396, 'learning_rate': 2.4804856895056375e-06, 'epoch': 0.07}
2%|▏ | 286/11526 [02:59<1:55:47, 1.62it/s] 2%|▏ | 287/11526 [03:00<1:55:43, 1.62it/s] {'loss': 0.501, 'grad_norm': 0.8079186677932739, 'learning_rate': 2.4891587163920212e-06, 'epoch': 0.07}
2%|▏ | 287/11526 [03:00<1:55:43, 1.62it/s] 2%|▏ | 288/11526 [03:00<1:55:42, 1.62it/s] {'loss': 0.5456, 'grad_norm': 0.9552274942398071, 'learning_rate': 2.4978317432784046e-06, 'epoch': 0.07}
2%|▏ | 288/11526 [03:00<1:55:42, 1.62it/s] 3%|▎ | 289/11526 [03:01<1:55:47, 1.62it/s] {'loss': 0.5946, 'grad_norm': 0.8791469931602478, 'learning_rate': 2.5065047701647875e-06, 'epoch': 0.08}
3%|▎ | 289/11526 [03:01<1:55:47, 1.62it/s] 3%|▎ | 290/11526 [03:02<1:55:45, 1.62it/s] {'loss': 0.4731, 'grad_norm': 0.7295354604721069, 'learning_rate': 2.515177797051171e-06, 'epoch': 0.08}
3%|▎ | 290/11526 [03:02<1:55:45, 1.62it/s] 3%|▎ | 291/11526 [03:02<1:55:45, 1.62it/s] {'loss': 0.5044, 'grad_norm': 0.7993372678756714, 'learning_rate': 2.523850823937554e-06, 'epoch': 0.08}
3%|▎ | 291/11526 [03:02<1:55:45, 1.62it/s] 3%|▎ | 292/11526 [03:03<1:55:43, 1.62it/s] {'loss': 0.4275, 'grad_norm': 0.7936784029006958, 'learning_rate': 2.532523850823938e-06, 'epoch': 0.08}
3%|▎ | 292/11526 [03:03<1:55:43, 1.62it/s] 3%|▎ | 293/11526 [03:03<1:55:44, 1.62it/s] {'loss': 0.4477, 'grad_norm': 0.7995445132255554, 'learning_rate': 2.5411968777103213e-06, 'epoch': 0.08}
3%|▎ | 293/11526 [03:04<1:55:44, 1.62it/s] 3%|▎ | 294/11526 [03:04<1:55:44, 1.62it/s] {'loss': 0.444, 'grad_norm': 0.7885014414787292, 'learning_rate': 2.5498699045967047e-06, 'epoch': 0.08}
3%|▎ | 294/11526 [03:04<1:55:44, 1.62it/s] 3%|▎ | 295/11526 [03:05<1:55:32, 1.62it/s] {'loss': 0.4476, 'grad_norm': 0.7491124272346497, 'learning_rate': 2.558542931483088e-06, 'epoch': 0.08}
3%|▎ | 295/11526 [03:05<1:55:32, 1.62it/s] 3%|▎ | 296/11526 [03:05<1:55:27, 1.62it/s] {'loss': 0.4693, 'grad_norm': 0.8975275158882141, 'learning_rate': 2.567215958369471e-06, 'epoch': 0.08}
3%|▎ | 296/11526 [03:05<1:55:27, 1.62it/s] 3%|▎ | 297/11526 [03:06<1:55:28, 1.62it/s] {'loss': 0.5048, 'grad_norm': 0.7656221985816956, 'learning_rate': 2.5758889852558543e-06, 'epoch': 0.08}
3%|▎ | 297/11526 [03:06<1:55:28, 1.62it/s] 3%|▎ | 298/11526 [03:06<1:55:34, 1.62it/s] {'loss': 0.3705, 'grad_norm': 0.651519238948822, 'learning_rate': 2.5845620121422376e-06, 'epoch': 0.08}
3%|▎ | 298/11526 [03:07<1:55:34, 1.62it/s] 3%|▎ | 299/11526 [03:07<1:55:29, 1.62it/s] {'loss': 0.3507, 'grad_norm': 0.6464735865592957, 'learning_rate': 2.593235039028621e-06, 'epoch': 0.08}
3%|▎ | 299/11526 [03:07<1:55:29, 1.62it/s] 3%|▎ | 300/11526 [03:08<1:55:29, 1.62it/s] {'loss': 0.4744, 'grad_norm': 0.7804017066955566, 'learning_rate': 2.6019080659150048e-06, 'epoch': 0.08}
3%|▎ | 300/11526 [03:08<1:55:29, 1.62it/s] 3%|▎ | 301/11526 [03:08<1:55:29, 1.62it/s] {'loss': 0.4069, 'grad_norm': 0.5855188965797424, 'learning_rate': 2.610581092801388e-06, 'epoch': 0.08}
3%|▎ | 301/11526 [03:08<1:55:29, 1.62it/s] 3%|▎ | 302/11526 [03:09<1:55:24, 1.62it/s] {'loss': 0.4916, 'grad_norm': 0.7273641228675842, 'learning_rate': 2.6192541196877714e-06, 'epoch': 0.08}
3%|▎ | 302/11526 [03:09<1:55:24, 1.62it/s] 3%|▎ | 303/11526 [03:10<1:55:22, 1.62it/s] {'loss': 0.4877, 'grad_norm': 0.7542781233787537, 'learning_rate': 2.627927146574155e-06, 'epoch': 0.08}
3%|▎ | 303/11526 [03:10<1:55:22, 1.62it/s] 3%|▎ | 304/11526 [03:10<1:55:45, 1.62it/s] {'loss': 0.4429, 'grad_norm': 0.7797462344169617, 'learning_rate': 2.6366001734605377e-06, 'epoch': 0.08}
3%|▎ | 304/11526 [03:10<1:55:45, 1.62it/s] 3%|▎ | 305/11526 [03:11<1:55:42, 1.62it/s] {'loss': 0.4672, 'grad_norm': 0.8137367963790894, 'learning_rate': 2.645273200346921e-06, 'epoch': 0.08}
3%|▎ | 305/11526 [03:11<1:55:42, 1.62it/s] 3%|▎ | 306/11526 [03:11<1:55:29, 1.62it/s] {'loss': 0.4782, 'grad_norm': 0.8051887154579163, 'learning_rate': 2.6539462272333044e-06, 'epoch': 0.08}
3%|▎ | 306/11526 [03:12<1:55:29, 1.62it/s] 3%|▎ | 307/11526 [03:12<1:55:15, 1.62it/s] {'loss': 0.465, 'grad_norm': 0.8099355101585388, 'learning_rate': 2.662619254119688e-06, 'epoch': 0.08}
3%|▎ | 307/11526 [03:12<1:55:15, 1.62it/s] 3%|▎ | 308/11526 [03:13<1:55:31, 1.62it/s] {'loss': 0.4284, 'grad_norm': 0.7956909537315369, 'learning_rate': 2.6712922810060715e-06, 'epoch': 0.08}
3%|▎ | 308/11526 [03:13<1:55:31, 1.62it/s] 3%|▎ | 309/11526 [03:13<1:55:30, 1.62it/s] {'loss': 0.4518, 'grad_norm': 0.7926578521728516, 'learning_rate': 2.679965307892455e-06, 'epoch': 0.08}
3%|▎ | 309/11526 [03:13<1:55:30, 1.62it/s] 3%|▎ | 310/11526 [03:14<1:55:20, 1.62it/s] {'loss': 0.5641, 'grad_norm': 0.8888891339302063, 'learning_rate': 2.6886383347788382e-06, 'epoch': 0.08}
3%|▎ | 310/11526 [03:14<1:55:20, 1.62it/s] 3%|▎ | 311/11526 [03:15<1:55:17, 1.62it/s] {'loss': 0.4287, 'grad_norm': 0.799744188785553, 'learning_rate': 2.697311361665221e-06, 'epoch': 0.08}
3%|▎ | 311/11526 [03:15<1:55:17, 1.62it/s] 3%|▎ | 312/11526 [03:15<1:55:15, 1.62it/s] {'loss': 0.6268, 'grad_norm': 1.3034987449645996, 'learning_rate': 2.7059843885516045e-06, 'epoch': 0.08}
3%|▎ | 312/11526 [03:15<1:55:15, 1.62it/s] 3%|▎ | 313/11526 [03:16<1:55:19, 1.62it/s] {'loss': 0.4431, 'grad_norm': 0.6842606067657471, 'learning_rate': 2.714657415437988e-06, 'epoch': 0.08}
3%|▎ | 313/11526 [03:16<1:55:19, 1.62it/s] 3%|▎ | 314/11526 [03:16<1:55:22, 1.62it/s] {'loss': 0.4107, 'grad_norm': 0.7855454087257385, 'learning_rate': 2.723330442324371e-06, 'epoch': 0.08}
3%|▎ | 314/11526 [03:17<1:55:22, 1.62it/s] 3%|▎ | 315/11526 [03:17<1:55:16, 1.62it/s] {'loss': 0.4548, 'grad_norm': 0.7047145366668701, 'learning_rate': 2.732003469210755e-06, 'epoch': 0.08}
3%|▎ | 315/11526 [03:17<1:55:16, 1.62it/s] 3%|▎ | 316/11526 [03:18<1:55:24, 1.62it/s] {'loss': 0.4521, 'grad_norm': 0.8328726887702942, 'learning_rate': 2.7406764960971383e-06, 'epoch': 0.08}
3%|▎ | 316/11526 [03:18<1:55:24, 1.62it/s] 3%|▎ | 317/11526 [03:18<1:55:20, 1.62it/s] {'loss': 0.4983, 'grad_norm': 0.8834674954414368, 'learning_rate': 2.7493495229835217e-06, 'epoch': 0.08}
3%|▎ | 317/11526 [03:18<1:55:20, 1.62it/s] 3%|▎ | 318/11526 [03:19<1:55:22, 1.62it/s] {'loss': 0.4728, 'grad_norm': 0.8137502074241638, 'learning_rate': 2.758022549869905e-06, 'epoch': 0.08}
3%|▎ | 318/11526 [03:19<1:55:22, 1.62it/s] 3%|▎ | 319/11526 [03:19<1:55:27, 1.62it/s] {'loss': 0.5403, 'grad_norm': 0.8027554750442505, 'learning_rate': 2.766695576756288e-06, 'epoch': 0.08}
3%|▎ | 319/11526 [03:20<1:55:27, 1.62it/s] 3%|▎ | 320/11526 [03:20<1:55:25, 1.62it/s] {'loss': 0.6087, 'grad_norm': 0.8172733187675476, 'learning_rate': 2.7753686036426713e-06, 'epoch': 0.08}
3%|▎ | 320/11526 [03:20<1:55:25, 1.62it/s] 3%|▎ | 321/11526 [03:21<1:55:11, 1.62it/s] {'loss': 0.5812, 'grad_norm': 1.0325490236282349, 'learning_rate': 2.7840416305290546e-06, 'epoch': 0.08}
3%|▎ | 321/11526 [03:21<1:55:11, 1.62it/s] 3%|▎ | 322/11526 [03:21<1:55:08, 1.62it/s] {'loss': 0.5237, 'grad_norm': 0.7511606812477112, 'learning_rate': 2.7927146574154384e-06, 'epoch': 0.08}
3%|▎ | 322/11526 [03:21<1:55:08, 1.62it/s] 3%|▎ | 323/11526 [03:22<1:55:14, 1.62it/s] {'loss': 0.4585, 'grad_norm': 0.6904565095901489, 'learning_rate': 2.8013876843018218e-06, 'epoch': 0.08}
3%|▎ | 323/11526 [03:22<1:55:14, 1.62it/s] 3%|▎ | 324/11526 [03:23<1:55:18, 1.62it/s] {'loss': 0.4195, 'grad_norm': 0.6626433730125427, 'learning_rate': 2.810060711188205e-06, 'epoch': 0.08}
3%|▎ | 324/11526 [03:23<1:55:18, 1.62it/s] 3%|▎ | 325/11526 [03:23<1:55:24, 1.62it/s] {'loss': 0.5578, 'grad_norm': 0.8128219246864319, 'learning_rate': 2.8187337380745884e-06, 'epoch': 0.08}
3%|▎ | 325/11526 [03:23<1:55:24, 1.62it/s] 3%|▎ | 326/11526 [03:24<1:55:27, 1.62it/s] {'loss': 0.4964, 'grad_norm': 0.7322054505348206, 'learning_rate': 2.8274067649609714e-06, 'epoch': 0.08}
3%|▎ | 326/11526 [03:24<1:55:27, 1.62it/s] 3%|▎ | 327/11526 [03:24<1:55:26, 1.62it/s] {'loss': 0.4671, 'grad_norm': 0.6976147294044495, 'learning_rate': 2.8360797918473547e-06, 'epoch': 0.09}
3%|▎ | 327/11526 [03:25<1:55:26, 1.62it/s] 3%|▎ | 328/11526 [03:25<1:55:21, 1.62it/s] {'loss': 0.4694, 'grad_norm': 0.6896835565567017, 'learning_rate': 2.844752818733738e-06, 'epoch': 0.09}
3%|▎ | 328/11526 [03:25<1:55:21, 1.62it/s] 3%|▎ | 329/11526 [03:26<1:55:32, 1.62it/s] {'loss': 0.4406, 'grad_norm': 0.7392511367797852, 'learning_rate': 2.853425845620122e-06, 'epoch': 0.09}
3%|▎ | 329/11526 [03:26<1:55:32, 1.62it/s] 3%|▎ | 330/11526 [03:26<1:55:27, 1.62it/s] {'loss': 0.3702, 'grad_norm': 0.6997090578079224, 'learning_rate': 2.862098872506505e-06, 'epoch': 0.09}
3%|▎ | 330/11526 [03:26<1:55:27, 1.62it/s] 3%|▎ | 331/11526 [03:27<2:02:48, 1.52it/s] {'loss': 0.5204, 'grad_norm': 0.7851254940032959, 'learning_rate': 2.8707718993928885e-06, 'epoch': 0.09}
3%|▎ | 331/11526 [03:27<2:02:48, 1.52it/s] 3%|▎ | 332/11526 [03:28<2:00:32, 1.55it/s] {'loss': 0.4849, 'grad_norm': 0.871063768863678, 'learning_rate': 2.879444926279272e-06, 'epoch': 0.09}
3%|▎ | 332/11526 [03:28<2:00:32, 1.55it/s] 3%|▎ | 333/11526 [03:28<1:58:59, 1.57it/s] {'loss': 0.4297, 'grad_norm': 0.6995636224746704, 'learning_rate': 2.8881179531656552e-06, 'epoch': 0.09}
3%|▎ | 333/11526 [03:28<1:58:59, 1.57it/s] 3%|▎ | 334/11526 [03:29<1:57:51, 1.58it/s] {'loss': 0.5015, 'grad_norm': 0.817538857460022, 'learning_rate': 2.896790980052038e-06, 'epoch': 0.09}
3%|▎ | 334/11526 [03:29<1:57:51, 1.58it/s] 3%|▎ | 335/11526 [03:29<1:57:03, 1.59it/s] {'loss': 0.4899, 'grad_norm': 0.8032779097557068, 'learning_rate': 2.9054640069384215e-06, 'epoch': 0.09}
3%|▎ | 335/11526 [03:30<1:57:03, 1.59it/s] 3%|▎ | 336/11526 [03:30<1:56:20, 1.60it/s] {'loss': 0.4382, 'grad_norm': 0.7510805130004883, 'learning_rate': 2.914137033824805e-06, 'epoch': 0.09}
3%|▎ | 336/11526 [03:30<1:56:20, 1.60it/s] 3%|▎ | 337/11526 [03:31<1:56:04, 1.61it/s] {'loss': 0.53, 'grad_norm': 0.8694416284561157, 'learning_rate': 2.9228100607111886e-06, 'epoch': 0.09}
3%|▎ | 337/11526 [03:31<1:56:04, 1.61it/s] 3%|▎ | 338/11526 [03:31<1:55:47, 1.61it/s] {'loss': 0.3858, 'grad_norm': 0.5970102548599243, 'learning_rate': 2.931483087597572e-06, 'epoch': 0.09}
3%|▎ | 338/11526 [03:31<1:55:47, 1.61it/s] 3%|▎ | 339/11526 [03:32<1:55:36, 1.61it/s] {'loss': 0.5437, 'grad_norm': 0.7555249333381653, 'learning_rate': 2.9401561144839553e-06, 'epoch': 0.09}
3%|▎ | 339/11526 [03:32<1:55:36, 1.61it/s] 3%|▎ | 340/11526 [03:33<1:55:27, 1.61it/s] {'loss': 0.396, 'grad_norm': 0.655350387096405, 'learning_rate': 2.9488291413703387e-06, 'epoch': 0.09}
3%|▎ | 340/11526 [03:33<1:55:27, 1.61it/s] 3%|▎ | 341/11526 [03:33<1:55:13, 1.62it/s] {'loss': 0.4858, 'grad_norm': 0.7930605411529541, 'learning_rate': 2.9575021682567216e-06, 'epoch': 0.09}
3%|▎ | 341/11526 [03:33<1:55:13, 1.62it/s] 3%|▎ | 342/11526 [03:34<1:55:27, 1.61it/s] {'loss': 0.5224, 'grad_norm': 0.8194026947021484, 'learning_rate': 2.966175195143105e-06, 'epoch': 0.09}
3%|▎ | 342/11526 [03:34<1:55:27, 1.61it/s] 3%|▎ | 343/11526 [03:34<1:55:19, 1.62it/s] {'loss': 0.5006, 'grad_norm': 0.8258594274520874, 'learning_rate': 2.9748482220294883e-06, 'epoch': 0.09}
3%|▎ | 343/11526 [03:35<1:55:19, 1.62it/s] 3%|▎ | 344/11526 [03:35<1:55:10, 1.62it/s] {'loss': 0.3935, 'grad_norm': 0.7128439545631409, 'learning_rate': 2.983521248915872e-06, 'epoch': 0.09}
3%|▎ | 344/11526 [03:35<1:55:10, 1.62it/s] 3%|▎ | 345/11526 [03:36<1:54:56, 1.62it/s] {'loss': 0.6021, 'grad_norm': 0.9631819128990173, 'learning_rate': 2.9921942758022554e-06, 'epoch': 0.09}
3%|▎ | 345/11526 [03:36<1:54:56, 1.62it/s] 3%|▎ | 346/11526 [03:36<1:54:53, 1.62it/s] {'loss': 0.6476, 'grad_norm': 0.9687549471855164, 'learning_rate': 3.0008673026886387e-06, 'epoch': 0.09}
3%|▎ | 346/11526 [03:36<1:54:53, 1.62it/s] 3%|▎ | 347/11526 [03:37<1:55:08, 1.62it/s] {'loss': 0.4889, 'grad_norm': 0.72234046459198, 'learning_rate': 3.009540329575022e-06, 'epoch': 0.09}
3%|▎ | 347/11526 [03:37<1:55:08, 1.62it/s] 3%|▎ | 348/11526 [03:38<1:55:05, 1.62it/s] {'loss': 0.5986, 'grad_norm': 0.7172614932060242, 'learning_rate': 3.018213356461405e-06, 'epoch': 0.09}
3%|▎ | 348/11526 [03:38<1:55:05, 1.62it/s] 3%|▎ | 349/11526 [03:38<1:55:05, 1.62it/s] {'loss': 0.3709, 'grad_norm': 0.6439437866210938, 'learning_rate': 3.0268863833477884e-06, 'epoch': 0.09}
3%|▎ | 349/11526 [03:38<1:55:05, 1.62it/s] 3%|▎ | 350/11526 [03:39<1:55:03, 1.62it/s] {'loss': 0.4452, 'grad_norm': 0.7908746004104614, 'learning_rate': 3.0355594102341717e-06, 'epoch': 0.09}
3%|▎ | 350/11526 [03:39<1:55:03, 1.62it/s] 3%|▎ | 351/11526 [03:39<1:55:02, 1.62it/s] {'loss': 0.4234, 'grad_norm': 0.761121928691864, 'learning_rate': 3.044232437120555e-06, 'epoch': 0.09}
3%|▎ | 351/11526 [03:39<1:55:02, 1.62it/s] 3%|▎ | 352/11526 [03:40<1:55:12, 1.62it/s] {'loss': 0.4074, 'grad_norm': 0.6333280801773071, 'learning_rate': 3.052905464006939e-06, 'epoch': 0.09}
3%|▎ | 352/11526 [03:40<1:55:12, 1.62it/s] 3%|▎ | 353/11526 [03:41<1:55:08, 1.62it/s] {'loss': 0.4224, 'grad_norm': 0.7365403175354004, 'learning_rate': 3.061578490893322e-06, 'epoch': 0.09}
3%|▎ | 353/11526 [03:41<1:55:08, 1.62it/s] 3%|▎ | 354/11526 [03:41<1:55:06, 1.62it/s] {'loss': 0.4734, 'grad_norm': 0.779144823551178, 'learning_rate': 3.0702515177797055e-06, 'epoch': 0.09}
3%|▎ | 354/11526 [03:41<1:55:06, 1.62it/s] 3%|▎ | 355/11526 [03:42<1:55:02, 1.62it/s] {'loss': 0.4679, 'grad_norm': 0.7273516654968262, 'learning_rate': 3.078924544666089e-06, 'epoch': 0.09}
3%|▎ | 355/11526 [03:42<1:55:02, 1.62it/s] 3%|▎ | 356/11526 [03:42<1:55:00, 1.62it/s] {'loss': 0.3904, 'grad_norm': 0.6788473725318909, 'learning_rate': 3.087597571552472e-06, 'epoch': 0.09}
3%|▎ | 356/11526 [03:43<1:55:00, 1.62it/s] 3%|▎ | 357/11526 [03:43<1:55:05, 1.62it/s] {'loss': 0.3855, 'grad_norm': 0.6568129062652588, 'learning_rate': 3.096270598438855e-06, 'epoch': 0.09}
3%|▎ | 357/11526 [03:43<1:55:05, 1.62it/s] 3%|▎ | 358/11526 [03:44<1:55:04, 1.62it/s] {'loss': 0.5213, 'grad_norm': 0.7990548610687256, 'learning_rate': 3.1049436253252385e-06, 'epoch': 0.09}
3%|▎ | 358/11526 [03:44<1:55:04, 1.62it/s] 3%|▎ | 359/11526 [03:44<1:54:59, 1.62it/s] {'loss': 0.4633, 'grad_norm': 0.7317043542861938, 'learning_rate': 3.1136166522116223e-06, 'epoch': 0.09}
3%|▎ | 359/11526 [03:44<1:54:59, 1.62it/s] 3%|▎ | 360/11526 [03:45<1:55:03, 1.62it/s] {'loss': 0.4521, 'grad_norm': 0.8358280658721924, 'learning_rate': 3.1222896790980056e-06, 'epoch': 0.09}
3%|▎ | 360/11526 [03:45<1:55:03, 1.62it/s] 3%|▎ | 361/11526 [03:46<1:55:05, 1.62it/s] {'loss': 0.4556, 'grad_norm': 0.7991945743560791, 'learning_rate': 3.130962705984389e-06, 'epoch': 0.09}
3%|▎ | 361/11526 [03:46<1:55:05, 1.62it/s] 3%|▎ | 362/11526 [03:46<1:55:15, 1.61it/s] {'loss': 0.5447, 'grad_norm': 0.6881815195083618, 'learning_rate': 3.1396357328707723e-06, 'epoch': 0.09}
3%|▎ | 362/11526 [03:46<1:55:15, 1.61it/s] 3%|▎ | 363/11526 [03:47<1:55:12, 1.61it/s] {'loss': 0.4815, 'grad_norm': 0.6540077328681946, 'learning_rate': 3.1483087597571552e-06, 'epoch': 0.09}
3%|▎ | 363/11526 [03:47<1:55:12, 1.61it/s] 3%|▎ | 364/11526 [03:47<1:55:08, 1.62it/s] {'loss': 0.4687, 'grad_norm': 0.7201825380325317, 'learning_rate': 3.1569817866435386e-06, 'epoch': 0.09}
3%|▎ | 364/11526 [03:48<1:55:08, 1.62it/s] 3%|▎ | 365/11526 [03:48<1:55:03, 1.62it/s] {'loss': 0.4083, 'grad_norm': 0.6731961369514465, 'learning_rate': 3.165654813529922e-06, 'epoch': 0.1}
3%|▎ | 365/11526 [03:48<1:55:03, 1.62it/s] 3%|▎ | 366/11526 [03:49<1:54:59, 1.62it/s] {'loss': 0.5027, 'grad_norm': 0.7996076345443726, 'learning_rate': 3.1743278404163057e-06, 'epoch': 0.1}
3%|▎ | 366/11526 [03:49<1:54:59, 1.62it/s] 3%|▎ | 367/11526 [03:49<1:55:02, 1.62it/s] {'loss': 0.4292, 'grad_norm': 0.6320903301239014, 'learning_rate': 3.183000867302689e-06, 'epoch': 0.1}
3%|▎ | 367/11526 [03:49<1:55:02, 1.62it/s] 3%|▎ | 368/11526 [03:50<1:54:58, 1.62it/s] {'loss': 0.3954, 'grad_norm': 0.8092179894447327, 'learning_rate': 3.1916738941890724e-06, 'epoch': 0.1}
3%|▎ | 368/11526 [03:50<1:54:58, 1.62it/s] 3%|▎ | 369/11526 [03:50<1:54:55, 1.62it/s] {'loss': 0.435, 'grad_norm': 0.8509931564331055, 'learning_rate': 3.2003469210754557e-06, 'epoch': 0.1}
3%|▎ | 369/11526 [03:51<1:54:55, 1.62it/s] 3%|▎ | 370/11526 [03:51<1:54:54, 1.62it/s] {'loss': 0.4669, 'grad_norm': 0.686587393283844, 'learning_rate': 3.209019947961839e-06, 'epoch': 0.1}
3%|▎ | 370/11526 [03:51<1:54:54, 1.62it/s] 3%|▎ | 371/11526 [03:52<1:54:40, 1.62it/s] {'loss': 0.4668, 'grad_norm': 0.7930492758750916, 'learning_rate': 3.217692974848222e-06, 'epoch': 0.1}
3%|▎ | 371/11526 [03:52<1:54:40, 1.62it/s] 3%|▎ | 372/11526 [03:52<1:54:50, 1.62it/s] {'loss': 0.5656, 'grad_norm': 0.73323655128479, 'learning_rate': 3.2263660017346054e-06, 'epoch': 0.1}
3%|▎ | 372/11526 [03:52<1:54:50, 1.62it/s] 3%|▎ | 373/11526 [03:53<1:54:51, 1.62it/s] {'loss': 0.5403, 'grad_norm': 0.8316366672515869, 'learning_rate': 3.2350390286209887e-06, 'epoch': 0.1}
3%|▎ | 373/11526 [03:53<1:54:51, 1.62it/s] 3%|▎ | 374/11526 [03:54<1:54:50, 1.62it/s] {'loss': 0.4938, 'grad_norm': 0.7903643250465393, 'learning_rate': 3.2437120555073725e-06, 'epoch': 0.1}
3%|▎ | 374/11526 [03:54<1:54:50, 1.62it/s] 3%|▎ | 375/11526 [03:54<1:54:50, 1.62it/s] {'loss': 0.5318, 'grad_norm': 0.7630770802497864, 'learning_rate': 3.252385082393756e-06, 'epoch': 0.1}
3%|▎ | 375/11526 [03:54<1:54:50, 1.62it/s] 3%|▎ | 376/11526 [03:55<1:54:47, 1.62it/s] {'loss': 0.3796, 'grad_norm': 0.7427902221679688, 'learning_rate': 3.261058109280139e-06, 'epoch': 0.1}
3%|▎ | 376/11526 [03:55<1:54:47, 1.62it/s] 3%|▎ | 377/11526 [03:55<1:54:51, 1.62it/s] {'loss': 0.4446, 'grad_norm': 0.7550668716430664, 'learning_rate': 3.2697311361665225e-06, 'epoch': 0.1}
3%|▎ | 377/11526 [03:56<1:54:51, 1.62it/s] 3%|▎ | 378/11526 [03:56<1:54:51, 1.62it/s] {'loss': 0.3405, 'grad_norm': 0.6073533296585083, 'learning_rate': 3.2784041630529055e-06, 'epoch': 0.1}
3%|▎ | 378/11526 [03:56<1:54:51, 1.62it/s] 3%|▎ | 379/11526 [03:57<1:54:51, 1.62it/s] {'loss': 0.475, 'grad_norm': 0.7896513938903809, 'learning_rate': 3.287077189939289e-06, 'epoch': 0.1}
3%|▎ | 379/11526 [03:57<1:54:51, 1.62it/s] 3%|▎ | 380/11526 [03:57<1:54:46, 1.62it/s] {'loss': 0.404, 'grad_norm': 0.6171608567237854, 'learning_rate': 3.295750216825672e-06, 'epoch': 0.1}
3%|▎ | 380/11526 [03:57<1:54:46, 1.62it/s] 3%|▎ | 381/11526 [03:58<1:54:46, 1.62it/s] {'loss': 0.578, 'grad_norm': 0.9408411383628845, 'learning_rate': 3.304423243712056e-06, 'epoch': 0.1}
3%|▎ | 381/11526 [03:58<1:54:46, 1.62it/s] 3%|▎ | 382/11526 [03:59<1:54:59, 1.62it/s] {'loss': 0.4733, 'grad_norm': 0.6848828196525574, 'learning_rate': 3.3130962705984393e-06, 'epoch': 0.1}
3%|▎ | 382/11526 [03:59<1:54:59, 1.62it/s] 3%|▎ | 383/11526 [03:59<1:54:54, 1.62it/s] {'loss': 0.3978, 'grad_norm': 0.7133733034133911, 'learning_rate': 3.3217692974848226e-06, 'epoch': 0.1}
3%|▎ | 383/11526 [03:59<1:54:54, 1.62it/s] 3%|▎ | 384/11526 [04:00<1:54:52, 1.62it/s] {'loss': 0.3843, 'grad_norm': 0.6871532797813416, 'learning_rate': 3.330442324371206e-06, 'epoch': 0.1}
3%|▎ | 384/11526 [04:00<1:54:52, 1.62it/s] 3%|▎ | 385/11526 [04:00<1:54:48, 1.62it/s] {'loss': 0.4096, 'grad_norm': 0.7488415837287903, 'learning_rate': 3.3391153512575893e-06, 'epoch': 0.1}
3%|▎ | 385/11526 [04:01<1:54:48, 1.62it/s] 3%|▎ | 386/11526 [04:01<1:54:44, 1.62it/s] {'loss': 0.4127, 'grad_norm': 0.6648820638656616, 'learning_rate': 3.3477883781439722e-06, 'epoch': 0.1}
3%|▎ | 386/11526 [04:01<1:54:44, 1.62it/s] 3%|▎ | 387/11526 [04:02<1:54:49, 1.62it/s] {'loss': 0.5066, 'grad_norm': 0.7179889678955078, 'learning_rate': 3.3564614050303556e-06, 'epoch': 0.1}
3%|▎ | 387/11526 [04:02<1:54:49, 1.62it/s] 3%|▎ | 388/11526 [04:02<1:54:45, 1.62it/s] {'loss': 0.3949, 'grad_norm': 0.7137555480003357, 'learning_rate': 3.365134431916739e-06, 'epoch': 0.1}
3%|▎ | 388/11526 [04:02<1:54:45, 1.62it/s] 3%|▎ | 389/11526 [04:03<1:54:42, 1.62it/s] {'loss': 0.5081, 'grad_norm': 1.0987062454223633, 'learning_rate': 3.3738074588031227e-06, 'epoch': 0.1}
3%|▎ | 389/11526 [04:03<1:54:42, 1.62it/s] 3%|▎ | 390/11526 [04:04<1:58:43, 1.56it/s] {'loss': 0.3669, 'grad_norm': 0.6701561808586121, 'learning_rate': 3.382480485689506e-06, 'epoch': 0.1}
3%|▎ | 390/11526 [04:04<1:58:43, 1.56it/s] 3%|▎ | 391/11526 [04:04<1:57:58, 1.57it/s] {'loss': 0.5143, 'grad_norm': 0.8905373811721802, 'learning_rate': 3.3911535125758894e-06, 'epoch': 0.1}
3%|▎ | 391/11526 [04:04<1:57:58, 1.57it/s] 3%|▎ | 392/11526 [04:05<1:57:06, 1.58it/s] {'loss': 0.4003, 'grad_norm': 0.7190313935279846, 'learning_rate': 3.3998265394622727e-06, 'epoch': 0.1}
3%|▎ | 392/11526 [04:05<1:57:06, 1.58it/s] 3%|▎ | 393/11526 [04:05<1:56:18, 1.60it/s] {'loss': 0.5048, 'grad_norm': 0.8081957697868347, 'learning_rate': 3.4084995663486557e-06, 'epoch': 0.1}
3%|▎ | 393/11526 [04:06<1:56:18, 1.60it/s] 3%|▎ | 394/11526 [04:06<1:55:41, 1.60it/s] {'loss': 0.5092, 'grad_norm': 0.8248569369316101, 'learning_rate': 3.417172593235039e-06, 'epoch': 0.1}
3%|▎ | 394/11526 [04:06<1:55:41, 1.60it/s] 3%|▎ | 395/11526 [04:07<1:55:27, 1.61it/s] {'loss': 0.4935, 'grad_norm': 0.8161707520484924, 'learning_rate': 3.4258456201214224e-06, 'epoch': 0.1}
3%|▎ | 395/11526 [04:07<1:55:27, 1.61it/s] 3%|▎ | 396/11526 [04:07<1:55:10, 1.61it/s] {'loss': 0.4303, 'grad_norm': 0.682453453540802, 'learning_rate': 3.434518647007806e-06, 'epoch': 0.1}
3%|▎ | 396/11526 [04:07<1:55:10, 1.61it/s] 3%|▎ | 397/11526 [04:08<1:54:58, 1.61it/s] {'loss': 0.5531, 'grad_norm': 0.8073570132255554, 'learning_rate': 3.4431916738941895e-06, 'epoch': 0.1}
3%|▎ | 397/11526 [04:08<1:54:58, 1.61it/s] 3%|▎ | 398/11526 [04:08<1:54:43, 1.62it/s] {'loss': 0.4117, 'grad_norm': 0.7479109168052673, 'learning_rate': 3.451864700780573e-06, 'epoch': 0.1}
3%|▎ | 398/11526 [04:09<1:54:43, 1.62it/s] 3%|▎ | 399/11526 [04:09<1:54:39, 1.62it/s] {'loss': 0.546, 'grad_norm': 0.918054461479187, 'learning_rate': 3.460537727666956e-06, 'epoch': 0.1}
3%|▎ | 399/11526 [04:09<1:54:39, 1.62it/s] 3%|▎ | 400/11526 [04:10<1:54:55, 1.61it/s] {'loss': 0.4294, 'grad_norm': 0.8408240079879761, 'learning_rate': 3.4692107545533395e-06, 'epoch': 0.1}
3%|▎ | 400/11526 [04:10<1:54:55, 1.61it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.09it/s]
31%|███ | 4/13 [00:00<00:01, 8.30it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.71it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.30it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.08it/s]
62%|██████▏ | 8/13 [00:01<00:00, 6.92it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.83it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.75it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.70it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.65it/s]
100%|██████████| 13/13 [00:01<00:00, 6.69it/s]
{'eval_loss': 0.8436405062675476, 'eval_runtime': 1.9754, 'eval_samples_per_second': 101.248, 'eval_steps_per_second': 6.581, 'epoch': 0.1}
 3%|▎ | 401/11526 [04:12<3:45:02, 1.21s/it] {'loss': 0.4142, 'grad_norm': 0.7738603353500366, 'learning_rate': 3.4778837814397225e-06, 'epoch': 0.1}
3%|▎ | 401/11526 [04:12<3:45:02, 1.21s/it] 3%|▎ | 402/11526 [04:13<3:11:59, 1.04s/it] {'loss': 0.46, 'grad_norm': 0.9230268597602844, 'learning_rate': 3.486556808326106e-06, 'epoch': 0.1}
3%|▎ | 402/11526 [04:13<3:11:59, 1.04s/it] 3%|▎ | 403/11526 [04:14<2:48:45, 1.10it/s] {'loss': 0.3884, 'grad_norm': 0.7134712934494019, 'learning_rate': 3.4952298352124896e-06, 'epoch': 0.1}
3%|▎ | 403/11526 [04:14<2:48:45, 1.10it/s] 4%|▎ | 404/11526 [04:14<2:32:26, 1.22it/s] {'loss': 0.4169, 'grad_norm': 0.6907707452774048, 'learning_rate': 3.503902862098873e-06, 'epoch': 0.11}
4%|▎ | 404/11526 [04:14<2:32:26, 1.22it/s] 4%|▎ | 405/11526 [04:15<2:21:08, 1.31it/s] {'loss': 0.5511, 'grad_norm': 0.7192000150680542, 'learning_rate': 3.5125758889852563e-06, 'epoch': 0.11}
4%|▎ | 405/11526 [04:15<2:21:08, 1.31it/s] 4%|▎ | 406/11526 [04:15<2:13:07, 1.39it/s] {'loss': 0.4347, 'grad_norm': 0.6796731352806091, 'learning_rate': 3.5212489158716396e-06, 'epoch': 0.11}
4%|▎ | 406/11526 [04:16<2:13:07, 1.39it/s] 4%|▎ | 407/11526 [04:16<2:07:38, 1.45it/s] {'loss': 0.3678, 'grad_norm': 0.6644582748413086, 'learning_rate': 3.529921942758023e-06, 'epoch': 0.11}
4%|▎ | 407/11526 [04:16<2:07:38, 1.45it/s] 4%|▎ | 408/11526 [04:17<2:03:40, 1.50it/s] {'loss': 0.4568, 'grad_norm': 0.7975987792015076, 'learning_rate': 3.538594969644406e-06, 'epoch': 0.11}
4%|▎ | 408/11526 [04:17<2:03:40, 1.50it/s] 4%|▎ | 409/11526 [04:17<2:00:49, 1.53it/s] {'loss': 0.4545, 'grad_norm': 0.7411627769470215, 'learning_rate': 3.5472679965307892e-06, 'epoch': 0.11}
4%|▎ | 409/11526 [04:17<2:00:49, 1.53it/s] 4%|▎ | 410/11526 [04:18<1:59:01, 1.56it/s] {'loss': 0.4496, 'grad_norm': 0.7292632460594177, 'learning_rate': 3.5559410234171726e-06, 'epoch': 0.11}
4%|▎ | 410/11526 [04:18<1:59:01, 1.56it/s] 4%|▎ | 411/11526 [04:19<1:57:41, 1.57it/s] {'loss': 0.3314, 'grad_norm': 0.6336116194725037, 'learning_rate': 3.5646140503035564e-06, 'epoch': 0.11}
4%|▎ | 411/11526 [04:19<1:57:41, 1.57it/s] 4%|▎ | 412/11526 [04:19<1:56:50, 1.59it/s] {'loss': 0.4308, 'grad_norm': 0.7815155982971191, 'learning_rate': 3.5732870771899397e-06, 'epoch': 0.11}
4%|▎ | 412/11526 [04:19<1:56:50, 1.59it/s] 4%|▎ | 413/11526 [04:20<1:56:03, 1.60it/s] {'loss': 0.3587, 'grad_norm': 0.6310390830039978, 'learning_rate': 3.581960104076323e-06, 'epoch': 0.11}
4%|▎ | 413/11526 [04:20<1:56:03, 1.60it/s] 4%|▎ | 414/11526 [04:20<1:55:29, 1.60it/s] {'loss': 0.4195, 'grad_norm': 0.6449280381202698, 'learning_rate': 3.5906331309627064e-06, 'epoch': 0.11}
4%|▎ | 414/11526 [04:21<1:55:29, 1.60it/s] 4%|▎ | 415/11526 [04:21<1:55:03, 1.61it/s] {'loss': 0.4418, 'grad_norm': 0.7777904272079468, 'learning_rate': 3.5993061578490897e-06, 'epoch': 0.11}
4%|▎ | 415/11526 [04:21<1:55:03, 1.61it/s] 4%|▎ | 416/11526 [04:22<1:54:47, 1.61it/s] {'loss': 0.4489, 'grad_norm': 0.90255206823349, 'learning_rate': 3.6079791847354727e-06, 'epoch': 0.11}
4%|▎ | 416/11526 [04:22<1:54:47, 1.61it/s] 4%|▎ | 417/11526 [04:22<1:54:32, 1.62it/s] {'loss': 0.5081, 'grad_norm': 0.8227331042289734, 'learning_rate': 3.616652211621856e-06, 'epoch': 0.11}
4%|▎ | 417/11526 [04:22<1:54:32, 1.62it/s] 4%|▎ | 418/11526 [04:23<1:54:20, 1.62it/s] {'loss': 0.479, 'grad_norm': 0.8722947239875793, 'learning_rate': 3.62532523850824e-06, 'epoch': 0.11}
4%|▎ | 418/11526 [04:23<1:54:20, 1.62it/s] 4%|▎ | 419/11526 [04:23<1:54:13, 1.62it/s] {'loss': 0.5748, 'grad_norm': 0.7339964509010315, 'learning_rate': 3.633998265394623e-06, 'epoch': 0.11}
4%|▎ | 419/11526 [04:24<1:54:13, 1.62it/s] 4%|▎ | 420/11526 [04:24<1:54:03, 1.62it/s] {'loss': 0.4204, 'grad_norm': 0.7038097977638245, 'learning_rate': 3.6426712922810065e-06, 'epoch': 0.11}
4%|▎ | 420/11526 [04:24<1:54:03, 1.62it/s] 4%|▎ | 421/11526 [04:25<1:54:05, 1.62it/s] {'loss': 0.4475, 'grad_norm': 0.8147553205490112, 'learning_rate': 3.65134431916739e-06, 'epoch': 0.11}
4%|▎ | 421/11526 [04:25<1:54:05, 1.62it/s] 4%|▎ | 422/11526 [04:25<1:54:06, 1.62it/s] {'loss': 0.4035, 'grad_norm': 0.8670389652252197, 'learning_rate': 3.660017346053773e-06, 'epoch': 0.11}
4%|▎ | 422/11526 [04:25<1:54:06, 1.62it/s] 4%|▎ | 423/11526 [04:26<1:54:02, 1.62it/s] {'loss': 0.4396, 'grad_norm': 0.7860738039016724, 'learning_rate': 3.668690372940156e-06, 'epoch': 0.11}
4%|▎ | 423/11526 [04:26<1:54:02, 1.62it/s] 4%|▎ | 424/11526 [04:27<1:54:04, 1.62it/s] {'loss': 0.4721, 'grad_norm': 0.6939603090286255, 'learning_rate': 3.6773633998265395e-06, 'epoch': 0.11}
4%|▎ | 424/11526 [04:27<1:54:04, 1.62it/s] 4%|▎ | 425/11526 [04:27<1:54:09, 1.62it/s] {'loss': 0.436, 'grad_norm': 0.6765546202659607, 'learning_rate': 3.6860364267129232e-06, 'epoch': 0.11}
4%|▎ | 425/11526 [04:27<1:54:09, 1.62it/s] 4%|▎ | 426/11526 [04:28<1:54:09, 1.62it/s] {'loss': 0.3838, 'grad_norm': 0.7155818343162537, 'learning_rate': 3.6947094535993066e-06, 'epoch': 0.11}
4%|▎ | 426/11526 [04:28<1:54:09, 1.62it/s] 4%|▎ | 427/11526 [04:28<1:54:06, 1.62it/s] {'loss': 0.4598, 'grad_norm': 0.7969728112220764, 'learning_rate': 3.70338248048569e-06, 'epoch': 0.11}
4%|▎ | 427/11526 [04:29<1:54:06, 1.62it/s] 4%|▎ | 428/11526 [04:29<1:54:05, 1.62it/s] {'loss': 0.3867, 'grad_norm': 0.7485313415527344, 'learning_rate': 3.7120555073720733e-06, 'epoch': 0.11}
4%|▎ | 428/11526 [04:29<1:54:05, 1.62it/s] 4%|▎ | 429/11526 [04:30<1:54:06, 1.62it/s] {'loss': 0.4351, 'grad_norm': 0.7587866187095642, 'learning_rate': 3.7207285342584566e-06, 'epoch': 0.11}
4%|▎ | 429/11526 [04:30<1:54:06, 1.62it/s] 4%|▎ | 430/11526 [04:30<1:54:10, 1.62it/s] {'loss': 0.4555, 'grad_norm': 0.7470924258232117, 'learning_rate': 3.72940156114484e-06, 'epoch': 0.11}
4%|▎ | 430/11526 [04:30<1:54:10, 1.62it/s] 4%|▎ | 431/11526 [04:31<1:54:13, 1.62it/s] {'loss': 0.5735, 'grad_norm': 0.879463791847229, 'learning_rate': 3.738074588031223e-06, 'epoch': 0.11}
4%|▎ | 431/11526 [04:31<1:54:13, 1.62it/s] 4%|▎ | 432/11526 [04:31<1:54:09, 1.62it/s] {'loss': 0.4335, 'grad_norm': 0.7269470691680908, 'learning_rate': 3.7467476149176062e-06, 'epoch': 0.11}
4%|▎ | 432/11526 [04:32<1:54:09, 1.62it/s] 4%|▍ | 433/11526 [04:32<1:54:06, 1.62it/s] {'loss': 0.4701, 'grad_norm': 0.6932854056358337, 'learning_rate': 3.75542064180399e-06, 'epoch': 0.11}
4%|▍ | 433/11526 [04:32<1:54:06, 1.62it/s] 4%|▍ | 434/11526 [04:33<1:54:05, 1.62it/s] {'loss': 0.4949, 'grad_norm': 0.7623521685600281, 'learning_rate': 3.7640936686903734e-06, 'epoch': 0.11}
4%|▍ | 434/11526 [04:33<1:54:05, 1.62it/s] 4%|▍ | 435/11526 [04:33<1:54:12, 1.62it/s] {'loss': 0.4118, 'grad_norm': 0.7146267294883728, 'learning_rate': 3.7727666955767567e-06, 'epoch': 0.11}
4%|▍ | 435/11526 [04:33<1:54:12, 1.62it/s] 4%|▍ | 436/11526 [04:34<1:54:12, 1.62it/s] {'loss': 0.4236, 'grad_norm': 0.7255732417106628, 'learning_rate': 3.78143972246314e-06, 'epoch': 0.11}
4%|▍ | 436/11526 [04:34<1:54:12, 1.62it/s] 4%|▍ | 437/11526 [04:35<1:54:07, 1.62it/s] {'loss': 0.4145, 'grad_norm': 0.6990090608596802, 'learning_rate': 3.7901127493495234e-06, 'epoch': 0.11}
4%|▍ | 437/11526 [04:35<1:54:07, 1.62it/s] 4%|▍ | 438/11526 [04:35<1:53:54, 1.62it/s] {'loss': 0.4307, 'grad_norm': 0.8664701581001282, 'learning_rate': 3.7987857762359063e-06, 'epoch': 0.11}
4%|▍ | 438/11526 [04:35<1:53:54, 1.62it/s] 4%|▍ | 439/11526 [04:36<1:53:49, 1.62it/s] {'loss': 0.4258, 'grad_norm': 0.6882883310317993, 'learning_rate': 3.8074588031222897e-06, 'epoch': 0.11}
4%|▍ | 439/11526 [04:36<1:53:49, 1.62it/s] 4%|▍ | 440/11526 [04:36<1:53:54, 1.62it/s] {'loss': 0.3927, 'grad_norm': 0.7198670506477356, 'learning_rate': 3.816131830008674e-06, 'epoch': 0.11}
4%|▍ | 440/11526 [04:37<1:53:54, 1.62it/s] 4%|▍ | 441/11526 [04:37<1:53:43, 1.62it/s] {'loss': 0.4245, 'grad_norm': 0.7020233869552612, 'learning_rate': 3.824804856895057e-06, 'epoch': 0.11}
4%|▍ | 441/11526 [04:37<1:53:43, 1.62it/s] 4%|▍ | 442/11526 [04:38<1:53:42, 1.62it/s] {'loss': 0.3734, 'grad_norm': 0.5998426675796509, 'learning_rate': 3.8334778837814406e-06, 'epoch': 0.12}
4%|▍ | 442/11526 [04:38<1:53:42, 1.62it/s] 4%|▍ | 443/11526 [04:38<1:53:36, 1.63it/s] {'loss': 0.3909, 'grad_norm': 0.687070369720459, 'learning_rate': 3.842150910667823e-06, 'epoch': 0.12}
4%|▍ | 443/11526 [04:38<1:53:36, 1.63it/s] 4%|▍ | 444/11526 [04:39<1:53:38, 1.63it/s] {'loss': 0.5381, 'grad_norm': 0.8387272953987122, 'learning_rate': 3.850823937554206e-06, 'epoch': 0.12}
4%|▍ | 444/11526 [04:39<1:53:38, 1.63it/s] 4%|▍ | 445/11526 [04:40<1:54:46, 1.61it/s] {'loss': 0.4455, 'grad_norm': 0.675404965877533, 'learning_rate': 3.85949696444059e-06, 'epoch': 0.12}
4%|▍ | 445/11526 [04:40<1:54:46, 1.61it/s] 4%|▍ | 446/11526 [04:40<1:54:15, 1.62it/s] {'loss': 0.4068, 'grad_norm': 0.7010311484336853, 'learning_rate': 3.868169991326973e-06, 'epoch': 0.12}
4%|▍ | 446/11526 [04:40<1:54:15, 1.62it/s] 4%|▍ | 447/11526 [04:41<1:54:13, 1.62it/s] {'loss': 0.439, 'grad_norm': 0.818248450756073, 'learning_rate': 3.8768430182133565e-06, 'epoch': 0.12}
4%|▍ | 447/11526 [04:41<1:54:13, 1.62it/s] 4%|▍ | 448/11526 [04:41<1:54:01, 1.62it/s] {'loss': 0.4494, 'grad_norm': 0.8068976402282715, 'learning_rate': 3.885516045099741e-06, 'epoch': 0.12}
4%|▍ | 448/11526 [04:41<1:54:01, 1.62it/s] 4%|▍ | 449/11526 [04:42<1:53:48, 1.62it/s] {'loss': 0.3641, 'grad_norm': 0.6514712572097778, 'learning_rate': 3.894189071986124e-06, 'epoch': 0.12}
4%|▍ | 449/11526 [04:42<1:53:48, 1.62it/s] 4%|▍ | 450/11526 [04:43<1:53:47, 1.62it/s] {'loss': 0.4282, 'grad_norm': 0.7461514472961426, 'learning_rate': 3.9028620988725065e-06, 'epoch': 0.12}
4%|▍ | 450/11526 [04:43<1:53:47, 1.62it/s] 4%|▍ | 451/11526 [04:43<1:53:38, 1.62it/s] {'loss': 0.4066, 'grad_norm': 0.7232134938240051, 'learning_rate': 3.91153512575889e-06, 'epoch': 0.12}
4%|▍ | 451/11526 [04:43<1:53:38, 1.62it/s] 4%|▍ | 452/11526 [04:44<1:53:46, 1.62it/s] {'loss': 0.403, 'grad_norm': 0.7308376431465149, 'learning_rate': 3.920208152645273e-06, 'epoch': 0.12}
4%|▍ | 452/11526 [04:44<1:53:46, 1.62it/s] 4%|▍ | 453/11526 [04:44<1:53:35, 1.62it/s] {'loss': 0.5161, 'grad_norm': 0.743516206741333, 'learning_rate': 3.9288811795316565e-06, 'epoch': 0.12}
4%|▍ | 453/11526 [04:45<1:53:35, 1.62it/s] 4%|▍ | 454/11526 [04:45<1:53:29, 1.63it/s] {'loss': 0.4355, 'grad_norm': 0.6186915636062622, 'learning_rate': 3.93755420641804e-06, 'epoch': 0.12}
4%|▍ | 454/11526 [04:45<1:53:29, 1.63it/s] 4%|▍ | 455/11526 [04:46<1:53:36, 1.62it/s] {'loss': 0.4512, 'grad_norm': 0.5926752090454102, 'learning_rate': 3.946227233304424e-06, 'epoch': 0.12}
4%|▍ | 455/11526 [04:46<1:53:36, 1.62it/s] 4%|▍ | 456/11526 [04:46<1:53:28, 1.63it/s] {'loss': 0.3823, 'grad_norm': 0.6394097208976746, 'learning_rate': 3.9549002601908074e-06, 'epoch': 0.12}
4%|▍ | 456/11526 [04:46<1:53:28, 1.63it/s] 4%|▍ | 457/11526 [04:47<1:53:23, 1.63it/s] {'loss': 0.3789, 'grad_norm': 0.6143398284912109, 'learning_rate': 3.963573287077191e-06, 'epoch': 0.12}
4%|▍ | 457/11526 [04:47<1:53:23, 1.63it/s] 4%|▍ | 458/11526 [04:48<1:53:20, 1.63it/s] {'loss': 0.3636, 'grad_norm': 0.620137095451355, 'learning_rate': 3.972246313963573e-06, 'epoch': 0.12}
4%|▍ | 458/11526 [04:48<1:53:20, 1.63it/s] 4%|▍ | 459/11526 [04:48<1:53:20, 1.63it/s] {'loss': 0.5323, 'grad_norm': 0.7696968913078308, 'learning_rate': 3.980919340849957e-06, 'epoch': 0.12}
4%|▍ | 459/11526 [04:48<1:53:20, 1.63it/s] 4%|▍ | 460/11526 [04:49<1:53:27, 1.63it/s] {'loss': 0.392, 'grad_norm': 0.7505579590797424, 'learning_rate': 3.98959236773634e-06, 'epoch': 0.12}
4%|▍ | 460/11526 [04:49<1:53:27, 1.63it/s] 4%|▍ | 461/11526 [04:49<1:53:23, 1.63it/s] {'loss': 0.5001, 'grad_norm': 0.6998777985572815, 'learning_rate': 3.998265394622723e-06, 'epoch': 0.12}
4%|▍ | 461/11526 [04:49<1:53:23, 1.63it/s] 4%|▍ | 462/11526 [04:50<1:53:51, 1.62it/s] {'loss': 0.5021, 'grad_norm': 0.8078737854957581, 'learning_rate': 4.0069384215091075e-06, 'epoch': 0.12}
4%|▍ | 462/11526 [04:50<1:53:51, 1.62it/s] 4%|▍ | 463/11526 [04:51<1:53:39, 1.62it/s] {'loss': 0.5385, 'grad_norm': 0.815019965171814, 'learning_rate': 4.015611448395491e-06, 'epoch': 0.12}
4%|▍ | 463/11526 [04:51<1:53:39, 1.62it/s] 4%|▍ | 464/11526 [04:51<1:53:32, 1.62it/s] {'loss': 0.4379, 'grad_norm': 0.6589834690093994, 'learning_rate': 4.024284475281874e-06, 'epoch': 0.12}
4%|▍ | 464/11526 [04:51<1:53:32, 1.62it/s] 4%|▍ | 465/11526 [04:52<1:53:33, 1.62it/s] {'loss': 0.3977, 'grad_norm': 0.7179739475250244, 'learning_rate': 4.032957502168257e-06, 'epoch': 0.12}
4%|▍ | 465/11526 [04:52<1:53:33, 1.62it/s] 4%|▍ | 466/11526 [04:52<1:53:26, 1.62it/s] {'loss': 0.3192, 'grad_norm': 0.622694730758667, 'learning_rate': 4.04163052905464e-06, 'epoch': 0.12}
4%|▍ | 466/11526 [04:53<1:53:26, 1.62it/s] 4%|▍ | 467/11526 [04:53<1:53:36, 1.62it/s] {'loss': 0.4759, 'grad_norm': 0.8641308546066284, 'learning_rate': 4.050303555941023e-06, 'epoch': 0.12}
4%|▍ | 467/11526 [04:53<1:53:36, 1.62it/s] 4%|▍ | 468/11526 [04:54<1:53:27, 1.62it/s] {'loss': 0.4193, 'grad_norm': 0.8497452139854431, 'learning_rate': 4.058976582827407e-06, 'epoch': 0.12}
4%|▍ | 468/11526 [04:54<1:53:27, 1.62it/s] 4%|▍ | 469/11526 [04:54<1:53:19, 1.63it/s] {'loss': 0.5005, 'grad_norm': 0.8138662576675415, 'learning_rate': 4.06764960971379e-06, 'epoch': 0.12}
4%|▍ | 469/11526 [04:54<1:53:19, 1.63it/s] 4%|▍ | 470/11526 [04:55<1:53:21, 1.63it/s] {'loss': 0.3668, 'grad_norm': 0.6512378454208374, 'learning_rate': 4.076322636600174e-06, 'epoch': 0.12}
4%|▍ | 470/11526 [04:55<1:53:21, 1.63it/s] 4%|▍ | 471/11526 [04:56<1:53:15, 1.63it/s] {'loss': 0.5422, 'grad_norm': 0.7337889075279236, 'learning_rate': 4.084995663486558e-06, 'epoch': 0.12}
4%|▍ | 471/11526 [04:56<1:53:15, 1.63it/s] 4%|▍ | 472/11526 [04:56<1:53:19, 1.63it/s] {'loss': 0.4511, 'grad_norm': 0.745064377784729, 'learning_rate': 4.093668690372941e-06, 'epoch': 0.12}
4%|▍ | 472/11526 [04:56<1:53:19, 1.63it/s] 4%|▍ | 473/11526 [04:57<1:53:16, 1.63it/s] {'loss': 0.5524, 'grad_norm': 0.7917768955230713, 'learning_rate': 4.1023417172593235e-06, 'epoch': 0.12}
4%|▍ | 473/11526 [04:57<1:53:16, 1.63it/s] 4%|▍ | 474/11526 [04:57<1:53:16, 1.63it/s] {'loss': 0.4427, 'grad_norm': 0.7341815829277039, 'learning_rate': 4.111014744145707e-06, 'epoch': 0.12}
4%|▍ | 474/11526 [04:57<1:53:16, 1.63it/s] 4%|▍ | 475/11526 [04:58<1:53:20, 1.62it/s] {'loss': 0.3589, 'grad_norm': 0.6503676176071167, 'learning_rate': 4.11968777103209e-06, 'epoch': 0.12}
4%|▍ | 475/11526 [04:58<1:53:20, 1.62it/s] 4%|▍ | 476/11526 [04:59<1:53:15, 1.63it/s] {'loss': 0.3915, 'grad_norm': 0.7324827313423157, 'learning_rate': 4.1283607979184735e-06, 'epoch': 0.12}
4%|▍ | 476/11526 [04:59<1:53:15, 1.63it/s] 4%|▍ | 477/11526 [04:59<1:53:53, 1.62it/s] {'loss': 0.444, 'grad_norm': 0.7242802977561951, 'learning_rate': 4.137033824804858e-06, 'epoch': 0.12}
4%|▍ | 477/11526 [04:59<1:53:53, 1.62it/s] 4%|▍ | 478/11526 [05:00<1:53:39, 1.62it/s] {'loss': 0.4726, 'grad_norm': 0.7382639646530151, 'learning_rate': 4.145706851691241e-06, 'epoch': 0.12}
4%|▍ | 478/11526 [05:00<1:53:39, 1.62it/s] 4%|▍ | 479/11526 [05:00<1:53:28, 1.62it/s] {'loss': 0.4027, 'grad_norm': 0.7265614867210388, 'learning_rate': 4.1543798785776244e-06, 'epoch': 0.12}
4%|▍ | 479/11526 [05:01<1:53:28, 1.62it/s] 4%|▍ | 480/11526 [05:01<1:53:23, 1.62it/s] {'loss': 0.4274, 'grad_norm': 0.7367269396781921, 'learning_rate': 4.163052905464007e-06, 'epoch': 0.12}
4%|▍ | 480/11526 [05:01<1:53:23, 1.62it/s] 4%|▍ | 481/11526 [05:02<1:53:15, 1.63it/s] {'loss': 0.3949, 'grad_norm': 0.7692021727561951, 'learning_rate': 4.17172593235039e-06, 'epoch': 0.13}
4%|▍ | 481/11526 [05:02<1:53:15, 1.63it/s] 4%|▍ | 482/11526 [05:02<1:53:26, 1.62it/s] {'loss': 0.3602, 'grad_norm': 0.6487683057785034, 'learning_rate': 4.180398959236774e-06, 'epoch': 0.13}
4%|▍ | 482/11526 [05:02<1:53:26, 1.62it/s] 4%|▍ | 483/11526 [05:03<1:53:19, 1.62it/s] {'loss': 0.4128, 'grad_norm': 0.732130765914917, 'learning_rate': 4.189071986123157e-06, 'epoch': 0.13}
4%|▍ | 483/11526 [05:03<1:53:19, 1.62it/s] 4%|▍ | 484/11526 [05:04<1:53:12, 1.63it/s] {'loss': 0.5104, 'grad_norm': 0.8079080581665039, 'learning_rate': 4.19774501300954e-06, 'epoch': 0.13}
4%|▍ | 484/11526 [05:04<1:53:12, 1.63it/s] 4%|▍ | 485/11526 [05:04<1:53:12, 1.63it/s] {'loss': 0.5031, 'grad_norm': 0.872143566608429, 'learning_rate': 4.2064180398959245e-06, 'epoch': 0.13}
4%|▍ | 485/11526 [05:04<1:53:12, 1.63it/s] 4%|▍ | 486/11526 [05:05<1:53:12, 1.63it/s] {'loss': 0.3629, 'grad_norm': 0.6663718223571777, 'learning_rate': 4.215091066782308e-06, 'epoch': 0.13}
4%|▍ | 486/11526 [05:05<1:53:12, 1.63it/s] 4%|▍ | 487/11526 [05:05<1:53:22, 1.62it/s] {'loss': 0.3759, 'grad_norm': 0.6923399567604065, 'learning_rate': 4.22376409366869e-06, 'epoch': 0.13}
4%|▍ | 487/11526 [05:05<1:53:22, 1.62it/s] 4%|▍ | 488/11526 [05:06<1:53:14, 1.62it/s] {'loss': 0.4691, 'grad_norm': 0.797860860824585, 'learning_rate': 4.232437120555074e-06, 'epoch': 0.13}
4%|▍ | 488/11526 [05:06<1:53:14, 1.62it/s] 4%|▍ | 489/11526 [05:07<1:53:12, 1.62it/s] {'loss': 0.3292, 'grad_norm': 0.6856442093849182, 'learning_rate': 4.241110147441457e-06, 'epoch': 0.13}
4%|▍ | 489/11526 [05:07<1:53:12, 1.62it/s] 4%|▍ | 490/11526 [05:07<1:53:18, 1.62it/s] {'loss': 0.3906, 'grad_norm': 0.6590916514396667, 'learning_rate': 4.24978317432784e-06, 'epoch': 0.13}
4%|▍ | 490/11526 [05:07<1:53:18, 1.62it/s] 4%|▍ | 491/11526 [05:08<1:53:12, 1.62it/s] {'loss': 0.555, 'grad_norm': 0.8308565616607666, 'learning_rate': 4.258456201214224e-06, 'epoch': 0.13}
4%|▍ | 491/11526 [05:08<1:53:12, 1.62it/s] 4%|▍ | 492/11526 [05:08<1:53:16, 1.62it/s] {'loss': 0.3474, 'grad_norm': 0.6358709931373596, 'learning_rate': 4.267129228100608e-06, 'epoch': 0.13}
4%|▍ | 492/11526 [05:09<1:53:16, 1.62it/s] 4%|▍ | 493/11526 [05:09<1:53:10, 1.62it/s] {'loss': 0.4605, 'grad_norm': 0.8001893758773804, 'learning_rate': 4.275802254986991e-06, 'epoch': 0.13}
4%|▍ | 493/11526 [05:09<1:53:10, 1.62it/s] 4%|▍ | 494/11526 [05:10<1:53:05, 1.63it/s] {'loss': 0.4677, 'grad_norm': 0.7611310482025146, 'learning_rate': 4.284475281873375e-06, 'epoch': 0.13}
4%|▍ | 494/11526 [05:10<1:53:05, 1.63it/s] 4%|▍ | 495/11526 [05:10<1:53:11, 1.62it/s] {'loss': 0.4742, 'grad_norm': 0.7742082476615906, 'learning_rate': 4.293148308759757e-06, 'epoch': 0.13}
4%|▍ | 495/11526 [05:10<1:53:11, 1.62it/s] 4%|▍ | 496/11526 [05:11<1:53:04, 1.63it/s] {'loss': 0.4861, 'grad_norm': 0.8139631152153015, 'learning_rate': 4.3018213356461405e-06, 'epoch': 0.13}
4%|▍ | 496/11526 [05:11<1:53:04, 1.63it/s] 4%|▍ | 497/11526 [05:12<1:53:12, 1.62it/s] {'loss': 0.3421, 'grad_norm': 0.6481139063835144, 'learning_rate': 4.310494362532524e-06, 'epoch': 0.13}
4%|▍ | 497/11526 [05:12<1:53:12, 1.62it/s] 4%|▍ | 498/11526 [05:12<1:53:10, 1.62it/s] {'loss': 0.4357, 'grad_norm': 0.6997607946395874, 'learning_rate': 4.319167389418907e-06, 'epoch': 0.13}
4%|▍ | 498/11526 [05:12<1:53:10, 1.62it/s] 4%|▍ | 499/11526 [05:13<1:53:08, 1.62it/s] {'loss': 0.4356, 'grad_norm': 0.827028751373291, 'learning_rate': 4.327840416305291e-06, 'epoch': 0.13}
4%|▍ | 499/11526 [05:13<1:53:08, 1.62it/s] 4%|▍ | 500/11526 [05:13<1:53:19, 1.62it/s] {'loss': 0.4242, 'grad_norm': 0.69318026304245, 'learning_rate': 4.336513443191675e-06, 'epoch': 0.13}
4%|▍ | 500/11526 [05:13<1:53:19, 1.62it/s] 4%|▍ | 501/11526 [05:14<1:53:10, 1.62it/s] {'loss': 0.3223, 'grad_norm': 0.7564066648483276, 'learning_rate': 4.345186470078058e-06, 'epoch': 0.13}
4%|▍ | 501/11526 [05:14<1:53:10, 1.62it/s] 4%|▍ | 502/11526 [05:15<1:53:09, 1.62it/s] {'loss': 0.3889, 'grad_norm': 0.6652188897132874, 'learning_rate': 4.353859496964441e-06, 'epoch': 0.13}
4%|▍ | 502/11526 [05:15<1:53:09, 1.62it/s] 4%|▍ | 503/11526 [05:15<1:53:06, 1.62it/s] {'loss': 0.4536, 'grad_norm': 0.7560793161392212, 'learning_rate': 4.362532523850824e-06, 'epoch': 0.13}
4%|▍ | 503/11526 [05:15<1:53:06, 1.62it/s] 4%|▍ | 504/11526 [05:16<1:53:03, 1.62it/s] {'loss': 0.4438, 'grad_norm': 0.8014217615127563, 'learning_rate': 4.371205550737207e-06, 'epoch': 0.13}
4%|▍ | 504/11526 [05:16<1:53:03, 1.62it/s] 4%|▍ | 505/11526 [05:16<1:53:03, 1.62it/s] {'loss': 0.3928, 'grad_norm': 0.7244613170623779, 'learning_rate': 4.379878577623591e-06, 'epoch': 0.13}
4%|▍ | 505/11526 [05:17<1:53:03, 1.62it/s] 4%|▍ | 506/11526 [05:17<1:52:59, 1.63it/s] {'loss': 0.4021, 'grad_norm': 0.752855658531189, 'learning_rate': 4.388551604509974e-06, 'epoch': 0.13}
4%|▍ | 506/11526 [05:17<1:52:59, 1.63it/s] 4%|▍ | 507/11526 [05:18<1:53:04, 1.62it/s] {'loss': 0.4596, 'grad_norm': 0.7738847136497498, 'learning_rate': 4.397224631396358e-06, 'epoch': 0.13}
4%|▍ | 507/11526 [05:18<1:53:04, 1.62it/s] 4%|▍ | 508/11526 [05:18<1:53:02, 1.62it/s] {'loss': 0.4714, 'grad_norm': 0.7224528789520264, 'learning_rate': 4.4058976582827415e-06, 'epoch': 0.13}
4%|▍ | 508/11526 [05:18<1:53:02, 1.62it/s] 4%|▍ | 509/11526 [05:19<1:52:59, 1.62it/s] {'loss': 0.395, 'grad_norm': 0.6495622992515564, 'learning_rate': 4.414570685169125e-06, 'epoch': 0.13}
4%|▍ | 509/11526 [05:19<1:52:59, 1.62it/s] 4%|▍ | 510/11526 [05:20<1:53:06, 1.62it/s] {'loss': 0.495, 'grad_norm': 0.8108032941818237, 'learning_rate': 4.423243712055507e-06, 'epoch': 0.13}
4%|▍ | 510/11526 [05:20<1:53:06, 1.62it/s] 4%|▍ | 511/11526 [05:20<1:52:58, 1.63it/s] {'loss': 0.4265, 'grad_norm': 0.6335014700889587, 'learning_rate': 4.431916738941891e-06, 'epoch': 0.13}
4%|▍ | 511/11526 [05:20<1:52:58, 1.63it/s] 4%|▍ | 512/11526 [05:21<1:53:08, 1.62it/s] {'loss': 0.51, 'grad_norm': 0.9643590450286865, 'learning_rate': 4.440589765828274e-06, 'epoch': 0.13}
4%|▍ | 512/11526 [05:21<1:53:08, 1.62it/s] 4%|▍ | 513/11526 [05:21<1:53:01, 1.62it/s] {'loss': 0.4025, 'grad_norm': 0.6892954111099243, 'learning_rate': 4.449262792714657e-06, 'epoch': 0.13}
4%|▍ | 513/11526 [05:21<1:53:01, 1.62it/s] 4%|▍ | 514/11526 [05:22<1:52:58, 1.62it/s] {'loss': 0.3813, 'grad_norm': 0.7943369150161743, 'learning_rate': 4.457935819601042e-06, 'epoch': 0.13}
4%|▍ | 514/11526 [05:22<1:52:58, 1.62it/s] 4%|▍ | 515/11526 [05:23<1:53:03, 1.62it/s] {'loss': 0.4248, 'grad_norm': 0.7252538800239563, 'learning_rate': 4.466608846487425e-06, 'epoch': 0.13}
4%|▍ | 515/11526 [05:23<1:53:03, 1.62it/s] 4%|▍ | 516/11526 [05:23<1:52:56, 1.62it/s] {'loss': 0.4898, 'grad_norm': 0.7119054198265076, 'learning_rate': 4.475281873373808e-06, 'epoch': 0.13}
4%|▍ | 516/11526 [05:23<1:52:56, 1.62it/s] 4%|▍ | 517/11526 [05:24<1:53:04, 1.62it/s] {'loss': 0.4612, 'grad_norm': 0.7612258195877075, 'learning_rate': 4.483954900260191e-06, 'epoch': 0.13}
4%|▍ | 517/11526 [05:24<1:53:04, 1.62it/s] 4%|▍ | 518/11526 [05:24<1:52:57, 1.62it/s] {'loss': 0.4115, 'grad_norm': 0.8430337905883789, 'learning_rate': 4.492627927146574e-06, 'epoch': 0.13}
4%|▍ | 518/11526 [05:25<1:52:57, 1.62it/s] 5%|▍ | 519/11526 [05:25<1:52:52, 1.63it/s] {'loss': 0.4641, 'grad_norm': 0.7127485871315002, 'learning_rate': 4.5013009540329575e-06, 'epoch': 0.14}
5%|▍ | 519/11526 [05:25<1:52:52, 1.63it/s] 5%|▍ | 520/11526 [05:26<1:53:01, 1.62it/s] {'loss': 0.4054, 'grad_norm': 0.6231193542480469, 'learning_rate': 4.509973980919341e-06, 'epoch': 0.14}
5%|▍ | 520/11526 [05:26<1:53:01, 1.62it/s] 5%|▍ | 521/11526 [05:26<1:52:54, 1.62it/s] {'loss': 0.4487, 'grad_norm': 0.7323324084281921, 'learning_rate': 4.518647007805724e-06, 'epoch': 0.14}
5%|▍ | 521/11526 [05:26<1:52:54, 1.62it/s] 5%|▍ | 522/11526 [05:27<1:52:55, 1.62it/s] {'loss': 0.4421, 'grad_norm': 0.7011528611183167, 'learning_rate': 4.527320034692108e-06, 'epoch': 0.14}
5%|▍ | 522/11526 [05:27<1:52:55, 1.62it/s] 5%|▍ | 523/11526 [05:28<1:52:55, 1.62it/s] {'loss': 0.4806, 'grad_norm': 0.810947597026825, 'learning_rate': 4.535993061578492e-06, 'epoch': 0.14}
5%|▍ | 523/11526 [05:28<1:52:55, 1.62it/s] 5%|▍ | 524/11526 [05:28<1:52:50, 1.62it/s] {'loss': 0.477, 'grad_norm': 0.8093245029449463, 'learning_rate': 4.544666088464875e-06, 'epoch': 0.14}
5%|▍ | 524/11526 [05:28<1:52:50, 1.62it/s] 5%|▍ | 525/11526 [05:29<1:52:55, 1.62it/s] {'loss': 0.3918, 'grad_norm': 0.6866489052772522, 'learning_rate': 4.5533391153512576e-06, 'epoch': 0.14}
5%|▍ | 525/11526 [05:29<1:52:55, 1.62it/s] 5%|▍ | 526/11526 [05:29<1:52:54, 1.62it/s] {'loss': 0.4806, 'grad_norm': 0.6820096969604492, 'learning_rate': 4.562012142237641e-06, 'epoch': 0.14}
5%|▍ | 526/11526 [05:30<1:52:54, 1.62it/s] 5%|▍ | 527/11526 [05:30<1:53:00, 1.62it/s] {'loss': 0.4109, 'grad_norm': 0.6663830876350403, 'learning_rate': 4.570685169124024e-06, 'epoch': 0.14}
5%|▍ | 527/11526 [05:30<1:53:00, 1.62it/s] 5%|▍ | 528/11526 [05:31<1:52:54, 1.62it/s] {'loss': 0.611, 'grad_norm': 0.7167218923568726, 'learning_rate': 4.579358196010408e-06, 'epoch': 0.14}
5%|▍ | 528/11526 [05:31<1:52:54, 1.62it/s] 5%|▍ | 529/11526 [05:31<1:52:49, 1.62it/s] {'loss': 0.3959, 'grad_norm': 0.7184093594551086, 'learning_rate': 4.588031222896792e-06, 'epoch': 0.14}
5%|▍ | 529/11526 [05:31<1:52:49, 1.62it/s] 5%|▍ | 530/11526 [05:32<1:52:57, 1.62it/s] {'loss': 0.3892, 'grad_norm': 0.6691920757293701, 'learning_rate': 4.596704249783175e-06, 'epoch': 0.14}
5%|▍ | 530/11526 [05:32<1:52:57, 1.62it/s] 5%|▍ | 531/11526 [05:32<1:52:50, 1.62it/s] {'loss': 0.4444, 'grad_norm': 0.6811487078666687, 'learning_rate': 4.6053772766695585e-06, 'epoch': 0.14}
5%|▍ | 531/11526 [05:33<1:52:50, 1.62it/s] 5%|▍ | 532/11526 [05:33<1:52:53, 1.62it/s] {'loss': 0.3683, 'grad_norm': 0.6388775706291199, 'learning_rate': 4.614050303555941e-06, 'epoch': 0.14}
5%|▍ | 532/11526 [05:33<1:52:53, 1.62it/s] 5%|▍ | 533/11526 [05:34<1:52:49, 1.62it/s] {'loss': 0.464, 'grad_norm': 0.6939169764518738, 'learning_rate': 4.622723330442324e-06, 'epoch': 0.14}
5%|▍ | 533/11526 [05:34<1:52:49, 1.62it/s] 5%|▍ | 534/11526 [05:34<1:52:47, 1.62it/s] {'loss': 0.3176, 'grad_norm': 0.6436537504196167, 'learning_rate': 4.631396357328708e-06, 'epoch': 0.14}
5%|▍ | 534/11526 [05:34<1:52:47, 1.62it/s] 5%|▍ | 535/11526 [05:35<1:53:04, 1.62it/s] {'loss': 0.325, 'grad_norm': 0.6319178342819214, 'learning_rate': 4.640069384215091e-06, 'epoch': 0.14}
5%|▍ | 535/11526 [05:35<1:53:04, 1.62it/s] 5%|▍ | 536/11526 [05:36<1:52:47, 1.62it/s] {'loss': 0.3781, 'grad_norm': 0.6574820876121521, 'learning_rate': 4.648742411101475e-06, 'epoch': 0.14}
5%|▍ | 536/11526 [05:36<1:52:47, 1.62it/s] 5%|▍ | 537/11526 [05:36<1:52:52, 1.62it/s] {'loss': 0.5594, 'grad_norm': 0.6719399094581604, 'learning_rate': 4.657415437987859e-06, 'epoch': 0.14}
5%|▍ | 537/11526 [05:36<1:52:52, 1.62it/s] 5%|▍ | 538/11526 [05:37<1:52:49, 1.62it/s] {'loss': 0.4602, 'grad_norm': 0.672275185585022, 'learning_rate': 4.666088464874242e-06, 'epoch': 0.14}
5%|▍ | 538/11526 [05:37<1:52:49, 1.62it/s] 5%|▍ | 539/11526 [05:37<1:52:47, 1.62it/s] {'loss': 0.4076, 'grad_norm': 0.7421301007270813, 'learning_rate': 4.674761491760625e-06, 'epoch': 0.14}
5%|▍ | 539/11526 [05:38<1:52:47, 1.62it/s] 5%|▍ | 540/11526 [05:38<1:52:47, 1.62it/s] {'loss': 0.4321, 'grad_norm': 0.7658659219741821, 'learning_rate': 4.683434518647008e-06, 'epoch': 0.14}
5%|▍ | 540/11526 [05:38<1:52:47, 1.62it/s] 5%|▍ | 541/11526 [05:39<1:52:50, 1.62it/s] {'loss': 0.3478, 'grad_norm': 0.5944129228591919, 'learning_rate': 4.692107545533391e-06, 'epoch': 0.14}
5%|▍ | 541/11526 [05:39<1:52:50, 1.62it/s] 5%|▍ | 542/11526 [05:39<1:52:48, 1.62it/s] {'loss': 0.4082, 'grad_norm': 0.7082988619804382, 'learning_rate': 4.7007805724197745e-06, 'epoch': 0.14}
5%|▍ | 542/11526 [05:39<1:52:48, 1.62it/s] 5%|▍ | 543/11526 [05:40<1:52:43, 1.62it/s] {'loss': 0.3403, 'grad_norm': 0.5929768681526184, 'learning_rate': 4.709453599306158e-06, 'epoch': 0.14}
5%|▍ | 543/11526 [05:40<1:52:43, 1.62it/s] 5%|▍ | 544/11526 [05:40<1:52:41, 1.62it/s] {'loss': 0.3871, 'grad_norm': 0.7475152015686035, 'learning_rate': 4.718126626192542e-06, 'epoch': 0.14}
5%|▍ | 544/11526 [05:41<1:52:41, 1.62it/s] 5%|▍ | 545/11526 [05:41<1:52:43, 1.62it/s] {'loss': 0.3651, 'grad_norm': 0.6970365047454834, 'learning_rate': 4.726799653078925e-06, 'epoch': 0.14}
5%|▍ | 545/11526 [05:41<1:52:43, 1.62it/s] 5%|▍ | 546/11526 [05:42<1:52:41, 1.62it/s] {'loss': 0.4066, 'grad_norm': 0.826568603515625, 'learning_rate': 4.735472679965309e-06, 'epoch': 0.14}
5%|▍ | 546/11526 [05:42<1:52:41, 1.62it/s] 5%|▍ | 547/11526 [05:42<1:52:48, 1.62it/s] {'loss': 0.3923, 'grad_norm': 0.7130035161972046, 'learning_rate': 4.744145706851691e-06, 'epoch': 0.14}
5%|▍ | 547/11526 [05:42<1:52:48, 1.62it/s] 5%|▍ | 548/11526 [05:43<1:52:38, 1.62it/s] {'loss': 0.4959, 'grad_norm': 0.8317971229553223, 'learning_rate': 4.7528187337380746e-06, 'epoch': 0.14}
5%|▍ | 548/11526 [05:43<1:52:38, 1.62it/s] 5%|▍ | 549/11526 [05:44<1:52:36, 1.62it/s] {'loss': 0.4085, 'grad_norm': 0.7221544981002808, 'learning_rate': 4.761491760624458e-06, 'epoch': 0.14}
5%|▍ | 549/11526 [05:44<1:52:36, 1.62it/s] 5%|▍ | 550/11526 [05:44<1:52:45, 1.62it/s] {'loss': 0.4501, 'grad_norm': 0.8954630494117737, 'learning_rate': 4.770164787510841e-06, 'epoch': 0.14}
5%|▍ | 550/11526 [05:44<1:52:45, 1.62it/s] 5%|▍ | 551/11526 [05:45<1:52:39, 1.62it/s] {'loss': 0.372, 'grad_norm': 0.6574524641036987, 'learning_rate': 4.7788378143972255e-06, 'epoch': 0.14}
5%|▍ | 551/11526 [05:45<1:52:39, 1.62it/s] 5%|▍ | 552/11526 [05:45<1:52:47, 1.62it/s] {'loss': 0.3858, 'grad_norm': 0.6956750750541687, 'learning_rate': 4.787510841283609e-06, 'epoch': 0.14}
5%|▍ | 552/11526 [05:46<1:52:47, 1.62it/s] 5%|▍ | 553/11526 [05:46<1:52:44, 1.62it/s] {'loss': 0.4131, 'grad_norm': 0.7007177472114563, 'learning_rate': 4.796183868169992e-06, 'epoch': 0.14}
5%|▍ | 553/11526 [05:46<1:52:44, 1.62it/s] 5%|▍ | 554/11526 [05:47<1:52:38, 1.62it/s] {'loss': 0.3374, 'grad_norm': 0.582796037197113, 'learning_rate': 4.8048568950563755e-06, 'epoch': 0.14}
5%|▍ | 554/11526 [05:47<1:52:38, 1.62it/s] 5%|▍ | 555/11526 [05:47<1:52:45, 1.62it/s] {'loss': 0.4838, 'grad_norm': 0.8321179151535034, 'learning_rate': 4.813529921942758e-06, 'epoch': 0.14}
5%|▍ | 555/11526 [05:47<1:52:45, 1.62it/s] 5%|▍ | 556/11526 [05:48<1:52:41, 1.62it/s] {'loss': 0.416, 'grad_norm': 0.7405956387519836, 'learning_rate': 4.822202948829141e-06, 'epoch': 0.14}
5%|▍ | 556/11526 [05:48<1:52:41, 1.62it/s] 5%|▍ | 557/11526 [05:48<1:53:17, 1.61it/s] {'loss': 0.4443, 'grad_norm': 0.7160441279411316, 'learning_rate': 4.830875975715525e-06, 'epoch': 0.14}
5%|▍ | 557/11526 [05:49<1:53:17, 1.61it/s] 5%|▍ | 558/11526 [05:49<1:53:01, 1.62it/s] {'loss': 0.4073, 'grad_norm': 0.6448276042938232, 'learning_rate': 4.839549002601908e-06, 'epoch': 0.15}
5%|▍ | 558/11526 [05:49<1:53:01, 1.62it/s] 5%|▍ | 559/11526 [05:50<1:52:49, 1.62it/s] {'loss': 0.3766, 'grad_norm': 0.6249399781227112, 'learning_rate': 4.848222029488292e-06, 'epoch': 0.15}
5%|▍ | 559/11526 [05:50<1:52:49, 1.62it/s] 5%|▍ | 560/11526 [05:50<1:52:52, 1.62it/s] {'loss': 0.4433, 'grad_norm': 0.7296474575996399, 'learning_rate': 4.856895056374676e-06, 'epoch': 0.15}
5%|▍ | 560/11526 [05:50<1:52:52, 1.62it/s] 5%|▍ | 561/11526 [05:51<1:52:42, 1.62it/s] {'loss': 0.3969, 'grad_norm': 0.7387452125549316, 'learning_rate': 4.865568083261059e-06, 'epoch': 0.15}
5%|▍ | 561/11526 [05:51<1:52:42, 1.62it/s] 5%|▍ | 562/11526 [05:52<1:52:45, 1.62it/s] {'loss': 0.3511, 'grad_norm': 0.6322174668312073, 'learning_rate': 4.8742411101474414e-06, 'epoch': 0.15}
5%|▍ | 562/11526 [05:52<1:52:45, 1.62it/s] 5%|▍ | 563/11526 [05:52<1:52:38, 1.62it/s] {'loss': 0.3251, 'grad_norm': 0.6100327968597412, 'learning_rate': 4.882914137033825e-06, 'epoch': 0.15}
5%|▍ | 563/11526 [05:52<1:52:38, 1.62it/s] 5%|▍ | 564/11526 [05:53<1:52:35, 1.62it/s] {'loss': 0.3848, 'grad_norm': 0.6025633215904236, 'learning_rate': 4.891587163920208e-06, 'epoch': 0.15}
5%|▍ | 564/11526 [05:53<1:52:35, 1.62it/s] 5%|▍ | 565/11526 [05:53<1:52:35, 1.62it/s] {'loss': 0.4392, 'grad_norm': 0.7761245965957642, 'learning_rate': 4.9002601908065915e-06, 'epoch': 0.15}
5%|▍ | 565/11526 [05:54<1:52:35, 1.62it/s] 5%|▍ | 566/11526 [05:54<1:52:29, 1.62it/s] {'loss': 0.3437, 'grad_norm': 0.594966471195221, 'learning_rate': 4.908933217692976e-06, 'epoch': 0.15}
5%|▍ | 566/11526 [05:54<1:52:29, 1.62it/s] 5%|▍ | 567/11526 [05:55<1:52:35, 1.62it/s] {'loss': 0.3608, 'grad_norm': 0.5719125270843506, 'learning_rate': 4.917606244579359e-06, 'epoch': 0.15}
5%|▍ | 567/11526 [05:55<1:52:35, 1.62it/s] 5%|▍ | 568/11526 [05:55<1:52:30, 1.62it/s] {'loss': 0.3647, 'grad_norm': 0.6692488193511963, 'learning_rate': 4.926279271465742e-06, 'epoch': 0.15}
5%|▍ | 568/11526 [05:55<1:52:30, 1.62it/s] 5%|▍ | 569/11526 [05:56<1:52:25, 1.62it/s] {'loss': 0.5057, 'grad_norm': 0.8100046515464783, 'learning_rate': 4.934952298352126e-06, 'epoch': 0.15}
5%|▍ | 569/11526 [05:56<1:52:25, 1.62it/s] 5%|▍ | 570/11526 [05:56<1:52:29, 1.62it/s] {'loss': 0.4536, 'grad_norm': 0.9842545390129089, 'learning_rate': 4.943625325238508e-06, 'epoch': 0.15}
5%|▍ | 570/11526 [05:57<1:52:29, 1.62it/s] 5%|▍ | 571/11526 [05:57<1:52:32, 1.62it/s] {'loss': 0.4804, 'grad_norm': 0.7130855917930603, 'learning_rate': 4.9522983521248916e-06, 'epoch': 0.15}
5%|▍ | 571/11526 [05:57<1:52:32, 1.62it/s] 5%|▍ | 572/11526 [05:58<1:52:49, 1.62it/s] {'loss': 0.495, 'grad_norm': 0.8368422985076904, 'learning_rate': 4.960971379011275e-06, 'epoch': 0.15}
5%|▍ | 572/11526 [05:58<1:52:49, 1.62it/s] 5%|▍ | 573/11526 [05:58<1:52:30, 1.62it/s] {'loss': 0.4624, 'grad_norm': 0.6840730309486389, 'learning_rate': 4.969644405897659e-06, 'epoch': 0.15}
5%|▍ | 573/11526 [05:58<1:52:30, 1.62it/s] 5%|▍ | 574/11526 [05:59<1:52:28, 1.62it/s] {'loss': 0.4041, 'grad_norm': 0.6755459308624268, 'learning_rate': 4.9783174327840425e-06, 'epoch': 0.15}
5%|▍ | 574/11526 [05:59<1:52:28, 1.62it/s] 5%|▍ | 575/11526 [06:00<1:52:29, 1.62it/s] {'loss': 0.4063, 'grad_norm': 0.6317890882492065, 'learning_rate': 4.986990459670426e-06, 'epoch': 0.15}
5%|▍ | 575/11526 [06:00<1:52:29, 1.62it/s] 5%|▍ | 576/11526 [06:00<1:52:27, 1.62it/s] {'loss': 0.3805, 'grad_norm': 0.6407385468482971, 'learning_rate': 4.995663486556809e-06, 'epoch': 0.15}
5%|▍ | 576/11526 [06:00<1:52:27, 1.62it/s] 5%|▌ | 577/11526 [06:01<1:52:29, 1.62it/s] {'loss': 0.4368, 'grad_norm': 0.8015986084938049, 'learning_rate': 5.004336513443192e-06, 'epoch': 0.15}
5%|▌ | 577/11526 [06:01<1:52:29, 1.62it/s] 5%|▌ | 578/11526 [06:01<1:52:23, 1.62it/s] {'loss': 0.5049, 'grad_norm': 0.7261749505996704, 'learning_rate': 5.013009540329575e-06, 'epoch': 0.15}
5%|▌ | 578/11526 [06:02<1:52:23, 1.62it/s] 5%|▌ | 579/11526 [06:02<1:52:20, 1.62it/s] {'loss': 0.3429, 'grad_norm': 0.5720138549804688, 'learning_rate': 5.021682567215958e-06, 'epoch': 0.15}
5%|▌ | 579/11526 [06:02<1:52:20, 1.62it/s] 5%|▌ | 580/11526 [06:03<1:52:25, 1.62it/s] {'loss': 0.4339, 'grad_norm': 0.7676470875740051, 'learning_rate': 5.030355594102342e-06, 'epoch': 0.15}
5%|▌ | 580/11526 [06:03<1:52:25, 1.62it/s] 5%|▌ | 581/11526 [06:03<1:52:17, 1.62it/s] {'loss': 0.4285, 'grad_norm': 0.7366043329238892, 'learning_rate': 5.039028620988725e-06, 'epoch': 0.15}
5%|▌ | 581/11526 [06:03<1:52:17, 1.62it/s] 5%|▌ | 582/11526 [06:04<1:52:25, 1.62it/s] {'loss': 0.3668, 'grad_norm': 0.7274613380432129, 'learning_rate': 5.047701647875108e-06, 'epoch': 0.15}
5%|▌ | 582/11526 [06:04<1:52:25, 1.62it/s] 5%|▌ | 583/11526 [06:05<1:52:18, 1.62it/s] {'loss': 0.4758, 'grad_norm': 0.7810908555984497, 'learning_rate': 5.056374674761492e-06, 'epoch': 0.15}
5%|▌ | 583/11526 [06:05<1:52:18, 1.62it/s] 5%|▌ | 584/11526 [06:05<1:52:12, 1.63it/s] {'loss': 0.4507, 'grad_norm': 0.752301037311554, 'learning_rate': 5.065047701647876e-06, 'epoch': 0.15}
5%|▌ | 584/11526 [06:05<1:52:12, 1.63it/s] 5%|▌ | 585/11526 [06:06<1:52:19, 1.62it/s] {'loss': 0.3714, 'grad_norm': 0.8189757466316223, 'learning_rate': 5.073720728534259e-06, 'epoch': 0.15}
5%|▌ | 585/11526 [06:06<1:52:19, 1.62it/s] 5%|▌ | 586/11526 [06:06<1:52:15, 1.62it/s] {'loss': 0.4478, 'grad_norm': 0.7168189287185669, 'learning_rate': 5.082393755420643e-06, 'epoch': 0.15}
5%|▌ | 586/11526 [06:06<1:52:15, 1.62it/s] 5%|▌ | 587/11526 [06:07<1:52:26, 1.62it/s] {'loss': 0.4083, 'grad_norm': 0.6339823007583618, 'learning_rate': 5.091066782307026e-06, 'epoch': 0.15}
5%|▌ | 587/11526 [06:07<1:52:26, 1.62it/s] 5%|▌ | 588/11526 [06:08<1:52:26, 1.62it/s] {'loss': 0.4106, 'grad_norm': 0.7957248687744141, 'learning_rate': 5.099739809193409e-06, 'epoch': 0.15}
5%|▌ | 588/11526 [06:08<1:52:26, 1.62it/s] 5%|▌ | 589/11526 [06:08<1:52:19, 1.62it/s] {'loss': 0.4726, 'grad_norm': 0.8344278931617737, 'learning_rate': 5.108412836079793e-06, 'epoch': 0.15}
5%|▌ | 589/11526 [06:08<1:52:19, 1.62it/s] 5%|▌ | 590/11526 [06:09<1:52:47, 1.62it/s] {'loss': 0.4156, 'grad_norm': 0.7491523623466492, 'learning_rate': 5.117085862966176e-06, 'epoch': 0.15}
5%|▌ | 590/11526 [06:09<1:52:47, 1.62it/s] 5%|▌ | 591/11526 [06:09<1:52:36, 1.62it/s] {'loss': 0.4552, 'grad_norm': 0.7244550585746765, 'learning_rate': 5.125758889852559e-06, 'epoch': 0.15}
5%|▌ | 591/11526 [06:10<1:52:36, 1.62it/s] 5%|▌ | 592/11526 [06:10<1:52:31, 1.62it/s] {'loss': 0.3811, 'grad_norm': 0.6590360999107361, 'learning_rate': 5.134431916738942e-06, 'epoch': 0.15}
5%|▌ | 592/11526 [06:10<1:52:31, 1.62it/s] 5%|▌ | 593/11526 [06:11<1:52:22, 1.62it/s] {'loss': 0.3963, 'grad_norm': 0.6553525328636169, 'learning_rate': 5.143104943625325e-06, 'epoch': 0.15}
5%|▌ | 593/11526 [06:11<1:52:22, 1.62it/s] 5%|▌ | 594/11526 [06:11<1:52:14, 1.62it/s] {'loss': 0.3394, 'grad_norm': 0.7342849373817444, 'learning_rate': 5.1517779705117086e-06, 'epoch': 0.15}
5%|▌ | 594/11526 [06:11<1:52:14, 1.62it/s] 5%|▌ | 595/11526 [06:12<1:52:16, 1.62it/s] {'loss': 0.4275, 'grad_norm': 1.0718433856964111, 'learning_rate': 5.160450997398092e-06, 'epoch': 0.15}
5%|▌ | 595/11526 [06:12<1:52:16, 1.62it/s] 5%|▌ | 596/11526 [06:13<1:52:10, 1.62it/s] {'loss': 0.5753, 'grad_norm': 0.7880038022994995, 'learning_rate': 5.169124024284475e-06, 'epoch': 0.16}
5%|▌ | 596/11526 [06:13<1:52:10, 1.62it/s] 5%|▌ | 597/11526 [06:13<1:52:21, 1.62it/s] {'loss': 0.5402, 'grad_norm': 0.9272701740264893, 'learning_rate': 5.177797051170859e-06, 'epoch': 0.16}
5%|▌ | 597/11526 [06:13<1:52:21, 1.62it/s] 5%|▌ | 598/11526 [06:14<1:52:14, 1.62it/s] {'loss': 0.3598, 'grad_norm': 0.5668549537658691, 'learning_rate': 5.186470078057242e-06, 'epoch': 0.16}
5%|▌ | 598/11526 [06:14<1:52:14, 1.62it/s] 5%|▌ | 599/11526 [06:14<1:52:08, 1.62it/s] {'loss': 0.3203, 'grad_norm': 0.6099364757537842, 'learning_rate': 5.195143104943626e-06, 'epoch': 0.16}
5%|▌ | 599/11526 [06:14<1:52:08, 1.62it/s] 5%|▌ | 600/11526 [06:15<1:52:07, 1.62it/s] {'loss': 0.3288, 'grad_norm': 0.6318109035491943, 'learning_rate': 5.2038161318300095e-06, 'epoch': 0.16}
5%|▌ | 600/11526 [06:15<1:52:07, 1.62it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.32it/s]
31%|███ | 4/13 [00:00<00:01, 8.38it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.78it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.73it/s]
100%|██████████| 13/13 [00:01<00:00, 6.74it/s]
{'eval_loss': 0.8277760148048401, 'eval_runtime': 1.9567, 'eval_samples_per_second': 102.215, 'eval_steps_per_second': 6.644, 'epoch': 0.16}
5%|▌ | 600/11526 [06:17<1:52:07, 1.62it/s]
 5%|▌ | 601/11526 [06:18<3:39:15, 1.20s/it] {'loss': 0.5251, 'grad_norm': 0.8516116738319397, 'learning_rate': 5.212489158716393e-06, 'epoch': 0.16}
5%|▌ | 601/11526 [06:18<3:39:15, 1.20s/it] 5%|▌ | 602/11526 [06:18<3:06:58, 1.03s/it] {'loss': 0.4933, 'grad_norm': 0.865422785282135, 'learning_rate': 5.221162185602776e-06, 'epoch': 0.16}
5%|▌ | 602/11526 [06:18<3:06:58, 1.03s/it] 5%|▌ | 603/11526 [06:19<2:44:26, 1.11it/s] {'loss': 0.3097, 'grad_norm': 0.6224344372749329, 'learning_rate': 5.2298352124891595e-06, 'epoch': 0.16}
5%|▌ | 603/11526 [06:19<2:44:26, 1.11it/s] 5%|▌ | 604/11526 [06:19<2:28:40, 1.22it/s] {'loss': 0.4253, 'grad_norm': 0.7784982919692993, 'learning_rate': 5.238508239375543e-06, 'epoch': 0.16}
5%|▌ | 604/11526 [06:20<2:28:40, 1.22it/s] 5%|▌ | 605/11526 [06:20<2:17:39, 1.32it/s] {'loss': 0.3857, 'grad_norm': 0.7433805465698242, 'learning_rate': 5.247181266261926e-06, 'epoch': 0.16}
5%|▌ | 605/11526 [06:20<2:17:39, 1.32it/s] 5%|▌ | 606/11526 [06:21<2:09:55, 1.40it/s] {'loss': 0.4133, 'grad_norm': 0.7097176909446716, 'learning_rate': 5.25585429314831e-06, 'epoch': 0.16}
5%|▌ | 606/11526 [06:21<2:09:55, 1.40it/s] 5%|▌ | 607/11526 [06:21<2:04:30, 1.46it/s] {'loss': 0.4756, 'grad_norm': 0.812838613986969, 'learning_rate': 5.264527320034692e-06, 'epoch': 0.16}
5%|▌ | 607/11526 [06:21<2:04:30, 1.46it/s] 5%|▌ | 608/11526 [06:22<2:00:40, 1.51it/s] {'loss': 0.4481, 'grad_norm': 0.7140307426452637, 'learning_rate': 5.2732003469210754e-06, 'epoch': 0.16}
5%|▌ | 608/11526 [06:22<2:00:40, 1.51it/s] 5%|▌ | 609/11526 [06:22<1:58:01, 1.54it/s] {'loss': 0.3882, 'grad_norm': 0.7785724401473999, 'learning_rate': 5.281873373807459e-06, 'epoch': 0.16}
5%|▌ | 609/11526 [06:23<1:58:01, 1.54it/s] 5%|▌ | 610/11526 [06:23<1:56:16, 1.56it/s] {'loss': 0.3604, 'grad_norm': 0.6865347027778625, 'learning_rate': 5.290546400693842e-06, 'epoch': 0.16}
5%|▌ | 610/11526 [06:23<1:56:16, 1.56it/s] 5%|▌ | 611/11526 [06:24<1:54:55, 1.58it/s] {'loss': 0.3654, 'grad_norm': 0.6348939538002014, 'learning_rate': 5.2992194275802255e-06, 'epoch': 0.16}
5%|▌ | 611/11526 [06:24<1:54:55, 1.58it/s] 5%|▌ | 612/11526 [06:24<1:54:07, 1.59it/s] {'loss': 0.4542, 'grad_norm': 0.7515613436698914, 'learning_rate': 5.307892454466609e-06, 'epoch': 0.16}
5%|▌ | 612/11526 [06:24<1:54:07, 1.59it/s] 5%|▌ | 613/11526 [06:25<1:53:29, 1.60it/s] {'loss': 0.3398, 'grad_norm': 0.6049278974533081, 'learning_rate': 5.316565481352992e-06, 'epoch': 0.16}
5%|▌ | 613/11526 [06:25<1:53:29, 1.60it/s] 5%|▌ | 614/11526 [06:26<1:52:58, 1.61it/s] {'loss': 0.4018, 'grad_norm': 0.729516863822937, 'learning_rate': 5.325238508239376e-06, 'epoch': 0.16}
5%|▌ | 614/11526 [06:26<1:52:58, 1.61it/s] 5%|▌ | 615/11526 [06:26<1:52:40, 1.61it/s] {'loss': 0.3954, 'grad_norm': 0.661711573600769, 'learning_rate': 5.33391153512576e-06, 'epoch': 0.16}
5%|▌ | 615/11526 [06:26<1:52:40, 1.61it/s] 5%|▌ | 616/11526 [06:27<1:52:23, 1.62it/s] {'loss': 0.4454, 'grad_norm': 0.7562820315361023, 'learning_rate': 5.342584562012143e-06, 'epoch': 0.16}
5%|▌ | 616/11526 [06:27<1:52:23, 1.62it/s] 5%|▌ | 617/11526 [06:27<1:52:18, 1.62it/s] {'loss': 0.4337, 'grad_norm': 0.6648319959640503, 'learning_rate': 5.351257588898526e-06, 'epoch': 0.16}
5%|▌ | 617/11526 [06:28<1:52:18, 1.62it/s] 5%|▌ | 618/11526 [06:28<1:52:12, 1.62it/s] {'loss': 0.4336, 'grad_norm': 0.7453455328941345, 'learning_rate': 5.35993061578491e-06, 'epoch': 0.16}
5%|▌ | 618/11526 [06:28<1:52:12, 1.62it/s] 5%|▌ | 619/11526 [06:29<1:52:03, 1.62it/s] {'loss': 0.4137, 'grad_norm': 0.6863594651222229, 'learning_rate': 5.368603642671293e-06, 'epoch': 0.16}
5%|▌ | 619/11526 [06:29<1:52:03, 1.62it/s] 5%|▌ | 620/11526 [06:29<1:52:04, 1.62it/s] {'loss': 0.3813, 'grad_norm': 0.7223635911941528, 'learning_rate': 5.3772766695576765e-06, 'epoch': 0.16}
5%|▌ | 620/11526 [06:29<1:52:04, 1.62it/s] 5%|▌ | 621/11526 [06:30<1:51:55, 1.62it/s] {'loss': 0.4745, 'grad_norm': 0.7769727110862732, 'learning_rate': 5.38594969644406e-06, 'epoch': 0.16}
5%|▌ | 621/11526 [06:30<1:51:55, 1.62it/s] 5%|▌ | 622/11526 [06:30<1:51:58, 1.62it/s] {'loss': 0.4003, 'grad_norm': 0.6247936487197876, 'learning_rate': 5.394622723330442e-06, 'epoch': 0.16}
5%|▌ | 622/11526 [06:31<1:51:58, 1.62it/s] 5%|▌ | 623/11526 [06:31<1:51:53, 1.62it/s] {'loss': 0.3835, 'grad_norm': 0.6962177753448486, 'learning_rate': 5.403295750216826e-06, 'epoch': 0.16}
5%|▌ | 623/11526 [06:31<1:51:53, 1.62it/s] 5%|▌ | 624/11526 [06:32<1:51:50, 1.62it/s] {'loss': 0.3266, 'grad_norm': 0.5521346926689148, 'learning_rate': 5.411968777103209e-06, 'epoch': 0.16}
5%|▌ | 624/11526 [06:32<1:51:50, 1.62it/s] 5%|▌ | 625/11526 [06:32<1:51:53, 1.62it/s] {'loss': 0.4881, 'grad_norm': 0.7635420560836792, 'learning_rate': 5.420641803989592e-06, 'epoch': 0.16}
5%|▌ | 625/11526 [06:32<1:51:53, 1.62it/s] 5%|▌ | 626/11526 [06:33<1:51:46, 1.63it/s] {'loss': 0.4195, 'grad_norm': 0.683856725692749, 'learning_rate': 5.429314830875976e-06, 'epoch': 0.16}
5%|▌ | 626/11526 [06:33<1:51:46, 1.63it/s] 5%|▌ | 627/11526 [06:34<1:51:51, 1.62it/s] {'loss': 0.4148, 'grad_norm': 0.8593076467514038, 'learning_rate': 5.437987857762359e-06, 'epoch': 0.16}
5%|▌ | 627/11526 [06:34<1:51:51, 1.62it/s] 5%|▌ | 628/11526 [06:34<1:51:48, 1.62it/s] {'loss': 0.3995, 'grad_norm': 0.7127716541290283, 'learning_rate': 5.446660884648742e-06, 'epoch': 0.16}
5%|▌ | 628/11526 [06:34<1:51:48, 1.62it/s] 5%|▌ | 629/11526 [06:35<1:51:42, 1.63it/s] {'loss': 0.4479, 'grad_norm': 0.8563690185546875, 'learning_rate': 5.455333911535127e-06, 'epoch': 0.16}
5%|▌ | 629/11526 [06:35<1:51:42, 1.63it/s] 5%|▌ | 630/11526 [06:35<1:52:08, 1.62it/s] {'loss': 0.4303, 'grad_norm': 0.6810048222541809, 'learning_rate': 5.46400693842151e-06, 'epoch': 0.16}
5%|▌ | 630/11526 [06:36<1:52:08, 1.62it/s] 5%|▌ | 631/11526 [06:36<1:51:59, 1.62it/s] {'loss': 0.3767, 'grad_norm': 0.6722681522369385, 'learning_rate': 5.472679965307893e-06, 'epoch': 0.16}
5%|▌ | 631/11526 [06:36<1:51:59, 1.62it/s] 5%|▌ | 632/11526 [06:37<1:51:59, 1.62it/s] {'loss': 0.3967, 'grad_norm': 0.7507127523422241, 'learning_rate': 5.481352992194277e-06, 'epoch': 0.16}
5%|▌ | 632/11526 [06:37<1:51:59, 1.62it/s] 5%|▌ | 633/11526 [06:37<1:51:47, 1.62it/s] {'loss': 0.4244, 'grad_norm': 0.702053964138031, 'learning_rate': 5.49002601908066e-06, 'epoch': 0.16}
5%|▌ | 633/11526 [06:37<1:51:47, 1.62it/s] 6%|▌ | 634/11526 [06:38<1:51:48, 1.62it/s] {'loss': 0.3541, 'grad_norm': 0.6062840819358826, 'learning_rate': 5.498699045967043e-06, 'epoch': 0.17}
6%|▌ | 634/11526 [06:38<1:51:48, 1.62it/s] 6%|▌ | 635/11526 [06:38<1:51:49, 1.62it/s] {'loss': 0.3666, 'grad_norm': 0.6902633905410767, 'learning_rate': 5.507372072853427e-06, 'epoch': 0.17}
6%|▌ | 635/11526 [06:39<1:51:49, 1.62it/s] 6%|▌ | 636/11526 [06:39<1:51:44, 1.62it/s] {'loss': 0.4465, 'grad_norm': 0.7360482215881348, 'learning_rate': 5.51604509973981e-06, 'epoch': 0.17}
6%|▌ | 636/11526 [06:39<1:51:44, 1.62it/s] 6%|▌ | 637/11526 [06:40<1:51:49, 1.62it/s] {'loss': 0.4704, 'grad_norm': 0.8006880283355713, 'learning_rate': 5.5247181266261925e-06, 'epoch': 0.17}
6%|▌ | 637/11526 [06:40<1:51:49, 1.62it/s] 6%|▌ | 638/11526 [06:40<1:51:46, 1.62it/s] {'loss': 0.5234, 'grad_norm': 0.7523778080940247, 'learning_rate': 5.533391153512576e-06, 'epoch': 0.17}
6%|▌ | 638/11526 [06:40<1:51:46, 1.62it/s] 6%|▌ | 639/11526 [06:41<1:51:39, 1.62it/s] {'loss': 0.3837, 'grad_norm': 0.6712956428527832, 'learning_rate': 5.542064180398959e-06, 'epoch': 0.17}
6%|▌ | 639/11526 [06:41<1:51:39, 1.62it/s] 6%|▌ | 640/11526 [06:42<1:51:46, 1.62it/s] {'loss': 0.3981, 'grad_norm': 0.6499000787734985, 'learning_rate': 5.5507372072853426e-06, 'epoch': 0.17}
6%|▌ | 640/11526 [06:42<1:51:46, 1.62it/s] 6%|▌ | 641/11526 [06:42<1:51:41, 1.62it/s] {'loss': 0.4782, 'grad_norm': 0.7593610286712646, 'learning_rate': 5.559410234171726e-06, 'epoch': 0.17}
6%|▌ | 641/11526 [06:42<1:51:41, 1.62it/s] 6%|▌ | 642/11526 [06:43<1:51:44, 1.62it/s] {'loss': 0.4122, 'grad_norm': 0.817128598690033, 'learning_rate': 5.568083261058109e-06, 'epoch': 0.17}
6%|▌ | 642/11526 [06:43<1:51:44, 1.62it/s] 6%|▌ | 643/11526 [06:43<1:51:40, 1.62it/s] {'loss': 0.4279, 'grad_norm': 0.8090819120407104, 'learning_rate': 5.576756287944493e-06, 'epoch': 0.17}
6%|▌ | 643/11526 [06:44<1:51:40, 1.62it/s] 6%|▌ | 644/11526 [06:44<1:51:43, 1.62it/s] {'loss': 0.4543, 'grad_norm': 0.6981033682823181, 'learning_rate': 5.585429314830877e-06, 'epoch': 0.17}
6%|▌ | 644/11526 [06:44<1:51:43, 1.62it/s] 6%|▌ | 645/11526 [06:45<1:52:17, 1.62it/s] {'loss': 0.3563, 'grad_norm': 0.6695054173469543, 'learning_rate': 5.59410234171726e-06, 'epoch': 0.17}
6%|▌ | 645/11526 [06:45<1:52:17, 1.62it/s] 6%|▌ | 646/11526 [06:45<1:52:07, 1.62it/s] {'loss': 0.4002, 'grad_norm': 0.6255576014518738, 'learning_rate': 5.6027753686036435e-06, 'epoch': 0.17}
6%|▌ | 646/11526 [06:45<1:52:07, 1.62it/s] 6%|▌ | 647/11526 [06:46<1:52:37, 1.61it/s] {'loss': 0.372, 'grad_norm': 0.5963560938835144, 'learning_rate': 5.611448395490027e-06, 'epoch': 0.17}
6%|▌ | 647/11526 [06:46<1:52:37, 1.61it/s] 6%|▌ | 648/11526 [06:47<1:52:14, 1.62it/s] {'loss': 0.4189, 'grad_norm': 0.6875995397567749, 'learning_rate': 5.62012142237641e-06, 'epoch': 0.17}
6%|▌ | 648/11526 [06:47<1:52:14, 1.62it/s] 6%|▌ | 649/11526 [06:47<1:51:59, 1.62it/s] {'loss': 0.3632, 'grad_norm': 0.6571831107139587, 'learning_rate': 5.6287944492627935e-06, 'epoch': 0.17}
6%|▌ | 649/11526 [06:47<1:51:59, 1.62it/s] 6%|▌ | 650/11526 [06:48<1:51:54, 1.62it/s] {'loss': 0.3732, 'grad_norm': 0.73530513048172, 'learning_rate': 5.637467476149177e-06, 'epoch': 0.17}
6%|▌ | 650/11526 [06:48<1:51:54, 1.62it/s] 6%|▌ | 651/11526 [06:48<1:51:41, 1.62it/s] {'loss': 0.3986, 'grad_norm': 0.7153792977333069, 'learning_rate': 5.64614050303556e-06, 'epoch': 0.17}
6%|▌ | 651/11526 [06:48<1:51:41, 1.62it/s] 6%|▌ | 652/11526 [06:49<1:51:46, 1.62it/s] {'loss': 0.408, 'grad_norm': 0.7042515277862549, 'learning_rate': 5.654813529921943e-06, 'epoch': 0.17}
6%|▌ | 652/11526 [06:49<1:51:46, 1.62it/s] 6%|▌ | 653/11526 [06:50<1:51:38, 1.62it/s] {'loss': 0.4053, 'grad_norm': 0.7995944023132324, 'learning_rate': 5.663486556808326e-06, 'epoch': 0.17}
6%|▌ | 653/11526 [06:50<1:51:38, 1.62it/s] 6%|▌ | 654/11526 [06:50<1:51:34, 1.62it/s] {'loss': 0.4447, 'grad_norm': 0.7605393528938293, 'learning_rate': 5.6721595836947094e-06, 'epoch': 0.17}
6%|▌ | 654/11526 [06:50<1:51:34, 1.62it/s] 6%|▌ | 655/11526 [06:51<1:51:40, 1.62it/s] {'loss': 0.4376, 'grad_norm': 0.8139204382896423, 'learning_rate': 5.680832610581093e-06, 'epoch': 0.17}
6%|▌ | 655/11526 [06:51<1:51:40, 1.62it/s] 6%|▌ | 656/11526 [06:51<1:51:39, 1.62it/s] {'loss': 0.4363, 'grad_norm': 0.8124903440475464, 'learning_rate': 5.689505637467476e-06, 'epoch': 0.17}
6%|▌ | 656/11526 [06:52<1:51:39, 1.62it/s] 6%|▌ | 657/11526 [06:52<1:51:42, 1.62it/s] {'loss': 0.3036, 'grad_norm': 0.6453284621238708, 'learning_rate': 5.6981786643538595e-06, 'epoch': 0.17}
6%|▌ | 657/11526 [06:52<1:51:42, 1.62it/s] 6%|▌ | 658/11526 [06:53<1:51:35, 1.62it/s] {'loss': 0.3601, 'grad_norm': 0.704792320728302, 'learning_rate': 5.706851691240244e-06, 'epoch': 0.17}
6%|▌ | 658/11526 [06:53<1:51:35, 1.62it/s] 6%|▌ | 659/11526 [06:53<1:51:27, 1.63it/s] {'loss': 0.5292, 'grad_norm': 0.864479124546051, 'learning_rate': 5.715524718126627e-06, 'epoch': 0.17}
6%|▌ | 659/11526 [06:53<1:51:27, 1.63it/s] 6%|▌ | 660/11526 [06:54<1:51:40, 1.62it/s] {'loss': 0.373, 'grad_norm': 0.7101563811302185, 'learning_rate': 5.72419774501301e-06, 'epoch': 0.17}
6%|▌ | 660/11526 [06:54<1:51:40, 1.62it/s] 6%|▌ | 661/11526 [06:55<1:51:33, 1.62it/s] {'loss': 0.4179, 'grad_norm': 0.7074534893035889, 'learning_rate': 5.732870771899394e-06, 'epoch': 0.17}
6%|▌ | 661/11526 [06:55<1:51:33, 1.62it/s] 6%|▌ | 662/11526 [06:55<1:51:34, 1.62it/s] {'loss': 0.3482, 'grad_norm': 0.673642098903656, 'learning_rate': 5.741543798785777e-06, 'epoch': 0.17}
6%|▌ | 662/11526 [06:55<1:51:34, 1.62it/s] 6%|▌ | 663/11526 [06:56<1:51:25, 1.62it/s] {'loss': 0.3772, 'grad_norm': 0.7207738161087036, 'learning_rate': 5.75021682567216e-06, 'epoch': 0.17}
6%|▌ | 663/11526 [06:56<1:51:25, 1.62it/s] 6%|▌ | 664/11526 [06:56<1:51:27, 1.62it/s] {'loss': 0.452, 'grad_norm': 0.7771927714347839, 'learning_rate': 5.758889852558544e-06, 'epoch': 0.17}
6%|▌ | 664/11526 [06:56<1:51:27, 1.62it/s] 6%|▌ | 665/11526 [06:57<1:51:33, 1.62it/s] {'loss': 0.3511, 'grad_norm': 0.6626598238945007, 'learning_rate': 5.767562879444927e-06, 'epoch': 0.17}
6%|▌ | 665/11526 [06:57<1:51:33, 1.62it/s] 6%|▌ | 666/11526 [06:58<1:51:32, 1.62it/s] {'loss': 0.4722, 'grad_norm': 0.8726609349250793, 'learning_rate': 5.7762359063313105e-06, 'epoch': 0.17}
6%|▌ | 666/11526 [06:58<1:51:32, 1.62it/s] 6%|▌ | 667/11526 [06:58<1:51:31, 1.62it/s] {'loss': 0.4453, 'grad_norm': 0.7692903876304626, 'learning_rate': 5.784908933217693e-06, 'epoch': 0.17}
6%|▌ | 667/11526 [06:58<1:51:31, 1.62it/s] 6%|▌ | 668/11526 [06:59<1:51:23, 1.62it/s] {'loss': 0.3389, 'grad_norm': 0.6361052989959717, 'learning_rate': 5.793581960104076e-06, 'epoch': 0.17}
6%|▌ | 668/11526 [06:59<1:51:23, 1.62it/s] 6%|▌ | 669/11526 [06:59<1:51:15, 1.63it/s] {'loss': 0.4245, 'grad_norm': 0.7561235427856445, 'learning_rate': 5.80225498699046e-06, 'epoch': 0.17}
6%|▌ | 669/11526 [07:00<1:51:15, 1.63it/s] 6%|▌ | 670/11526 [07:00<1:51:33, 1.62it/s] {'loss': 0.4333, 'grad_norm': 0.8586171865463257, 'learning_rate': 5.810928013876843e-06, 'epoch': 0.17}
6%|▌ | 670/11526 [07:00<1:51:33, 1.62it/s] 6%|▌ | 671/11526 [07:01<1:51:26, 1.62it/s] {'loss': 0.4232, 'grad_norm': 0.6692326068878174, 'learning_rate': 5.819601040763226e-06, 'epoch': 0.17}
6%|▌ | 671/11526 [07:01<1:51:26, 1.62it/s] 6%|▌ | 672/11526 [07:01<1:51:27, 1.62it/s] {'loss': 0.3572, 'grad_norm': 0.678993284702301, 'learning_rate': 5.82827406764961e-06, 'epoch': 0.17}
6%|▌ | 672/11526 [07:01<1:51:27, 1.62it/s] 6%|▌ | 673/11526 [07:02<1:51:27, 1.62it/s] {'loss': 0.3759, 'grad_norm': 0.6678770184516907, 'learning_rate': 5.836947094535994e-06, 'epoch': 0.18}
6%|▌ | 673/11526 [07:02<1:51:27, 1.62it/s] 6%|▌ | 674/11526 [07:03<1:51:22, 1.62it/s] {'loss': 0.3637, 'grad_norm': 0.6033802628517151, 'learning_rate': 5.845620121422377e-06, 'epoch': 0.18}
6%|▌ | 674/11526 [07:03<1:51:22, 1.62it/s] 6%|▌ | 675/11526 [07:03<1:51:28, 1.62it/s] {'loss': 0.4651, 'grad_norm': 0.6968647241592407, 'learning_rate': 5.854293148308761e-06, 'epoch': 0.18}
6%|▌ | 675/11526 [07:03<1:51:28, 1.62it/s] 6%|▌ | 676/11526 [07:04<1:51:23, 1.62it/s] {'loss': 0.2911, 'grad_norm': 0.6785936951637268, 'learning_rate': 5.862966175195144e-06, 'epoch': 0.18}
6%|▌ | 676/11526 [07:04<1:51:23, 1.62it/s] 6%|▌ | 677/11526 [07:04<1:51:52, 1.62it/s] {'loss': 0.5108, 'grad_norm': 0.7493978142738342, 'learning_rate': 5.871639202081527e-06, 'epoch': 0.18}
6%|▌ | 677/11526 [07:05<1:51:52, 1.62it/s] 6%|▌ | 678/11526 [07:05<1:51:40, 1.62it/s] {'loss': 0.393, 'grad_norm': 0.7035509943962097, 'learning_rate': 5.880312228967911e-06, 'epoch': 0.18}
6%|▌ | 678/11526 [07:05<1:51:40, 1.62it/s] 6%|▌ | 679/11526 [07:06<1:51:29, 1.62it/s] {'loss': 0.3535, 'grad_norm': 0.8260535597801208, 'learning_rate': 5.888985255854294e-06, 'epoch': 0.18}
6%|▌ | 679/11526 [07:06<1:51:29, 1.62it/s] 6%|▌ | 680/11526 [07:06<1:51:30, 1.62it/s] {'loss': 0.4195, 'grad_norm': 0.7198976874351501, 'learning_rate': 5.897658282740677e-06, 'epoch': 0.18}
6%|▌ | 680/11526 [07:06<1:51:30, 1.62it/s] 6%|▌ | 681/11526 [07:07<1:51:23, 1.62it/s] {'loss': 0.4544, 'grad_norm': 0.8647755980491638, 'learning_rate': 5.90633130962706e-06, 'epoch': 0.18}
6%|▌ | 681/11526 [07:07<1:51:23, 1.62it/s] 6%|▌ | 682/11526 [07:07<1:51:29, 1.62it/s] {'loss': 0.3607, 'grad_norm': 0.6065459251403809, 'learning_rate': 5.915004336513443e-06, 'epoch': 0.18}
6%|▌ | 682/11526 [07:08<1:51:29, 1.62it/s] 6%|▌ | 683/11526 [07:08<1:51:20, 1.62it/s] {'loss': 0.321, 'grad_norm': 0.6130356192588806, 'learning_rate': 5.9236773633998265e-06, 'epoch': 0.18}
6%|▌ | 683/11526 [07:08<1:51:20, 1.62it/s] 6%|▌ | 684/11526 [07:09<1:51:16, 1.62it/s] {'loss': 0.3468, 'grad_norm': 0.7903779745101929, 'learning_rate': 5.93235039028621e-06, 'epoch': 0.18}
6%|▌ | 684/11526 [07:09<1:51:16, 1.62it/s] 6%|▌ | 685/11526 [07:09<1:51:26, 1.62it/s] {'loss': 0.3648, 'grad_norm': 0.6771460175514221, 'learning_rate': 5.941023417172593e-06, 'epoch': 0.18}
6%|▌ | 685/11526 [07:09<1:51:26, 1.62it/s] 6%|▌ | 686/11526 [07:10<1:51:19, 1.62it/s] {'loss': 0.3304, 'grad_norm': 0.6622586846351624, 'learning_rate': 5.9496964440589766e-06, 'epoch': 0.18}
6%|▌ | 686/11526 [07:10<1:51:19, 1.62it/s] 6%|▌ | 687/11526 [07:11<1:51:18, 1.62it/s] {'loss': 0.4226, 'grad_norm': 0.7631015181541443, 'learning_rate': 5.95836947094536e-06, 'epoch': 0.18}
6%|▌ | 687/11526 [07:11<1:51:18, 1.62it/s] 6%|▌ | 688/11526 [07:11<1:51:14, 1.62it/s] {'loss': 0.4445, 'grad_norm': 0.8758919835090637, 'learning_rate': 5.967042497831744e-06, 'epoch': 0.18}
6%|▌ | 688/11526 [07:11<1:51:14, 1.62it/s] 6%|▌ | 689/11526 [07:12<1:51:14, 1.62it/s] {'loss': 0.3256, 'grad_norm': 0.6754420399665833, 'learning_rate': 5.9757155247181275e-06, 'epoch': 0.18}
6%|▌ | 689/11526 [07:12<1:51:14, 1.62it/s] 6%|▌ | 690/11526 [07:12<1:51:15, 1.62it/s] {'loss': 0.3736, 'grad_norm': 0.7025169730186462, 'learning_rate': 5.984388551604511e-06, 'epoch': 0.18}
6%|▌ | 690/11526 [07:13<1:51:15, 1.62it/s] 6%|▌ | 691/11526 [07:13<1:51:12, 1.62it/s] {'loss': 0.5128, 'grad_norm': 0.7584311962127686, 'learning_rate': 5.993061578490894e-06, 'epoch': 0.18}
6%|▌ | 691/11526 [07:13<1:51:12, 1.62it/s] 6%|▌ | 692/11526 [07:14<1:51:15, 1.62it/s] {'loss': 0.4513, 'grad_norm': 0.7602017521858215, 'learning_rate': 6.0017346053772775e-06, 'epoch': 0.18}
6%|▌ | 692/11526 [07:14<1:51:15, 1.62it/s] 6%|▌ | 693/11526 [07:14<1:51:08, 1.62it/s] {'loss': 0.4106, 'grad_norm': 0.7211409211158752, 'learning_rate': 6.010407632263661e-06, 'epoch': 0.18}
6%|▌ | 693/11526 [07:14<1:51:08, 1.62it/s] 6%|▌ | 694/11526 [07:15<1:51:03, 1.63it/s] {'loss': 0.3913, 'grad_norm': 0.8201491236686707, 'learning_rate': 6.019080659150044e-06, 'epoch': 0.18}
6%|▌ | 694/11526 [07:15<1:51:03, 1.63it/s] 6%|▌ | 695/11526 [07:15<1:51:41, 1.62it/s] {'loss': 0.4042, 'grad_norm': 0.731590986251831, 'learning_rate': 6.0277536860364275e-06, 'epoch': 0.18}
6%|▌ | 695/11526 [07:16<1:51:41, 1.62it/s] 6%|▌ | 696/11526 [07:16<1:51:26, 1.62it/s] {'loss': 0.3312, 'grad_norm': 0.666768491268158, 'learning_rate': 6.03642671292281e-06, 'epoch': 0.18}
6%|▌ | 696/11526 [07:16<1:51:26, 1.62it/s] 6%|▌ | 697/11526 [07:17<1:51:22, 1.62it/s] {'loss': 0.4788, 'grad_norm': 0.8642355799674988, 'learning_rate': 6.045099739809193e-06, 'epoch': 0.18}
6%|▌ | 697/11526 [07:17<1:51:22, 1.62it/s] 6%|▌ | 698/11526 [07:17<1:51:17, 1.62it/s] {'loss': 0.3597, 'grad_norm': 0.6812474131584167, 'learning_rate': 6.053772766695577e-06, 'epoch': 0.18}
6%|▌ | 698/11526 [07:17<1:51:17, 1.62it/s] 6%|▌ | 699/11526 [07:18<1:51:09, 1.62it/s] {'loss': 0.352, 'grad_norm': 0.8071223497390747, 'learning_rate': 6.06244579358196e-06, 'epoch': 0.18}
6%|▌ | 699/11526 [07:18<1:51:09, 1.62it/s] 6%|▌ | 700/11526 [07:19<1:51:13, 1.62it/s] {'loss': 0.3365, 'grad_norm': 0.6415539383888245, 'learning_rate': 6.0711188204683434e-06, 'epoch': 0.18}
6%|▌ | 700/11526 [07:19<1:51:13, 1.62it/s] 6%|▌ | 701/11526 [07:19<1:51:07, 1.62it/s] {'loss': 0.3526, 'grad_norm': 0.785638689994812, 'learning_rate': 6.079791847354727e-06, 'epoch': 0.18}
6%|▌ | 701/11526 [07:19<1:51:07, 1.62it/s] 6%|▌ | 702/11526 [07:20<1:51:21, 1.62it/s] {'loss': 0.3965, 'grad_norm': 0.7369773983955383, 'learning_rate': 6.08846487424111e-06, 'epoch': 0.18}
6%|▌ | 702/11526 [07:20<1:51:21, 1.62it/s] 6%|▌ | 703/11526 [07:20<1:51:08, 1.62it/s] {'loss': 0.4333, 'grad_norm': 0.7616879343986511, 'learning_rate': 6.097137901127494e-06, 'epoch': 0.18}
6%|▌ | 703/11526 [07:21<1:51:08, 1.62it/s] 6%|▌ | 704/11526 [07:21<1:51:04, 1.62it/s] {'loss': 0.3108, 'grad_norm': 0.6483084559440613, 'learning_rate': 6.105810928013878e-06, 'epoch': 0.18}
6%|▌ | 704/11526 [07:21<1:51:04, 1.62it/s] 6%|▌ | 705/11526 [07:22<1:51:05, 1.62it/s] {'loss': 0.3998, 'grad_norm': 0.8608818054199219, 'learning_rate': 6.114483954900261e-06, 'epoch': 0.18}
6%|▌ | 705/11526 [07:22<1:51:05, 1.62it/s] 6%|▌ | 706/11526 [07:22<1:51:03, 1.62it/s] {'loss': 0.3554, 'grad_norm': 0.7020348310470581, 'learning_rate': 6.123156981786644e-06, 'epoch': 0.18}
6%|▌ | 706/11526 [07:22<1:51:03, 1.62it/s] 6%|▌ | 707/11526 [07:23<1:51:01, 1.62it/s] {'loss': 0.4039, 'grad_norm': 0.6991458535194397, 'learning_rate': 6.131830008673028e-06, 'epoch': 0.18}
6%|▌ | 707/11526 [07:23<1:51:01, 1.62it/s] 6%|▌ | 708/11526 [07:23<1:51:02, 1.62it/s] {'loss': 0.4194, 'grad_norm': 0.8185877799987793, 'learning_rate': 6.140503035559411e-06, 'epoch': 0.18}
6%|▌ | 708/11526 [07:24<1:51:02, 1.62it/s] 6%|▌ | 709/11526 [07:24<1:50:59, 1.62it/s] {'loss': 0.3881, 'grad_norm': 0.8268315196037292, 'learning_rate': 6.149176062445794e-06, 'epoch': 0.18}
6%|▌ | 709/11526 [07:24<1:50:59, 1.62it/s] 6%|▌ | 710/11526 [07:25<1:51:05, 1.62it/s] {'loss': 0.4213, 'grad_norm': 0.7795161008834839, 'learning_rate': 6.157849089332178e-06, 'epoch': 0.18}
6%|▌ | 710/11526 [07:25<1:51:05, 1.62it/s] 6%|▌ | 711/11526 [07:25<1:51:02, 1.62it/s] {'loss': 0.4904, 'grad_norm': 0.8342912793159485, 'learning_rate': 6.16652211621856e-06, 'epoch': 0.19}
6%|▌ | 711/11526 [07:25<1:51:02, 1.62it/s] 6%|▌ | 712/11526 [07:26<1:51:10, 1.62it/s] {'loss': 0.4362, 'grad_norm': 0.7214239239692688, 'learning_rate': 6.175195143104944e-06, 'epoch': 0.19}
6%|▌ | 712/11526 [07:26<1:51:10, 1.62it/s] 6%|▌ | 713/11526 [07:27<1:51:02, 1.62it/s] {'loss': 0.4309, 'grad_norm': 0.7811275124549866, 'learning_rate': 6.183868169991327e-06, 'epoch': 0.19}
6%|▌ | 713/11526 [07:27<1:51:02, 1.62it/s] 6%|▌ | 714/11526 [07:27<1:50:55, 1.62it/s] {'loss': 0.3367, 'grad_norm': 0.6584367752075195, 'learning_rate': 6.19254119687771e-06, 'epoch': 0.19}
6%|▌ | 714/11526 [07:27<1:50:55, 1.62it/s] 6%|▌ | 715/11526 [07:28<1:51:22, 1.62it/s] {'loss': 0.4319, 'grad_norm': 0.8556516170501709, 'learning_rate': 6.201214223764094e-06, 'epoch': 0.19}
6%|▌ | 715/11526 [07:28<1:51:22, 1.62it/s] 6%|▌ | 716/11526 [07:28<1:51:11, 1.62it/s] {'loss': 0.3365, 'grad_norm': 0.6348908543586731, 'learning_rate': 6.209887250650477e-06, 'epoch': 0.19}
6%|▌ | 716/11526 [07:29<1:51:11, 1.62it/s] 6%|▌ | 717/11526 [07:29<1:51:09, 1.62it/s] {'loss': 0.3149, 'grad_norm': 0.6364282965660095, 'learning_rate': 6.218560277536861e-06, 'epoch': 0.19}
6%|▌ | 717/11526 [07:29<1:51:09, 1.62it/s] 6%|▌ | 718/11526 [07:30<1:51:00, 1.62it/s] {'loss': 0.5053, 'grad_norm': 0.8278027176856995, 'learning_rate': 6.2272333044232445e-06, 'epoch': 0.19}
6%|▌ | 718/11526 [07:30<1:51:00, 1.62it/s] 6%|▌ | 719/11526 [07:30<1:50:56, 1.62it/s] {'loss': 0.3632, 'grad_norm': 0.7414032816886902, 'learning_rate': 6.235906331309628e-06, 'epoch': 0.19}
6%|▌ | 719/11526 [07:30<1:50:56, 1.62it/s] 6%|▌ | 720/11526 [07:31<1:51:01, 1.62it/s] {'loss': 0.4039, 'grad_norm': 0.8173367977142334, 'learning_rate': 6.244579358196011e-06, 'epoch': 0.19}
6%|▌ | 720/11526 [07:31<1:51:01, 1.62it/s] 6%|▋ | 721/11526 [07:32<1:50:54, 1.62it/s] {'loss': 0.3312, 'grad_norm': 0.692030668258667, 'learning_rate': 6.253252385082395e-06, 'epoch': 0.19}
6%|▋ | 721/11526 [07:32<1:50:54, 1.62it/s] 6%|▋ | 722/11526 [07:32<1:50:51, 1.62it/s] {'loss': 0.3877, 'grad_norm': 0.7351370453834534, 'learning_rate': 6.261925411968778e-06, 'epoch': 0.19}
6%|▋ | 722/11526 [07:32<1:50:51, 1.62it/s] 6%|▋ | 723/11526 [07:33<1:50:45, 1.63it/s] {'loss': 0.4981, 'grad_norm': 0.8904139399528503, 'learning_rate': 6.270598438855161e-06, 'epoch': 0.19}
6%|▋ | 723/11526 [07:33<1:50:45, 1.63it/s] 6%|▋ | 724/11526 [07:33<1:50:40, 1.63it/s] {'loss': 0.3106, 'grad_norm': 0.7477068901062012, 'learning_rate': 6.279271465741545e-06, 'epoch': 0.19}
6%|▋ | 724/11526 [07:33<1:50:40, 1.63it/s] 6%|▋ | 725/11526 [07:34<1:50:46, 1.63it/s] {'loss': 0.5024, 'grad_norm': 0.8378632664680481, 'learning_rate': 6.287944492627928e-06, 'epoch': 0.19}
6%|▋ | 725/11526 [07:34<1:50:46, 1.63it/s] 6%|▋ | 726/11526 [07:35<1:50:39, 1.63it/s] {'loss': 0.3964, 'grad_norm': 0.7309871315956116, 'learning_rate': 6.2966175195143105e-06, 'epoch': 0.19}
6%|▋ | 726/11526 [07:35<1:50:39, 1.63it/s] 6%|▋ | 727/11526 [07:35<1:50:50, 1.62it/s] {'loss': 0.4431, 'grad_norm': 0.686634361743927, 'learning_rate': 6.305290546400694e-06, 'epoch': 0.19}
6%|▋ | 727/11526 [07:35<1:50:50, 1.62it/s] 6%|▋ | 728/11526 [07:36<1:50:48, 1.62it/s] {'loss': 0.3976, 'grad_norm': 0.7263730764389038, 'learning_rate': 6.313963573287077e-06, 'epoch': 0.19}
6%|▋ | 728/11526 [07:36<1:50:48, 1.62it/s] 6%|▋ | 729/11526 [07:36<1:50:43, 1.63it/s] {'loss': 0.2691, 'grad_norm': 0.5932652354240417, 'learning_rate': 6.3226366001734605e-06, 'epoch': 0.19}
6%|▋ | 729/11526 [07:37<1:50:43, 1.63it/s] 6%|▋ | 730/11526 [07:37<1:50:51, 1.62it/s] {'loss': 0.483, 'grad_norm': 0.7892284393310547, 'learning_rate': 6.331309627059844e-06, 'epoch': 0.19}
6%|▋ | 730/11526 [07:37<1:50:51, 1.62it/s] 6%|▋ | 731/11526 [07:38<1:50:43, 1.62it/s] {'loss': 0.3184, 'grad_norm': 0.6759921908378601, 'learning_rate': 6.339982653946227e-06, 'epoch': 0.19}
6%|▋ | 731/11526 [07:38<1:50:43, 1.62it/s] 6%|▋ | 732/11526 [07:38<1:50:46, 1.62it/s] {'loss': 0.3971, 'grad_norm': 0.7926105856895447, 'learning_rate': 6.348655680832611e-06, 'epoch': 0.19}
6%|▋ | 732/11526 [07:38<1:50:46, 1.62it/s] 6%|▋ | 733/11526 [07:39<1:50:40, 1.63it/s] {'loss': 0.3791, 'grad_norm': 0.7854400277137756, 'learning_rate': 6.357328707718995e-06, 'epoch': 0.19}
6%|▋ | 733/11526 [07:39<1:50:40, 1.63it/s] 6%|▋ | 734/11526 [07:39<1:50:37, 1.63it/s] {'loss': 0.3828, 'grad_norm': 0.7191298007965088, 'learning_rate': 6.366001734605378e-06, 'epoch': 0.19}
6%|▋ | 734/11526 [07:40<1:50:37, 1.63it/s] 6%|▋ | 735/11526 [07:40<1:50:49, 1.62it/s] {'loss': 0.3224, 'grad_norm': 0.6002159118652344, 'learning_rate': 6.3746747614917615e-06, 'epoch': 0.19}
6%|▋ | 735/11526 [07:40<1:50:49, 1.62it/s] 6%|▋ | 736/11526 [07:41<1:50:40, 1.62it/s] {'loss': 0.3399, 'grad_norm': 0.7986701726913452, 'learning_rate': 6.383347788378145e-06, 'epoch': 0.19}
6%|▋ | 736/11526 [07:41<1:50:40, 1.62it/s] 6%|▋ | 737/11526 [07:41<1:51:10, 1.62it/s] {'loss': 0.4375, 'grad_norm': 0.7030100226402283, 'learning_rate': 6.392020815264528e-06, 'epoch': 0.19}
6%|▋ | 737/11526 [07:41<1:51:10, 1.62it/s] 6%|▋ | 738/11526 [07:42<1:50:54, 1.62it/s] {'loss': 0.3473, 'grad_norm': 0.7116193175315857, 'learning_rate': 6.4006938421509115e-06, 'epoch': 0.19}
6%|▋ | 738/11526 [07:42<1:50:54, 1.62it/s] 6%|▋ | 739/11526 [07:43<1:50:43, 1.62it/s] {'loss': 0.4066, 'grad_norm': 0.8682845234870911, 'learning_rate': 6.409366869037295e-06, 'epoch': 0.19}
6%|▋ | 739/11526 [07:43<1:50:43, 1.62it/s] 6%|▋ | 740/11526 [07:43<1:50:56, 1.62it/s] {'loss': 0.4303, 'grad_norm': 0.7984737157821655, 'learning_rate': 6.418039895923678e-06, 'epoch': 0.19}
6%|▋ | 740/11526 [07:43<1:50:56, 1.62it/s] 6%|▋ | 741/11526 [07:44<1:50:46, 1.62it/s] {'loss': 0.3462, 'grad_norm': 0.8835185170173645, 'learning_rate': 6.426712922810061e-06, 'epoch': 0.19}
6%|▋ | 741/11526 [07:44<1:50:46, 1.62it/s] 6%|▋ | 742/11526 [07:44<1:51:19, 1.61it/s] {'loss': 0.4372, 'grad_norm': 0.7319797873497009, 'learning_rate': 6.435385949696444e-06, 'epoch': 0.19}
6%|▋ | 742/11526 [07:45<1:51:19, 1.61it/s] 6%|▋ | 743/11526 [07:45<1:51:09, 1.62it/s] {'loss': 0.3582, 'grad_norm': 0.6791582107543945, 'learning_rate': 6.444058976582827e-06, 'epoch': 0.19}
6%|▋ | 743/11526 [07:45<1:51:09, 1.62it/s] 6%|▋ | 744/11526 [07:46<1:50:56, 1.62it/s] {'loss': 0.425, 'grad_norm': 0.8135459423065186, 'learning_rate': 6.452732003469211e-06, 'epoch': 0.19}
6%|▋ | 744/11526 [07:46<1:50:56, 1.62it/s] 6%|▋ | 745/11526 [07:46<1:50:59, 1.62it/s] {'loss': 0.5121, 'grad_norm': 1.1523535251617432, 'learning_rate': 6.461405030355594e-06, 'epoch': 0.19}
6%|▋ | 745/11526 [07:46<1:50:59, 1.62it/s] 6%|▋ | 746/11526 [07:47<1:50:50, 1.62it/s] {'loss': 0.3832, 'grad_norm': 0.6926369667053223, 'learning_rate': 6.4700780572419774e-06, 'epoch': 0.19}
6%|▋ | 746/11526 [07:47<1:50:50, 1.62it/s] 6%|▋ | 747/11526 [07:48<1:50:47, 1.62it/s] {'loss': 0.2742, 'grad_norm': 0.7597611546516418, 'learning_rate': 6.478751084128362e-06, 'epoch': 0.19}
6%|▋ | 747/11526 [07:48<1:50:47, 1.62it/s] 6%|▋ | 748/11526 [07:48<1:50:43, 1.62it/s] {'loss': 0.3067, 'grad_norm': 0.6464918851852417, 'learning_rate': 6.487424111014745e-06, 'epoch': 0.19}
6%|▋ | 748/11526 [07:48<1:50:43, 1.62it/s] 6%|▋ | 749/11526 [07:49<1:50:34, 1.62it/s] {'loss': 0.3462, 'grad_norm': 0.7140690684318542, 'learning_rate': 6.496097137901128e-06, 'epoch': 0.19}
6%|▋ | 749/11526 [07:49<1:50:34, 1.62it/s] 7%|▋ | 750/11526 [07:49<1:50:43, 1.62it/s] {'loss': 0.3991, 'grad_norm': 0.6659241914749146, 'learning_rate': 6.504770164787512e-06, 'epoch': 0.2}
7%|▋ | 750/11526 [07:50<1:50:43, 1.62it/s] 7%|▋ | 751/11526 [07:50<1:50:36, 1.62it/s] {'loss': 0.4084, 'grad_norm': 0.7160258889198303, 'learning_rate': 6.513443191673895e-06, 'epoch': 0.2}
7%|▋ | 751/11526 [07:50<1:50:36, 1.62it/s] 7%|▋ | 752/11526 [07:51<1:50:36, 1.62it/s] {'loss': 0.5371, 'grad_norm': 0.7184198498725891, 'learning_rate': 6.522116218560278e-06, 'epoch': 0.2}
7%|▋ | 752/11526 [07:51<1:50:36, 1.62it/s] 7%|▋ | 753/11526 [07:51<1:50:30, 1.62it/s] {'loss': 0.3476, 'grad_norm': 0.6188551783561707, 'learning_rate': 6.530789245446662e-06, 'epoch': 0.2}
7%|▋ | 753/11526 [07:51<1:50:30, 1.62it/s] 7%|▋ | 754/11526 [07:52<1:50:33, 1.62it/s] {'loss': 0.2819, 'grad_norm': 0.6014720797538757, 'learning_rate': 6.539462272333045e-06, 'epoch': 0.2}
7%|▋ | 754/11526 [07:52<1:50:33, 1.62it/s] 7%|▋ | 755/11526 [07:52<1:50:37, 1.62it/s] {'loss': 0.407, 'grad_norm': 0.7019389271736145, 'learning_rate': 6.548135299219428e-06, 'epoch': 0.2}
7%|▋ | 755/11526 [07:53<1:50:37, 1.62it/s] 7%|▋ | 756/11526 [07:53<1:50:28, 1.62it/s] {'loss': 0.3979, 'grad_norm': 0.793718695640564, 'learning_rate': 6.556808326105811e-06, 'epoch': 0.2}
7%|▋ | 756/11526 [07:53<1:50:28, 1.62it/s] 7%|▋ | 757/11526 [07:54<1:50:31, 1.62it/s] {'loss': 0.3528, 'grad_norm': 0.63706374168396, 'learning_rate': 6.565481352992194e-06, 'epoch': 0.2}
7%|▋ | 757/11526 [07:54<1:50:31, 1.62it/s] 7%|▋ | 758/11526 [07:54<1:50:29, 1.62it/s] {'loss': 0.3285, 'grad_norm': 0.6330990791320801, 'learning_rate': 6.574154379878578e-06, 'epoch': 0.2}
7%|▋ | 758/11526 [07:54<1:50:29, 1.62it/s] 7%|▋ | 759/11526 [07:55<1:50:25, 1.63it/s] {'loss': 0.4314, 'grad_norm': 0.6847608089447021, 'learning_rate': 6.582827406764961e-06, 'epoch': 0.2}
7%|▋ | 759/11526 [07:55<1:50:25, 1.63it/s] 7%|▋ | 760/11526 [07:56<1:50:32, 1.62it/s] {'loss': 0.427, 'grad_norm': 0.7705488204956055, 'learning_rate': 6.591500433651344e-06, 'epoch': 0.2}
7%|▋ | 760/11526 [07:56<1:50:32, 1.62it/s] 7%|▋ | 761/11526 [07:56<1:50:27, 1.62it/s] {'loss': 0.3258, 'grad_norm': 0.5987486243247986, 'learning_rate': 6.600173460537728e-06, 'epoch': 0.2}
7%|▋ | 761/11526 [07:56<1:50:27, 1.62it/s] 7%|▋ | 762/11526 [07:57<1:50:32, 1.62it/s] {'loss': 0.3792, 'grad_norm': 0.6566416621208191, 'learning_rate': 6.608846487424112e-06, 'epoch': 0.2}
7%|▋ | 762/11526 [07:57<1:50:32, 1.62it/s] 7%|▋ | 763/11526 [07:57<1:50:26, 1.62it/s] {'loss': 0.4188, 'grad_norm': 0.6747887134552002, 'learning_rate': 6.617519514310495e-06, 'epoch': 0.2}
7%|▋ | 763/11526 [07:58<1:50:26, 1.62it/s] 7%|▋ | 764/11526 [07:58<1:50:19, 1.63it/s] {'loss': 0.3861, 'grad_norm': 0.8108258247375488, 'learning_rate': 6.6261925411968785e-06, 'epoch': 0.2}
7%|▋ | 764/11526 [07:58<1:50:19, 1.63it/s] 7%|▋ | 765/11526 [07:59<1:50:28, 1.62it/s] {'loss': 0.4567, 'grad_norm': 0.7089862823486328, 'learning_rate': 6.634865568083262e-06, 'epoch': 0.2}
7%|▋ | 765/11526 [07:59<1:50:28, 1.62it/s] 7%|▋ | 766/11526 [07:59<1:50:28, 1.62it/s] {'loss': 0.3824, 'grad_norm': 0.7166500091552734, 'learning_rate': 6.643538594969645e-06, 'epoch': 0.2}
7%|▋ | 766/11526 [07:59<1:50:28, 1.62it/s] 7%|▋ | 767/11526 [08:00<1:50:27, 1.62it/s] {'loss': 0.4085, 'grad_norm': 0.7435932159423828, 'learning_rate': 6.652211621856029e-06, 'epoch': 0.2}
7%|▋ | 767/11526 [08:00<1:50:27, 1.62it/s] 7%|▋ | 768/11526 [08:00<1:50:21, 1.62it/s] {'loss': 0.3269, 'grad_norm': 0.6162803769111633, 'learning_rate': 6.660884648742412e-06, 'epoch': 0.2}
7%|▋ | 768/11526 [08:01<1:50:21, 1.62it/s] 7%|▋ | 769/11526 [08:01<1:50:19, 1.62it/s] {'loss': 0.4502, 'grad_norm': 0.8063548803329468, 'learning_rate': 6.669557675628795e-06, 'epoch': 0.2}
7%|▋ | 769/11526 [08:01<1:50:19, 1.62it/s] 7%|▋ | 770/11526 [08:02<1:50:25, 1.62it/s] {'loss': 0.3683, 'grad_norm': 0.8223588466644287, 'learning_rate': 6.678230702515179e-06, 'epoch': 0.2}
7%|▋ | 770/11526 [08:02<1:50:25, 1.62it/s] 7%|▋ | 771/11526 [08:02<1:50:24, 1.62it/s] {'loss': 0.3191, 'grad_norm': 0.5688366293907166, 'learning_rate': 6.686903729401561e-06, 'epoch': 0.2}
7%|▋ | 771/11526 [08:02<1:50:24, 1.62it/s] 7%|▋ | 772/11526 [08:03<1:50:22, 1.62it/s] {'loss': 0.4245, 'grad_norm': 0.7287966012954712, 'learning_rate': 6.6955767562879445e-06, 'epoch': 0.2}
7%|▋ | 772/11526 [08:03<1:50:22, 1.62it/s] 7%|▋ | 773/11526 [08:04<1:50:19, 1.62it/s] {'loss': 0.3832, 'grad_norm': 0.6673471331596375, 'learning_rate': 6.704249783174328e-06, 'epoch': 0.2}
7%|▋ | 773/11526 [08:04<1:50:19, 1.62it/s] 7%|▋ | 774/11526 [08:04<1:50:14, 1.63it/s] {'loss': 0.4026, 'grad_norm': 0.7545210719108582, 'learning_rate': 6.712922810060711e-06, 'epoch': 0.2}
7%|▋ | 774/11526 [08:04<1:50:14, 1.63it/s] 7%|▋ | 775/11526 [08:05<1:50:21, 1.62it/s] {'loss': 0.3773, 'grad_norm': 0.7263842225074768, 'learning_rate': 6.7215958369470945e-06, 'epoch': 0.2}
7%|▋ | 775/11526 [08:05<1:50:21, 1.62it/s] 7%|▋ | 776/11526 [08:05<1:50:14, 1.63it/s] {'loss': 0.4497, 'grad_norm': 0.7241200804710388, 'learning_rate': 6.730268863833478e-06, 'epoch': 0.2}
7%|▋ | 776/11526 [08:06<1:50:14, 1.63it/s] 7%|▋ | 777/11526 [08:06<1:50:25, 1.62it/s] {'loss': 0.3352, 'grad_norm': 0.6227957010269165, 'learning_rate': 6.738941890719862e-06, 'epoch': 0.2}
7%|▋ | 777/11526 [08:06<1:50:25, 1.62it/s] 7%|▋ | 778/11526 [08:07<1:50:18, 1.62it/s] {'loss': 0.3992, 'grad_norm': 0.7485185265541077, 'learning_rate': 6.747614917606245e-06, 'epoch': 0.2}
7%|▋ | 778/11526 [08:07<1:50:18, 1.62it/s] 7%|▋ | 779/11526 [08:07<1:50:13, 1.63it/s] {'loss': 0.3615, 'grad_norm': 0.6314504146575928, 'learning_rate': 6.756287944492629e-06, 'epoch': 0.2}
7%|▋ | 779/11526 [08:07<1:50:13, 1.63it/s] 7%|▋ | 780/11526 [08:08<1:50:15, 1.62it/s] {'loss': 0.447, 'grad_norm': 0.7672885656356812, 'learning_rate': 6.764960971379012e-06, 'epoch': 0.2}
7%|▋ | 780/11526 [08:08<1:50:15, 1.62it/s] 7%|▋ | 781/11526 [08:08<1:50:10, 1.63it/s] {'loss': 0.3817, 'grad_norm': 0.8407154083251953, 'learning_rate': 6.7736339982653955e-06, 'epoch': 0.2}
7%|▋ | 781/11526 [08:09<1:50:10, 1.63it/s] 7%|▋ | 782/11526 [08:09<1:50:45, 1.62it/s] {'loss': 0.4367, 'grad_norm': 0.7562662363052368, 'learning_rate': 6.782307025151779e-06, 'epoch': 0.2}
7%|▋ | 782/11526 [08:09<1:50:45, 1.62it/s] 7%|▋ | 783/11526 [08:10<1:50:29, 1.62it/s] {'loss': 0.339, 'grad_norm': 0.6242331862449646, 'learning_rate': 6.790980052038162e-06, 'epoch': 0.2}
7%|▋ | 783/11526 [08:10<1:50:29, 1.62it/s] 7%|▋ | 784/11526 [08:10<1:50:24, 1.62it/s] {'loss': 0.3572, 'grad_norm': 0.6413666605949402, 'learning_rate': 6.7996530789245455e-06, 'epoch': 0.2}
7%|▋ | 784/11526 [08:10<1:50:24, 1.62it/s] 7%|▋ | 785/11526 [08:11<1:50:22, 1.62it/s] {'loss': 0.4377, 'grad_norm': 0.7122604250907898, 'learning_rate': 6.808326105810929e-06, 'epoch': 0.2}
7%|▋ | 785/11526 [08:11<1:50:22, 1.62it/s] 7%|▋ | 786/11526 [08:12<1:50:23, 1.62it/s] {'loss': 0.378, 'grad_norm': 0.7381094098091125, 'learning_rate': 6.816999132697311e-06, 'epoch': 0.2}
7%|▋ | 786/11526 [08:12<1:50:23, 1.62it/s] 7%|▋ | 787/11526 [08:12<1:50:25, 1.62it/s] {'loss': 0.41, 'grad_norm': 0.8357268571853638, 'learning_rate': 6.825672159583695e-06, 'epoch': 0.2}
7%|▋ | 787/11526 [08:12<1:50:25, 1.62it/s] 7%|▋ | 788/11526 [08:13<1:50:15, 1.62it/s] {'loss': 0.3783, 'grad_norm': 0.702508270740509, 'learning_rate': 6.834345186470078e-06, 'epoch': 0.21}
7%|▋ | 788/11526 [08:13<1:50:15, 1.62it/s] 7%|▋ | 789/11526 [08:13<1:50:10, 1.62it/s] {'loss': 0.3491, 'grad_norm': 0.7558185458183289, 'learning_rate': 6.843018213356461e-06, 'epoch': 0.21}
7%|▋ | 789/11526 [08:14<1:50:10, 1.62it/s] 7%|▋ | 790/11526 [08:14<1:50:42, 1.62it/s] {'loss': 0.3532, 'grad_norm': 0.6314918398857117, 'learning_rate': 6.851691240242845e-06, 'epoch': 0.21}
7%|▋ | 790/11526 [08:14<1:50:42, 1.62it/s] 7%|▋ | 791/11526 [08:15<1:50:26, 1.62it/s] {'loss': 0.3724, 'grad_norm': 0.735069215297699, 'learning_rate': 6.860364267129229e-06, 'epoch': 0.21}
7%|▋ | 791/11526 [08:15<1:50:26, 1.62it/s] 7%|▋ | 792/11526 [08:15<1:50:25, 1.62it/s] {'loss': 0.3676, 'grad_norm': 0.7373847961425781, 'learning_rate': 6.869037294015612e-06, 'epoch': 0.21}
7%|▋ | 792/11526 [08:15<1:50:25, 1.62it/s] 7%|▋ | 793/11526 [08:16<1:50:15, 1.62it/s] {'loss': 0.3185, 'grad_norm': 0.6867648363113403, 'learning_rate': 6.877710320901996e-06, 'epoch': 0.21}
7%|▋ | 793/11526 [08:16<1:50:15, 1.62it/s] 7%|▋ | 794/11526 [08:16<1:50:10, 1.62it/s] {'loss': 0.3486, 'grad_norm': 0.6845462918281555, 'learning_rate': 6.886383347788379e-06, 'epoch': 0.21}
7%|▋ | 794/11526 [08:17<1:50:10, 1.62it/s] 7%|▋ | 795/11526 [08:17<1:50:18, 1.62it/s] {'loss': 0.4056, 'grad_norm': 0.7643906474113464, 'learning_rate': 6.895056374674762e-06, 'epoch': 0.21}
7%|▋ | 795/11526 [08:17<1:50:18, 1.62it/s] 7%|▋ | 796/11526 [08:18<1:50:11, 1.62it/s] {'loss': 0.395, 'grad_norm': 0.6683636903762817, 'learning_rate': 6.903729401561146e-06, 'epoch': 0.21}
7%|▋ | 796/11526 [08:18<1:50:11, 1.62it/s] 7%|▋ | 797/11526 [08:18<1:50:15, 1.62it/s] {'loss': 0.4195, 'grad_norm': 0.7211464643478394, 'learning_rate': 6.912402428447529e-06, 'epoch': 0.21}
7%|▋ | 797/11526 [08:18<1:50:15, 1.62it/s] 7%|▋ | 798/11526 [08:19<1:50:10, 1.62it/s] {'loss': 0.3814, 'grad_norm': 0.7677984833717346, 'learning_rate': 6.921075455333912e-06, 'epoch': 0.21}
7%|▋ | 798/11526 [08:19<1:50:10, 1.62it/s] 7%|▋ | 799/11526 [08:20<1:50:02, 1.62it/s] {'loss': 0.3211, 'grad_norm': 0.6122089624404907, 'learning_rate': 6.929748482220296e-06, 'epoch': 0.21}
7%|▋ | 799/11526 [08:20<1:50:02, 1.62it/s] 7%|▋ | 800/11526 [08:20<1:50:12, 1.62it/s] {'loss': 0.4089, 'grad_norm': 0.7540348172187805, 'learning_rate': 6.938421509106679e-06, 'epoch': 0.21}
7%|▋ | 800/11526 [08:20<1:50:12, 1.62it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.35it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.78it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.73it/s]
100%|██████████| 13/13 [00:01<00:00, 6.74it/s]
{'eval_loss': 0.8008062839508057, 'eval_runtime': 1.9568, 'eval_samples_per_second': 102.209, 'eval_steps_per_second': 6.644, 'epoch': 0.21}
7%|▋ | 800/11526 [08:22<1:50:12, 1.62it/s]
 7%|▋ | 801/11526 [08:23<3:35:20, 1.20s/it] {'loss': 0.4708, 'grad_norm': 0.7538439035415649, 'learning_rate': 6.9470945359930616e-06, 'epoch': 0.21}
7%|▋ | 801/11526 [08:23<3:35:20, 1.20s/it] 7%|▋ | 802/11526 [08:23<3:03:42, 1.03s/it] {'loss': 0.3928, 'grad_norm': 0.7297358512878418, 'learning_rate': 6.955767562879445e-06, 'epoch': 0.21}
7%|▋ | 802/11526 [08:24<3:03:42, 1.03s/it] 7%|▋ | 803/11526 [08:24<2:41:35, 1.11it/s] {'loss': 0.3763, 'grad_norm': 0.6727347373962402, 'learning_rate': 6.964440589765828e-06, 'epoch': 0.21}
7%|▋ | 803/11526 [08:24<2:41:35, 1.11it/s] 7%|▋ | 804/11526 [08:25<2:26:02, 1.22it/s] {'loss': 0.3971, 'grad_norm': 0.6569233536720276, 'learning_rate': 6.973113616652212e-06, 'epoch': 0.21}
7%|▋ | 804/11526 [08:25<2:26:02, 1.22it/s] 7%|▋ | 805/11526 [08:25<2:15:13, 1.32it/s] {'loss': 0.3488, 'grad_norm': 0.8454895615577698, 'learning_rate': 6.981786643538595e-06, 'epoch': 0.21}
7%|▋ | 805/11526 [08:25<2:15:13, 1.32it/s] 7%|▋ | 806/11526 [08:26<2:07:32, 1.40it/s] {'loss': 0.3819, 'grad_norm': 0.7034002542495728, 'learning_rate': 6.990459670424979e-06, 'epoch': 0.21}
7%|▋ | 806/11526 [08:26<2:07:32, 1.40it/s] 7%|▋ | 807/11526 [08:26<2:02:27, 1.46it/s] {'loss': 0.3701, 'grad_norm': 0.7242304086685181, 'learning_rate': 6.9991326973113625e-06, 'epoch': 0.21}
7%|▋ | 807/11526 [08:27<2:02:27, 1.46it/s] 7%|▋ | 808/11526 [08:27<1:58:45, 1.50it/s] {'loss': 0.3747, 'grad_norm': 0.6931145787239075, 'learning_rate': 7.007805724197746e-06, 'epoch': 0.21}
7%|▋ | 808/11526 [08:27<1:58:45, 1.50it/s] 7%|▋ | 809/11526 [08:28<1:56:04, 1.54it/s] {'loss': 0.3142, 'grad_norm': 0.7182419896125793, 'learning_rate': 7.016478751084129e-06, 'epoch': 0.21}
7%|▋ | 809/11526 [08:28<1:56:04, 1.54it/s] 7%|▋ | 810/11526 [08:28<1:54:20, 1.56it/s] {'loss': 0.3446, 'grad_norm': 0.6234284043312073, 'learning_rate': 7.0251517779705125e-06, 'epoch': 0.21}
7%|▋ | 810/11526 [08:28<1:54:20, 1.56it/s] 7%|▋ | 811/11526 [08:29<1:52:58, 1.58it/s] {'loss': 0.445, 'grad_norm': 0.7259695529937744, 'learning_rate': 7.033824804856896e-06, 'epoch': 0.21}
7%|▋ | 811/11526 [08:29<1:52:58, 1.58it/s] 7%|▋ | 812/11526 [08:30<1:52:05, 1.59it/s] {'loss': 0.3727, 'grad_norm': 0.7261296510696411, 'learning_rate': 7.042497831743279e-06, 'epoch': 0.21}
7%|▋ | 812/11526 [08:30<1:52:05, 1.59it/s] 7%|▋ | 813/11526 [08:30<1:51:25, 1.60it/s] {'loss': 0.3507, 'grad_norm': 0.6855398416519165, 'learning_rate': 7.051170858629663e-06, 'epoch': 0.21}
7%|▋ | 813/11526 [08:30<1:51:25, 1.60it/s] 7%|▋ | 814/11526 [08:31<1:50:54, 1.61it/s] {'loss': 0.3051, 'grad_norm': 0.6249645948410034, 'learning_rate': 7.059843885516046e-06, 'epoch': 0.21}
7%|▋ | 814/11526 [08:31<1:50:54, 1.61it/s] 7%|▋ | 815/11526 [08:31<1:50:44, 1.61it/s] {'loss': 0.3973, 'grad_norm': 0.6950514912605286, 'learning_rate': 7.068516912402429e-06, 'epoch': 0.21}
7%|▋ | 815/11526 [08:32<1:50:44, 1.61it/s] 7%|▋ | 816/11526 [08:32<1:50:24, 1.62it/s] {'loss': 0.4115, 'grad_norm': 0.8236079812049866, 'learning_rate': 7.077189939288812e-06, 'epoch': 0.21}
7%|▋ | 816/11526 [08:32<1:50:24, 1.62it/s] 7%|▋ | 817/11526 [08:33<1:50:26, 1.62it/s] {'loss': 0.3472, 'grad_norm': 0.7023073434829712, 'learning_rate': 7.085862966175195e-06, 'epoch': 0.21}
7%|▋ | 817/11526 [08:33<1:50:26, 1.62it/s] 7%|▋ | 818/11526 [08:33<1:50:13, 1.62it/s] {'loss': 0.3949, 'grad_norm': 0.7737618684768677, 'learning_rate': 7.0945359930615785e-06, 'epoch': 0.21}
7%|▋ | 818/11526 [08:33<1:50:13, 1.62it/s] 7%|▋ | 819/11526 [08:34<1:50:01, 1.62it/s] {'loss': 0.3434, 'grad_norm': 0.6426411271095276, 'learning_rate': 7.103209019947962e-06, 'epoch': 0.21}
7%|▋ | 819/11526 [08:34<1:50:01, 1.62it/s] 7%|▋ | 820/11526 [08:34<1:49:58, 1.62it/s] {'loss': 0.2686, 'grad_norm': 0.5899262428283691, 'learning_rate': 7.111882046834345e-06, 'epoch': 0.21}
7%|▋ | 820/11526 [08:35<1:49:58, 1.62it/s] 7%|▋ | 821/11526 [08:35<1:50:03, 1.62it/s] {'loss': 0.2859, 'grad_norm': 0.6283571720123291, 'learning_rate': 7.120555073720729e-06, 'epoch': 0.21}
7%|▋ | 821/11526 [08:35<1:50:03, 1.62it/s] 7%|▋ | 822/11526 [08:36<1:50:04, 1.62it/s] {'loss': 0.34, 'grad_norm': 0.7163175344467163, 'learning_rate': 7.129228100607113e-06, 'epoch': 0.21}
7%|▋ | 822/11526 [08:36<1:50:04, 1.62it/s] 7%|▋ | 823/11526 [08:36<1:50:00, 1.62it/s] {'loss': 0.3102, 'grad_norm': 0.5973681211471558, 'learning_rate': 7.137901127493496e-06, 'epoch': 0.21}
7%|▋ | 823/11526 [08:36<1:50:00, 1.62it/s] 7%|▋ | 824/11526 [08:37<1:49:55, 1.62it/s] {'loss': 0.3537, 'grad_norm': 0.6487677693367004, 'learning_rate': 7.146574154379879e-06, 'epoch': 0.21}
7%|▋ | 824/11526 [08:37<1:49:55, 1.62it/s] 7%|▋ | 825/11526 [08:38<1:50:18, 1.62it/s] {'loss': 0.4211, 'grad_norm': 0.7911254167556763, 'learning_rate': 7.155247181266263e-06, 'epoch': 0.21}
7%|▋ | 825/11526 [08:38<1:50:18, 1.62it/s] 7%|▋ | 826/11526 [08:38<1:50:10, 1.62it/s] {'loss': 0.3434, 'grad_norm': 0.7140270471572876, 'learning_rate': 7.163920208152646e-06, 'epoch': 0.21}
7%|▋ | 826/11526 [08:38<1:50:10, 1.62it/s] 7%|▋ | 827/11526 [08:39<1:50:12, 1.62it/s] {'loss': 0.4468, 'grad_norm': 0.7667456865310669, 'learning_rate': 7.1725932350390294e-06, 'epoch': 0.22}
7%|▋ | 827/11526 [08:39<1:50:12, 1.62it/s] 7%|▋ | 828/11526 [08:39<1:50:03, 1.62it/s] {'loss': 0.3421, 'grad_norm': 0.7052581310272217, 'learning_rate': 7.181266261925413e-06, 'epoch': 0.22}
7%|▋ | 828/11526 [08:40<1:50:03, 1.62it/s] 7%|▋ | 829/11526 [08:40<1:49:55, 1.62it/s] {'loss': 0.2914, 'grad_norm': 0.5805850625038147, 'learning_rate': 7.189939288811796e-06, 'epoch': 0.22}
7%|▋ | 829/11526 [08:40<1:49:55, 1.62it/s] 7%|▋ | 830/11526 [08:41<1:50:01, 1.62it/s] {'loss': 0.3861, 'grad_norm': 0.7380026578903198, 'learning_rate': 7.1986123156981795e-06, 'epoch': 0.22}
7%|▋ | 830/11526 [08:41<1:50:01, 1.62it/s] 7%|▋ | 831/11526 [08:41<1:50:00, 1.62it/s] {'loss': 0.429, 'grad_norm': 0.7766146063804626, 'learning_rate': 7.207285342584562e-06, 'epoch': 0.22}
7%|▋ | 831/11526 [08:41<1:50:00, 1.62it/s] 7%|▋ | 832/11526 [08:42<1:50:00, 1.62it/s] {'loss': 0.4465, 'grad_norm': 0.750363290309906, 'learning_rate': 7.215958369470945e-06, 'epoch': 0.22}
7%|▋ | 832/11526 [08:42<1:50:00, 1.62it/s] 7%|▋ | 833/11526 [08:42<1:49:42, 1.62it/s] {'loss': 0.3489, 'grad_norm': 0.6823840141296387, 'learning_rate': 7.224631396357329e-06, 'epoch': 0.22}
7%|▋ | 833/11526 [08:43<1:49:42, 1.62it/s] 7%|▋ | 834/11526 [08:43<1:49:35, 1.63it/s] {'loss': 0.4348, 'grad_norm': 0.7201584577560425, 'learning_rate': 7.233304423243712e-06, 'epoch': 0.22}
7%|▋ | 834/11526 [08:43<1:49:35, 1.63it/s] 7%|▋ | 835/11526 [08:44<1:49:42, 1.62it/s] {'loss': 0.3629, 'grad_norm': 0.6620877981185913, 'learning_rate': 7.241977450130095e-06, 'epoch': 0.22}
7%|▋ | 835/11526 [08:44<1:49:42, 1.62it/s] 7%|▋ | 836/11526 [08:44<1:49:37, 1.63it/s] {'loss': 0.3385, 'grad_norm': 0.6437369585037231, 'learning_rate': 7.25065047701648e-06, 'epoch': 0.22}
7%|▋ | 836/11526 [08:44<1:49:37, 1.63it/s] 7%|▋ | 837/11526 [08:45<1:49:41, 1.62it/s] {'loss': 0.4326, 'grad_norm': 0.7968866229057312, 'learning_rate': 7.259323503902863e-06, 'epoch': 0.22}
7%|▋ | 837/11526 [08:45<1:49:41, 1.62it/s] 7%|▋ | 838/11526 [08:46<1:49:36, 1.63it/s] {'loss': 0.3052, 'grad_norm': 0.6953592896461487, 'learning_rate': 7.267996530789246e-06, 'epoch': 0.22}
7%|▋ | 838/11526 [08:46<1:49:36, 1.63it/s] 7%|▋ | 839/11526 [08:46<1:49:35, 1.63it/s] {'loss': 0.3894, 'grad_norm': 0.7695007920265198, 'learning_rate': 7.27666955767563e-06, 'epoch': 0.22}
7%|▋ | 839/11526 [08:46<1:49:35, 1.63it/s] 7%|▋ | 840/11526 [08:47<1:49:38, 1.62it/s] {'loss': 0.3481, 'grad_norm': 0.6626860499382019, 'learning_rate': 7.285342584562013e-06, 'epoch': 0.22}
7%|▋ | 840/11526 [08:47<1:49:38, 1.62it/s] 7%|▋ | 841/11526 [08:47<1:49:34, 1.63it/s] {'loss': 0.4049, 'grad_norm': 0.7149922251701355, 'learning_rate': 7.294015611448396e-06, 'epoch': 0.22}
7%|▋ | 841/11526 [08:48<1:49:34, 1.63it/s] 7%|▋ | 842/11526 [08:48<1:49:47, 1.62it/s] {'loss': 0.4031, 'grad_norm': 0.7253623604774475, 'learning_rate': 7.30268863833478e-06, 'epoch': 0.22}
7%|▋ | 842/11526 [08:48<1:49:47, 1.62it/s] 7%|▋ | 843/11526 [08:49<1:49:39, 1.62it/s] {'loss': 0.3392, 'grad_norm': 0.7315935492515564, 'learning_rate': 7.311361665221163e-06, 'epoch': 0.22}
7%|▋ | 843/11526 [08:49<1:49:39, 1.62it/s] 7%|▋ | 844/11526 [08:49<1:49:34, 1.62it/s] {'loss': 0.3118, 'grad_norm': 0.7120328545570374, 'learning_rate': 7.320034692107546e-06, 'epoch': 0.22}
7%|▋ | 844/11526 [08:49<1:49:34, 1.62it/s] 7%|▋ | 845/11526 [08:50<1:49:54, 1.62it/s] {'loss': 0.3528, 'grad_norm': 0.7169113159179688, 'learning_rate': 7.32870771899393e-06, 'epoch': 0.22}
7%|▋ | 845/11526 [08:50<1:49:54, 1.62it/s] 7%|▋ | 846/11526 [08:50<1:49:42, 1.62it/s] {'loss': 0.3982, 'grad_norm': 0.7533276081085205, 'learning_rate': 7.337380745880312e-06, 'epoch': 0.22}
7%|▋ | 846/11526 [08:51<1:49:42, 1.62it/s] 7%|▋ | 847/11526 [08:51<1:49:44, 1.62it/s] {'loss': 0.3505, 'grad_norm': 0.7684126496315002, 'learning_rate': 7.3460537727666956e-06, 'epoch': 0.22}
7%|▋ | 847/11526 [08:51<1:49:44, 1.62it/s] 7%|▋ | 848/11526 [08:52<1:49:39, 1.62it/s] {'loss': 0.3456, 'grad_norm': 0.7908963561058044, 'learning_rate': 7.354726799653079e-06, 'epoch': 0.22}
7%|▋ | 848/11526 [08:52<1:49:39, 1.62it/s] 7%|▋ | 849/11526 [08:52<1:49:34, 1.62it/s] {'loss': 0.4488, 'grad_norm': 0.7953776121139526, 'learning_rate': 7.363399826539462e-06, 'epoch': 0.22}
7%|▋ | 849/11526 [08:52<1:49:34, 1.62it/s] 7%|▋ | 850/11526 [08:53<1:50:08, 1.62it/s] {'loss': 0.3837, 'grad_norm': 0.7209896445274353, 'learning_rate': 7.3720728534258464e-06, 'epoch': 0.22}
7%|▋ | 850/11526 [08:53<1:50:08, 1.62it/s] 7%|▋ | 851/11526 [08:54<1:49:54, 1.62it/s] {'loss': 0.3068, 'grad_norm': 0.6641440987586975, 'learning_rate': 7.38074588031223e-06, 'epoch': 0.22}
7%|▋ | 851/11526 [08:54<1:49:54, 1.62it/s] 7%|▋ | 852/11526 [08:54<1:49:55, 1.62it/s] {'loss': 0.3312, 'grad_norm': 0.7056489586830139, 'learning_rate': 7.389418907198613e-06, 'epoch': 0.22}
7%|▋ | 852/11526 [08:54<1:49:55, 1.62it/s] 7%|▋ | 853/11526 [08:55<1:49:58, 1.62it/s] {'loss': 0.3825, 'grad_norm': 0.7022843956947327, 'learning_rate': 7.3980919340849965e-06, 'epoch': 0.22}
7%|▋ | 853/11526 [08:55<1:49:58, 1.62it/s] 7%|▋ | 854/11526 [08:55<1:49:46, 1.62it/s] {'loss': 0.3067, 'grad_norm': 0.6230480074882507, 'learning_rate': 7.40676496097138e-06, 'epoch': 0.22}
7%|▋ | 854/11526 [08:56<1:49:46, 1.62it/s] 7%|▋ | 855/11526 [08:56<1:49:44, 1.62it/s] {'loss': 0.3913, 'grad_norm': 0.6699287295341492, 'learning_rate': 7.415437987857763e-06, 'epoch': 0.22}
7%|▋ | 855/11526 [08:56<1:49:44, 1.62it/s] 7%|▋ | 856/11526 [08:57<1:49:37, 1.62it/s] {'loss': 0.3097, 'grad_norm': 0.5991837382316589, 'learning_rate': 7.4241110147441465e-06, 'epoch': 0.22}
7%|▋ | 856/11526 [08:57<1:49:37, 1.62it/s] 7%|▋ | 857/11526 [08:57<1:49:46, 1.62it/s] {'loss': 0.4164, 'grad_norm': 0.6963226795196533, 'learning_rate': 7.43278404163053e-06, 'epoch': 0.22}
7%|▋ | 857/11526 [08:57<1:49:46, 1.62it/s] 7%|▋ | 858/11526 [08:58<1:49:37, 1.62it/s] {'loss': 0.3798, 'grad_norm': 0.75253826379776, 'learning_rate': 7.441457068516913e-06, 'epoch': 0.22}
7%|▋ | 858/11526 [08:58<1:49:37, 1.62it/s] 7%|▋ | 859/11526 [08:59<1:49:34, 1.62it/s] {'loss': 0.387, 'grad_norm': 0.6586372256278992, 'learning_rate': 7.4501300954032966e-06, 'epoch': 0.22}
7%|▋ | 859/11526 [08:59<1:49:34, 1.62it/s] 7%|▋ | 860/11526 [08:59<1:49:36, 1.62it/s] {'loss': 0.5176, 'grad_norm': 0.679161012172699, 'learning_rate': 7.45880312228968e-06, 'epoch': 0.22}
7%|▋ | 860/11526 [08:59<1:49:36, 1.62it/s] 7%|▋ | 861/11526 [09:00<1:49:27, 1.62it/s] {'loss': 0.3168, 'grad_norm': 0.6601464748382568, 'learning_rate': 7.4674761491760624e-06, 'epoch': 0.22}
7%|▋ | 861/11526 [09:00<1:49:27, 1.62it/s] 7%|▋ | 862/11526 [09:00<1:49:31, 1.62it/s] {'loss': 0.3844, 'grad_norm': 0.699120819568634, 'learning_rate': 7.476149176062446e-06, 'epoch': 0.22}
7%|▋ | 862/11526 [09:00<1:49:31, 1.62it/s] 7%|▋ | 863/11526 [09:01<1:49:24, 1.62it/s] {'loss': 0.371, 'grad_norm': 0.7014876008033752, 'learning_rate': 7.484822202948829e-06, 'epoch': 0.22}
7%|▋ | 863/11526 [09:01<1:49:24, 1.62it/s] 7%|▋ | 864/11526 [09:02<1:49:20, 1.63it/s] {'loss': 0.404, 'grad_norm': 0.6643251180648804, 'learning_rate': 7.4934952298352125e-06, 'epoch': 0.22}
7%|▋ | 864/11526 [09:02<1:49:20, 1.63it/s] 8%|▊ | 865/11526 [09:02<1:49:22, 1.62it/s] {'loss': 0.2794, 'grad_norm': 0.6294180750846863, 'learning_rate': 7.502168256721597e-06, 'epoch': 0.23}
8%|▊ | 865/11526 [09:02<1:49:22, 1.62it/s] 8%|▊ | 866/11526 [09:03<1:49:19, 1.63it/s] {'loss': 0.4217, 'grad_norm': 0.7939609885215759, 'learning_rate': 7.51084128360798e-06, 'epoch': 0.23}
8%|▊ | 866/11526 [09:03<1:49:19, 1.63it/s] 8%|▊ | 867/11526 [09:03<1:49:30, 1.62it/s] {'loss': 0.2991, 'grad_norm': 0.559368371963501, 'learning_rate': 7.519514310494363e-06, 'epoch': 0.23}
8%|▊ | 867/11526 [09:04<1:49:30, 1.62it/s] 8%|▊ | 868/11526 [09:04<1:49:23, 1.62it/s] {'loss': 0.4046, 'grad_norm': 0.6893911957740784, 'learning_rate': 7.528187337380747e-06, 'epoch': 0.23}
8%|▊ | 868/11526 [09:04<1:49:23, 1.62it/s] 8%|▊ | 869/11526 [09:05<1:49:21, 1.62it/s] {'loss': 0.3433, 'grad_norm': 0.5961382389068604, 'learning_rate': 7.53686036426713e-06, 'epoch': 0.23}
8%|▊ | 869/11526 [09:05<1:49:21, 1.62it/s] 8%|▊ | 870/11526 [09:05<1:49:25, 1.62it/s] {'loss': 0.3277, 'grad_norm': 0.6445572376251221, 'learning_rate': 7.545533391153513e-06, 'epoch': 0.23}
8%|▊ | 870/11526 [09:05<1:49:25, 1.62it/s] 8%|▊ | 871/11526 [09:06<1:49:25, 1.62it/s] {'loss': 0.2862, 'grad_norm': 0.6973950266838074, 'learning_rate': 7.554206418039897e-06, 'epoch': 0.23}
8%|▊ | 871/11526 [09:06<1:49:25, 1.62it/s] 8%|▊ | 872/11526 [09:07<1:49:21, 1.62it/s] {'loss': 0.3783, 'grad_norm': 0.7038639187812805, 'learning_rate': 7.56287944492628e-06, 'epoch': 0.23}
8%|▊ | 872/11526 [09:07<1:49:21, 1.62it/s] 8%|▊ | 873/11526 [09:07<1:49:14, 1.63it/s] {'loss': 0.3024, 'grad_norm': 0.6962592005729675, 'learning_rate': 7.5715524718126634e-06, 'epoch': 0.23}
8%|▊ | 873/11526 [09:07<1:49:14, 1.63it/s] 8%|▊ | 874/11526 [09:08<1:49:12, 1.63it/s] {'loss': 0.3158, 'grad_norm': 0.6007298231124878, 'learning_rate': 7.580225498699047e-06, 'epoch': 0.23}
8%|▊ | 874/11526 [09:08<1:49:12, 1.63it/s] 8%|▊ | 875/11526 [09:08<1:49:19, 1.62it/s] {'loss': 0.3535, 'grad_norm': 0.6964049935340881, 'learning_rate': 7.58889852558543e-06, 'epoch': 0.23}
8%|▊ | 875/11526 [09:08<1:49:19, 1.62it/s] 8%|▊ | 876/11526 [09:09<1:49:12, 1.63it/s] {'loss': 0.3214, 'grad_norm': 0.6978402137756348, 'learning_rate': 7.597571552471813e-06, 'epoch': 0.23}
8%|▊ | 876/11526 [09:09<1:49:12, 1.63it/s] 8%|▊ | 877/11526 [09:10<1:49:12, 1.63it/s] {'loss': 0.3, 'grad_norm': 0.8196964859962463, 'learning_rate': 7.606244579358196e-06, 'epoch': 0.23}
8%|▊ | 877/11526 [09:10<1:49:12, 1.63it/s] 8%|▊ | 878/11526 [09:10<1:49:10, 1.63it/s] {'loss': 0.3272, 'grad_norm': 0.7798760533332825, 'learning_rate': 7.614917606244579e-06, 'epoch': 0.23}
8%|▊ | 878/11526 [09:10<1:49:10, 1.63it/s] 8%|▊ | 879/11526 [09:11<1:49:07, 1.63it/s] {'loss': 0.246, 'grad_norm': 0.6238197088241577, 'learning_rate': 7.623590633130963e-06, 'epoch': 0.23}
8%|▊ | 879/11526 [09:11<1:49:07, 1.63it/s] 8%|▊ | 880/11526 [09:11<1:49:10, 1.63it/s] {'loss': 0.4346, 'grad_norm': 0.7972267866134644, 'learning_rate': 7.632263660017348e-06, 'epoch': 0.23}
8%|▊ | 880/11526 [09:12<1:49:10, 1.63it/s] 8%|▊ | 881/11526 [09:12<1:49:05, 1.63it/s] {'loss': 0.4117, 'grad_norm': 0.8893899321556091, 'learning_rate': 7.64093668690373e-06, 'epoch': 0.23}
8%|▊ | 881/11526 [09:12<1:49:05, 1.63it/s] 8%|▊ | 882/11526 [09:13<1:49:11, 1.62it/s] {'loss': 0.3795, 'grad_norm': 0.7515725493431091, 'learning_rate': 7.649609713790114e-06, 'epoch': 0.23}
8%|▊ | 882/11526 [09:13<1:49:11, 1.62it/s] 8%|▊ | 883/11526 [09:13<1:49:09, 1.63it/s] {'loss': 0.319, 'grad_norm': 0.626658022403717, 'learning_rate': 7.658282740676497e-06, 'epoch': 0.23}
8%|▊ | 883/11526 [09:13<1:49:09, 1.63it/s] 8%|▊ | 884/11526 [09:14<1:49:05, 1.63it/s] {'loss': 0.4002, 'grad_norm': 0.6883169412612915, 'learning_rate': 7.666955767562881e-06, 'epoch': 0.23}
8%|▊ | 884/11526 [09:14<1:49:05, 1.63it/s] 8%|▊ | 885/11526 [09:15<1:49:08, 1.62it/s] {'loss': 0.351, 'grad_norm': 0.7620996832847595, 'learning_rate': 7.675628794449264e-06, 'epoch': 0.23}
8%|▊ | 885/11526 [09:15<1:49:08, 1.62it/s] 8%|▊ | 886/11526 [09:15<1:49:12, 1.62it/s] {'loss': 0.3594, 'grad_norm': 0.7095179557800293, 'learning_rate': 7.684301821335646e-06, 'epoch': 0.23}
8%|▊ | 886/11526 [09:15<1:49:12, 1.62it/s] 8%|▊ | 887/11526 [09:16<1:49:19, 1.62it/s] {'loss': 0.3664, 'grad_norm': 0.7995191812515259, 'learning_rate': 7.69297484822203e-06, 'epoch': 0.23}
8%|▊ | 887/11526 [09:16<1:49:19, 1.62it/s] 8%|▊ | 888/11526 [09:16<1:49:14, 1.62it/s] {'loss': 0.3828, 'grad_norm': 0.829699695110321, 'learning_rate': 7.701647875108413e-06, 'epoch': 0.23}
8%|▊ | 888/11526 [09:16<1:49:14, 1.62it/s] 8%|▊ | 889/11526 [09:17<1:49:08, 1.62it/s] {'loss': 0.3718, 'grad_norm': 0.7947219610214233, 'learning_rate': 7.710320901994797e-06, 'epoch': 0.23}
8%|▊ | 889/11526 [09:17<1:49:08, 1.62it/s] 8%|▊ | 890/11526 [09:18<1:49:09, 1.62it/s] {'loss': 0.3842, 'grad_norm': 0.691902220249176, 'learning_rate': 7.71899392888118e-06, 'epoch': 0.23}
8%|▊ | 890/11526 [09:18<1:49:09, 1.62it/s] 8%|▊ | 891/11526 [09:18<1:49:04, 1.63it/s] {'loss': 0.3341, 'grad_norm': 0.7406737208366394, 'learning_rate': 7.727666955767564e-06, 'epoch': 0.23}
8%|▊ | 891/11526 [09:18<1:49:04, 1.63it/s] 8%|▊ | 892/11526 [09:19<1:49:14, 1.62it/s] {'loss': 0.3754, 'grad_norm': 0.7159379720687866, 'learning_rate': 7.736339982653946e-06, 'epoch': 0.23}
8%|▊ | 892/11526 [09:19<1:49:14, 1.62it/s] 8%|▊ | 893/11526 [09:19<1:49:07, 1.62it/s] {'loss': 0.3618, 'grad_norm': 0.6011752486228943, 'learning_rate': 7.74501300954033e-06, 'epoch': 0.23}
8%|▊ | 893/11526 [09:20<1:49:07, 1.62it/s] 8%|▊ | 894/11526 [09:20<1:49:04, 1.62it/s] {'loss': 0.2768, 'grad_norm': 0.6976768970489502, 'learning_rate': 7.753686036426713e-06, 'epoch': 0.23}
8%|▊ | 894/11526 [09:20<1:49:04, 1.62it/s] 8%|▊ | 895/11526 [09:21<1:49:11, 1.62it/s] {'loss': 0.346, 'grad_norm': 0.6388375759124756, 'learning_rate': 7.762359063313097e-06, 'epoch': 0.23}
8%|▊ | 895/11526 [09:21<1:49:11, 1.62it/s] 8%|▊ | 896/11526 [09:21<1:49:04, 1.62it/s] {'loss': 0.2925, 'grad_norm': 0.6563900113105774, 'learning_rate': 7.771032090199481e-06, 'epoch': 0.23}
8%|▊ | 896/11526 [09:21<1:49:04, 1.62it/s] 8%|▊ | 897/11526 [09:22<1:49:10, 1.62it/s] {'loss': 0.3172, 'grad_norm': 0.6994851231575012, 'learning_rate': 7.779705117085864e-06, 'epoch': 0.23}
8%|▊ | 897/11526 [09:22<1:49:10, 1.62it/s] 8%|▊ | 898/11526 [09:23<1:49:06, 1.62it/s] {'loss': 0.3756, 'grad_norm': 0.6949496865272522, 'learning_rate': 7.788378143972248e-06, 'epoch': 0.23}
8%|▊ | 898/11526 [09:23<1:49:06, 1.62it/s] 8%|▊ | 899/11526 [09:23<1:49:02, 1.62it/s] {'loss': 0.3672, 'grad_norm': 0.7478926181793213, 'learning_rate': 7.79705117085863e-06, 'epoch': 0.23}
8%|▊ | 899/11526 [09:23<1:49:02, 1.62it/s] 8%|▊ | 900/11526 [09:24<1:49:09, 1.62it/s] {'loss': 0.3371, 'grad_norm': 0.700088620185852, 'learning_rate': 7.805724197745013e-06, 'epoch': 0.23}
8%|▊ | 900/11526 [09:24<1:49:09, 1.62it/s] 8%|▊ | 901/11526 [09:24<1:49:02, 1.62it/s] {'loss': 0.3478, 'grad_norm': 0.6523679494857788, 'learning_rate': 7.814397224631397e-06, 'epoch': 0.23}
8%|▊ | 901/11526 [09:24<1:49:02, 1.62it/s] 8%|▊ | 902/11526 [09:25<1:49:05, 1.62it/s] {'loss': 0.3785, 'grad_norm': 0.6770567893981934, 'learning_rate': 7.82307025151778e-06, 'epoch': 0.23}
8%|▊ | 902/11526 [09:25<1:49:05, 1.62it/s] 8%|▊ | 903/11526 [09:26<1:48:58, 1.62it/s] {'loss': 0.3422, 'grad_norm': 0.6459780335426331, 'learning_rate': 7.831743278404164e-06, 'epoch': 0.24}
8%|▊ | 903/11526 [09:26<1:48:58, 1.62it/s] 8%|▊ | 904/11526 [09:26<1:48:56, 1.62it/s] {'loss': 0.3545, 'grad_norm': 0.7004954814910889, 'learning_rate': 7.840416305290546e-06, 'epoch': 0.24}
8%|▊ | 904/11526 [09:26<1:48:56, 1.62it/s] 8%|▊ | 905/11526 [09:27<1:48:59, 1.62it/s] {'loss': 0.3653, 'grad_norm': 0.7036858797073364, 'learning_rate': 7.84908933217693e-06, 'epoch': 0.24}
8%|▊ | 905/11526 [09:27<1:48:59, 1.62it/s] 8%|▊ | 906/11526 [09:27<1:48:56, 1.62it/s] {'loss': 0.355, 'grad_norm': 0.7081471085548401, 'learning_rate': 7.857762359063313e-06, 'epoch': 0.24}
8%|▊ | 906/11526 [09:28<1:48:56, 1.62it/s] 8%|▊ | 907/11526 [09:28<1:49:25, 1.62it/s] {'loss': 0.3487, 'grad_norm': 0.687362790107727, 'learning_rate': 7.866435385949697e-06, 'epoch': 0.24}
8%|▊ | 907/11526 [09:28<1:49:25, 1.62it/s] 8%|▊ | 908/11526 [09:29<1:49:13, 1.62it/s] {'loss': 0.2869, 'grad_norm': 0.6471632719039917, 'learning_rate': 7.87510841283608e-06, 'epoch': 0.24}
8%|▊ | 908/11526 [09:29<1:49:13, 1.62it/s] 8%|▊ | 909/11526 [09:29<1:49:11, 1.62it/s] {'loss': 0.3033, 'grad_norm': 0.6794768571853638, 'learning_rate': 7.883781439722464e-06, 'epoch': 0.24}
8%|▊ | 909/11526 [09:29<1:49:11, 1.62it/s] 8%|▊ | 910/11526 [09:30<1:49:09, 1.62it/s] {'loss': 0.3879, 'grad_norm': 0.7114742994308472, 'learning_rate': 7.892454466608848e-06, 'epoch': 0.24}
8%|▊ | 910/11526 [09:30<1:49:09, 1.62it/s] 8%|▊ | 911/11526 [09:31<1:49:02, 1.62it/s] {'loss': 0.3185, 'grad_norm': 0.704896092414856, 'learning_rate': 7.90112749349523e-06, 'epoch': 0.24}
8%|▊ | 911/11526 [09:31<1:49:02, 1.62it/s] 8%|▊ | 912/11526 [09:31<1:49:10, 1.62it/s] {'loss': 0.3968, 'grad_norm': 0.7257667779922485, 'learning_rate': 7.909800520381615e-06, 'epoch': 0.24}
8%|▊ | 912/11526 [09:31<1:49:10, 1.62it/s] 8%|▊ | 913/11526 [09:32<1:49:08, 1.62it/s] {'loss': 0.3354, 'grad_norm': 0.6662918925285339, 'learning_rate': 7.918473547267997e-06, 'epoch': 0.24}
8%|▊ | 913/11526 [09:32<1:49:08, 1.62it/s] 8%|▊ | 914/11526 [09:32<1:48:59, 1.62it/s] {'loss': 0.4021, 'grad_norm': 0.8201828002929688, 'learning_rate': 7.927146574154382e-06, 'epoch': 0.24}
8%|▊ | 914/11526 [09:33<1:48:59, 1.62it/s] 8%|▊ | 915/11526 [09:33<1:49:01, 1.62it/s] {'loss': 0.3571, 'grad_norm': 0.8401138782501221, 'learning_rate': 7.935819601040764e-06, 'epoch': 0.24}
8%|▊ | 915/11526 [09:33<1:49:01, 1.62it/s] 8%|▊ | 916/11526 [09:34<1:48:54, 1.62it/s] {'loss': 0.3667, 'grad_norm': 0.7836294770240784, 'learning_rate': 7.944492627927147e-06, 'epoch': 0.24}
8%|▊ | 916/11526 [09:34<1:48:54, 1.62it/s] 8%|▊ | 917/11526 [09:34<1:48:59, 1.62it/s] {'loss': 0.3387, 'grad_norm': 0.6879644393920898, 'learning_rate': 7.95316565481353e-06, 'epoch': 0.24}
8%|▊ | 917/11526 [09:34<1:48:59, 1.62it/s] 8%|▊ | 918/11526 [09:35<1:48:56, 1.62it/s] {'loss': 0.2933, 'grad_norm': 0.5950043797492981, 'learning_rate': 7.961838681699913e-06, 'epoch': 0.24}
8%|▊ | 918/11526 [09:35<1:48:56, 1.62it/s] 8%|▊ | 919/11526 [09:35<1:48:49, 1.62it/s] {'loss': 0.363, 'grad_norm': 0.7090800404548645, 'learning_rate': 7.970511708586297e-06, 'epoch': 0.24}
8%|▊ | 919/11526 [09:36<1:48:49, 1.62it/s] 8%|▊ | 920/11526 [09:36<1:48:55, 1.62it/s] {'loss': 0.4388, 'grad_norm': 0.7754530310630798, 'learning_rate': 7.97918473547268e-06, 'epoch': 0.24}
8%|▊ | 920/11526 [09:36<1:48:55, 1.62it/s] 8%|▊ | 921/11526 [09:37<1:48:51, 1.62it/s] {'loss': 0.3733, 'grad_norm': 0.7619813680648804, 'learning_rate': 7.987857762359064e-06, 'epoch': 0.24}
8%|▊ | 921/11526 [09:37<1:48:51, 1.62it/s] 8%|▊ | 922/11526 [09:37<1:48:49, 1.62it/s] {'loss': 0.2913, 'grad_norm': 0.5745354890823364, 'learning_rate': 7.996530789245447e-06, 'epoch': 0.24}
8%|▊ | 922/11526 [09:37<1:48:49, 1.62it/s] 8%|▊ | 923/11526 [09:38<1:48:44, 1.63it/s] {'loss': 0.3257, 'grad_norm': 0.6523197293281555, 'learning_rate': 8.00520381613183e-06, 'epoch': 0.24}
8%|▊ | 923/11526 [09:38<1:48:44, 1.63it/s] 8%|▊ | 924/11526 [09:39<1:48:42, 1.63it/s] {'loss': 0.2974, 'grad_norm': 0.6339744925498962, 'learning_rate': 8.013876843018215e-06, 'epoch': 0.24}
8%|▊ | 924/11526 [09:39<1:48:42, 1.63it/s] 8%|▊ | 925/11526 [09:39<1:48:46, 1.62it/s] {'loss': 0.3502, 'grad_norm': 0.6394465565681458, 'learning_rate': 8.022549869904598e-06, 'epoch': 0.24}
8%|▊ | 925/11526 [09:39<1:48:46, 1.62it/s] 8%|▊ | 926/11526 [09:40<1:48:42, 1.63it/s] {'loss': 0.4079, 'grad_norm': 0.6584296226501465, 'learning_rate': 8.031222896790982e-06, 'epoch': 0.24}
8%|▊ | 926/11526 [09:40<1:48:42, 1.63it/s] 8%|▊ | 927/11526 [09:40<1:48:47, 1.62it/s] {'loss': 0.4491, 'grad_norm': 0.7887781858444214, 'learning_rate': 8.039895923677364e-06, 'epoch': 0.24}
8%|▊ | 927/11526 [09:41<1:48:47, 1.62it/s] 8%|▊ | 928/11526 [09:41<1:48:41, 1.63it/s] {'loss': 0.387, 'grad_norm': 0.7195541262626648, 'learning_rate': 8.048568950563748e-06, 'epoch': 0.24}
8%|▊ | 928/11526 [09:41<1:48:41, 1.63it/s] 8%|▊ | 929/11526 [09:42<1:48:38, 1.63it/s] {'loss': 0.4455, 'grad_norm': 0.6968345046043396, 'learning_rate': 8.057241977450131e-06, 'epoch': 0.24}
8%|▊ | 929/11526 [09:42<1:48:38, 1.63it/s] 8%|▊ | 930/11526 [09:42<1:48:42, 1.62it/s] {'loss': 0.332, 'grad_norm': 0.599797785282135, 'learning_rate': 8.065915004336513e-06, 'epoch': 0.24}
8%|▊ | 930/11526 [09:42<1:48:42, 1.62it/s] 8%|▊ | 931/11526 [09:43<1:48:38, 1.63it/s] {'loss': 0.3727, 'grad_norm': 0.6872587203979492, 'learning_rate': 8.074588031222898e-06, 'epoch': 0.24}
8%|▊ | 931/11526 [09:43<1:48:38, 1.63it/s] 8%|▊ | 932/11526 [09:43<1:48:38, 1.63it/s] {'loss': 0.3468, 'grad_norm': 0.6894674301147461, 'learning_rate': 8.08326105810928e-06, 'epoch': 0.24}
8%|▊ | 932/11526 [09:44<1:48:38, 1.63it/s] 8%|▊ | 933/11526 [09:44<1:48:39, 1.62it/s] {'loss': 0.2933, 'grad_norm': 0.6398645043373108, 'learning_rate': 8.091934084995664e-06, 'epoch': 0.24}
8%|▊ | 933/11526 [09:44<1:48:39, 1.62it/s] 8%|▊ | 934/11526 [09:45<1:48:34, 1.63it/s] {'loss': 0.2922, 'grad_norm': 0.5899367332458496, 'learning_rate': 8.100607111882047e-06, 'epoch': 0.24}
8%|▊ | 934/11526 [09:45<1:48:34, 1.63it/s] 8%|▊ | 935/11526 [09:45<1:48:46, 1.62it/s] {'loss': 0.2892, 'grad_norm': 0.5606797933578491, 'learning_rate': 8.109280138768431e-06, 'epoch': 0.24}
8%|▊ | 935/11526 [09:45<1:48:46, 1.62it/s] 8%|▊ | 936/11526 [09:46<1:48:44, 1.62it/s] {'loss': 0.399, 'grad_norm': 0.6526550650596619, 'learning_rate': 8.117953165654814e-06, 'epoch': 0.24}
8%|▊ | 936/11526 [09:46<1:48:44, 1.62it/s] 8%|▊ | 937/11526 [09:47<1:48:48, 1.62it/s] {'loss': 0.3226, 'grad_norm': 0.6591784358024597, 'learning_rate': 8.126626192541198e-06, 'epoch': 0.24}
8%|▊ | 937/11526 [09:47<1:48:48, 1.62it/s] 8%|▊ | 938/11526 [09:47<1:48:43, 1.62it/s] {'loss': 0.2514, 'grad_norm': 0.5823246836662292, 'learning_rate': 8.13529921942758e-06, 'epoch': 0.24}
8%|▊ | 938/11526 [09:47<1:48:43, 1.62it/s] 8%|▊ | 939/11526 [09:48<1:48:38, 1.62it/s] {'loss': 0.3132, 'grad_norm': 0.6623145937919617, 'learning_rate': 8.143972246313964e-06, 'epoch': 0.24}
8%|▊ | 939/11526 [09:48<1:48:38, 1.62it/s] 8%|▊ | 940/11526 [09:48<1:48:39, 1.62it/s] {'loss': 0.3205, 'grad_norm': 0.6624166369438171, 'learning_rate': 8.152645273200349e-06, 'epoch': 0.24}
8%|▊ | 940/11526 [09:49<1:48:39, 1.62it/s] 8%|▊ | 941/11526 [09:49<1:48:34, 1.62it/s] {'loss': 0.4858, 'grad_norm': 0.8773713111877441, 'learning_rate': 8.161318300086731e-06, 'epoch': 0.24}
8%|▊ | 941/11526 [09:49<1:48:34, 1.62it/s] 8%|▊ | 942/11526 [09:50<1:48:37, 1.62it/s] {'loss': 0.3583, 'grad_norm': 0.6435461640357971, 'learning_rate': 8.169991326973115e-06, 'epoch': 0.25}
8%|▊ | 942/11526 [09:50<1:48:37, 1.62it/s] 8%|▊ | 943/11526 [09:50<1:48:31, 1.63it/s] {'loss': 0.4245, 'grad_norm': 0.6789295673370361, 'learning_rate': 8.178664353859498e-06, 'epoch': 0.25}
8%|▊ | 943/11526 [09:50<1:48:31, 1.63it/s] 8%|▊ | 944/11526 [09:51<1:48:28, 1.63it/s] {'loss': 0.3114, 'grad_norm': 0.6174657940864563, 'learning_rate': 8.187337380745882e-06, 'epoch': 0.25}
8%|▊ | 944/11526 [09:51<1:48:28, 1.63it/s] 8%|▊ | 945/11526 [09:51<1:48:35, 1.62it/s] {'loss': 0.3481, 'grad_norm': 0.8540397882461548, 'learning_rate': 8.196010407632264e-06, 'epoch': 0.25}
8%|▊ | 945/11526 [09:52<1:48:35, 1.62it/s] 8%|▊ | 946/11526 [09:52<1:48:38, 1.62it/s] {'loss': 0.3894, 'grad_norm': 0.7671207785606384, 'learning_rate': 8.204683434518647e-06, 'epoch': 0.25}
8%|▊ | 946/11526 [09:52<1:48:38, 1.62it/s] 8%|▊ | 947/11526 [09:53<1:48:46, 1.62it/s] {'loss': 0.2441, 'grad_norm': 0.5698780417442322, 'learning_rate': 8.213356461405031e-06, 'epoch': 0.25}
8%|▊ | 947/11526 [09:53<1:48:46, 1.62it/s] 8%|▊ | 948/11526 [09:53<1:48:41, 1.62it/s] {'loss': 0.3531, 'grad_norm': 0.7611456513404846, 'learning_rate': 8.222029488291414e-06, 'epoch': 0.25}
8%|▊ | 948/11526 [09:53<1:48:41, 1.62it/s] 8%|▊ | 949/11526 [09:54<1:48:34, 1.62it/s] {'loss': 0.2983, 'grad_norm': 0.6448303461074829, 'learning_rate': 8.230702515177798e-06, 'epoch': 0.25}
8%|▊ | 949/11526 [09:54<1:48:34, 1.62it/s] 8%|▊ | 950/11526 [09:55<1:48:37, 1.62it/s] {'loss': 0.4097, 'grad_norm': 0.7887703776359558, 'learning_rate': 8.23937554206418e-06, 'epoch': 0.25}
8%|▊ | 950/11526 [09:55<1:48:37, 1.62it/s] 8%|▊ | 951/11526 [09:55<1:48:33, 1.62it/s] {'loss': 0.3385, 'grad_norm': 0.6960475444793701, 'learning_rate': 8.248048568950565e-06, 'epoch': 0.25}
8%|▊ | 951/11526 [09:55<1:48:33, 1.62it/s] 8%|▊ | 952/11526 [09:56<1:48:35, 1.62it/s] {'loss': 0.466, 'grad_norm': 0.7579076290130615, 'learning_rate': 8.256721595836947e-06, 'epoch': 0.25}
8%|▊ | 952/11526 [09:56<1:48:35, 1.62it/s] 8%|▊ | 953/11526 [09:56<1:48:37, 1.62it/s] {'loss': 0.381, 'grad_norm': 0.7063487768173218, 'learning_rate': 8.265394622723331e-06, 'epoch': 0.25}
8%|▊ | 953/11526 [09:57<1:48:37, 1.62it/s] 8%|▊ | 954/11526 [09:57<1:48:34, 1.62it/s] {'loss': 0.34, 'grad_norm': 0.7320043444633484, 'learning_rate': 8.274067649609715e-06, 'epoch': 0.25}
8%|▊ | 954/11526 [09:57<1:48:34, 1.62it/s] 8%|▊ | 955/11526 [09:58<1:48:57, 1.62it/s] {'loss': 0.3974, 'grad_norm': 0.9076068997383118, 'learning_rate': 8.282740676496098e-06, 'epoch': 0.25}
8%|▊ | 955/11526 [09:58<1:48:57, 1.62it/s] 8%|▊ | 956/11526 [09:58<1:48:47, 1.62it/s] {'loss': 0.3959, 'grad_norm': 0.7275028824806213, 'learning_rate': 8.291413703382482e-06, 'epoch': 0.25}
8%|▊ | 956/11526 [09:58<1:48:47, 1.62it/s] 8%|▊ | 957/11526 [09:59<1:48:41, 1.62it/s] {'loss': 0.3723, 'grad_norm': 0.6829838752746582, 'learning_rate': 8.300086730268865e-06, 'epoch': 0.25}
8%|▊ | 957/11526 [09:59<1:48:41, 1.62it/s] 8%|▊ | 958/11526 [09:59<1:48:35, 1.62it/s] {'loss': 0.3267, 'grad_norm': 0.6013583540916443, 'learning_rate': 8.308759757155249e-06, 'epoch': 0.25}
8%|▊ | 958/11526 [10:00<1:48:35, 1.62it/s] 8%|▊ | 959/11526 [10:00<1:48:30, 1.62it/s] {'loss': 0.3029, 'grad_norm': 0.6962816715240479, 'learning_rate': 8.317432784041631e-06, 'epoch': 0.25}
8%|▊ | 959/11526 [10:00<1:48:30, 1.62it/s] 8%|▊ | 960/11526 [10:01<1:48:34, 1.62it/s] {'loss': 0.3326, 'grad_norm': 0.6009585857391357, 'learning_rate': 8.326105810928014e-06, 'epoch': 0.25}
8%|▊ | 960/11526 [10:01<1:48:34, 1.62it/s] 8%|▊ | 961/11526 [10:01<1:48:30, 1.62it/s] {'loss': 0.3851, 'grad_norm': 0.6680829524993896, 'learning_rate': 8.334778837814398e-06, 'epoch': 0.25}
8%|▊ | 961/11526 [10:01<1:48:30, 1.62it/s] 8%|▊ | 962/11526 [10:02<1:48:30, 1.62it/s] {'loss': 0.3184, 'grad_norm': 0.6824049949645996, 'learning_rate': 8.34345186470078e-06, 'epoch': 0.25}
8%|▊ | 962/11526 [10:02<1:48:30, 1.62it/s] 8%|▊ | 963/11526 [10:03<1:48:21, 1.62it/s] {'loss': 0.4896, 'grad_norm': 0.8954772353172302, 'learning_rate': 8.352124891587165e-06, 'epoch': 0.25}
8%|▊ | 963/11526 [10:03<1:48:21, 1.62it/s] 8%|▊ | 964/11526 [10:03<1:48:16, 1.63it/s] {'loss': 0.3179, 'grad_norm': 0.6279625296592712, 'learning_rate': 8.360797918473547e-06, 'epoch': 0.25}
8%|▊ | 964/11526 [10:03<1:48:16, 1.63it/s] 8%|▊ | 965/11526 [10:04<1:48:20, 1.62it/s] {'loss': 0.3518, 'grad_norm': 0.6343486905097961, 'learning_rate': 8.369470945359931e-06, 'epoch': 0.25}
8%|▊ | 965/11526 [10:04<1:48:20, 1.62it/s] 8%|▊ | 966/11526 [10:04<1:48:14, 1.63it/s] {'loss': 0.2959, 'grad_norm': 0.6421394944190979, 'learning_rate': 8.378143972246314e-06, 'epoch': 0.25}
8%|▊ | 966/11526 [10:05<1:48:14, 1.63it/s] 8%|▊ | 967/11526 [10:05<1:48:23, 1.62it/s] {'loss': 0.3688, 'grad_norm': 0.7580284476280212, 'learning_rate': 8.386816999132698e-06, 'epoch': 0.25}
8%|▊ | 967/11526 [10:05<1:48:23, 1.62it/s] 8%|▊ | 968/11526 [10:06<1:48:19, 1.62it/s] {'loss': 0.2962, 'grad_norm': 0.6466118097305298, 'learning_rate': 8.39549002601908e-06, 'epoch': 0.25}
8%|▊ | 968/11526 [10:06<1:48:19, 1.62it/s] 8%|▊ | 969/11526 [10:06<1:48:17, 1.62it/s] {'loss': 0.3505, 'grad_norm': 0.6639811992645264, 'learning_rate': 8.404163052905465e-06, 'epoch': 0.25}
8%|▊ | 969/11526 [10:06<1:48:17, 1.62it/s] 8%|▊ | 970/11526 [10:07<1:48:19, 1.62it/s] {'loss': 0.4792, 'grad_norm': 0.8033050298690796, 'learning_rate': 8.412836079791849e-06, 'epoch': 0.25}
8%|▊ | 970/11526 [10:07<1:48:19, 1.62it/s] 8%|▊ | 971/11526 [10:07<1:48:14, 1.63it/s] {'loss': 0.292, 'grad_norm': 0.6043140888214111, 'learning_rate': 8.421509106678232e-06, 'epoch': 0.25}
8%|▊ | 971/11526 [10:08<1:48:14, 1.63it/s] 8%|▊ | 972/11526 [10:08<1:48:18, 1.62it/s] {'loss': 0.2568, 'grad_norm': 1.1217511892318726, 'learning_rate': 8.430182133564616e-06, 'epoch': 0.25}
8%|▊ | 972/11526 [10:08<1:48:18, 1.62it/s] 8%|▊ | 973/11526 [10:09<1:48:11, 1.63it/s] {'loss': 0.3944, 'grad_norm': 0.6210816502571106, 'learning_rate': 8.438855160450998e-06, 'epoch': 0.25}
8%|▊ | 973/11526 [10:09<1:48:11, 1.63it/s] 8%|▊ | 974/11526 [10:09<1:48:13, 1.63it/s] {'loss': 0.3497, 'grad_norm': 0.7469643354415894, 'learning_rate': 8.44752818733738e-06, 'epoch': 0.25}
8%|▊ | 974/11526 [10:09<1:48:13, 1.63it/s] 8%|▊ | 975/11526 [10:10<1:48:21, 1.62it/s] {'loss': 0.4847, 'grad_norm': 0.7448015213012695, 'learning_rate': 8.456201214223765e-06, 'epoch': 0.25}
8%|▊ | 975/11526 [10:10<1:48:21, 1.62it/s] 8%|▊ | 976/11526 [10:11<1:53:58, 1.54it/s] {'loss': 0.3475, 'grad_norm': 0.6953893899917603, 'learning_rate': 8.464874241110147e-06, 'epoch': 0.25}
8%|▊ | 976/11526 [10:11<1:53:58, 1.54it/s] 8%|▊ | 977/11526 [10:11<1:52:08, 1.57it/s] {'loss': 0.5261, 'grad_norm': 0.8689889311790466, 'learning_rate': 8.473547267996532e-06, 'epoch': 0.25}
8%|▊ | 977/11526 [10:11<1:52:08, 1.57it/s] 8%|▊ | 978/11526 [10:12<1:50:58, 1.58it/s] {'loss': 0.2736, 'grad_norm': 0.6072566509246826, 'learning_rate': 8.482220294882914e-06, 'epoch': 0.25}
8%|▊ | 978/11526 [10:12<1:50:58, 1.58it/s] 8%|▊ | 979/11526 [10:13<1:50:04, 1.60it/s] {'loss': 0.3017, 'grad_norm': 0.681659996509552, 'learning_rate': 8.490893321769298e-06, 'epoch': 0.25}
8%|▊ | 979/11526 [10:13<1:50:04, 1.60it/s] 9%|▊ | 980/11526 [10:13<1:49:27, 1.61it/s] {'loss': 0.332, 'grad_norm': 0.6370188593864441, 'learning_rate': 8.49956634865568e-06, 'epoch': 0.26}
9%|▊ | 980/11526 [10:13<1:49:27, 1.61it/s] 9%|▊ | 981/11526 [10:14<1:48:59, 1.61it/s] {'loss': 0.354, 'grad_norm': 0.6394904255867004, 'learning_rate': 8.508239375542065e-06, 'epoch': 0.26}
9%|▊ | 981/11526 [10:14<1:48:59, 1.61it/s] 9%|▊ | 982/11526 [10:14<1:48:42, 1.62it/s] {'loss': 0.2929, 'grad_norm': 0.6187093257904053, 'learning_rate': 8.516912402428448e-06, 'epoch': 0.26}
9%|▊ | 982/11526 [10:14<1:48:42, 1.62it/s] 9%|▊ | 983/11526 [10:15<1:48:27, 1.62it/s] {'loss': 0.3481, 'grad_norm': 0.6961092948913574, 'learning_rate': 8.525585429314832e-06, 'epoch': 0.26}
9%|▊ | 983/11526 [10:15<1:48:27, 1.62it/s] 9%|▊ | 984/11526 [10:16<1:48:21, 1.62it/s] {'loss': 0.3179, 'grad_norm': 0.6759708523750305, 'learning_rate': 8.534258456201216e-06, 'epoch': 0.26}
9%|▊ | 984/11526 [10:16<1:48:21, 1.62it/s] 9%|▊ | 985/11526 [10:16<1:48:13, 1.62it/s] {'loss': 0.3096, 'grad_norm': 0.6581959128379822, 'learning_rate': 8.542931483087598e-06, 'epoch': 0.26}
9%|▊ | 985/11526 [10:16<1:48:13, 1.62it/s] 9%|▊ | 986/11526 [10:17<1:48:08, 1.62it/s] {'loss': 0.2893, 'grad_norm': 0.8766991496086121, 'learning_rate': 8.551604509973983e-06, 'epoch': 0.26}
9%|▊ | 986/11526 [10:17<1:48:08, 1.62it/s] 9%|▊ | 987/11526 [10:17<1:48:05, 1.62it/s] {'loss': 0.3004, 'grad_norm': 0.5884973406791687, 'learning_rate': 8.560277536860365e-06, 'epoch': 0.26}
9%|▊ | 987/11526 [10:18<1:48:05, 1.62it/s] 9%|▊ | 988/11526 [10:18<1:48:07, 1.62it/s] {'loss': 0.268, 'grad_norm': 0.5794839859008789, 'learning_rate': 8.56895056374675e-06, 'epoch': 0.26}
9%|▊ | 988/11526 [10:18<1:48:07, 1.62it/s] 9%|▊ | 989/11526 [10:19<1:48:00, 1.63it/s] {'loss': 0.3012, 'grad_norm': 0.6424844264984131, 'learning_rate': 8.577623590633132e-06, 'epoch': 0.26}
9%|▊ | 989/11526 [10:19<1:48:00, 1.63it/s] 9%|▊ | 990/11526 [10:19<1:48:11, 1.62it/s] {'loss': 0.3115, 'grad_norm': 0.8079777359962463, 'learning_rate': 8.586296617519514e-06, 'epoch': 0.26}
9%|▊ | 990/11526 [10:19<1:48:11, 1.62it/s] 9%|▊ | 991/11526 [10:20<1:48:06, 1.62it/s] {'loss': 0.2857, 'grad_norm': 0.6683807373046875, 'learning_rate': 8.594969644405898e-06, 'epoch': 0.26}
9%|▊ | 991/11526 [10:20<1:48:06, 1.62it/s] 9%|▊ | 992/11526 [10:21<1:47:59, 1.63it/s] {'loss': 0.3697, 'grad_norm': 0.6853190064430237, 'learning_rate': 8.603642671292281e-06, 'epoch': 0.26}
9%|▊ | 992/11526 [10:21<1:47:59, 1.63it/s] 9%|▊ | 993/11526 [10:21<1:48:07, 1.62it/s] {'loss': 0.3008, 'grad_norm': 0.7861067056655884, 'learning_rate': 8.612315698178665e-06, 'epoch': 0.26}
9%|▊ | 993/11526 [10:21<1:48:07, 1.62it/s] 9%|▊ | 994/11526 [10:22<1:48:02, 1.62it/s] {'loss': 0.3705, 'grad_norm': 0.8013705015182495, 'learning_rate': 8.620988725065048e-06, 'epoch': 0.26}
9%|▊ | 994/11526 [10:22<1:48:02, 1.62it/s] 9%|▊ | 995/11526 [10:22<1:48:04, 1.62it/s] {'loss': 0.3957, 'grad_norm': 0.731290876865387, 'learning_rate': 8.629661751951432e-06, 'epoch': 0.26}
9%|▊ | 995/11526 [10:22<1:48:04, 1.62it/s] 9%|▊ | 996/11526 [10:23<1:48:00, 1.62it/s] {'loss': 0.3842, 'grad_norm': 0.6502794027328491, 'learning_rate': 8.638334778837814e-06, 'epoch': 0.26}
9%|▊ | 996/11526 [10:23<1:48:00, 1.62it/s] 9%|▊ | 997/11526 [10:24<1:47:55, 1.63it/s] {'loss': 0.2842, 'grad_norm': 0.6521506309509277, 'learning_rate': 8.647007805724199e-06, 'epoch': 0.26}
9%|▊ | 997/11526 [10:24<1:47:55, 1.63it/s] 9%|▊ | 998/11526 [10:24<1:47:52, 1.63it/s] {'loss': 0.2669, 'grad_norm': 0.70146244764328, 'learning_rate': 8.655680832610583e-06, 'epoch': 0.26}
9%|▊ | 998/11526 [10:24<1:47:52, 1.63it/s] 9%|▊ | 999/11526 [10:25<1:47:47, 1.63it/s] {'loss': 0.4575, 'grad_norm': 0.8079536557197571, 'learning_rate': 8.664353859496965e-06, 'epoch': 0.26}
9%|▊ | 999/11526 [10:25<1:47:47, 1.63it/s] 9%|▊ | 1000/11526 [10:25<1:48:00, 1.62it/s] {'loss': 0.4004, 'grad_norm': 0.6627769470214844, 'learning_rate': 8.67302688638335e-06, 'epoch': 0.26}
9%|▊ | 1000/11526 [10:26<1:48:00, 1.62it/s]
100%|██████████| 13/13 [00:01<00:00, 6.76it/s]
{'eval_loss': 0.7717033624649048, 'eval_runtime': 1.9535, 'eval_samples_per_second': 102.381, 'eval_steps_per_second': 6.655, 'epoch': 0.26}
 9%|▊ | 1001/11526 [10:28<3:31:01, 1.20s/it] {'loss': 0.4169, 'grad_norm': 0.6686106324195862, 'learning_rate': 8.681699913269732e-06, 'epoch': 0.26}
9%|▊ | 1001/11526 [10:28<3:31:01, 1.20s/it] 9%|▊ | 1002/11526 [10:29<3:00:03, 1.03s/it] {'loss': 0.3814, 'grad_norm': 0.6950452327728271, 'learning_rate': 8.690372940156116e-06, 'epoch': 0.26}
9%|▊ | 1002/11526 [10:29<3:00:03, 1.03s/it] 9%|▊ | 1003/11526 [10:29<2:38:19, 1.11it/s] {'loss': 0.3716, 'grad_norm': 0.7297069430351257, 'learning_rate': 8.699045967042499e-06, 'epoch': 0.26}
9%|▊ | 1003/11526 [10:29<2:38:19, 1.11it/s] 9%|▊ | 1004/11526 [10:30<2:23:04, 1.23it/s] {'loss': 0.2945, 'grad_norm': 0.5942865014076233, 'learning_rate': 8.707718993928881e-06, 'epoch': 0.26}
9%|▊ | 1004/11526 [10:30<2:23:04, 1.23it/s] 9%|▊ | 1005/11526 [10:30<2:12:29, 1.32it/s] {'loss': 0.2879, 'grad_norm': 0.682569146156311, 'learning_rate': 8.716392020815265e-06, 'epoch': 0.26}
9%|▊ | 1005/11526 [10:31<2:12:29, 1.32it/s] 9%|▊ | 1006/11526 [10:31<2:05:04, 1.40it/s] {'loss': 0.3672, 'grad_norm': 0.7473095655441284, 'learning_rate': 8.725065047701648e-06, 'epoch': 0.26}
9%|▊ | 1006/11526 [10:31<2:05:04, 1.40it/s] 9%|▊ | 1007/11526 [10:32<1:59:49, 1.46it/s] {'loss': 0.2358, 'grad_norm': 0.6127230525016785, 'learning_rate': 8.733738074588032e-06, 'epoch': 0.26}
9%|▊ | 1007/11526 [10:32<1:59:49, 1.46it/s] 9%|▊ | 1008/11526 [10:32<1:56:09, 1.51it/s] {'loss': 0.3331, 'grad_norm': 0.673065721988678, 'learning_rate': 8.742411101474415e-06, 'epoch': 0.26}
9%|▊ | 1008/11526 [10:32<1:56:09, 1.51it/s] 9%|▉ | 1009/11526 [10:33<1:53:36, 1.54it/s] {'loss': 0.3797, 'grad_norm': 0.7414672374725342, 'learning_rate': 8.751084128360799e-06, 'epoch': 0.26}
9%|▉ | 1009/11526 [10:33<1:53:36, 1.54it/s] 9%|▉ | 1010/11526 [10:34<1:51:46, 1.57it/s] {'loss': 0.3232, 'grad_norm': 0.6436885595321655, 'learning_rate': 8.759757155247181e-06, 'epoch': 0.26}
9%|▉ | 1010/11526 [10:34<1:51:46, 1.57it/s] 9%|▉ | 1011/11526 [10:34<1:50:30, 1.59it/s] {'loss': 0.3816, 'grad_norm': 0.813566267490387, 'learning_rate': 8.768430182133565e-06, 'epoch': 0.26}
9%|▉ | 1011/11526 [10:34<1:50:30, 1.59it/s] 9%|▉ | 1012/11526 [10:35<1:49:34, 1.60it/s] {'loss': 0.3582, 'grad_norm': 0.7398074269294739, 'learning_rate': 8.777103209019948e-06, 'epoch': 0.26}
9%|▉ | 1012/11526 [10:35<1:49:34, 1.60it/s] 9%|▉ | 1013/11526 [10:35<1:48:56, 1.61it/s] {'loss': 0.3006, 'grad_norm': 0.6016348004341125, 'learning_rate': 8.785776235906332e-06, 'epoch': 0.26}
9%|▉ | 1013/11526 [10:36<1:48:56, 1.61it/s] 9%|▉ | 1014/11526 [10:36<1:48:30, 1.61it/s] {'loss': 0.2726, 'grad_norm': 0.6118936538696289, 'learning_rate': 8.794449262792716e-06, 'epoch': 0.26}
9%|▉ | 1014/11526 [10:36<1:48:30, 1.61it/s] 9%|▉ | 1015/11526 [10:37<1:48:10, 1.62it/s] {'loss': 0.3675, 'grad_norm': 0.7111355662345886, 'learning_rate': 8.803122289679099e-06, 'epoch': 0.26}
9%|▉ | 1015/11526 [10:37<1:48:10, 1.62it/s] 9%|▉ | 1016/11526 [10:37<1:47:59, 1.62it/s] {'loss': 0.4096, 'grad_norm': 0.7180419564247131, 'learning_rate': 8.811795316565483e-06, 'epoch': 0.26}
9%|▉ | 1016/11526 [10:37<1:47:59, 1.62it/s] 9%|▉ | 1017/11526 [10:38<1:47:51, 1.62it/s] {'loss': 0.3128, 'grad_norm': 0.7526729106903076, 'learning_rate': 8.820468343451866e-06, 'epoch': 0.26}
9%|▉ | 1017/11526 [10:38<1:47:51, 1.62it/s] 9%|▉ | 1018/11526 [10:38<1:47:45, 1.63it/s] {'loss': 0.3366, 'grad_norm': 0.7552559971809387, 'learning_rate': 8.82914137033825e-06, 'epoch': 0.26}
9%|▉ | 1018/11526 [10:39<1:47:45, 1.63it/s] 9%|▉ | 1019/11526 [10:39<1:47:37, 1.63it/s] {'loss': 0.2751, 'grad_norm': 0.5671443343162537, 'learning_rate': 8.837814397224632e-06, 'epoch': 0.27}
9%|▉ | 1019/11526 [10:39<1:47:37, 1.63it/s] 9%|▉ | 1020/11526 [10:40<1:47:34, 1.63it/s] {'loss': 0.4851, 'grad_norm': 0.8711665868759155, 'learning_rate': 8.846487424111015e-06, 'epoch': 0.27}
9%|▉ | 1020/11526 [10:40<1:47:34, 1.63it/s] 9%|▉ | 1021/11526 [10:40<1:47:33, 1.63it/s] {'loss': 0.2896, 'grad_norm': 0.5815742611885071, 'learning_rate': 8.855160450997399e-06, 'epoch': 0.27}
9%|▉ | 1021/11526 [10:40<1:47:33, 1.63it/s] 9%|▉ | 1022/11526 [10:41<1:47:30, 1.63it/s] {'loss': 0.3226, 'grad_norm': 0.6572399139404297, 'learning_rate': 8.863833477883781e-06, 'epoch': 0.27}
9%|▉ | 1022/11526 [10:41<1:47:30, 1.63it/s] 9%|▉ | 1023/11526 [10:42<1:47:30, 1.63it/s] {'loss': 0.378, 'grad_norm': 0.7928173542022705, 'learning_rate': 8.872506504770166e-06, 'epoch': 0.27}
9%|▉ | 1023/11526 [10:42<1:47:30, 1.63it/s] 9%|▉ | 1024/11526 [10:42<1:47:34, 1.63it/s] {'loss': 0.3606, 'grad_norm': 0.7149636149406433, 'learning_rate': 8.881179531656548e-06, 'epoch': 0.27}
9%|▉ | 1024/11526 [10:42<1:47:34, 1.63it/s] 9%|▉ | 1025/11526 [10:43<1:47:28, 1.63it/s] {'loss': 0.3171, 'grad_norm': 0.7085655927658081, 'learning_rate': 8.889852558542932e-06, 'epoch': 0.27}
9%|▉ | 1025/11526 [10:43<1:47:28, 1.63it/s] 9%|▉ | 1026/11526 [10:43<1:47:37, 1.63it/s] {'loss': 0.3437, 'grad_norm': 0.6728426218032837, 'learning_rate': 8.898525585429315e-06, 'epoch': 0.27}
9%|▉ | 1026/11526 [10:43<1:47:37, 1.63it/s] 9%|▉ | 1027/11526 [10:44<1:47:34, 1.63it/s] {'loss': 0.3227, 'grad_norm': 0.6695771813392639, 'learning_rate': 8.907198612315699e-06, 'epoch': 0.27}
9%|▉ | 1027/11526 [10:44<1:47:34, 1.63it/s] 9%|▉ | 1028/11526 [10:45<1:47:32, 1.63it/s] {'loss': 0.3033, 'grad_norm': 0.6193923354148865, 'learning_rate': 8.915871639202083e-06, 'epoch': 0.27}
9%|▉ | 1028/11526 [10:45<1:47:32, 1.63it/s] 9%|▉ | 1029/11526 [10:45<1:47:33, 1.63it/s] {'loss': 0.2565, 'grad_norm': 0.5271924734115601, 'learning_rate': 8.924544666088466e-06, 'epoch': 0.27}
9%|▉ | 1029/11526 [10:45<1:47:33, 1.63it/s] 9%|▉ | 1030/11526 [10:46<1:47:29, 1.63it/s] {'loss': 0.3722, 'grad_norm': 0.7165858745574951, 'learning_rate': 8.93321769297485e-06, 'epoch': 0.27}
9%|▉ | 1030/11526 [10:46<1:47:29, 1.63it/s] 9%|▉ | 1031/11526 [10:46<1:47:26, 1.63it/s] {'loss': 0.3149, 'grad_norm': 0.6534351706504822, 'learning_rate': 8.941890719861232e-06, 'epoch': 0.27}
9%|▉ | 1031/11526 [10:47<1:47:26, 1.63it/s] 9%|▉ | 1032/11526 [10:47<1:47:29, 1.63it/s] {'loss': 0.3289, 'grad_norm': 0.696994960308075, 'learning_rate': 8.950563746747617e-06, 'epoch': 0.27}
9%|▉ | 1032/11526 [10:47<1:47:29, 1.63it/s] 9%|▉ | 1033/11526 [10:48<1:47:31, 1.63it/s] {'loss': 0.3779, 'grad_norm': 0.687186062335968, 'learning_rate': 8.959236773633999e-06, 'epoch': 0.27}
9%|▉ | 1033/11526 [10:48<1:47:31, 1.63it/s] 9%|▉ | 1034/11526 [10:48<1:53:35, 1.54it/s] {'loss': 0.41, 'grad_norm': 0.7428098917007446, 'learning_rate': 8.967909800520382e-06, 'epoch': 0.27}
9%|▉ | 1034/11526 [10:49<1:53:35, 1.54it/s] 9%|▉ | 1035/11526 [10:49<1:51:50, 1.56it/s] {'loss': 0.3321, 'grad_norm': 0.685451865196228, 'learning_rate': 8.976582827406766e-06, 'epoch': 0.27}
9%|▉ | 1035/11526 [10:49<1:51:50, 1.56it/s] 9%|▉ | 1036/11526 [10:50<1:50:31, 1.58it/s] {'loss': 0.3086, 'grad_norm': 0.6776358485221863, 'learning_rate': 8.985255854293148e-06, 'epoch': 0.27}
9%|▉ | 1036/11526 [10:50<1:50:31, 1.58it/s] 9%|▉ | 1037/11526 [10:50<1:49:38, 1.59it/s] {'loss': 0.2672, 'grad_norm': 0.5746687650680542, 'learning_rate': 8.993928881179532e-06, 'epoch': 0.27}
9%|▉ | 1037/11526 [10:50<1:49:38, 1.59it/s] 9%|▉ | 1038/11526 [10:51<1:48:57, 1.60it/s] {'loss': 0.3289, 'grad_norm': 0.7165554165840149, 'learning_rate': 9.002601908065915e-06, 'epoch': 0.27}
9%|▉ | 1038/11526 [10:51<1:48:57, 1.60it/s] 9%|▉ | 1039/11526 [10:51<1:48:25, 1.61it/s] {'loss': 0.3541, 'grad_norm': 0.7743039727210999, 'learning_rate': 9.0112749349523e-06, 'epoch': 0.27}
9%|▉ | 1039/11526 [10:52<1:48:25, 1.61it/s] 9%|▉ | 1040/11526 [10:52<1:48:04, 1.62it/s] {'loss': 0.289, 'grad_norm': 0.6489986181259155, 'learning_rate': 9.019947961838682e-06, 'epoch': 0.27}
9%|▉ | 1040/11526 [10:52<1:48:04, 1.62it/s] 9%|▉ | 1041/11526 [10:53<1:47:50, 1.62it/s] {'loss': 0.3476, 'grad_norm': 0.6888150572776794, 'learning_rate': 9.028620988725066e-06, 'epoch': 0.27}
9%|▉ | 1041/11526 [10:53<1:47:50, 1.62it/s] 9%|▉ | 1042/11526 [10:53<1:47:39, 1.62it/s] {'loss': 0.4098, 'grad_norm': 0.8587669730186462, 'learning_rate': 9.037294015611448e-06, 'epoch': 0.27}
9%|▉ | 1042/11526 [10:53<1:47:39, 1.62it/s] 9%|▉ | 1043/11526 [10:54<1:47:39, 1.62it/s] {'loss': 0.256, 'grad_norm': 0.6029600501060486, 'learning_rate': 9.045967042497833e-06, 'epoch': 0.27}
9%|▉ | 1043/11526 [10:54<1:47:39, 1.62it/s] 9%|▉ | 1044/11526 [10:55<1:47:33, 1.62it/s] {'loss': 0.324, 'grad_norm': 0.6674512028694153, 'learning_rate': 9.054640069384217e-06, 'epoch': 0.27}
9%|▉ | 1044/11526 [10:55<1:47:33, 1.62it/s] 9%|▉ | 1045/11526 [10:55<1:47:28, 1.63it/s] {'loss': 0.3553, 'grad_norm': 0.7347824573516846, 'learning_rate': 9.0633130962706e-06, 'epoch': 0.27}
9%|▉ | 1045/11526 [10:55<1:47:28, 1.63it/s] 9%|▉ | 1046/11526 [10:56<1:47:21, 1.63it/s] {'loss': 0.4098, 'grad_norm': 0.7534907460212708, 'learning_rate': 9.071986123156983e-06, 'epoch': 0.27}
9%|▉ | 1046/11526 [10:56<1:47:21, 1.63it/s] 9%|▉ | 1047/11526 [10:56<1:47:23, 1.63it/s] {'loss': 0.3637, 'grad_norm': 0.7227787971496582, 'learning_rate': 9.080659150043366e-06, 'epoch': 0.27}
9%|▉ | 1047/11526 [10:57<1:47:23, 1.63it/s] 9%|▉ | 1048/11526 [10:57<1:47:29, 1.62it/s] {'loss': 0.3308, 'grad_norm': 0.7248200178146362, 'learning_rate': 9.08933217692975e-06, 'epoch': 0.27}
9%|▉ | 1048/11526 [10:57<1:47:29, 1.62it/s] 9%|▉ | 1049/11526 [10:58<1:47:23, 1.63it/s] {'loss': 0.4304, 'grad_norm': 0.7174375653266907, 'learning_rate': 9.098005203816133e-06, 'epoch': 0.27}
9%|▉ | 1049/11526 [10:58<1:47:23, 1.63it/s] 9%|▉ | 1050/11526 [10:58<1:47:19, 1.63it/s] {'loss': 0.3878, 'grad_norm': 0.8381022810935974, 'learning_rate': 9.106678230702515e-06, 'epoch': 0.27}
9%|▉ | 1050/11526 [10:58<1:47:19, 1.63it/s] 9%|▉ | 1051/11526 [10:59<1:47:18, 1.63it/s] {'loss': 0.3637, 'grad_norm': 0.7055355310440063, 'learning_rate': 9.1153512575889e-06, 'epoch': 0.27}
9%|▉ | 1051/11526 [10:59<1:47:18, 1.63it/s] 9%|▉ | 1052/11526 [10:59<1:47:14, 1.63it/s] {'loss': 0.3633, 'grad_norm': 0.6673529744148254, 'learning_rate': 9.124024284475282e-06, 'epoch': 0.27}
9%|▉ | 1052/11526 [11:00<1:47:14, 1.63it/s] 9%|▉ | 1053/11526 [11:00<1:47:11, 1.63it/s] {'loss': 0.2643, 'grad_norm': 0.5791777968406677, 'learning_rate': 9.132697311361666e-06, 'epoch': 0.27}
9%|▉ | 1053/11526 [11:00<1:47:11, 1.63it/s] 9%|▉ | 1054/11526 [11:01<1:47:09, 1.63it/s] {'loss': 0.3315, 'grad_norm': 1.0432590246200562, 'learning_rate': 9.141370338248049e-06, 'epoch': 0.27}
9%|▉ | 1054/11526 [11:01<1:47:09, 1.63it/s] 9%|▉ | 1055/11526 [11:01<1:47:07, 1.63it/s] {'loss': 0.3689, 'grad_norm': 0.6886454224586487, 'learning_rate': 9.150043365134433e-06, 'epoch': 0.27}
9%|▉ | 1055/11526 [11:01<1:47:07, 1.63it/s] 9%|▉ | 1056/11526 [11:02<1:47:06, 1.63it/s] {'loss': 0.3925, 'grad_norm': 1.0679984092712402, 'learning_rate': 9.158716392020815e-06, 'epoch': 0.27}
9%|▉ | 1056/11526 [11:02<1:47:06, 1.63it/s] 9%|▉ | 1057/11526 [11:03<1:47:05, 1.63it/s] {'loss': 0.3956, 'grad_norm': 0.7865321040153503, 'learning_rate': 9.1673894189072e-06, 'epoch': 0.28}
9%|▉ | 1057/11526 [11:03<1:47:05, 1.63it/s] 9%|▉ | 1058/11526 [11:03<1:47:07, 1.63it/s] {'loss': 0.3618, 'grad_norm': 0.6852288246154785, 'learning_rate': 9.176062445793584e-06, 'epoch': 0.28}
9%|▉ | 1058/11526 [11:03<1:47:07, 1.63it/s] 9%|▉ | 1059/11526 [11:04<1:47:07, 1.63it/s] {'loss': 0.3287, 'grad_norm': 0.6252366304397583, 'learning_rate': 9.184735472679966e-06, 'epoch': 0.28}
9%|▉ | 1059/11526 [11:04<1:47:07, 1.63it/s] 9%|▉ | 1060/11526 [11:04<1:47:07, 1.63it/s] {'loss': 0.3479, 'grad_norm': 0.6923961043357849, 'learning_rate': 9.19340849956635e-06, 'epoch': 0.28}
9%|▉ | 1060/11526 [11:04<1:47:07, 1.63it/s] 9%|▉ | 1061/11526 [11:05<1:47:04, 1.63it/s] {'loss': 0.2854, 'grad_norm': 0.6875212788581848, 'learning_rate': 9.202081526452733e-06, 'epoch': 0.28}
9%|▉ | 1061/11526 [11:05<1:47:04, 1.63it/s] 9%|▉ | 1062/11526 [11:06<1:47:04, 1.63it/s] {'loss': 0.3455, 'grad_norm': 0.7662047743797302, 'learning_rate': 9.210754553339117e-06, 'epoch': 0.28}
9%|▉ | 1062/11526 [11:06<1:47:04, 1.63it/s] 9%|▉ | 1063/11526 [11:06<1:47:04, 1.63it/s] {'loss': 0.3059, 'grad_norm': 0.6567496061325073, 'learning_rate': 9.2194275802255e-06, 'epoch': 0.28}
9%|▉ | 1063/11526 [11:06<1:47:04, 1.63it/s] 9%|▉ | 1064/11526 [11:07<1:47:01, 1.63it/s] {'loss': 0.2859, 'grad_norm': 0.7017526626586914, 'learning_rate': 9.228100607111882e-06, 'epoch': 0.28}
9%|▉ | 1064/11526 [11:07<1:47:01, 1.63it/s] 9%|▉ | 1065/11526 [11:07<1:47:01, 1.63it/s] {'loss': 0.348, 'grad_norm': 0.6375651359558105, 'learning_rate': 9.236773633998266e-06, 'epoch': 0.28}
9%|▉ | 1065/11526 [11:08<1:47:01, 1.63it/s] 9%|▉ | 1066/11526 [11:08<1:47:03, 1.63it/s] {'loss': 0.355, 'grad_norm': 0.699772834777832, 'learning_rate': 9.245446660884649e-06, 'epoch': 0.28}
9%|▉ | 1066/11526 [11:08<1:47:03, 1.63it/s] 9%|▉ | 1067/11526 [11:09<1:47:02, 1.63it/s] {'loss': 0.3551, 'grad_norm': 0.5997841358184814, 'learning_rate': 9.254119687771033e-06, 'epoch': 0.28}
9%|▉ | 1067/11526 [11:09<1:47:02, 1.63it/s] 9%|▉ | 1068/11526 [11:09<1:47:04, 1.63it/s] {'loss': 0.325, 'grad_norm': 0.651286244392395, 'learning_rate': 9.262792714657415e-06, 'epoch': 0.28}
9%|▉ | 1068/11526 [11:09<1:47:04, 1.63it/s] 9%|▉ | 1069/11526 [11:10<1:47:06, 1.63it/s] {'loss': 0.4082, 'grad_norm': 0.7161149978637695, 'learning_rate': 9.2714657415438e-06, 'epoch': 0.28}
9%|▉ | 1069/11526 [11:10<1:47:06, 1.63it/s] 9%|▉ | 1070/11526 [11:11<1:47:04, 1.63it/s] {'loss': 0.2888, 'grad_norm': 0.6129555702209473, 'learning_rate': 9.280138768430182e-06, 'epoch': 0.28}
9%|▉ | 1070/11526 [11:11<1:47:04, 1.63it/s] 9%|▉ | 1071/11526 [11:11<1:47:03, 1.63it/s] {'loss': 0.2935, 'grad_norm': 0.6845977306365967, 'learning_rate': 9.288811795316566e-06, 'epoch': 0.28}
9%|▉ | 1071/11526 [11:11<1:47:03, 1.63it/s] 9%|▉ | 1072/11526 [11:12<1:46:59, 1.63it/s] {'loss': 0.3237, 'grad_norm': 0.6552291512489319, 'learning_rate': 9.29748482220295e-06, 'epoch': 0.28}
9%|▉ | 1072/11526 [11:12<1:46:59, 1.63it/s] 9%|▉ | 1073/11526 [11:12<1:46:55, 1.63it/s] {'loss': 0.3233, 'grad_norm': 0.7863655090332031, 'learning_rate': 9.306157849089333e-06, 'epoch': 0.28}
9%|▉ | 1073/11526 [11:12<1:46:55, 1.63it/s] 9%|▉ | 1074/11526 [11:13<1:46:58, 1.63it/s] {'loss': 0.2695, 'grad_norm': 0.6659260988235474, 'learning_rate': 9.314830875975717e-06, 'epoch': 0.28}
9%|▉ | 1074/11526 [11:13<1:46:58, 1.63it/s] 9%|▉ | 1075/11526 [11:14<1:46:57, 1.63it/s] {'loss': 0.3949, 'grad_norm': 0.7785394787788391, 'learning_rate': 9.3235039028621e-06, 'epoch': 0.28}
9%|▉ | 1075/11526 [11:14<1:46:57, 1.63it/s] 9%|▉ | 1076/11526 [11:14<1:46:55, 1.63it/s] {'loss': 0.3294, 'grad_norm': 0.6363103985786438, 'learning_rate': 9.332176929748484e-06, 'epoch': 0.28}
9%|▉ | 1076/11526 [11:14<1:46:55, 1.63it/s] 9%|▉ | 1077/11526 [11:15<1:46:53, 1.63it/s] {'loss': 0.2836, 'grad_norm': 0.65339195728302, 'learning_rate': 9.340849956634866e-06, 'epoch': 0.28}
9%|▉ | 1077/11526 [11:15<1:46:53, 1.63it/s] 9%|▉ | 1078/11526 [11:15<1:46:56, 1.63it/s] {'loss': 0.2822, 'grad_norm': 0.6022049784660339, 'learning_rate': 9.34952298352125e-06, 'epoch': 0.28}
9%|▉ | 1078/11526 [11:16<1:46:56, 1.63it/s] 9%|▉ | 1079/11526 [11:16<1:46:56, 1.63it/s] {'loss': 0.3413, 'grad_norm': 0.6849562525749207, 'learning_rate': 9.358196010407633e-06, 'epoch': 0.28}
9%|▉ | 1079/11526 [11:16<1:46:56, 1.63it/s] 9%|▉ | 1080/11526 [11:17<1:46:58, 1.63it/s] {'loss': 0.4227, 'grad_norm': 0.8562867641448975, 'learning_rate': 9.366869037294016e-06, 'epoch': 0.28}
9%|▉ | 1080/11526 [11:17<1:46:58, 1.63it/s] 9%|▉ | 1081/11526 [11:17<1:47:00, 1.63it/s] {'loss': 0.3223, 'grad_norm': 0.6731454730033875, 'learning_rate': 9.3755420641804e-06, 'epoch': 0.28}
9%|▉ | 1081/11526 [11:17<1:47:00, 1.63it/s] 9%|▉ | 1082/11526 [11:18<1:46:58, 1.63it/s] {'loss': 0.4225, 'grad_norm': 0.8182327151298523, 'learning_rate': 9.384215091066782e-06, 'epoch': 0.28}
9%|▉ | 1082/11526 [11:18<1:46:58, 1.63it/s] 9%|▉ | 1083/11526 [11:18<1:46:56, 1.63it/s] {'loss': 0.3986, 'grad_norm': 0.7337361574172974, 'learning_rate': 9.392888117953166e-06, 'epoch': 0.28}
9%|▉ | 1083/11526 [11:19<1:46:56, 1.63it/s] 9%|▉ | 1084/11526 [11:19<1:46:57, 1.63it/s] {'loss': 0.3184, 'grad_norm': 0.7649515271186829, 'learning_rate': 9.401561144839549e-06, 'epoch': 0.28}
9%|▉ | 1084/11526 [11:19<1:46:57, 1.63it/s] 9%|▉ | 1085/11526 [11:20<1:46:53, 1.63it/s] {'loss': 0.281, 'grad_norm': 0.6771136522293091, 'learning_rate': 9.410234171725933e-06, 'epoch': 0.28}
9%|▉ | 1085/11526 [11:20<1:46:53, 1.63it/s] 9%|▉ | 1086/11526 [11:20<1:46:52, 1.63it/s] {'loss': 0.341, 'grad_norm': 0.6867103576660156, 'learning_rate': 9.418907198612316e-06, 'epoch': 0.28}
9%|▉ | 1086/11526 [11:20<1:46:52, 1.63it/s] 9%|▉ | 1087/11526 [11:21<1:46:51, 1.63it/s] {'loss': 0.3327, 'grad_norm': 0.6035224795341492, 'learning_rate': 9.4275802254987e-06, 'epoch': 0.28}
9%|▉ | 1087/11526 [11:21<1:46:51, 1.63it/s] 9%|▉ | 1088/11526 [11:22<1:47:18, 1.62it/s] {'loss': 0.295, 'grad_norm': 0.6268593668937683, 'learning_rate': 9.436253252385084e-06, 'epoch': 0.28}
9%|▉ | 1088/11526 [11:22<1:47:18, 1.62it/s] 9%|▉ | 1089/11526 [11:22<1:47:12, 1.62it/s] {'loss': 0.2896, 'grad_norm': 0.6343078017234802, 'learning_rate': 9.444926279271467e-06, 'epoch': 0.28}
9%|▉ | 1089/11526 [11:22<1:47:12, 1.62it/s] 9%|▉ | 1090/11526 [11:23<1:47:01, 1.63it/s] {'loss': 0.3536, 'grad_norm': 0.7275059819221497, 'learning_rate': 9.45359930615785e-06, 'epoch': 0.28}
9%|▉ | 1090/11526 [11:23<1:47:01, 1.63it/s] 9%|▉ | 1091/11526 [11:23<1:47:00, 1.63it/s] {'loss': 0.3375, 'grad_norm': 0.6837372779846191, 'learning_rate': 9.462272333044233e-06, 'epoch': 0.28}
9%|▉ | 1091/11526 [11:24<1:47:00, 1.63it/s] 9%|▉ | 1092/11526 [11:24<1:46:56, 1.63it/s] {'loss': 0.4063, 'grad_norm': 0.7484169602394104, 'learning_rate': 9.470945359930617e-06, 'epoch': 0.28}
9%|▉ | 1092/11526 [11:24<1:46:56, 1.63it/s] 9%|▉ | 1093/11526 [11:25<1:47:00, 1.62it/s] {'loss': 0.3104, 'grad_norm': 0.7493953704833984, 'learning_rate': 9.479618386817e-06, 'epoch': 0.28}
9%|▉ | 1093/11526 [11:25<1:47:00, 1.62it/s] 9%|▉ | 1094/11526 [11:25<1:46:54, 1.63it/s] {'loss': 0.2951, 'grad_norm': 0.813327431678772, 'learning_rate': 9.488291413703382e-06, 'epoch': 0.28}
9%|▉ | 1094/11526 [11:25<1:46:54, 1.63it/s] 10%|▉ | 1095/11526 [11:26<1:46:54, 1.63it/s] {'loss': 0.409, 'grad_norm': 0.722315788269043, 'learning_rate': 9.496964440589767e-06, 'epoch': 0.29}
10%|▉ | 1095/11526 [11:26<1:46:54, 1.63it/s] 10%|▉ | 1096/11526 [11:26<1:46:54, 1.63it/s] {'loss': 0.325, 'grad_norm': 0.5929539799690247, 'learning_rate': 9.505637467476149e-06, 'epoch': 0.29}
10%|▉ | 1096/11526 [11:27<1:46:54, 1.63it/s] 10%|▉ | 1097/11526 [11:27<1:46:49, 1.63it/s] {'loss': 0.3538, 'grad_norm': 0.7230235934257507, 'learning_rate': 9.514310494362533e-06, 'epoch': 0.29}
10%|▉ | 1097/11526 [11:27<1:46:49, 1.63it/s] 10%|▉ | 1098/11526 [11:28<1:46:47, 1.63it/s] {'loss': 0.3375, 'grad_norm': 0.880620539188385, 'learning_rate': 9.522983521248916e-06, 'epoch': 0.29}
10%|▉ | 1098/11526 [11:28<1:46:47, 1.63it/s] 10%|▉ | 1099/11526 [11:28<1:46:43, 1.63it/s] {'loss': 0.3859, 'grad_norm': 0.8127730488777161, 'learning_rate': 9.5316565481353e-06, 'epoch': 0.29}
10%|▉ | 1099/11526 [11:28<1:46:43, 1.63it/s] 10%|▉ | 1100/11526 [11:29<1:46:37, 1.63it/s] {'loss': 0.2827, 'grad_norm': 0.7007185816764832, 'learning_rate': 9.540329575021683e-06, 'epoch': 0.29}
10%|▉ | 1100/11526 [11:29<1:46:37, 1.63it/s] 10%|▉ | 1101/11526 [11:30<1:46:38, 1.63it/s] {'loss': 0.3819, 'grad_norm': 0.7960718870162964, 'learning_rate': 9.549002601908067e-06, 'epoch': 0.29}
10%|▉ | 1101/11526 [11:30<1:46:38, 1.63it/s] 10%|▉ | 1102/11526 [11:30<1:46:39, 1.63it/s] {'loss': 0.2887, 'grad_norm': 0.5851671695709229, 'learning_rate': 9.557675628794451e-06, 'epoch': 0.29}
10%|▉ | 1102/11526 [11:30<1:46:39, 1.63it/s] 10%|▉ | 1103/11526 [11:31<1:46:41, 1.63it/s] {'loss': 0.3296, 'grad_norm': 0.6284934282302856, 'learning_rate': 9.566348655680833e-06, 'epoch': 0.29}
10%|▉ | 1103/11526 [11:31<1:46:41, 1.63it/s] 10%|▉ | 1104/11526 [11:31<1:46:39, 1.63it/s] {'loss': 0.2696, 'grad_norm': 0.6574866771697998, 'learning_rate': 9.575021682567218e-06, 'epoch': 0.29}
10%|▉ | 1104/11526 [11:32<1:46:39, 1.63it/s] 10%|▉ | 1105/11526 [11:32<1:46:43, 1.63it/s] {'loss': 0.3011, 'grad_norm': 0.6575213074684143, 'learning_rate': 9.5836947094536e-06, 'epoch': 0.29}
10%|▉ | 1105/11526 [11:32<1:46:43, 1.63it/s] 10%|▉ | 1106/11526 [11:33<1:46:41, 1.63it/s] {'loss': 0.3151, 'grad_norm': 0.6747561693191528, 'learning_rate': 9.592367736339984e-06, 'epoch': 0.29}
10%|▉ | 1106/11526 [11:33<1:46:41, 1.63it/s] 10%|▉ | 1107/11526 [11:33<1:46:41, 1.63it/s] {'loss': 0.3586, 'grad_norm': 0.7156968116760254, 'learning_rate': 9.601040763226367e-06, 'epoch': 0.29}
10%|▉ | 1107/11526 [11:33<1:46:41, 1.63it/s] 10%|▉ | 1108/11526 [11:34<1:46:39, 1.63it/s] {'loss': 0.2846, 'grad_norm': 0.6668442487716675, 'learning_rate': 9.609713790112751e-06, 'epoch': 0.29}
10%|▉ | 1108/11526 [11:34<1:46:39, 1.63it/s] 10%|▉ | 1109/11526 [11:34<1:46:39, 1.63it/s] {'loss': 0.3728, 'grad_norm': 0.7159383893013, 'learning_rate': 9.618386816999134e-06, 'epoch': 0.29}
10%|▉ | 1109/11526 [11:35<1:46:39, 1.63it/s] 10%|▉ | 1110/11526 [11:35<1:46:37, 1.63it/s] {'loss': 0.3126, 'grad_norm': 0.6233133673667908, 'learning_rate': 9.627059843885516e-06, 'epoch': 0.29}
10%|▉ | 1110/11526 [11:35<1:46:37, 1.63it/s] 10%|▉ | 1111/11526 [11:36<1:46:42, 1.63it/s] {'loss': 0.2973, 'grad_norm': 0.6698866486549377, 'learning_rate': 9.6357328707719e-06, 'epoch': 0.29}
10%|▉ | 1111/11526 [11:36<1:46:42, 1.63it/s] 10%|▉ | 1112/11526 [11:36<1:46:39, 1.63it/s] {'loss': 0.3042, 'grad_norm': 0.63419109582901, 'learning_rate': 9.644405897658283e-06, 'epoch': 0.29}
10%|▉ | 1112/11526 [11:36<1:46:39, 1.63it/s] 10%|▉ | 1113/11526 [11:37<1:46:35, 1.63it/s] {'loss': 0.2733, 'grad_norm': 0.5762624144554138, 'learning_rate': 9.653078924544667e-06, 'epoch': 0.29}
10%|▉ | 1113/11526 [11:37<1:46:35, 1.63it/s] 10%|▉ | 1114/11526 [11:38<1:46:37, 1.63it/s] {'loss': 0.3882, 'grad_norm': 0.7246988415718079, 'learning_rate': 9.66175195143105e-06, 'epoch': 0.29}
10%|▉ | 1114/11526 [11:38<1:46:37, 1.63it/s] 10%|▉ | 1115/11526 [11:38<1:46:37, 1.63it/s] {'loss': 0.2356, 'grad_norm': 0.5604111552238464, 'learning_rate': 9.670424978317434e-06, 'epoch': 0.29}
10%|▉ | 1115/11526 [11:38<1:46:37, 1.63it/s] 10%|▉ | 1116/11526 [11:39<1:46:34, 1.63it/s] {'loss': 0.3458, 'grad_norm': 0.6467885375022888, 'learning_rate': 9.679098005203816e-06, 'epoch': 0.29}
10%|▉ | 1116/11526 [11:39<1:46:34, 1.63it/s] 10%|▉ | 1117/11526 [11:39<1:46:31, 1.63it/s] {'loss': 0.3832, 'grad_norm': 0.7033353447914124, 'learning_rate': 9.6877710320902e-06, 'epoch': 0.29}
10%|▉ | 1117/11526 [11:40<1:46:31, 1.63it/s] 10%|▉ | 1118/11526 [11:40<1:46:29, 1.63it/s] {'loss': 0.3289, 'grad_norm': 0.6872131824493408, 'learning_rate': 9.696444058976584e-06, 'epoch': 0.29}
10%|▉ | 1118/11526 [11:40<1:46:29, 1.63it/s] 10%|▉ | 1119/11526 [11:41<1:46:26, 1.63it/s] {'loss': 0.3081, 'grad_norm': 0.6710604429244995, 'learning_rate': 9.705117085862967e-06, 'epoch': 0.29}
10%|▉ | 1119/11526 [11:41<1:46:26, 1.63it/s] 10%|▉ | 1120/11526 [11:41<1:46:27, 1.63it/s] {'loss': 0.2803, 'grad_norm': 0.6712472438812256, 'learning_rate': 9.713790112749351e-06, 'epoch': 0.29}
10%|▉ | 1120/11526 [11:41<1:46:27, 1.63it/s] 10%|▉ | 1121/11526 [11:42<1:46:28, 1.63it/s] {'loss': 0.3763, 'grad_norm': 0.5959776639938354, 'learning_rate': 9.722463139635734e-06, 'epoch': 0.29}
10%|▉ | 1121/11526 [11:42<1:46:28, 1.63it/s] 10%|▉ | 1122/11526 [11:42<1:46:30, 1.63it/s] {'loss': 0.3788, 'grad_norm': 0.7035123109817505, 'learning_rate': 9.731136166522118e-06, 'epoch': 0.29}
10%|▉ | 1122/11526 [11:43<1:46:30, 1.63it/s] 10%|▉ | 1123/11526 [11:43<1:46:33, 1.63it/s] {'loss': 0.4343, 'grad_norm': 0.674229085445404, 'learning_rate': 9.7398091934085e-06, 'epoch': 0.29}
10%|▉ | 1123/11526 [11:43<1:46:33, 1.63it/s] 10%|▉ | 1124/11526 [11:44<1:46:32, 1.63it/s] {'loss': 0.3387, 'grad_norm': 0.7235898375511169, 'learning_rate': 9.748482220294883e-06, 'epoch': 0.29}
10%|▉ | 1124/11526 [11:44<1:46:32, 1.63it/s] 10%|▉ | 1125/11526 [11:44<1:46:37, 1.63it/s] {'loss': 0.2179, 'grad_norm': 0.569334089756012, 'learning_rate': 9.757155247181267e-06, 'epoch': 0.29}
10%|▉ | 1125/11526 [11:44<1:46:37, 1.63it/s] 10%|▉ | 1126/11526 [11:45<1:46:30, 1.63it/s] {'loss': 0.3135, 'grad_norm': 0.7465218901634216, 'learning_rate': 9.76582827406765e-06, 'epoch': 0.29}
10%|▉ | 1126/11526 [11:45<1:46:30, 1.63it/s] 10%|▉ | 1127/11526 [11:46<1:46:28, 1.63it/s] {'loss': 0.3594, 'grad_norm': 0.7326985001564026, 'learning_rate': 9.774501300954034e-06, 'epoch': 0.29}
10%|▉ | 1127/11526 [11:46<1:46:28, 1.63it/s] 10%|▉ | 1128/11526 [11:46<1:46:26, 1.63it/s] {'loss': 0.299, 'grad_norm': 0.6712443828582764, 'learning_rate': 9.783174327840416e-06, 'epoch': 0.29}
10%|▉ | 1128/11526 [11:46<1:46:26, 1.63it/s] 10%|▉ | 1129/11526 [11:47<1:46:25, 1.63it/s] {'loss': 0.3516, 'grad_norm': 0.7820031046867371, 'learning_rate': 9.7918473547268e-06, 'epoch': 0.29}
10%|▉ | 1129/11526 [11:47<1:46:25, 1.63it/s] 10%|▉ | 1130/11526 [11:47<1:46:26, 1.63it/s] {'loss': 0.3233, 'grad_norm': 0.6041061282157898, 'learning_rate': 9.800520381613183e-06, 'epoch': 0.29}
10%|▉ | 1130/11526 [11:48<1:46:26, 1.63it/s] 10%|▉ | 1131/11526 [11:48<1:46:24, 1.63it/s] {'loss': 0.2744, 'grad_norm': 0.6542778015136719, 'learning_rate': 9.809193408499567e-06, 'epoch': 0.29}
10%|▉ | 1131/11526 [11:48<1:46:24, 1.63it/s] 10%|▉ | 1132/11526 [11:49<1:46:25, 1.63it/s] {'loss': 0.2699, 'grad_norm': 0.6602995991706848, 'learning_rate': 9.817866435385951e-06, 'epoch': 0.29}
10%|▉ | 1132/11526 [11:49<1:46:25, 1.63it/s] 10%|▉ | 1133/11526 [11:49<1:46:23, 1.63it/s] {'loss': 0.3112, 'grad_norm': 0.7451454997062683, 'learning_rate': 9.826539462272334e-06, 'epoch': 0.29}
10%|▉ | 1133/11526 [11:49<1:46:23, 1.63it/s] 10%|▉ | 1134/11526 [11:50<1:46:23, 1.63it/s] {'loss': 0.3742, 'grad_norm': 0.7971048355102539, 'learning_rate': 9.835212489158718e-06, 'epoch': 0.3}
10%|▉ | 1134/11526 [11:50<1:46:23, 1.63it/s] 10%|▉ | 1135/11526 [11:50<1:46:24, 1.63it/s] {'loss': 0.2403, 'grad_norm': 0.6590219140052795, 'learning_rate': 9.8438855160451e-06, 'epoch': 0.3}
10%|▉ | 1135/11526 [11:51<1:46:24, 1.63it/s] 10%|▉ | 1136/11526 [11:51<1:46:22, 1.63it/s] {'loss': 0.3098, 'grad_norm': 0.7001121044158936, 'learning_rate': 9.852558542931485e-06, 'epoch': 0.3}
10%|▉ | 1136/11526 [11:51<1:46:22, 1.63it/s] 10%|▉ | 1137/11526 [11:52<1:46:21, 1.63it/s] {'loss': 0.2933, 'grad_norm': 0.650661289691925, 'learning_rate': 9.861231569817867e-06, 'epoch': 0.3}
10%|▉ | 1137/11526 [11:52<1:46:21, 1.63it/s] 10%|▉ | 1138/11526 [11:52<1:46:27, 1.63it/s] {'loss': 0.3686, 'grad_norm': 0.7064012885093689, 'learning_rate': 9.869904596704251e-06, 'epoch': 0.3}
10%|▉ | 1138/11526 [11:52<1:46:27, 1.63it/s] 10%|▉ | 1139/11526 [11:53<1:46:23, 1.63it/s] {'loss': 0.3095, 'grad_norm': 0.7216424942016602, 'learning_rate': 9.878577623590634e-06, 'epoch': 0.3}
10%|▉ | 1139/11526 [11:53<1:46:23, 1.63it/s] 10%|▉ | 1140/11526 [11:54<1:46:21, 1.63it/s] {'loss': 0.4567, 'grad_norm': 0.7765135765075684, 'learning_rate': 9.887250650477016e-06, 'epoch': 0.3}
10%|▉ | 1140/11526 [11:54<1:46:21, 1.63it/s] 10%|▉ | 1141/11526 [11:54<1:46:17, 1.63it/s] {'loss': 0.3466, 'grad_norm': 0.7226525545120239, 'learning_rate': 9.8959236773634e-06, 'epoch': 0.3}
10%|▉ | 1141/11526 [11:54<1:46:17, 1.63it/s] 10%|▉ | 1142/11526 [11:55<1:46:15, 1.63it/s] {'loss': 0.3514, 'grad_norm': 0.7362167835235596, 'learning_rate': 9.904596704249783e-06, 'epoch': 0.3}
10%|▉ | 1142/11526 [11:55<1:46:15, 1.63it/s] 10%|▉ | 1143/11526 [11:55<1:46:12, 1.63it/s] {'loss': 0.2576, 'grad_norm': 0.6497313976287842, 'learning_rate': 9.913269731136167e-06, 'epoch': 0.3}
10%|▉ | 1143/11526 [11:55<1:46:12, 1.63it/s] 10%|▉ | 1144/11526 [11:56<1:46:12, 1.63it/s] {'loss': 0.3411, 'grad_norm': 0.7201453447341919, 'learning_rate': 9.92194275802255e-06, 'epoch': 0.3}
10%|▉ | 1144/11526 [11:56<1:46:12, 1.63it/s] 10%|▉ | 1145/11526 [11:57<1:46:12, 1.63it/s] {'loss': 0.3001, 'grad_norm': 0.6378819942474365, 'learning_rate': 9.930615784908934e-06, 'epoch': 0.3}
10%|▉ | 1145/11526 [11:57<1:46:12, 1.63it/s] 10%|▉ | 1146/11526 [11:57<1:46:16, 1.63it/s] {'loss': 0.3352, 'grad_norm': 0.7410324811935425, 'learning_rate': 9.939288811795318e-06, 'epoch': 0.3}
10%|▉ | 1146/11526 [11:57<1:46:16, 1.63it/s] 10%|▉ | 1147/11526 [11:58<1:46:16, 1.63it/s] {'loss': 0.3269, 'grad_norm': 0.7283575534820557, 'learning_rate': 9.9479618386817e-06, 'epoch': 0.3}
10%|▉ | 1147/11526 [11:58<1:46:16, 1.63it/s] 10%|▉ | 1148/11526 [11:58<1:46:14, 1.63it/s] {'loss': 0.3803, 'grad_norm': 0.7108330726623535, 'learning_rate': 9.956634865568085e-06, 'epoch': 0.3}
10%|▉ | 1148/11526 [11:59<1:46:14, 1.63it/s] 10%|▉ | 1149/11526 [11:59<1:46:11, 1.63it/s] {'loss': 0.355, 'grad_norm': 0.8507702350616455, 'learning_rate': 9.965307892454467e-06, 'epoch': 0.3}
10%|▉ | 1149/11526 [11:59<1:46:11, 1.63it/s] 10%|▉ | 1150/11526 [12:00<1:46:13, 1.63it/s] {'loss': 0.3755, 'grad_norm': 0.6221222877502441, 'learning_rate': 9.973980919340852e-06, 'epoch': 0.3}
10%|▉ | 1150/11526 [12:00<1:46:13, 1.63it/s] 10%|▉ | 1151/11526 [12:00<1:46:15, 1.63it/s] {'loss': 0.3753, 'grad_norm': 0.7677313089370728, 'learning_rate': 9.982653946227234e-06, 'epoch': 0.3}
10%|▉ | 1151/11526 [12:00<1:46:15, 1.63it/s] 10%|▉ | 1152/11526 [12:01<1:46:13, 1.63it/s] {'loss': 0.4544, 'grad_norm': 0.8403504490852356, 'learning_rate': 9.991326973113618e-06, 'epoch': 0.3}
10%|▉ | 1152/11526 [12:01<1:46:13, 1.63it/s] 10%|█ | 1153/11526 [12:02<1:46:11, 1.63it/s] {'loss': 0.3102, 'grad_norm': 0.6145944595336914, 'learning_rate': 1e-05, 'epoch': 0.3}
10%|█ | 1153/11526 [12:02<1:46:11, 1.63it/s] 10%|█ | 1154/11526 [12:02<1:46:15, 1.63it/s] {'loss': 0.289, 'grad_norm': 0.5986009240150452, 'learning_rate': 9.999999770685776e-06, 'epoch': 0.3}
10%|█ | 1154/11526 [12:02<1:46:15, 1.63it/s] 10%|█ | 1155/11526 [12:03<1:46:11, 1.63it/s] {'loss': 0.2814, 'grad_norm': 0.5406612157821655, 'learning_rate': 9.999999082743123e-06, 'epoch': 0.3}
10%|█ | 1155/11526 [12:03<1:46:11, 1.63it/s] 10%|█ | 1156/11526 [12:03<1:46:12, 1.63it/s] {'loss': 0.4123, 'grad_norm': 0.7303431630134583, 'learning_rate': 9.999997936172107e-06, 'epoch': 0.3}
10%|█ | 1156/11526 [12:03<1:46:12, 1.63it/s] 10%|█ | 1157/11526 [12:04<1:46:10, 1.63it/s] {'loss': 0.3734, 'grad_norm': 0.7797909379005432, 'learning_rate': 9.999996330972832e-06, 'epoch': 0.3}
10%|█ | 1157/11526 [12:04<1:46:10, 1.63it/s] 10%|█ | 1158/11526 [12:05<1:46:09, 1.63it/s] {'loss': 0.4025, 'grad_norm': 0.6775457859039307, 'learning_rate': 9.999994267145443e-06, 'epoch': 0.3}
10%|█ | 1158/11526 [12:05<1:46:09, 1.63it/s] 10%|█ | 1159/11526 [12:05<1:46:08, 1.63it/s] {'loss': 0.3436, 'grad_norm': 0.6740869283676147, 'learning_rate': 9.99999174469013e-06, 'epoch': 0.3}
10%|█ | 1159/11526 [12:05<1:46:08, 1.63it/s] 10%|█ | 1160/11526 [12:06<1:46:05, 1.63it/s] {'loss': 0.3898, 'grad_norm': 0.7520406246185303, 'learning_rate': 9.999988763607127e-06, 'epoch': 0.3}
10%|█ | 1160/11526 [12:06<1:46:05, 1.63it/s] 10%|█ | 1161/11526 [12:06<1:46:07, 1.63it/s] {'loss': 0.3297, 'grad_norm': 0.6964832544326782, 'learning_rate': 9.999985323896707e-06, 'epoch': 0.3}
10%|█ | 1161/11526 [12:07<1:46:07, 1.63it/s] 10%|█ | 1162/11526 [12:07<1:46:08, 1.63it/s] {'loss': 0.272, 'grad_norm': 0.6673797965049744, 'learning_rate': 9.999981425559182e-06, 'epoch': 0.3}
10%|█ | 1162/11526 [12:07<1:46:08, 1.63it/s] 10%|█ | 1163/11526 [12:08<1:46:06, 1.63it/s] {'loss': 0.3684, 'grad_norm': 0.7804749608039856, 'learning_rate': 9.999977068594913e-06, 'epoch': 0.3}
10%|█ | 1163/11526 [12:08<1:46:06, 1.63it/s] 10%|█ | 1164/11526 [12:08<1:46:05, 1.63it/s] {'loss': 0.3099, 'grad_norm': 0.6917244791984558, 'learning_rate': 9.999972253004297e-06, 'epoch': 0.3}
10%|█ | 1164/11526 [12:08<1:46:05, 1.63it/s] 10%|█ | 1165/11526 [12:09<1:46:08, 1.63it/s] {'loss': 0.3118, 'grad_norm': 0.6892030239105225, 'learning_rate': 9.99996697878778e-06, 'epoch': 0.3}
10%|█ | 1165/11526 [12:09<1:46:08, 1.63it/s] 10%|█ | 1166/11526 [12:09<1:46:11, 1.63it/s] {'loss': 0.2677, 'grad_norm': 0.5849792957305908, 'learning_rate': 9.999961245945841e-06, 'epoch': 0.3}
10%|█ | 1166/11526 [12:10<1:46:11, 1.63it/s] 10%|█ | 1167/11526 [12:10<1:46:09, 1.63it/s] {'loss': 0.3342, 'grad_norm': 0.766074538230896, 'learning_rate': 9.999955054479009e-06, 'epoch': 0.3}
10%|█ | 1167/11526 [12:10<1:46:09, 1.63it/s] 10%|█ | 1168/11526 [12:11<1:46:12, 1.63it/s] {'loss': 0.3281, 'grad_norm': 0.6750710606575012, 'learning_rate': 9.999948404387851e-06, 'epoch': 0.3}
10%|█ | 1168/11526 [12:11<1:46:12, 1.63it/s] 10%|█ | 1169/11526 [12:11<1:46:06, 1.63it/s] {'loss': 0.2979, 'grad_norm': 0.8664166927337646, 'learning_rate': 9.999941295672977e-06, 'epoch': 0.3}
10%|█ | 1169/11526 [12:11<1:46:06, 1.63it/s] 10%|█ | 1170/11526 [12:12<1:45:59, 1.63it/s] {'loss': 0.2802, 'grad_norm': 0.5777806043624878, 'learning_rate': 9.999933728335038e-06, 'epoch': 0.3}
10%|█ | 1170/11526 [12:12<1:45:59, 1.63it/s] 10%|█ | 1171/11526 [12:13<1:46:08, 1.63it/s] {'loss': 0.2701, 'grad_norm': 0.6587753891944885, 'learning_rate': 9.999925702374728e-06, 'epoch': 0.3}
10%|█ | 1171/11526 [12:13<1:46:08, 1.63it/s] 10%|█ | 1172/11526 [12:13<1:46:07, 1.63it/s] {'loss': 0.453, 'grad_norm': 0.8812687397003174, 'learning_rate': 9.999917217792786e-06, 'epoch': 0.31}
10%|█ | 1172/11526 [12:13<1:46:07, 1.63it/s] 10%|█ | 1173/11526 [12:14<1:46:08, 1.63it/s] {'loss': 0.2606, 'grad_norm': 0.5340201258659363, 'learning_rate': 9.999908274589988e-06, 'epoch': 0.31}
10%|█ | 1173/11526 [12:14<1:46:08, 1.63it/s] 10%|█ | 1174/11526 [12:14<1:46:03, 1.63it/s] {'loss': 0.2293, 'grad_norm': 0.5412325263023376, 'learning_rate': 9.999898872767153e-06, 'epoch': 0.31}
10%|█ | 1174/11526 [12:15<1:46:03, 1.63it/s] 10%|█ | 1175/11526 [12:15<1:45:59, 1.63it/s] {'loss': 0.2961, 'grad_norm': 0.6064091920852661, 'learning_rate': 9.999889012325148e-06, 'epoch': 0.31}
10%|█ | 1175/11526 [12:15<1:45:59, 1.63it/s] 10%|█ | 1176/11526 [12:16<1:46:01, 1.63it/s] {'loss': 0.4098, 'grad_norm': 0.7364852428436279, 'learning_rate': 9.999878693264873e-06, 'epoch': 0.31}
10%|█ | 1176/11526 [12:16<1:46:01, 1.63it/s] 10%|█ | 1177/11526 [12:16<1:46:00, 1.63it/s] {'loss': 0.4091, 'grad_norm': 0.715528666973114, 'learning_rate': 9.999867915587276e-06, 'epoch': 0.31}
10%|█ | 1177/11526 [12:16<1:46:00, 1.63it/s] 10%|█ | 1178/11526 [12:17<1:46:07, 1.63it/s] {'loss': 0.339, 'grad_norm': 0.6656315326690674, 'learning_rate': 9.999856679293348e-06, 'epoch': 0.31}
10%|█ | 1178/11526 [12:17<1:46:07, 1.63it/s] 10%|█ | 1179/11526 [12:17<1:45:59, 1.63it/s] {'loss': 0.2821, 'grad_norm': 0.6647992134094238, 'learning_rate': 9.999844984384114e-06, 'epoch': 0.31}
10%|█ | 1179/11526 [12:18<1:45:59, 1.63it/s] 10%|█ | 1180/11526 [12:18<1:45:56, 1.63it/s] {'loss': 0.4017, 'grad_norm': 0.7729828953742981, 'learning_rate': 9.999832830860652e-06, 'epoch': 0.31}
10%|█ | 1180/11526 [12:18<1:45:56, 1.63it/s] 10%|█ | 1181/11526 [12:19<1:46:03, 1.63it/s] {'loss': 0.3083, 'grad_norm': 0.6605077981948853, 'learning_rate': 9.999820218724075e-06, 'epoch': 0.31}
10%|█ | 1181/11526 [12:19<1:46:03, 1.63it/s] 10%|█ | 1182/11526 [12:19<1:45:58, 1.63it/s] {'loss': 0.4382, 'grad_norm': 0.8875957131385803, 'learning_rate': 9.999807147975537e-06, 'epoch': 0.31}
10%|█ | 1182/11526 [12:19<1:45:58, 1.63it/s] 10%|█ | 1183/11526 [12:20<1:45:58, 1.63it/s] {'loss': 0.2755, 'grad_norm': 0.6468813419342041, 'learning_rate': 9.999793618616242e-06, 'epoch': 0.31}
10%|█ | 1183/11526 [12:20<1:45:58, 1.63it/s] 10%|█ | 1184/11526 [12:21<1:45:58, 1.63it/s] {'loss': 0.3036, 'grad_norm': 0.8064806461334229, 'learning_rate': 9.99977963064743e-06, 'epoch': 0.31}
10%|█ | 1184/11526 [12:21<1:45:58, 1.63it/s] 10%|█ | 1185/11526 [12:21<1:45:56, 1.63it/s] {'loss': 0.3181, 'grad_norm': 0.6595768332481384, 'learning_rate': 9.99976518407038e-06, 'epoch': 0.31}
10%|█ | 1185/11526 [12:21<1:45:56, 1.63it/s] 10%|█ | 1186/11526 [12:22<1:45:52, 1.63it/s] {'loss': 0.304, 'grad_norm': 0.5995873808860779, 'learning_rate': 9.999750278886421e-06, 'epoch': 0.31}
10%|█ | 1186/11526 [12:22<1:45:52, 1.63it/s] 10%|█ | 1187/11526 [12:22<1:45:49, 1.63it/s] {'loss': 0.312, 'grad_norm': 0.7186850905418396, 'learning_rate': 9.99973491509692e-06, 'epoch': 0.31}
10%|█ | 1187/11526 [12:23<1:45:49, 1.63it/s] 10%|█ | 1188/11526 [12:23<1:45:44, 1.63it/s] {'loss': 0.2471, 'grad_norm': 0.5820941925048828, 'learning_rate': 9.999719092703284e-06, 'epoch': 0.31}
10%|█ | 1188/11526 [12:23<1:45:44, 1.63it/s] 10%|█ | 1189/11526 [12:24<1:45:46, 1.63it/s] {'loss': 0.2491, 'grad_norm': 0.5498075485229492, 'learning_rate': 9.999702811706966e-06, 'epoch': 0.31}
10%|█ | 1189/11526 [12:24<1:45:46, 1.63it/s] 10%|█ | 1190/11526 [12:24<1:45:45, 1.63it/s] {'loss': 0.3186, 'grad_norm': 0.6278696656227112, 'learning_rate': 9.999686072109458e-06, 'epoch': 0.31}
10%|█ | 1190/11526 [12:24<1:45:45, 1.63it/s] 10%|█ | 1191/11526 [12:25<1:45:55, 1.63it/s] {'loss': 0.2796, 'grad_norm': 0.6268672347068787, 'learning_rate': 9.999668873912299e-06, 'epoch': 0.31}
10%|█ | 1191/11526 [12:25<1:45:55, 1.63it/s] 10%|█ | 1192/11526 [12:25<1:45:51, 1.63it/s] {'loss': 0.3255, 'grad_norm': 0.6590171456336975, 'learning_rate': 9.999651217117061e-06, 'epoch': 0.31}
10%|█ | 1192/11526 [12:26<1:45:51, 1.63it/s] 10%|█ | 1193/11526 [12:26<1:46:02, 1.62it/s] {'loss': 0.3396, 'grad_norm': 0.6421312689781189, 'learning_rate': 9.999633101725367e-06, 'epoch': 0.31}
10%|█ | 1193/11526 [12:26<1:46:02, 1.62it/s] 10%|█ | 1194/11526 [12:27<1:45:55, 1.63it/s] {'loss': 0.3104, 'grad_norm': 0.6591479182243347, 'learning_rate': 9.999614527738882e-06, 'epoch': 0.31}
10%|█ | 1194/11526 [12:27<1:45:55, 1.63it/s] 10%|█ | 1195/11526 [12:27<1:45:48, 1.63it/s] {'loss': 0.258, 'grad_norm': 0.643905520439148, 'learning_rate': 9.999595495159301e-06, 'epoch': 0.31}
10%|█ | 1195/11526 [12:27<1:45:48, 1.63it/s] 10%|█ | 1196/11526 [12:28<1:45:55, 1.63it/s] {'loss': 0.3043, 'grad_norm': 0.6156298518180847, 'learning_rate': 9.999576003988376e-06, 'epoch': 0.31}
10%|█ | 1196/11526 [12:28<1:45:55, 1.63it/s] 10%|█ | 1197/11526 [12:29<1:45:51, 1.63it/s] {'loss': 0.3533, 'grad_norm': 0.6638573408126831, 'learning_rate': 9.999556054227894e-06, 'epoch': 0.31}
10%|█ | 1197/11526 [12:29<1:45:51, 1.63it/s] 10%|█ | 1198/11526 [12:29<1:45:47, 1.63it/s] {'loss': 0.2346, 'grad_norm': 0.5316244959831238, 'learning_rate': 9.999535645879685e-06, 'epoch': 0.31}
10%|█ | 1198/11526 [12:29<1:45:47, 1.63it/s] 10%|█ | 1199/11526 [12:30<1:45:44, 1.63it/s] {'loss': 0.2284, 'grad_norm': 0.69542396068573, 'learning_rate': 9.99951477894562e-06, 'epoch': 0.31}
10%|█ | 1199/11526 [12:30<1:45:44, 1.63it/s] 10%|█ | 1200/11526 [12:30<1:45:42, 1.63it/s] {'loss': 0.2979, 'grad_norm': 0.6673340201377869, 'learning_rate': 9.999493453427613e-06, 'epoch': 0.31}
10%|█ | 1200/11526 [12:31<1:45:42, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.35it/s]
31%|███ | 4/13 [00:00<00:01, 8.40it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.80it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.18it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.78it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.7692622542381287, 'eval_runtime': 1.9539, 'eval_samples_per_second': 102.358, 'eval_steps_per_second': 6.653, 'epoch': 0.31}
10%|█ | 1200/11526 [12:32<1:45:42, 1.63it/s]
 10%|█ | 1201/11526 [12:33<3:26:46, 1.20s/it] {'loss': 0.3901, 'grad_norm': 0.777594804763794, 'learning_rate': 9.99947166932762e-06, 'epoch': 0.31}
10%|█ | 1201/11526 [12:33<3:26:46, 1.20s/it] 10%|█ | 1202/11526 [12:34<2:56:24, 1.03s/it] {'loss': 0.2707, 'grad_norm': 0.5678277015686035, 'learning_rate': 9.999449426647641e-06, 'epoch': 0.31}
10%|█ | 1202/11526 [12:34<2:56:24, 1.03s/it] 10%|█ | 1203/11526 [12:34<2:35:17, 1.11it/s] {'loss': 0.2511, 'grad_norm': 0.691693127155304, 'learning_rate': 9.999426725389713e-06, 'epoch': 0.31}
10%|█ | 1203/11526 [12:34<2:35:17, 1.11it/s] 10%|█ | 1204/11526 [12:35<2:20:24, 1.23it/s] {'loss': 0.2598, 'grad_norm': 0.6970161199569702, 'learning_rate': 9.999403565555923e-06, 'epoch': 0.31}
10%|█ | 1204/11526 [12:35<2:20:24, 1.23it/s] 10%|█ | 1205/11526 [12:35<2:10:03, 1.32it/s] {'loss': 0.3025, 'grad_norm': 0.7412875294685364, 'learning_rate': 9.999379947148391e-06, 'epoch': 0.31}
10%|█ | 1205/11526 [12:36<2:10:03, 1.32it/s] 10%|█ | 1206/11526 [12:36<2:02:44, 1.40it/s] {'loss': 0.3487, 'grad_norm': 0.7465935349464417, 'learning_rate': 9.999355870169284e-06, 'epoch': 0.31}
10%|█ | 1206/11526 [12:36<2:02:44, 1.40it/s] 10%|█ | 1207/11526 [12:37<1:57:38, 1.46it/s] {'loss': 0.3404, 'grad_norm': 0.6743242144584656, 'learning_rate': 9.999331334620814e-06, 'epoch': 0.31}
10%|█ | 1207/11526 [12:37<1:57:38, 1.46it/s] 10%|█ | 1208/11526 [12:37<1:54:07, 1.51it/s] {'loss': 0.3195, 'grad_norm': 0.6707761287689209, 'learning_rate': 9.999306340505226e-06, 'epoch': 0.31}
10%|█ | 1208/11526 [12:37<1:54:07, 1.51it/s] 10%|█ | 1209/11526 [12:38<1:51:31, 1.54it/s] {'loss': 0.2943, 'grad_norm': 0.67238849401474, 'learning_rate': 9.999280887824819e-06, 'epoch': 0.31}
10%|█ | 1209/11526 [12:38<1:51:31, 1.54it/s] 10%|█ | 1210/11526 [12:38<1:49:48, 1.57it/s] {'loss': 0.3574, 'grad_norm': 0.664055347442627, 'learning_rate': 9.99925497658192e-06, 'epoch': 0.31}
10%|█ | 1210/11526 [12:39<1:49:48, 1.57it/s] 11%|█ | 1211/11526 [12:39<1:48:31, 1.58it/s] {'loss': 0.2353, 'grad_norm': 0.5312572717666626, 'learning_rate': 9.999228606778912e-06, 'epoch': 0.32}
11%|█ | 1211/11526 [12:39<1:48:31, 1.58it/s] 11%|█ | 1212/11526 [12:40<1:47:37, 1.60it/s] {'loss': 0.2753, 'grad_norm': 0.5894121527671814, 'learning_rate': 9.999201778418213e-06, 'epoch': 0.32}
11%|█ | 1212/11526 [12:40<1:47:37, 1.60it/s] 11%|█ | 1213/11526 [12:40<1:47:08, 1.60it/s] {'loss': 0.3216, 'grad_norm': 0.6352202892303467, 'learning_rate': 9.999174491502282e-06, 'epoch': 0.32}
11%|█ | 1213/11526 [12:40<1:47:08, 1.60it/s] 11%|█ | 1214/11526 [12:41<1:46:37, 1.61it/s] {'loss': 0.3313, 'grad_norm': 0.7323378324508667, 'learning_rate': 9.999146746033621e-06, 'epoch': 0.32}
11%|█ | 1214/11526 [12:41<1:46:37, 1.61it/s] 11%|█ | 1215/11526 [12:42<1:46:20, 1.62it/s] {'loss': 0.2804, 'grad_norm': 0.6152045130729675, 'learning_rate': 9.999118542014777e-06, 'epoch': 0.32}
11%|█ | 1215/11526 [12:42<1:46:20, 1.62it/s] 11%|█ | 1216/11526 [12:42<1:46:10, 1.62it/s] {'loss': 0.3588, 'grad_norm': 0.7324710488319397, 'learning_rate': 9.999089879448338e-06, 'epoch': 0.32}
11%|█ | 1216/11526 [12:42<1:46:10, 1.62it/s] 11%|█ | 1217/11526 [12:43<1:45:59, 1.62it/s] {'loss': 0.3086, 'grad_norm': 0.6613231897354126, 'learning_rate': 9.99906075833693e-06, 'epoch': 0.32}
11%|█ | 1217/11526 [12:43<1:45:59, 1.62it/s] 11%|█ | 1218/11526 [12:43<1:45:55, 1.62it/s] {'loss': 0.2969, 'grad_norm': 0.8143608570098877, 'learning_rate': 9.999031178683227e-06, 'epoch': 0.32}
11%|█ | 1218/11526 [12:44<1:45:55, 1.62it/s] 11%|█ | 1219/11526 [12:44<1:45:46, 1.62it/s] {'loss': 0.3354, 'grad_norm': 0.6834306716918945, 'learning_rate': 9.99900114048994e-06, 'epoch': 0.32}
11%|█ | 1219/11526 [12:44<1:45:46, 1.62it/s] 11%|█ | 1220/11526 [12:45<1:45:40, 1.63it/s] {'loss': 0.25, 'grad_norm': 0.5365996956825256, 'learning_rate': 9.998970643759824e-06, 'epoch': 0.32}
11%|█ | 1220/11526 [12:45<1:45:40, 1.63it/s] 11%|█ | 1221/11526 [12:45<1:45:49, 1.62it/s] {'loss': 0.3205, 'grad_norm': 0.7225944399833679, 'learning_rate': 9.99893968849568e-06, 'epoch': 0.32}
11%|█ | 1221/11526 [12:45<1:45:49, 1.62it/s] 11%|█ | 1222/11526 [12:46<1:45:41, 1.62it/s] {'loss': 0.3682, 'grad_norm': 0.6768682599067688, 'learning_rate': 9.998908274700344e-06, 'epoch': 0.32}
11%|█ | 1222/11526 [12:46<1:45:41, 1.62it/s] 11%|█ | 1223/11526 [12:46<1:45:41, 1.62it/s] {'loss': 0.3202, 'grad_norm': 0.6104533076286316, 'learning_rate': 9.998876402376697e-06, 'epoch': 0.32}
11%|█ | 1223/11526 [12:47<1:45:41, 1.62it/s] 11%|█ | 1224/11526 [12:47<1:45:35, 1.63it/s] {'loss': 0.3158, 'grad_norm': 0.7356615662574768, 'learning_rate': 9.998844071527667e-06, 'epoch': 0.32}
11%|█ | 1224/11526 [12:47<1:45:35, 1.63it/s] 11%|█ | 1225/11526 [12:48<1:45:33, 1.63it/s] {'loss': 0.3155, 'grad_norm': 0.5757622122764587, 'learning_rate': 9.998811282156216e-06, 'epoch': 0.32}
11%|█ | 1225/11526 [12:48<1:45:33, 1.63it/s] 11%|█ | 1226/11526 [12:48<1:45:51, 1.62it/s] {'loss': 0.2834, 'grad_norm': 0.6212471127510071, 'learning_rate': 9.99877803426535e-06, 'epoch': 0.32}
11%|█ | 1226/11526 [12:48<1:45:51, 1.62it/s] 11%|█ | 1227/11526 [12:49<1:45:43, 1.62it/s] {'loss': 0.3924, 'grad_norm': 0.7878178358078003, 'learning_rate': 9.998744327858122e-06, 'epoch': 0.32}
11%|█ | 1227/11526 [12:49<1:45:43, 1.62it/s] 11%|█ | 1228/11526 [12:50<1:45:44, 1.62it/s] {'loss': 0.3497, 'grad_norm': 0.7282169461250305, 'learning_rate': 9.998710162937623e-06, 'epoch': 0.32}
11%|█ | 1228/11526 [12:50<1:45:44, 1.62it/s] 11%|█ | 1229/11526 [12:50<1:45:36, 1.62it/s] {'loss': 0.3439, 'grad_norm': 0.725741982460022, 'learning_rate': 9.998675539506986e-06, 'epoch': 0.32}
11%|█ | 1229/11526 [12:50<1:45:36, 1.62it/s] 11%|█ | 1230/11526 [12:51<1:45:32, 1.63it/s] {'loss': 0.286, 'grad_norm': 0.6213083267211914, 'learning_rate': 9.998640457569386e-06, 'epoch': 0.32}
11%|█ | 1230/11526 [12:51<1:45:32, 1.63it/s] 11%|█ | 1231/11526 [12:51<1:45:36, 1.62it/s] {'loss': 0.3348, 'grad_norm': 0.7080572843551636, 'learning_rate': 9.998604917128045e-06, 'epoch': 0.32}
11%|█ | 1231/11526 [12:52<1:45:36, 1.62it/s] 11%|█ | 1232/11526 [12:52<1:45:28, 1.63it/s] {'loss': 0.3506, 'grad_norm': 0.5913798809051514, 'learning_rate': 9.998568918186217e-06, 'epoch': 0.32}
11%|█ | 1232/11526 [12:52<1:45:28, 1.63it/s] 11%|█ | 1233/11526 [12:53<1:45:26, 1.63it/s] {'loss': 0.2663, 'grad_norm': 0.7309101223945618, 'learning_rate': 9.998532460747209e-06, 'epoch': 0.32}
11%|█ | 1233/11526 [12:53<1:45:26, 1.63it/s] 11%|█ | 1234/11526 [12:53<1:45:22, 1.63it/s] {'loss': 0.3732, 'grad_norm': 0.8312999606132507, 'learning_rate': 9.998495544814362e-06, 'epoch': 0.32}
11%|█ | 1234/11526 [12:53<1:45:22, 1.63it/s] 11%|█ | 1235/11526 [12:54<1:45:20, 1.63it/s] {'loss': 0.301, 'grad_norm': 0.5824926495552063, 'learning_rate': 9.998458170391065e-06, 'epoch': 0.32}
11%|█ | 1235/11526 [12:54<1:45:20, 1.63it/s] 11%|█ | 1236/11526 [12:54<1:45:22, 1.63it/s] {'loss': 0.3417, 'grad_norm': 0.7176511287689209, 'learning_rate': 9.998420337480744e-06, 'epoch': 0.32}
11%|█ | 1236/11526 [12:55<1:45:22, 1.63it/s] 11%|█ | 1237/11526 [12:55<1:45:22, 1.63it/s] {'loss': 0.3182, 'grad_norm': 0.6377743482589722, 'learning_rate': 9.99838204608687e-06, 'epoch': 0.32}
11%|█ | 1237/11526 [12:55<1:45:22, 1.63it/s] 11%|█ | 1238/11526 [12:56<1:45:32, 1.62it/s] {'loss': 0.2726, 'grad_norm': 0.7166119813919067, 'learning_rate': 9.998343296212954e-06, 'epoch': 0.32}
11%|█ | 1238/11526 [12:56<1:45:32, 1.62it/s] 11%|█ | 1239/11526 [12:56<1:45:25, 1.63it/s] {'loss': 0.3285, 'grad_norm': 0.712219774723053, 'learning_rate': 9.998304087862551e-06, 'epoch': 0.32}
11%|█ | 1239/11526 [12:56<1:45:25, 1.63it/s] 11%|█ | 1240/11526 [12:57<1:45:19, 1.63it/s] {'loss': 0.3612, 'grad_norm': 0.8578994274139404, 'learning_rate': 9.99826442103926e-06, 'epoch': 0.32}
11%|█ | 1240/11526 [12:57<1:45:19, 1.63it/s] 11%|█ | 1241/11526 [12:58<1:45:24, 1.63it/s] {'loss': 0.3857, 'grad_norm': 0.7007575035095215, 'learning_rate': 9.998224295746716e-06, 'epoch': 0.32}
11%|█ | 1241/11526 [12:58<1:45:24, 1.63it/s] 11%|█ | 1242/11526 [12:58<1:45:22, 1.63it/s] {'loss': 0.2859, 'grad_norm': 0.5710839629173279, 'learning_rate': 9.998183711988601e-06, 'epoch': 0.32}
11%|█ | 1242/11526 [12:58<1:45:22, 1.63it/s] 11%|█ | 1243/11526 [12:59<1:45:24, 1.63it/s] {'loss': 0.3051, 'grad_norm': 0.6259340047836304, 'learning_rate': 9.998142669768638e-06, 'epoch': 0.32}
11%|█ | 1243/11526 [12:59<1:45:24, 1.63it/s] 11%|█ | 1244/11526 [12:59<1:45:21, 1.63it/s] {'loss': 0.2972, 'grad_norm': 0.6536032557487488, 'learning_rate': 9.99810116909059e-06, 'epoch': 0.32}
11%|█ | 1244/11526 [13:00<1:45:21, 1.63it/s] 11%|█ | 1245/11526 [13:00<1:45:21, 1.63it/s] {'loss': 0.3663, 'grad_norm': 0.724888265132904, 'learning_rate': 9.998059209958266e-06, 'epoch': 0.32}
11%|█ | 1245/11526 [13:00<1:45:21, 1.63it/s] 11%|█ | 1246/11526 [13:01<1:45:27, 1.62it/s] {'loss': 0.2953, 'grad_norm': 0.5659674406051636, 'learning_rate': 9.998016792375514e-06, 'epoch': 0.32}
11%|█ | 1246/11526 [13:01<1:45:27, 1.62it/s] 11%|█ | 1247/11526 [13:01<1:45:19, 1.63it/s] {'loss': 0.4079, 'grad_norm': 0.7694100737571716, 'learning_rate': 9.997973916346222e-06, 'epoch': 0.32}
11%|█ | 1247/11526 [13:01<1:45:19, 1.63it/s] 11%|█ | 1248/11526 [13:02<1:45:28, 1.62it/s] {'loss': 0.3539, 'grad_norm': 0.7260539531707764, 'learning_rate': 9.997930581874326e-06, 'epoch': 0.32}
11%|█ | 1248/11526 [13:02<1:45:28, 1.62it/s] 11%|█ | 1249/11526 [13:02<1:45:25, 1.62it/s] {'loss': 0.3346, 'grad_norm': 0.7633036375045776, 'learning_rate': 9.9978867889638e-06, 'epoch': 0.33}
11%|█ | 1249/11526 [13:03<1:45:25, 1.62it/s] 11%|█ | 1250/11526 [13:03<1:45:20, 1.63it/s] {'loss': 0.3794, 'grad_norm': 0.7634235620498657, 'learning_rate': 9.997842537618663e-06, 'epoch': 0.33}
11%|█ | 1250/11526 [13:03<1:45:20, 1.63it/s] 11%|█ | 1251/11526 [13:04<1:45:25, 1.62it/s] {'loss': 0.2257, 'grad_norm': 0.5694007277488708, 'learning_rate': 9.997797827842968e-06, 'epoch': 0.33}
11%|█ | 1251/11526 [13:04<1:45:25, 1.62it/s] 11%|█ | 1252/11526 [13:04<1:45:22, 1.63it/s] {'loss': 0.3281, 'grad_norm': 0.6319273114204407, 'learning_rate': 9.997752659640822e-06, 'epoch': 0.33}
11%|█ | 1252/11526 [13:04<1:45:22, 1.63it/s] 11%|█ | 1253/11526 [13:05<1:45:31, 1.62it/s] {'loss': 0.4283, 'grad_norm': 0.740554928779602, 'learning_rate': 9.997707033016367e-06, 'epoch': 0.33}
11%|█ | 1253/11526 [13:05<1:45:31, 1.62it/s] 11%|█ | 1254/11526 [13:06<1:45:22, 1.62it/s] {'loss': 0.3671, 'grad_norm': 0.6531791090965271, 'learning_rate': 9.997660947973787e-06, 'epoch': 0.33}
11%|█ | 1254/11526 [13:06<1:45:22, 1.62it/s] 11%|█ | 1255/11526 [13:06<1:45:18, 1.63it/s] {'loss': 0.3273, 'grad_norm': 0.7011472582817078, 'learning_rate': 9.997614404517308e-06, 'epoch': 0.33}
11%|█ | 1255/11526 [13:06<1:45:18, 1.63it/s] 11%|█ | 1256/11526 [13:07<1:45:36, 1.62it/s] {'loss': 0.2963, 'grad_norm': 0.6996995210647583, 'learning_rate': 9.9975674026512e-06, 'epoch': 0.33}
11%|█ | 1256/11526 [13:07<1:45:36, 1.62it/s] 11%|█ | 1257/11526 [13:07<1:45:24, 1.62it/s] {'loss': 0.3246, 'grad_norm': 0.6567392349243164, 'learning_rate': 9.997519942379776e-06, 'epoch': 0.33}
11%|█ | 1257/11526 [13:08<1:45:24, 1.62it/s] 11%|█ | 1258/11526 [13:08<1:45:24, 1.62it/s] {'loss': 0.2651, 'grad_norm': 0.6009412407875061, 'learning_rate': 9.997472023707389e-06, 'epoch': 0.33}
11%|█ | 1258/11526 [13:08<1:45:24, 1.62it/s] 11%|█ | 1259/11526 [13:09<1:45:18, 1.62it/s] {'loss': 0.3293, 'grad_norm': 0.6473912000656128, 'learning_rate': 9.997423646638431e-06, 'epoch': 0.33}
11%|█ | 1259/11526 [13:09<1:45:18, 1.62it/s] 11%|█ | 1260/11526 [13:09<1:45:12, 1.63it/s] {'loss': 0.2925, 'grad_norm': 0.667515218257904, 'learning_rate': 9.997374811177345e-06, 'epoch': 0.33}
11%|█ | 1260/11526 [13:09<1:45:12, 1.63it/s] 11%|█ | 1261/11526 [13:10<1:45:35, 1.62it/s] {'loss': 0.375, 'grad_norm': 0.6665599346160889, 'learning_rate': 9.997325517328607e-06, 'epoch': 0.33}
11%|█ | 1261/11526 [13:10<1:45:35, 1.62it/s] 11%|█ | 1262/11526 [13:10<1:45:23, 1.62it/s] {'loss': 0.2506, 'grad_norm': 0.6572256684303284, 'learning_rate': 9.997275765096736e-06, 'epoch': 0.33}
11%|█ | 1262/11526 [13:11<1:45:23, 1.62it/s] 11%|█ | 1263/11526 [13:11<1:45:13, 1.63it/s] {'loss': 0.3418, 'grad_norm': 0.7446691393852234, 'learning_rate': 9.997225554486302e-06, 'epoch': 0.33}
11%|█ | 1263/11526 [13:11<1:45:13, 1.63it/s] 11%|█ | 1264/11526 [13:12<1:45:11, 1.63it/s] {'loss': 0.3397, 'grad_norm': 0.6701580286026001, 'learning_rate': 9.997174885501906e-06, 'epoch': 0.33}
11%|█ | 1264/11526 [13:12<1:45:11, 1.63it/s] 11%|█ | 1265/11526 [13:12<1:45:05, 1.63it/s] {'loss': 0.2677, 'grad_norm': 0.525046706199646, 'learning_rate': 9.997123758148196e-06, 'epoch': 0.33}
11%|█ | 1265/11526 [13:12<1:45:05, 1.63it/s] 11%|█ | 1266/11526 [13:13<1:45:06, 1.63it/s] {'loss': 0.3365, 'grad_norm': 0.8889179825782776, 'learning_rate': 9.997072172429863e-06, 'epoch': 0.33}
11%|█ | 1266/11526 [13:13<1:45:06, 1.63it/s] 11%|█ | 1267/11526 [13:14<1:45:08, 1.63it/s] {'loss': 0.2944, 'grad_norm': 0.6972739696502686, 'learning_rate': 9.997020128351638e-06, 'epoch': 0.33}
11%|█ | 1267/11526 [13:14<1:45:08, 1.63it/s] 11%|█ | 1268/11526 [13:14<1:45:17, 1.62it/s] {'loss': 0.4114, 'grad_norm': 0.7039806842803955, 'learning_rate': 9.996967625918295e-06, 'epoch': 0.33}
11%|█ | 1268/11526 [13:14<1:45:17, 1.62it/s] 11%|█ | 1269/11526 [13:15<1:45:10, 1.63it/s] {'loss': 0.2638, 'grad_norm': 0.6024688482284546, 'learning_rate': 9.996914665134649e-06, 'epoch': 0.33}
11%|█ | 1269/11526 [13:15<1:45:10, 1.63it/s] 11%|█ | 1270/11526 [13:15<1:45:09, 1.63it/s] {'loss': 0.3591, 'grad_norm': 0.6401560306549072, 'learning_rate': 9.996861246005558e-06, 'epoch': 0.33}
11%|█ | 1270/11526 [13:16<1:45:09, 1.63it/s] 11%|█ | 1271/11526 [13:16<1:45:12, 1.62it/s] {'loss': 0.3113, 'grad_norm': 0.6926708221435547, 'learning_rate': 9.996807368535926e-06, 'epoch': 0.33}
11%|█ | 1271/11526 [13:16<1:45:12, 1.62it/s] 11%|█ | 1272/11526 [13:17<1:45:03, 1.63it/s] {'loss': 0.4285, 'grad_norm': 0.7402824759483337, 'learning_rate': 9.996753032730688e-06, 'epoch': 0.33}
11%|█ | 1272/11526 [13:17<1:45:03, 1.63it/s] 11%|█ | 1273/11526 [13:17<1:45:00, 1.63it/s] {'loss': 0.3942, 'grad_norm': 0.8589272499084473, 'learning_rate': 9.996698238594832e-06, 'epoch': 0.33}
11%|█ | 1273/11526 [13:17<1:45:00, 1.63it/s] 11%|█ | 1274/11526 [13:18<1:44:59, 1.63it/s] {'loss': 0.3301, 'grad_norm': 0.6335020661354065, 'learning_rate': 9.996642986133386e-06, 'epoch': 0.33}
11%|█ | 1274/11526 [13:18<1:44:59, 1.63it/s] 11%|█ | 1275/11526 [13:18<1:44:53, 1.63it/s] {'loss': 0.391, 'grad_norm': 0.7658047676086426, 'learning_rate': 9.996587275351413e-06, 'epoch': 0.33}
11%|█ | 1275/11526 [13:19<1:44:53, 1.63it/s] 11%|█ | 1276/11526 [13:19<1:44:56, 1.63it/s] {'loss': 0.3011, 'grad_norm': 0.676794171333313, 'learning_rate': 9.996531106254027e-06, 'epoch': 0.33}
11%|█ | 1276/11526 [13:19<1:44:56, 1.63it/s] 11%|█ | 1277/11526 [13:20<1:44:55, 1.63it/s] {'loss': 0.324, 'grad_norm': 0.6311485767364502, 'learning_rate': 9.996474478846379e-06, 'epoch': 0.33}
11%|█ | 1277/11526 [13:20<1:44:55, 1.63it/s] 11%|█ | 1278/11526 [13:20<1:44:51, 1.63it/s] {'loss': 0.3468, 'grad_norm': 0.7284096479415894, 'learning_rate': 9.996417393133662e-06, 'epoch': 0.33}
11%|█ | 1278/11526 [13:20<1:44:51, 1.63it/s] 11%|█ | 1279/11526 [13:21<1:44:53, 1.63it/s] {'loss': 0.2791, 'grad_norm': 0.6159084439277649, 'learning_rate': 9.996359849121113e-06, 'epoch': 0.33}
11%|█ | 1279/11526 [13:21<1:44:53, 1.63it/s] 11%|█ | 1280/11526 [13:22<1:44:52, 1.63it/s] {'loss': 0.373, 'grad_norm': 0.7565951943397522, 'learning_rate': 9.996301846814012e-06, 'epoch': 0.33}
11%|█ | 1280/11526 [13:22<1:44:52, 1.63it/s] 11%|█ | 1281/11526 [13:22<1:44:51, 1.63it/s] {'loss': 0.3331, 'grad_norm': 0.7339642643928528, 'learning_rate': 9.996243386217678e-06, 'epoch': 0.33}
11%|█ | 1281/11526 [13:22<1:44:51, 1.63it/s] 11%|█ | 1282/11526 [13:23<1:44:51, 1.63it/s] {'loss': 0.4208, 'grad_norm': 0.7045592665672302, 'learning_rate': 9.996184467337471e-06, 'epoch': 0.33}
11%|█ | 1282/11526 [13:23<1:44:51, 1.63it/s] 11%|█ | 1283/11526 [13:23<1:44:51, 1.63it/s] {'loss': 0.4137, 'grad_norm': 0.7245872616767883, 'learning_rate': 9.9961250901788e-06, 'epoch': 0.33}
11%|█ | 1283/11526 [13:24<1:44:51, 1.63it/s] 11%|█ | 1284/11526 [13:24<1:44:50, 1.63it/s] {'loss': 0.3534, 'grad_norm': 0.6624929904937744, 'learning_rate': 9.996065254747107e-06, 'epoch': 0.33}
11%|█ | 1284/11526 [13:24<1:44:50, 1.63it/s] 11%|█ | 1285/11526 [13:25<1:44:53, 1.63it/s] {'loss': 0.2942, 'grad_norm': 0.5671316385269165, 'learning_rate': 9.996004961047883e-06, 'epoch': 0.33}
11%|█ | 1285/11526 [13:25<1:44:53, 1.63it/s] 11%|█ | 1286/11526 [13:25<1:44:52, 1.63it/s] {'loss': 0.2664, 'grad_norm': 0.5533472895622253, 'learning_rate': 9.995944209086658e-06, 'epoch': 0.33}
11%|█ | 1286/11526 [13:25<1:44:52, 1.63it/s] 11%|█ | 1287/11526 [13:26<1:44:48, 1.63it/s] {'loss': 0.3298, 'grad_norm': 0.6537714600563049, 'learning_rate': 9.995882998869005e-06, 'epoch': 0.33}
11%|█ | 1287/11526 [13:26<1:44:48, 1.63it/s] 11%|█ | 1288/11526 [13:26<1:44:45, 1.63it/s] {'loss': 0.3111, 'grad_norm': 0.5964373350143433, 'learning_rate': 9.995821330400538e-06, 'epoch': 0.34}
11%|█ | 1288/11526 [13:27<1:44:45, 1.63it/s] 11%|█ | 1289/11526 [13:27<1:44:45, 1.63it/s] {'loss': 0.3292, 'grad_norm': 0.7800673842430115, 'learning_rate': 9.995759203686912e-06, 'epoch': 0.34}
11%|█ | 1289/11526 [13:27<1:44:45, 1.63it/s] 11%|█ | 1290/11526 [13:28<1:44:44, 1.63it/s] {'loss': 0.3156, 'grad_norm': 0.7106887698173523, 'learning_rate': 9.99569661873383e-06, 'epoch': 0.34}
11%|█ | 1290/11526 [13:28<1:44:44, 1.63it/s] 11%|█ | 1291/11526 [13:28<1:44:53, 1.63it/s] {'loss': 0.3131, 'grad_norm': 0.5907251834869385, 'learning_rate': 9.995633575547027e-06, 'epoch': 0.34}
11%|█ | 1291/11526 [13:28<1:44:53, 1.63it/s] 11%|█ | 1292/11526 [13:29<1:44:53, 1.63it/s] {'loss': 0.3168, 'grad_norm': 0.5945401191711426, 'learning_rate': 9.995570074132289e-06, 'epoch': 0.34}
11%|█ | 1292/11526 [13:29<1:44:53, 1.63it/s] 11%|█ | 1293/11526 [13:30<1:44:55, 1.63it/s] {'loss': 0.2695, 'grad_norm': 0.6019432544708252, 'learning_rate': 9.995506114495442e-06, 'epoch': 0.34}
11%|█ | 1293/11526 [13:30<1:44:55, 1.63it/s] 11%|█ | 1294/11526 [13:30<1:44:53, 1.63it/s] {'loss': 0.3388, 'grad_norm': 0.6556792855262756, 'learning_rate': 9.995441696642347e-06, 'epoch': 0.34}
11%|█ | 1294/11526 [13:30<1:44:53, 1.63it/s] 11%|█ | 1295/11526 [13:31<1:44:52, 1.63it/s] {'loss': 0.3065, 'grad_norm': 0.679209291934967, 'learning_rate': 9.995376820578921e-06, 'epoch': 0.34}
11%|█ | 1295/11526 [13:31<1:44:52, 1.63it/s] 11%|█ | 1296/11526 [13:31<1:44:51, 1.63it/s] {'loss': 0.2975, 'grad_norm': 0.6286720037460327, 'learning_rate': 9.995311486311108e-06, 'epoch': 0.34}
11%|█ | 1296/11526 [13:32<1:44:51, 1.63it/s] 11%|█▏ | 1297/11526 [13:32<1:44:44, 1.63it/s] {'loss': 0.2615, 'grad_norm': 0.6150269508361816, 'learning_rate': 9.995245693844904e-06, 'epoch': 0.34}
11%|█▏ | 1297/11526 [13:32<1:44:44, 1.63it/s] 11%|█▏ | 1298/11526 [13:33<1:44:49, 1.63it/s] {'loss': 0.2877, 'grad_norm': 0.5755539536476135, 'learning_rate': 9.995179443186342e-06, 'epoch': 0.34}
11%|█▏ | 1298/11526 [13:33<1:44:49, 1.63it/s] 11%|█▏ | 1299/11526 [13:33<1:44:45, 1.63it/s] {'loss': 0.2668, 'grad_norm': 0.5693782567977905, 'learning_rate': 9.995112734341502e-06, 'epoch': 0.34}
11%|█▏ | 1299/11526 [13:33<1:44:45, 1.63it/s] 11%|█▏ | 1300/11526 [13:34<1:44:41, 1.63it/s] {'loss': 0.3709, 'grad_norm': 0.7264142036437988, 'learning_rate': 9.9950455673165e-06, 'epoch': 0.34}
11%|█▏ | 1300/11526 [13:34<1:44:41, 1.63it/s] 11%|█▏ | 1301/11526 [13:34<1:44:49, 1.63it/s] {'loss': 0.3214, 'grad_norm': 0.6575470566749573, 'learning_rate': 9.994977942117499e-06, 'epoch': 0.34}
11%|█▏ | 1301/11526 [13:35<1:44:49, 1.63it/s] 11%|█▏ | 1302/11526 [13:35<1:44:47, 1.63it/s] {'loss': 0.2677, 'grad_norm': 0.5613196492195129, 'learning_rate': 9.994909858750699e-06, 'epoch': 0.34}
11%|█▏ | 1302/11526 [13:35<1:44:47, 1.63it/s] 11%|█▏ | 1303/11526 [13:36<1:44:53, 1.62it/s] {'loss': 0.4101, 'grad_norm': 0.7098811864852905, 'learning_rate': 9.994841317222348e-06, 'epoch': 0.34}
11%|█▏ | 1303/11526 [13:36<1:44:53, 1.62it/s] 11%|█▏ | 1304/11526 [13:36<1:44:50, 1.63it/s] {'loss': 0.3269, 'grad_norm': 0.6028900742530823, 'learning_rate': 9.994772317538733e-06, 'epoch': 0.34}
11%|█▏ | 1304/11526 [13:36<1:44:50, 1.63it/s] 11%|█▏ | 1305/11526 [13:37<1:44:45, 1.63it/s] {'loss': 0.3502, 'grad_norm': 0.7001261711120605, 'learning_rate': 9.99470285970618e-06, 'epoch': 0.34}
11%|█▏ | 1305/11526 [13:37<1:44:45, 1.63it/s] 11%|█▏ | 1306/11526 [13:38<1:44:48, 1.63it/s] {'loss': 0.3637, 'grad_norm': 0.7320265173912048, 'learning_rate': 9.994632943731064e-06, 'epoch': 0.34}
11%|█▏ | 1306/11526 [13:38<1:44:48, 1.63it/s] 11%|█▏ | 1307/11526 [13:38<1:44:43, 1.63it/s] {'loss': 0.3182, 'grad_norm': 0.6643579602241516, 'learning_rate': 9.994562569619794e-06, 'epoch': 0.34}
11%|█▏ | 1307/11526 [13:38<1:44:43, 1.63it/s] 11%|█▏ | 1308/11526 [13:39<1:44:38, 1.63it/s] {'loss': 0.3967, 'grad_norm': 0.733094334602356, 'learning_rate': 9.99449173737883e-06, 'epoch': 0.34}
11%|█▏ | 1308/11526 [13:39<1:44:38, 1.63it/s] 11%|█▏ | 1309/11526 [13:39<1:44:36, 1.63it/s] {'loss': 0.3914, 'grad_norm': 0.6415351033210754, 'learning_rate': 9.994420447014664e-06, 'epoch': 0.34}
11%|█▏ | 1309/11526 [13:40<1:44:36, 1.63it/s] 11%|█▏ | 1310/11526 [13:40<1:44:33, 1.63it/s] {'loss': 0.3519, 'grad_norm': 0.7197930216789246, 'learning_rate': 9.994348698533836e-06, 'epoch': 0.34}
11%|█▏ | 1310/11526 [13:40<1:44:33, 1.63it/s] 11%|█▏ | 1311/11526 [13:41<1:44:37, 1.63it/s] {'loss': 0.2805, 'grad_norm': 0.6882822513580322, 'learning_rate': 9.994276491942932e-06, 'epoch': 0.34}
11%|█▏ | 1311/11526 [13:41<1:44:37, 1.63it/s] 11%|█▏ | 1312/11526 [13:41<1:44:34, 1.63it/s] {'loss': 0.3207, 'grad_norm': 0.7153171300888062, 'learning_rate': 9.99420382724857e-06, 'epoch': 0.34}
11%|█▏ | 1312/11526 [13:41<1:44:34, 1.63it/s] 11%|█▏ | 1313/11526 [13:42<1:44:34, 1.63it/s] {'loss': 0.3653, 'grad_norm': 0.6837087869644165, 'learning_rate': 9.994130704457417e-06, 'epoch': 0.34}
11%|█▏ | 1313/11526 [13:42<1:44:34, 1.63it/s] 11%|█▏ | 1314/11526 [13:42<1:44:36, 1.63it/s] {'loss': 0.2551, 'grad_norm': 0.6031266450881958, 'learning_rate': 9.994057123576182e-06, 'epoch': 0.34}
11%|█▏ | 1314/11526 [13:43<1:44:36, 1.63it/s] 11%|█▏ | 1315/11526 [13:43<1:44:33, 1.63it/s] {'loss': 0.2909, 'grad_norm': 0.6389250159263611, 'learning_rate': 9.993983084611612e-06, 'epoch': 0.34}
11%|█▏ | 1315/11526 [13:43<1:44:33, 1.63it/s] 11%|█▏ | 1316/11526 [13:44<1:44:36, 1.63it/s] {'loss': 0.3538, 'grad_norm': 0.7580963373184204, 'learning_rate': 9.993908587570497e-06, 'epoch': 0.34}
11%|█▏ | 1316/11526 [13:44<1:44:36, 1.63it/s] 11%|█▏ | 1317/11526 [13:44<1:44:34, 1.63it/s] {'loss': 0.2401, 'grad_norm': 0.5305936336517334, 'learning_rate': 9.993833632459675e-06, 'epoch': 0.34}
11%|█▏ | 1317/11526 [13:44<1:44:34, 1.63it/s] 11%|█▏ | 1318/11526 [13:45<1:44:31, 1.63it/s] {'loss': 0.3172, 'grad_norm': 0.6084617972373962, 'learning_rate': 9.993758219286018e-06, 'epoch': 0.34}
11%|█▏ | 1318/11526 [13:45<1:44:31, 1.63it/s] 11%|█▏ | 1319/11526 [13:46<1:44:32, 1.63it/s] {'loss': 0.2793, 'grad_norm': 0.5025671720504761, 'learning_rate': 9.993682348056443e-06, 'epoch': 0.34}
11%|█▏ | 1319/11526 [13:46<1:44:32, 1.63it/s] 11%|█▏ | 1320/11526 [13:46<1:44:29, 1.63it/s] {'loss': 0.3354, 'grad_norm': 0.7782684564590454, 'learning_rate': 9.99360601877791e-06, 'epoch': 0.34}
11%|█▏ | 1320/11526 [13:46<1:44:29, 1.63it/s] 11%|█▏ | 1321/11526 [13:47<1:44:58, 1.62it/s] {'loss': 0.2771, 'grad_norm': 0.633272647857666, 'learning_rate': 9.99352923145742e-06, 'epoch': 0.34}
11%|█▏ | 1321/11526 [13:47<1:44:58, 1.62it/s] 11%|█▏ | 1322/11526 [13:47<1:44:45, 1.62it/s] {'loss': 0.3723, 'grad_norm': 0.7040402293205261, 'learning_rate': 9.993451986102018e-06, 'epoch': 0.34}
11%|█▏ | 1322/11526 [13:48<1:44:45, 1.62it/s] 11%|█▏ | 1323/11526 [13:48<1:44:38, 1.62it/s] {'loss': 0.2535, 'grad_norm': 0.6618372201919556, 'learning_rate': 9.993374282718788e-06, 'epoch': 0.34}
11%|█▏ | 1323/11526 [13:48<1:44:38, 1.62it/s] 11%|█▏ | 1324/11526 [13:49<1:44:32, 1.63it/s] {'loss': 0.2526, 'grad_norm': 0.7533034086227417, 'learning_rate': 9.993296121314858e-06, 'epoch': 0.34}
11%|█▏ | 1324/11526 [13:49<1:44:32, 1.63it/s] 11%|█▏ | 1325/11526 [13:49<1:44:29, 1.63it/s] {'loss': 0.4275, 'grad_norm': 0.9341511726379395, 'learning_rate': 9.993217501897397e-06, 'epoch': 0.34}
11%|█▏ | 1325/11526 [13:49<1:44:29, 1.63it/s] 12%|█▏ | 1326/11526 [13:50<1:44:32, 1.63it/s] {'loss': 0.3209, 'grad_norm': 0.8029589653015137, 'learning_rate': 9.993138424473616e-06, 'epoch': 0.35}
12%|█▏ | 1326/11526 [13:50<1:44:32, 1.63it/s] 12%|█▏ | 1327/11526 [13:50<1:44:26, 1.63it/s] {'loss': 0.3499, 'grad_norm': 0.8351003527641296, 'learning_rate': 9.993058889050768e-06, 'epoch': 0.35}
12%|█▏ | 1327/11526 [13:51<1:44:26, 1.63it/s] 12%|█▏ | 1328/11526 [13:51<1:44:37, 1.62it/s] {'loss': 0.2928, 'grad_norm': 0.5988321900367737, 'learning_rate': 9.992978895636152e-06, 'epoch': 0.35}
12%|█▏ | 1328/11526 [13:51<1:44:37, 1.62it/s] 12%|█▏ | 1329/11526 [13:52<1:44:30, 1.63it/s] {'loss': 0.3862, 'grad_norm': 0.7084307074546814, 'learning_rate': 9.9928984442371e-06, 'epoch': 0.35}
12%|█▏ | 1329/11526 [13:52<1:44:30, 1.63it/s] 12%|█▏ | 1330/11526 [13:52<1:44:27, 1.63it/s] {'loss': 0.3139, 'grad_norm': 0.6446699500083923, 'learning_rate': 9.992817534860996e-06, 'epoch': 0.35}
12%|█▏ | 1330/11526 [13:52<1:44:27, 1.63it/s] 12%|█▏ | 1331/11526 [13:53<1:44:35, 1.62it/s] {'loss': 0.3482, 'grad_norm': 0.6382171511650085, 'learning_rate': 9.992736167515258e-06, 'epoch': 0.35}
12%|█▏ | 1331/11526 [13:53<1:44:35, 1.62it/s] 12%|█▏ | 1332/11526 [13:54<1:44:29, 1.63it/s] {'loss': 0.2809, 'grad_norm': 0.5841465592384338, 'learning_rate': 9.992654342207353e-06, 'epoch': 0.35}
12%|█▏ | 1332/11526 [13:54<1:44:29, 1.63it/s] 12%|█▏ | 1333/11526 [13:54<1:44:32, 1.63it/s] {'loss': 0.3108, 'grad_norm': 0.6990169882774353, 'learning_rate': 9.992572058944786e-06, 'epoch': 0.35}
12%|█▏ | 1333/11526 [13:54<1:44:32, 1.63it/s] 12%|█▏ | 1334/11526 [13:55<1:44:31, 1.63it/s] {'loss': 0.2563, 'grad_norm': 0.5782155990600586, 'learning_rate': 9.9924893177351e-06, 'epoch': 0.35}
12%|█▏ | 1334/11526 [13:55<1:44:31, 1.63it/s] 12%|█▏ | 1335/11526 [13:55<1:44:30, 1.63it/s] {'loss': 0.2597, 'grad_norm': 0.7464459538459778, 'learning_rate': 9.992406118585889e-06, 'epoch': 0.35}
12%|█▏ | 1335/11526 [13:55<1:44:30, 1.63it/s] 12%|█▏ | 1336/11526 [13:56<1:44:29, 1.63it/s] {'loss': 0.2705, 'grad_norm': 0.5487133860588074, 'learning_rate': 9.992322461504784e-06, 'epoch': 0.35}
12%|█▏ | 1336/11526 [13:56<1:44:29, 1.63it/s] 12%|█▏ | 1337/11526 [13:57<1:44:27, 1.63it/s] {'loss': 0.2982, 'grad_norm': 0.6034495234489441, 'learning_rate': 9.992238346499457e-06, 'epoch': 0.35}
12%|█▏ | 1337/11526 [13:57<1:44:27, 1.63it/s] 12%|█▏ | 1338/11526 [13:57<1:44:40, 1.62it/s] {'loss': 0.3157, 'grad_norm': 0.6241181492805481, 'learning_rate': 9.992153773577624e-06, 'epoch': 0.35}
12%|█▏ | 1338/11526 [13:57<1:44:40, 1.62it/s] 12%|█▏ | 1339/11526 [13:58<1:44:32, 1.62it/s] {'loss': 0.3283, 'grad_norm': 0.6818301677703857, 'learning_rate': 9.992068742747043e-06, 'epoch': 0.35}
12%|█▏ | 1339/11526 [13:58<1:44:32, 1.62it/s] 12%|█▏ | 1340/11526 [13:58<1:44:24, 1.63it/s] {'loss': 0.3731, 'grad_norm': 0.6760416030883789, 'learning_rate': 9.991983254015513e-06, 'epoch': 0.35}
12%|█▏ | 1340/11526 [13:59<1:44:24, 1.63it/s] 12%|█▏ | 1341/11526 [13:59<1:44:30, 1.62it/s] {'loss': 0.3288, 'grad_norm': 0.7776120901107788, 'learning_rate': 9.991897307390876e-06, 'epoch': 0.35}
12%|█▏ | 1341/11526 [13:59<1:44:30, 1.62it/s] 12%|█▏ | 1342/11526 [14:00<1:44:23, 1.63it/s] {'loss': 0.2872, 'grad_norm': 0.6183263063430786, 'learning_rate': 9.991810902881015e-06, 'epoch': 0.35}
12%|█▏ | 1342/11526 [14:00<1:44:23, 1.63it/s] 12%|█▏ | 1343/11526 [14:00<1:44:27, 1.62it/s] {'loss': 0.298, 'grad_norm': 0.5395490527153015, 'learning_rate': 9.991724040493855e-06, 'epoch': 0.35}
12%|█▏ | 1343/11526 [14:00<1:44:27, 1.62it/s] 12%|█▏ | 1344/11526 [14:01<1:44:22, 1.63it/s] {'loss': 0.411, 'grad_norm': 0.7309435606002808, 'learning_rate': 9.991636720237366e-06, 'epoch': 0.35}
12%|█▏ | 1344/11526 [14:01<1:44:22, 1.63it/s] 12%|█▏ | 1345/11526 [14:02<1:44:19, 1.63it/s] {'loss': 0.323, 'grad_norm': 0.7088828682899475, 'learning_rate': 9.991548942119553e-06, 'epoch': 0.35}
12%|█▏ | 1345/11526 [14:02<1:44:19, 1.63it/s] 12%|█▏ | 1346/11526 [14:02<1:44:18, 1.63it/s] {'loss': 0.2925, 'grad_norm': 0.6210641860961914, 'learning_rate': 9.991460706148473e-06, 'epoch': 0.35}
12%|█▏ | 1346/11526 [14:02<1:44:18, 1.63it/s] 12%|█▏ | 1347/11526 [14:03<1:44:14, 1.63it/s] {'loss': 0.2995, 'grad_norm': 0.6505029797554016, 'learning_rate': 9.991372012332216e-06, 'epoch': 0.35}
12%|█▏ | 1347/11526 [14:03<1:44:14, 1.63it/s] 12%|█▏ | 1348/11526 [14:03<1:44:24, 1.62it/s] {'loss': 0.3339, 'grad_norm': 0.6098588109016418, 'learning_rate': 9.991282860678919e-06, 'epoch': 0.35}
12%|█▏ | 1348/11526 [14:03<1:44:24, 1.62it/s] 12%|█▏ | 1349/11526 [14:04<1:44:19, 1.63it/s] {'loss': 0.3721, 'grad_norm': 0.6594542264938354, 'learning_rate': 9.991193251196758e-06, 'epoch': 0.35}
12%|█▏ | 1349/11526 [14:04<1:44:19, 1.63it/s] 12%|█▏ | 1350/11526 [14:05<1:44:16, 1.63it/s] {'loss': 0.3139, 'grad_norm': 0.6606386303901672, 'learning_rate': 9.991103183893953e-06, 'epoch': 0.35}
12%|█▏ | 1350/11526 [14:05<1:44:16, 1.63it/s] 12%|█▏ | 1351/11526 [14:05<1:44:22, 1.62it/s] {'loss': 0.2624, 'grad_norm': 0.5380088090896606, 'learning_rate': 9.991012658778764e-06, 'epoch': 0.35}
12%|█▏ | 1351/11526 [14:05<1:44:22, 1.62it/s] 12%|█▏ | 1352/11526 [14:06<1:44:11, 1.63it/s] {'loss': 0.2946, 'grad_norm': 0.6826528906822205, 'learning_rate': 9.9909216758595e-06, 'epoch': 0.35}
12%|█▏ | 1352/11526 [14:06<1:44:11, 1.63it/s] 12%|█▏ | 1353/11526 [14:06<1:44:43, 1.62it/s] {'loss': 0.3406, 'grad_norm': 0.6686887145042419, 'learning_rate': 9.990830235144501e-06, 'epoch': 0.35}
12%|█▏ | 1353/11526 [14:07<1:44:43, 1.62it/s] 12%|█▏ | 1354/11526 [14:07<1:44:30, 1.62it/s] {'loss': 0.3546, 'grad_norm': 0.6858396530151367, 'learning_rate': 9.990738336642157e-06, 'epoch': 0.35}
12%|█▏ | 1354/11526 [14:07<1:44:30, 1.62it/s] 12%|█▏ | 1355/11526 [14:08<1:44:25, 1.62it/s] {'loss': 0.2998, 'grad_norm': 0.5760311484336853, 'learning_rate': 9.990645980360895e-06, 'epoch': 0.35}
12%|█▏ | 1355/11526 [14:08<1:44:25, 1.62it/s] 12%|█▏ | 1356/11526 [14:08<1:44:27, 1.62it/s] {'loss': 0.3421, 'grad_norm': 0.7196638584136963, 'learning_rate': 9.990553166309188e-06, 'epoch': 0.35}
12%|█▏ | 1356/11526 [14:08<1:44:27, 1.62it/s] 12%|█▏ | 1357/11526 [14:09<1:44:19, 1.62it/s] {'loss': 0.3211, 'grad_norm': 0.7659049034118652, 'learning_rate': 9.99045989449555e-06, 'epoch': 0.35}
12%|█▏ | 1357/11526 [14:09<1:44:19, 1.62it/s] 12%|█▏ | 1358/11526 [14:10<1:44:27, 1.62it/s] {'loss': 0.3273, 'grad_norm': 0.6569952368736267, 'learning_rate': 9.990366164928538e-06, 'epoch': 0.35}
12%|█▏ | 1358/11526 [14:10<1:44:27, 1.62it/s] 12%|█▏ | 1359/11526 [14:10<1:44:19, 1.62it/s] {'loss': 0.3319, 'grad_norm': 0.6037618517875671, 'learning_rate': 9.990271977616746e-06, 'epoch': 0.35}
12%|█▏ | 1359/11526 [14:10<1:44:19, 1.62it/s] 12%|█▏ | 1360/11526 [14:11<1:44:12, 1.63it/s] {'loss': 0.308, 'grad_norm': 0.666756808757782, 'learning_rate': 9.990177332568813e-06, 'epoch': 0.35}
12%|█▏ | 1360/11526 [14:11<1:44:12, 1.63it/s] 12%|█▏ | 1361/11526 [14:11<1:44:17, 1.62it/s] {'loss': 0.2918, 'grad_norm': 0.6017860770225525, 'learning_rate': 9.990082229793423e-06, 'epoch': 0.35}
12%|█▏ | 1361/11526 [14:12<1:44:17, 1.62it/s] 12%|█▏ | 1362/11526 [14:12<1:44:10, 1.63it/s] {'loss': 0.1949, 'grad_norm': 0.5049149394035339, 'learning_rate': 9.989986669299302e-06, 'epoch': 0.35}
12%|█▏ | 1362/11526 [14:12<1:44:10, 1.63it/s] 12%|█▏ | 1363/11526 [14:13<1:44:14, 1.62it/s] {'loss': 0.3619, 'grad_norm': 0.6943830847740173, 'learning_rate': 9.989890651095207e-06, 'epoch': 0.35}
12%|█▏ | 1363/11526 [14:13<1:44:14, 1.62it/s] 12%|█▏ | 1364/11526 [14:13<1:44:08, 1.63it/s] {'loss': 0.3562, 'grad_norm': 0.6390529274940491, 'learning_rate': 9.989794175189953e-06, 'epoch': 0.36}
12%|█▏ | 1364/11526 [14:13<1:44:08, 1.63it/s] 12%|█▏ | 1365/11526 [14:14<1:44:08, 1.63it/s] {'loss': 0.283, 'grad_norm': 0.6571494936943054, 'learning_rate': 9.989697241592384e-06, 'epoch': 0.36}
12%|█▏ | 1365/11526 [14:14<1:44:08, 1.63it/s] 12%|█▏ | 1366/11526 [14:14<1:44:13, 1.62it/s] {'loss': 0.4196, 'grad_norm': 0.601210355758667, 'learning_rate': 9.989599850311396e-06, 'epoch': 0.36}
12%|█▏ | 1366/11526 [14:15<1:44:13, 1.62it/s] 12%|█▏ | 1367/11526 [14:15<1:44:09, 1.63it/s] {'loss': 0.3197, 'grad_norm': 0.6508076190948486, 'learning_rate': 9.98950200135592e-06, 'epoch': 0.36}
12%|█▏ | 1367/11526 [14:15<1:44:09, 1.63it/s] 12%|█▏ | 1368/11526 [14:16<1:44:12, 1.62it/s] {'loss': 0.2998, 'grad_norm': 0.8081549406051636, 'learning_rate': 9.98940369473493e-06, 'epoch': 0.36}
12%|█▏ | 1368/11526 [14:16<1:44:12, 1.62it/s] 12%|█▏ | 1369/11526 [14:16<1:44:05, 1.63it/s] {'loss': 0.2832, 'grad_norm': 0.5892232060432434, 'learning_rate': 9.989304930457446e-06, 'epoch': 0.36}
12%|█▏ | 1369/11526 [14:16<1:44:05, 1.63it/s] 12%|█▏ | 1370/11526 [14:17<1:43:56, 1.63it/s] {'loss': 0.2858, 'grad_norm': 0.6319658756256104, 'learning_rate': 9.989205708532526e-06, 'epoch': 0.36}
12%|█▏ | 1370/11526 [14:17<1:43:56, 1.63it/s] 12%|█▏ | 1371/11526 [14:18<1:44:22, 1.62it/s] {'loss': 0.2309, 'grad_norm': 0.7287460565567017, 'learning_rate': 9.98910602896927e-06, 'epoch': 0.36}
12%|█▏ | 1371/11526 [14:18<1:44:22, 1.62it/s] 12%|█▏ | 1372/11526 [14:18<1:44:13, 1.62it/s] {'loss': 0.272, 'grad_norm': 0.5740658044815063, 'learning_rate': 9.989005891776821e-06, 'epoch': 0.36}
12%|█▏ | 1372/11526 [14:18<1:44:13, 1.62it/s] 12%|█▏ | 1373/11526 [14:19<1:44:15, 1.62it/s] {'loss': 0.375, 'grad_norm': 0.7192324995994568, 'learning_rate': 9.988905296964368e-06, 'epoch': 0.36}
12%|█▏ | 1373/11526 [14:19<1:44:15, 1.62it/s] 12%|█▏ | 1374/11526 [14:19<1:44:07, 1.62it/s] {'loss': 0.4032, 'grad_norm': 0.6897481679916382, 'learning_rate': 9.988804244541132e-06, 'epoch': 0.36}
12%|█▏ | 1374/11526 [14:19<1:44:07, 1.62it/s] 12%|█▏ | 1375/11526 [14:20<1:44:00, 1.63it/s] {'loss': 0.324, 'grad_norm': 0.7175881266593933, 'learning_rate': 9.988702734516389e-06, 'epoch': 0.36}
12%|█▏ | 1375/11526 [14:20<1:44:00, 1.63it/s] 12%|█▏ | 1376/11526 [14:21<1:44:03, 1.63it/s] {'loss': 0.2731, 'grad_norm': 0.6210786700248718, 'learning_rate': 9.988600766899446e-06, 'epoch': 0.36}
12%|█▏ | 1376/11526 [14:21<1:44:03, 1.63it/s] 12%|█▏ | 1377/11526 [14:21<1:44:02, 1.63it/s] {'loss': 0.3089, 'grad_norm': 0.7107729315757751, 'learning_rate': 9.988498341699654e-06, 'epoch': 0.36}
12%|█▏ | 1377/11526 [14:21<1:44:02, 1.63it/s] 12%|█▏ | 1378/11526 [14:22<1:44:02, 1.63it/s] {'loss': 0.2313, 'grad_norm': 0.513292670249939, 'learning_rate': 9.988395458926414e-06, 'epoch': 0.36}
12%|█▏ | 1378/11526 [14:22<1:44:02, 1.63it/s] 12%|█▏ | 1379/11526 [14:22<1:43:58, 1.63it/s] {'loss': 0.3748, 'grad_norm': 0.6309224367141724, 'learning_rate': 9.988292118589158e-06, 'epoch': 0.36}
12%|█▏ | 1379/11526 [14:23<1:43:58, 1.63it/s] 12%|█▏ | 1380/11526 [14:23<1:44:03, 1.62it/s] {'loss': 0.4525, 'grad_norm': 0.9461695551872253, 'learning_rate': 9.988188320697368e-06, 'epoch': 0.36}
12%|█▏ | 1380/11526 [14:23<1:44:03, 1.62it/s] 12%|█▏ | 1381/11526 [14:24<1:44:04, 1.62it/s] {'loss': 0.3714, 'grad_norm': 0.6327297687530518, 'learning_rate': 9.988084065260563e-06, 'epoch': 0.36}
12%|█▏ | 1381/11526 [14:24<1:44:04, 1.62it/s] 12%|█▏ | 1382/11526 [14:24<1:44:03, 1.62it/s] {'loss': 0.2826, 'grad_norm': 0.5581488013267517, 'learning_rate': 9.987979352288307e-06, 'epoch': 0.36}
12%|█▏ | 1382/11526 [14:24<1:44:03, 1.62it/s] 12%|█▏ | 1383/11526 [14:25<1:44:02, 1.62it/s] {'loss': 0.334, 'grad_norm': 0.6141449213027954, 'learning_rate': 9.987874181790203e-06, 'epoch': 0.36}
12%|█▏ | 1383/11526 [14:25<1:44:02, 1.62it/s] 12%|█▏ | 1384/11526 [14:26<1:43:56, 1.63it/s] {'loss': 0.3325, 'grad_norm': 0.5696989893913269, 'learning_rate': 9.9877685537759e-06, 'epoch': 0.36}
12%|█▏ | 1384/11526 [14:26<1:43:56, 1.63it/s] 12%|█▏ | 1385/11526 [14:26<1:43:53, 1.63it/s] {'loss': 0.2497, 'grad_norm': 0.5925938487052917, 'learning_rate': 9.987662468255086e-06, 'epoch': 0.36}
12%|█▏ | 1385/11526 [14:26<1:43:53, 1.63it/s] 12%|█▏ | 1386/11526 [14:27<1:43:55, 1.63it/s] {'loss': 0.3087, 'grad_norm': 0.618736207485199, 'learning_rate': 9.987555925237491e-06, 'epoch': 0.36}
12%|█▏ | 1386/11526 [14:27<1:43:55, 1.63it/s] 12%|█▏ | 1387/11526 [14:27<1:43:49, 1.63it/s] {'loss': 0.3038, 'grad_norm': 0.7092956900596619, 'learning_rate': 9.98744892473289e-06, 'epoch': 0.36}
12%|█▏ | 1387/11526 [14:27<1:43:49, 1.63it/s] 12%|█▏ | 1388/11526 [14:28<1:43:52, 1.63it/s] {'loss': 0.2652, 'grad_norm': 0.5463821291923523, 'learning_rate': 9.987341466751094e-06, 'epoch': 0.36}
12%|█▏ | 1388/11526 [14:28<1:43:52, 1.63it/s] 12%|█▏ | 1389/11526 [14:29<1:43:45, 1.63it/s] {'loss': 0.3037, 'grad_norm': 0.5726853609085083, 'learning_rate': 9.987233551301963e-06, 'epoch': 0.36}
12%|█▏ | 1389/11526 [14:29<1:43:45, 1.63it/s] 12%|█▏ | 1390/11526 [14:29<1:43:48, 1.63it/s] {'loss': 0.362, 'grad_norm': 0.6323769092559814, 'learning_rate': 9.987125178395396e-06, 'epoch': 0.36}
12%|█▏ | 1390/11526 [14:29<1:43:48, 1.63it/s] 12%|█▏ | 1391/11526 [14:30<1:43:44, 1.63it/s] {'loss': 0.3818, 'grad_norm': 0.6809389591217041, 'learning_rate': 9.98701634804133e-06, 'epoch': 0.36}
12%|█▏ | 1391/11526 [14:30<1:43:44, 1.63it/s] 12%|█▏ | 1392/11526 [14:30<1:43:44, 1.63it/s] {'loss': 0.2134, 'grad_norm': 0.5206583142280579, 'learning_rate': 9.98690706024975e-06, 'epoch': 0.36}
12%|█▏ | 1392/11526 [14:31<1:43:44, 1.63it/s] 12%|█▏ | 1393/11526 [14:31<1:43:41, 1.63it/s] {'loss': 0.3256, 'grad_norm': 0.6861296892166138, 'learning_rate': 9.98679731503068e-06, 'epoch': 0.36}
12%|█▏ | 1393/11526 [14:31<1:43:41, 1.63it/s] 12%|█▏ | 1394/11526 [14:32<1:43:44, 1.63it/s] {'loss': 0.2731, 'grad_norm': 0.6039528846740723, 'learning_rate': 9.986687112394189e-06, 'epoch': 0.36}
12%|█▏ | 1394/11526 [14:32<1:43:44, 1.63it/s] 12%|█▏ | 1395/11526 [14:32<1:43:40, 1.63it/s] {'loss': 0.2649, 'grad_norm': 0.5175105929374695, 'learning_rate': 9.986576452350381e-06, 'epoch': 0.36}
12%|█▏ | 1395/11526 [14:32<1:43:40, 1.63it/s] 12%|█▏ | 1396/11526 [14:33<1:43:48, 1.63it/s] {'loss': 0.3459, 'grad_norm': 0.64835125207901, 'learning_rate': 9.986465334909408e-06, 'epoch': 0.36}
12%|█▏ | 1396/11526 [14:33<1:43:48, 1.63it/s] 12%|█▏ | 1397/11526 [14:34<1:43:42, 1.63it/s] {'loss': 0.2899, 'grad_norm': 0.6822006702423096, 'learning_rate': 9.986353760081462e-06, 'epoch': 0.36}
12%|█▏ | 1397/11526 [14:34<1:43:42, 1.63it/s] 12%|█▏ | 1398/11526 [14:34<1:43:43, 1.63it/s] {'loss': 0.3045, 'grad_norm': 0.6443101763725281, 'learning_rate': 9.98624172787678e-06, 'epoch': 0.36}
12%|█▏ | 1398/11526 [14:34<1:43:43, 1.63it/s] 12%|█▏ | 1399/11526 [14:35<1:43:43, 1.63it/s] {'loss': 0.2947, 'grad_norm': 0.5600707530975342, 'learning_rate': 9.986129238305635e-06, 'epoch': 0.36}
12%|█▏ | 1399/11526 [14:35<1:43:43, 1.63it/s] 12%|█▏ | 1400/11526 [14:35<1:43:39, 1.63it/s] {'loss': 0.2638, 'grad_norm': 0.5384290814399719, 'learning_rate': 9.986016291378347e-06, 'epoch': 0.36}
12%|█▏ | 1400/11526 [14:35<1:43:39, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.28it/s]
31%|███ | 4/13 [00:00<00:01, 8.40it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.80it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.749698281288147, 'eval_runtime': 1.9547, 'eval_samples_per_second': 102.317, 'eval_steps_per_second': 6.651, 'epoch': 0.36}
12%|█▏ | 1400/11526 [14:37<1:43:39, 1.63it/s]
 12%|█▏ | 1401/11526 [14:38<3:22:51, 1.20s/it] {'loss': 0.2625, 'grad_norm': 0.5381671190261841, 'learning_rate': 9.985902887105274e-06, 'epoch': 0.36}
12%|█▏ | 1401/11526 [14:38<3:22:51, 1.20s/it] 12%|█▏ | 1402/11526 [14:39<2:53:04, 1.03s/it] {'loss': 0.3614, 'grad_norm': 0.6531431674957275, 'learning_rate': 9.985789025496822e-06, 'epoch': 0.36}
12%|█▏ | 1402/11526 [14:39<2:53:04, 1.03s/it] 12%|█▏ | 1403/11526 [14:39<2:32:16, 1.11it/s] {'loss': 0.3452, 'grad_norm': 0.5570220351219177, 'learning_rate': 9.985674706563431e-06, 'epoch': 0.37}
12%|█▏ | 1403/11526 [14:39<2:32:16, 1.11it/s] 12%|█▏ | 1404/11526 [14:40<2:17:40, 1.23it/s] {'loss': 0.2954, 'grad_norm': 0.6576467752456665, 'learning_rate': 9.985559930315587e-06, 'epoch': 0.37}
12%|█▏ | 1404/11526 [14:40<2:17:40, 1.23it/s] 12%|█▏ | 1405/11526 [14:40<2:07:25, 1.32it/s] {'loss': 0.3136, 'grad_norm': 0.7169822454452515, 'learning_rate': 9.985444696763822e-06, 'epoch': 0.37}
12%|█▏ | 1405/11526 [14:41<2:07:25, 1.32it/s] 12%|█▏ | 1406/11526 [14:41<2:00:11, 1.40it/s] {'loss': 0.3813, 'grad_norm': 0.7843633890151978, 'learning_rate': 9.985329005918702e-06, 'epoch': 0.37}
12%|█▏ | 1406/11526 [14:41<2:00:11, 1.40it/s] 12%|█▏ | 1407/11526 [14:42<1:55:12, 1.46it/s] {'loss': 0.4152, 'grad_norm': 0.6645270586013794, 'learning_rate': 9.985212857790842e-06, 'epoch': 0.37}
12%|█▏ | 1407/11526 [14:42<1:55:12, 1.46it/s] 12%|█▏ | 1408/11526 [14:42<1:51:42, 1.51it/s] {'loss': 0.3292, 'grad_norm': 0.7025036215782166, 'learning_rate': 9.985096252390893e-06, 'epoch': 0.37}
12%|█▏ | 1408/11526 [14:42<1:51:42, 1.51it/s] 12%|█▏ | 1409/11526 [14:43<1:49:14, 1.54it/s] {'loss': 0.3031, 'grad_norm': 0.6198179125785828, 'learning_rate': 9.984979189729552e-06, 'epoch': 0.37}
12%|█▏ | 1409/11526 [14:43<1:49:14, 1.54it/s] 12%|█▏ | 1410/11526 [14:43<1:47:30, 1.57it/s] {'loss': 0.2706, 'grad_norm': 0.6066552996635437, 'learning_rate': 9.984861669817556e-06, 'epoch': 0.37}
12%|█▏ | 1410/11526 [14:44<1:47:30, 1.57it/s] 12%|█▏ | 1411/11526 [14:44<1:46:17, 1.59it/s] {'loss': 0.33, 'grad_norm': 0.6727302074432373, 'learning_rate': 9.984743692665686e-06, 'epoch': 0.37}
12%|█▏ | 1411/11526 [14:44<1:46:17, 1.59it/s] 12%|█▏ | 1412/11526 [14:45<1:45:25, 1.60it/s] {'loss': 0.3084, 'grad_norm': 0.6618478894233704, 'learning_rate': 9.98462525828476e-06, 'epoch': 0.37}
12%|█▏ | 1412/11526 [14:45<1:45:25, 1.60it/s] 12%|█▏ | 1413/11526 [14:45<1:44:55, 1.61it/s] {'loss': 0.3002, 'grad_norm': 0.5781378149986267, 'learning_rate': 9.984506366685646e-06, 'epoch': 0.37}
12%|█▏ | 1413/11526 [14:45<1:44:55, 1.61it/s] 12%|█▏ | 1414/11526 [14:46<1:44:29, 1.61it/s] {'loss': 0.2757, 'grad_norm': 0.5696771144866943, 'learning_rate': 9.984387017879248e-06, 'epoch': 0.37}
12%|█▏ | 1414/11526 [14:46<1:44:29, 1.61it/s] 12%|█▏ | 1415/11526 [14:47<1:44:07, 1.62it/s] {'loss': 0.2533, 'grad_norm': 0.5833761692047119, 'learning_rate': 9.98426721187651e-06, 'epoch': 0.37}
12%|█▏ | 1415/11526 [14:47<1:44:07, 1.62it/s] 12%|█▏ | 1416/11526 [14:47<1:43:53, 1.62it/s] {'loss': 0.2833, 'grad_norm': 0.6906667947769165, 'learning_rate': 9.984146948688426e-06, 'epoch': 0.37}
12%|█▏ | 1416/11526 [14:47<1:43:53, 1.62it/s] 12%|█▏ | 1417/11526 [14:48<1:43:50, 1.62it/s] {'loss': 0.344, 'grad_norm': 0.6119542121887207, 'learning_rate': 9.984026228326024e-06, 'epoch': 0.37}
12%|█▏ | 1417/11526 [14:48<1:43:50, 1.62it/s] 12%|█▏ | 1418/11526 [14:48<1:43:42, 1.62it/s] {'loss': 0.3642, 'grad_norm': 0.6949847936630249, 'learning_rate': 9.98390505080038e-06, 'epoch': 0.37}
12%|█▏ | 1418/11526 [14:48<1:43:42, 1.62it/s] 12%|█▏ | 1419/11526 [14:49<1:43:41, 1.62it/s] {'loss': 0.2723, 'grad_norm': 0.6506890654563904, 'learning_rate': 9.983783416122605e-06, 'epoch': 0.37}
12%|█▏ | 1419/11526 [14:49<1:43:41, 1.62it/s] 12%|█▏ | 1420/11526 [14:50<1:43:39, 1.62it/s] {'loss': 0.2719, 'grad_norm': 0.5619133114814758, 'learning_rate': 9.983661324303861e-06, 'epoch': 0.37}
12%|█▏ | 1420/11526 [14:50<1:43:39, 1.62it/s] 12%|█▏ | 1421/11526 [14:50<1:43:36, 1.63it/s] {'loss': 0.3325, 'grad_norm': 0.6501272320747375, 'learning_rate': 9.983538775355343e-06, 'epoch': 0.37}
12%|█▏ | 1421/11526 [14:50<1:43:36, 1.63it/s] 12%|█▏ | 1422/11526 [14:51<1:43:30, 1.63it/s] {'loss': 0.3256, 'grad_norm': 0.6278843879699707, 'learning_rate': 9.983415769288295e-06, 'epoch': 0.37}
12%|█▏ | 1422/11526 [14:51<1:43:30, 1.63it/s] 12%|█▏ | 1423/11526 [14:51<1:43:28, 1.63it/s] {'loss': 0.3029, 'grad_norm': 0.6156235933303833, 'learning_rate': 9.983292306113997e-06, 'epoch': 0.37}
12%|█▏ | 1423/11526 [14:52<1:43:28, 1.63it/s] 12%|█▏ | 1424/11526 [14:52<1:43:25, 1.63it/s] {'loss': 0.2526, 'grad_norm': 0.6608335375785828, 'learning_rate': 9.983168385843776e-06, 'epoch': 0.37}
12%|█▏ | 1424/11526 [14:52<1:43:25, 1.63it/s] 12%|█▏ | 1425/11526 [14:53<1:43:30, 1.63it/s] {'loss': 0.3671, 'grad_norm': 0.6689117550849915, 'learning_rate': 9.983044008488996e-06, 'epoch': 0.37}
12%|█▏ | 1425/11526 [14:53<1:43:30, 1.63it/s] 12%|█▏ | 1426/11526 [14:53<1:43:36, 1.62it/s] {'loss': 0.3057, 'grad_norm': 0.5705158114433289, 'learning_rate': 9.982919174061067e-06, 'epoch': 0.37}
12%|█▏ | 1426/11526 [14:53<1:43:36, 1.62it/s] 12%|█▏ | 1427/11526 [14:54<1:43:31, 1.63it/s] {'loss': 0.3513, 'grad_norm': 0.6976368427276611, 'learning_rate': 9.982793882571442e-06, 'epoch': 0.37}
12%|█▏ | 1427/11526 [14:54<1:43:31, 1.63it/s] 12%|█▏ | 1428/11526 [14:55<1:43:35, 1.62it/s] {'loss': 0.256, 'grad_norm': 0.5510998368263245, 'learning_rate': 9.98266813403161e-06, 'epoch': 0.37}
12%|█▏ | 1428/11526 [14:55<1:43:35, 1.62it/s] 12%|█▏ | 1429/11526 [14:55<1:43:27, 1.63it/s] {'loss': 0.3734, 'grad_norm': 0.846614420413971, 'learning_rate': 9.982541928453105e-06, 'epoch': 0.37}
12%|█▏ | 1429/11526 [14:55<1:43:27, 1.63it/s] 12%|█▏ | 1430/11526 [14:56<1:43:26, 1.63it/s] {'loss': 0.2657, 'grad_norm': 0.5048947930335999, 'learning_rate': 9.982415265847507e-06, 'epoch': 0.37}
12%|█▏ | 1430/11526 [14:56<1:43:26, 1.63it/s] 12%|█▏ | 1431/11526 [14:56<1:43:26, 1.63it/s] {'loss': 0.2871, 'grad_norm': 0.6087683439254761, 'learning_rate': 9.982288146226431e-06, 'epoch': 0.37}
12%|█▏ | 1431/11526 [14:56<1:43:26, 1.63it/s] 12%|█▏ | 1432/11526 [14:57<1:43:21, 1.63it/s] {'loss': 0.3239, 'grad_norm': 0.6234449148178101, 'learning_rate': 9.982160569601539e-06, 'epoch': 0.37}
12%|█▏ | 1432/11526 [14:57<1:43:21, 1.63it/s] 12%|█▏ | 1433/11526 [14:58<1:43:32, 1.62it/s] {'loss': 0.372, 'grad_norm': 0.7929520606994629, 'learning_rate': 9.982032535984532e-06, 'epoch': 0.37}
12%|█▏ | 1433/11526 [14:58<1:43:32, 1.62it/s] 12%|█▏ | 1434/11526 [14:58<1:43:28, 1.63it/s] {'loss': 0.297, 'grad_norm': 0.5730663537979126, 'learning_rate': 9.981904045387154e-06, 'epoch': 0.37}
12%|█▏ | 1434/11526 [14:58<1:43:28, 1.63it/s] 12%|█▏ | 1435/11526 [14:59<1:43:23, 1.63it/s] {'loss': 0.3349, 'grad_norm': 0.7714781165122986, 'learning_rate': 9.981775097821189e-06, 'epoch': 0.37}
12%|█▏ | 1435/11526 [14:59<1:43:23, 1.63it/s] 12%|█▏ | 1436/11526 [14:59<1:43:22, 1.63it/s] {'loss': 0.3265, 'grad_norm': 0.7265565395355225, 'learning_rate': 9.981645693298469e-06, 'epoch': 0.37}
12%|█▏ | 1436/11526 [15:00<1:43:22, 1.63it/s] 12%|█▏ | 1437/11526 [15:00<1:43:19, 1.63it/s] {'loss': 0.2576, 'grad_norm': 0.5610262155532837, 'learning_rate': 9.981515831830861e-06, 'epoch': 0.37}
12%|█▏ | 1437/11526 [15:00<1:43:19, 1.63it/s] 12%|█▏ | 1438/11526 [15:01<1:43:14, 1.63it/s] {'loss': 0.3001, 'grad_norm': 0.6407801508903503, 'learning_rate': 9.98138551343028e-06, 'epoch': 0.37}
12%|█▏ | 1438/11526 [15:01<1:43:14, 1.63it/s] 12%|█▏ | 1439/11526 [15:01<1:43:15, 1.63it/s] {'loss': 0.3699, 'grad_norm': 0.6995333433151245, 'learning_rate': 9.981254738108674e-06, 'epoch': 0.37}
12%|█▏ | 1439/11526 [15:01<1:43:15, 1.63it/s] 12%|█▏ | 1440/11526 [15:02<1:43:13, 1.63it/s] {'loss': 0.2506, 'grad_norm': 0.5866441130638123, 'learning_rate': 9.98112350587804e-06, 'epoch': 0.37}
12%|█▏ | 1440/11526 [15:02<1:43:13, 1.63it/s] 13%|█▎ | 1441/11526 [15:03<1:43:20, 1.63it/s] {'loss': 0.3213, 'grad_norm': 0.5443451404571533, 'learning_rate': 9.98099181675042e-06, 'epoch': 0.38}
13%|█▎ | 1441/11526 [15:03<1:43:20, 1.63it/s] 13%|█▎ | 1442/11526 [15:03<1:43:17, 1.63it/s] {'loss': 0.2509, 'grad_norm': 0.5806031823158264, 'learning_rate': 9.980859670737889e-06, 'epoch': 0.38}
13%|█▎ | 1442/11526 [15:03<1:43:17, 1.63it/s] 13%|█▎ | 1443/11526 [15:04<1:43:54, 1.62it/s] {'loss': 0.298, 'grad_norm': 0.6566796898841858, 'learning_rate': 9.980727067852567e-06, 'epoch': 0.38}
13%|█▎ | 1443/11526 [15:04<1:43:54, 1.62it/s] 13%|█▎ | 1444/11526 [15:04<1:43:41, 1.62it/s] {'loss': 0.2901, 'grad_norm': 0.6535646915435791, 'learning_rate': 9.980594008106621e-06, 'epoch': 0.38}
13%|█▎ | 1444/11526 [15:04<1:43:41, 1.62it/s] 13%|█▎ | 1445/11526 [15:05<1:43:32, 1.62it/s] {'loss': 0.358, 'grad_norm': 0.7097778916358948, 'learning_rate': 9.980460491512253e-06, 'epoch': 0.38}
13%|█▎ | 1445/11526 [15:05<1:43:32, 1.62it/s] 13%|█▎ | 1446/11526 [15:06<1:43:30, 1.62it/s] {'loss': 0.3274, 'grad_norm': 0.6893709301948547, 'learning_rate': 9.980326518081711e-06, 'epoch': 0.38}
13%|█▎ | 1446/11526 [15:06<1:43:30, 1.62it/s] 13%|█▎ | 1447/11526 [15:06<1:43:24, 1.62it/s] {'loss': 0.3078, 'grad_norm': 0.6994718909263611, 'learning_rate': 9.980192087827286e-06, 'epoch': 0.38}
13%|█▎ | 1447/11526 [15:06<1:43:24, 1.62it/s] 13%|█▎ | 1448/11526 [15:07<1:43:31, 1.62it/s] {'loss': 0.2892, 'grad_norm': 0.5980274677276611, 'learning_rate': 9.980057200761303e-06, 'epoch': 0.38}
13%|█▎ | 1448/11526 [15:07<1:43:31, 1.62it/s] 13%|█▎ | 1449/11526 [15:07<1:43:25, 1.62it/s] {'loss': 0.3443, 'grad_norm': 0.7826482653617859, 'learning_rate': 9.979921856896143e-06, 'epoch': 0.38}
13%|█▎ | 1449/11526 [15:08<1:43:25, 1.62it/s] 13%|█▎ | 1450/11526 [15:08<1:43:16, 1.63it/s] {'loss': 0.2828, 'grad_norm': 0.6203989386558533, 'learning_rate': 9.979786056244211e-06, 'epoch': 0.38}
13%|█▎ | 1450/11526 [15:08<1:43:16, 1.63it/s] 13%|█▎ | 1451/11526 [15:09<1:43:20, 1.62it/s] {'loss': 0.4063, 'grad_norm': 0.6949871182441711, 'learning_rate': 9.979649798817971e-06, 'epoch': 0.38}
13%|█▎ | 1451/11526 [15:09<1:43:20, 1.62it/s] 13%|█▎ | 1452/11526 [15:09<1:43:15, 1.63it/s] {'loss': 0.3667, 'grad_norm': 0.6781949996948242, 'learning_rate': 9.979513084629917e-06, 'epoch': 0.38}
13%|█▎ | 1452/11526 [15:09<1:43:15, 1.63it/s] 13%|█▎ | 1453/11526 [15:10<1:43:22, 1.62it/s] {'loss': 0.2969, 'grad_norm': 0.6303414702415466, 'learning_rate': 9.979375913692592e-06, 'epoch': 0.38}
13%|█▎ | 1453/11526 [15:10<1:43:22, 1.62it/s] 13%|█▎ | 1454/11526 [15:11<1:43:15, 1.63it/s] {'loss': 0.3205, 'grad_norm': 0.6470819711685181, 'learning_rate': 9.979238286018576e-06, 'epoch': 0.38}
13%|█▎ | 1454/11526 [15:11<1:43:15, 1.63it/s] 13%|█▎ | 1455/11526 [15:11<1:43:12, 1.63it/s] {'loss': 0.3995, 'grad_norm': 0.8899050354957581, 'learning_rate': 9.979100201620493e-06, 'epoch': 0.38}
13%|█▎ | 1455/11526 [15:11<1:43:12, 1.63it/s] 13%|█▎ | 1456/11526 [15:12<1:43:12, 1.63it/s] {'loss': 0.3354, 'grad_norm': 0.6659083366394043, 'learning_rate': 9.97896166051101e-06, 'epoch': 0.38}
13%|█▎ | 1456/11526 [15:12<1:43:12, 1.63it/s] 13%|█▎ | 1457/11526 [15:12<1:43:12, 1.63it/s] {'loss': 0.289, 'grad_norm': 0.6129001379013062, 'learning_rate': 9.978822662702835e-06, 'epoch': 0.38}
13%|█▎ | 1457/11526 [15:12<1:43:12, 1.63it/s] 13%|█▎ | 1458/11526 [15:13<1:43:13, 1.63it/s] {'loss': 0.3429, 'grad_norm': 0.6726822853088379, 'learning_rate': 9.978683208208716e-06, 'epoch': 0.38}
13%|█▎ | 1458/11526 [15:13<1:43:13, 1.63it/s] 13%|█▎ | 1459/11526 [15:14<1:43:10, 1.63it/s] {'loss': 0.3489, 'grad_norm': 0.6367989182472229, 'learning_rate': 9.978543297041448e-06, 'epoch': 0.38}
13%|█▎ | 1459/11526 [15:14<1:43:10, 1.63it/s] 13%|█▎ | 1460/11526 [15:14<1:43:08, 1.63it/s] {'loss': 0.3245, 'grad_norm': 0.6455172300338745, 'learning_rate': 9.978402929213859e-06, 'epoch': 0.38}
13%|█▎ | 1460/11526 [15:14<1:43:08, 1.63it/s] 13%|█▎ | 1461/11526 [15:15<1:43:08, 1.63it/s] {'loss': 0.2698, 'grad_norm': 0.5750539898872375, 'learning_rate': 9.978262104738829e-06, 'epoch': 0.38}
13%|█▎ | 1461/11526 [15:15<1:43:08, 1.63it/s] 13%|█▎ | 1462/11526 [15:15<1:43:09, 1.63it/s] {'loss': 0.3006, 'grad_norm': 0.6361469626426697, 'learning_rate': 9.978120823629274e-06, 'epoch': 0.38}
13%|█▎ | 1462/11526 [15:16<1:43:09, 1.63it/s] 13%|█▎ | 1463/11526 [15:16<1:43:15, 1.62it/s] {'loss': 0.3448, 'grad_norm': 0.6067706942558289, 'learning_rate': 9.97797908589815e-06, 'epoch': 0.38}
13%|█▎ | 1463/11526 [15:16<1:43:15, 1.62it/s] 13%|█▎ | 1464/11526 [15:17<1:43:09, 1.63it/s] {'loss': 0.2964, 'grad_norm': 0.6492442488670349, 'learning_rate': 9.977836891558464e-06, 'epoch': 0.38}
13%|█▎ | 1464/11526 [15:17<1:43:09, 1.63it/s] 13%|█▎ | 1465/11526 [15:17<1:43:02, 1.63it/s] {'loss': 0.2401, 'grad_norm': 0.5736812353134155, 'learning_rate': 9.977694240623254e-06, 'epoch': 0.38}
13%|█▎ | 1465/11526 [15:17<1:43:02, 1.63it/s] 13%|█▎ | 1466/11526 [15:18<1:42:58, 1.63it/s] {'loss': 0.2651, 'grad_norm': 0.5872892141342163, 'learning_rate': 9.977551133105607e-06, 'epoch': 0.38}
13%|█▎ | 1466/11526 [15:18<1:42:58, 1.63it/s] 13%|█▎ | 1467/11526 [15:18<1:42:53, 1.63it/s] {'loss': 0.4335, 'grad_norm': 0.8190596699714661, 'learning_rate': 9.97740756901865e-06, 'epoch': 0.38}
13%|█▎ | 1467/11526 [15:19<1:42:53, 1.63it/s] 13%|█▎ | 1468/11526 [15:19<1:43:01, 1.63it/s] {'loss': 0.2903, 'grad_norm': 0.661781907081604, 'learning_rate': 9.977263548375548e-06, 'epoch': 0.38}
13%|█▎ | 1468/11526 [15:19<1:43:01, 1.63it/s] 13%|█▎ | 1469/11526 [15:20<1:43:01, 1.63it/s] {'loss': 0.3257, 'grad_norm': 0.61982661485672, 'learning_rate': 9.977119071189514e-06, 'epoch': 0.38}
13%|█▎ | 1469/11526 [15:20<1:43:01, 1.63it/s] 13%|█▎ | 1470/11526 [15:20<1:43:03, 1.63it/s] {'loss': 0.2591, 'grad_norm': 0.6490437388420105, 'learning_rate': 9.976974137473803e-06, 'epoch': 0.38}
13%|█▎ | 1470/11526 [15:20<1:43:03, 1.63it/s] 13%|█▎ | 1471/11526 [15:21<1:42:59, 1.63it/s] {'loss': 0.3471, 'grad_norm': 0.7030933499336243, 'learning_rate': 9.976828747241704e-06, 'epoch': 0.38}
13%|█▎ | 1471/11526 [15:21<1:42:59, 1.63it/s] 13%|█▎ | 1472/11526 [15:22<1:42:56, 1.63it/s] {'loss': 0.3627, 'grad_norm': 0.8273230791091919, 'learning_rate': 9.976682900506555e-06, 'epoch': 0.38}
13%|█▎ | 1472/11526 [15:22<1:42:56, 1.63it/s] 13%|█▎ | 1473/11526 [15:22<1:43:05, 1.63it/s] {'loss': 0.3672, 'grad_norm': 0.6604546308517456, 'learning_rate': 9.976536597281736e-06, 'epoch': 0.38}
13%|█▎ | 1473/11526 [15:22<1:43:05, 1.63it/s] 13%|█▎ | 1474/11526 [15:23<1:42:59, 1.63it/s] {'loss': 0.2702, 'grad_norm': 0.5878256559371948, 'learning_rate': 9.976389837580664e-06, 'epoch': 0.38}
13%|█▎ | 1474/11526 [15:23<1:42:59, 1.63it/s] 13%|█▎ | 1475/11526 [15:23<1:42:55, 1.63it/s] {'loss': 0.2809, 'grad_norm': 0.7121512293815613, 'learning_rate': 9.9762426214168e-06, 'epoch': 0.38}
13%|█▎ | 1475/11526 [15:24<1:42:55, 1.63it/s] 13%|█▎ | 1476/11526 [15:24<1:42:53, 1.63it/s] {'loss': 0.2918, 'grad_norm': 0.6432474255561829, 'learning_rate': 9.976094948803652e-06, 'epoch': 0.38}
13%|█▎ | 1476/11526 [15:24<1:42:53, 1.63it/s] 13%|█▎ | 1477/11526 [15:25<1:42:53, 1.63it/s] {'loss': 0.2464, 'grad_norm': 0.5856762528419495, 'learning_rate': 9.97594681975476e-06, 'epoch': 0.38}
13%|█▎ | 1477/11526 [15:25<1:42:53, 1.63it/s] 13%|█▎ | 1478/11526 [15:25<1:43:04, 1.62it/s] {'loss': 0.2322, 'grad_norm': 0.5725975036621094, 'learning_rate': 9.975798234283716e-06, 'epoch': 0.38}
13%|█▎ | 1478/11526 [15:25<1:43:04, 1.62it/s] 13%|█▎ | 1479/11526 [15:26<1:42:57, 1.63it/s] {'loss': 0.3478, 'grad_norm': 0.7632825374603271, 'learning_rate': 9.975649192404144e-06, 'epoch': 0.38}
13%|█▎ | 1479/11526 [15:26<1:42:57, 1.63it/s] 13%|█▎ | 1480/11526 [15:26<1:42:58, 1.63it/s] {'loss': 0.2899, 'grad_norm': 0.5771756768226624, 'learning_rate': 9.97549969412972e-06, 'epoch': 0.39}
13%|█▎ | 1480/11526 [15:27<1:42:58, 1.63it/s] 13%|█▎ | 1481/11526 [15:27<1:42:59, 1.63it/s] {'loss': 0.2996, 'grad_norm': 0.5421635508537292, 'learning_rate': 9.975349739474156e-06, 'epoch': 0.39}
13%|█▎ | 1481/11526 [15:27<1:42:59, 1.63it/s] 13%|█▎ | 1482/11526 [15:28<1:42:58, 1.63it/s] {'loss': 0.2592, 'grad_norm': 0.7901323437690735, 'learning_rate': 9.9751993284512e-06, 'epoch': 0.39}
13%|█▎ | 1482/11526 [15:28<1:42:58, 1.63it/s] 13%|█▎ | 1483/11526 [15:28<1:43:02, 1.62it/s] {'loss': 0.2602, 'grad_norm': 0.5977946519851685, 'learning_rate': 9.97504846107466e-06, 'epoch': 0.39}
13%|█▎ | 1483/11526 [15:28<1:43:02, 1.62it/s] 13%|█▎ | 1484/11526 [15:29<1:43:01, 1.62it/s] {'loss': 0.3167, 'grad_norm': 0.6605621576309204, 'learning_rate': 9.974897137358366e-06, 'epoch': 0.39}
13%|█▎ | 1484/11526 [15:29<1:43:01, 1.62it/s] 13%|█▎ | 1485/11526 [15:30<1:42:56, 1.63it/s] {'loss': 0.246, 'grad_norm': 0.6374636888504028, 'learning_rate': 9.9747453573162e-06, 'epoch': 0.39}
13%|█▎ | 1485/11526 [15:30<1:42:56, 1.63it/s] 13%|█▎ | 1486/11526 [15:30<1:42:58, 1.63it/s] {'loss': 0.4041, 'grad_norm': 0.6845167875289917, 'learning_rate': 9.974593120962084e-06, 'epoch': 0.39}
13%|█▎ | 1486/11526 [15:30<1:42:58, 1.63it/s] 13%|█▎ | 1487/11526 [15:31<1:42:53, 1.63it/s] {'loss': 0.2611, 'grad_norm': 0.5713543891906738, 'learning_rate': 9.974440428309984e-06, 'epoch': 0.39}
13%|█▎ | 1487/11526 [15:31<1:42:53, 1.63it/s] 13%|█▎ | 1488/11526 [15:31<1:42:55, 1.63it/s] {'loss': 0.1972, 'grad_norm': 0.4700230360031128, 'learning_rate': 9.974287279373904e-06, 'epoch': 0.39}
13%|█▎ | 1488/11526 [15:32<1:42:55, 1.63it/s] 13%|█▎ | 1489/11526 [15:32<1:42:51, 1.63it/s] {'loss': 0.3368, 'grad_norm': 0.7086900472640991, 'learning_rate': 9.974133674167892e-06, 'epoch': 0.39}
13%|█▎ | 1489/11526 [15:32<1:42:51, 1.63it/s] 13%|█▎ | 1490/11526 [15:33<1:42:50, 1.63it/s] {'loss': 0.2638, 'grad_norm': 0.7060227990150452, 'learning_rate': 9.973979612706038e-06, 'epoch': 0.39}
13%|█▎ | 1490/11526 [15:33<1:42:50, 1.63it/s] 13%|█▎ | 1491/11526 [15:33<1:42:45, 1.63it/s] {'loss': 0.2833, 'grad_norm': 0.6238495707511902, 'learning_rate': 9.973825095002474e-06, 'epoch': 0.39}
13%|█▎ | 1491/11526 [15:33<1:42:45, 1.63it/s] 13%|█▎ | 1492/11526 [15:34<1:42:41, 1.63it/s] {'loss': 0.4324, 'grad_norm': 0.7165035605430603, 'learning_rate': 9.973670121071371e-06, 'epoch': 0.39}
13%|█▎ | 1492/11526 [15:34<1:42:41, 1.63it/s] 13%|█▎ | 1493/11526 [15:34<1:42:47, 1.63it/s] {'loss': 0.2783, 'grad_norm': 0.734086275100708, 'learning_rate': 9.973514690926947e-06, 'epoch': 0.39}
13%|█▎ | 1493/11526 [15:35<1:42:47, 1.63it/s] 13%|█▎ | 1494/11526 [15:35<1:42:42, 1.63it/s] {'loss': 0.355, 'grad_norm': 0.7286525964736938, 'learning_rate': 9.973358804583454e-06, 'epoch': 0.39}
13%|█▎ | 1494/11526 [15:35<1:42:42, 1.63it/s] 13%|█▎ | 1495/11526 [15:36<1:42:40, 1.63it/s] {'loss': 0.2598, 'grad_norm': 0.6584556698799133, 'learning_rate': 9.973202462055199e-06, 'epoch': 0.39}
13%|█▎ | 1495/11526 [15:36<1:42:40, 1.63it/s] 13%|█▎ | 1496/11526 [15:36<1:42:40, 1.63it/s] {'loss': 0.3591, 'grad_norm': 0.7812018394470215, 'learning_rate': 9.973045663356515e-06, 'epoch': 0.39}
13%|█▎ | 1496/11526 [15:36<1:42:40, 1.63it/s] 13%|█▎ | 1497/11526 [15:37<1:42:37, 1.63it/s] {'loss': 0.3117, 'grad_norm': 0.731386661529541, 'learning_rate': 9.972888408501788e-06, 'epoch': 0.39}
13%|█▎ | 1497/11526 [15:37<1:42:37, 1.63it/s] 13%|█▎ | 1498/11526 [15:38<1:42:42, 1.63it/s] {'loss': 0.3084, 'grad_norm': 0.7928164005279541, 'learning_rate': 9.972730697505442e-06, 'epoch': 0.39}
13%|█▎ | 1498/11526 [15:38<1:42:42, 1.63it/s] 13%|█▎ | 1499/11526 [15:38<1:42:39, 1.63it/s] {'loss': 0.3356, 'grad_norm': 0.567084014415741, 'learning_rate': 9.972572530381942e-06, 'epoch': 0.39}
13%|█▎ | 1499/11526 [15:38<1:42:39, 1.63it/s] 13%|█▎ | 1500/11526 [15:39<1:42:38, 1.63it/s] {'loss': 0.2839, 'grad_norm': 0.5622056722640991, 'learning_rate': 9.972413907145797e-06, 'epoch': 0.39}
13%|█▎ | 1500/11526 [15:39<1:42:38, 1.63it/s] 13%|█▎ | 1501/11526 [15:39<1:42:37, 1.63it/s] {'loss': 0.2786, 'grad_norm': 0.6816549897193909, 'learning_rate': 9.972254827811556e-06, 'epoch': 0.39}
13%|█▎ | 1501/11526 [15:40<1:42:37, 1.63it/s] 13%|█▎ | 1502/11526 [15:40<1:42:35, 1.63it/s] {'loss': 0.306, 'grad_norm': 0.6374323964118958, 'learning_rate': 9.97209529239381e-06, 'epoch': 0.39}
13%|█▎ | 1502/11526 [15:40<1:42:35, 1.63it/s] 13%|█▎ | 1503/11526 [15:41<1:42:37, 1.63it/s] {'loss': 0.2947, 'grad_norm': 0.6392992734909058, 'learning_rate': 9.971935300907197e-06, 'epoch': 0.39}
13%|█▎ | 1503/11526 [15:41<1:42:37, 1.63it/s] 13%|█▎ | 1504/11526 [15:41<1:42:36, 1.63it/s] {'loss': 0.2773, 'grad_norm': 0.5891335010528564, 'learning_rate': 9.971774853366388e-06, 'epoch': 0.39}
13%|█▎ | 1504/11526 [15:41<1:42:36, 1.63it/s] 13%|█▎ | 1505/11526 [15:42<1:42:35, 1.63it/s] {'loss': 0.369, 'grad_norm': 0.7568463683128357, 'learning_rate': 9.971613949786099e-06, 'epoch': 0.39}
13%|█▎ | 1505/11526 [15:42<1:42:35, 1.63it/s] 13%|█▎ | 1506/11526 [15:42<1:42:40, 1.63it/s] {'loss': 0.2233, 'grad_norm': 0.5363214612007141, 'learning_rate': 9.971452590181095e-06, 'epoch': 0.39}
13%|█▎ | 1506/11526 [15:43<1:42:40, 1.63it/s] 13%|█▎ | 1507/11526 [15:43<1:42:45, 1.62it/s] {'loss': 0.2946, 'grad_norm': 0.6083124876022339, 'learning_rate': 9.97129077456617e-06, 'epoch': 0.39}
13%|█▎ | 1507/11526 [15:43<1:42:45, 1.62it/s] 13%|█▎ | 1508/11526 [15:44<1:42:41, 1.63it/s] {'loss': 0.2543, 'grad_norm': 0.5521849989891052, 'learning_rate': 9.971128502956172e-06, 'epoch': 0.39}
13%|█▎ | 1508/11526 [15:44<1:42:41, 1.63it/s] 13%|█▎ | 1509/11526 [15:44<1:42:36, 1.63it/s] {'loss': 0.366, 'grad_norm': 0.6944957971572876, 'learning_rate': 9.97096577536598e-06, 'epoch': 0.39}
13%|█▎ | 1509/11526 [15:44<1:42:36, 1.63it/s] 13%|█▎ | 1510/11526 [15:45<1:42:31, 1.63it/s] {'loss': 0.2851, 'grad_norm': 0.5598714351654053, 'learning_rate': 9.970802591810526e-06, 'epoch': 0.39}
13%|█▎ | 1510/11526 [15:45<1:42:31, 1.63it/s] 13%|█▎ | 1511/11526 [15:46<1:43:00, 1.62it/s] {'loss': 0.2643, 'grad_norm': 0.5990560054779053, 'learning_rate': 9.970638952304775e-06, 'epoch': 0.39}
13%|█▎ | 1511/11526 [15:46<1:43:00, 1.62it/s] 13%|█▎ | 1512/11526 [15:46<1:42:51, 1.62it/s] {'loss': 0.297, 'grad_norm': 0.5907925963401794, 'learning_rate': 9.970474856863736e-06, 'epoch': 0.39}
13%|█▎ | 1512/11526 [15:46<1:42:51, 1.62it/s] 13%|█▎ | 1513/11526 [15:47<1:42:50, 1.62it/s] {'loss': 0.2993, 'grad_norm': 0.673758864402771, 'learning_rate': 9.970310305502464e-06, 'epoch': 0.39}
13%|█▎ | 1513/11526 [15:47<1:42:50, 1.62it/s] 13%|█▎ | 1514/11526 [15:47<1:42:41, 1.62it/s] {'loss': 0.3574, 'grad_norm': 0.7369756102561951, 'learning_rate': 9.970145298236051e-06, 'epoch': 0.39}
13%|█▎ | 1514/11526 [15:48<1:42:41, 1.62it/s] 13%|█▎ | 1515/11526 [15:48<1:42:38, 1.63it/s] {'loss': 0.3001, 'grad_norm': 0.626036524772644, 'learning_rate': 9.969979835079632e-06, 'epoch': 0.39}
13%|█▎ | 1515/11526 [15:48<1:42:38, 1.63it/s] 13%|█▎ | 1516/11526 [15:49<1:42:37, 1.63it/s] {'loss': 0.2555, 'grad_norm': 0.5705260634422302, 'learning_rate': 9.969813916048385e-06, 'epoch': 0.39}
13%|█▎ | 1516/11526 [15:49<1:42:37, 1.63it/s] 13%|█▎ | 1517/11526 [15:49<1:42:34, 1.63it/s] {'loss': 0.3444, 'grad_norm': 0.6455205678939819, 'learning_rate': 9.969647541157528e-06, 'epoch': 0.39}
13%|█▎ | 1517/11526 [15:49<1:42:34, 1.63it/s] 13%|█▎ | 1518/11526 [15:50<1:42:28, 1.63it/s] {'loss': 0.3167, 'grad_norm': 0.6530055403709412, 'learning_rate': 9.969480710422322e-06, 'epoch': 0.4}
13%|█▎ | 1518/11526 [15:50<1:42:28, 1.63it/s] 13%|█▎ | 1519/11526 [15:50<1:42:25, 1.63it/s] {'loss': 0.2502, 'grad_norm': 0.6748886108398438, 'learning_rate': 9.969313423858069e-06, 'epoch': 0.4}
13%|█▎ | 1519/11526 [15:51<1:42:25, 1.63it/s] 13%|█▎ | 1520/11526 [15:51<1:42:27, 1.63it/s] {'loss': 0.2498, 'grad_norm': 0.6356087327003479, 'learning_rate': 9.969145681480116e-06, 'epoch': 0.4}
13%|█▎ | 1520/11526 [15:51<1:42:27, 1.63it/s] 13%|█▎ | 1521/11526 [15:52<1:42:30, 1.63it/s] {'loss': 0.2916, 'grad_norm': 0.8419787287712097, 'learning_rate': 9.968977483303848e-06, 'epoch': 0.4}
13%|█▎ | 1521/11526 [15:52<1:42:30, 1.63it/s] 13%|█▎ | 1522/11526 [15:52<1:42:29, 1.63it/s] {'loss': 0.2648, 'grad_norm': 0.5774216055870056, 'learning_rate': 9.968808829344692e-06, 'epoch': 0.4}
13%|█▎ | 1522/11526 [15:52<1:42:29, 1.63it/s] 13%|█▎ | 1523/11526 [15:53<1:42:36, 1.62it/s] {'loss': 0.3157, 'grad_norm': 0.70695960521698, 'learning_rate': 9.968639719618121e-06, 'epoch': 0.4}
13%|█▎ | 1523/11526 [15:53<1:42:36, 1.62it/s] 13%|█▎ | 1524/11526 [15:54<1:42:30, 1.63it/s] {'loss': 0.2458, 'grad_norm': 0.58070969581604, 'learning_rate': 9.968470154139643e-06, 'epoch': 0.4}
13%|█▎ | 1524/11526 [15:54<1:42:30, 1.63it/s] 13%|█▎ | 1525/11526 [15:54<1:42:29, 1.63it/s] {'loss': 0.2484, 'grad_norm': 0.5860843658447266, 'learning_rate': 9.968300132924812e-06, 'epoch': 0.4}
13%|█▎ | 1525/11526 [15:54<1:42:29, 1.63it/s] 13%|█▎ | 1526/11526 [15:55<1:42:35, 1.62it/s] {'loss': 0.3574, 'grad_norm': 0.6039165258407593, 'learning_rate': 9.968129655989224e-06, 'epoch': 0.4}
13%|█▎ | 1526/11526 [15:55<1:42:35, 1.62it/s] 13%|█▎ | 1527/11526 [15:55<1:42:27, 1.63it/s] {'loss': 0.3662, 'grad_norm': 0.8049788475036621, 'learning_rate': 9.967958723348519e-06, 'epoch': 0.4}
13%|█▎ | 1527/11526 [15:56<1:42:27, 1.63it/s] 13%|█▎ | 1528/11526 [15:56<1:42:26, 1.63it/s] {'loss': 0.2543, 'grad_norm': 0.6056718826293945, 'learning_rate': 9.967787335018372e-06, 'epoch': 0.4}
13%|█▎ | 1528/11526 [15:56<1:42:26, 1.63it/s] 13%|█▎ | 1529/11526 [15:57<1:42:22, 1.63it/s] {'loss': 0.3223, 'grad_norm': 0.6574417352676392, 'learning_rate': 9.967615491014507e-06, 'epoch': 0.4}
13%|█▎ | 1529/11526 [15:57<1:42:22, 1.63it/s] 13%|█▎ | 1530/11526 [15:57<1:42:19, 1.63it/s] {'loss': 0.3417, 'grad_norm': 0.6173158288002014, 'learning_rate': 9.967443191352681e-06, 'epoch': 0.4}
13%|█▎ | 1530/11526 [15:57<1:42:19, 1.63it/s] 13%|█▎ | 1531/11526 [15:58<1:42:26, 1.63it/s] {'loss': 0.375, 'grad_norm': 0.6806215047836304, 'learning_rate': 9.967270436048704e-06, 'epoch': 0.4}
13%|█▎ | 1531/11526 [15:58<1:42:26, 1.63it/s] 13%|█▎ | 1532/11526 [15:58<1:42:21, 1.63it/s] {'loss': 0.3397, 'grad_norm': 0.6920742988586426, 'learning_rate': 9.967097225118421e-06, 'epoch': 0.4}
13%|█▎ | 1532/11526 [15:59<1:42:21, 1.63it/s] 13%|█▎ | 1533/11526 [15:59<1:42:17, 1.63it/s] {'loss': 0.3374, 'grad_norm': 0.6963880062103271, 'learning_rate': 9.966923558577717e-06, 'epoch': 0.4}
13%|█▎ | 1533/11526 [15:59<1:42:17, 1.63it/s] 13%|█▎ | 1534/11526 [16:00<1:42:15, 1.63it/s] {'loss': 0.3396, 'grad_norm': 0.6495581865310669, 'learning_rate': 9.966749436442525e-06, 'epoch': 0.4}
13%|█▎ | 1534/11526 [16:00<1:42:15, 1.63it/s] 13%|█▎ | 1535/11526 [16:00<1:42:16, 1.63it/s] {'loss': 0.3196, 'grad_norm': 0.6844871640205383, 'learning_rate': 9.966574858728813e-06, 'epoch': 0.4}
13%|█▎ | 1535/11526 [16:00<1:42:16, 1.63it/s] 13%|█▎ | 1536/11526 [16:01<1:42:13, 1.63it/s] {'loss': 0.3817, 'grad_norm': 0.7423290610313416, 'learning_rate': 9.966399825452596e-06, 'epoch': 0.4}
13%|█▎ | 1536/11526 [16:01<1:42:13, 1.63it/s] 13%|█▎ | 1537/11526 [16:02<1:42:14, 1.63it/s] {'loss': 0.2732, 'grad_norm': 0.6014947891235352, 'learning_rate': 9.966224336629933e-06, 'epoch': 0.4}
13%|█▎ | 1537/11526 [16:02<1:42:14, 1.63it/s] 13%|█▎ | 1538/11526 [16:02<1:42:25, 1.63it/s] {'loss': 0.3624, 'grad_norm': 0.6578903794288635, 'learning_rate': 9.966048392276915e-06, 'epoch': 0.4}
13%|█▎ | 1538/11526 [16:02<1:42:25, 1.63it/s] 13%|█▎ | 1539/11526 [16:03<1:42:18, 1.63it/s] {'loss': 0.2916, 'grad_norm': 0.648206889629364, 'learning_rate': 9.965871992409684e-06, 'epoch': 0.4}
13%|█▎ | 1539/11526 [16:03<1:42:18, 1.63it/s] 13%|█▎ | 1540/11526 [16:03<1:42:14, 1.63it/s] {'loss': 0.3181, 'grad_norm': 0.6955523490905762, 'learning_rate': 9.965695137044418e-06, 'epoch': 0.4}
13%|█▎ | 1540/11526 [16:04<1:42:14, 1.63it/s] 13%|█▎ | 1541/11526 [16:04<1:42:11, 1.63it/s] {'loss': 0.2933, 'grad_norm': 0.5705356001853943, 'learning_rate': 9.965517826197339e-06, 'epoch': 0.4}
13%|█▎ | 1541/11526 [16:04<1:42:11, 1.63it/s] 13%|█▎ | 1542/11526 [16:05<1:42:15, 1.63it/s] {'loss': 0.263, 'grad_norm': 0.6171284317970276, 'learning_rate': 9.965340059884713e-06, 'epoch': 0.4}
13%|█▎ | 1542/11526 [16:05<1:42:15, 1.63it/s] 13%|█▎ | 1543/11526 [16:05<1:42:21, 1.63it/s] {'loss': 0.2817, 'grad_norm': 0.6368576288223267, 'learning_rate': 9.965161838122847e-06, 'epoch': 0.4}
13%|█▎ | 1543/11526 [16:05<1:42:21, 1.63it/s] 13%|█▎ | 1544/11526 [16:06<1:42:16, 1.63it/s] {'loss': 0.282, 'grad_norm': 0.7033287882804871, 'learning_rate': 9.964983160928085e-06, 'epoch': 0.4}
13%|█▎ | 1544/11526 [16:06<1:42:16, 1.63it/s] 13%|█▎ | 1545/11526 [16:06<1:42:12, 1.63it/s] {'loss': 0.3043, 'grad_norm': 0.6579833626747131, 'learning_rate': 9.964804028316819e-06, 'epoch': 0.4}
13%|█▎ | 1545/11526 [16:07<1:42:12, 1.63it/s] 13%|█▎ | 1546/11526 [16:07<1:42:16, 1.63it/s] {'loss': 0.226, 'grad_norm': 0.5328530073165894, 'learning_rate': 9.96462444030548e-06, 'epoch': 0.4}
13%|█▎ | 1546/11526 [16:07<1:42:16, 1.63it/s] 13%|█▎ | 1547/11526 [16:08<1:42:18, 1.63it/s] {'loss': 0.3225, 'grad_norm': 0.6900309324264526, 'learning_rate': 9.964444396910538e-06, 'epoch': 0.4}
13%|█▎ | 1547/11526 [16:08<1:42:18, 1.63it/s] 13%|█▎ | 1548/11526 [16:08<1:42:17, 1.63it/s] {'loss': 0.3059, 'grad_norm': 0.5781267285346985, 'learning_rate': 9.96426389814851e-06, 'epoch': 0.4}
13%|█▎ | 1548/11526 [16:08<1:42:17, 1.63it/s] 13%|█▎ | 1549/11526 [16:09<1:42:17, 1.63it/s] {'loss': 0.2489, 'grad_norm': 0.5725744962692261, 'learning_rate': 9.96408294403595e-06, 'epoch': 0.4}
13%|█▎ | 1549/11526 [16:09<1:42:17, 1.63it/s] 13%|█▎ | 1550/11526 [16:10<1:42:13, 1.63it/s] {'loss': 0.3116, 'grad_norm': 1.0252164602279663, 'learning_rate': 9.96390153458946e-06, 'epoch': 0.4}
13%|█▎ | 1550/11526 [16:10<1:42:13, 1.63it/s] 13%|█▎ | 1551/11526 [16:10<1:42:13, 1.63it/s] {'loss': 0.2505, 'grad_norm': 0.5662574172019958, 'learning_rate': 9.963719669825678e-06, 'epoch': 0.4}
13%|█▎ | 1551/11526 [16:10<1:42:13, 1.63it/s] 13%|█▎ | 1552/11526 [16:11<1:42:09, 1.63it/s] {'loss': 0.3862, 'grad_norm': 0.7927621006965637, 'learning_rate': 9.963537349761283e-06, 'epoch': 0.4}
13%|█▎ | 1552/11526 [16:11<1:42:09, 1.63it/s] 13%|█▎ | 1553/11526 [16:11<1:42:14, 1.63it/s] {'loss': 0.3085, 'grad_norm': 0.6548159718513489, 'learning_rate': 9.963354574413004e-06, 'epoch': 0.4}
13%|█▎ | 1553/11526 [16:12<1:42:14, 1.63it/s] 13%|█▎ | 1554/11526 [16:12<1:42:07, 1.63it/s] {'loss': 0.2828, 'grad_norm': 0.603880763053894, 'learning_rate': 9.9631713437976e-06, 'epoch': 0.4}
13%|█▎ | 1554/11526 [16:12<1:42:07, 1.63it/s] 13%|█▎ | 1555/11526 [16:13<1:42:05, 1.63it/s] {'loss': 0.2669, 'grad_norm': 0.7185399532318115, 'learning_rate': 9.962987657931883e-06, 'epoch': 0.4}
13%|█▎ | 1555/11526 [16:13<1:42:05, 1.63it/s] 13%|█▎ | 1556/11526 [16:13<1:42:08, 1.63it/s] {'loss': 0.2745, 'grad_norm': 0.6123446226119995, 'learning_rate': 9.9628035168327e-06, 'epoch': 0.4}
13%|█▎ | 1556/11526 [16:13<1:42:08, 1.63it/s] 14%|█▎ | 1557/11526 [16:14<1:42:06, 1.63it/s] {'loss': 0.3781, 'grad_norm': 0.7509409785270691, 'learning_rate': 9.962618920516941e-06, 'epoch': 0.41}
14%|█▎ | 1557/11526 [16:14<1:42:06, 1.63it/s] 14%|█▎ | 1558/11526 [16:14<1:42:10, 1.63it/s] {'loss': 0.2358, 'grad_norm': 0.5133228898048401, 'learning_rate': 9.962433869001538e-06, 'epoch': 0.41}
14%|█▎ | 1558/11526 [16:15<1:42:10, 1.63it/s] 14%|█▎ | 1559/11526 [16:15<1:42:08, 1.63it/s] {'loss': 0.3327, 'grad_norm': 0.6816455125808716, 'learning_rate': 9.962248362303466e-06, 'epoch': 0.41}
14%|█▎ | 1559/11526 [16:15<1:42:08, 1.63it/s] 14%|█▎ | 1560/11526 [16:16<1:42:04, 1.63it/s] {'loss': 0.2722, 'grad_norm': 0.6709058880805969, 'learning_rate': 9.96206240043974e-06, 'epoch': 0.41}
14%|█▎ | 1560/11526 [16:16<1:42:04, 1.63it/s] 14%|█▎ | 1561/11526 [16:16<1:42:12, 1.62it/s] {'loss': 0.2729, 'grad_norm': 0.6272708177566528, 'learning_rate': 9.961875983427417e-06, 'epoch': 0.41}
14%|█▎ | 1561/11526 [16:16<1:42:12, 1.62it/s] 14%|█▎ | 1562/11526 [16:17<1:42:08, 1.63it/s] {'loss': 0.2822, 'grad_norm': 0.6176790595054626, 'learning_rate': 9.961689111283598e-06, 'epoch': 0.41}
14%|█▎ | 1562/11526 [16:17<1:42:08, 1.63it/s] 14%|█▎ | 1563/11526 [16:18<1:42:13, 1.62it/s] {'loss': 0.3152, 'grad_norm': 0.5699624419212341, 'learning_rate': 9.961501784025423e-06, 'epoch': 0.41}
14%|█▎ | 1563/11526 [16:18<1:42:13, 1.62it/s] 14%|█▎ | 1564/11526 [16:18<1:42:07, 1.63it/s] {'loss': 0.3553, 'grad_norm': 0.6757445335388184, 'learning_rate': 9.961314001670073e-06, 'epoch': 0.41}
14%|█▎ | 1564/11526 [16:18<1:42:07, 1.63it/s] 14%|█▎ | 1565/11526 [16:19<1:42:03, 1.63it/s] {'loss': 0.3419, 'grad_norm': 0.6463468670845032, 'learning_rate': 9.961125764234774e-06, 'epoch': 0.41}
14%|█▎ | 1565/11526 [16:19<1:42:03, 1.63it/s] 14%|█▎ | 1566/11526 [16:19<1:42:07, 1.63it/s] {'loss': 0.3037, 'grad_norm': 0.6443637013435364, 'learning_rate': 9.960937071736793e-06, 'epoch': 0.41}
14%|█▎ | 1566/11526 [16:19<1:42:07, 1.63it/s] 14%|█▎ | 1567/11526 [16:20<1:42:03, 1.63it/s] {'loss': 0.257, 'grad_norm': 0.6537495851516724, 'learning_rate': 9.960747924193439e-06, 'epoch': 0.41}
14%|█▎ | 1567/11526 [16:20<1:42:03, 1.63it/s] 14%|█▎ | 1568/11526 [16:21<1:42:05, 1.63it/s] {'loss': 0.29, 'grad_norm': 0.6074130535125732, 'learning_rate': 9.960558321622058e-06, 'epoch': 0.41}
14%|█▎ | 1568/11526 [16:21<1:42:05, 1.63it/s] 14%|█▎ | 1569/11526 [16:21<1:42:03, 1.63it/s] {'loss': 0.3465, 'grad_norm': 0.6448304653167725, 'learning_rate': 9.960368264040044e-06, 'epoch': 0.41}
14%|█▎ | 1569/11526 [16:21<1:42:03, 1.63it/s] 14%|█▎ | 1570/11526 [16:22<1:41:57, 1.63it/s] {'loss': 0.2872, 'grad_norm': 0.6102621555328369, 'learning_rate': 9.960177751464827e-06, 'epoch': 0.41}
14%|█▎ | 1570/11526 [16:22<1:41:57, 1.63it/s] 14%|█▎ | 1571/11526 [16:22<1:41:59, 1.63it/s] {'loss': 0.3445, 'grad_norm': 0.647519588470459, 'learning_rate': 9.959986783913888e-06, 'epoch': 0.41}
14%|█▎ | 1571/11526 [16:23<1:41:59, 1.63it/s] 14%|█▎ | 1572/11526 [16:23<1:41:57, 1.63it/s] {'loss': 0.2589, 'grad_norm': 0.6337569952011108, 'learning_rate': 9.959795361404739e-06, 'epoch': 0.41}
14%|█▎ | 1572/11526 [16:23<1:41:57, 1.63it/s] 14%|█▎ | 1573/11526 [16:24<1:42:04, 1.63it/s] {'loss': 0.2919, 'grad_norm': 0.6383366584777832, 'learning_rate': 9.959603483954938e-06, 'epoch': 0.41}
14%|█▎ | 1573/11526 [16:24<1:42:04, 1.63it/s] 14%|█▎ | 1574/11526 [16:24<1:42:00, 1.63it/s] {'loss': 0.3772, 'grad_norm': 0.7540466785430908, 'learning_rate': 9.959411151582087e-06, 'epoch': 0.41}
14%|█▎ | 1574/11526 [16:24<1:42:00, 1.63it/s] 14%|█▎ | 1575/11526 [16:25<1:41:56, 1.63it/s] {'loss': 0.2903, 'grad_norm': 0.6072871088981628, 'learning_rate': 9.95921836430383e-06, 'epoch': 0.41}
14%|█▎ | 1575/11526 [16:25<1:41:56, 1.63it/s] 14%|█▎ | 1576/11526 [16:26<1:41:52, 1.63it/s] {'loss': 0.2446, 'grad_norm': 0.5980660915374756, 'learning_rate': 9.959025122137844e-06, 'epoch': 0.41}
14%|█▎ | 1576/11526 [16:26<1:41:52, 1.63it/s] 14%|█▎ | 1577/11526 [16:26<1:41:52, 1.63it/s] {'loss': 0.3613, 'grad_norm': 0.7180606126785278, 'learning_rate': 9.95883142510186e-06, 'epoch': 0.41}
14%|█▎ | 1577/11526 [16:26<1:41:52, 1.63it/s] 14%|█▎ | 1578/11526 [16:27<1:41:55, 1.63it/s] {'loss': 0.3515, 'grad_norm': 0.6593069434165955, 'learning_rate': 9.958637273213643e-06, 'epoch': 0.41}
14%|█▎ | 1578/11526 [16:27<1:41:55, 1.63it/s] 14%|█▎ | 1579/11526 [16:27<1:41:51, 1.63it/s] {'loss': 0.3111, 'grad_norm': 0.6704108715057373, 'learning_rate': 9.958442666491001e-06, 'epoch': 0.41}
14%|█▎ | 1579/11526 [16:27<1:41:51, 1.63it/s] 14%|█▎ | 1580/11526 [16:28<1:41:49, 1.63it/s] {'loss': 0.2613, 'grad_norm': 0.6050119996070862, 'learning_rate': 9.958247604951786e-06, 'epoch': 0.41}
14%|█▎ | 1580/11526 [16:28<1:41:49, 1.63it/s] 14%|█▎ | 1581/11526 [16:29<1:41:47, 1.63it/s] {'loss': 0.2085, 'grad_norm': 0.4987117350101471, 'learning_rate': 9.958052088613889e-06, 'epoch': 0.41}
14%|█▎ | 1581/11526 [16:29<1:41:47, 1.63it/s] 14%|█▎ | 1582/11526 [16:29<1:41:49, 1.63it/s] {'loss': 0.2568, 'grad_norm': 0.5666993260383606, 'learning_rate': 9.957856117495243e-06, 'epoch': 0.41}
14%|█▎ | 1582/11526 [16:29<1:41:49, 1.63it/s] 14%|█▎ | 1583/11526 [16:30<1:41:47, 1.63it/s] {'loss': 0.2971, 'grad_norm': 0.6332944631576538, 'learning_rate': 9.957659691613829e-06, 'epoch': 0.41}
14%|█▎ | 1583/11526 [16:30<1:41:47, 1.63it/s] 14%|█▎ | 1584/11526 [16:30<1:41:45, 1.63it/s] {'loss': 0.3388, 'grad_norm': 0.7260019183158875, 'learning_rate': 9.957462810987657e-06, 'epoch': 0.41}
14%|█▎ | 1584/11526 [16:31<1:41:45, 1.63it/s] 14%|█▍ | 1585/11526 [16:31<1:41:44, 1.63it/s] {'loss': 0.2899, 'grad_norm': 0.5673947930335999, 'learning_rate': 9.95726547563479e-06, 'epoch': 0.41}
14%|█▍ | 1585/11526 [16:31<1:41:44, 1.63it/s] 14%|█▍ | 1586/11526 [16:32<1:41:49, 1.63it/s] {'loss': 0.2866, 'grad_norm': 0.6105451583862305, 'learning_rate': 9.957067685573327e-06, 'epoch': 0.41}
14%|█▍ | 1586/11526 [16:32<1:41:49, 1.63it/s] 14%|█▍ | 1587/11526 [16:32<1:41:44, 1.63it/s] {'loss': 0.2725, 'grad_norm': 0.6170808672904968, 'learning_rate': 9.956869440821412e-06, 'epoch': 0.41}
14%|█▍ | 1587/11526 [16:32<1:41:44, 1.63it/s] 14%|█▍ | 1588/11526 [16:33<1:41:39, 1.63it/s] {'loss': 0.3161, 'grad_norm': 0.6770796179771423, 'learning_rate': 9.956670741397227e-06, 'epoch': 0.41}
14%|█▍ | 1588/11526 [16:33<1:41:39, 1.63it/s] 14%|█▍ | 1589/11526 [16:33<1:41:45, 1.63it/s] {'loss': 0.2974, 'grad_norm': 0.9351621866226196, 'learning_rate': 9.956471587319001e-06, 'epoch': 0.41}
14%|█▍ | 1589/11526 [16:34<1:41:45, 1.63it/s] 14%|█▍ | 1590/11526 [16:34<1:41:44, 1.63it/s] {'loss': 0.2881, 'grad_norm': 0.6564651131629944, 'learning_rate': 9.956271978605e-06, 'epoch': 0.41}
14%|█▍ | 1590/11526 [16:34<1:41:44, 1.63it/s] 14%|█▍ | 1591/11526 [16:35<1:41:40, 1.63it/s] {'loss': 0.3153, 'grad_norm': 0.6644666194915771, 'learning_rate': 9.956071915273533e-06, 'epoch': 0.41}
14%|█▍ | 1591/11526 [16:35<1:41:40, 1.63it/s] 14%|█▍ | 1592/11526 [16:35<1:41:40, 1.63it/s] {'loss': 0.3385, 'grad_norm': 0.6536378860473633, 'learning_rate': 9.95587139734295e-06, 'epoch': 0.41}
14%|█▍ | 1592/11526 [16:35<1:41:40, 1.63it/s] 14%|█▍ | 1593/11526 [16:36<1:41:38, 1.63it/s] {'loss': 0.3371, 'grad_norm': 0.6390582919120789, 'learning_rate': 9.955670424831646e-06, 'epoch': 0.41}
14%|█▍ | 1593/11526 [16:36<1:41:38, 1.63it/s] 14%|█▍ | 1594/11526 [16:37<1:41:41, 1.63it/s] {'loss': 0.4087, 'grad_norm': 0.7096782326698303, 'learning_rate': 9.955468997758053e-06, 'epoch': 0.41}
14%|█▍ | 1594/11526 [16:37<1:41:41, 1.63it/s] 14%|█▍ | 1595/11526 [16:37<1:41:40, 1.63it/s] {'loss': 0.2612, 'grad_norm': 0.6987845301628113, 'learning_rate': 9.955267116140649e-06, 'epoch': 0.42}
14%|█▍ | 1595/11526 [16:37<1:41:40, 1.63it/s] 14%|█▍ | 1596/11526 [16:38<1:41:36, 1.63it/s] {'loss': 0.2819, 'grad_norm': 0.6115595698356628, 'learning_rate': 9.955064779997949e-06, 'epoch': 0.42}
14%|█▍ | 1596/11526 [16:38<1:41:36, 1.63it/s] 14%|█▍ | 1597/11526 [16:38<1:41:37, 1.63it/s] {'loss': 0.3167, 'grad_norm': 0.7143731713294983, 'learning_rate': 9.954861989348516e-06, 'epoch': 0.42}
14%|█▍ | 1597/11526 [16:39<1:41:37, 1.63it/s] 14%|█▍ | 1598/11526 [16:39<1:41:35, 1.63it/s] {'loss': 0.353, 'grad_norm': 0.7060030102729797, 'learning_rate': 9.95465874421095e-06, 'epoch': 0.42}
14%|█▍ | 1598/11526 [16:39<1:41:35, 1.63it/s] 14%|█▍ | 1599/11526 [16:40<1:41:36, 1.63it/s] {'loss': 0.3137, 'grad_norm': 0.6623069047927856, 'learning_rate': 9.954455044603892e-06, 'epoch': 0.42}
14%|█▍ | 1599/11526 [16:40<1:41:36, 1.63it/s] 14%|█▍ | 1600/11526 [16:40<1:41:37, 1.63it/s] {'loss': 0.3055, 'grad_norm': 0.6095805764198303, 'learning_rate': 9.954250890546026e-06, 'epoch': 0.42}
14%|█▍ | 1600/11526 [16:40<1:41:37, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.26it/s]
31%|███ | 4/13 [00:00<00:01, 8.40it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.91it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.78it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.7429405450820923, 'eval_runtime': 1.9543, 'eval_samples_per_second': 102.341, 'eval_steps_per_second': 6.652, 'epoch': 0.42}
14%|█▍ | 1600/11526 [16:42<1:41:37, 1.63it/s]
 14%|█▍ | 1601/11526 [16:43<3:18:52, 1.20s/it] {'loss': 0.3223, 'grad_norm': 0.7427825331687927, 'learning_rate': 9.954046282056082e-06, 'epoch': 0.42}
14%|█▍ | 1601/11526 [16:43<3:18:52, 1.20s/it] 14%|█▍ | 1602/11526 [16:43<2:49:38, 1.03s/it] {'loss': 0.3091, 'grad_norm': 0.734635591506958, 'learning_rate': 9.953841219152826e-06, 'epoch': 0.42}
14%|█▍ | 1602/11526 [16:44<2:49:38, 1.03s/it] 14%|█▍ | 1603/11526 [16:44<2:29:10, 1.11it/s] {'loss': 0.2954, 'grad_norm': 0.7873830199241638, 'learning_rate': 9.953635701855066e-06, 'epoch': 0.42}
14%|█▍ | 1603/11526 [16:44<2:29:10, 1.11it/s] 14%|█▍ | 1604/11526 [16:45<2:14:51, 1.23it/s] {'loss': 0.339, 'grad_norm': 0.7668554782867432, 'learning_rate': 9.953429730181653e-06, 'epoch': 0.42}
14%|█▍ | 1604/11526 [16:45<2:14:51, 1.23it/s] 14%|█▍ | 1605/11526 [16:45<2:04:47, 1.33it/s] {'loss': 0.2612, 'grad_norm': 0.5613839626312256, 'learning_rate': 9.953223304151486e-06, 'epoch': 0.42}
14%|█▍ | 1605/11526 [16:45<2:04:47, 1.33it/s] 14%|█▍ | 1606/11526 [16:46<1:57:45, 1.40it/s] {'loss': 0.2856, 'grad_norm': 0.6514965891838074, 'learning_rate': 9.95301642378349e-06, 'epoch': 0.42}
14%|█▍ | 1606/11526 [16:46<1:57:45, 1.40it/s] 14%|█▍ | 1607/11526 [16:47<1:52:52, 1.46it/s] {'loss': 0.2952, 'grad_norm': 0.6162201762199402, 'learning_rate': 9.952809089096648e-06, 'epoch': 0.42}
14%|█▍ | 1607/11526 [16:47<1:52:52, 1.46it/s] 14%|█▍ | 1608/11526 [16:47<1:49:27, 1.51it/s] {'loss': 0.2745, 'grad_norm': 0.535861074924469, 'learning_rate': 9.952601300109976e-06, 'epoch': 0.42}
14%|█▍ | 1608/11526 [16:47<1:49:27, 1.51it/s] 14%|█▍ | 1609/11526 [16:48<1:47:02, 1.54it/s] {'loss': 0.2867, 'grad_norm': 0.6220869421958923, 'learning_rate': 9.952393056842534e-06, 'epoch': 0.42}
14%|█▍ | 1609/11526 [16:48<1:47:02, 1.54it/s] 14%|█▍ | 1610/11526 [16:48<1:45:21, 1.57it/s] {'loss': 0.342, 'grad_norm': 0.5927568078041077, 'learning_rate': 9.952184359313422e-06, 'epoch': 0.42}
14%|█▍ | 1610/11526 [16:48<1:45:21, 1.57it/s] 14%|█▍ | 1611/11526 [16:49<1:44:09, 1.59it/s] {'loss': 0.3388, 'grad_norm': 0.6408463716506958, 'learning_rate': 9.951975207541784e-06, 'epoch': 0.42}
14%|█▍ | 1611/11526 [16:49<1:44:09, 1.59it/s] 14%|█▍ | 1612/11526 [16:50<1:43:17, 1.60it/s] {'loss': 0.3573, 'grad_norm': 0.633581817150116, 'learning_rate': 9.951765601546805e-06, 'epoch': 0.42}
14%|█▍ | 1612/11526 [16:50<1:43:17, 1.60it/s] 14%|█▍ | 1613/11526 [16:50<1:42:44, 1.61it/s] {'loss': 0.3076, 'grad_norm': 0.7041019201278687, 'learning_rate': 9.95155554134771e-06, 'epoch': 0.42}
14%|█▍ | 1613/11526 [16:50<1:42:44, 1.61it/s] 14%|█▍ | 1614/11526 [16:51<1:42:18, 1.61it/s] {'loss': 0.2628, 'grad_norm': 0.5862003564834595, 'learning_rate': 9.951345026963768e-06, 'epoch': 0.42}
14%|█▍ | 1614/11526 [16:51<1:42:18, 1.61it/s] 14%|█▍ | 1615/11526 [16:51<1:42:03, 1.62it/s] {'loss': 0.3058, 'grad_norm': 0.6540805697441101, 'learning_rate': 9.951134058414289e-06, 'epoch': 0.42}
14%|█▍ | 1615/11526 [16:52<1:42:03, 1.62it/s] 14%|█▍ | 1616/11526 [16:52<1:41:50, 1.62it/s] {'loss': 0.273, 'grad_norm': 0.669191837310791, 'learning_rate': 9.950922635718622e-06, 'epoch': 0.42}
14%|█▍ | 1616/11526 [16:52<1:41:50, 1.62it/s] 14%|█▍ | 1617/11526 [16:53<1:41:45, 1.62it/s] {'loss': 0.283, 'grad_norm': 0.6095407009124756, 'learning_rate': 9.95071075889616e-06, 'epoch': 0.42}
14%|█▍ | 1617/11526 [16:53<1:41:45, 1.62it/s] 14%|█▍ | 1618/11526 [16:53<1:41:38, 1.62it/s] {'loss': 0.286, 'grad_norm': 0.5453087687492371, 'learning_rate': 9.950498427966341e-06, 'epoch': 0.42}
14%|█▍ | 1618/11526 [16:53<1:41:38, 1.62it/s] 14%|█▍ | 1619/11526 [16:54<1:41:31, 1.63it/s] {'loss': 0.3656, 'grad_norm': 0.7548046708106995, 'learning_rate': 9.950285642948638e-06, 'epoch': 0.42}
14%|█▍ | 1619/11526 [16:54<1:41:31, 1.63it/s] 14%|█▍ | 1620/11526 [16:55<1:46:48, 1.55it/s] {'loss': 0.2496, 'grad_norm': 0.521570086479187, 'learning_rate': 9.950072403862572e-06, 'epoch': 0.42}
14%|█▍ | 1620/11526 [16:55<1:46:48, 1.55it/s] 14%|█▍ | 1621/11526 [16:55<1:45:17, 1.57it/s] {'loss': 0.3196, 'grad_norm': 0.5962499380111694, 'learning_rate': 9.949858710727698e-06, 'epoch': 0.42}
14%|█▍ | 1621/11526 [16:55<1:45:17, 1.57it/s] 14%|█▍ | 1622/11526 [16:56<1:44:12, 1.58it/s] {'loss': 0.3519, 'grad_norm': 0.7130600214004517, 'learning_rate': 9.949644563563621e-06, 'epoch': 0.42}
14%|█▍ | 1622/11526 [16:56<1:44:12, 1.58it/s] 14%|█▍ | 1623/11526 [16:56<1:43:23, 1.60it/s] {'loss': 0.2558, 'grad_norm': 0.5599803328514099, 'learning_rate': 9.94942996238998e-06, 'epoch': 0.42}
14%|█▍ | 1623/11526 [16:57<1:43:23, 1.60it/s] 14%|█▍ | 1624/11526 [16:57<1:42:44, 1.61it/s] {'loss': 0.4304, 'grad_norm': 0.7020969390869141, 'learning_rate': 9.949214907226464e-06, 'epoch': 0.42}
14%|█▍ | 1624/11526 [16:57<1:42:44, 1.61it/s] 14%|█▍ | 1625/11526 [16:58<1:42:18, 1.61it/s] {'loss': 0.3196, 'grad_norm': 0.7404207587242126, 'learning_rate': 9.948999398092796e-06, 'epoch': 0.42}
14%|█▍ | 1625/11526 [16:58<1:42:18, 1.61it/s] 14%|█▍ | 1626/11526 [16:58<1:42:02, 1.62it/s] {'loss': 0.2876, 'grad_norm': 0.5772219300270081, 'learning_rate': 9.948783435008744e-06, 'epoch': 0.42}
14%|█▍ | 1626/11526 [16:58<1:42:02, 1.62it/s] 14%|█▍ | 1627/11526 [16:59<1:41:49, 1.62it/s] {'loss': 0.3097, 'grad_norm': 0.6458829641342163, 'learning_rate': 9.94856701799412e-06, 'epoch': 0.42}
14%|█▍ | 1627/11526 [16:59<1:41:49, 1.62it/s] 14%|█▍ | 1628/11526 [17:00<1:41:38, 1.62it/s] {'loss': 0.3083, 'grad_norm': 0.6192994117736816, 'learning_rate': 9.94835014706877e-06, 'epoch': 0.42}
14%|█▍ | 1628/11526 [17:00<1:41:38, 1.62it/s] 14%|█▍ | 1629/11526 [17:00<1:41:35, 1.62it/s] {'loss': 0.3604, 'grad_norm': 0.6595682501792908, 'learning_rate': 9.948132822252592e-06, 'epoch': 0.42}
14%|█▍ | 1629/11526 [17:00<1:41:35, 1.62it/s] 14%|█▍ | 1630/11526 [17:01<1:41:29, 1.63it/s] {'loss': 0.2935, 'grad_norm': 0.5797795057296753, 'learning_rate': 9.947915043565516e-06, 'epoch': 0.42}
14%|█▍ | 1630/11526 [17:01<1:41:29, 1.63it/s] 14%|█▍ | 1631/11526 [17:01<1:41:26, 1.63it/s] {'loss': 0.26, 'grad_norm': 0.5381667613983154, 'learning_rate': 9.94769681102752e-06, 'epoch': 0.42}
14%|█▍ | 1631/11526 [17:01<1:41:26, 1.63it/s] 14%|█▍ | 1632/11526 [17:02<1:41:27, 1.63it/s] {'loss': 0.2988, 'grad_norm': 0.6840410232543945, 'learning_rate': 9.947478124658622e-06, 'epoch': 0.42}
14%|█▍ | 1632/11526 [17:02<1:41:27, 1.63it/s] 14%|█▍ | 1633/11526 [17:03<1:41:32, 1.62it/s] {'loss': 0.3186, 'grad_norm': 0.6511109471321106, 'learning_rate': 9.947258984478882e-06, 'epoch': 0.43}
14%|█▍ | 1633/11526 [17:03<1:41:32, 1.62it/s] 14%|█▍ | 1634/11526 [17:03<1:41:26, 1.63it/s] {'loss': 0.2478, 'grad_norm': 0.5933401584625244, 'learning_rate': 9.947039390508397e-06, 'epoch': 0.43}
14%|█▍ | 1634/11526 [17:03<1:41:26, 1.63it/s] 14%|█▍ | 1635/11526 [17:04<1:41:20, 1.63it/s] {'loss': 0.2632, 'grad_norm': 0.5587055683135986, 'learning_rate': 9.946819342767313e-06, 'epoch': 0.43}
14%|█▍ | 1635/11526 [17:04<1:41:20, 1.63it/s] 14%|█▍ | 1636/11526 [17:04<1:41:13, 1.63it/s] {'loss': 0.3108, 'grad_norm': 0.7572540044784546, 'learning_rate': 9.946598841275812e-06, 'epoch': 0.43}
14%|█▍ | 1636/11526 [17:05<1:41:13, 1.63it/s] 14%|█▍ | 1637/11526 [17:05<1:41:10, 1.63it/s] {'loss': 0.3174, 'grad_norm': 0.6526702642440796, 'learning_rate': 9.94637788605412e-06, 'epoch': 0.43}
14%|█▍ | 1637/11526 [17:05<1:41:10, 1.63it/s] 14%|█▍ | 1638/11526 [17:06<1:41:22, 1.63it/s] {'loss': 0.3545, 'grad_norm': 0.664218008518219, 'learning_rate': 9.946156477122503e-06, 'epoch': 0.43}
14%|█▍ | 1638/11526 [17:06<1:41:22, 1.63it/s] 14%|█▍ | 1639/11526 [17:06<1:41:18, 1.63it/s] {'loss': 0.2298, 'grad_norm': 0.5754586458206177, 'learning_rate': 9.945934614501274e-06, 'epoch': 0.43}
14%|█▍ | 1639/11526 [17:06<1:41:18, 1.63it/s] 14%|█▍ | 1640/11526 [17:07<1:41:17, 1.63it/s] {'loss': 0.2925, 'grad_norm': 0.5998905897140503, 'learning_rate': 9.94571229821078e-06, 'epoch': 0.43}
14%|█▍ | 1640/11526 [17:07<1:41:17, 1.63it/s] 14%|█▍ | 1641/11526 [17:08<1:41:17, 1.63it/s] {'loss': 0.2758, 'grad_norm': 0.5803450345993042, 'learning_rate': 9.945489528271415e-06, 'epoch': 0.43}
14%|█▍ | 1641/11526 [17:08<1:41:17, 1.63it/s] 14%|█▍ | 1642/11526 [17:08<1:41:13, 1.63it/s] {'loss': 0.2762, 'grad_norm': 0.6439971327781677, 'learning_rate': 9.94526630470361e-06, 'epoch': 0.43}
14%|█▍ | 1642/11526 [17:08<1:41:13, 1.63it/s] 14%|█▍ | 1643/11526 [17:09<1:41:13, 1.63it/s] {'loss': 0.282, 'grad_norm': 0.5945114493370056, 'learning_rate': 9.945042627527844e-06, 'epoch': 0.43}
14%|█▍ | 1643/11526 [17:09<1:41:13, 1.63it/s] 14%|█▍ | 1644/11526 [17:09<1:41:13, 1.63it/s] {'loss': 0.3141, 'grad_norm': 0.6033275127410889, 'learning_rate': 9.94481849676463e-06, 'epoch': 0.43}
14%|█▍ | 1644/11526 [17:09<1:41:13, 1.63it/s] 14%|█▍ | 1645/11526 [17:10<1:41:09, 1.63it/s] {'loss': 0.2566, 'grad_norm': 0.5597129464149475, 'learning_rate': 9.94459391243453e-06, 'epoch': 0.43}
14%|█▍ | 1645/11526 [17:10<1:41:09, 1.63it/s] 14%|█▍ | 1646/11526 [17:11<1:41:10, 1.63it/s] {'loss': 0.236, 'grad_norm': 0.5715410709381104, 'learning_rate': 9.944368874558142e-06, 'epoch': 0.43}
14%|█▍ | 1646/11526 [17:11<1:41:10, 1.63it/s] 14%|█▍ | 1647/11526 [17:11<1:41:06, 1.63it/s] {'loss': 0.2345, 'grad_norm': 0.5653513073921204, 'learning_rate': 9.944143383156109e-06, 'epoch': 0.43}
14%|█▍ | 1647/11526 [17:11<1:41:06, 1.63it/s] 14%|█▍ | 1648/11526 [17:12<1:41:03, 1.63it/s] {'loss': 0.1992, 'grad_norm': 0.5366659164428711, 'learning_rate': 9.943917438249114e-06, 'epoch': 0.43}
14%|█▍ | 1648/11526 [17:12<1:41:03, 1.63it/s] 14%|█▍ | 1649/11526 [17:12<1:41:06, 1.63it/s] {'loss': 0.2866, 'grad_norm': 0.6606447100639343, 'learning_rate': 9.94369103985788e-06, 'epoch': 0.43}
14%|█▍ | 1649/11526 [17:13<1:41:06, 1.63it/s] 14%|█▍ | 1650/11526 [17:13<1:41:05, 1.63it/s] {'loss': 0.3062, 'grad_norm': 0.6068494319915771, 'learning_rate': 9.94346418800318e-06, 'epoch': 0.43}
14%|█▍ | 1650/11526 [17:13<1:41:05, 1.63it/s] 14%|█▍ | 1651/11526 [17:14<1:41:03, 1.63it/s] {'loss': 0.3207, 'grad_norm': 0.673321008682251, 'learning_rate': 9.943236882705813e-06, 'epoch': 0.43}
14%|█▍ | 1651/11526 [17:14<1:41:03, 1.63it/s] 14%|█▍ | 1652/11526 [17:14<1:41:01, 1.63it/s] {'loss': 0.2761, 'grad_norm': 0.7097437381744385, 'learning_rate': 9.943009123986636e-06, 'epoch': 0.43}
14%|█▍ | 1652/11526 [17:14<1:41:01, 1.63it/s] 14%|█▍ | 1653/11526 [17:15<1:41:00, 1.63it/s] {'loss': 0.2903, 'grad_norm': 0.6128035187721252, 'learning_rate': 9.942780911866536e-06, 'epoch': 0.43}
14%|█▍ | 1653/11526 [17:15<1:41:00, 1.63it/s] 14%|█▍ | 1654/11526 [17:15<1:40:57, 1.63it/s] {'loss': 0.3189, 'grad_norm': 0.6037086844444275, 'learning_rate': 9.942552246366449e-06, 'epoch': 0.43}
14%|█▍ | 1654/11526 [17:16<1:40:57, 1.63it/s] 14%|█▍ | 1655/11526 [17:16<1:40:59, 1.63it/s] {'loss': 0.2997, 'grad_norm': 0.6417410373687744, 'learning_rate': 9.942323127507347e-06, 'epoch': 0.43}
14%|█▍ | 1655/11526 [17:16<1:40:59, 1.63it/s] 14%|█▍ | 1656/11526 [17:17<1:41:06, 1.63it/s] {'loss': 0.2754, 'grad_norm': 0.6574549674987793, 'learning_rate': 9.942093555310247e-06, 'epoch': 0.43}
14%|█▍ | 1656/11526 [17:17<1:41:06, 1.63it/s] 14%|█▍ | 1657/11526 [17:17<1:41:02, 1.63it/s] {'loss': 0.289, 'grad_norm': 0.5834922194480896, 'learning_rate': 9.941863529796206e-06, 'epoch': 0.43}
14%|█▍ | 1657/11526 [17:17<1:41:02, 1.63it/s] 14%|█▍ | 1658/11526 [17:18<1:40:59, 1.63it/s] {'loss': 0.2586, 'grad_norm': 0.6523821353912354, 'learning_rate': 9.941633050986325e-06, 'epoch': 0.43}
14%|█▍ | 1658/11526 [17:18<1:40:59, 1.63it/s] 14%|█▍ | 1659/11526 [17:19<1:41:02, 1.63it/s] {'loss': 0.2896, 'grad_norm': 0.602878987789154, 'learning_rate': 9.941402118901743e-06, 'epoch': 0.43}
14%|█▍ | 1659/11526 [17:19<1:41:02, 1.63it/s] 14%|█▍ | 1660/11526 [17:19<1:40:58, 1.63it/s] {'loss': 0.1897, 'grad_norm': 0.5652886033058167, 'learning_rate': 9.941170733563645e-06, 'epoch': 0.43}
14%|█▍ | 1660/11526 [17:19<1:40:58, 1.63it/s] 14%|█▍ | 1661/11526 [17:20<1:41:23, 1.62it/s] {'loss': 0.2338, 'grad_norm': 0.5661214590072632, 'learning_rate': 9.940938894993251e-06, 'epoch': 0.43}
14%|█▍ | 1661/11526 [17:20<1:41:23, 1.62it/s] 14%|█▍ | 1662/11526 [17:20<1:41:17, 1.62it/s] {'loss': 0.2692, 'grad_norm': 0.6314307451248169, 'learning_rate': 9.940706603211831e-06, 'epoch': 0.43}
14%|█▍ | 1662/11526 [17:21<1:41:17, 1.62it/s] 14%|█▍ | 1663/11526 [17:21<1:41:12, 1.62it/s] {'loss': 0.2927, 'grad_norm': 0.6654487252235413, 'learning_rate': 9.94047385824069e-06, 'epoch': 0.43}
14%|█▍ | 1663/11526 [17:21<1:41:12, 1.62it/s] 14%|█▍ | 1664/11526 [17:22<1:41:07, 1.63it/s] {'loss': 0.2661, 'grad_norm': 0.5393458604812622, 'learning_rate': 9.940240660101173e-06, 'epoch': 0.43}
14%|█▍ | 1664/11526 [17:22<1:41:07, 1.63it/s] 14%|█▍ | 1665/11526 [17:22<1:41:02, 1.63it/s] {'loss': 0.3112, 'grad_norm': 0.6225635409355164, 'learning_rate': 9.940007008814676e-06, 'epoch': 0.43}
14%|█▍ | 1665/11526 [17:22<1:41:02, 1.63it/s] 14%|█▍ | 1666/11526 [17:23<1:41:05, 1.63it/s] {'loss': 0.3286, 'grad_norm': 0.643736720085144, 'learning_rate': 9.939772904402629e-06, 'epoch': 0.43}
14%|█▍ | 1666/11526 [17:23<1:41:05, 1.63it/s] 14%|█▍ | 1667/11526 [17:23<1:41:00, 1.63it/s] {'loss': 0.3184, 'grad_norm': 0.6156187653541565, 'learning_rate': 9.939538346886507e-06, 'epoch': 0.43}
14%|█▍ | 1667/11526 [17:24<1:41:00, 1.63it/s] 14%|█▍ | 1668/11526 [17:24<1:40:56, 1.63it/s] {'loss': 0.3718, 'grad_norm': 0.6801180243492126, 'learning_rate': 9.93930333628782e-06, 'epoch': 0.43}
14%|█▍ | 1668/11526 [17:24<1:40:56, 1.63it/s] 14%|█▍ | 1669/11526 [17:25<1:40:51, 1.63it/s] {'loss': 0.3187, 'grad_norm': 0.6757953763008118, 'learning_rate': 9.939067872628127e-06, 'epoch': 0.43}
14%|█▍ | 1669/11526 [17:25<1:40:51, 1.63it/s] 14%|█▍ | 1670/11526 [17:25<1:40:49, 1.63it/s] {'loss': 0.2414, 'grad_norm': 0.5517905354499817, 'learning_rate': 9.93883195592903e-06, 'epoch': 0.43}
14%|█▍ | 1670/11526 [17:25<1:40:49, 1.63it/s] 14%|█▍ | 1671/11526 [17:26<1:41:00, 1.63it/s] {'loss': 0.4167, 'grad_norm': 0.6973585486412048, 'learning_rate': 9.938595586212163e-06, 'epoch': 0.43}
14%|█▍ | 1671/11526 [17:26<1:41:00, 1.63it/s] 15%|█▍ | 1672/11526 [17:27<1:40:56, 1.63it/s] {'loss': 0.2822, 'grad_norm': 0.5436526536941528, 'learning_rate': 9.93835876349921e-06, 'epoch': 0.44}
15%|█▍ | 1672/11526 [17:27<1:40:56, 1.63it/s] 15%|█▍ | 1673/11526 [17:27<1:41:04, 1.62it/s] {'loss': 0.3319, 'grad_norm': 0.5908645987510681, 'learning_rate': 9.938121487811892e-06, 'epoch': 0.44}
15%|█▍ | 1673/11526 [17:27<1:41:04, 1.62it/s] 15%|█▍ | 1674/11526 [17:28<1:41:11, 1.62it/s] {'loss': 0.3181, 'grad_norm': 0.7072334885597229, 'learning_rate': 9.937883759171975e-06, 'epoch': 0.44}
15%|█▍ | 1674/11526 [17:28<1:41:11, 1.62it/s] 15%|█▍ | 1675/11526 [17:28<1:41:08, 1.62it/s] {'loss': 0.3831, 'grad_norm': 0.6936294436454773, 'learning_rate': 9.937645577601265e-06, 'epoch': 0.44}
15%|█▍ | 1675/11526 [17:29<1:41:08, 1.62it/s] 15%|█▍ | 1676/11526 [17:29<1:41:08, 1.62it/s] {'loss': 0.3004, 'grad_norm': 0.6793572306632996, 'learning_rate': 9.937406943121609e-06, 'epoch': 0.44}
15%|█▍ | 1676/11526 [17:29<1:41:08, 1.62it/s] 15%|█▍ | 1677/11526 [17:30<1:41:01, 1.62it/s] {'loss': 0.3245, 'grad_norm': 0.6879299879074097, 'learning_rate': 9.937167855754893e-06, 'epoch': 0.44}
15%|█▍ | 1677/11526 [17:30<1:41:01, 1.62it/s] 15%|█▍ | 1678/11526 [17:30<1:41:03, 1.62it/s] {'loss': 0.333, 'grad_norm': 0.745814859867096, 'learning_rate': 9.93692831552305e-06, 'epoch': 0.44}
15%|█▍ | 1678/11526 [17:30<1:41:03, 1.62it/s] 15%|█▍ | 1679/11526 [17:31<1:46:18, 1.54it/s] {'loss': 0.2971, 'grad_norm': 0.6450758576393127, 'learning_rate': 9.936688322448053e-06, 'epoch': 0.44}
15%|█▍ | 1679/11526 [17:31<1:46:18, 1.54it/s] 15%|█▍ | 1680/11526 [17:32<1:44:37, 1.57it/s] {'loss': 0.2892, 'grad_norm': 0.631404459476471, 'learning_rate': 9.936447876551916e-06, 'epoch': 0.44}
15%|█▍ | 1680/11526 [17:32<1:44:37, 1.57it/s] 15%|█▍ | 1681/11526 [17:32<1:43:27, 1.59it/s] {'loss': 0.2379, 'grad_norm': 0.5383914709091187, 'learning_rate': 9.936206977856691e-06, 'epoch': 0.44}
15%|█▍ | 1681/11526 [17:32<1:43:27, 1.59it/s] 15%|█▍ | 1682/11526 [17:33<1:42:36, 1.60it/s] {'loss': 0.3326, 'grad_norm': 0.6529980301856995, 'learning_rate': 9.935965626384477e-06, 'epoch': 0.44}
15%|█▍ | 1682/11526 [17:33<1:42:36, 1.60it/s] 15%|█▍ | 1683/11526 [17:33<1:42:00, 1.61it/s] {'loss': 0.2675, 'grad_norm': 0.6179131269454956, 'learning_rate': 9.93572382215741e-06, 'epoch': 0.44}
15%|█▍ | 1683/11526 [17:34<1:42:00, 1.61it/s] 15%|█▍ | 1684/11526 [17:34<1:41:35, 1.61it/s] {'loss': 0.3748, 'grad_norm': 0.6495605111122131, 'learning_rate': 9.935481565197674e-06, 'epoch': 0.44}
15%|█▍ | 1684/11526 [17:34<1:41:35, 1.61it/s] 15%|█▍ | 1685/11526 [17:35<1:46:38, 1.54it/s] {'loss': 0.2186, 'grad_norm': 0.5998679995536804, 'learning_rate': 9.935238855527483e-06, 'epoch': 0.44}
15%|█▍ | 1685/11526 [17:35<1:46:38, 1.54it/s] 15%|█▍ | 1686/11526 [17:35<1:44:55, 1.56it/s] {'loss': 0.2623, 'grad_norm': 0.5565500855445862, 'learning_rate': 9.934995693169104e-06, 'epoch': 0.44}
15%|█▍ | 1686/11526 [17:36<1:44:55, 1.56it/s] 15%|█▍ | 1687/11526 [17:36<1:43:38, 1.58it/s] {'loss': 0.3087, 'grad_norm': 0.6063162684440613, 'learning_rate': 9.934752078144844e-06, 'epoch': 0.44}
15%|█▍ | 1687/11526 [17:36<1:43:38, 1.58it/s] 15%|█▍ | 1688/11526 [17:37<1:42:42, 1.60it/s] {'loss': 0.2693, 'grad_norm': 0.6883452534675598, 'learning_rate': 9.934508010477043e-06, 'epoch': 0.44}
15%|█▍ | 1688/11526 [17:37<1:42:42, 1.60it/s] 15%|█▍ | 1689/11526 [17:37<1:42:06, 1.61it/s] {'loss': 0.2938, 'grad_norm': 0.5908753871917725, 'learning_rate': 9.934263490188095e-06, 'epoch': 0.44}
15%|█▍ | 1689/11526 [17:37<1:42:06, 1.61it/s] 15%|█▍ | 1690/11526 [17:38<1:41:39, 1.61it/s] {'loss': 0.3853, 'grad_norm': 0.6287939548492432, 'learning_rate': 9.934018517300422e-06, 'epoch': 0.44}
15%|█▍ | 1690/11526 [17:38<1:41:39, 1.61it/s] 15%|█▍ | 1691/11526 [17:38<1:41:27, 1.62it/s] {'loss': 0.2853, 'grad_norm': 0.565840482711792, 'learning_rate': 9.933773091836497e-06, 'epoch': 0.44}
15%|█▍ | 1691/11526 [17:39<1:41:27, 1.62it/s] 15%|█▍ | 1692/11526 [17:39<1:41:14, 1.62it/s] {'loss': 0.243, 'grad_norm': 0.6123592853546143, 'learning_rate': 9.933527213818834e-06, 'epoch': 0.44}
15%|█▍ | 1692/11526 [17:39<1:41:14, 1.62it/s] 15%|█▍ | 1693/11526 [17:40<1:41:02, 1.62it/s] {'loss': 0.2802, 'grad_norm': 0.6272918581962585, 'learning_rate': 9.933280883269983e-06, 'epoch': 0.44}
15%|█▍ | 1693/11526 [17:40<1:41:02, 1.62it/s] 15%|█▍ | 1694/11526 [17:40<1:40:56, 1.62it/s] {'loss': 0.2467, 'grad_norm': 0.6215671896934509, 'learning_rate': 9.93303410021254e-06, 'epoch': 0.44}
15%|█▍ | 1694/11526 [17:40<1:40:56, 1.62it/s] 15%|█▍ | 1695/11526 [17:41<1:40:50, 1.62it/s] {'loss': 0.3617, 'grad_norm': 0.8227088451385498, 'learning_rate': 9.93278686466914e-06, 'epoch': 0.44}
15%|█▍ | 1695/11526 [17:41<1:40:50, 1.62it/s] 15%|█▍ | 1696/11526 [17:42<1:41:22, 1.62it/s] {'loss': 0.303, 'grad_norm': 0.6280375123023987, 'learning_rate': 9.932539176662465e-06, 'epoch': 0.44}
15%|█▍ | 1696/11526 [17:42<1:41:22, 1.62it/s] 15%|█▍ | 1697/11526 [17:42<1:41:17, 1.62it/s] {'loss': 0.2846, 'grad_norm': 0.6507154703140259, 'learning_rate': 9.932291036215231e-06, 'epoch': 0.44}
15%|█▍ | 1697/11526 [17:42<1:41:17, 1.62it/s] 15%|█▍ | 1698/11526 [17:43<1:41:04, 1.62it/s] {'loss': 0.3162, 'grad_norm': 0.603904128074646, 'learning_rate': 9.932042443350198e-06, 'epoch': 0.44}
15%|█▍ | 1698/11526 [17:43<1:41:04, 1.62it/s] 15%|█▍ | 1699/11526 [17:43<1:40:55, 1.62it/s] {'loss': 0.3063, 'grad_norm': 0.5522749423980713, 'learning_rate': 9.931793398090172e-06, 'epoch': 0.44}
15%|█▍ | 1699/11526 [17:44<1:40:55, 1.62it/s] 15%|█▍ | 1700/11526 [17:44<1:40:50, 1.62it/s] {'loss': 0.3257, 'grad_norm': 0.6512362360954285, 'learning_rate': 9.931543900457994e-06, 'epoch': 0.44}
15%|█▍ | 1700/11526 [17:44<1:40:50, 1.62it/s] 15%|█▍ | 1701/11526 [17:45<1:40:56, 1.62it/s] {'loss': 0.2385, 'grad_norm': 0.5783429741859436, 'learning_rate': 9.93129395047655e-06, 'epoch': 0.44}
15%|█▍ | 1701/11526 [17:45<1:40:56, 1.62it/s] 15%|█▍ | 1702/11526 [17:45<1:40:55, 1.62it/s] {'loss': 0.2674, 'grad_norm': 0.6066803932189941, 'learning_rate': 9.931043548168767e-06, 'epoch': 0.44}
15%|█▍ | 1702/11526 [17:45<1:40:55, 1.62it/s] 15%|█▍ | 1703/11526 [17:46<1:40:47, 1.62it/s] {'loss': 0.2291, 'grad_norm': 0.5783527493476868, 'learning_rate': 9.930792693557614e-06, 'epoch': 0.44}
15%|█▍ | 1703/11526 [17:46<1:40:47, 1.62it/s] 15%|█▍ | 1704/11526 [17:46<1:40:40, 1.63it/s] {'loss': 0.2081, 'grad_norm': 0.6737332940101624, 'learning_rate': 9.9305413866661e-06, 'epoch': 0.44}
15%|█▍ | 1704/11526 [17:47<1:40:40, 1.63it/s] 15%|█▍ | 1705/11526 [17:47<1:40:35, 1.63it/s] {'loss': 0.2866, 'grad_norm': 0.5831177830696106, 'learning_rate': 9.930289627517275e-06, 'epoch': 0.44}
15%|█▍ | 1705/11526 [17:47<1:40:35, 1.63it/s] 15%|█▍ | 1706/11526 [17:48<1:40:43, 1.63it/s] {'loss': 0.2832, 'grad_norm': 0.6552446484565735, 'learning_rate': 9.930037416134235e-06, 'epoch': 0.44}
15%|█▍ | 1706/11526 [17:48<1:40:43, 1.63it/s] 15%|█▍ | 1707/11526 [17:48<1:40:38, 1.63it/s] {'loss': 0.2586, 'grad_norm': 0.7467808127403259, 'learning_rate': 9.92978475254011e-06, 'epoch': 0.44}
15%|█▍ | 1707/11526 [17:48<1:40:38, 1.63it/s] 15%|█▍ | 1708/11526 [17:49<1:40:32, 1.63it/s] {'loss': 0.2205, 'grad_norm': 0.6265599727630615, 'learning_rate': 9.92953163675808e-06, 'epoch': 0.44}
15%|█▍ | 1708/11526 [17:49<1:40:32, 1.63it/s] 15%|█▍ | 1709/11526 [17:50<1:40:33, 1.63it/s] {'loss': 0.3181, 'grad_norm': 0.71543288230896, 'learning_rate': 9.92927806881136e-06, 'epoch': 0.44}
15%|█▍ | 1709/11526 [17:50<1:40:33, 1.63it/s] 15%|█▍ | 1710/11526 [17:50<1:40:30, 1.63it/s] {'loss': 0.3275, 'grad_norm': 0.6159385442733765, 'learning_rate': 9.92902404872321e-06, 'epoch': 0.45}
15%|█▍ | 1710/11526 [17:50<1:40:30, 1.63it/s] 15%|█▍ | 1711/11526 [17:51<1:40:33, 1.63it/s] {'loss': 0.2309, 'grad_norm': 0.4954947531223297, 'learning_rate': 9.928769576516928e-06, 'epoch': 0.45}
15%|█▍ | 1711/11526 [17:51<1:40:33, 1.63it/s] 15%|█▍ | 1712/11526 [17:51<1:40:32, 1.63it/s] {'loss': 0.353, 'grad_norm': 0.6723646521568298, 'learning_rate': 9.928514652215857e-06, 'epoch': 0.45}
15%|█▍ | 1712/11526 [17:52<1:40:32, 1.63it/s] 15%|█▍ | 1713/11526 [17:52<1:40:29, 1.63it/s] {'loss': 0.3147, 'grad_norm': 0.6712695360183716, 'learning_rate': 9.928259275843381e-06, 'epoch': 0.45}
15%|█▍ | 1713/11526 [17:52<1:40:29, 1.63it/s] 15%|█▍ | 1714/11526 [17:53<1:40:27, 1.63it/s] {'loss': 0.2804, 'grad_norm': 0.6225553750991821, 'learning_rate': 9.928003447422922e-06, 'epoch': 0.45}
15%|█▍ | 1714/11526 [17:53<1:40:27, 1.63it/s] 15%|█▍ | 1715/11526 [17:53<1:40:29, 1.63it/s] {'loss': 0.2399, 'grad_norm': 0.5644259452819824, 'learning_rate': 9.92774716697795e-06, 'epoch': 0.45}
15%|█▍ | 1715/11526 [17:53<1:40:29, 1.63it/s] 15%|█▍ | 1716/11526 [17:54<1:40:37, 1.62it/s] {'loss': 0.3074, 'grad_norm': 0.6082785129547119, 'learning_rate': 9.927490434531968e-06, 'epoch': 0.45}
15%|█▍ | 1716/11526 [17:54<1:40:37, 1.62it/s] 15%|█▍ | 1717/11526 [17:54<1:40:33, 1.63it/s] {'loss': 0.2829, 'grad_norm': 0.5844021439552307, 'learning_rate': 9.927233250108528e-06, 'epoch': 0.45}
15%|█▍ | 1717/11526 [17:55<1:40:33, 1.63it/s] 15%|█▍ | 1718/11526 [17:55<1:40:27, 1.63it/s] {'loss': 0.2653, 'grad_norm': 0.5790395140647888, 'learning_rate': 9.926975613731218e-06, 'epoch': 0.45}
15%|█▍ | 1718/11526 [17:55<1:40:27, 1.63it/s] 15%|█▍ | 1719/11526 [17:56<1:40:26, 1.63it/s] {'loss': 0.2988, 'grad_norm': 0.6642628312110901, 'learning_rate': 9.926717525423674e-06, 'epoch': 0.45}
15%|█▍ | 1719/11526 [17:56<1:40:26, 1.63it/s] 15%|█▍ | 1720/11526 [17:56<1:40:25, 1.63it/s] {'loss': 0.331, 'grad_norm': 0.8482629060745239, 'learning_rate': 9.926458985209565e-06, 'epoch': 0.45}
15%|█▍ | 1720/11526 [17:56<1:40:25, 1.63it/s] 15%|█▍ | 1721/11526 [17:57<1:40:33, 1.63it/s] {'loss': 0.2459, 'grad_norm': 0.5775266885757446, 'learning_rate': 9.926199993112609e-06, 'epoch': 0.45}
15%|█▍ | 1721/11526 [17:57<1:40:33, 1.63it/s] 15%|█▍ | 1722/11526 [17:58<1:40:34, 1.62it/s] {'loss': 0.3027, 'grad_norm': 0.6293177604675293, 'learning_rate': 9.92594054915656e-06, 'epoch': 0.45}
15%|█▍ | 1722/11526 [17:58<1:40:34, 1.62it/s] 15%|█▍ | 1723/11526 [17:58<1:40:27, 1.63it/s] {'loss': 0.3087, 'grad_norm': 0.598156750202179, 'learning_rate': 9.925680653365214e-06, 'epoch': 0.45}
15%|█▍ | 1723/11526 [17:58<1:40:27, 1.63it/s] 15%|█▍ | 1724/11526 [17:59<1:40:23, 1.63it/s] {'loss': 0.3237, 'grad_norm': 0.6729902625083923, 'learning_rate': 9.925420305762414e-06, 'epoch': 0.45}
15%|█▍ | 1724/11526 [17:59<1:40:23, 1.63it/s] 15%|█▍ | 1725/11526 [17:59<1:40:20, 1.63it/s] {'loss': 0.2409, 'grad_norm': 0.6124634146690369, 'learning_rate': 9.92515950637204e-06, 'epoch': 0.45}
15%|█▍ | 1725/11526 [17:59<1:40:20, 1.63it/s] 15%|█▍ | 1726/11526 [18:00<1:40:25, 1.63it/s] {'loss': 0.3345, 'grad_norm': 0.6183220148086548, 'learning_rate': 9.924898255218013e-06, 'epoch': 0.45}
15%|█▍ | 1726/11526 [18:00<1:40:25, 1.63it/s] 15%|█▍ | 1727/11526 [18:01<1:40:25, 1.63it/s] {'loss': 0.2413, 'grad_norm': 0.5600863099098206, 'learning_rate': 9.924636552324296e-06, 'epoch': 0.45}
15%|█▍ | 1727/11526 [18:01<1:40:25, 1.63it/s] 15%|█▍ | 1728/11526 [18:01<1:40:22, 1.63it/s] {'loss': 0.1901, 'grad_norm': 0.4973384141921997, 'learning_rate': 9.924374397714895e-06, 'epoch': 0.45}
15%|█▍ | 1728/11526 [18:01<1:40:22, 1.63it/s] 15%|█▌ | 1729/11526 [18:02<1:40:20, 1.63it/s] {'loss': 0.2083, 'grad_norm': 0.5226150155067444, 'learning_rate': 9.924111791413856e-06, 'epoch': 0.45}
15%|█▌ | 1729/11526 [18:02<1:40:20, 1.63it/s] 15%|█▌ | 1730/11526 [18:02<1:40:19, 1.63it/s] {'loss': 0.2223, 'grad_norm': 0.5025061368942261, 'learning_rate': 9.923848733445264e-06, 'epoch': 0.45}
15%|█▌ | 1730/11526 [18:03<1:40:19, 1.63it/s] 15%|█▌ | 1731/11526 [18:03<1:40:29, 1.62it/s] {'loss': 0.2939, 'grad_norm': 0.5706218481063843, 'learning_rate': 9.923585223833252e-06, 'epoch': 0.45}
15%|█▌ | 1731/11526 [18:03<1:40:29, 1.62it/s] 15%|█▌ | 1732/11526 [18:04<1:40:29, 1.62it/s] {'loss': 0.2955, 'grad_norm': 0.5636242628097534, 'learning_rate': 9.92332126260199e-06, 'epoch': 0.45}
15%|█▌ | 1732/11526 [18:04<1:40:29, 1.62it/s] 15%|█▌ | 1733/11526 [18:04<1:40:25, 1.63it/s] {'loss': 0.2903, 'grad_norm': 0.6499934792518616, 'learning_rate': 9.923056849775689e-06, 'epoch': 0.45}
15%|█▌ | 1733/11526 [18:04<1:40:25, 1.63it/s] 15%|█▌ | 1734/11526 [18:05<1:40:19, 1.63it/s] {'loss': 0.2807, 'grad_norm': 0.6055626273155212, 'learning_rate': 9.922791985378601e-06, 'epoch': 0.45}
15%|█▌ | 1734/11526 [18:05<1:40:19, 1.63it/s] 15%|█▌ | 1735/11526 [18:06<1:40:17, 1.63it/s] {'loss': 0.2847, 'grad_norm': 0.5560173392295837, 'learning_rate': 9.922526669435024e-06, 'epoch': 0.45}
15%|█▌ | 1735/11526 [18:06<1:40:17, 1.63it/s] 15%|█▌ | 1736/11526 [18:06<1:40:28, 1.62it/s] {'loss': 0.2301, 'grad_norm': 0.5336009860038757, 'learning_rate': 9.922260901969294e-06, 'epoch': 0.45}
15%|█▌ | 1736/11526 [18:06<1:40:28, 1.62it/s] 15%|█▌ | 1737/11526 [18:07<1:40:56, 1.62it/s] {'loss': 0.3013, 'grad_norm': 0.6578713059425354, 'learning_rate': 9.921994683005787e-06, 'epoch': 0.45}
15%|█▌ | 1737/11526 [18:07<1:40:56, 1.62it/s] 15%|█▌ | 1738/11526 [18:07<1:40:42, 1.62it/s] {'loss': 0.2877, 'grad_norm': 0.6351444125175476, 'learning_rate': 9.92172801256892e-06, 'epoch': 0.45}
15%|█▌ | 1738/11526 [18:08<1:40:42, 1.62it/s] 15%|█▌ | 1739/11526 [18:08<1:40:31, 1.62it/s] {'loss': 0.3146, 'grad_norm': 0.5975417494773865, 'learning_rate': 9.92146089068316e-06, 'epoch': 0.45}
15%|█▌ | 1739/11526 [18:08<1:40:31, 1.62it/s] 15%|█▌ | 1740/11526 [18:09<1:40:24, 1.62it/s] {'loss': 0.2309, 'grad_norm': 0.5504015684127808, 'learning_rate': 9.921193317373003e-06, 'epoch': 0.45}
15%|█▌ | 1740/11526 [18:09<1:40:24, 1.62it/s] 15%|█▌ | 1741/11526 [18:09<1:40:49, 1.62it/s] {'loss': 0.3104, 'grad_norm': 0.6497365236282349, 'learning_rate': 9.920925292662996e-06, 'epoch': 0.45}
15%|█▌ | 1741/11526 [18:09<1:40:49, 1.62it/s] 15%|█▌ | 1742/11526 [18:10<1:41:07, 1.61it/s] {'loss': 0.2864, 'grad_norm': 0.6222735047340393, 'learning_rate': 9.920656816577721e-06, 'epoch': 0.45}
15%|█▌ | 1742/11526 [18:10<1:41:07, 1.61it/s] 15%|█▌ | 1743/11526 [18:10<1:40:51, 1.62it/s] {'loss': 0.2645, 'grad_norm': 0.5741310119628906, 'learning_rate': 9.920387889141805e-06, 'epoch': 0.45}
15%|█▌ | 1743/11526 [18:11<1:40:51, 1.62it/s] 15%|█▌ | 1744/11526 [18:11<1:40:36, 1.62it/s] {'loss': 0.365, 'grad_norm': 0.6310993432998657, 'learning_rate': 9.920118510379917e-06, 'epoch': 0.45}
15%|█▌ | 1744/11526 [18:11<1:40:36, 1.62it/s] 15%|█▌ | 1745/11526 [18:12<1:40:31, 1.62it/s] {'loss': 0.252, 'grad_norm': 0.6128271818161011, 'learning_rate': 9.919848680316764e-06, 'epoch': 0.45}
15%|█▌ | 1745/11526 [18:12<1:40:31, 1.62it/s] 15%|█▌ | 1746/11526 [18:12<1:40:32, 1.62it/s] {'loss': 0.3465, 'grad_norm': 0.7029732465744019, 'learning_rate': 9.919578398977098e-06, 'epoch': 0.45}
15%|█▌ | 1746/11526 [18:12<1:40:32, 1.62it/s] 15%|█▌ | 1747/11526 [18:13<1:40:28, 1.62it/s] {'loss': 0.2885, 'grad_norm': 0.6844731569290161, 'learning_rate': 9.91930766638571e-06, 'epoch': 0.45}
15%|█▌ | 1747/11526 [18:13<1:40:28, 1.62it/s] 15%|█▌ | 1748/11526 [18:14<1:40:26, 1.62it/s] {'loss': 0.3192, 'grad_norm': 0.5530723929405212, 'learning_rate': 9.919036482567433e-06, 'epoch': 0.45}
15%|█▌ | 1748/11526 [18:14<1:40:26, 1.62it/s] 15%|█▌ | 1749/11526 [18:14<1:40:19, 1.62it/s] {'loss': 0.3726, 'grad_norm': 0.7369110584259033, 'learning_rate': 9.91876484754714e-06, 'epoch': 0.46}
15%|█▌ | 1749/11526 [18:14<1:40:19, 1.62it/s] 15%|█▌ | 1750/11526 [18:15<1:40:14, 1.63it/s] {'loss': 0.2332, 'grad_norm': 0.5444818139076233, 'learning_rate': 9.918492761349752e-06, 'epoch': 0.46}
15%|█▌ | 1750/11526 [18:15<1:40:14, 1.63it/s] 15%|█▌ | 1751/11526 [18:15<1:40:26, 1.62it/s] {'loss': 0.2744, 'grad_norm': 0.5887733697891235, 'learning_rate': 9.91822022400022e-06, 'epoch': 0.46}
15%|█▌ | 1751/11526 [18:16<1:40:26, 1.62it/s] 15%|█▌ | 1752/11526 [18:16<1:40:26, 1.62it/s] {'loss': 0.2691, 'grad_norm': 0.6162213683128357, 'learning_rate': 9.917947235523546e-06, 'epoch': 0.46}
15%|█▌ | 1752/11526 [18:16<1:40:26, 1.62it/s] 15%|█▌ | 1753/11526 [18:17<1:40:20, 1.62it/s] {'loss': 0.2742, 'grad_norm': 0.654265284538269, 'learning_rate': 9.917673795944771e-06, 'epoch': 0.46}
15%|█▌ | 1753/11526 [18:17<1:40:20, 1.62it/s] 15%|█▌ | 1754/11526 [18:17<1:40:18, 1.62it/s] {'loss': 0.2428, 'grad_norm': 0.5638412237167358, 'learning_rate': 9.917399905288973e-06, 'epoch': 0.46}
15%|█▌ | 1754/11526 [18:17<1:40:18, 1.62it/s] 15%|█▌ | 1755/11526 [18:18<1:40:10, 1.63it/s] {'loss': 0.2753, 'grad_norm': 0.6084681749343872, 'learning_rate': 9.91712556358128e-06, 'epoch': 0.46}
15%|█▌ | 1755/11526 [18:18<1:40:10, 1.63it/s] 15%|█▌ | 1756/11526 [18:18<1:40:17, 1.62it/s] {'loss': 0.2709, 'grad_norm': 0.6281037330627441, 'learning_rate': 9.91685077084685e-06, 'epoch': 0.46}
15%|█▌ | 1756/11526 [18:19<1:40:17, 1.62it/s] 15%|█▌ | 1757/11526 [18:19<1:40:14, 1.62it/s] {'loss': 0.2751, 'grad_norm': 0.5950341820716858, 'learning_rate': 9.916575527110893e-06, 'epoch': 0.46}
15%|█▌ | 1757/11526 [18:19<1:40:14, 1.62it/s] 15%|█▌ | 1758/11526 [18:20<1:40:11, 1.62it/s] {'loss': 0.2278, 'grad_norm': 0.5639112591743469, 'learning_rate': 9.916299832398653e-06, 'epoch': 0.46}
15%|█▌ | 1758/11526 [18:20<1:40:11, 1.62it/s] 15%|█▌ | 1759/11526 [18:20<1:40:15, 1.62it/s] {'loss': 0.1971, 'grad_norm': 0.4826659560203552, 'learning_rate': 9.91602368673542e-06, 'epoch': 0.46}
15%|█▌ | 1759/11526 [18:20<1:40:15, 1.62it/s] 15%|█▌ | 1760/11526 [18:21<1:40:17, 1.62it/s] {'loss': 0.2947, 'grad_norm': 0.5646761655807495, 'learning_rate': 9.915747090146526e-06, 'epoch': 0.46}
15%|█▌ | 1760/11526 [18:21<1:40:17, 1.62it/s] 15%|█▌ | 1761/11526 [18:22<1:40:10, 1.62it/s] {'loss': 0.2493, 'grad_norm': 0.5406242609024048, 'learning_rate': 9.915470042657338e-06, 'epoch': 0.46}
15%|█▌ | 1761/11526 [18:22<1:40:10, 1.62it/s] 15%|█▌ | 1762/11526 [18:22<1:40:18, 1.62it/s] {'loss': 0.3601, 'grad_norm': 0.7422747611999512, 'learning_rate': 9.915192544293268e-06, 'epoch': 0.46}
15%|█▌ | 1762/11526 [18:22<1:40:18, 1.62it/s] 15%|█▌ | 1763/11526 [18:23<1:40:15, 1.62it/s] {'loss': 0.292, 'grad_norm': 0.5795661211013794, 'learning_rate': 9.914914595079774e-06, 'epoch': 0.46}
15%|█▌ | 1763/11526 [18:23<1:40:15, 1.62it/s] 15%|█▌ | 1764/11526 [18:23<1:40:08, 1.62it/s] {'loss': 0.3183, 'grad_norm': 0.6352857351303101, 'learning_rate': 9.914636195042348e-06, 'epoch': 0.46}
15%|█▌ | 1764/11526 [18:24<1:40:08, 1.62it/s] 15%|█▌ | 1765/11526 [18:24<1:40:05, 1.63it/s] {'loss': 0.2382, 'grad_norm': 0.5770596861839294, 'learning_rate': 9.914357344206526e-06, 'epoch': 0.46}
15%|█▌ | 1765/11526 [18:24<1:40:05, 1.63it/s] 15%|█▌ | 1766/11526 [18:25<1:40:08, 1.62it/s] {'loss': 0.3515, 'grad_norm': 0.7576121687889099, 'learning_rate': 9.914078042597887e-06, 'epoch': 0.46}
15%|█▌ | 1766/11526 [18:25<1:40:08, 1.62it/s] 15%|█▌ | 1767/11526 [18:25<1:40:11, 1.62it/s] {'loss': 0.2735, 'grad_norm': 0.7156609892845154, 'learning_rate': 9.913798290242051e-06, 'epoch': 0.46}
15%|█▌ | 1767/11526 [18:25<1:40:11, 1.62it/s] 15%|█▌ | 1768/11526 [18:26<1:40:05, 1.62it/s] {'loss': 0.2246, 'grad_norm': 0.5179657340049744, 'learning_rate': 9.913518087164678e-06, 'epoch': 0.46}
15%|█▌ | 1768/11526 [18:26<1:40:05, 1.62it/s] 15%|█▌ | 1769/11526 [18:26<1:40:00, 1.63it/s] {'loss': 0.3404, 'grad_norm': 0.7074328660964966, 'learning_rate': 9.913237433391466e-06, 'epoch': 0.46}
15%|█▌ | 1769/11526 [18:27<1:40:00, 1.63it/s] 15%|█▌ | 1770/11526 [18:27<1:39:54, 1.63it/s] {'loss': 0.285, 'grad_norm': 0.6818488240242004, 'learning_rate': 9.912956328948165e-06, 'epoch': 0.46}
15%|█▌ | 1770/11526 [18:27<1:39:54, 1.63it/s] 15%|█▌ | 1771/11526 [18:28<1:39:57, 1.63it/s] {'loss': 0.244, 'grad_norm': 0.6191176772117615, 'learning_rate': 9.912674773860556e-06, 'epoch': 0.46}
15%|█▌ | 1771/11526 [18:28<1:39:57, 1.63it/s] 15%|█▌ | 1772/11526 [18:28<1:40:00, 1.63it/s] {'loss': 0.3298, 'grad_norm': 0.8209881782531738, 'learning_rate': 9.912392768154463e-06, 'epoch': 0.46}
15%|█▌ | 1772/11526 [18:28<1:40:00, 1.63it/s] 15%|█▌ | 1773/11526 [18:29<1:39:57, 1.63it/s] {'loss': 0.2495, 'grad_norm': 0.5353518128395081, 'learning_rate': 9.912110311855756e-06, 'epoch': 0.46}
15%|█▌ | 1773/11526 [18:29<1:39:57, 1.63it/s] 15%|█▌ | 1774/11526 [18:30<1:39:52, 1.63it/s] {'loss': 0.2803, 'grad_norm': 0.5190170407295227, 'learning_rate': 9.911827404990341e-06, 'epoch': 0.46}
15%|█▌ | 1774/11526 [18:30<1:39:52, 1.63it/s] 15%|█▌ | 1775/11526 [18:30<1:39:51, 1.63it/s] {'loss': 0.271, 'grad_norm': 0.5819395184516907, 'learning_rate': 9.91154404758417e-06, 'epoch': 0.46}
15%|█▌ | 1775/11526 [18:30<1:39:51, 1.63it/s] 15%|█▌ | 1776/11526 [18:31<1:39:58, 1.63it/s] {'loss': 0.2955, 'grad_norm': 0.6077647805213928, 'learning_rate': 9.911260239663234e-06, 'epoch': 0.46}
15%|█▌ | 1776/11526 [18:31<1:39:58, 1.63it/s] 15%|█▌ | 1777/11526 [18:31<1:39:55, 1.63it/s] {'loss': 0.3498, 'grad_norm': 0.6687936186790466, 'learning_rate': 9.910975981253565e-06, 'epoch': 0.46}
15%|█▌ | 1777/11526 [18:32<1:39:55, 1.63it/s] 15%|█▌ | 1778/11526 [18:32<1:39:48, 1.63it/s] {'loss': 0.3401, 'grad_norm': 0.777655303478241, 'learning_rate': 9.910691272381237e-06, 'epoch': 0.46}
15%|█▌ | 1778/11526 [18:32<1:39:48, 1.63it/s] 15%|█▌ | 1779/11526 [18:33<1:39:46, 1.63it/s] {'loss': 0.3199, 'grad_norm': 0.6109945774078369, 'learning_rate': 9.910406113072363e-06, 'epoch': 0.46}
15%|█▌ | 1779/11526 [18:33<1:39:46, 1.63it/s] 15%|█▌ | 1780/11526 [18:33<1:39:44, 1.63it/s] {'loss': 0.2956, 'grad_norm': 0.6087507009506226, 'learning_rate': 9.910120503353102e-06, 'epoch': 0.46}
15%|█▌ | 1780/11526 [18:33<1:39:44, 1.63it/s] 15%|█▌ | 1781/11526 [18:34<1:39:41, 1.63it/s] {'loss': 0.2625, 'grad_norm': 0.538700520992279, 'learning_rate': 9.90983444324965e-06, 'epoch': 0.46}
15%|█▌ | 1781/11526 [18:34<1:39:41, 1.63it/s] 15%|█▌ | 1782/11526 [18:34<1:39:43, 1.63it/s] {'loss': 0.2377, 'grad_norm': 0.4758448004722595, 'learning_rate': 9.90954793278825e-06, 'epoch': 0.46}
15%|█▌ | 1782/11526 [18:35<1:39:43, 1.63it/s] 15%|█▌ | 1783/11526 [18:35<1:39:44, 1.63it/s] {'loss': 0.1758, 'grad_norm': 0.4543322026729584, 'learning_rate': 9.909260971995177e-06, 'epoch': 0.46}
15%|█▌ | 1783/11526 [18:35<1:39:44, 1.63it/s] 15%|█▌ | 1784/11526 [18:36<1:39:44, 1.63it/s] {'loss': 0.2334, 'grad_norm': 0.632641613483429, 'learning_rate': 9.908973560896756e-06, 'epoch': 0.46}
15%|█▌ | 1784/11526 [18:36<1:39:44, 1.63it/s] 15%|█▌ | 1785/11526 [18:36<1:39:41, 1.63it/s] {'loss': 0.3499, 'grad_norm': 0.6539014577865601, 'learning_rate': 9.908685699519349e-06, 'epoch': 0.46}
15%|█▌ | 1785/11526 [18:36<1:39:41, 1.63it/s] 15%|█▌ | 1786/11526 [18:37<1:39:43, 1.63it/s] {'loss': 0.2515, 'grad_norm': 0.531934380531311, 'learning_rate': 9.90839738788936e-06, 'epoch': 0.46}
15%|█▌ | 1786/11526 [18:37<1:39:43, 1.63it/s] 16%|█▌ | 1787/11526 [18:38<1:39:41, 1.63it/s] {'loss': 0.2783, 'grad_norm': 0.5889500379562378, 'learning_rate': 9.908108626033235e-06, 'epoch': 0.47}
16%|█▌ | 1787/11526 [18:38<1:39:41, 1.63it/s] 16%|█▌ | 1788/11526 [18:38<1:39:38, 1.63it/s] {'loss': 0.2542, 'grad_norm': 0.5881174802780151, 'learning_rate': 9.907819413977462e-06, 'epoch': 0.47}
16%|█▌ | 1788/11526 [18:38<1:39:38, 1.63it/s] 16%|█▌ | 1789/11526 [18:39<1:39:36, 1.63it/s] {'loss': 0.2929, 'grad_norm': 0.6960455775260925, 'learning_rate': 9.907529751748567e-06, 'epoch': 0.47}
16%|█▌ | 1789/11526 [18:39<1:39:36, 1.63it/s] 16%|█▌ | 1790/11526 [18:39<1:39:37, 1.63it/s] {'loss': 0.395, 'grad_norm': 0.7572062611579895, 'learning_rate': 9.90723963937312e-06, 'epoch': 0.47}
16%|█▌ | 1790/11526 [18:40<1:39:37, 1.63it/s] 16%|█▌ | 1791/11526 [18:40<1:39:47, 1.63it/s] {'loss': 0.2861, 'grad_norm': 0.5912462472915649, 'learning_rate': 9.906949076877732e-06, 'epoch': 0.47}
16%|█▌ | 1791/11526 [18:40<1:39:47, 1.63it/s] 16%|█▌ | 1792/11526 [18:41<1:39:42, 1.63it/s] {'loss': 0.2594, 'grad_norm': 0.5877445936203003, 'learning_rate': 9.906658064289058e-06, 'epoch': 0.47}
16%|█▌ | 1792/11526 [18:41<1:39:42, 1.63it/s] 16%|█▌ | 1793/11526 [18:41<1:39:39, 1.63it/s] {'loss': 0.2605, 'grad_norm': 0.5736730694770813, 'learning_rate': 9.906366601633785e-06, 'epoch': 0.47}
16%|█▌ | 1793/11526 [18:41<1:39:39, 1.63it/s] 16%|█▌ | 1794/11526 [18:42<1:39:38, 1.63it/s] {'loss': 0.3412, 'grad_norm': 0.7288220524787903, 'learning_rate': 9.906074688938654e-06, 'epoch': 0.47}
16%|█▌ | 1794/11526 [18:42<1:39:38, 1.63it/s] 16%|█▌ | 1795/11526 [18:42<1:39:37, 1.63it/s] {'loss': 0.2761, 'grad_norm': 0.5401854515075684, 'learning_rate': 9.905782326230437e-06, 'epoch': 0.47}
16%|█▌ | 1795/11526 [18:43<1:39:37, 1.63it/s] 16%|█▌ | 1796/11526 [18:43<1:39:43, 1.63it/s] {'loss': 0.4232, 'grad_norm': 0.7876623868942261, 'learning_rate': 9.905489513535952e-06, 'epoch': 0.47}
16%|█▌ | 1796/11526 [18:43<1:39:43, 1.63it/s] 16%|█▌ | 1797/11526 [18:44<1:39:41, 1.63it/s] {'loss': 0.402, 'grad_norm': 0.7604416012763977, 'learning_rate': 9.905196250882059e-06, 'epoch': 0.47}
16%|█▌ | 1797/11526 [18:44<1:39:41, 1.63it/s] 16%|█▌ | 1798/11526 [18:44<1:39:36, 1.63it/s] {'loss': 0.3566, 'grad_norm': 0.7083583474159241, 'learning_rate': 9.904902538295655e-06, 'epoch': 0.47}
16%|█▌ | 1798/11526 [18:44<1:39:36, 1.63it/s] 16%|█▌ | 1799/11526 [18:45<1:39:35, 1.63it/s] {'loss': 0.2483, 'grad_norm': 0.5248933434486389, 'learning_rate': 9.904608375803684e-06, 'epoch': 0.47}
16%|█▌ | 1799/11526 [18:45<1:39:35, 1.63it/s] 16%|█▌ | 1800/11526 [18:46<1:39:36, 1.63it/s] {'loss': 0.256, 'grad_norm': 0.6161048412322998, 'learning_rate': 9.904313763433125e-06, 'epoch': 0.47}
16%|█▌ | 1800/11526 [18:46<1:39:36, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.34it/s]
31%|███ | 4/13 [00:00<00:01, 8.38it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.78it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.40it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.16it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.00it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.89it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.81it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.76it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.73it/s]
100%|██████████| 13/13 [00:01<00:00, 6.74it/s]
{'eval_loss': 0.7292183041572571, 'eval_runtime': 1.9584, 'eval_samples_per_second': 102.123, 'eval_steps_per_second': 6.638, 'epoch': 0.47}
16%|█▌ | 1800/11526 [18:48<1:39:36, 1.63it/s]
 16%|█▌ | 1801/11526 [18:48<3:15:12, 1.20s/it] {'loss': 0.232, 'grad_norm': 0.5330926179885864, 'learning_rate': 9.904018701211004e-06, 'epoch': 0.47}
16%|█▌ | 1801/11526 [18:48<3:15:12, 1.20s/it] 16%|█▌ | 1802/11526 [18:49<2:46:38, 1.03s/it] {'loss': 0.3038, 'grad_norm': 0.6057061553001404, 'learning_rate': 9.903723189164384e-06, 'epoch': 0.47}
16%|█▌ | 1802/11526 [18:49<2:46:38, 1.03s/it] 16%|█▌ | 1803/11526 [18:49<2:26:31, 1.11it/s] {'loss': 0.2867, 'grad_norm': 0.6551220417022705, 'learning_rate': 9.903427227320373e-06, 'epoch': 0.47}
16%|█▌ | 1803/11526 [18:49<2:26:31, 1.11it/s] 16%|█▌ | 1804/11526 [18:50<2:12:28, 1.22it/s] {'loss': 0.2867, 'grad_norm': 0.6220617890357971, 'learning_rate': 9.903130815706119e-06, 'epoch': 0.47}
16%|█▌ | 1804/11526 [18:50<2:12:28, 1.22it/s] 16%|█▌ | 1805/11526 [18:51<2:02:34, 1.32it/s] {'loss': 0.288, 'grad_norm': 0.6327654719352722, 'learning_rate': 9.902833954348806e-06, 'epoch': 0.47}
16%|█▌ | 1805/11526 [18:51<2:02:34, 1.32it/s] 16%|█▌ | 1806/11526 [18:51<1:55:35, 1.40it/s] {'loss': 0.3049, 'grad_norm': 0.579552948474884, 'learning_rate': 9.902536643275669e-06, 'epoch': 0.47}
16%|█▌ | 1806/11526 [18:51<1:55:35, 1.40it/s] 16%|█▌ | 1807/11526 [18:52<1:50:51, 1.46it/s] {'loss': 0.261, 'grad_norm': 0.5395046472549438, 'learning_rate': 9.902238882513975e-06, 'epoch': 0.47}
16%|█▌ | 1807/11526 [18:52<1:50:51, 1.46it/s] 16%|█▌ | 1808/11526 [18:52<1:47:24, 1.51it/s] {'loss': 0.3284, 'grad_norm': 0.6338933706283569, 'learning_rate': 9.90194067209104e-06, 'epoch': 0.47}
16%|█▌ | 1808/11526 [18:53<1:47:24, 1.51it/s] 16%|█▌ | 1809/11526 [18:53<1:45:00, 1.54it/s] {'loss': 0.3149, 'grad_norm': 0.5937712788581848, 'learning_rate': 9.901642012034214e-06, 'epoch': 0.47}
16%|█▌ | 1809/11526 [18:53<1:45:00, 1.54it/s] 16%|█▌ | 1810/11526 [18:54<1:43:19, 1.57it/s] {'loss': 0.2818, 'grad_norm': 0.5864075422286987, 'learning_rate': 9.901342902370893e-06, 'epoch': 0.47}
16%|█▌ | 1810/11526 [18:54<1:43:19, 1.57it/s] 16%|█▌ | 1811/11526 [18:54<1:42:08, 1.59it/s] {'loss': 0.31, 'grad_norm': 0.5949771404266357, 'learning_rate': 9.901043343128515e-06, 'epoch': 0.47}
16%|█▌ | 1811/11526 [18:54<1:42:08, 1.59it/s] 16%|█▌ | 1812/11526 [18:55<1:41:15, 1.60it/s] {'loss': 0.2434, 'grad_norm': 0.5277470946311951, 'learning_rate': 9.900743334334554e-06, 'epoch': 0.47}
16%|█▌ | 1812/11526 [18:55<1:41:15, 1.60it/s] 16%|█▌ | 1813/11526 [18:55<1:40:45, 1.61it/s] {'loss': 0.2569, 'grad_norm': 0.613141655921936, 'learning_rate': 9.900442876016533e-06, 'epoch': 0.47}
16%|█▌ | 1813/11526 [18:56<1:40:45, 1.61it/s] 16%|█▌ | 1814/11526 [18:56<1:40:21, 1.61it/s] {'loss': 0.2289, 'grad_norm': 0.5204151272773743, 'learning_rate': 9.900141968202009e-06, 'epoch': 0.47}
16%|█▌ | 1814/11526 [18:56<1:40:21, 1.61it/s] 16%|█▌ | 1815/11526 [18:57<1:40:01, 1.62it/s] {'loss': 0.325, 'grad_norm': 0.6295218467712402, 'learning_rate': 9.89984061091858e-06, 'epoch': 0.47}
16%|█▌ | 1815/11526 [18:57<1:40:01, 1.62it/s] 16%|█▌ | 1816/11526 [18:57<1:39:50, 1.62it/s] {'loss': 0.2373, 'grad_norm': 0.5049116015434265, 'learning_rate': 9.899538804193893e-06, 'epoch': 0.47}
16%|█▌ | 1816/11526 [18:57<1:39:50, 1.62it/s] 16%|█▌ | 1817/11526 [18:58<1:39:42, 1.62it/s] {'loss': 0.2674, 'grad_norm': 0.5107879042625427, 'learning_rate': 9.89923654805563e-06, 'epoch': 0.47}
16%|█▌ | 1817/11526 [18:58<1:39:42, 1.62it/s] 16%|█▌ | 1818/11526 [18:59<1:39:34, 1.62it/s] {'loss': 0.3155, 'grad_norm': 0.6142642498016357, 'learning_rate': 9.898933842531516e-06, 'epoch': 0.47}
16%|█▌ | 1818/11526 [18:59<1:39:34, 1.62it/s] 16%|█▌ | 1819/11526 [18:59<1:39:33, 1.62it/s] {'loss': 0.2721, 'grad_norm': 0.6189050674438477, 'learning_rate': 9.898630687649313e-06, 'epoch': 0.47}
16%|█▌ | 1819/11526 [18:59<1:39:33, 1.62it/s] 16%|█▌ | 1820/11526 [19:00<1:39:33, 1.62it/s] {'loss': 0.2555, 'grad_norm': 0.534614086151123, 'learning_rate': 9.898327083436833e-06, 'epoch': 0.47}
16%|█▌ | 1820/11526 [19:00<1:39:33, 1.62it/s] 16%|█▌ | 1821/11526 [19:00<1:39:30, 1.63it/s] {'loss': 0.2536, 'grad_norm': 0.5662358403205872, 'learning_rate': 9.898023029921924e-06, 'epoch': 0.47}
16%|█▌ | 1821/11526 [19:01<1:39:30, 1.63it/s] 16%|█▌ | 1822/11526 [19:01<1:39:24, 1.63it/s] {'loss': 0.2662, 'grad_norm': 0.5638494491577148, 'learning_rate': 9.897718527132474e-06, 'epoch': 0.47}
16%|█▌ | 1822/11526 [19:01<1:39:24, 1.63it/s] 16%|█▌ | 1823/11526 [19:02<1:39:20, 1.63it/s] {'loss': 0.2666, 'grad_norm': 0.5899679660797119, 'learning_rate': 9.897413575096411e-06, 'epoch': 0.47}
16%|█▌ | 1823/11526 [19:02<1:39:20, 1.63it/s] 16%|█▌ | 1824/11526 [19:02<1:39:17, 1.63it/s] {'loss': 0.2698, 'grad_norm': 0.5886790156364441, 'learning_rate': 9.897108173841712e-06, 'epoch': 0.47}
16%|█▌ | 1824/11526 [19:02<1:39:17, 1.63it/s] 16%|█▌ | 1825/11526 [19:03<1:39:20, 1.63it/s] {'loss': 0.2763, 'grad_norm': 0.6334282755851746, 'learning_rate': 9.896802323396388e-06, 'epoch': 0.48}
16%|█▌ | 1825/11526 [19:03<1:39:20, 1.63it/s] 16%|█▌ | 1826/11526 [19:03<1:39:18, 1.63it/s] {'loss': 0.2711, 'grad_norm': 0.6269354820251465, 'learning_rate': 9.896496023788492e-06, 'epoch': 0.48}
16%|█▌ | 1826/11526 [19:04<1:39:18, 1.63it/s] 16%|█▌ | 1827/11526 [19:04<1:39:16, 1.63it/s] {'loss': 0.2829, 'grad_norm': 0.5911471843719482, 'learning_rate': 9.896189275046121e-06, 'epoch': 0.48}
16%|█▌ | 1827/11526 [19:04<1:39:16, 1.63it/s] 16%|█▌ | 1828/11526 [19:05<1:39:17, 1.63it/s] {'loss': 0.1917, 'grad_norm': 0.5703173279762268, 'learning_rate': 9.895882077197413e-06, 'epoch': 0.48}
16%|█▌ | 1828/11526 [19:05<1:39:17, 1.63it/s] 16%|█▌ | 1829/11526 [19:05<1:39:19, 1.63it/s] {'loss': 0.2535, 'grad_norm': 0.6486324071884155, 'learning_rate': 9.895574430270543e-06, 'epoch': 0.48}
16%|█▌ | 1829/11526 [19:05<1:39:19, 1.63it/s] 16%|█▌ | 1830/11526 [19:06<1:39:13, 1.63it/s] {'loss': 0.2387, 'grad_norm': 0.5186181664466858, 'learning_rate': 9.895266334293732e-06, 'epoch': 0.48}
16%|█▌ | 1830/11526 [19:06<1:39:13, 1.63it/s] 16%|█▌ | 1831/11526 [19:07<1:39:17, 1.63it/s] {'loss': 0.3368, 'grad_norm': 0.7849831581115723, 'learning_rate': 9.894957789295241e-06, 'epoch': 0.48}
16%|█▌ | 1831/11526 [19:07<1:39:17, 1.63it/s] 16%|█▌ | 1832/11526 [19:07<1:39:14, 1.63it/s] {'loss': 0.2112, 'grad_norm': 0.5888433456420898, 'learning_rate': 9.89464879530337e-06, 'epoch': 0.48}
16%|█▌ | 1832/11526 [19:07<1:39:14, 1.63it/s] 16%|█▌ | 1833/11526 [19:08<1:39:12, 1.63it/s] {'loss': 0.2898, 'grad_norm': 0.6061709523200989, 'learning_rate': 9.894339352346461e-06, 'epoch': 0.48}
16%|█▌ | 1833/11526 [19:08<1:39:12, 1.63it/s] 16%|█▌ | 1834/11526 [19:08<1:39:09, 1.63it/s] {'loss': 0.3428, 'grad_norm': 0.6593373417854309, 'learning_rate': 9.894029460452899e-06, 'epoch': 0.48}
16%|█▌ | 1834/11526 [19:09<1:39:09, 1.63it/s] 16%|█▌ | 1835/11526 [19:09<1:39:09, 1.63it/s] {'loss': 0.2471, 'grad_norm': 0.5991052389144897, 'learning_rate': 9.893719119651109e-06, 'epoch': 0.48}
16%|█▌ | 1835/11526 [19:09<1:39:09, 1.63it/s] 16%|█▌ | 1836/11526 [19:10<1:39:07, 1.63it/s] {'loss': 0.2638, 'grad_norm': 0.534231960773468, 'learning_rate': 9.893408329969558e-06, 'epoch': 0.48}
16%|█▌ | 1836/11526 [19:10<1:39:07, 1.63it/s] 16%|█▌ | 1837/11526 [19:10<1:39:05, 1.63it/s] {'loss': 0.2646, 'grad_norm': 0.6077491641044617, 'learning_rate': 9.893097091436751e-06, 'epoch': 0.48}
16%|█▌ | 1837/11526 [19:10<1:39:05, 1.63it/s] 16%|█▌ | 1838/11526 [19:11<1:39:05, 1.63it/s] {'loss': 0.2579, 'grad_norm': 0.5827044248580933, 'learning_rate': 9.89278540408124e-06, 'epoch': 0.48}
16%|█▌ | 1838/11526 [19:11<1:39:05, 1.63it/s] 16%|█▌ | 1839/11526 [19:11<1:39:05, 1.63it/s] {'loss': 0.3604, 'grad_norm': 0.652097225189209, 'learning_rate': 9.892473267931612e-06, 'epoch': 0.48}
16%|█▌ | 1839/11526 [19:12<1:39:05, 1.63it/s] 16%|█▌ | 1840/11526 [19:12<1:39:04, 1.63it/s] {'loss': 0.2481, 'grad_norm': 0.5360111594200134, 'learning_rate': 9.8921606830165e-06, 'epoch': 0.48}
16%|█▌ | 1840/11526 [19:12<1:39:04, 1.63it/s] 16%|█▌ | 1841/11526 [19:13<1:39:06, 1.63it/s] {'loss': 0.2478, 'grad_norm': 0.5854982137680054, 'learning_rate': 9.891847649364574e-06, 'epoch': 0.48}
16%|█▌ | 1841/11526 [19:13<1:39:06, 1.63it/s] 16%|█▌ | 1842/11526 [19:13<1:39:10, 1.63it/s] {'loss': 0.2991, 'grad_norm': 0.5388554334640503, 'learning_rate': 9.891534167004548e-06, 'epoch': 0.48}
16%|█▌ | 1842/11526 [19:13<1:39:10, 1.63it/s] 16%|█▌ | 1843/11526 [19:14<1:39:10, 1.63it/s] {'loss': 0.2845, 'grad_norm': 0.6091363430023193, 'learning_rate': 9.891220235965177e-06, 'epoch': 0.48}
16%|█▌ | 1843/11526 [19:14<1:39:10, 1.63it/s] 16%|█▌ | 1844/11526 [19:15<1:39:12, 1.63it/s] {'loss': 0.2971, 'grad_norm': 0.6969873905181885, 'learning_rate': 9.890905856275257e-06, 'epoch': 0.48}
16%|█▌ | 1844/11526 [19:15<1:39:12, 1.63it/s] 16%|█▌ | 1845/11526 [19:15<1:39:12, 1.63it/s] {'loss': 0.2685, 'grad_norm': 0.5898727774620056, 'learning_rate': 9.890591027963622e-06, 'epoch': 0.48}
16%|█▌ | 1845/11526 [19:15<1:39:12, 1.63it/s] 16%|█▌ | 1846/11526 [19:16<1:39:06, 1.63it/s] {'loss': 0.2176, 'grad_norm': 0.5505145788192749, 'learning_rate': 9.890275751059153e-06, 'epoch': 0.48}
16%|█▌ | 1846/11526 [19:16<1:39:06, 1.63it/s] 16%|█▌ | 1847/11526 [19:16<1:39:08, 1.63it/s] {'loss': 0.2335, 'grad_norm': 0.5812573432922363, 'learning_rate': 9.889960025590765e-06, 'epoch': 0.48}
16%|█▌ | 1847/11526 [19:16<1:39:08, 1.63it/s] 16%|█▌ | 1848/11526 [19:17<1:39:05, 1.63it/s] {'loss': 0.2962, 'grad_norm': 0.6503857970237732, 'learning_rate': 9.889643851587423e-06, 'epoch': 0.48}
16%|█▌ | 1848/11526 [19:17<1:39:05, 1.63it/s] 16%|█▌ | 1849/11526 [19:18<1:39:05, 1.63it/s] {'loss': 0.2977, 'grad_norm': 0.691008985042572, 'learning_rate': 9.889327229078127e-06, 'epoch': 0.48}
16%|█▌ | 1849/11526 [19:18<1:39:05, 1.63it/s] 16%|█▌ | 1850/11526 [19:18<1:39:03, 1.63it/s] {'loss': 0.4535, 'grad_norm': 0.8344059586524963, 'learning_rate': 9.889010158091917e-06, 'epoch': 0.48}
16%|█▌ | 1850/11526 [19:18<1:39:03, 1.63it/s] 16%|█▌ | 1851/11526 [19:19<1:39:01, 1.63it/s] {'loss': 0.2817, 'grad_norm': 0.6223053336143494, 'learning_rate': 9.888692638657877e-06, 'epoch': 0.48}
16%|█▌ | 1851/11526 [19:19<1:39:01, 1.63it/s] 16%|█▌ | 1852/11526 [19:19<1:39:07, 1.63it/s] {'loss': 0.3307, 'grad_norm': 0.7381311655044556, 'learning_rate': 9.888374670805134e-06, 'epoch': 0.48}
16%|█▌ | 1852/11526 [19:20<1:39:07, 1.63it/s] 16%|█▌ | 1853/11526 [19:20<1:39:06, 1.63it/s] {'loss': 0.221, 'grad_norm': 0.45161521434783936, 'learning_rate': 9.888056254562853e-06, 'epoch': 0.48}
16%|█▌ | 1853/11526 [19:20<1:39:06, 1.63it/s] 16%|█▌ | 1854/11526 [19:21<1:39:05, 1.63it/s] {'loss': 0.2887, 'grad_norm': 0.5887551307678223, 'learning_rate': 9.88773738996024e-06, 'epoch': 0.48}
16%|█▌ | 1854/11526 [19:21<1:39:05, 1.63it/s] 16%|█▌ | 1855/11526 [19:21<1:39:02, 1.63it/s] {'loss': 0.3056, 'grad_norm': 0.6235019564628601, 'learning_rate': 9.887418077026542e-06, 'epoch': 0.48}
16%|█▌ | 1855/11526 [19:21<1:39:02, 1.63it/s] 16%|█▌ | 1856/11526 [19:22<1:39:03, 1.63it/s] {'loss': 0.3352, 'grad_norm': 0.7332637906074524, 'learning_rate': 9.88709831579105e-06, 'epoch': 0.48}
16%|█▌ | 1856/11526 [19:22<1:39:03, 1.63it/s] 16%|█▌ | 1857/11526 [19:23<1:39:02, 1.63it/s] {'loss': 0.2798, 'grad_norm': 0.6782421469688416, 'learning_rate': 9.886778106283095e-06, 'epoch': 0.48}
16%|█▌ | 1857/11526 [19:23<1:39:02, 1.63it/s] 16%|█▌ | 1858/11526 [19:23<1:39:03, 1.63it/s] {'loss': 0.2272, 'grad_norm': 0.5488414764404297, 'learning_rate': 9.886457448532048e-06, 'epoch': 0.48}
16%|█▌ | 1858/11526 [19:23<1:39:03, 1.63it/s] 16%|█▌ | 1859/11526 [19:24<1:39:02, 1.63it/s] {'loss': 0.2291, 'grad_norm': 0.5275760889053345, 'learning_rate': 9.88613634256732e-06, 'epoch': 0.48}
16%|█▌ | 1859/11526 [19:24<1:39:02, 1.63it/s] 16%|█▌ | 1860/11526 [19:24<1:38:59, 1.63it/s] {'loss': 0.335, 'grad_norm': 0.6378044486045837, 'learning_rate': 9.885814788418367e-06, 'epoch': 0.48}
16%|█▌ | 1860/11526 [19:24<1:38:59, 1.63it/s] 16%|█▌ | 1861/11526 [19:25<1:38:57, 1.63it/s] {'loss': 0.2049, 'grad_norm': 0.4788295328617096, 'learning_rate': 9.885492786114682e-06, 'epoch': 0.48}
16%|█▌ | 1861/11526 [19:25<1:38:57, 1.63it/s] 16%|█▌ | 1862/11526 [19:26<1:39:00, 1.63it/s] {'loss': 0.2186, 'grad_norm': 0.6270354986190796, 'learning_rate': 9.8851703356858e-06, 'epoch': 0.48}
16%|█▌ | 1862/11526 [19:26<1:39:00, 1.63it/s] 16%|█▌ | 1863/11526 [19:26<1:38:57, 1.63it/s] {'loss': 0.2398, 'grad_norm': 0.5636417865753174, 'learning_rate': 9.8848474371613e-06, 'epoch': 0.48}
16%|█▌ | 1863/11526 [19:26<1:38:57, 1.63it/s] 16%|█▌ | 1864/11526 [19:27<1:38:55, 1.63it/s] {'loss': 0.252, 'grad_norm': 0.5731325745582581, 'learning_rate': 9.8845240905708e-06, 'epoch': 0.49}
16%|█▌ | 1864/11526 [19:27<1:38:55, 1.63it/s] 16%|█▌ | 1865/11526 [19:27<1:38:54, 1.63it/s] {'loss': 0.3072, 'grad_norm': 0.6346918940544128, 'learning_rate': 9.884200295943958e-06, 'epoch': 0.49}
16%|█▌ | 1865/11526 [19:28<1:38:54, 1.63it/s] 16%|█▌ | 1866/11526 [19:28<1:38:52, 1.63it/s] {'loss': 0.2237, 'grad_norm': 0.5768142938613892, 'learning_rate': 9.883876053310475e-06, 'epoch': 0.49}
16%|█▌ | 1866/11526 [19:28<1:38:52, 1.63it/s] 16%|█▌ | 1867/11526 [19:29<1:38:51, 1.63it/s] {'loss': 0.2908, 'grad_norm': 0.6394780874252319, 'learning_rate': 9.883551362700093e-06, 'epoch': 0.49}
16%|█▌ | 1867/11526 [19:29<1:38:51, 1.63it/s] 16%|█▌ | 1868/11526 [19:29<1:38:52, 1.63it/s] {'loss': 0.2705, 'grad_norm': 0.6458249688148499, 'learning_rate': 9.883226224142592e-06, 'epoch': 0.49}
16%|█▌ | 1868/11526 [19:29<1:38:52, 1.63it/s] 16%|█▌ | 1869/11526 [19:30<1:38:52, 1.63it/s] {'loss': 0.3164, 'grad_norm': 0.6606937646865845, 'learning_rate': 9.8829006376678e-06, 'epoch': 0.49}
16%|█▌ | 1869/11526 [19:30<1:38:52, 1.63it/s] 16%|█▌ | 1870/11526 [19:30<1:38:50, 1.63it/s] {'loss': 0.3527, 'grad_norm': 0.9363361597061157, 'learning_rate': 9.882574603305578e-06, 'epoch': 0.49}
16%|█▌ | 1870/11526 [19:31<1:38:50, 1.63it/s] 16%|█▌ | 1871/11526 [19:31<1:38:48, 1.63it/s] {'loss': 0.3018, 'grad_norm': 0.6168779730796814, 'learning_rate': 9.882248121085831e-06, 'epoch': 0.49}
16%|█▌ | 1871/11526 [19:31<1:38:48, 1.63it/s] 16%|█▌ | 1872/11526 [19:32<1:38:49, 1.63it/s] {'loss': 0.383, 'grad_norm': 0.7633475661277771, 'learning_rate': 9.881921191038508e-06, 'epoch': 0.49}
16%|█▌ | 1872/11526 [19:32<1:38:49, 1.63it/s] 16%|█▋ | 1873/11526 [19:32<1:38:49, 1.63it/s] {'loss': 0.344, 'grad_norm': 0.5873231887817383, 'learning_rate': 9.881593813193598e-06, 'epoch': 0.49}
16%|█▋ | 1873/11526 [19:32<1:38:49, 1.63it/s] 16%|█▋ | 1874/11526 [19:33<1:38:50, 1.63it/s] {'loss': 0.3031, 'grad_norm': 0.6077483296394348, 'learning_rate': 9.881265987581128e-06, 'epoch': 0.49}
16%|█▋ | 1874/11526 [19:33<1:38:50, 1.63it/s] 16%|█▋ | 1875/11526 [19:34<1:38:47, 1.63it/s] {'loss': 0.3207, 'grad_norm': 0.7646569609642029, 'learning_rate': 9.880937714231166e-06, 'epoch': 0.49}
16%|█▋ | 1875/11526 [19:34<1:38:47, 1.63it/s] 16%|█▋ | 1876/11526 [19:34<1:38:46, 1.63it/s] {'loss': 0.2302, 'grad_norm': 0.5585705637931824, 'learning_rate': 9.880608993173829e-06, 'epoch': 0.49}
16%|█▋ | 1876/11526 [19:34<1:38:46, 1.63it/s] 16%|█▋ | 1877/11526 [19:35<1:38:49, 1.63it/s] {'loss': 0.3008, 'grad_norm': 0.7021940350532532, 'learning_rate': 9.880279824439263e-06, 'epoch': 0.49}
16%|█▋ | 1877/11526 [19:35<1:38:49, 1.63it/s] 16%|█▋ | 1878/11526 [19:35<1:38:47, 1.63it/s] {'loss': 0.2146, 'grad_norm': 0.4949135184288025, 'learning_rate': 9.879950208057665e-06, 'epoch': 0.49}
16%|█▋ | 1878/11526 [19:36<1:38:47, 1.63it/s] 16%|█▋ | 1879/11526 [19:36<1:38:44, 1.63it/s] {'loss': 0.2858, 'grad_norm': 0.6212474703788757, 'learning_rate': 9.879620144059268e-06, 'epoch': 0.49}
16%|█▋ | 1879/11526 [19:36<1:38:44, 1.63it/s] 16%|█▋ | 1880/11526 [19:37<1:38:45, 1.63it/s] {'loss': 0.3126, 'grad_norm': 0.67038893699646, 'learning_rate': 9.879289632474347e-06, 'epoch': 0.49}
16%|█▋ | 1880/11526 [19:37<1:38:45, 1.63it/s] 16%|█▋ | 1881/11526 [19:37<1:38:43, 1.63it/s] {'loss': 0.2553, 'grad_norm': 0.607437252998352, 'learning_rate': 9.878958673333219e-06, 'epoch': 0.49}
16%|█▋ | 1881/11526 [19:37<1:38:43, 1.63it/s] 16%|█▋ | 1882/11526 [19:38<1:38:42, 1.63it/s] {'loss': 0.3095, 'grad_norm': 0.6013221740722656, 'learning_rate': 9.87862726666624e-06, 'epoch': 0.49}
16%|█▋ | 1882/11526 [19:38<1:38:42, 1.63it/s] 16%|█▋ | 1883/11526 [19:38<1:38:42, 1.63it/s] {'loss': 0.3227, 'grad_norm': 0.6729598045349121, 'learning_rate': 9.878295412503813e-06, 'epoch': 0.49}
16%|█▋ | 1883/11526 [19:39<1:38:42, 1.63it/s] 16%|█▋ | 1884/11526 [19:39<1:38:44, 1.63it/s] {'loss': 0.3562, 'grad_norm': 0.7025308012962341, 'learning_rate': 9.877963110876374e-06, 'epoch': 0.49}
16%|█▋ | 1884/11526 [19:39<1:38:44, 1.63it/s] 16%|█▋ | 1885/11526 [19:40<1:38:43, 1.63it/s] {'loss': 0.2632, 'grad_norm': 0.6081389784812927, 'learning_rate': 9.877630361814401e-06, 'epoch': 0.49}
16%|█▋ | 1885/11526 [19:40<1:38:43, 1.63it/s] 16%|█▋ | 1886/11526 [19:40<1:38:48, 1.63it/s] {'loss': 0.3075, 'grad_norm': 0.6746421456336975, 'learning_rate': 9.87729716534842e-06, 'epoch': 0.49}
16%|█▋ | 1886/11526 [19:40<1:38:48, 1.63it/s] 16%|█▋ | 1887/11526 [19:41<1:38:54, 1.62it/s] {'loss': 0.2678, 'grad_norm': 0.5761078000068665, 'learning_rate': 9.876963521508993e-06, 'epoch': 0.49}
16%|█▋ | 1887/11526 [19:41<1:38:54, 1.62it/s] 16%|█▋ | 1888/11526 [19:42<1:38:50, 1.63it/s] {'loss': 0.2587, 'grad_norm': 0.5383070707321167, 'learning_rate': 9.876629430326722e-06, 'epoch': 0.49}
16%|█▋ | 1888/11526 [19:42<1:38:50, 1.63it/s] 16%|█▋ | 1889/11526 [19:42<1:38:48, 1.63it/s] {'loss': 0.2772, 'grad_norm': 0.5938236713409424, 'learning_rate': 9.876294891832255e-06, 'epoch': 0.49}
16%|█▋ | 1889/11526 [19:42<1:38:48, 1.63it/s] 16%|█▋ | 1890/11526 [19:43<1:38:44, 1.63it/s] {'loss': 0.2903, 'grad_norm': 0.6738806962966919, 'learning_rate': 9.875959906056274e-06, 'epoch': 0.49}
16%|█▋ | 1890/11526 [19:43<1:38:44, 1.63it/s] 16%|█▋ | 1891/11526 [19:43<1:38:49, 1.62it/s] {'loss': 0.3349, 'grad_norm': 0.6198453307151794, 'learning_rate': 9.875624473029508e-06, 'epoch': 0.49}
16%|█▋ | 1891/11526 [19:44<1:38:49, 1.62it/s] 16%|█▋ | 1892/11526 [19:44<1:38:48, 1.63it/s] {'loss': 0.2901, 'grad_norm': 0.5889297723770142, 'learning_rate': 9.875288592782724e-06, 'epoch': 0.49}
16%|█▋ | 1892/11526 [19:44<1:38:48, 1.63it/s] 16%|█▋ | 1893/11526 [19:45<1:38:42, 1.63it/s] {'loss': 0.3536, 'grad_norm': 0.7297613620758057, 'learning_rate': 9.87495226534673e-06, 'epoch': 0.49}
16%|█▋ | 1893/11526 [19:45<1:38:42, 1.63it/s] 16%|█▋ | 1894/11526 [19:45<1:38:39, 1.63it/s] {'loss': 0.2739, 'grad_norm': 0.5550413727760315, 'learning_rate': 9.874615490752377e-06, 'epoch': 0.49}
16%|█▋ | 1894/11526 [19:45<1:38:39, 1.63it/s] 16%|█▋ | 1895/11526 [19:46<1:38:37, 1.63it/s] {'loss': 0.3016, 'grad_norm': 0.6398603916168213, 'learning_rate': 9.874278269030557e-06, 'epoch': 0.49}
16%|█▋ | 1895/11526 [19:46<1:38:37, 1.63it/s] 16%|█▋ | 1896/11526 [19:46<1:38:39, 1.63it/s] {'loss': 0.2396, 'grad_norm': 0.4702272415161133, 'learning_rate': 9.8739406002122e-06, 'epoch': 0.49}
16%|█▋ | 1896/11526 [19:47<1:38:39, 1.63it/s] 16%|█▋ | 1897/11526 [19:47<1:38:35, 1.63it/s] {'loss': 0.3114, 'grad_norm': 0.6421331763267517, 'learning_rate': 9.87360248432828e-06, 'epoch': 0.49}
16%|█▋ | 1897/11526 [19:47<1:38:35, 1.63it/s] 16%|█▋ | 1898/11526 [19:48<1:38:34, 1.63it/s] {'loss': 0.2901, 'grad_norm': 0.613903820514679, 'learning_rate': 9.87326392140981e-06, 'epoch': 0.49}
16%|█▋ | 1898/11526 [19:48<1:38:34, 1.63it/s] 16%|█▋ | 1899/11526 [19:48<1:38:35, 1.63it/s] {'loss': 0.3034, 'grad_norm': 0.9019092917442322, 'learning_rate': 9.872924911487845e-06, 'epoch': 0.49}
16%|█▋ | 1899/11526 [19:48<1:38:35, 1.63it/s] 16%|█▋ | 1900/11526 [19:49<1:38:30, 1.63it/s] {'loss': 0.3627, 'grad_norm': 0.7461093664169312, 'learning_rate': 9.872585454593482e-06, 'epoch': 0.49}
16%|█▋ | 1900/11526 [19:49<1:38:30, 1.63it/s] 16%|█▋ | 1901/11526 [19:50<1:38:29, 1.63it/s] {'loss': 0.1806, 'grad_norm': 0.42721113562583923, 'learning_rate': 9.872245550757858e-06, 'epoch': 0.49}
16%|█▋ | 1901/11526 [19:50<1:38:29, 1.63it/s] 17%|█▋ | 1902/11526 [19:50<1:38:31, 1.63it/s] {'loss': 0.2933, 'grad_norm': 0.5794920921325684, 'learning_rate': 9.871905200012148e-06, 'epoch': 0.5}
17%|█▋ | 1902/11526 [19:50<1:38:31, 1.63it/s] 17%|█▋ | 1903/11526 [19:51<1:38:28, 1.63it/s] {'loss': 0.338, 'grad_norm': 0.580616295337677, 'learning_rate': 9.871564402387574e-06, 'epoch': 0.5}
17%|█▋ | 1903/11526 [19:51<1:38:28, 1.63it/s] 17%|█▋ | 1904/11526 [19:51<1:38:29, 1.63it/s] {'loss': 0.2483, 'grad_norm': 0.4726150333881378, 'learning_rate': 9.871223157915396e-06, 'epoch': 0.5}
17%|█▋ | 1904/11526 [19:52<1:38:29, 1.63it/s] 17%|█▋ | 1905/11526 [19:52<1:38:30, 1.63it/s] {'loss': 0.3021, 'grad_norm': 0.5690626502037048, 'learning_rate': 9.870881466626912e-06, 'epoch': 0.5}
17%|█▋ | 1905/11526 [19:52<1:38:30, 1.63it/s] 17%|█▋ | 1906/11526 [19:53<1:38:38, 1.63it/s] {'loss': 0.2578, 'grad_norm': 0.7423033714294434, 'learning_rate': 9.870539328553468e-06, 'epoch': 0.5}
17%|█▋ | 1906/11526 [19:53<1:38:38, 1.63it/s] 17%|█▋ | 1907/11526 [19:53<1:39:03, 1.62it/s] {'loss': 0.2328, 'grad_norm': 0.5298792123794556, 'learning_rate': 9.870196743726444e-06, 'epoch': 0.5}
17%|█▋ | 1907/11526 [19:53<1:39:03, 1.62it/s] 17%|█▋ | 1908/11526 [19:54<1:38:51, 1.62it/s] {'loss': 0.2237, 'grad_norm': 0.47172003984451294, 'learning_rate': 9.869853712177262e-06, 'epoch': 0.5}
17%|█▋ | 1908/11526 [19:54<1:38:51, 1.62it/s] 17%|█▋ | 1909/11526 [19:54<1:38:42, 1.62it/s] {'loss': 0.2794, 'grad_norm': 0.6446471214294434, 'learning_rate': 9.869510233937391e-06, 'epoch': 0.5}
17%|█▋ | 1909/11526 [19:55<1:38:42, 1.62it/s] 17%|█▋ | 1910/11526 [19:55<1:38:36, 1.63it/s] {'loss': 0.2371, 'grad_norm': 0.5225342512130737, 'learning_rate': 9.869166309038337e-06, 'epoch': 0.5}
17%|█▋ | 1910/11526 [19:55<1:38:36, 1.63it/s] 17%|█▋ | 1911/11526 [19:56<1:38:37, 1.62it/s] {'loss': 0.2401, 'grad_norm': 0.5544805526733398, 'learning_rate': 9.868821937511641e-06, 'epoch': 0.5}
17%|█▋ | 1911/11526 [19:56<1:38:37, 1.62it/s] 17%|█▋ | 1912/11526 [19:56<1:38:34, 1.63it/s] {'loss': 0.2918, 'grad_norm': 0.615851104259491, 'learning_rate': 9.868477119388897e-06, 'epoch': 0.5}
17%|█▋ | 1912/11526 [19:56<1:38:34, 1.63it/s] 17%|█▋ | 1913/11526 [19:57<1:38:33, 1.63it/s] {'loss': 0.2678, 'grad_norm': 0.6200054287910461, 'learning_rate': 9.868131854701729e-06, 'epoch': 0.5}
17%|█▋ | 1913/11526 [19:57<1:38:33, 1.63it/s] 17%|█▋ | 1914/11526 [19:58<1:38:28, 1.63it/s] {'loss': 0.2876, 'grad_norm': 0.6332622170448303, 'learning_rate': 9.86778614348181e-06, 'epoch': 0.5}
17%|█▋ | 1914/11526 [19:58<1:38:28, 1.63it/s] 17%|█▋ | 1915/11526 [19:58<1:38:30, 1.63it/s] {'loss': 0.2697, 'grad_norm': 0.6188172698020935, 'learning_rate': 9.86743998576085e-06, 'epoch': 0.5}
17%|█▋ | 1915/11526 [19:58<1:38:30, 1.63it/s] 17%|█▋ | 1916/11526 [19:59<1:38:31, 1.63it/s] {'loss': 0.204, 'grad_norm': 0.5505199432373047, 'learning_rate': 9.867093381570599e-06, 'epoch': 0.5}
17%|█▋ | 1916/11526 [19:59<1:38:31, 1.63it/s] 17%|█▋ | 1917/11526 [19:59<1:38:34, 1.62it/s] {'loss': 0.2931, 'grad_norm': 0.7359306216239929, 'learning_rate': 9.86674633094285e-06, 'epoch': 0.5}
17%|█▋ | 1917/11526 [20:00<1:38:34, 1.62it/s] 17%|█▋ | 1918/11526 [20:00<1:38:31, 1.63it/s] {'loss': 0.2084, 'grad_norm': 0.5367986559867859, 'learning_rate': 9.866398833909438e-06, 'epoch': 0.5}
17%|█▋ | 1918/11526 [20:00<1:38:31, 1.63it/s] 17%|█▋ | 1919/11526 [20:01<1:38:24, 1.63it/s] {'loss': 0.3018, 'grad_norm': 0.5978849530220032, 'learning_rate': 9.866050890502236e-06, 'epoch': 0.5}
17%|█▋ | 1919/11526 [20:01<1:38:24, 1.63it/s] 17%|█▋ | 1920/11526 [20:01<1:38:25, 1.63it/s] {'loss': 0.3253, 'grad_norm': 0.6240270137786865, 'learning_rate': 9.865702500753158e-06, 'epoch': 0.5}
17%|█▋ | 1920/11526 [20:01<1:38:25, 1.63it/s] 17%|█▋ | 1921/11526 [20:02<1:38:23, 1.63it/s] {'loss': 0.2507, 'grad_norm': 0.5705375671386719, 'learning_rate': 9.865353664694164e-06, 'epoch': 0.5}
17%|█▋ | 1921/11526 [20:02<1:38:23, 1.63it/s] 17%|█▋ | 1922/11526 [20:02<1:38:26, 1.63it/s] {'loss': 0.2898, 'grad_norm': 0.6180533170700073, 'learning_rate': 9.865004382357248e-06, 'epoch': 0.5}
17%|█▋ | 1922/11526 [20:03<1:38:26, 1.63it/s] 17%|█▋ | 1923/11526 [20:03<1:38:27, 1.63it/s] {'loss': 0.2699, 'grad_norm': 0.638435423374176, 'learning_rate': 9.86465465377445e-06, 'epoch': 0.5}
17%|█▋ | 1923/11526 [20:03<1:38:27, 1.63it/s] 17%|█▋ | 1924/11526 [20:04<1:38:24, 1.63it/s] {'loss': 0.2845, 'grad_norm': 0.6167441606521606, 'learning_rate': 9.864304478977849e-06, 'epoch': 0.5}
17%|█▋ | 1924/11526 [20:04<1:38:24, 1.63it/s] 17%|█▋ | 1925/11526 [20:04<1:38:18, 1.63it/s] {'loss': 0.2526, 'grad_norm': 0.5341817736625671, 'learning_rate': 9.863953857999565e-06, 'epoch': 0.5}
17%|█▋ | 1925/11526 [20:04<1:38:18, 1.63it/s] 17%|█▋ | 1926/11526 [20:05<1:38:17, 1.63it/s] {'loss': 0.2617, 'grad_norm': 0.5845621228218079, 'learning_rate': 9.863602790871756e-06, 'epoch': 0.5}
17%|█▋ | 1926/11526 [20:05<1:38:17, 1.63it/s] 17%|█▋ | 1927/11526 [20:06<1:38:19, 1.63it/s] {'loss': 0.3059, 'grad_norm': 0.6614066958427429, 'learning_rate': 9.863251277626626e-06, 'epoch': 0.5}
17%|█▋ | 1927/11526 [20:06<1:38:19, 1.63it/s] 17%|█▋ | 1928/11526 [20:06<1:38:18, 1.63it/s] {'loss': 0.2746, 'grad_norm': 0.5209082365036011, 'learning_rate': 9.862899318296419e-06, 'epoch': 0.5}
17%|█▋ | 1928/11526 [20:06<1:38:18, 1.63it/s] 17%|█▋ | 1929/11526 [20:07<1:38:22, 1.63it/s] {'loss': 0.3322, 'grad_norm': 0.7992464900016785, 'learning_rate': 9.86254691291342e-06, 'epoch': 0.5}
17%|█▋ | 1929/11526 [20:07<1:38:22, 1.63it/s] 17%|█▋ | 1930/11526 [20:07<1:38:23, 1.63it/s] {'loss': 0.2949, 'grad_norm': 0.7061419486999512, 'learning_rate': 9.862194061509949e-06, 'epoch': 0.5}
17%|█▋ | 1930/11526 [20:08<1:38:23, 1.63it/s] 17%|█▋ | 1931/11526 [20:08<1:38:30, 1.62it/s] {'loss': 0.2525, 'grad_norm': 0.5105396509170532, 'learning_rate': 9.861840764118375e-06, 'epoch': 0.5}
17%|█▋ | 1931/11526 [20:08<1:38:30, 1.62it/s] 17%|█▋ | 1932/11526 [20:09<1:38:30, 1.62it/s] {'loss': 0.2295, 'grad_norm': 0.5667216181755066, 'learning_rate': 9.861487020771103e-06, 'epoch': 0.5}
17%|█▋ | 1932/11526 [20:09<1:38:30, 1.62it/s] 17%|█▋ | 1933/11526 [20:09<1:38:22, 1.63it/s] {'loss': 0.2713, 'grad_norm': 0.5570502877235413, 'learning_rate': 9.86113283150058e-06, 'epoch': 0.5}
17%|█▋ | 1933/11526 [20:09<1:38:22, 1.63it/s] 17%|█▋ | 1934/11526 [20:10<1:38:16, 1.63it/s] {'loss': 0.2565, 'grad_norm': 0.5462026000022888, 'learning_rate': 9.860778196339295e-06, 'epoch': 0.5}
17%|█▋ | 1934/11526 [20:10<1:38:16, 1.63it/s] 17%|█▋ | 1935/11526 [20:10<1:38:13, 1.63it/s] {'loss': 0.287, 'grad_norm': 0.5885124206542969, 'learning_rate': 9.860423115319778e-06, 'epoch': 0.5}
17%|█▋ | 1935/11526 [20:11<1:38:13, 1.63it/s] 17%|█▋ | 1936/11526 [20:11<1:38:24, 1.62it/s] {'loss': 0.2994, 'grad_norm': 0.5526739358901978, 'learning_rate': 9.860067588474597e-06, 'epoch': 0.5}
17%|█▋ | 1936/11526 [20:11<1:38:24, 1.62it/s] 17%|█▋ | 1937/11526 [20:12<1:38:23, 1.62it/s] {'loss': 0.3217, 'grad_norm': 0.6317270398139954, 'learning_rate': 9.859711615836366e-06, 'epoch': 0.5}
17%|█▋ | 1937/11526 [20:12<1:38:23, 1.62it/s] 17%|█▋ | 1938/11526 [20:12<1:38:20, 1.63it/s] {'loss': 0.3135, 'grad_norm': 0.6550793051719666, 'learning_rate': 9.859355197437735e-06, 'epoch': 0.5}
17%|█▋ | 1938/11526 [20:12<1:38:20, 1.63it/s] 17%|█▋ | 1939/11526 [20:13<1:38:14, 1.63it/s] {'loss': 0.3355, 'grad_norm': 0.5893796682357788, 'learning_rate': 9.858998333311394e-06, 'epoch': 0.5}
17%|█▋ | 1939/11526 [20:13<1:38:14, 1.63it/s] 17%|█▋ | 1940/11526 [20:14<1:38:10, 1.63it/s] {'loss': 0.2288, 'grad_norm': 0.545036256313324, 'learning_rate': 9.858641023490082e-06, 'epoch': 0.5}
17%|█▋ | 1940/11526 [20:14<1:38:10, 1.63it/s] 17%|█▋ | 1941/11526 [20:14<1:38:08, 1.63it/s] {'loss': 0.2464, 'grad_norm': 0.603199303150177, 'learning_rate': 9.85828326800657e-06, 'epoch': 0.51}
17%|█▋ | 1941/11526 [20:14<1:38:08, 1.63it/s] 17%|█▋ | 1942/11526 [20:15<1:38:15, 1.63it/s] {'loss': 0.3142, 'grad_norm': 0.6055068373680115, 'learning_rate': 9.857925066893674e-06, 'epoch': 0.51}
17%|█▋ | 1942/11526 [20:15<1:38:15, 1.63it/s] 17%|█▋ | 1943/11526 [20:15<1:38:11, 1.63it/s] {'loss': 0.2826, 'grad_norm': 0.6488090753555298, 'learning_rate': 9.85756642018425e-06, 'epoch': 0.51}
17%|█▋ | 1943/11526 [20:16<1:38:11, 1.63it/s] 17%|█▋ | 1944/11526 [20:16<1:38:10, 1.63it/s] {'loss': 0.285, 'grad_norm': 0.613360583782196, 'learning_rate': 9.857207327911196e-06, 'epoch': 0.51}
17%|█▋ | 1944/11526 [20:16<1:38:10, 1.63it/s] 17%|█▋ | 1945/11526 [20:17<1:38:06, 1.63it/s] {'loss': 0.2357, 'grad_norm': 0.6925175189971924, 'learning_rate': 9.85684779010745e-06, 'epoch': 0.51}
17%|█▋ | 1945/11526 [20:17<1:38:06, 1.63it/s] 17%|█▋ | 1946/11526 [20:17<1:38:08, 1.63it/s] {'loss': 0.2671, 'grad_norm': 0.6227112412452698, 'learning_rate': 9.85648780680599e-06, 'epoch': 0.51}
17%|█▋ | 1946/11526 [20:17<1:38:08, 1.63it/s] 17%|█▋ | 1947/11526 [20:18<1:38:06, 1.63it/s] {'loss': 0.239, 'grad_norm': 0.5638155937194824, 'learning_rate': 9.856127378039836e-06, 'epoch': 0.51}
17%|█▋ | 1947/11526 [20:18<1:38:06, 1.63it/s] 17%|█▋ | 1948/11526 [20:18<1:38:04, 1.63it/s] {'loss': 0.3136, 'grad_norm': 0.6074802279472351, 'learning_rate': 9.855766503842048e-06, 'epoch': 0.51}
17%|█▋ | 1948/11526 [20:19<1:38:04, 1.63it/s] 17%|█▋ | 1949/11526 [20:19<1:38:02, 1.63it/s] {'loss': 0.2663, 'grad_norm': 0.6013402938842773, 'learning_rate': 9.855405184245728e-06, 'epoch': 0.51}
17%|█▋ | 1949/11526 [20:19<1:38:02, 1.63it/s] 17%|█▋ | 1950/11526 [20:20<1:38:00, 1.63it/s] {'loss': 0.2417, 'grad_norm': 0.5306558609008789, 'learning_rate': 9.85504341928402e-06, 'epoch': 0.51}
17%|█▋ | 1950/11526 [20:20<1:38:00, 1.63it/s] 17%|█▋ | 1951/11526 [20:20<1:37:58, 1.63it/s] {'loss': 0.2602, 'grad_norm': 0.6090792417526245, 'learning_rate': 9.854681208990105e-06, 'epoch': 0.51}
17%|█▋ | 1951/11526 [20:20<1:37:58, 1.63it/s] 17%|█▋ | 1952/11526 [20:21<1:37:58, 1.63it/s] {'loss': 0.3246, 'grad_norm': 0.6346650719642639, 'learning_rate': 9.854318553397206e-06, 'epoch': 0.51}
17%|█▋ | 1952/11526 [20:21<1:37:58, 1.63it/s] 17%|█▋ | 1953/11526 [20:22<1:37:58, 1.63it/s] {'loss': 0.2507, 'grad_norm': 0.5435665845870972, 'learning_rate': 9.853955452538592e-06, 'epoch': 0.51}
17%|█▋ | 1953/11526 [20:22<1:37:58, 1.63it/s] 17%|█▋ | 1954/11526 [20:22<1:37:59, 1.63it/s] {'loss': 0.3544, 'grad_norm': 0.7238031625747681, 'learning_rate': 9.853591906447564e-06, 'epoch': 0.51}
17%|█▋ | 1954/11526 [20:22<1:37:59, 1.63it/s] 17%|█▋ | 1955/11526 [20:23<1:37:56, 1.63it/s] {'loss': 0.2995, 'grad_norm': 0.6415871381759644, 'learning_rate': 9.85322791515747e-06, 'epoch': 0.51}
17%|█▋ | 1955/11526 [20:23<1:37:56, 1.63it/s] 17%|█▋ | 1956/11526 [20:23<1:37:55, 1.63it/s] {'loss': 0.3063, 'grad_norm': 0.6676236391067505, 'learning_rate': 9.852863478701699e-06, 'epoch': 0.51}
17%|█▋ | 1956/11526 [20:23<1:37:55, 1.63it/s] 17%|█▋ | 1957/11526 [20:24<1:37:56, 1.63it/s] {'loss': 0.3076, 'grad_norm': 0.7011843323707581, 'learning_rate': 9.85249859711368e-06, 'epoch': 0.51}
17%|█▋ | 1957/11526 [20:24<1:37:56, 1.63it/s] 17%|█▋ | 1958/11526 [20:25<1:37:56, 1.63it/s] {'loss': 0.3469, 'grad_norm': 0.744916558265686, 'learning_rate': 9.852133270426877e-06, 'epoch': 0.51}
17%|█▋ | 1958/11526 [20:25<1:37:56, 1.63it/s] 17%|█▋ | 1959/11526 [20:25<1:37:55, 1.63it/s] {'loss': 0.2314, 'grad_norm': 0.5238153338432312, 'learning_rate': 9.851767498674804e-06, 'epoch': 0.51}
17%|█▋ | 1959/11526 [20:25<1:37:55, 1.63it/s] 17%|█▋ | 1960/11526 [20:26<1:37:54, 1.63it/s] {'loss': 0.3512, 'grad_norm': 0.6210460662841797, 'learning_rate': 9.851401281891011e-06, 'epoch': 0.51}
17%|█▋ | 1960/11526 [20:26<1:37:54, 1.63it/s] 17%|█▋ | 1961/11526 [20:26<1:37:56, 1.63it/s] {'loss': 0.2515, 'grad_norm': 0.6010885238647461, 'learning_rate': 9.851034620109088e-06, 'epoch': 0.51}
17%|█▋ | 1961/11526 [20:27<1:37:56, 1.63it/s] 17%|█▋ | 1962/11526 [20:27<1:37:56, 1.63it/s] {'loss': 0.2647, 'grad_norm': 0.5666185617446899, 'learning_rate': 9.85066751336267e-06, 'epoch': 0.51}
17%|█▋ | 1962/11526 [20:27<1:37:56, 1.63it/s] 17%|█▋ | 1963/11526 [20:28<1:37:53, 1.63it/s] {'loss': 0.291, 'grad_norm': 0.5947827100753784, 'learning_rate': 9.850299961685428e-06, 'epoch': 0.51}
17%|█▋ | 1963/11526 [20:28<1:37:53, 1.63it/s] 17%|█▋ | 1964/11526 [20:28<1:37:51, 1.63it/s] {'loss': 0.2416, 'grad_norm': 0.5564320683479309, 'learning_rate': 9.849931965111075e-06, 'epoch': 0.51}
17%|█▋ | 1964/11526 [20:28<1:37:51, 1.63it/s] 17%|█▋ | 1965/11526 [20:29<1:37:49, 1.63it/s] {'loss': 0.228, 'grad_norm': 0.46795517206192017, 'learning_rate': 9.84956352367337e-06, 'epoch': 0.51}
17%|█▋ | 1965/11526 [20:29<1:37:49, 1.63it/s] 17%|█▋ | 1966/11526 [20:30<1:37:51, 1.63it/s] {'loss': 0.3123, 'grad_norm': 0.6679921746253967, 'learning_rate': 9.849194637406104e-06, 'epoch': 0.51}
17%|█▋ | 1966/11526 [20:30<1:37:51, 1.63it/s] 17%|█▋ | 1967/11526 [20:30<1:37:49, 1.63it/s] {'loss': 0.338, 'grad_norm': 0.5808160901069641, 'learning_rate': 9.848825306343114e-06, 'epoch': 0.51}
17%|█▋ | 1967/11526 [20:30<1:37:49, 1.63it/s] 17%|█▋ | 1968/11526 [20:31<1:37:48, 1.63it/s] {'loss': 0.297, 'grad_norm': 0.5990259647369385, 'learning_rate': 9.84845553051828e-06, 'epoch': 0.51}
17%|█▋ | 1968/11526 [20:31<1:37:48, 1.63it/s] 17%|█▋ | 1969/11526 [20:31<1:37:46, 1.63it/s] {'loss': 0.3589, 'grad_norm': 0.638698399066925, 'learning_rate': 9.848085309965516e-06, 'epoch': 0.51}
17%|█▋ | 1969/11526 [20:31<1:37:46, 1.63it/s] 17%|█▋ | 1970/11526 [20:32<1:37:46, 1.63it/s] {'loss': 0.3401, 'grad_norm': 0.6428869962692261, 'learning_rate': 9.847714644718786e-06, 'epoch': 0.51}
17%|█▋ | 1970/11526 [20:32<1:37:46, 1.63it/s] 17%|█▋ | 1971/11526 [20:33<1:37:45, 1.63it/s] {'loss': 0.332, 'grad_norm': 0.7436872124671936, 'learning_rate': 9.847343534812084e-06, 'epoch': 0.51}
17%|█▋ | 1971/11526 [20:33<1:37:45, 1.63it/s] 17%|█▋ | 1972/11526 [20:33<1:37:49, 1.63it/s] {'loss': 0.3236, 'grad_norm': 0.6759253144264221, 'learning_rate': 9.846971980279454e-06, 'epoch': 0.51}
17%|█▋ | 1972/11526 [20:33<1:37:49, 1.63it/s] 17%|█▋ | 1973/11526 [20:34<1:37:49, 1.63it/s] {'loss': 0.2818, 'grad_norm': 0.5801043510437012, 'learning_rate': 9.846599981154975e-06, 'epoch': 0.51}
17%|█▋ | 1973/11526 [20:34<1:37:49, 1.63it/s] 17%|█▋ | 1974/11526 [20:34<1:37:50, 1.63it/s] {'loss': 0.2843, 'grad_norm': 0.6422803997993469, 'learning_rate': 9.84622753747277e-06, 'epoch': 0.51}
17%|█▋ | 1974/11526 [20:35<1:37:50, 1.63it/s] 17%|█▋ | 1975/11526 [20:35<1:37:52, 1.63it/s] {'loss': 0.2617, 'grad_norm': 0.6686455607414246, 'learning_rate': 9.845854649267001e-06, 'epoch': 0.51}
17%|█▋ | 1975/11526 [20:35<1:37:52, 1.63it/s] 17%|█▋ | 1976/11526 [20:36<1:37:48, 1.63it/s] {'loss': 0.3951, 'grad_norm': 0.6806864142417908, 'learning_rate': 9.845481316571873e-06, 'epoch': 0.51}
17%|█▋ | 1976/11526 [20:36<1:37:48, 1.63it/s] 17%|█▋ | 1977/11526 [20:36<1:37:45, 1.63it/s] {'loss': 0.3481, 'grad_norm': 0.6983235478401184, 'learning_rate': 9.845107539421627e-06, 'epoch': 0.51}
17%|█▋ | 1977/11526 [20:36<1:37:45, 1.63it/s] 17%|█▋ | 1978/11526 [20:37<1:37:47, 1.63it/s] {'loss': 0.2873, 'grad_norm': 0.6125569939613342, 'learning_rate': 9.844733317850553e-06, 'epoch': 0.51}
17%|█▋ | 1978/11526 [20:37<1:37:47, 1.63it/s] 17%|█▋ | 1979/11526 [20:37<1:37:43, 1.63it/s] {'loss': 0.2666, 'grad_norm': 0.6162131428718567, 'learning_rate': 9.844358651892972e-06, 'epoch': 0.52}
17%|█▋ | 1979/11526 [20:38<1:37:43, 1.63it/s] 17%|█▋ | 1980/11526 [20:38<1:37:41, 1.63it/s] {'loss': 0.2945, 'grad_norm': 0.5899200439453125, 'learning_rate': 9.84398354158325e-06, 'epoch': 0.52}
17%|█▋ | 1980/11526 [20:38<1:37:41, 1.63it/s] 17%|█▋ | 1981/11526 [20:39<1:37:41, 1.63it/s] {'loss': 0.251, 'grad_norm': 0.5247451066970825, 'learning_rate': 9.8436079869558e-06, 'epoch': 0.52}
17%|█▋ | 1981/11526 [20:39<1:37:41, 1.63it/s] 17%|█▋ | 1982/11526 [20:39<1:37:42, 1.63it/s] {'loss': 0.2977, 'grad_norm': 0.6177548766136169, 'learning_rate': 9.843231988045065e-06, 'epoch': 0.52}
17%|█▋ | 1982/11526 [20:39<1:37:42, 1.63it/s] 17%|█▋ | 1983/11526 [20:40<1:37:38, 1.63it/s] {'loss': 0.3524, 'grad_norm': 0.6651867032051086, 'learning_rate': 9.842855544885535e-06, 'epoch': 0.52}
17%|█▋ | 1983/11526 [20:40<1:37:38, 1.63it/s] 17%|█▋ | 1984/11526 [20:41<1:37:38, 1.63it/s] {'loss': 0.2971, 'grad_norm': 0.5762016773223877, 'learning_rate': 9.84247865751174e-06, 'epoch': 0.52}
17%|█▋ | 1984/11526 [20:41<1:37:38, 1.63it/s] 17%|█▋ | 1985/11526 [20:41<1:37:38, 1.63it/s] {'loss': 0.2535, 'grad_norm': 0.5509292483329773, 'learning_rate': 9.84210132595825e-06, 'epoch': 0.52}
17%|█▋ | 1985/11526 [20:41<1:37:38, 1.63it/s] 17%|█▋ | 1986/11526 [20:42<1:37:37, 1.63it/s] {'loss': 0.4396, 'grad_norm': 0.6986450552940369, 'learning_rate': 9.841723550259676e-06, 'epoch': 0.52}
17%|█▋ | 1986/11526 [20:42<1:37:37, 1.63it/s] 17%|█▋ | 1987/11526 [20:42<1:37:38, 1.63it/s] {'loss': 0.3031, 'grad_norm': 0.5996002554893494, 'learning_rate': 9.841345330450668e-06, 'epoch': 0.52}
17%|█▋ | 1987/11526 [20:43<1:37:38, 1.63it/s] 17%|█▋ | 1988/11526 [20:43<1:37:38, 1.63it/s] {'loss': 0.2093, 'grad_norm': 0.5217267274856567, 'learning_rate': 9.840966666565923e-06, 'epoch': 0.52}
17%|█▋ | 1988/11526 [20:43<1:37:38, 1.63it/s] 17%|█▋ | 1989/11526 [20:44<1:37:34, 1.63it/s] {'loss': 0.2992, 'grad_norm': 0.6691561341285706, 'learning_rate': 9.840587558640172e-06, 'epoch': 0.52}
17%|█▋ | 1989/11526 [20:44<1:37:34, 1.63it/s] 17%|█▋ | 1990/11526 [20:44<1:37:35, 1.63it/s] {'loss': 0.3118, 'grad_norm': 0.5913621783256531, 'learning_rate': 9.840208006708185e-06, 'epoch': 0.52}
17%|█▋ | 1990/11526 [20:44<1:37:35, 1.63it/s] 17%|█▋ | 1991/11526 [20:45<1:37:34, 1.63it/s] {'loss': 0.3368, 'grad_norm': 0.7006104588508606, 'learning_rate': 9.839828010804781e-06, 'epoch': 0.52}
17%|█▋ | 1991/11526 [20:45<1:37:34, 1.63it/s] 17%|█▋ | 1992/11526 [20:45<1:37:33, 1.63it/s] {'loss': 0.2951, 'grad_norm': 0.5856809020042419, 'learning_rate': 9.839447570964816e-06, 'epoch': 0.52}
17%|█▋ | 1992/11526 [20:46<1:37:33, 1.63it/s] 17%|█▋ | 1993/11526 [20:46<1:37:33, 1.63it/s] {'loss': 0.2482, 'grad_norm': 0.5935633182525635, 'learning_rate': 9.839066687223183e-06, 'epoch': 0.52}
17%|█▋ | 1993/11526 [20:46<1:37:33, 1.63it/s] 17%|█▋ | 1994/11526 [20:47<1:37:38, 1.63it/s] {'loss': 0.2492, 'grad_norm': 0.5615251064300537, 'learning_rate': 9.838685359614819e-06, 'epoch': 0.52}
17%|█▋ | 1994/11526 [20:47<1:37:38, 1.63it/s] 17%|█▋ | 1995/11526 [20:47<1:37:36, 1.63it/s] {'loss': 0.2247, 'grad_norm': 0.5174044370651245, 'learning_rate': 9.838303588174705e-06, 'epoch': 0.52}
17%|█▋ | 1995/11526 [20:47<1:37:36, 1.63it/s] 17%|█▋ | 1996/11526 [20:48<1:37:34, 1.63it/s] {'loss': 0.3417, 'grad_norm': 0.7576023936271667, 'learning_rate': 9.837921372937856e-06, 'epoch': 0.52}
17%|█▋ | 1996/11526 [20:48<1:37:34, 1.63it/s] 17%|█▋ | 1997/11526 [20:49<1:37:33, 1.63it/s] {'loss': 0.2151, 'grad_norm': 0.58174067735672, 'learning_rate': 9.837538713939334e-06, 'epoch': 0.52}
17%|█▋ | 1997/11526 [20:49<1:37:33, 1.63it/s] 17%|█▋ | 1998/11526 [20:49<1:37:31, 1.63it/s] {'loss': 0.327, 'grad_norm': 0.6956343650817871, 'learning_rate': 9.837155611214235e-06, 'epoch': 0.52}
17%|█▋ | 1998/11526 [20:49<1:37:31, 1.63it/s] 17%|█▋ | 1999/11526 [20:50<1:37:29, 1.63it/s] {'loss': 0.2583, 'grad_norm': 0.5294623970985413, 'learning_rate': 9.8367720647977e-06, 'epoch': 0.52}
17%|█▋ | 1999/11526 [20:50<1:37:29, 1.63it/s] 17%|█▋ | 2000/11526 [20:50<1:37:32, 1.63it/s] {'loss': 0.2688, 'grad_norm': 0.5461587309837341, 'learning_rate': 9.836388074724913e-06, 'epoch': 0.52}
17%|█▋ | 2000/11526 [20:51<1:37:32, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.35it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.7131911516189575, 'eval_runtime': 1.9561, 'eval_samples_per_second': 102.244, 'eval_steps_per_second': 6.646, 'epoch': 0.52}
17%|█▋ | 2000/11526 [20:52<1:37:32, 1.63it/s]
17%|█▋ | 2001/11526 [20:53<3:10:57, 1.20s/it] {'loss': 0.2824, 'grad_norm': 0.5446604490280151, 'learning_rate': 9.836003641031094e-06, 'epoch': 0.52}
17%|█▋ | 2001/11526 [20:53<3:10:57, 1.20s/it] 17%|█▋ | 2002/11526 [20:54<2:42:51, 1.03s/it] {'loss': 0.2351, 'grad_norm': 0.5491369962692261, 'learning_rate': 9.835618763751504e-06, 'epoch': 0.52}
17%|█▋ | 2002/11526 [20:54<2:42:51, 1.03s/it] 17%|█▋ | 2003/11526 [20:54<2:23:14, 1.11it/s] {'loss': 0.3835, 'grad_norm': 0.6433938145637512, 'learning_rate': 9.835233442921448e-06, 'epoch': 0.52}
17%|█▋ | 2003/11526 [20:54<2:23:14, 1.11it/s] 17%|█▋ | 2004/11526 [20:55<2:09:30, 1.23it/s] {'loss': 0.3188, 'grad_norm': 0.6492552161216736, 'learning_rate': 9.83484767857627e-06, 'epoch': 0.52}
17%|█▋ | 2004/11526 [20:55<2:09:30, 1.23it/s] 17%|█▋ | 2005/11526 [20:55<1:59:50, 1.32it/s] {'loss': 0.3799, 'grad_norm': 0.6396065354347229, 'learning_rate': 9.834461470751353e-06, 'epoch': 0.52}
17%|█▋ | 2005/11526 [20:56<1:59:50, 1.32it/s] 17%|█▋ | 2006/11526 [20:56<1:53:06, 1.40it/s] {'loss': 0.221, 'grad_norm': 0.4855790138244629, 'learning_rate': 9.834074819482122e-06, 'epoch': 0.52}
17%|█▋ | 2006/11526 [20:56<1:53:06, 1.40it/s] 17%|█▋ | 2007/11526 [20:57<1:48:23, 1.46it/s] {'loss': 0.2706, 'grad_norm': 0.5560048818588257, 'learning_rate': 9.833687724804045e-06, 'epoch': 0.52}
17%|█▋ | 2007/11526 [20:57<1:48:23, 1.46it/s] 17%|█▋ | 2008/11526 [20:57<1:45:08, 1.51it/s] {'loss': 0.3178, 'grad_norm': 0.7912834286689758, 'learning_rate': 9.833300186752626e-06, 'epoch': 0.52}
17%|█▋ | 2008/11526 [20:57<1:45:08, 1.51it/s] 17%|█▋ | 2009/11526 [20:58<1:42:49, 1.54it/s] {'loss': 0.233, 'grad_norm': 0.5611964464187622, 'learning_rate': 9.832912205363416e-06, 'epoch': 0.52}
17%|█▋ | 2009/11526 [20:58<1:42:49, 1.54it/s] 17%|█▋ | 2010/11526 [20:58<1:41:11, 1.57it/s] {'loss': 0.3714, 'grad_norm': 0.5634106397628784, 'learning_rate': 9.832523780672e-06, 'epoch': 0.52}
17%|█▋ | 2010/11526 [20:59<1:41:11, 1.57it/s] 17%|█▋ | 2011/11526 [20:59<1:40:03, 1.58it/s] {'loss': 0.2743, 'grad_norm': 0.714896023273468, 'learning_rate': 9.832134912714005e-06, 'epoch': 0.52}
17%|█▋ | 2011/11526 [20:59<1:40:03, 1.58it/s] 17%|█▋ | 2012/11526 [21:00<1:39:19, 1.60it/s] {'loss': 0.2793, 'grad_norm': 0.5395379662513733, 'learning_rate': 9.831745601525102e-06, 'epoch': 0.52}
17%|█▋ | 2012/11526 [21:00<1:39:19, 1.60it/s] 17%|█▋ | 2013/11526 [21:00<1:38:44, 1.61it/s] {'loss': 0.3089, 'grad_norm': 0.6605382561683655, 'learning_rate': 9.831355847141002e-06, 'epoch': 0.52}
17%|█▋ | 2013/11526 [21:00<1:38:44, 1.61it/s] 17%|█▋ | 2014/11526 [21:01<1:38:22, 1.61it/s] {'loss': 0.2101, 'grad_norm': 0.49655580520629883, 'learning_rate': 9.830965649597455e-06, 'epoch': 0.52}
17%|█▋ | 2014/11526 [21:01<1:38:22, 1.61it/s] 17%|█▋ | 2015/11526 [21:02<1:38:02, 1.62it/s] {'loss': 0.2645, 'grad_norm': 0.6307304501533508, 'learning_rate': 9.830575008930252e-06, 'epoch': 0.52}
17%|█▋ | 2015/11526 [21:02<1:38:02, 1.62it/s] 17%|█▋ | 2016/11526 [21:02<1:37:48, 1.62it/s] {'loss': 0.4017, 'grad_norm': 0.7202574014663696, 'learning_rate': 9.830183925175223e-06, 'epoch': 0.52}
17%|█▋ | 2016/11526 [21:02<1:37:48, 1.62it/s] 17%|█▋ | 2017/11526 [21:03<1:37:38, 1.62it/s] {'loss': 0.3156, 'grad_norm': 0.6160435676574707, 'learning_rate': 9.82979239836824e-06, 'epoch': 0.52}
17%|█▋ | 2017/11526 [21:03<1:37:38, 1.62it/s] 18%|█▊ | 2018/11526 [21:03<1:37:30, 1.63it/s] {'loss': 0.3142, 'grad_norm': 0.6680341362953186, 'learning_rate': 9.829400428545221e-06, 'epoch': 0.53}
18%|█▊ | 2018/11526 [21:04<1:37:30, 1.63it/s] 18%|█▊ | 2019/11526 [21:04<1:37:24, 1.63it/s] {'loss': 0.3699, 'grad_norm': 0.7847038507461548, 'learning_rate': 9.829008015742116e-06, 'epoch': 0.53}
18%|█▊ | 2019/11526 [21:04<1:37:24, 1.63it/s] 18%|█▊ | 2020/11526 [21:05<1:37:21, 1.63it/s] {'loss': 0.4274, 'grad_norm': 0.515755295753479, 'learning_rate': 9.828615159994919e-06, 'epoch': 0.53}
18%|█▊ | 2020/11526 [21:05<1:37:21, 1.63it/s] 18%|█▊ | 2021/11526 [21:05<1:37:19, 1.63it/s] {'loss': 0.2658, 'grad_norm': 0.5729355812072754, 'learning_rate': 9.828221861339667e-06, 'epoch': 0.53}
18%|█▊ | 2021/11526 [21:05<1:37:19, 1.63it/s] 18%|█▊ | 2022/11526 [21:06<1:37:17, 1.63it/s] {'loss': 0.2693, 'grad_norm': 0.5991700887680054, 'learning_rate': 9.827828119812432e-06, 'epoch': 0.53}
18%|█▊ | 2022/11526 [21:06<1:37:17, 1.63it/s] 18%|█▊ | 2023/11526 [21:06<1:37:16, 1.63it/s] {'loss': 0.3046, 'grad_norm': 0.587770938873291, 'learning_rate': 9.827433935449335e-06, 'epoch': 0.53}
18%|█▊ | 2023/11526 [21:07<1:37:16, 1.63it/s] 18%|█▊ | 2024/11526 [21:07<1:37:15, 1.63it/s] {'loss': 0.2898, 'grad_norm': 0.5742209553718567, 'learning_rate': 9.827039308286531e-06, 'epoch': 0.53}
18%|█▊ | 2024/11526 [21:07<1:37:15, 1.63it/s] 18%|█▊ | 2025/11526 [21:08<1:37:12, 1.63it/s] {'loss': 0.2814, 'grad_norm': 0.6551092267036438, 'learning_rate': 9.826644238360215e-06, 'epoch': 0.53}
18%|█▊ | 2025/11526 [21:08<1:37:12, 1.63it/s] 18%|█▊ | 2026/11526 [21:08<1:37:13, 1.63it/s] {'loss': 0.2404, 'grad_norm': 0.5567050576210022, 'learning_rate': 9.826248725706627e-06, 'epoch': 0.53}
18%|█▊ | 2026/11526 [21:08<1:37:13, 1.63it/s] 18%|█▊ | 2027/11526 [21:09<1:37:12, 1.63it/s] {'loss': 0.3673, 'grad_norm': 0.7118464112281799, 'learning_rate': 9.825852770362046e-06, 'epoch': 0.53}
18%|█▊ | 2027/11526 [21:09<1:37:12, 1.63it/s] 18%|█▊ | 2028/11526 [21:10<1:37:10, 1.63it/s] {'loss': 0.3005, 'grad_norm': 0.5876781344413757, 'learning_rate': 9.825456372362791e-06, 'epoch': 0.53}
18%|█▊ | 2028/11526 [21:10<1:37:10, 1.63it/s] 18%|█▊ | 2029/11526 [21:10<1:37:12, 1.63it/s] {'loss': 0.2996, 'grad_norm': 0.6070621609687805, 'learning_rate': 9.825059531745222e-06, 'epoch': 0.53}
18%|█▊ | 2029/11526 [21:10<1:37:12, 1.63it/s] 18%|█▊ | 2030/11526 [21:11<1:37:14, 1.63it/s] {'loss': 0.2744, 'grad_norm': 0.6002430319786072, 'learning_rate': 9.82466224854574e-06, 'epoch': 0.53}
18%|█▊ | 2030/11526 [21:11<1:37:14, 1.63it/s] 18%|█▊ | 2031/11526 [21:11<1:37:13, 1.63it/s] {'loss': 0.2665, 'grad_norm': 0.5276063084602356, 'learning_rate': 9.824264522800785e-06, 'epoch': 0.53}
18%|█▊ | 2031/11526 [21:12<1:37:13, 1.63it/s] 18%|█▊ | 2032/11526 [21:12<1:37:09, 1.63it/s] {'loss': 0.2942, 'grad_norm': 0.6858761310577393, 'learning_rate': 9.823866354546837e-06, 'epoch': 0.53}
18%|█▊ | 2032/11526 [21:12<1:37:09, 1.63it/s] 18%|█▊ | 2033/11526 [21:13<1:37:06, 1.63it/s] {'loss': 0.2457, 'grad_norm': 0.5459600687026978, 'learning_rate': 9.823467743820424e-06, 'epoch': 0.53}
18%|█▊ | 2033/11526 [21:13<1:37:06, 1.63it/s] 18%|█▊ | 2034/11526 [21:13<1:37:04, 1.63it/s] {'loss': 0.2435, 'grad_norm': 0.5037608742713928, 'learning_rate': 9.823068690658104e-06, 'epoch': 0.53}
18%|█▊ | 2034/11526 [21:13<1:37:04, 1.63it/s] 18%|█▊ | 2035/11526 [21:14<1:37:02, 1.63it/s] {'loss': 0.2779, 'grad_norm': 0.6154975891113281, 'learning_rate': 9.822669195096479e-06, 'epoch': 0.53}
18%|█▊ | 2035/11526 [21:14<1:37:02, 1.63it/s] 18%|█▊ | 2036/11526 [21:14<1:37:06, 1.63it/s] {'loss': 0.2218, 'grad_norm': 0.46242424845695496, 'learning_rate': 9.822269257172199e-06, 'epoch': 0.53}
18%|█▊ | 2036/11526 [21:15<1:37:06, 1.63it/s] 18%|█▊ | 2037/11526 [21:15<1:37:05, 1.63it/s] {'loss': 0.2463, 'grad_norm': 0.6080564260482788, 'learning_rate': 9.821868876921942e-06, 'epoch': 0.53}
18%|█▊ | 2037/11526 [21:15<1:37:05, 1.63it/s] 18%|█▊ | 2038/11526 [21:16<1:37:03, 1.63it/s] {'loss': 0.3291, 'grad_norm': 0.6071076393127441, 'learning_rate': 9.821468054382437e-06, 'epoch': 0.53}
18%|█▊ | 2038/11526 [21:16<1:37:03, 1.63it/s] 18%|█▊ | 2039/11526 [21:16<1:37:02, 1.63it/s] {'loss': 0.2962, 'grad_norm': 0.566139280796051, 'learning_rate': 9.821066789590451e-06, 'epoch': 0.53}
18%|█▊ | 2039/11526 [21:16<1:37:02, 1.63it/s] 18%|█▊ | 2040/11526 [21:17<1:37:01, 1.63it/s] {'loss': 0.2757, 'grad_norm': 0.6636941432952881, 'learning_rate': 9.820665082582788e-06, 'epoch': 0.53}
18%|█▊ | 2040/11526 [21:17<1:37:01, 1.63it/s] 18%|█▊ | 2041/11526 [21:18<1:37:02, 1.63it/s] {'loss': 0.312, 'grad_norm': 0.6265758872032166, 'learning_rate': 9.820262933396294e-06, 'epoch': 0.53}
18%|█▊ | 2041/11526 [21:18<1:37:02, 1.63it/s] 18%|█▊ | 2042/11526 [21:18<1:37:03, 1.63it/s] {'loss': 0.2614, 'grad_norm': 0.5617411732673645, 'learning_rate': 9.819860342067857e-06, 'epoch': 0.53}
18%|█▊ | 2042/11526 [21:18<1:37:03, 1.63it/s] 18%|█▊ | 2043/11526 [21:19<1:36:59, 1.63it/s] {'loss': 0.2349, 'grad_norm': 0.508937418460846, 'learning_rate': 9.819457308634407e-06, 'epoch': 0.53}
18%|█▊ | 2043/11526 [21:19<1:36:59, 1.63it/s] 18%|█▊ | 2044/11526 [21:19<1:36:58, 1.63it/s] {'loss': 0.2903, 'grad_norm': 0.6692588329315186, 'learning_rate': 9.81905383313291e-06, 'epoch': 0.53}
18%|█▊ | 2044/11526 [21:19<1:36:58, 1.63it/s] 18%|█▊ | 2045/11526 [21:20<1:36:59, 1.63it/s] {'loss': 0.2217, 'grad_norm': 0.5411730408668518, 'learning_rate': 9.818649915600378e-06, 'epoch': 0.53}
18%|█▊ | 2045/11526 [21:20<1:36:59, 1.63it/s] 18%|█▊ | 2046/11526 [21:21<1:37:00, 1.63it/s] {'loss': 0.2749, 'grad_norm': 0.5740834474563599, 'learning_rate': 9.818245556073857e-06, 'epoch': 0.53}
18%|█▊ | 2046/11526 [21:21<1:37:00, 1.63it/s] 18%|█▊ | 2047/11526 [21:21<1:36:58, 1.63it/s] {'loss': 0.3291, 'grad_norm': 0.7335115075111389, 'learning_rate': 9.81784075459044e-06, 'epoch': 0.53}
18%|█▊ | 2047/11526 [21:21<1:36:58, 1.63it/s] 18%|█▊ | 2048/11526 [21:22<1:37:00, 1.63it/s] {'loss': 0.2329, 'grad_norm': 0.6030722856521606, 'learning_rate': 9.817435511187254e-06, 'epoch': 0.53}
18%|█▊ | 2048/11526 [21:22<1:37:00, 1.63it/s] 18%|█▊ | 2049/11526 [21:22<1:36:57, 1.63it/s] {'loss': 0.2412, 'grad_norm': 0.5488190650939941, 'learning_rate': 9.817029825901476e-06, 'epoch': 0.53}
18%|█▊ | 2049/11526 [21:23<1:36:57, 1.63it/s] 18%|█▊ | 2050/11526 [21:23<1:36:59, 1.63it/s] {'loss': 0.3047, 'grad_norm': 0.6206074357032776, 'learning_rate': 9.816623698770314e-06, 'epoch': 0.53}
18%|█▊ | 2050/11526 [21:23<1:36:59, 1.63it/s] 18%|█▊ | 2051/11526 [21:24<1:36:57, 1.63it/s] {'loss': 0.2512, 'grad_norm': 0.5848062634468079, 'learning_rate': 9.816217129831019e-06, 'epoch': 0.53}
18%|█▊ | 2051/11526 [21:24<1:36:57, 1.63it/s] 18%|█▊ | 2052/11526 [21:24<1:36:55, 1.63it/s] {'loss': 0.2733, 'grad_norm': 0.6483746767044067, 'learning_rate': 9.815810119120887e-06, 'epoch': 0.53}
18%|█▊ | 2052/11526 [21:24<1:36:55, 1.63it/s] 18%|█▊ | 2053/11526 [21:25<1:36:54, 1.63it/s] {'loss': 0.2692, 'grad_norm': 0.5696589350700378, 'learning_rate': 9.815402666677249e-06, 'epoch': 0.53}
18%|█▊ | 2053/11526 [21:25<1:36:54, 1.63it/s] 18%|█▊ | 2054/11526 [21:26<1:36:56, 1.63it/s] {'loss': 0.3218, 'grad_norm': 0.5826513767242432, 'learning_rate': 9.814994772537482e-06, 'epoch': 0.53}
18%|█▊ | 2054/11526 [21:26<1:36:56, 1.63it/s] 18%|█▊ | 2055/11526 [21:26<1:36:54, 1.63it/s] {'loss': 0.2557, 'grad_norm': 0.5441277027130127, 'learning_rate': 9.814586436738998e-06, 'epoch': 0.53}
18%|█▊ | 2055/11526 [21:26<1:36:54, 1.63it/s] 18%|█▊ | 2056/11526 [21:27<1:36:59, 1.63it/s] {'loss': 0.3311, 'grad_norm': 0.6071382761001587, 'learning_rate': 9.81417765931925e-06, 'epoch': 0.54}
18%|█▊ | 2056/11526 [21:27<1:36:59, 1.63it/s] 18%|█▊ | 2057/11526 [21:27<1:37:01, 1.63it/s] {'loss': 0.2855, 'grad_norm': 0.6262412071228027, 'learning_rate': 9.813768440315737e-06, 'epoch': 0.54}
18%|█▊ | 2057/11526 [21:27<1:37:01, 1.63it/s] 18%|█▊ | 2058/11526 [21:28<1:36:58, 1.63it/s] {'loss': 0.2477, 'grad_norm': 0.5312401056289673, 'learning_rate': 9.813358779765993e-06, 'epoch': 0.54}
18%|█▊ | 2058/11526 [21:28<1:36:58, 1.63it/s] 18%|█▊ | 2059/11526 [21:29<1:36:53, 1.63it/s] {'loss': 0.267, 'grad_norm': 0.5859576463699341, 'learning_rate': 9.812948677707597e-06, 'epoch': 0.54}
18%|█▊ | 2059/11526 [21:29<1:36:53, 1.63it/s] 18%|█▊ | 2060/11526 [21:29<1:36:53, 1.63it/s] {'loss': 0.2345, 'grad_norm': 0.6067435145378113, 'learning_rate': 9.81253813417816e-06, 'epoch': 0.54}
18%|█▊ | 2060/11526 [21:29<1:36:53, 1.63it/s] 18%|█▊ | 2061/11526 [21:30<1:37:00, 1.63it/s] {'loss': 0.3188, 'grad_norm': 0.6280801892280579, 'learning_rate': 9.812127149215346e-06, 'epoch': 0.54}
18%|█▊ | 2061/11526 [21:30<1:37:00, 1.63it/s] 18%|█▊ | 2062/11526 [21:30<1:36:55, 1.63it/s] {'loss': 0.3133, 'grad_norm': 0.6853020191192627, 'learning_rate': 9.811715722856849e-06, 'epoch': 0.54}
18%|█▊ | 2062/11526 [21:31<1:36:55, 1.63it/s] 18%|█▊ | 2063/11526 [21:31<1:36:54, 1.63it/s] {'loss': 0.2497, 'grad_norm': 0.5215528607368469, 'learning_rate': 9.811303855140408e-06, 'epoch': 0.54}
18%|█▊ | 2063/11526 [21:31<1:36:54, 1.63it/s] 18%|█▊ | 2064/11526 [21:32<1:36:53, 1.63it/s] {'loss': 0.2543, 'grad_norm': 0.5471782684326172, 'learning_rate': 9.810891546103803e-06, 'epoch': 0.54}
18%|█▊ | 2064/11526 [21:32<1:36:53, 1.63it/s] 18%|█▊ | 2065/11526 [21:32<1:36:49, 1.63it/s] {'loss': 0.2658, 'grad_norm': 0.5860686898231506, 'learning_rate': 9.810478795784852e-06, 'epoch': 0.54}
18%|█▊ | 2065/11526 [21:32<1:36:49, 1.63it/s] 18%|█▊ | 2066/11526 [21:33<1:36:58, 1.63it/s] {'loss': 0.2588, 'grad_norm': 0.6486850380897522, 'learning_rate': 9.810065604221416e-06, 'epoch': 0.54}
18%|█▊ | 2066/11526 [21:33<1:36:58, 1.63it/s] 18%|█▊ | 2067/11526 [21:33<1:36:53, 1.63it/s] {'loss': 0.3026, 'grad_norm': 0.5529473423957825, 'learning_rate': 9.809651971451394e-06, 'epoch': 0.54}
18%|█▊ | 2067/11526 [21:34<1:36:53, 1.63it/s] 18%|█▊ | 2068/11526 [21:34<1:36:51, 1.63it/s] {'loss': 0.2907, 'grad_norm': 0.5414630770683289, 'learning_rate': 9.809237897512727e-06, 'epoch': 0.54}
18%|█▊ | 2068/11526 [21:34<1:36:51, 1.63it/s] 18%|█▊ | 2069/11526 [21:35<1:36:50, 1.63it/s] {'loss': 0.2521, 'grad_norm': 0.5422844290733337, 'learning_rate': 9.808823382443398e-06, 'epoch': 0.54}
18%|█▊ | 2069/11526 [21:35<1:36:50, 1.63it/s] 18%|█▊ | 2070/11526 [21:35<1:36:47, 1.63it/s] {'loss': 0.3332, 'grad_norm': 0.7195369005203247, 'learning_rate': 9.808408426281426e-06, 'epoch': 0.54}
18%|█▊ | 2070/11526 [21:35<1:36:47, 1.63it/s] 18%|█▊ | 2071/11526 [21:36<1:36:54, 1.63it/s] {'loss': 0.2374, 'grad_norm': 0.6367312073707581, 'learning_rate': 9.807993029064874e-06, 'epoch': 0.54}
18%|█▊ | 2071/11526 [21:36<1:36:54, 1.63it/s] 18%|█▊ | 2072/11526 [21:37<1:36:55, 1.63it/s] {'loss': 0.2978, 'grad_norm': 0.5612528324127197, 'learning_rate': 9.807577190831847e-06, 'epoch': 0.54}
18%|█▊ | 2072/11526 [21:37<1:36:55, 1.63it/s] 18%|█▊ | 2073/11526 [21:37<1:36:52, 1.63it/s] {'loss': 0.3023, 'grad_norm': 0.5937158465385437, 'learning_rate': 9.807160911620484e-06, 'epoch': 0.54}
18%|█▊ | 2073/11526 [21:37<1:36:52, 1.63it/s] 18%|█▊ | 2074/11526 [21:38<1:36:50, 1.63it/s] {'loss': 0.3107, 'grad_norm': 0.583835780620575, 'learning_rate': 9.806744191468973e-06, 'epoch': 0.54}
18%|█▊ | 2074/11526 [21:38<1:36:50, 1.63it/s] 18%|█▊ | 2075/11526 [21:38<1:36:48, 1.63it/s] {'loss': 0.3216, 'grad_norm': 0.7620774507522583, 'learning_rate': 9.806327030415534e-06, 'epoch': 0.54}
18%|█▊ | 2075/11526 [21:39<1:36:48, 1.63it/s] 18%|█▊ | 2076/11526 [21:39<1:36:56, 1.62it/s] {'loss': 0.2637, 'grad_norm': 0.49049797654151917, 'learning_rate': 9.805909428498432e-06, 'epoch': 0.54}
18%|█▊ | 2076/11526 [21:39<1:36:56, 1.62it/s] 18%|█▊ | 2077/11526 [21:40<1:36:50, 1.63it/s] {'loss': 0.35, 'grad_norm': 0.6093986630439758, 'learning_rate': 9.805491385755973e-06, 'epoch': 0.54}
18%|█▊ | 2077/11526 [21:40<1:36:50, 1.63it/s] 18%|█▊ | 2078/11526 [21:40<1:36:50, 1.63it/s] {'loss': 0.2784, 'grad_norm': 0.510293185710907, 'learning_rate': 9.805072902226504e-06, 'epoch': 0.54}
18%|█▊ | 2078/11526 [21:40<1:36:50, 1.63it/s] 18%|█▊ | 2079/11526 [21:41<1:36:44, 1.63it/s] {'loss': 0.2963, 'grad_norm': 0.618696928024292, 'learning_rate': 9.804653977948406e-06, 'epoch': 0.54}
18%|█▊ | 2079/11526 [21:41<1:36:44, 1.63it/s] 18%|█▊ | 2080/11526 [21:41<1:36:42, 1.63it/s] {'loss': 0.2408, 'grad_norm': 0.6674826145172119, 'learning_rate': 9.80423461296011e-06, 'epoch': 0.54}
18%|█▊ | 2080/11526 [21:42<1:36:42, 1.63it/s] 18%|█▊ | 2081/11526 [21:42<1:36:48, 1.63it/s] {'loss': 0.2281, 'grad_norm': 0.5442675352096558, 'learning_rate': 9.80381480730008e-06, 'epoch': 0.54}
18%|█▊ | 2081/11526 [21:42<1:36:48, 1.63it/s] 18%|█▊ | 2082/11526 [21:43<1:36:46, 1.63it/s] {'loss': 0.2927, 'grad_norm': 0.5915828347206116, 'learning_rate': 9.803394561006823e-06, 'epoch': 0.54}
18%|█▊ | 2082/11526 [21:43<1:36:46, 1.63it/s] 18%|█▊ | 2083/11526 [21:43<1:36:42, 1.63it/s] {'loss': 0.2931, 'grad_norm': 0.6885738372802734, 'learning_rate': 9.802973874118886e-06, 'epoch': 0.54}
18%|█▊ | 2083/11526 [21:43<1:36:42, 1.63it/s] 18%|█▊ | 2084/11526 [21:44<1:36:44, 1.63it/s] {'loss': 0.2146, 'grad_norm': 0.46470770239830017, 'learning_rate': 9.802552746674858e-06, 'epoch': 0.54}
18%|█▊ | 2084/11526 [21:44<1:36:44, 1.63it/s] 18%|█▊ | 2085/11526 [21:45<1:36:43, 1.63it/s] {'loss': 0.3285, 'grad_norm': 0.6433090567588806, 'learning_rate': 9.802131178713366e-06, 'epoch': 0.54}
18%|█▊ | 2085/11526 [21:45<1:36:43, 1.63it/s] 18%|█▊ | 2086/11526 [21:45<1:36:42, 1.63it/s] {'loss': 0.2742, 'grad_norm': 0.5973670482635498, 'learning_rate': 9.80170917027308e-06, 'epoch': 0.54}
18%|█▊ | 2086/11526 [21:45<1:36:42, 1.63it/s] 18%|█▊ | 2087/11526 [21:46<1:36:41, 1.63it/s] {'loss': 0.2389, 'grad_norm': 0.5444079041481018, 'learning_rate': 9.801286721392708e-06, 'epoch': 0.54}
18%|█▊ | 2087/11526 [21:46<1:36:41, 1.63it/s] 18%|█▊ | 2088/11526 [21:46<1:36:38, 1.63it/s] {'loss': 0.2283, 'grad_norm': 0.5052730441093445, 'learning_rate': 9.800863832111e-06, 'epoch': 0.54}
18%|█▊ | 2088/11526 [21:47<1:36:38, 1.63it/s] 18%|█▊ | 2089/11526 [21:47<1:36:36, 1.63it/s] {'loss': 0.3306, 'grad_norm': 0.562698483467102, 'learning_rate': 9.800440502466746e-06, 'epoch': 0.54}
18%|█▊ | 2089/11526 [21:47<1:36:36, 1.63it/s] 18%|█▊ | 2090/11526 [21:48<1:36:35, 1.63it/s] {'loss': 0.2348, 'grad_norm': 0.5275609493255615, 'learning_rate': 9.800016732498773e-06, 'epoch': 0.54}
18%|█▊ | 2090/11526 [21:48<1:36:35, 1.63it/s] 18%|█▊ | 2091/11526 [21:48<1:36:33, 1.63it/s] {'loss': 0.2652, 'grad_norm': 0.6071603298187256, 'learning_rate': 9.799592522245958e-06, 'epoch': 0.54}
18%|█▊ | 2091/11526 [21:48<1:36:33, 1.63it/s] 18%|█▊ | 2092/11526 [21:49<1:36:35, 1.63it/s] {'loss': 0.3026, 'grad_norm': 0.5879668593406677, 'learning_rate': 9.799167871747206e-06, 'epoch': 0.54}
18%|█▊ | 2092/11526 [21:49<1:36:35, 1.63it/s] 18%|█▊ | 2093/11526 [21:49<1:36:35, 1.63it/s] {'loss': 0.3688, 'grad_norm': 0.705698549747467, 'learning_rate': 9.798742781041472e-06, 'epoch': 0.54}
18%|█▊ | 2093/11526 [21:50<1:36:35, 1.63it/s] 18%|█▊ | 2094/11526 [21:50<1:36:38, 1.63it/s] {'loss': 0.3157, 'grad_norm': 0.7811433672904968, 'learning_rate': 9.798317250167746e-06, 'epoch': 0.55}
18%|█▊ | 2094/11526 [21:50<1:36:38, 1.63it/s] 18%|█▊ | 2095/11526 [21:51<1:36:34, 1.63it/s] {'loss': 0.3193, 'grad_norm': 0.6589944362640381, 'learning_rate': 9.79789127916506e-06, 'epoch': 0.55}
18%|█▊ | 2095/11526 [21:51<1:36:34, 1.63it/s] 18%|█▊ | 2096/11526 [21:51<1:36:31, 1.63it/s] {'loss': 0.3007, 'grad_norm': 0.6232211589813232, 'learning_rate': 9.797464868072489e-06, 'epoch': 0.55}
18%|█▊ | 2096/11526 [21:51<1:36:31, 1.63it/s] 18%|█▊ | 2097/11526 [21:52<1:36:34, 1.63it/s] {'loss': 0.2576, 'grad_norm': 0.48749637603759766, 'learning_rate': 9.797038016929141e-06, 'epoch': 0.55}
18%|█▊ | 2097/11526 [21:52<1:36:34, 1.63it/s] 18%|█▊ | 2098/11526 [21:53<1:36:31, 1.63it/s] {'loss': 0.3809, 'grad_norm': 0.631194531917572, 'learning_rate': 9.796610725774173e-06, 'epoch': 0.55}
18%|█▊ | 2098/11526 [21:53<1:36:31, 1.63it/s] 18%|█▊ | 2099/11526 [21:53<1:36:32, 1.63it/s] {'loss': 0.2839, 'grad_norm': 0.5557408332824707, 'learning_rate': 9.796182994646779e-06, 'epoch': 0.55}
18%|█▊ | 2099/11526 [21:53<1:36:32, 1.63it/s] 18%|█▊ | 2100/11526 [21:54<1:36:32, 1.63it/s] {'loss': 0.3003, 'grad_norm': 0.5950742363929749, 'learning_rate': 9.79575482358619e-06, 'epoch': 0.55}
18%|█▊ | 2100/11526 [21:54<1:36:32, 1.63it/s] 18%|█▊ | 2101/11526 [21:54<1:36:26, 1.63it/s] {'loss': 0.3436, 'grad_norm': 0.6998963356018066, 'learning_rate': 9.795326212631682e-06, 'epoch': 0.55}
18%|█▊ | 2101/11526 [21:55<1:36:26, 1.63it/s] 18%|█▊ | 2102/11526 [21:55<1:36:28, 1.63it/s] {'loss': 0.3176, 'grad_norm': 0.6727956533432007, 'learning_rate': 9.79489716182257e-06, 'epoch': 0.55}
18%|█▊ | 2102/11526 [21:55<1:36:28, 1.63it/s] 18%|█▊ | 2103/11526 [21:56<1:36:26, 1.63it/s] {'loss': 0.2552, 'grad_norm': 0.48592430353164673, 'learning_rate': 9.794467671198208e-06, 'epoch': 0.55}
18%|█▊ | 2103/11526 [21:56<1:36:26, 1.63it/s] 18%|█▊ | 2104/11526 [21:56<1:36:25, 1.63it/s] {'loss': 0.3121, 'grad_norm': 0.5753009915351868, 'learning_rate': 9.794037740797993e-06, 'epoch': 0.55}
18%|█▊ | 2104/11526 [21:56<1:36:25, 1.63it/s] 18%|█▊ | 2105/11526 [21:57<1:36:28, 1.63it/s] {'loss': 0.2464, 'grad_norm': 0.6037343144416809, 'learning_rate': 9.793607370661358e-06, 'epoch': 0.55}
18%|█▊ | 2105/11526 [21:57<1:36:28, 1.63it/s] 18%|█▊ | 2106/11526 [21:57<1:36:26, 1.63it/s] {'loss': 0.313, 'grad_norm': 0.5606946349143982, 'learning_rate': 9.793176560827783e-06, 'epoch': 0.55}
18%|█▊ | 2106/11526 [21:58<1:36:26, 1.63it/s] 18%|█▊ | 2107/11526 [21:58<1:36:34, 1.63it/s] {'loss': 0.3593, 'grad_norm': 0.6243638396263123, 'learning_rate': 9.792745311336777e-06, 'epoch': 0.55}
18%|█▊ | 2107/11526 [21:58<1:36:34, 1.63it/s] 18%|█▊ | 2108/11526 [21:59<1:36:30, 1.63it/s] {'loss': 0.3517, 'grad_norm': 0.6271211504936218, 'learning_rate': 9.792313622227904e-06, 'epoch': 0.55}
18%|█▊ | 2108/11526 [21:59<1:36:30, 1.63it/s] 18%|█▊ | 2109/11526 [21:59<1:36:28, 1.63it/s] {'loss': 0.2447, 'grad_norm': 0.5474656224250793, 'learning_rate': 9.79188149354076e-06, 'epoch': 0.55}
18%|█▊ | 2109/11526 [21:59<1:36:28, 1.63it/s] 18%|█▊ | 2110/11526 [22:00<1:36:25, 1.63it/s] {'loss': 0.3413, 'grad_norm': 0.5847532153129578, 'learning_rate': 9.791448925314979e-06, 'epoch': 0.55}
18%|█▊ | 2110/11526 [22:00<1:36:25, 1.63it/s] 18%|█▊ | 2111/11526 [22:01<1:36:23, 1.63it/s] {'loss': 0.2014, 'grad_norm': 0.46509426832199097, 'learning_rate': 9.79101591759024e-06, 'epoch': 0.55}
18%|█▊ | 2111/11526 [22:01<1:36:23, 1.63it/s] 18%|█▊ | 2112/11526 [22:01<1:36:29, 1.63it/s] {'loss': 0.4171, 'grad_norm': 0.6503077149391174, 'learning_rate': 9.790582470406264e-06, 'epoch': 0.55}
18%|█▊ | 2112/11526 [22:01<1:36:29, 1.63it/s] 18%|█▊ | 2113/11526 [22:02<1:36:26, 1.63it/s] {'loss': 0.2549, 'grad_norm': 0.5533451437950134, 'learning_rate': 9.790148583802805e-06, 'epoch': 0.55}
18%|█▊ | 2113/11526 [22:02<1:36:26, 1.63it/s] 18%|█▊ | 2114/11526 [22:02<1:36:27, 1.63it/s] {'loss': 0.2601, 'grad_norm': 0.5537814497947693, 'learning_rate': 9.789714257819662e-06, 'epoch': 0.55}
18%|█▊ | 2114/11526 [22:03<1:36:27, 1.63it/s] 18%|█▊ | 2115/11526 [22:03<1:36:25, 1.63it/s] {'loss': 0.2994, 'grad_norm': 0.6075956225395203, 'learning_rate': 9.789279492496676e-06, 'epoch': 0.55}
18%|█▊ | 2115/11526 [22:03<1:36:25, 1.63it/s] 18%|█▊ | 2116/11526 [22:04<1:36:23, 1.63it/s] {'loss': 0.2499, 'grad_norm': 0.5926958918571472, 'learning_rate': 9.788844287873724e-06, 'epoch': 0.55}
18%|█▊ | 2116/11526 [22:04<1:36:23, 1.63it/s] 18%|█▊ | 2117/11526 [22:04<1:36:26, 1.63it/s] {'loss': 0.2369, 'grad_norm': 0.5374034643173218, 'learning_rate': 9.788408643990728e-06, 'epoch': 0.55}
18%|█▊ | 2117/11526 [22:04<1:36:26, 1.63it/s] 18%|█▊ | 2118/11526 [22:05<1:36:24, 1.63it/s] {'loss': 0.2871, 'grad_norm': 0.5514360070228577, 'learning_rate': 9.787972560887645e-06, 'epoch': 0.55}
18%|█▊ | 2118/11526 [22:05<1:36:24, 1.63it/s] 18%|█▊ | 2119/11526 [22:05<1:36:24, 1.63it/s] {'loss': 0.2211, 'grad_norm': 0.5250065326690674, 'learning_rate': 9.787536038604478e-06, 'epoch': 0.55}
18%|█▊ | 2119/11526 [22:06<1:36:24, 1.63it/s] 18%|█▊ | 2120/11526 [22:06<1:36:21, 1.63it/s] {'loss': 0.2706, 'grad_norm': 0.6780567765235901, 'learning_rate': 9.787099077181263e-06, 'epoch': 0.55}
18%|█▊ | 2120/11526 [22:06<1:36:21, 1.63it/s] 18%|█▊ | 2121/11526 [22:07<1:36:17, 1.63it/s] {'loss': 0.3245, 'grad_norm': 0.5619016289710999, 'learning_rate': 9.786661676658085e-06, 'epoch': 0.55}
18%|█▊ | 2121/11526 [22:07<1:36:17, 1.63it/s] 18%|█▊ | 2122/11526 [22:07<1:36:20, 1.63it/s] {'loss': 0.2889, 'grad_norm': 0.5202364325523376, 'learning_rate': 9.786223837075061e-06, 'epoch': 0.55}
18%|█▊ | 2122/11526 [22:07<1:36:20, 1.63it/s] 18%|█▊ | 2123/11526 [22:08<1:36:20, 1.63it/s] {'loss': 0.2561, 'grad_norm': 0.6287238001823425, 'learning_rate': 9.785785558472355e-06, 'epoch': 0.55}
18%|█▊ | 2123/11526 [22:08<1:36:20, 1.63it/s] 18%|█▊ | 2124/11526 [22:09<1:36:16, 1.63it/s] {'loss': 0.3269, 'grad_norm': 0.6588232517242432, 'learning_rate': 9.785346840890171e-06, 'epoch': 0.55}
18%|█▊ | 2124/11526 [22:09<1:36:16, 1.63it/s] 18%|█▊ | 2125/11526 [22:09<1:36:15, 1.63it/s] {'loss': 0.2955, 'grad_norm': 0.680438220500946, 'learning_rate': 9.784907684368743e-06, 'epoch': 0.55}
18%|█▊ | 2125/11526 [22:09<1:36:15, 1.63it/s] 18%|█▊ | 2126/11526 [22:10<1:36:14, 1.63it/s] {'loss': 0.3294, 'grad_norm': 0.5704694390296936, 'learning_rate': 9.78446808894836e-06, 'epoch': 0.55}
18%|█▊ | 2126/11526 [22:10<1:36:14, 1.63it/s] 18%|█▊ | 2127/11526 [22:10<1:36:18, 1.63it/s] {'loss': 0.3332, 'grad_norm': 0.7083881497383118, 'learning_rate': 9.78402805466934e-06, 'epoch': 0.55}
18%|█▊ | 2127/11526 [22:10<1:36:18, 1.63it/s] 18%|█▊ | 2128/11526 [22:11<1:36:16, 1.63it/s] {'loss': 0.328, 'grad_norm': 0.7233176231384277, 'learning_rate': 9.783587581572047e-06, 'epoch': 0.55}
18%|█▊ | 2128/11526 [22:11<1:36:16, 1.63it/s] 18%|█▊ | 2129/11526 [22:12<1:36:10, 1.63it/s] {'loss': 0.2253, 'grad_norm': 0.535622775554657, 'learning_rate': 9.783146669696883e-06, 'epoch': 0.55}
18%|█▊ | 2129/11526 [22:12<1:36:10, 1.63it/s] 18%|█▊ | 2130/11526 [22:12<1:36:10, 1.63it/s] {'loss': 0.3976, 'grad_norm': 0.7461189031600952, 'learning_rate': 9.782705319084292e-06, 'epoch': 0.55}
18%|█▊ | 2130/11526 [22:12<1:36:10, 1.63it/s] 18%|█▊ | 2131/11526 [22:13<1:36:08, 1.63it/s] {'loss': 0.3501, 'grad_norm': 0.6029630303382874, 'learning_rate': 9.782263529774756e-06, 'epoch': 0.55}
18%|█▊ | 2131/11526 [22:13<1:36:08, 1.63it/s] 18%|█▊ | 2132/11526 [22:13<1:36:16, 1.63it/s] {'loss': 0.3627, 'grad_norm': 0.6047549843788147, 'learning_rate': 9.7818213018088e-06, 'epoch': 0.55}
18%|█▊ | 2132/11526 [22:14<1:36:16, 1.63it/s] 19%|█▊ | 2133/11526 [22:14<1:36:12, 1.63it/s] {'loss': 0.2508, 'grad_norm': 0.6639825701713562, 'learning_rate': 9.781378635226988e-06, 'epoch': 0.56}
19%|█▊ | 2133/11526 [22:14<1:36:12, 1.63it/s] 19%|█▊ | 2134/11526 [22:15<1:36:20, 1.62it/s] {'loss': 0.3433, 'grad_norm': 0.6398656368255615, 'learning_rate': 9.780935530069919e-06, 'epoch': 0.56}
19%|█▊ | 2134/11526 [22:15<1:36:20, 1.62it/s] 19%|█▊ | 2135/11526 [22:15<1:36:18, 1.63it/s] {'loss': 0.2579, 'grad_norm': 0.5834253430366516, 'learning_rate': 9.780491986378244e-06, 'epoch': 0.56}
19%|█▊ | 2135/11526 [22:15<1:36:18, 1.63it/s] 19%|█▊ | 2136/11526 [22:16<1:36:19, 1.62it/s] {'loss': 0.2598, 'grad_norm': 0.628200113773346, 'learning_rate': 9.780048004192645e-06, 'epoch': 0.56}
19%|█▊ | 2136/11526 [22:16<1:36:19, 1.62it/s] 19%|█▊ | 2137/11526 [22:17<1:36:21, 1.62it/s] {'loss': 0.2686, 'grad_norm': 0.5314700603485107, 'learning_rate': 9.779603583553842e-06, 'epoch': 0.56}
19%|█▊ | 2137/11526 [22:17<1:36:21, 1.62it/s] 19%|█▊ | 2138/11526 [22:17<1:36:14, 1.63it/s] {'loss': 0.3066, 'grad_norm': 0.6209491491317749, 'learning_rate': 9.779158724502604e-06, 'epoch': 0.56}
19%|█▊ | 2138/11526 [22:17<1:36:14, 1.63it/s] 19%|█▊ | 2139/11526 [22:18<1:36:12, 1.63it/s] {'loss': 0.2243, 'grad_norm': 0.6231634020805359, 'learning_rate': 9.778713427079739e-06, 'epoch': 0.56}
19%|█▊ | 2139/11526 [22:18<1:36:12, 1.63it/s] 19%|█▊ | 2140/11526 [22:18<1:36:10, 1.63it/s] {'loss': 0.2886, 'grad_norm': 0.5467020273208618, 'learning_rate': 9.778267691326085e-06, 'epoch': 0.56}
19%|█▊ | 2140/11526 [22:18<1:36:10, 1.63it/s] 19%|█▊ | 2141/11526 [22:19<1:36:17, 1.62it/s] {'loss': 0.3774, 'grad_norm': 0.7184926867485046, 'learning_rate': 9.777821517282534e-06, 'epoch': 0.56}
19%|█▊ | 2141/11526 [22:19<1:36:17, 1.62it/s] 19%|█▊ | 2142/11526 [22:20<1:36:23, 1.62it/s] {'loss': 0.2921, 'grad_norm': 0.5877664089202881, 'learning_rate': 9.777374904990007e-06, 'epoch': 0.56}
19%|█▊ | 2142/11526 [22:20<1:36:23, 1.62it/s] 19%|█▊ | 2143/11526 [22:20<1:36:19, 1.62it/s] {'loss': 0.2596, 'grad_norm': 0.5420777201652527, 'learning_rate': 9.776927854489472e-06, 'epoch': 0.56}
19%|█▊ | 2143/11526 [22:20<1:36:19, 1.62it/s] 19%|█▊ | 2144/11526 [22:21<1:36:15, 1.62it/s] {'loss': 0.24, 'grad_norm': 0.5660854578018188, 'learning_rate': 9.776480365821935e-06, 'epoch': 0.56}
19%|█▊ | 2144/11526 [22:21<1:36:15, 1.62it/s] 19%|█▊ | 2145/11526 [22:21<1:36:09, 1.63it/s] {'loss': 0.2408, 'grad_norm': 0.5219511389732361, 'learning_rate': 9.77603243902844e-06, 'epoch': 0.56}
19%|█▊ | 2145/11526 [22:22<1:36:09, 1.63it/s] 19%|█▊ | 2146/11526 [22:22<1:36:10, 1.63it/s] {'loss': 0.2892, 'grad_norm': 0.624117374420166, 'learning_rate': 9.775584074150077e-06, 'epoch': 0.56}
19%|█▊ | 2146/11526 [22:22<1:36:10, 1.63it/s] 19%|█▊ | 2147/11526 [22:23<1:36:11, 1.63it/s] {'loss': 0.2578, 'grad_norm': 0.5869446396827698, 'learning_rate': 9.775135271227969e-06, 'epoch': 0.56}
19%|█▊ | 2147/11526 [22:23<1:36:11, 1.63it/s] 19%|█▊ | 2148/11526 [22:23<1:36:05, 1.63it/s] {'loss': 0.2543, 'grad_norm': 0.7071793675422668, 'learning_rate': 9.774686030303286e-06, 'epoch': 0.56}
19%|█▊ | 2148/11526 [22:23<1:36:05, 1.63it/s] 19%|█▊ | 2149/11526 [22:24<1:36:01, 1.63it/s] {'loss': 0.2083, 'grad_norm': 0.5593536496162415, 'learning_rate': 9.774236351417233e-06, 'epoch': 0.56}
19%|█▊ | 2149/11526 [22:24<1:36:01, 1.63it/s] 19%|█▊ | 2150/11526 [22:25<1:36:00, 1.63it/s] {'loss': 0.2929, 'grad_norm': 0.5889735817909241, 'learning_rate': 9.773786234611058e-06, 'epoch': 0.56}
19%|█▊ | 2150/11526 [22:25<1:36:00, 1.63it/s] 19%|█▊ | 2151/11526 [22:25<1:36:05, 1.63it/s] {'loss': 0.2671, 'grad_norm': 0.6299184560775757, 'learning_rate': 9.773335679926048e-06, 'epoch': 0.56}
19%|█▊ | 2151/11526 [22:25<1:36:05, 1.63it/s] 19%|█▊ | 2152/11526 [22:26<1:36:27, 1.62it/s] {'loss': 0.2081, 'grad_norm': 0.5031303763389587, 'learning_rate': 9.772884687403529e-06, 'epoch': 0.56}
19%|█▊ | 2152/11526 [22:26<1:36:27, 1.62it/s] 19%|█▊ | 2153/11526 [22:26<1:36:17, 1.62it/s] {'loss': 0.2554, 'grad_norm': 0.6468297243118286, 'learning_rate': 9.772433257084872e-06, 'epoch': 0.56}
19%|█▊ | 2153/11526 [22:26<1:36:17, 1.62it/s] 19%|█▊ | 2154/11526 [22:27<1:36:12, 1.62it/s] {'loss': 0.2412, 'grad_norm': 0.5338165760040283, 'learning_rate': 9.77198138901148e-06, 'epoch': 0.56}
19%|█▊ | 2154/11526 [22:27<1:36:12, 1.62it/s] 19%|█▊ | 2155/11526 [22:28<1:36:04, 1.63it/s] {'loss': 0.3156, 'grad_norm': 0.6966970562934875, 'learning_rate': 9.771529083224806e-06, 'epoch': 0.56}
19%|█▊ | 2155/11526 [22:28<1:36:04, 1.63it/s] 19%|█▊ | 2156/11526 [22:28<1:36:11, 1.62it/s] {'loss': 0.2645, 'grad_norm': 0.5854769349098206, 'learning_rate': 9.771076339766335e-06, 'epoch': 0.56}
19%|█▊ | 2156/11526 [22:28<1:36:11, 1.62it/s] 19%|█▊ | 2157/11526 [22:29<1:36:31, 1.62it/s] {'loss': 0.265, 'grad_norm': 0.5611105561256409, 'learning_rate': 9.770623158677595e-06, 'epoch': 0.56}
19%|█▊ | 2157/11526 [22:29<1:36:31, 1.62it/s] 19%|█▊ | 2158/11526 [22:29<1:36:19, 1.62it/s] {'loss': 0.3266, 'grad_norm': 0.6228994131088257, 'learning_rate': 9.770169540000157e-06, 'epoch': 0.56}
19%|█▊ | 2158/11526 [22:30<1:36:19, 1.62it/s] 19%|█▊ | 2159/11526 [22:30<1:36:09, 1.62it/s] {'loss': 0.2279, 'grad_norm': 0.5289015769958496, 'learning_rate': 9.769715483775626e-06, 'epoch': 0.56}
19%|█▊ | 2159/11526 [22:30<1:36:09, 1.62it/s] 19%|█▊ | 2160/11526 [22:31<1:36:02, 1.63it/s] {'loss': 0.2455, 'grad_norm': 0.5436469316482544, 'learning_rate': 9.769260990045652e-06, 'epoch': 0.56}
19%|█▊ | 2160/11526 [22:31<1:36:02, 1.63it/s] 19%|█▊ | 2161/11526 [22:31<1:36:07, 1.62it/s] {'loss': 0.2123, 'grad_norm': 0.4716946482658386, 'learning_rate': 9.768806058851924e-06, 'epoch': 0.56}
19%|█▊ | 2161/11526 [22:31<1:36:07, 1.62it/s] 19%|█▉ | 2162/11526 [22:32<1:36:09, 1.62it/s] {'loss': 0.2722, 'grad_norm': 0.5752503871917725, 'learning_rate': 9.76835069023617e-06, 'epoch': 0.56}
19%|█▉ | 2162/11526 [22:32<1:36:09, 1.62it/s] 19%|█▉ | 2163/11526 [22:33<1:36:01, 1.63it/s] {'loss': 0.2833, 'grad_norm': 0.557949423789978, 'learning_rate': 9.767894884240164e-06, 'epoch': 0.56}
19%|█▉ | 2163/11526 [22:33<1:36:01, 1.63it/s] 19%|█▉ | 2164/11526 [22:33<1:35:58, 1.63it/s] {'loss': 0.3033, 'grad_norm': 0.5801301002502441, 'learning_rate': 9.767438640905707e-06, 'epoch': 0.56}
19%|█▉ | 2164/11526 [22:33<1:35:58, 1.63it/s] 19%|█▉ | 2165/11526 [22:34<1:35:54, 1.63it/s] {'loss': 0.2207, 'grad_norm': 0.5029714107513428, 'learning_rate': 9.766981960274653e-06, 'epoch': 0.56}
19%|█▉ | 2165/11526 [22:34<1:35:54, 1.63it/s] 19%|█▉ | 2166/11526 [22:34<1:36:22, 1.62it/s] {'loss': 0.2915, 'grad_norm': 0.5740005373954773, 'learning_rate': 9.766524842388892e-06, 'epoch': 0.56}
19%|█▉ | 2166/11526 [22:35<1:36:22, 1.62it/s] 19%|█▉ | 2167/11526 [22:35<1:36:20, 1.62it/s] {'loss': 0.2854, 'grad_norm': 0.5923210978507996, 'learning_rate': 9.766067287290351e-06, 'epoch': 0.56}
19%|█▉ | 2167/11526 [22:35<1:36:20, 1.62it/s] 19%|█▉ | 2168/11526 [22:36<1:36:11, 1.62it/s] {'loss': 0.2531, 'grad_norm': 0.6856685876846313, 'learning_rate': 9.765609295021001e-06, 'epoch': 0.56}
19%|█▉ | 2168/11526 [22:36<1:36:11, 1.62it/s] 19%|█▉ | 2169/11526 [22:36<1:36:06, 1.62it/s] {'loss': 0.2758, 'grad_norm': 0.6273036599159241, 'learning_rate': 9.76515086562285e-06, 'epoch': 0.56}
19%|█▉ | 2169/11526 [22:36<1:36:06, 1.62it/s] 19%|█▉ | 2170/11526 [22:37<1:35:57, 1.63it/s] {'loss': 0.2272, 'grad_norm': 0.5897465348243713, 'learning_rate': 9.76469199913795e-06, 'epoch': 0.56}
19%|█▉ | 2170/11526 [22:37<1:35:57, 1.63it/s] 19%|█▉ | 2171/11526 [22:37<1:36:05, 1.62it/s] {'loss': 0.2445, 'grad_norm': 0.6075824499130249, 'learning_rate': 9.764232695608391e-06, 'epoch': 0.57}
19%|█▉ | 2171/11526 [22:38<1:36:05, 1.62it/s] 19%|█▉ | 2172/11526 [22:38<1:36:26, 1.62it/s] {'loss': 0.2061, 'grad_norm': 0.5872854590415955, 'learning_rate': 9.7637729550763e-06, 'epoch': 0.57}
19%|█▉ | 2172/11526 [22:38<1:36:26, 1.62it/s] 19%|█▉ | 2173/11526 [22:39<1:36:13, 1.62it/s] {'loss': 0.3281, 'grad_norm': 0.8271423578262329, 'learning_rate': 9.763312777583852e-06, 'epoch': 0.57}
19%|█▉ | 2173/11526 [22:39<1:36:13, 1.62it/s] 19%|█▉ | 2174/11526 [22:39<1:36:05, 1.62it/s] {'loss': 0.221, 'grad_norm': 0.579198956489563, 'learning_rate': 9.76285216317325e-06, 'epoch': 0.57}
19%|█▉ | 2174/11526 [22:39<1:36:05, 1.62it/s] 19%|█▉ | 2175/11526 [22:40<1:35:58, 1.62it/s] {'loss': 0.2518, 'grad_norm': 0.5745250582695007, 'learning_rate': 9.762391111886749e-06, 'epoch': 0.57}
19%|█▉ | 2175/11526 [22:40<1:35:58, 1.62it/s] 19%|█▉ | 2176/11526 [22:41<1:35:59, 1.62it/s] {'loss': 0.2549, 'grad_norm': 0.5575345158576965, 'learning_rate': 9.76192962376664e-06, 'epoch': 0.57}
19%|█▉ | 2176/11526 [22:41<1:35:59, 1.62it/s] 19%|█▉ | 2177/11526 [22:41<1:35:59, 1.62it/s] {'loss': 0.2963, 'grad_norm': 0.6516624093055725, 'learning_rate': 9.761467698855249e-06, 'epoch': 0.57}
19%|█▉ | 2177/11526 [22:41<1:35:59, 1.62it/s] 19%|█▉ | 2178/11526 [22:42<1:35:53, 1.62it/s] {'loss': 0.2798, 'grad_norm': 0.6140629053115845, 'learning_rate': 9.76100533719495e-06, 'epoch': 0.57}
19%|█▉ | 2178/11526 [22:42<1:35:53, 1.62it/s] 19%|█▉ | 2179/11526 [22:42<1:35:48, 1.63it/s] {'loss': 0.3209, 'grad_norm': 0.5724038481712341, 'learning_rate': 9.760542538828155e-06, 'epoch': 0.57}
19%|█▉ | 2179/11526 [22:43<1:35:48, 1.63it/s] 19%|█▉ | 2180/11526 [22:43<1:35:44, 1.63it/s] {'loss': 0.3766, 'grad_norm': 0.7781198024749756, 'learning_rate': 9.76007930379731e-06, 'epoch': 0.57}
19%|█▉ | 2180/11526 [22:43<1:35:44, 1.63it/s] 19%|█▉ | 2181/11526 [22:44<1:35:50, 1.63it/s] {'loss': 0.3707, 'grad_norm': 0.5495268702507019, 'learning_rate': 9.759615632144907e-06, 'epoch': 0.57}
19%|█▉ | 2181/11526 [22:44<1:35:50, 1.63it/s] 19%|█▉ | 2182/11526 [22:44<1:35:51, 1.62it/s] {'loss': 0.2301, 'grad_norm': 0.5195644497871399, 'learning_rate': 9.759151523913477e-06, 'epoch': 0.57}
19%|█▉ | 2182/11526 [22:44<1:35:51, 1.62it/s] 19%|█▉ | 2183/11526 [22:45<1:35:50, 1.62it/s] {'loss': 0.2604, 'grad_norm': 0.578645646572113, 'learning_rate': 9.758686979145591e-06, 'epoch': 0.57}
19%|█▉ | 2183/11526 [22:45<1:35:50, 1.62it/s] 19%|█▉ | 2184/11526 [22:45<1:35:47, 1.63it/s] {'loss': 0.3168, 'grad_norm': 0.6356945037841797, 'learning_rate': 9.75822199788386e-06, 'epoch': 0.57}
19%|█▉ | 2184/11526 [22:46<1:35:47, 1.63it/s] 19%|█▉ | 2185/11526 [22:46<1:35:45, 1.63it/s] {'loss': 0.2667, 'grad_norm': 0.5388787388801575, 'learning_rate': 9.757756580170934e-06, 'epoch': 0.57}
19%|█▉ | 2185/11526 [22:46<1:35:45, 1.63it/s] 19%|█▉ | 2186/11526 [22:47<1:35:50, 1.62it/s] {'loss': 0.2748, 'grad_norm': 0.5294544100761414, 'learning_rate': 9.757290726049505e-06, 'epoch': 0.57}
19%|█▉ | 2186/11526 [22:47<1:35:50, 1.62it/s] 19%|█▉ | 2187/11526 [22:47<1:36:01, 1.62it/s] {'loss': 0.2904, 'grad_norm': 0.5987764000892639, 'learning_rate': 9.756824435562302e-06, 'epoch': 0.57}
19%|█▉ | 2187/11526 [22:47<1:36:01, 1.62it/s] 19%|█▉ | 2188/11526 [22:48<1:35:51, 1.62it/s] {'loss': 0.2284, 'grad_norm': 0.5012331604957581, 'learning_rate': 9.756357708752096e-06, 'epoch': 0.57}
19%|█▉ | 2188/11526 [22:48<1:35:51, 1.62it/s] 19%|█▉ | 2189/11526 [22:49<1:35:47, 1.62it/s] {'loss': 0.2401, 'grad_norm': 0.49647489190101624, 'learning_rate': 9.7558905456617e-06, 'epoch': 0.57}
19%|█▉ | 2189/11526 [22:49<1:35:47, 1.62it/s] 19%|█▉ | 2190/11526 [22:49<1:35:41, 1.63it/s] {'loss': 0.3073, 'grad_norm': 0.6610411405563354, 'learning_rate': 9.75542294633396e-06, 'epoch': 0.57}
19%|█▉ | 2190/11526 [22:49<1:35:41, 1.63it/s] 19%|█▉ | 2191/11526 [22:50<1:35:45, 1.62it/s] {'loss': 0.2608, 'grad_norm': 0.5871783494949341, 'learning_rate': 9.754954910811772e-06, 'epoch': 0.57}
19%|█▉ | 2191/11526 [22:50<1:35:45, 1.62it/s] 19%|█▉ | 2192/11526 [22:50<1:35:47, 1.62it/s] {'loss': 0.2428, 'grad_norm': 0.5204318761825562, 'learning_rate': 9.754486439138066e-06, 'epoch': 0.57}
19%|█▉ | 2192/11526 [22:51<1:35:47, 1.62it/s] 19%|█▉ | 2193/11526 [22:51<1:35:40, 1.63it/s] {'loss': 0.2683, 'grad_norm': 0.531121015548706, 'learning_rate': 9.754017531355811e-06, 'epoch': 0.57}
19%|█▉ | 2193/11526 [22:51<1:35:40, 1.63it/s] 19%|█▉ | 2194/11526 [22:52<1:35:36, 1.63it/s] {'loss': 0.2949, 'grad_norm': 0.6172188520431519, 'learning_rate': 9.753548187508018e-06, 'epoch': 0.57}
19%|█▉ | 2194/11526 [22:52<1:35:36, 1.63it/s] 19%|█▉ | 2195/11526 [22:52<1:35:33, 1.63it/s] {'loss': 0.3315, 'grad_norm': 0.6524479985237122, 'learning_rate': 9.75307840763774e-06, 'epoch': 0.57}
19%|█▉ | 2195/11526 [22:52<1:35:33, 1.63it/s] 19%|█▉ | 2196/11526 [22:53<1:35:37, 1.63it/s] {'loss': 0.2331, 'grad_norm': 0.5490622520446777, 'learning_rate': 9.752608191788065e-06, 'epoch': 0.57}
19%|█▉ | 2196/11526 [22:53<1:35:37, 1.63it/s] 19%|█▉ | 2197/11526 [22:53<1:35:38, 1.63it/s] {'loss': 0.2975, 'grad_norm': 0.6149340867996216, 'learning_rate': 9.752137540002127e-06, 'epoch': 0.57}
19%|█▉ | 2197/11526 [22:54<1:35:38, 1.63it/s] 19%|█▉ | 2198/11526 [22:54<1:35:34, 1.63it/s] {'loss': 0.3674, 'grad_norm': 0.725165605545044, 'learning_rate': 9.751666452323094e-06, 'epoch': 0.57}
19%|█▉ | 2198/11526 [22:54<1:35:34, 1.63it/s] 19%|█▉ | 2199/11526 [22:55<1:35:36, 1.63it/s] {'loss': 0.2712, 'grad_norm': 0.5926238298416138, 'learning_rate': 9.751194928794179e-06, 'epoch': 0.57}
19%|█▉ | 2199/11526 [22:55<1:35:36, 1.63it/s] 19%|█▉ | 2200/11526 [22:55<1:35:32, 1.63it/s] {'loss': 0.265, 'grad_norm': 0.5589438080787659, 'learning_rate': 9.750722969458632e-06, 'epoch': 0.57}
19%|█▉ | 2200/11526 [22:55<1:35:32, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.33it/s]
31%|███ | 4/13 [00:00<00:01, 8.38it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.78it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.6976413130760193, 'eval_runtime': 1.9564, 'eval_samples_per_second': 102.227, 'eval_steps_per_second': 6.645, 'epoch': 0.57}
19%|█▉ | 2200/11526 [22:57<1:35:32, 1.63it/s]
19%|█▉ | 2201/11526 [22:58<3:06:59, 1.20s/it] {'loss': 0.2829, 'grad_norm': 0.6059420108795166, 'learning_rate': 9.750250574359743e-06, 'epoch': 0.57}
19%|█▉ | 2201/11526 [22:58<3:06:59, 1.20s/it] 19%|█▉ | 2202/11526 [22:58<2:39:31, 1.03s/it] {'loss': 0.2401, 'grad_norm': 0.5435613393783569, 'learning_rate': 9.749777743540845e-06, 'epoch': 0.57}
19%|█▉ | 2202/11526 [22:59<2:39:31, 1.03s/it] 19%|█▉ | 2203/11526 [22:59<2:20:16, 1.11it/s] {'loss': 0.1874, 'grad_norm': 0.4857877194881439, 'learning_rate': 9.749304477045306e-06, 'epoch': 0.57}
19%|█▉ | 2203/11526 [22:59<2:20:16, 1.11it/s] 19%|█▉ | 2204/11526 [23:00<2:06:47, 1.23it/s] {'loss': 0.2013, 'grad_norm': 0.5205402970314026, 'learning_rate': 9.748830774916538e-06, 'epoch': 0.57}
19%|█▉ | 2204/11526 [23:00<2:06:47, 1.23it/s] 19%|█▉ | 2205/11526 [23:00<1:57:22, 1.32it/s] {'loss': 0.2627, 'grad_norm': 0.5972543358802795, 'learning_rate': 9.748356637197991e-06, 'epoch': 0.57}
19%|█▉ | 2205/11526 [23:00<1:57:22, 1.32it/s] 19%|█▉ | 2206/11526 [23:01<1:50:40, 1.40it/s] {'loss': 0.37, 'grad_norm': 0.6322018504142761, 'learning_rate': 9.747882063933159e-06, 'epoch': 0.57}
19%|█▉ | 2206/11526 [23:01<1:50:40, 1.40it/s] 19%|█▉ | 2207/11526 [23:02<1:46:19, 1.46it/s] {'loss': 0.2551, 'grad_norm': 0.5344486236572266, 'learning_rate': 9.747407055165567e-06, 'epoch': 0.57}
19%|█▉ | 2207/11526 [23:02<1:46:19, 1.46it/s] 19%|█▉ | 2208/11526 [23:02<1:43:02, 1.51it/s] {'loss': 0.2247, 'grad_norm': 0.5256103873252869, 'learning_rate': 9.74693161093879e-06, 'epoch': 0.57}
19%|█▉ | 2208/11526 [23:02<1:43:02, 1.51it/s] 19%|█▉ | 2209/11526 [23:03<1:40:43, 1.54it/s] {'loss': 0.2469, 'grad_norm': 0.5547171235084534, 'learning_rate': 9.746455731296435e-06, 'epoch': 0.57}
19%|█▉ | 2209/11526 [23:03<1:40:43, 1.54it/s] 19%|█▉ | 2210/11526 [23:03<1:39:05, 1.57it/s] {'loss': 0.3161, 'grad_norm': 0.5772553086280823, 'learning_rate': 9.745979416282154e-06, 'epoch': 0.58}
19%|█▉ | 2210/11526 [23:04<1:39:05, 1.57it/s] 19%|█▉ | 2211/11526 [23:04<1:38:00, 1.58it/s] {'loss': 0.2851, 'grad_norm': 0.5985826253890991, 'learning_rate': 9.74550266593964e-06, 'epoch': 0.58}
19%|█▉ | 2211/11526 [23:04<1:38:00, 1.58it/s] 19%|█▉ | 2212/11526 [23:05<1:37:27, 1.59it/s] {'loss': 0.2203, 'grad_norm': 0.5044706463813782, 'learning_rate': 9.745025480312617e-06, 'epoch': 0.58}
19%|█▉ | 2212/11526 [23:05<1:37:27, 1.59it/s] 19%|█▉ | 2213/11526 [23:05<1:36:48, 1.60it/s] {'loss': 0.1898, 'grad_norm': 0.479193776845932, 'learning_rate': 9.744547859444861e-06, 'epoch': 0.58}
19%|█▉ | 2213/11526 [23:05<1:36:48, 1.60it/s] 19%|█▉ | 2214/11526 [23:06<1:36:21, 1.61it/s] {'loss': 0.2876, 'grad_norm': 0.6608891487121582, 'learning_rate': 9.74406980338018e-06, 'epoch': 0.58}
19%|█▉ | 2214/11526 [23:06<1:36:21, 1.61it/s] 19%|█▉ | 2215/11526 [23:06<1:36:01, 1.62it/s] {'loss': 0.305, 'grad_norm': 0.5727489590644836, 'learning_rate': 9.743591312162424e-06, 'epoch': 0.58}
19%|█▉ | 2215/11526 [23:07<1:36:01, 1.62it/s] 19%|█▉ | 2216/11526 [23:07<1:35:51, 1.62it/s] {'loss': 0.317, 'grad_norm': 0.6010690331459045, 'learning_rate': 9.743112385835482e-06, 'epoch': 0.58}
19%|█▉ | 2216/11526 [23:07<1:35:51, 1.62it/s] 19%|█▉ | 2217/11526 [23:08<1:35:46, 1.62it/s] {'loss': 0.3217, 'grad_norm': 0.6085793375968933, 'learning_rate': 9.742633024443286e-06, 'epoch': 0.58}
19%|█▉ | 2217/11526 [23:08<1:35:46, 1.62it/s] 19%|█▉ | 2218/11526 [23:08<1:35:35, 1.62it/s] {'loss': 0.2609, 'grad_norm': 0.5662590861320496, 'learning_rate': 9.742153228029805e-06, 'epoch': 0.58}
19%|█▉ | 2218/11526 [23:08<1:35:35, 1.62it/s] 19%|█▉ | 2219/11526 [23:09<1:35:29, 1.62it/s] {'loss': 0.2492, 'grad_norm': 0.6247676014900208, 'learning_rate': 9.741672996639046e-06, 'epoch': 0.58}
19%|█▉ | 2219/11526 [23:09<1:35:29, 1.62it/s] 19%|█▉ | 2220/11526 [23:10<1:35:25, 1.63it/s] {'loss': 0.2539, 'grad_norm': 0.5524955987930298, 'learning_rate': 9.741192330315062e-06, 'epoch': 0.58}
19%|█▉ | 2220/11526 [23:10<1:35:25, 1.63it/s] 19%|█▉ | 2221/11526 [23:10<1:35:25, 1.63it/s] {'loss': 0.265, 'grad_norm': 0.5763030052185059, 'learning_rate': 9.740711229101943e-06, 'epoch': 0.58}
19%|█▉ | 2221/11526 [23:10<1:35:25, 1.63it/s] 19%|█▉ | 2222/11526 [23:11<1:35:24, 1.63it/s] {'loss': 0.3508, 'grad_norm': 0.7993590235710144, 'learning_rate': 9.740229693043814e-06, 'epoch': 0.58}
19%|█▉ | 2222/11526 [23:11<1:35:24, 1.63it/s] 19%|█▉ | 2223/11526 [23:11<1:35:30, 1.62it/s] {'loss': 0.3711, 'grad_norm': 0.57314532995224, 'learning_rate': 9.739747722184847e-06, 'epoch': 0.58}
19%|█▉ | 2223/11526 [23:12<1:35:30, 1.62it/s] 19%|█▉ | 2224/11526 [23:12<1:35:25, 1.62it/s] {'loss': 0.2703, 'grad_norm': 0.5477586388587952, 'learning_rate': 9.739265316569251e-06, 'epoch': 0.58}
19%|█▉ | 2224/11526 [23:12<1:35:25, 1.62it/s] 19%|█▉ | 2225/11526 [23:13<1:35:22, 1.63it/s] {'loss': 0.3036, 'grad_norm': 0.5865157842636108, 'learning_rate': 9.738782476241276e-06, 'epoch': 0.58}
19%|█▉ | 2225/11526 [23:13<1:35:22, 1.63it/s] 19%|█▉ | 2226/11526 [23:13<1:35:26, 1.62it/s] {'loss': 0.2535, 'grad_norm': 0.5521371960639954, 'learning_rate': 9.738299201245208e-06, 'epoch': 0.58}
19%|█▉ | 2226/11526 [23:13<1:35:26, 1.62it/s] 19%|█▉ | 2227/11526 [23:14<1:35:27, 1.62it/s] {'loss': 0.332, 'grad_norm': 0.6004514694213867, 'learning_rate': 9.73781549162538e-06, 'epoch': 0.58}
19%|█▉ | 2227/11526 [23:14<1:35:27, 1.62it/s] 19%|█▉ | 2228/11526 [23:14<1:35:22, 1.62it/s] {'loss': 0.2897, 'grad_norm': 0.624553918838501, 'learning_rate': 9.737331347426156e-06, 'epoch': 0.58}
19%|█▉ | 2228/11526 [23:15<1:35:22, 1.62it/s] 19%|█▉ | 2229/11526 [23:15<1:35:17, 1.63it/s] {'loss': 0.3126, 'grad_norm': 0.7120171189308167, 'learning_rate': 9.736846768691946e-06, 'epoch': 0.58}
19%|█▉ | 2229/11526 [23:15<1:35:17, 1.63it/s] 19%|█▉ | 2230/11526 [23:16<1:35:11, 1.63it/s] {'loss': 0.2728, 'grad_norm': 0.545613706111908, 'learning_rate': 9.7363617554672e-06, 'epoch': 0.58}
19%|█▉ | 2230/11526 [23:16<1:35:11, 1.63it/s] 19%|█▉ | 2231/11526 [23:16<1:35:17, 1.63it/s] {'loss': 0.275, 'grad_norm': 0.5196117758750916, 'learning_rate': 9.735876307796405e-06, 'epoch': 0.58}
19%|█▉ | 2231/11526 [23:16<1:35:17, 1.63it/s] 19%|█▉ | 2232/11526 [23:17<1:35:22, 1.62it/s] {'loss': 0.3049, 'grad_norm': 0.5824733972549438, 'learning_rate': 9.735390425724088e-06, 'epoch': 0.58}
19%|█▉ | 2232/11526 [23:17<1:35:22, 1.62it/s] 19%|█▉ | 2233/11526 [23:18<1:35:15, 1.63it/s] {'loss': 0.21, 'grad_norm': 0.42322656512260437, 'learning_rate': 9.734904109294818e-06, 'epoch': 0.58}
19%|█▉ | 2233/11526 [23:18<1:35:15, 1.63it/s] 19%|█▉ | 2234/11526 [23:18<1:35:12, 1.63it/s] {'loss': 0.2363, 'grad_norm': 0.5220878720283508, 'learning_rate': 9.734417358553205e-06, 'epoch': 0.58}
19%|█▉ | 2234/11526 [23:18<1:35:12, 1.63it/s] 19%|█▉ | 2235/11526 [23:19<1:35:12, 1.63it/s] {'loss': 0.2723, 'grad_norm': 0.6085548996925354, 'learning_rate': 9.73393017354389e-06, 'epoch': 0.58}
19%|█▉ | 2235/11526 [23:19<1:35:12, 1.63it/s] 19%|█▉ | 2236/11526 [23:19<1:35:18, 1.62it/s] {'loss': 0.2554, 'grad_norm': 0.5524199604988098, 'learning_rate': 9.733442554311568e-06, 'epoch': 0.58}
19%|█▉ | 2236/11526 [23:20<1:35:18, 1.62it/s] 19%|█▉ | 2237/11526 [23:20<1:35:16, 1.62it/s] {'loss': 0.2585, 'grad_norm': 0.5807166695594788, 'learning_rate': 9.732954500900962e-06, 'epoch': 0.58}
19%|█▉ | 2237/11526 [23:20<1:35:16, 1.62it/s] 19%|█▉ | 2238/11526 [23:21<1:35:13, 1.63it/s] {'loss': 0.3334, 'grad_norm': 0.6515971422195435, 'learning_rate': 9.73246601335684e-06, 'epoch': 0.58}
19%|█▉ | 2238/11526 [23:21<1:35:13, 1.63it/s] 19%|█▉ | 2239/11526 [23:21<1:35:12, 1.63it/s] {'loss': 0.246, 'grad_norm': 0.5672945976257324, 'learning_rate': 9.731977091724007e-06, 'epoch': 0.58}
19%|█▉ | 2239/11526 [23:21<1:35:12, 1.63it/s] 19%|█▉ | 2240/11526 [23:22<1:35:07, 1.63it/s] {'loss': 0.2159, 'grad_norm': 0.5187064409255981, 'learning_rate': 9.731487736047314e-06, 'epoch': 0.58}
19%|█▉ | 2240/11526 [23:22<1:35:07, 1.63it/s] 19%|█▉ | 2241/11526 [23:22<1:35:02, 1.63it/s] {'loss': 0.2226, 'grad_norm': 0.49571382999420166, 'learning_rate': 9.730997946371644e-06, 'epoch': 0.58}
19%|█▉ | 2241/11526 [23:23<1:35:02, 1.63it/s] 19%|█▉ | 2242/11526 [23:23<1:35:13, 1.62it/s] {'loss': 0.229, 'grad_norm': 0.501116931438446, 'learning_rate': 9.730507722741921e-06, 'epoch': 0.58}
19%|█▉ | 2242/11526 [23:23<1:35:13, 1.62it/s] 19%|█▉ | 2243/11526 [23:24<1:35:07, 1.63it/s] {'loss': 0.3235, 'grad_norm': 0.6996204853057861, 'learning_rate': 9.730017065203117e-06, 'epoch': 0.58}
19%|█▉ | 2243/11526 [23:24<1:35:07, 1.63it/s] 19%|█▉ | 2244/11526 [23:24<1:35:03, 1.63it/s] {'loss': 0.2972, 'grad_norm': 0.5490974187850952, 'learning_rate': 9.729525973800234e-06, 'epoch': 0.58}
19%|█▉ | 2244/11526 [23:24<1:35:03, 1.63it/s] 19%|█▉ | 2245/11526 [23:25<1:35:03, 1.63it/s] {'loss': 0.2633, 'grad_norm': 0.505520224571228, 'learning_rate': 9.729034448578321e-06, 'epoch': 0.58}
19%|█▉ | 2245/11526 [23:25<1:35:03, 1.63it/s] 19%|█▉ | 2246/11526 [23:26<1:35:00, 1.63it/s] {'loss': 0.3004, 'grad_norm': 0.626926839351654, 'learning_rate': 9.728542489582458e-06, 'epoch': 0.58}
19%|█▉ | 2246/11526 [23:26<1:35:00, 1.63it/s] 19%|█▉ | 2247/11526 [23:26<1:35:03, 1.63it/s] {'loss': 0.3445, 'grad_norm': 0.6431133151054382, 'learning_rate': 9.728050096857774e-06, 'epoch': 0.58}
19%|█▉ | 2247/11526 [23:26<1:35:03, 1.63it/s] 20%|█▉ | 2248/11526 [23:27<1:35:00, 1.63it/s] {'loss': 0.2687, 'grad_norm': 0.631945788860321, 'learning_rate': 9.727557270449437e-06, 'epoch': 0.59}
20%|█▉ | 2248/11526 [23:27<1:35:00, 1.63it/s] 20%|█▉ | 2249/11526 [23:27<1:35:00, 1.63it/s] {'loss': 0.2359, 'grad_norm': 0.5129666924476624, 'learning_rate': 9.727064010402646e-06, 'epoch': 0.59}
20%|█▉ | 2249/11526 [23:28<1:35:00, 1.63it/s] 20%|█▉ | 2250/11526 [23:28<1:34:56, 1.63it/s] {'loss': 0.2399, 'grad_norm': 0.5420067310333252, 'learning_rate': 9.726570316762649e-06, 'epoch': 0.59}
20%|█▉ | 2250/11526 [23:28<1:34:56, 1.63it/s] 20%|█▉ | 2251/11526 [23:29<1:34:55, 1.63it/s] {'loss': 0.2956, 'grad_norm': 0.6789200305938721, 'learning_rate': 9.72607618957473e-06, 'epoch': 0.59}
20%|█▉ | 2251/11526 [23:29<1:34:55, 1.63it/s] 20%|█▉ | 2252/11526 [23:29<1:34:59, 1.63it/s] {'loss': 0.2876, 'grad_norm': 0.5531907677650452, 'learning_rate': 9.72558162888421e-06, 'epoch': 0.59}
20%|█▉ | 2252/11526 [23:29<1:34:59, 1.63it/s] 20%|█▉ | 2253/11526 [23:30<1:34:55, 1.63it/s] {'loss': 0.2414, 'grad_norm': 0.4908548891544342, 'learning_rate': 9.725086634736458e-06, 'epoch': 0.59}
20%|█▉ | 2253/11526 [23:30<1:34:55, 1.63it/s] 20%|█▉ | 2254/11526 [23:30<1:34:55, 1.63it/s] {'loss': 0.23, 'grad_norm': 0.513424813747406, 'learning_rate': 9.724591207176873e-06, 'epoch': 0.59}
20%|█▉ | 2254/11526 [23:31<1:34:55, 1.63it/s] 20%|█▉ | 2255/11526 [23:31<1:34:55, 1.63it/s] {'loss': 0.3246, 'grad_norm': 0.5529032349586487, 'learning_rate': 9.724095346250901e-06, 'epoch': 0.59}
20%|█▉ | 2255/11526 [23:31<1:34:55, 1.63it/s] 20%|█▉ | 2256/11526 [23:32<1:34:54, 1.63it/s] {'loss': 0.2395, 'grad_norm': 0.4795919954776764, 'learning_rate': 9.723599052004027e-06, 'epoch': 0.59}
20%|█▉ | 2256/11526 [23:32<1:34:54, 1.63it/s] 20%|█▉ | 2257/11526 [23:32<1:34:58, 1.63it/s] {'loss': 0.2943, 'grad_norm': 0.603655993938446, 'learning_rate': 9.72310232448177e-06, 'epoch': 0.59}
20%|█▉ | 2257/11526 [23:32<1:34:58, 1.63it/s] 20%|█▉ | 2258/11526 [23:33<1:34:52, 1.63it/s] {'loss': 0.2727, 'grad_norm': 0.5974379777908325, 'learning_rate': 9.722605163729694e-06, 'epoch': 0.59}
20%|█▉ | 2258/11526 [23:33<1:34:52, 1.63it/s] 20%|█▉ | 2259/11526 [23:34<1:34:48, 1.63it/s] {'loss': 0.3245, 'grad_norm': 0.6689780354499817, 'learning_rate': 9.722107569793404e-06, 'epoch': 0.59}
20%|█▉ | 2259/11526 [23:34<1:34:48, 1.63it/s] 20%|█▉ | 2260/11526 [23:34<1:34:51, 1.63it/s] {'loss': 0.2498, 'grad_norm': 0.6156749725341797, 'learning_rate': 9.72160954271854e-06, 'epoch': 0.59}
20%|█▉ | 2260/11526 [23:34<1:34:51, 1.63it/s] 20%|█▉ | 2261/11526 [23:35<1:34:54, 1.63it/s] {'loss': 0.2482, 'grad_norm': 0.5621371865272522, 'learning_rate': 9.721111082550783e-06, 'epoch': 0.59}
20%|█▉ | 2261/11526 [23:35<1:34:54, 1.63it/s] 20%|█▉ | 2262/11526 [23:35<1:34:51, 1.63it/s] {'loss': 0.2845, 'grad_norm': 0.6043649315834045, 'learning_rate': 9.720612189335854e-06, 'epoch': 0.59}
20%|█▉ | 2262/11526 [23:36<1:34:51, 1.63it/s] 20%|█▉ | 2263/11526 [23:36<1:34:51, 1.63it/s] {'loss': 0.2457, 'grad_norm': 0.5604841709136963, 'learning_rate': 9.720112863119517e-06, 'epoch': 0.59}
20%|█▉ | 2263/11526 [23:36<1:34:51, 1.63it/s] 20%|█▉ | 2264/11526 [23:37<1:34:47, 1.63it/s] {'loss': 0.3355, 'grad_norm': 0.6932318210601807, 'learning_rate': 9.719613103947571e-06, 'epoch': 0.59}
20%|█▉ | 2264/11526 [23:37<1:34:47, 1.63it/s] 20%|█▉ | 2265/11526 [23:37<1:34:45, 1.63it/s] {'loss': 0.2539, 'grad_norm': 0.5565400123596191, 'learning_rate': 9.71911291186586e-06, 'epoch': 0.59}
20%|█▉ | 2265/11526 [23:37<1:34:45, 1.63it/s] 20%|█▉ | 2266/11526 [23:38<1:34:51, 1.63it/s] {'loss': 0.2663, 'grad_norm': 0.5257511734962463, 'learning_rate': 9.718612286920261e-06, 'epoch': 0.59}
20%|█▉ | 2266/11526 [23:38<1:34:51, 1.63it/s] 20%|█▉ | 2267/11526 [23:38<1:34:58, 1.62it/s] {'loss': 0.2853, 'grad_norm': 0.5365620255470276, 'learning_rate': 9.718111229156694e-06, 'epoch': 0.59}
20%|█▉ | 2267/11526 [23:39<1:34:58, 1.62it/s] 20%|█▉ | 2268/11526 [23:39<1:34:53, 1.63it/s] {'loss': 0.2818, 'grad_norm': 0.5742992162704468, 'learning_rate': 9.71760973862112e-06, 'epoch': 0.59}
20%|█▉ | 2268/11526 [23:39<1:34:53, 1.63it/s] 20%|█▉ | 2269/11526 [23:40<1:34:52, 1.63it/s] {'loss': 0.2543, 'grad_norm': 0.5519700050354004, 'learning_rate': 9.71710781535954e-06, 'epoch': 0.59}
20%|█▉ | 2269/11526 [23:40<1:34:52, 1.63it/s] 20%|█▉ | 2270/11526 [23:40<1:34:48, 1.63it/s] {'loss': 0.2756, 'grad_norm': 0.5372327566146851, 'learning_rate': 9.716605459417994e-06, 'epoch': 0.59}
20%|█▉ | 2270/11526 [23:40<1:34:48, 1.63it/s] 20%|█▉ | 2271/11526 [23:41<1:40:04, 1.54it/s] {'loss': 0.2091, 'grad_norm': 0.5214183926582336, 'learning_rate': 9.716102670842556e-06, 'epoch': 0.59}
20%|█▉ | 2271/11526 [23:41<1:40:04, 1.54it/s] 20%|█▉ | 2272/11526 [23:42<1:38:32, 1.57it/s] {'loss': 0.2276, 'grad_norm': 0.5222524404525757, 'learning_rate': 9.715599449679347e-06, 'epoch': 0.59}
20%|█▉ | 2272/11526 [23:42<1:38:32, 1.57it/s] 20%|█▉ | 2273/11526 [23:42<1:37:22, 1.58it/s] {'loss': 0.1775, 'grad_norm': 0.4880804121494293, 'learning_rate': 9.715095795974527e-06, 'epoch': 0.59}
20%|█▉ | 2273/11526 [23:42<1:37:22, 1.58it/s] 20%|█▉ | 2274/11526 [23:43<1:36:31, 1.60it/s] {'loss': 0.2797, 'grad_norm': 0.6348229050636292, 'learning_rate': 9.714591709774293e-06, 'epoch': 0.59}
20%|█▉ | 2274/11526 [23:43<1:36:31, 1.60it/s] 20%|█▉ | 2275/11526 [23:43<1:35:59, 1.61it/s] {'loss': 0.2541, 'grad_norm': 0.5239695310592651, 'learning_rate': 9.714087191124882e-06, 'epoch': 0.59}
20%|█▉ | 2275/11526 [23:44<1:35:59, 1.61it/s] 20%|█▉ | 2276/11526 [23:44<1:35:33, 1.61it/s] {'loss': 0.2668, 'grad_norm': 0.628639280796051, 'learning_rate': 9.713582240072572e-06, 'epoch': 0.59}
20%|█▉ | 2276/11526 [23:44<1:35:33, 1.61it/s] 20%|█▉ | 2277/11526 [23:45<1:35:21, 1.62it/s] {'loss': 0.2647, 'grad_norm': 0.5820115804672241, 'learning_rate': 9.71307685666368e-06, 'epoch': 0.59}
20%|█▉ | 2277/11526 [23:45<1:35:21, 1.62it/s] 20%|█▉ | 2278/11526 [23:45<1:35:07, 1.62it/s] {'loss': 0.3003, 'grad_norm': 0.6373938322067261, 'learning_rate': 9.712571040944561e-06, 'epoch': 0.59}
20%|█▉ | 2278/11526 [23:45<1:35:07, 1.62it/s] 20%|█▉ | 2279/11526 [23:46<1:34:57, 1.62it/s] {'loss': 0.2668, 'grad_norm': 0.5569896697998047, 'learning_rate': 9.712064792961614e-06, 'epoch': 0.59}
20%|█▉ | 2279/11526 [23:46<1:34:57, 1.62it/s] 20%|█▉ | 2280/11526 [23:47<1:34:51, 1.62it/s] {'loss': 0.2672, 'grad_norm': 0.5576863884925842, 'learning_rate': 9.711558112761275e-06, 'epoch': 0.59}
20%|█▉ | 2280/11526 [23:47<1:34:51, 1.62it/s] 20%|█▉ | 2281/11526 [23:47<1:34:48, 1.63it/s] {'loss': 0.2462, 'grad_norm': 0.5173166394233704, 'learning_rate': 9.711051000390015e-06, 'epoch': 0.59}
20%|█▉ | 2281/11526 [23:47<1:34:48, 1.63it/s] 20%|█▉ | 2282/11526 [23:48<1:34:50, 1.62it/s] {'loss': 0.2627, 'grad_norm': 0.5652666687965393, 'learning_rate': 9.710543455894354e-06, 'epoch': 0.59}
20%|█▉ | 2282/11526 [23:48<1:34:50, 1.62it/s] 20%|█▉ | 2283/11526 [23:48<1:34:43, 1.63it/s] {'loss': 0.3611, 'grad_norm': 0.7126139402389526, 'learning_rate': 9.710035479320847e-06, 'epoch': 0.59}
20%|█▉ | 2283/11526 [23:49<1:34:43, 1.63it/s] 20%|█▉ | 2284/11526 [23:49<1:34:41, 1.63it/s] {'loss': 0.3139, 'grad_norm': 0.6938598155975342, 'learning_rate': 9.709527070716085e-06, 'epoch': 0.59}
20%|█▉ | 2284/11526 [23:49<1:34:41, 1.63it/s] 20%|█▉ | 2285/11526 [23:50<1:34:38, 1.63it/s] {'loss': 0.262, 'grad_norm': 0.5847169756889343, 'learning_rate': 9.709018230126703e-06, 'epoch': 0.59}
20%|█▉ | 2285/11526 [23:50<1:34:38, 1.63it/s] 20%|█▉ | 2286/11526 [23:50<1:34:37, 1.63it/s] {'loss': 0.3227, 'grad_norm': 0.6047124266624451, 'learning_rate': 9.708508957599379e-06, 'epoch': 0.6}
20%|█▉ | 2286/11526 [23:50<1:34:37, 1.63it/s] 20%|█▉ | 2287/11526 [23:51<1:34:44, 1.63it/s] {'loss': 0.2157, 'grad_norm': 0.5432474613189697, 'learning_rate': 9.70799925318082e-06, 'epoch': 0.6}
20%|█▉ | 2287/11526 [23:51<1:34:44, 1.63it/s] 20%|█▉ | 2288/11526 [23:51<1:34:40, 1.63it/s] {'loss': 0.2967, 'grad_norm': 0.6576888561248779, 'learning_rate': 9.707489116917784e-06, 'epoch': 0.6}
20%|█▉ | 2288/11526 [23:52<1:34:40, 1.63it/s] 20%|█▉ | 2289/11526 [23:52<1:34:47, 1.62it/s] {'loss': 0.2662, 'grad_norm': 0.5502753257751465, 'learning_rate': 9.70697854885706e-06, 'epoch': 0.6}
20%|█▉ | 2289/11526 [23:52<1:34:47, 1.62it/s] 20%|█▉ | 2290/11526 [23:53<1:34:42, 1.63it/s] {'loss': 0.2577, 'grad_norm': 0.580558717250824, 'learning_rate': 9.706467549045483e-06, 'epoch': 0.6}
20%|█▉ | 2290/11526 [23:53<1:34:42, 1.63it/s] 20%|█▉ | 2291/11526 [23:53<1:34:37, 1.63it/s] {'loss': 0.2312, 'grad_norm': 0.5530181527137756, 'learning_rate': 9.705956117529924e-06, 'epoch': 0.6}
20%|█▉ | 2291/11526 [23:53<1:34:37, 1.63it/s] 20%|█▉ | 2292/11526 [23:54<1:34:38, 1.63it/s] {'loss': 0.3074, 'grad_norm': 0.5653261542320251, 'learning_rate': 9.705444254357293e-06, 'epoch': 0.6}
20%|█▉ | 2292/11526 [23:54<1:34:38, 1.63it/s] 20%|█▉ | 2293/11526 [23:55<1:34:33, 1.63it/s] {'loss': 0.2872, 'grad_norm': 0.5963444709777832, 'learning_rate': 9.704931959574543e-06, 'epoch': 0.6}
20%|█▉ | 2293/11526 [23:55<1:34:33, 1.63it/s] 20%|█▉ | 2294/11526 [23:55<1:35:00, 1.62it/s] {'loss': 0.3015, 'grad_norm': 0.5751652121543884, 'learning_rate': 9.704419233228664e-06, 'epoch': 0.6}
20%|█▉ | 2294/11526 [23:55<1:35:00, 1.62it/s] 20%|█▉ | 2295/11526 [23:56<1:34:47, 1.62it/s] {'loss': 0.2803, 'grad_norm': 0.6559172868728638, 'learning_rate': 9.703906075366684e-06, 'epoch': 0.6}
20%|█▉ | 2295/11526 [23:56<1:34:47, 1.62it/s] 20%|█▉ | 2296/11526 [23:56<1:34:40, 1.62it/s] {'loss': 0.3459, 'grad_norm': 0.589218258857727, 'learning_rate': 9.703392486035676e-06, 'epoch': 0.6}
20%|█▉ | 2296/11526 [23:57<1:34:40, 1.62it/s] 20%|█▉ | 2297/11526 [23:57<1:34:41, 1.62it/s] {'loss': 0.2622, 'grad_norm': 0.5147632360458374, 'learning_rate': 9.702878465282748e-06, 'epoch': 0.6}
20%|█▉ | 2297/11526 [23:57<1:34:41, 1.62it/s] 20%|█▉ | 2298/11526 [23:58<1:34:36, 1.63it/s] {'loss': 0.3133, 'grad_norm': 0.6455520391464233, 'learning_rate': 9.70236401315505e-06, 'epoch': 0.6}
20%|█▉ | 2298/11526 [23:58<1:34:36, 1.63it/s] 20%|█▉ | 2299/11526 [23:58<1:34:41, 1.62it/s] {'loss': 0.2095, 'grad_norm': 0.48440444469451904, 'learning_rate': 9.701849129699767e-06, 'epoch': 0.6}
20%|█▉ | 2299/11526 [23:58<1:34:41, 1.62it/s] 20%|█▉ | 2300/11526 [23:59<1:34:37, 1.62it/s] {'loss': 0.3181, 'grad_norm': 0.6016982793807983, 'learning_rate': 9.70133381496413e-06, 'epoch': 0.6}
20%|█▉ | 2300/11526 [23:59<1:34:37, 1.62it/s] 20%|█▉ | 2301/11526 [23:59<1:34:32, 1.63it/s] {'loss': 0.2879, 'grad_norm': 0.5514999032020569, 'learning_rate': 9.700818068995407e-06, 'epoch': 0.6}
20%|█▉ | 2301/11526 [24:00<1:34:32, 1.63it/s] 20%|█▉ | 2302/11526 [24:00<1:34:32, 1.63it/s] {'loss': 0.2551, 'grad_norm': 0.5348091125488281, 'learning_rate': 9.700301891840904e-06, 'epoch': 0.6}
20%|█▉ | 2302/11526 [24:00<1:34:32, 1.63it/s] 20%|█▉ | 2303/11526 [24:01<1:34:30, 1.63it/s] {'loss': 0.2873, 'grad_norm': 0.6056126952171326, 'learning_rate': 9.69978528354797e-06, 'epoch': 0.6}
20%|█▉ | 2303/11526 [24:01<1:34:30, 1.63it/s] 20%|█▉ | 2304/11526 [24:01<1:34:33, 1.63it/s] {'loss': 0.2898, 'grad_norm': 0.5611634850502014, 'learning_rate': 9.699268244163986e-06, 'epoch': 0.6}
20%|█▉ | 2304/11526 [24:01<1:34:33, 1.63it/s] 20%|█▉ | 2305/11526 [24:02<1:34:27, 1.63it/s] {'loss': 0.2452, 'grad_norm': 0.5648555159568787, 'learning_rate': 9.69875077373638e-06, 'epoch': 0.6}
20%|█▉ | 2305/11526 [24:02<1:34:27, 1.63it/s] 20%|██ | 2306/11526 [24:03<1:34:23, 1.63it/s] {'loss': 0.3523, 'grad_norm': 0.621058464050293, 'learning_rate': 9.69823287231262e-06, 'epoch': 0.6}
20%|██ | 2306/11526 [24:03<1:34:23, 1.63it/s] 20%|██ | 2307/11526 [24:03<1:34:28, 1.63it/s] {'loss': 0.262, 'grad_norm': 0.5210852026939392, 'learning_rate': 9.69771453994021e-06, 'epoch': 0.6}
20%|██ | 2307/11526 [24:03<1:34:28, 1.63it/s] 20%|██ | 2308/11526 [24:04<1:34:26, 1.63it/s] {'loss': 0.2686, 'grad_norm': 0.6080027222633362, 'learning_rate': 9.697195776666692e-06, 'epoch': 0.6}
20%|██ | 2308/11526 [24:04<1:34:26, 1.63it/s] 20%|██ | 2309/11526 [24:04<1:34:37, 1.62it/s] {'loss': 0.2907, 'grad_norm': 0.6368104219436646, 'learning_rate': 9.696676582539653e-06, 'epoch': 0.6}
20%|██ | 2309/11526 [24:05<1:34:37, 1.62it/s] 20%|██ | 2310/11526 [24:05<1:34:36, 1.62it/s] {'loss': 0.3549, 'grad_norm': 0.6140780448913574, 'learning_rate': 9.696156957606712e-06, 'epoch': 0.6}
20%|██ | 2310/11526 [24:05<1:34:36, 1.62it/s] 20%|██ | 2311/11526 [24:06<1:34:33, 1.62it/s] {'loss': 0.2716, 'grad_norm': 0.5588167309761047, 'learning_rate': 9.695636901915537e-06, 'epoch': 0.6}
20%|██ | 2311/11526 [24:06<1:34:33, 1.62it/s] 20%|██ | 2312/11526 [24:06<1:34:37, 1.62it/s] {'loss': 0.2535, 'grad_norm': 0.5904460549354553, 'learning_rate': 9.695116415513828e-06, 'epoch': 0.6}
20%|██ | 2312/11526 [24:06<1:34:37, 1.62it/s] 20%|██ | 2313/11526 [24:07<1:34:31, 1.62it/s] {'loss': 0.3763, 'grad_norm': 0.7151129245758057, 'learning_rate': 9.694595498449328e-06, 'epoch': 0.6}
20%|██ | 2313/11526 [24:07<1:34:31, 1.62it/s] 20%|██ | 2314/11526 [24:07<1:34:32, 1.62it/s] {'loss': 0.3677, 'grad_norm': 0.7471069097518921, 'learning_rate': 9.694074150769816e-06, 'epoch': 0.6}
20%|██ | 2314/11526 [24:08<1:34:32, 1.62it/s] 20%|██ | 2315/11526 [24:08<1:34:26, 1.63it/s] {'loss': 0.3313, 'grad_norm': 0.5766516327857971, 'learning_rate': 9.693552372523117e-06, 'epoch': 0.6}
20%|██ | 2315/11526 [24:08<1:34:26, 1.63it/s] 20%|██ | 2316/11526 [24:09<1:34:23, 1.63it/s] {'loss': 0.3281, 'grad_norm': 0.6557338833808899, 'learning_rate': 9.693030163757087e-06, 'epoch': 0.6}
20%|██ | 2316/11526 [24:09<1:34:23, 1.63it/s] 20%|██ | 2317/11526 [24:09<1:34:44, 1.62it/s] {'loss': 0.2859, 'grad_norm': 0.545998752117157, 'learning_rate': 9.69250752451963e-06, 'epoch': 0.6}
20%|██ | 2317/11526 [24:09<1:34:44, 1.62it/s] 20%|██ | 2318/11526 [24:10<1:34:36, 1.62it/s] {'loss': 0.2188, 'grad_norm': 0.5438681840896606, 'learning_rate': 9.691984454858682e-06, 'epoch': 0.6}
20%|██ | 2318/11526 [24:10<1:34:36, 1.62it/s] 20%|██ | 2319/11526 [24:11<1:34:40, 1.62it/s] {'loss': 0.4279, 'grad_norm': 0.829033613204956, 'learning_rate': 9.691460954822224e-06, 'epoch': 0.6}
20%|██ | 2319/11526 [24:11<1:34:40, 1.62it/s] 20%|██ | 2320/11526 [24:11<1:34:34, 1.62it/s] {'loss': 0.2836, 'grad_norm': 0.5434207320213318, 'learning_rate': 9.690937024458273e-06, 'epoch': 0.6}
20%|██ | 2320/11526 [24:11<1:34:34, 1.62it/s] 20%|██ | 2321/11526 [24:12<1:34:28, 1.62it/s] {'loss': 0.262, 'grad_norm': 0.6182608008384705, 'learning_rate': 9.69041266381489e-06, 'epoch': 0.6}
20%|██ | 2321/11526 [24:12<1:34:28, 1.62it/s] 20%|██ | 2322/11526 [24:12<1:34:48, 1.62it/s] {'loss': 0.3431, 'grad_norm': 0.6909881234169006, 'learning_rate': 9.68988787294017e-06, 'epoch': 0.6}
20%|██ | 2322/11526 [24:13<1:34:48, 1.62it/s] 20%|██ | 2323/11526 [24:13<1:34:36, 1.62it/s] {'loss': 0.391, 'grad_norm': 0.6632823348045349, 'learning_rate': 9.689362651882248e-06, 'epoch': 0.6}
20%|██ | 2323/11526 [24:13<1:34:36, 1.62it/s] 20%|██ | 2324/11526 [24:14<1:39:37, 1.54it/s] {'loss': 0.2518, 'grad_norm': 0.6618539094924927, 'learning_rate': 9.688837000689303e-06, 'epoch': 0.6}
20%|██ | 2324/11526 [24:14<1:39:37, 1.54it/s] 20%|██ | 2325/11526 [24:14<1:37:57, 1.57it/s] {'loss': 0.261, 'grad_norm': 0.5789929628372192, 'learning_rate': 9.688310919409548e-06, 'epoch': 0.61}
20%|██ | 2325/11526 [24:14<1:37:57, 1.57it/s] 20%|██ | 2326/11526 [24:15<1:36:47, 1.58it/s] {'loss': 0.3579, 'grad_norm': 0.7352524995803833, 'learning_rate': 9.687784408091243e-06, 'epoch': 0.61}
20%|██ | 2326/11526 [24:15<1:36:47, 1.58it/s] 20%|██ | 2327/11526 [24:16<1:36:04, 1.60it/s] {'loss': 0.2196, 'grad_norm': 0.597505509853363, 'learning_rate': 9.68725746678268e-06, 'epoch': 0.61}
20%|██ | 2327/11526 [24:16<1:36:04, 1.60it/s] 20%|██ | 2328/11526 [24:16<1:35:27, 1.61it/s] {'loss': 0.2533, 'grad_norm': 0.5230553150177002, 'learning_rate': 9.686730095532191e-06, 'epoch': 0.61}
20%|██ | 2328/11526 [24:16<1:35:27, 1.61it/s] 20%|██ | 2329/11526 [24:17<1:35:02, 1.61it/s] {'loss': 0.2348, 'grad_norm': 0.5833722352981567, 'learning_rate': 9.686202294388152e-06, 'epoch': 0.61}
20%|██ | 2329/11526 [24:17<1:35:02, 1.61it/s] 20%|██ | 2330/11526 [24:18<1:39:51, 1.53it/s] {'loss': 0.3365, 'grad_norm': 0.5871373414993286, 'learning_rate': 9.685674063398974e-06, 'epoch': 0.61}
20%|██ | 2330/11526 [24:18<1:39:51, 1.53it/s] 20%|██ | 2331/11526 [24:18<1:38:05, 1.56it/s] {'loss': 0.1981, 'grad_norm': 0.5753351449966431, 'learning_rate': 9.685145402613111e-06, 'epoch': 0.61}
20%|██ | 2331/11526 [24:18<1:38:05, 1.56it/s] 20%|██ | 2332/11526 [24:19<1:37:02, 1.58it/s] {'loss': 0.285, 'grad_norm': 0.5761277079582214, 'learning_rate': 9.684616312079056e-06, 'epoch': 0.61}
20%|██ | 2332/11526 [24:19<1:37:02, 1.58it/s] 20%|██ | 2333/11526 [24:19<1:36:12, 1.59it/s] {'loss': 0.305, 'grad_norm': 0.5990400314331055, 'learning_rate': 9.684086791845337e-06, 'epoch': 0.61}
20%|██ | 2333/11526 [24:20<1:36:12, 1.59it/s] 20%|██ | 2334/11526 [24:20<1:35:32, 1.60it/s] {'loss': 0.2483, 'grad_norm': 0.5944958925247192, 'learning_rate': 9.683556841960526e-06, 'epoch': 0.61}
20%|██ | 2334/11526 [24:20<1:35:32, 1.60it/s] 20%|██ | 2335/11526 [24:21<1:35:10, 1.61it/s] {'loss': 0.2877, 'grad_norm': 0.5602959394454956, 'learning_rate': 9.683026462473233e-06, 'epoch': 0.61}
20%|██ | 2335/11526 [24:21<1:35:10, 1.61it/s] 20%|██ | 2336/11526 [24:21<1:34:51, 1.61it/s] {'loss': 0.2745, 'grad_norm': 0.5266185998916626, 'learning_rate': 9.682495653432109e-06, 'epoch': 0.61}
20%|██ | 2336/11526 [24:21<1:34:51, 1.61it/s] 20%|██ | 2337/11526 [24:22<1:34:37, 1.62it/s] {'loss': 0.2875, 'grad_norm': 0.6628203988075256, 'learning_rate': 9.681964414885841e-06, 'epoch': 0.61}
20%|██ | 2337/11526 [24:22<1:34:37, 1.62it/s] 20%|██ | 2338/11526 [24:22<1:34:27, 1.62it/s] {'loss': 0.2724, 'grad_norm': 0.6288874745368958, 'learning_rate': 9.681432746883156e-06, 'epoch': 0.61}
20%|██ | 2338/11526 [24:23<1:34:27, 1.62it/s] 20%|██ | 2339/11526 [24:23<1:34:21, 1.62it/s] {'loss': 0.2529, 'grad_norm': 0.5182552933692932, 'learning_rate': 9.680900649472825e-06, 'epoch': 0.61}
20%|██ | 2339/11526 [24:23<1:34:21, 1.62it/s] 20%|██ | 2340/11526 [24:24<1:34:23, 1.62it/s] {'loss': 0.4173, 'grad_norm': 0.7037155032157898, 'learning_rate': 9.680368122703651e-06, 'epoch': 0.61}
20%|██ | 2340/11526 [24:24<1:34:23, 1.62it/s] 20%|██ | 2341/11526 [24:24<1:34:16, 1.62it/s] {'loss': 0.3297, 'grad_norm': 0.621253490447998, 'learning_rate': 9.679835166624483e-06, 'epoch': 0.61}
20%|██ | 2341/11526 [24:24<1:34:16, 1.62it/s] 20%|██ | 2342/11526 [24:25<1:34:18, 1.62it/s] {'loss': 0.2386, 'grad_norm': 0.5485682487487793, 'learning_rate': 9.679301781284209e-06, 'epoch': 0.61}
20%|██ | 2342/11526 [24:25<1:34:18, 1.62it/s] 20%|██ | 2343/11526 [24:26<1:34:14, 1.62it/s] {'loss': 0.2722, 'grad_norm': 0.5729796290397644, 'learning_rate': 9.67876796673175e-06, 'epoch': 0.61}
20%|██ | 2343/11526 [24:26<1:34:14, 1.62it/s] 20%|██ | 2344/11526 [24:26<1:34:10, 1.62it/s] {'loss': 0.3028, 'grad_norm': 0.7209996581077576, 'learning_rate': 9.67823372301607e-06, 'epoch': 0.61}
20%|██ | 2344/11526 [24:26<1:34:10, 1.62it/s] 20%|██ | 2345/11526 [24:27<1:34:35, 1.62it/s] {'loss': 0.2323, 'grad_norm': 0.5024094581604004, 'learning_rate': 9.677699050186178e-06, 'epoch': 0.61}
20%|██ | 2345/11526 [24:27<1:34:35, 1.62it/s] 20%|██ | 2346/11526 [24:27<1:34:22, 1.62it/s] {'loss': 0.282, 'grad_norm': 0.6121571660041809, 'learning_rate': 9.677163948291113e-06, 'epoch': 0.61}
20%|██ | 2346/11526 [24:28<1:34:22, 1.62it/s] 20%|██ | 2347/11526 [24:28<1:34:26, 1.62it/s] {'loss': 0.255, 'grad_norm': 0.5580028891563416, 'learning_rate': 9.676628417379959e-06, 'epoch': 0.61}
20%|██ | 2347/11526 [24:28<1:34:26, 1.62it/s] 20%|██ | 2348/11526 [24:29<1:34:18, 1.62it/s] {'loss': 0.2443, 'grad_norm': 0.5003350973129272, 'learning_rate': 9.676092457501836e-06, 'epoch': 0.61}
20%|██ | 2348/11526 [24:29<1:34:18, 1.62it/s] 20%|██ | 2349/11526 [24:29<1:34:09, 1.62it/s] {'loss': 0.3301, 'grad_norm': 0.6231930255889893, 'learning_rate': 9.675556068705908e-06, 'epoch': 0.61}
20%|██ | 2349/11526 [24:29<1:34:09, 1.62it/s] 20%|██ | 2350/11526 [24:30<1:34:07, 1.62it/s] {'loss': 0.2383, 'grad_norm': 0.5342728495597839, 'learning_rate': 9.675019251041374e-06, 'epoch': 0.61}
20%|██ | 2350/11526 [24:30<1:34:07, 1.62it/s] 20%|██ | 2351/11526 [24:30<1:34:03, 1.63it/s] {'loss': 0.3037, 'grad_norm': 0.6842092275619507, 'learning_rate': 9.674482004557475e-06, 'epoch': 0.61}
20%|██ | 2351/11526 [24:31<1:34:03, 1.63it/s] 20%|██ | 2352/11526 [24:31<1:34:09, 1.62it/s] {'loss': 0.3416, 'grad_norm': 0.5981770753860474, 'learning_rate': 9.67394432930349e-06, 'epoch': 0.61}
20%|██ | 2352/11526 [24:31<1:34:09, 1.62it/s] 20%|██ | 2353/11526 [24:32<1:34:05, 1.62it/s] {'loss': 0.2354, 'grad_norm': 0.5015993118286133, 'learning_rate': 9.673406225328739e-06, 'epoch': 0.61}
20%|██ | 2353/11526 [24:32<1:34:05, 1.62it/s] 20%|██ | 2354/11526 [24:32<1:34:04, 1.63it/s] {'loss': 0.3211, 'grad_norm': 0.6021949648857117, 'learning_rate': 9.672867692682577e-06, 'epoch': 0.61}
20%|██ | 2354/11526 [24:32<1:34:04, 1.63it/s] 20%|██ | 2355/11526 [24:33<1:34:03, 1.63it/s] {'loss': 0.3906, 'grad_norm': 0.6083105206489563, 'learning_rate': 9.672328731414402e-06, 'epoch': 0.61}
20%|██ | 2355/11526 [24:33<1:34:03, 1.63it/s] 20%|██ | 2356/11526 [24:34<1:33:59, 1.63it/s] {'loss': 0.3484, 'grad_norm': 0.6970499157905579, 'learning_rate': 9.671789341573652e-06, 'epoch': 0.61}
20%|██ | 2356/11526 [24:34<1:33:59, 1.63it/s] 20%|██ | 2357/11526 [24:34<1:34:04, 1.62it/s] {'loss': 0.2486, 'grad_norm': 0.6004001498222351, 'learning_rate': 9.6712495232098e-06, 'epoch': 0.61}
20%|██ | 2357/11526 [24:34<1:34:04, 1.62it/s] 20%|██ | 2358/11526 [24:35<1:34:02, 1.62it/s] {'loss': 0.2288, 'grad_norm': 0.5601375102996826, 'learning_rate': 9.670709276372368e-06, 'epoch': 0.61}
20%|██ | 2358/11526 [24:35<1:34:02, 1.62it/s] 20%|██ | 2359/11526 [24:35<1:33:57, 1.63it/s] {'loss': 0.4092, 'grad_norm': 0.6590405702590942, 'learning_rate': 9.670168601110902e-06, 'epoch': 0.61}
20%|██ | 2359/11526 [24:36<1:33:57, 1.63it/s] 20%|██ | 2360/11526 [24:36<1:33:55, 1.63it/s] {'loss': 0.4034, 'grad_norm': 0.6031973361968994, 'learning_rate': 9.669627497475002e-06, 'epoch': 0.61}
20%|██ | 2360/11526 [24:36<1:33:55, 1.63it/s] 20%|██ | 2361/11526 [24:37<1:33:52, 1.63it/s] {'loss': 0.253, 'grad_norm': 0.5104526281356812, 'learning_rate': 9.669085965514297e-06, 'epoch': 0.61}
20%|██ | 2361/11526 [24:37<1:33:52, 1.63it/s] 20%|██ | 2362/11526 [24:37<1:33:57, 1.63it/s] {'loss': 0.2637, 'grad_norm': 0.5494610667228699, 'learning_rate': 9.668544005278461e-06, 'epoch': 0.61}
20%|██ | 2362/11526 [24:37<1:33:57, 1.63it/s] 21%|██ | 2363/11526 [24:38<1:33:54, 1.63it/s] {'loss': 0.2585, 'grad_norm': 0.5614794492721558, 'learning_rate': 9.66800161681721e-06, 'epoch': 0.62}
21%|██ | 2363/11526 [24:38<1:33:54, 1.63it/s] 21%|██ | 2364/11526 [24:38<1:33:52, 1.63it/s] {'loss': 0.2287, 'grad_norm': 0.4914962947368622, 'learning_rate': 9.667458800180287e-06, 'epoch': 0.62}
21%|██ | 2364/11526 [24:39<1:33:52, 1.63it/s] 21%|██ | 2365/11526 [24:39<1:33:48, 1.63it/s] {'loss': 0.3061, 'grad_norm': 0.7151608467102051, 'learning_rate': 9.666915555417486e-06, 'epoch': 0.62}
21%|██ | 2365/11526 [24:39<1:33:48, 1.63it/s] 21%|██ | 2366/11526 [24:40<1:33:50, 1.63it/s] {'loss': 0.3474, 'grad_norm': 0.5704089999198914, 'learning_rate': 9.666371882578638e-06, 'epoch': 0.62}
21%|██ | 2366/11526 [24:40<1:33:50, 1.63it/s] 21%|██ | 2367/11526 [24:40<1:34:16, 1.62it/s] {'loss': 0.2434, 'grad_norm': 0.5337005257606506, 'learning_rate': 9.66582778171361e-06, 'epoch': 0.62}
21%|██ | 2367/11526 [24:40<1:34:16, 1.62it/s] 21%|██ | 2368/11526 [24:41<1:34:05, 1.62it/s] {'loss': 0.2136, 'grad_norm': 0.46144187450408936, 'learning_rate': 9.66528325287231e-06, 'epoch': 0.62}
21%|██ | 2368/11526 [24:41<1:34:05, 1.62it/s] 21%|██ | 2369/11526 [24:42<1:33:57, 1.62it/s] {'loss': 0.2504, 'grad_norm': 0.4935045540332794, 'learning_rate': 9.664738296104688e-06, 'epoch': 0.62}
21%|██ | 2369/11526 [24:42<1:33:57, 1.62it/s] 21%|██ | 2370/11526 [24:42<1:33:52, 1.63it/s] {'loss': 0.223, 'grad_norm': 0.5179439783096313, 'learning_rate': 9.664192911460726e-06, 'epoch': 0.62}
21%|██ | 2370/11526 [24:42<1:33:52, 1.63it/s] 21%|██ | 2371/11526 [24:43<1:33:44, 1.63it/s] {'loss': 0.2468, 'grad_norm': 0.5636986494064331, 'learning_rate': 9.663647098990452e-06, 'epoch': 0.62}
21%|██ | 2371/11526 [24:43<1:33:44, 1.63it/s] 21%|██ | 2372/11526 [24:43<1:33:51, 1.63it/s] {'loss': 0.2872, 'grad_norm': 0.6262686848640442, 'learning_rate': 9.663100858743932e-06, 'epoch': 0.62}
21%|██ | 2372/11526 [24:44<1:33:51, 1.63it/s] 21%|██ | 2373/11526 [24:44<1:33:45, 1.63it/s] {'loss': 0.2201, 'grad_norm': 0.47452476620674133, 'learning_rate': 9.662554190771268e-06, 'epoch': 0.62}
21%|██ | 2373/11526 [24:44<1:33:45, 1.63it/s] 21%|██ | 2374/11526 [24:45<1:33:44, 1.63it/s] {'loss': 0.1951, 'grad_norm': 0.5110661387443542, 'learning_rate': 9.662007095122605e-06, 'epoch': 0.62}
21%|██ | 2374/11526 [24:45<1:33:44, 1.63it/s] 21%|██ | 2375/11526 [24:45<1:33:42, 1.63it/s] {'loss': 0.2318, 'grad_norm': 0.5049543380737305, 'learning_rate': 9.661459571848126e-06, 'epoch': 0.62}
21%|██ | 2375/11526 [24:45<1:33:42, 1.63it/s] 21%|██ | 2376/11526 [24:46<1:33:40, 1.63it/s] {'loss': 0.3208, 'grad_norm': 0.6033056974411011, 'learning_rate': 9.660911620998054e-06, 'epoch': 0.62}
21%|██ | 2376/11526 [24:46<1:33:40, 1.63it/s] 21%|██ | 2377/11526 [24:46<1:33:46, 1.63it/s] {'loss': 0.2862, 'grad_norm': 0.6451190710067749, 'learning_rate': 9.660363242622646e-06, 'epoch': 0.62}
21%|██ | 2377/11526 [24:47<1:33:46, 1.63it/s] 21%|██ | 2378/11526 [24:47<1:33:43, 1.63it/s] {'loss': 0.3353, 'grad_norm': 0.6649027466773987, 'learning_rate': 9.659814436772207e-06, 'epoch': 0.62}
21%|██ | 2378/11526 [24:47<1:33:43, 1.63it/s] 21%|██ | 2379/11526 [24:48<1:33:41, 1.63it/s] {'loss': 0.2286, 'grad_norm': 0.6034337878227234, 'learning_rate': 9.659265203497074e-06, 'epoch': 0.62}
21%|██ | 2379/11526 [24:48<1:33:41, 1.63it/s] 21%|██ | 2380/11526 [24:48<1:33:39, 1.63it/s] {'loss': 0.1794, 'grad_norm': 0.4199066460132599, 'learning_rate': 9.658715542847626e-06, 'epoch': 0.62}
21%|██ | 2380/11526 [24:48<1:33:39, 1.63it/s] 21%|██ | 2381/11526 [24:49<1:33:37, 1.63it/s] {'loss': 0.2763, 'grad_norm': 0.5956510901451111, 'learning_rate': 9.658165454874282e-06, 'epoch': 0.62}
21%|██ | 2381/11526 [24:49<1:33:37, 1.63it/s] 21%|██ | 2382/11526 [24:50<1:33:47, 1.62it/s] {'loss': 0.3576, 'grad_norm': 0.7121544480323792, 'learning_rate': 9.657614939627498e-06, 'epoch': 0.62}
21%|██ | 2382/11526 [24:50<1:33:47, 1.62it/s] 21%|██ | 2383/11526 [24:50<1:33:40, 1.63it/s] {'loss': 0.2172, 'grad_norm': 0.534913957118988, 'learning_rate': 9.65706399715777e-06, 'epoch': 0.62}
21%|██ | 2383/11526 [24:50<1:33:40, 1.63it/s] 21%|██ | 2384/11526 [24:51<1:33:38, 1.63it/s] {'loss': 0.2564, 'grad_norm': 0.628411591053009, 'learning_rate': 9.656512627515635e-06, 'epoch': 0.62}
21%|██ | 2384/11526 [24:51<1:33:38, 1.63it/s] 21%|██ | 2385/11526 [24:51<1:33:42, 1.63it/s] {'loss': 0.2172, 'grad_norm': 0.5341322422027588, 'learning_rate': 9.655960830751669e-06, 'epoch': 0.62}
21%|██ | 2385/11526 [24:52<1:33:42, 1.63it/s] 21%|██ | 2386/11526 [24:52<1:33:36, 1.63it/s] {'loss': 0.2952, 'grad_norm': 0.5778174996376038, 'learning_rate': 9.655408606916484e-06, 'epoch': 0.62}
21%|██ | 2386/11526 [24:52<1:33:36, 1.63it/s] 21%|██ | 2387/11526 [24:53<1:33:42, 1.63it/s] {'loss': 0.2901, 'grad_norm': 0.5697099566459656, 'learning_rate': 9.654855956060733e-06, 'epoch': 0.62}
21%|██ | 2387/11526 [24:53<1:33:42, 1.63it/s] 21%|██ | 2388/11526 [24:53<1:33:37, 1.63it/s] {'loss': 0.2692, 'grad_norm': 0.5708557963371277, 'learning_rate': 9.654302878235107e-06, 'epoch': 0.62}
21%|██ | 2388/11526 [24:53<1:33:37, 1.63it/s] 21%|██ | 2389/11526 [24:54<1:33:34, 1.63it/s] {'loss': 0.2456, 'grad_norm': 0.5323671102523804, 'learning_rate': 9.653749373490341e-06, 'epoch': 0.62}
21%|██ | 2389/11526 [24:54<1:33:34, 1.63it/s] 21%|██ | 2390/11526 [24:54<1:33:32, 1.63it/s] {'loss': 0.2656, 'grad_norm': 0.5067787766456604, 'learning_rate': 9.653195441877203e-06, 'epoch': 0.62}
21%|██ | 2390/11526 [24:55<1:33:32, 1.63it/s] 21%|██ | 2391/11526 [24:55<1:33:31, 1.63it/s] {'loss': 0.2339, 'grad_norm': 0.5524668097496033, 'learning_rate': 9.652641083446505e-06, 'epoch': 0.62}
21%|██ | 2391/11526 [24:55<1:33:31, 1.63it/s] 21%|██ | 2392/11526 [24:56<1:33:30, 1.63it/s] {'loss': 0.3574, 'grad_norm': 0.6179357171058655, 'learning_rate': 9.652086298249093e-06, 'epoch': 0.62}
21%|██ | 2392/11526 [24:56<1:33:30, 1.63it/s] 21%|██ | 2393/11526 [24:56<1:33:29, 1.63it/s] {'loss': 0.2319, 'grad_norm': 0.5677816271781921, 'learning_rate': 9.651531086335856e-06, 'epoch': 0.62}
21%|██ | 2393/11526 [24:56<1:33:29, 1.63it/s] 21%|██ | 2394/11526 [24:57<1:33:31, 1.63it/s] {'loss': 0.2652, 'grad_norm': 0.5367191433906555, 'learning_rate': 9.650975447757723e-06, 'epoch': 0.62}
21%|██ | 2394/11526 [24:57<1:33:31, 1.63it/s] 21%|██ | 2395/11526 [24:58<1:33:37, 1.63it/s] {'loss': 0.2404, 'grad_norm': 0.5921828746795654, 'learning_rate': 9.650419382565658e-06, 'epoch': 0.62}
21%|██ | 2395/11526 [24:58<1:33:37, 1.63it/s] 21%|██ | 2396/11526 [24:58<1:33:36, 1.63it/s] {'loss': 0.2377, 'grad_norm': 0.5241285562515259, 'learning_rate': 9.649862890810668e-06, 'epoch': 0.62}
21%|██ | 2396/11526 [24:58<1:33:36, 1.63it/s] 21%|██ | 2397/11526 [24:59<1:33:43, 1.62it/s] {'loss': 0.2624, 'grad_norm': 0.49179717898368835, 'learning_rate': 9.649305972543797e-06, 'epoch': 0.62}
21%|██ | 2397/11526 [24:59<1:33:43, 1.62it/s] 21%|██ | 2398/11526 [24:59<1:33:37, 1.62it/s] {'loss': 0.2509, 'grad_norm': 0.5476941466331482, 'learning_rate': 9.648748627816127e-06, 'epoch': 0.62}
21%|██ | 2398/11526 [25:00<1:33:37, 1.62it/s] 21%|██ | 2399/11526 [25:00<1:33:32, 1.63it/s] {'loss': 0.2691, 'grad_norm': 0.5281398296356201, 'learning_rate': 9.648190856678783e-06, 'epoch': 0.62}
21%|██ | 2399/11526 [25:00<1:33:32, 1.63it/s] 21%|██ | 2400/11526 [25:01<1:33:38, 1.62it/s] {'loss': 0.2675, 'grad_norm': 0.5188595056533813, 'learning_rate': 9.647632659182928e-06, 'epoch': 0.62}
21%|██ | 2400/11526 [25:01<1:33:38, 1.62it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.35it/s]
31%|███ | 4/13 [00:00<00:01, 8.40it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.18it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.02it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.6850428581237793, 'eval_runtime': 1.9545, 'eval_samples_per_second': 102.328, 'eval_steps_per_second': 6.651, 'epoch': 0.62}
21%|██ | 2400/11526 [25:03<1:33:38, 1.62it/s]
 21%|██ | 2401/11526 [25:03<3:03:03, 1.20s/it] {'loss': 0.2907, 'grad_norm': 0.6613944172859192, 'learning_rate': 9.647074035379762e-06, 'epoch': 0.62}
21%|██ | 2401/11526 [25:03<3:03:03, 1.20s/it] 21%|██ | 2402/11526 [25:04<2:36:06, 1.03s/it] {'loss': 0.2512, 'grad_norm': 0.6062567830085754, 'learning_rate': 9.646514985320524e-06, 'epoch': 0.63}
21%|██ | 2402/11526 [25:04<2:36:06, 1.03s/it] 21%|██ | 2403/11526 [25:04<2:17:16, 1.11it/s] {'loss': 0.2123, 'grad_norm': 0.5637663006782532, 'learning_rate': 9.645955509056492e-06, 'epoch': 0.63}
21%|██ | 2403/11526 [25:05<2:17:16, 1.11it/s] 21%|██ | 2404/11526 [25:05<2:04:04, 1.23it/s] {'loss': 0.2425, 'grad_norm': 0.5997516512870789, 'learning_rate': 9.64539560663899e-06, 'epoch': 0.63}
21%|██ | 2404/11526 [25:05<2:04:04, 1.23it/s] 21%|██ | 2405/11526 [25:06<1:54:50, 1.32it/s] {'loss': 0.2462, 'grad_norm': 0.5521284341812134, 'learning_rate': 9.644835278119369e-06, 'epoch': 0.63}
21%|██ | 2405/11526 [25:06<1:54:50, 1.32it/s] 21%|██ | 2406/11526 [25:06<1:48:22, 1.40it/s] {'loss': 0.2222, 'grad_norm': 0.5024690628051758, 'learning_rate': 9.644274523549029e-06, 'epoch': 0.63}
21%|██ | 2406/11526 [25:06<1:48:22, 1.40it/s] 21%|██ | 2407/11526 [25:07<1:44:08, 1.46it/s] {'loss': 0.2823, 'grad_norm': 0.5755563378334045, 'learning_rate': 9.643713342979405e-06, 'epoch': 0.63}
21%|██ | 2407/11526 [25:07<1:44:08, 1.46it/s] 21%|██ | 2408/11526 [25:07<1:40:50, 1.51it/s] {'loss': 0.2423, 'grad_norm': 0.5113295316696167, 'learning_rate': 9.643151736461971e-06, 'epoch': 0.63}
21%|██ | 2408/11526 [25:08<1:40:50, 1.51it/s] 21%|██ | 2409/11526 [25:08<1:38:35, 1.54it/s] {'loss': 0.2355, 'grad_norm': 0.511438250541687, 'learning_rate': 9.642589704048242e-06, 'epoch': 0.63}
21%|██ | 2409/11526 [25:08<1:38:35, 1.54it/s] 21%|██ | 2410/11526 [25:09<1:37:00, 1.57it/s] {'loss': 0.2342, 'grad_norm': 0.4919203221797943, 'learning_rate': 9.64202724578977e-06, 'epoch': 0.63}
21%|██ | 2410/11526 [25:09<1:37:00, 1.57it/s] 21%|██ | 2411/11526 [25:09<1:35:52, 1.58it/s] {'loss': 0.4882, 'grad_norm': 0.8417021036148071, 'learning_rate': 9.641464361738147e-06, 'epoch': 0.63}
21%|██ | 2411/11526 [25:09<1:35:52, 1.58it/s] 21%|██ | 2412/11526 [25:10<1:35:15, 1.59it/s] {'loss': 0.2505, 'grad_norm': 0.5468502640724182, 'learning_rate': 9.640901051945004e-06, 'epoch': 0.63}
21%|██ | 2412/11526 [25:10<1:35:15, 1.59it/s] 21%|██ | 2413/11526 [25:11<1:34:40, 1.60it/s] {'loss': 0.2954, 'grad_norm': 0.6245058178901672, 'learning_rate': 9.64033731646201e-06, 'epoch': 0.63}
21%|██ | 2413/11526 [25:11<1:34:40, 1.60it/s] 21%|██ | 2414/11526 [25:11<1:34:11, 1.61it/s] {'loss': 0.3064, 'grad_norm': 0.5602511167526245, 'learning_rate': 9.639773155340877e-06, 'epoch': 0.63}
21%|██ | 2414/11526 [25:11<1:34:11, 1.61it/s] 21%|██ | 2415/11526 [25:12<1:33:59, 1.62it/s] {'loss': 0.2124, 'grad_norm': 0.4584086239337921, 'learning_rate': 9.639208568633349e-06, 'epoch': 0.63}
21%|██ | 2415/11526 [25:12<1:33:59, 1.62it/s] 21%|██ | 2416/11526 [25:12<1:33:47, 1.62it/s] {'loss': 0.2799, 'grad_norm': 0.5918323993682861, 'learning_rate': 9.638643556391215e-06, 'epoch': 0.63}
21%|██ | 2416/11526 [25:13<1:33:47, 1.62it/s] 21%|██ | 2417/11526 [25:13<1:33:42, 1.62it/s] {'loss': 0.2475, 'grad_norm': 0.5801216959953308, 'learning_rate': 9.638078118666302e-06, 'epoch': 0.63}
21%|██ | 2417/11526 [25:13<1:33:42, 1.62it/s] 21%|██ | 2418/11526 [25:14<1:33:34, 1.62it/s] {'loss': 0.2643, 'grad_norm': 0.5951655507087708, 'learning_rate': 9.637512255510475e-06, 'epoch': 0.63}
21%|██ | 2418/11526 [25:14<1:33:34, 1.62it/s] 21%|██ | 2419/11526 [25:14<1:33:26, 1.62it/s] {'loss': 0.4109, 'grad_norm': 0.7206952571868896, 'learning_rate': 9.636945966975636e-06, 'epoch': 0.63}
21%|██ | 2419/11526 [25:14<1:33:26, 1.62it/s] 21%|██ | 2420/11526 [25:15<1:33:26, 1.62it/s] {'loss': 0.2309, 'grad_norm': 0.49547863006591797, 'learning_rate': 9.63637925311373e-06, 'epoch': 0.63}
21%|██ | 2420/11526 [25:15<1:33:26, 1.62it/s] 21%|██ | 2421/11526 [25:15<1:33:20, 1.63it/s] {'loss': 0.2395, 'grad_norm': 0.5705538392066956, 'learning_rate': 9.63581211397674e-06, 'epoch': 0.63}
21%|██ | 2421/11526 [25:16<1:33:20, 1.63it/s] 21%|██ | 2422/11526 [25:16<1:33:27, 1.62it/s] {'loss': 0.2123, 'grad_norm': 0.4915350675582886, 'learning_rate': 9.635244549616685e-06, 'epoch': 0.63}
21%|██ | 2422/11526 [25:16<1:33:27, 1.62it/s] 21%|██ | 2423/11526 [25:17<1:33:21, 1.63it/s] {'loss': 0.2449, 'grad_norm': 0.5880899429321289, 'learning_rate': 9.634676560085627e-06, 'epoch': 0.63}
21%|██ | 2423/11526 [25:17<1:33:21, 1.63it/s] 21%|██ | 2424/11526 [25:17<1:33:18, 1.63it/s] {'loss': 0.2552, 'grad_norm': 0.5020283460617065, 'learning_rate': 9.634108145435665e-06, 'epoch': 0.63}
21%|██ | 2424/11526 [25:17<1:33:18, 1.63it/s] 21%|██ | 2425/11526 [25:18<1:33:14, 1.63it/s] {'loss': 0.234, 'grad_norm': 0.6952956914901733, 'learning_rate': 9.633539305718938e-06, 'epoch': 0.63}
21%|██ | 2425/11526 [25:18<1:33:14, 1.63it/s] 21%|██ | 2426/11526 [25:19<1:33:09, 1.63it/s] {'loss': 0.2827, 'grad_norm': 0.5837037563323975, 'learning_rate': 9.63297004098762e-06, 'epoch': 0.63}
21%|██ | 2426/11526 [25:19<1:33:09, 1.63it/s] 21%|██ | 2427/11526 [25:19<1:33:17, 1.63it/s] {'loss': 0.2832, 'grad_norm': 0.5358526110649109, 'learning_rate': 9.632400351293931e-06, 'epoch': 0.63}
21%|██ | 2427/11526 [25:19<1:33:17, 1.63it/s] 21%|██ | 2428/11526 [25:20<1:33:14, 1.63it/s] {'loss': 0.2586, 'grad_norm': 0.5789212584495544, 'learning_rate': 9.631830236690123e-06, 'epoch': 0.63}
21%|██ | 2428/11526 [25:20<1:33:14, 1.63it/s] 21%|██ | 2429/11526 [25:20<1:33:09, 1.63it/s] {'loss': 0.2822, 'grad_norm': 0.6297745704650879, 'learning_rate': 9.631259697228492e-06, 'epoch': 0.63}
21%|██ | 2429/11526 [25:21<1:33:09, 1.63it/s] 21%|██ | 2430/11526 [25:21<1:33:06, 1.63it/s] {'loss': 0.3189, 'grad_norm': 0.5822621583938599, 'learning_rate': 9.630688732961372e-06, 'epoch': 0.63}
21%|██ | 2430/11526 [25:21<1:33:06, 1.63it/s] 21%|██ | 2431/11526 [25:22<1:33:07, 1.63it/s] {'loss': 0.329, 'grad_norm': 0.6811425685882568, 'learning_rate': 9.630117343941133e-06, 'epoch': 0.63}
21%|██ | 2431/11526 [25:22<1:33:07, 1.63it/s] 21%|██ | 2432/11526 [25:22<1:33:15, 1.63it/s] {'loss': 0.2292, 'grad_norm': 0.4897702932357788, 'learning_rate': 9.629545530220188e-06, 'epoch': 0.63}
21%|██ | 2432/11526 [25:22<1:33:15, 1.63it/s] 21%|██ | 2433/11526 [25:23<1:33:10, 1.63it/s] {'loss': 0.3208, 'grad_norm': 0.6040560603141785, 'learning_rate': 9.628973291850985e-06, 'epoch': 0.63}
21%|██ | 2433/11526 [25:23<1:33:10, 1.63it/s] 21%|██ | 2434/11526 [25:23<1:33:09, 1.63it/s] {'loss': 0.2659, 'grad_norm': 0.5127859711647034, 'learning_rate': 9.628400628886013e-06, 'epoch': 0.63}
21%|██ | 2434/11526 [25:24<1:33:09, 1.63it/s] 21%|██ | 2435/11526 [25:24<1:33:16, 1.62it/s] {'loss': 0.2802, 'grad_norm': 0.5647030472755432, 'learning_rate': 9.627827541377801e-06, 'epoch': 0.63}
21%|██ | 2435/11526 [25:24<1:33:16, 1.62it/s] 21%|██ | 2436/11526 [25:25<1:33:12, 1.63it/s] {'loss': 0.2776, 'grad_norm': 0.6236886978149414, 'learning_rate': 9.627254029378917e-06, 'epoch': 0.63}
21%|██ | 2436/11526 [25:25<1:33:12, 1.63it/s] 21%|██ | 2437/11526 [25:25<1:33:40, 1.62it/s] {'loss': 0.2422, 'grad_norm': 0.5300828218460083, 'learning_rate': 9.626680092941965e-06, 'epoch': 0.63}
21%|██ | 2437/11526 [25:25<1:33:40, 1.62it/s] 21%|██ | 2438/11526 [25:26<1:33:31, 1.62it/s] {'loss': 0.218, 'grad_norm': 0.5699900388717651, 'learning_rate': 9.62610573211959e-06, 'epoch': 0.63}
21%|██ | 2438/11526 [25:26<1:33:31, 1.62it/s] 21%|██ | 2439/11526 [25:27<1:33:21, 1.62it/s] {'loss': 0.3119, 'grad_norm': 0.6086682081222534, 'learning_rate': 9.625530946964476e-06, 'epoch': 0.63}
21%|██ | 2439/11526 [25:27<1:33:21, 1.62it/s] 21%|██ | 2440/11526 [25:27<1:33:19, 1.62it/s] {'loss': 0.2481, 'grad_norm': 0.6198521852493286, 'learning_rate': 9.624955737529345e-06, 'epoch': 0.64}
21%|██ | 2440/11526 [25:27<1:33:19, 1.62it/s] 21%|██ | 2441/11526 [25:28<1:33:12, 1.62it/s] {'loss': 0.2304, 'grad_norm': 0.5375490188598633, 'learning_rate': 9.624380103866959e-06, 'epoch': 0.64}
21%|██ | 2441/11526 [25:28<1:33:12, 1.62it/s] 21%|██ | 2442/11526 [25:28<1:33:16, 1.62it/s] {'loss': 0.3557, 'grad_norm': 0.6304376125335693, 'learning_rate': 9.623804046030119e-06, 'epoch': 0.64}
21%|██ | 2442/11526 [25:29<1:33:16, 1.62it/s] 21%|██ | 2443/11526 [25:29<1:33:08, 1.63it/s] {'loss': 0.3347, 'grad_norm': 0.6386612057685852, 'learning_rate': 9.623227564071662e-06, 'epoch': 0.64}
21%|██ | 2443/11526 [25:29<1:33:08, 1.63it/s] 21%|██ | 2444/11526 [25:30<1:33:05, 1.63it/s] {'loss': 0.2833, 'grad_norm': 0.6564096212387085, 'learning_rate': 9.622650658044467e-06, 'epoch': 0.64}
21%|██ | 2444/11526 [25:30<1:33:05, 1.63it/s] 21%|██ | 2445/11526 [25:30<1:33:08, 1.62it/s] {'loss': 0.2654, 'grad_norm': 0.5850387811660767, 'learning_rate': 9.622073328001454e-06, 'epoch': 0.64}
21%|██ | 2445/11526 [25:30<1:33:08, 1.62it/s] 21%|██ | 2446/11526 [25:31<1:33:04, 1.63it/s] {'loss': 0.3071, 'grad_norm': 0.5810196399688721, 'learning_rate': 9.621495573995575e-06, 'epoch': 0.64}
21%|██ | 2446/11526 [25:31<1:33:04, 1.63it/s] 21%|██ | 2447/11526 [25:31<1:33:13, 1.62it/s] {'loss': 0.2, 'grad_norm': 0.4899342358112335, 'learning_rate': 9.620917396079829e-06, 'epoch': 0.64}
21%|██ | 2447/11526 [25:32<1:33:13, 1.62it/s] 21%|██ | 2448/11526 [25:32<1:33:04, 1.63it/s] {'loss': 0.2342, 'grad_norm': 0.5391389727592468, 'learning_rate': 9.620338794307248e-06, 'epoch': 0.64}
21%|██ | 2448/11526 [25:32<1:33:04, 1.63it/s] 21%|██ | 2449/11526 [25:33<1:33:02, 1.63it/s] {'loss': 0.2828, 'grad_norm': 0.5546200275421143, 'learning_rate': 9.619759768730901e-06, 'epoch': 0.64}
21%|██ | 2449/11526 [25:33<1:33:02, 1.63it/s] 21%|██▏ | 2450/11526 [25:33<1:33:03, 1.63it/s] {'loss': 0.2777, 'grad_norm': 0.5029544830322266, 'learning_rate': 9.619180319403905e-06, 'epoch': 0.64}
21%|██▏ | 2450/11526 [25:33<1:33:03, 1.63it/s] 21%|██▏ | 2451/11526 [25:34<1:32:57, 1.63it/s] {'loss': 0.2032, 'grad_norm': 0.5118046402931213, 'learning_rate': 9.618600446379407e-06, 'epoch': 0.64}
21%|██▏ | 2451/11526 [25:34<1:32:57, 1.63it/s] 21%|██▏ | 2452/11526 [25:35<1:33:02, 1.63it/s] {'loss': 0.4058, 'grad_norm': 0.6853023171424866, 'learning_rate': 9.618020149710596e-06, 'epoch': 0.64}
21%|██▏ | 2452/11526 [25:35<1:33:02, 1.63it/s] 21%|██▏ | 2453/11526 [25:35<1:33:01, 1.63it/s] {'loss': 0.319, 'grad_norm': 0.7185158133506775, 'learning_rate': 9.617439429450704e-06, 'epoch': 0.64}
21%|██▏ | 2453/11526 [25:35<1:33:01, 1.63it/s] 21%|██▏ | 2454/11526 [25:36<1:32:55, 1.63it/s] {'loss': 0.2205, 'grad_norm': 0.5577727556228638, 'learning_rate': 9.616858285652994e-06, 'epoch': 0.64}
21%|██▏ | 2454/11526 [25:36<1:32:55, 1.63it/s] 21%|██▏ | 2455/11526 [25:36<1:32:53, 1.63it/s] {'loss': 0.2913, 'grad_norm': 0.6189290881156921, 'learning_rate': 9.616276718370774e-06, 'epoch': 0.64}
21%|██▏ | 2455/11526 [25:37<1:32:53, 1.63it/s] 21%|██▏ | 2456/11526 [25:37<1:32:54, 1.63it/s] {'loss': 0.2219, 'grad_norm': 0.5357576608657837, 'learning_rate': 9.615694727657387e-06, 'epoch': 0.64}
21%|██▏ | 2456/11526 [25:37<1:32:54, 1.63it/s] 21%|██▏ | 2457/11526 [25:38<1:33:00, 1.62it/s] {'loss': 0.2749, 'grad_norm': 0.5677977800369263, 'learning_rate': 9.615112313566218e-06, 'epoch': 0.64}
21%|██▏ | 2457/11526 [25:38<1:33:00, 1.62it/s] 21%|██▏ | 2458/11526 [25:38<1:32:57, 1.63it/s] {'loss': 0.3227, 'grad_norm': 0.7070935964584351, 'learning_rate': 9.61452947615069e-06, 'epoch': 0.64}
21%|██▏ | 2458/11526 [25:38<1:32:57, 1.63it/s] 21%|██▏ | 2459/11526 [25:39<1:32:55, 1.63it/s] {'loss': 0.2729, 'grad_norm': 0.6460191607475281, 'learning_rate': 9.61394621546426e-06, 'epoch': 0.64}
21%|██▏ | 2459/11526 [25:39<1:32:55, 1.63it/s] 21%|██▏ | 2460/11526 [25:39<1:32:53, 1.63it/s] {'loss': 0.2745, 'grad_norm': 0.6291245222091675, 'learning_rate': 9.613362531560432e-06, 'epoch': 0.64}
21%|██▏ | 2460/11526 [25:40<1:32:53, 1.63it/s] 21%|██▏ | 2461/11526 [25:40<1:32:52, 1.63it/s] {'loss': 0.2495, 'grad_norm': 0.5491254329681396, 'learning_rate': 9.612778424492744e-06, 'epoch': 0.64}
21%|██▏ | 2461/11526 [25:40<1:32:52, 1.63it/s] 21%|██▏ | 2462/11526 [25:41<1:33:00, 1.62it/s] {'loss': 0.26, 'grad_norm': 0.6171245574951172, 'learning_rate': 9.612193894314774e-06, 'epoch': 0.64}
21%|██▏ | 2462/11526 [25:41<1:33:00, 1.62it/s] 21%|██▏ | 2463/11526 [25:41<1:32:55, 1.63it/s] {'loss': 0.3153, 'grad_norm': 0.5827672481536865, 'learning_rate': 9.611608941080135e-06, 'epoch': 0.64}
21%|██▏ | 2463/11526 [25:41<1:32:55, 1.63it/s] 21%|██▏ | 2464/11526 [25:42<1:32:53, 1.63it/s] {'loss': 0.3268, 'grad_norm': 0.6890742778778076, 'learning_rate': 9.611023564842487e-06, 'epoch': 0.64}
21%|██▏ | 2464/11526 [25:42<1:32:53, 1.63it/s] 21%|██▏ | 2465/11526 [25:43<1:32:54, 1.63it/s] {'loss': 0.2112, 'grad_norm': 0.5450580716133118, 'learning_rate': 9.610437765655522e-06, 'epoch': 0.64}
21%|██▏ | 2465/11526 [25:43<1:32:54, 1.63it/s] 21%|██▏ | 2466/11526 [25:43<1:32:53, 1.63it/s] {'loss': 0.2972, 'grad_norm': 0.5369660258293152, 'learning_rate': 9.609851543572972e-06, 'epoch': 0.64}
21%|██▏ | 2466/11526 [25:43<1:32:53, 1.63it/s] 21%|██▏ | 2467/11526 [25:44<1:32:56, 1.62it/s] {'loss': 0.2824, 'grad_norm': 0.515816330909729, 'learning_rate': 9.609264898648612e-06, 'epoch': 0.64}
21%|██▏ | 2467/11526 [25:44<1:32:56, 1.62it/s] 21%|██▏ | 2468/11526 [25:44<1:32:55, 1.62it/s] {'loss': 0.3153, 'grad_norm': 0.5760408639907837, 'learning_rate': 9.608677830936245e-06, 'epoch': 0.64}
21%|██▏ | 2468/11526 [25:45<1:32:55, 1.62it/s] 21%|██▏ | 2469/11526 [25:45<1:32:52, 1.63it/s] {'loss': 0.2515, 'grad_norm': 0.5973147749900818, 'learning_rate': 9.60809034048973e-06, 'epoch': 0.64}
21%|██▏ | 2469/11526 [25:45<1:32:52, 1.63it/s] 21%|██▏ | 2470/11526 [25:46<1:32:51, 1.63it/s] {'loss': 0.2979, 'grad_norm': 0.6011741161346436, 'learning_rate': 9.607502427362946e-06, 'epoch': 0.64}
21%|██▏ | 2470/11526 [25:46<1:32:51, 1.63it/s] 21%|██▏ | 2471/11526 [25:46<1:32:49, 1.63it/s] {'loss': 0.3065, 'grad_norm': 0.5542375445365906, 'learning_rate': 9.606914091609826e-06, 'epoch': 0.64}
21%|██▏ | 2471/11526 [25:46<1:32:49, 1.63it/s] 21%|██▏ | 2472/11526 [25:47<1:32:53, 1.62it/s] {'loss': 0.2622, 'grad_norm': 0.5603646039962769, 'learning_rate': 9.606325333284334e-06, 'epoch': 0.64}
21%|██▏ | 2472/11526 [25:47<1:32:53, 1.62it/s] 21%|██▏ | 2473/11526 [25:47<1:32:48, 1.63it/s] {'loss': 0.2774, 'grad_norm': 0.5740708708763123, 'learning_rate': 9.605736152440472e-06, 'epoch': 0.64}
21%|██▏ | 2473/11526 [25:48<1:32:48, 1.63it/s] 21%|██▏ | 2474/11526 [25:48<1:32:46, 1.63it/s] {'loss': 0.3182, 'grad_norm': 0.6633690595626831, 'learning_rate': 9.605146549132286e-06, 'epoch': 0.64}
21%|██▏ | 2474/11526 [25:48<1:32:46, 1.63it/s] 21%|██▏ | 2475/11526 [25:49<1:32:51, 1.62it/s] {'loss': 0.275, 'grad_norm': 0.5222040414810181, 'learning_rate': 9.604556523413855e-06, 'epoch': 0.64}
21%|██▏ | 2475/11526 [25:49<1:32:51, 1.62it/s] 21%|██▏ | 2476/11526 [25:49<1:32:42, 1.63it/s] {'loss': 0.2334, 'grad_norm': 0.5357258915901184, 'learning_rate': 9.603966075339302e-06, 'epoch': 0.64}
21%|██▏ | 2476/11526 [25:49<1:32:42, 1.63it/s] 21%|██▏ | 2477/11526 [25:50<1:32:59, 1.62it/s] {'loss': 0.2362, 'grad_norm': 0.524754524230957, 'learning_rate': 9.603375204962783e-06, 'epoch': 0.64}
21%|██▏ | 2477/11526 [25:50<1:32:59, 1.62it/s] 21%|██▏ | 2478/11526 [25:51<1:32:54, 1.62it/s] {'loss': 0.2334, 'grad_norm': 0.4894295930862427, 'learning_rate': 9.6027839123385e-06, 'epoch': 0.64}
21%|██▏ | 2478/11526 [25:51<1:32:54, 1.62it/s] 22%|██▏ | 2479/11526 [25:51<1:32:48, 1.62it/s] {'loss': 0.2285, 'grad_norm': 0.48570865392684937, 'learning_rate': 9.602192197520688e-06, 'epoch': 0.65}
22%|██▏ | 2479/11526 [25:51<1:32:48, 1.62it/s] 22%|██▏ | 2480/11526 [25:52<1:32:47, 1.62it/s] {'loss': 0.2478, 'grad_norm': 0.5375857949256897, 'learning_rate': 9.601600060563621e-06, 'epoch': 0.65}
22%|██▏ | 2480/11526 [25:52<1:32:47, 1.62it/s] 22%|██▏ | 2481/11526 [25:52<1:32:46, 1.63it/s] {'loss': 0.2352, 'grad_norm': 0.6218629479408264, 'learning_rate': 9.601007501521614e-06, 'epoch': 0.65}
22%|██▏ | 2481/11526 [25:53<1:32:46, 1.63it/s] 22%|██▏ | 2482/11526 [25:53<1:32:46, 1.62it/s] {'loss': 0.3553, 'grad_norm': 0.5910660624504089, 'learning_rate': 9.600414520449022e-06, 'epoch': 0.65}
22%|██▏ | 2482/11526 [25:53<1:32:46, 1.62it/s] 22%|██▏ | 2483/11526 [25:54<1:32:40, 1.63it/s] {'loss': 0.244, 'grad_norm': 0.5236440300941467, 'learning_rate': 9.599821117400233e-06, 'epoch': 0.65}
22%|██▏ | 2483/11526 [25:54<1:32:40, 1.63it/s] 22%|██▏ | 2484/11526 [25:54<1:32:37, 1.63it/s] {'loss': 0.3506, 'grad_norm': 0.7367263436317444, 'learning_rate': 9.59922729242968e-06, 'epoch': 0.65}
22%|██▏ | 2484/11526 [25:54<1:32:37, 1.63it/s] 22%|██▏ | 2485/11526 [25:55<1:32:33, 1.63it/s] {'loss': 0.2499, 'grad_norm': 0.4864495098590851, 'learning_rate': 9.598633045591831e-06, 'epoch': 0.65}
22%|██▏ | 2485/11526 [25:55<1:32:33, 1.63it/s] 22%|██▏ | 2486/11526 [25:55<1:32:31, 1.63it/s] {'loss': 0.2913, 'grad_norm': 0.6153245568275452, 'learning_rate': 9.598038376941196e-06, 'epoch': 0.65}
22%|██▏ | 2486/11526 [25:56<1:32:31, 1.63it/s] 22%|██▏ | 2487/11526 [25:56<1:32:36, 1.63it/s] {'loss': 0.2028, 'grad_norm': 0.4848952293395996, 'learning_rate': 9.597443286532318e-06, 'epoch': 0.65}
22%|██▏ | 2487/11526 [25:56<1:32:36, 1.63it/s] 22%|██▏ | 2488/11526 [25:57<1:32:32, 1.63it/s] {'loss': 0.2753, 'grad_norm': 0.5939898490905762, 'learning_rate': 9.596847774419782e-06, 'epoch': 0.65}
22%|██▏ | 2488/11526 [25:57<1:32:32, 1.63it/s] 22%|██▏ | 2489/11526 [25:57<1:32:30, 1.63it/s] {'loss': 0.1733, 'grad_norm': 0.46277597546577454, 'learning_rate': 9.596251840658215e-06, 'epoch': 0.65}
22%|██▏ | 2489/11526 [25:57<1:32:30, 1.63it/s] 22%|██▏ | 2490/11526 [25:58<1:32:30, 1.63it/s] {'loss': 0.3078, 'grad_norm': 0.6730993986129761, 'learning_rate': 9.595655485302276e-06, 'epoch': 0.65}
22%|██▏ | 2490/11526 [25:58<1:32:30, 1.63it/s] 22%|██▏ | 2491/11526 [25:59<1:32:33, 1.63it/s] {'loss': 0.3071, 'grad_norm': 0.5953859090805054, 'learning_rate': 9.595058708406669e-06, 'epoch': 0.65}
22%|██▏ | 2491/11526 [25:59<1:32:33, 1.63it/s] 22%|██▏ | 2492/11526 [25:59<1:32:38, 1.63it/s] {'loss': 0.3034, 'grad_norm': 0.5942074656486511, 'learning_rate': 9.594461510026132e-06, 'epoch': 0.65}
22%|██▏ | 2492/11526 [25:59<1:32:38, 1.63it/s] 22%|██▏ | 2493/11526 [26:00<1:32:36, 1.63it/s] {'loss': 0.2256, 'grad_norm': 0.5272020101547241, 'learning_rate': 9.593863890215444e-06, 'epoch': 0.65}
22%|██▏ | 2493/11526 [26:00<1:32:36, 1.63it/s] 22%|██▏ | 2494/11526 [26:00<1:32:32, 1.63it/s] {'loss': 0.1966, 'grad_norm': 0.448517769575119, 'learning_rate': 9.593265849029422e-06, 'epoch': 0.65}
22%|██▏ | 2494/11526 [26:01<1:32:32, 1.63it/s] 22%|██▏ | 2495/11526 [26:01<1:32:31, 1.63it/s] {'loss': 0.3312, 'grad_norm': 0.6212528944015503, 'learning_rate': 9.59266738652292e-06, 'epoch': 0.65}
22%|██▏ | 2495/11526 [26:01<1:32:31, 1.63it/s] 22%|██▏ | 2496/11526 [26:02<1:32:32, 1.63it/s] {'loss': 0.3325, 'grad_norm': 0.6540561318397522, 'learning_rate': 9.592068502750836e-06, 'epoch': 0.65}
22%|██▏ | 2496/11526 [26:02<1:32:32, 1.63it/s] 22%|██▏ | 2497/11526 [26:02<1:32:33, 1.63it/s] {'loss': 0.2323, 'grad_norm': 0.5511423349380493, 'learning_rate': 9.591469197768102e-06, 'epoch': 0.65}
22%|██▏ | 2497/11526 [26:02<1:32:33, 1.63it/s] 22%|██▏ | 2498/11526 [26:03<1:32:30, 1.63it/s] {'loss': 0.2866, 'grad_norm': 0.6615066528320312, 'learning_rate': 9.590869471629687e-06, 'epoch': 0.65}
22%|██▏ | 2498/11526 [26:03<1:32:30, 1.63it/s] 22%|██▏ | 2499/11526 [26:03<1:32:30, 1.63it/s] {'loss': 0.2719, 'grad_norm': 0.6208752989768982, 'learning_rate': 9.590269324390604e-06, 'epoch': 0.65}
22%|██▏ | 2499/11526 [26:04<1:32:30, 1.63it/s] 22%|██▏ | 2500/11526 [26:04<1:32:27, 1.63it/s] {'loss': 0.3426, 'grad_norm': 0.8614863157272339, 'learning_rate': 9.589668756105902e-06, 'epoch': 0.65}
22%|██▏ | 2500/11526 [26:04<1:32:27, 1.63it/s] 22%|██▏ | 2501/11526 [26:05<1:32:26, 1.63it/s] {'loss': 0.373, 'grad_norm': 0.671309232711792, 'learning_rate': 9.589067766830664e-06, 'epoch': 0.65}
22%|██▏ | 2501/11526 [26:05<1:32:26, 1.63it/s] 22%|██▏ | 2502/11526 [26:05<1:32:30, 1.63it/s] {'loss': 0.2335, 'grad_norm': 0.6031649112701416, 'learning_rate': 9.588466356620022e-06, 'epoch': 0.65}
22%|██▏ | 2502/11526 [26:05<1:32:30, 1.63it/s] 22%|██▏ | 2503/11526 [26:06<1:32:27, 1.63it/s] {'loss': 0.3278, 'grad_norm': 0.636231541633606, 'learning_rate': 9.587864525529139e-06, 'epoch': 0.65}
22%|██▏ | 2503/11526 [26:06<1:32:27, 1.63it/s] 22%|██▏ | 2504/11526 [26:07<1:32:26, 1.63it/s] {'loss': 0.2494, 'grad_norm': 0.5709266066551208, 'learning_rate': 9.587262273613217e-06, 'epoch': 0.65}
22%|██▏ | 2504/11526 [26:07<1:32:26, 1.63it/s] 22%|██▏ | 2505/11526 [26:07<1:32:49, 1.62it/s] {'loss': 0.2439, 'grad_norm': 0.6106228232383728, 'learning_rate': 9.5866596009275e-06, 'epoch': 0.65}
22%|██▏ | 2505/11526 [26:07<1:32:49, 1.62it/s] 22%|██▏ | 2506/11526 [26:08<1:32:37, 1.62it/s] {'loss': 0.2978, 'grad_norm': 0.6165745258331299, 'learning_rate': 9.586056507527266e-06, 'epoch': 0.65}
22%|██▏ | 2506/11526 [26:08<1:32:37, 1.62it/s] 22%|██▏ | 2507/11526 [26:08<1:32:35, 1.62it/s] {'loss': 0.2301, 'grad_norm': 0.573626697063446, 'learning_rate': 9.585452993467836e-06, 'epoch': 0.65}
22%|██▏ | 2507/11526 [26:09<1:32:35, 1.62it/s] 22%|██▏ | 2508/11526 [26:09<1:32:28, 1.63it/s] {'loss': 0.2736, 'grad_norm': 0.5097870230674744, 'learning_rate': 9.584849058804567e-06, 'epoch': 0.65}
22%|██▏ | 2508/11526 [26:09<1:32:28, 1.63it/s] 22%|██▏ | 2509/11526 [26:10<1:32:23, 1.63it/s] {'loss': 0.2202, 'grad_norm': 0.4837658405303955, 'learning_rate': 9.584244703592855e-06, 'epoch': 0.65}
22%|██▏ | 2509/11526 [26:10<1:32:23, 1.63it/s] 22%|██▏ | 2510/11526 [26:10<1:32:20, 1.63it/s] {'loss': 0.2803, 'grad_norm': 0.5767895579338074, 'learning_rate': 9.583639927888138e-06, 'epoch': 0.65}
22%|██▏ | 2510/11526 [26:10<1:32:20, 1.63it/s] 22%|██▏ | 2511/11526 [26:11<1:32:19, 1.63it/s] {'loss': 0.3383, 'grad_norm': 0.6464500427246094, 'learning_rate': 9.583034731745886e-06, 'epoch': 0.65}
22%|██▏ | 2511/11526 [26:11<1:32:19, 1.63it/s] 22%|██▏ | 2512/11526 [26:11<1:32:22, 1.63it/s] {'loss': 0.2218, 'grad_norm': 0.5258198380470276, 'learning_rate': 9.58242911522161e-06, 'epoch': 0.65}
22%|██▏ | 2512/11526 [26:12<1:32:22, 1.63it/s] 22%|██▏ | 2513/11526 [26:12<1:32:20, 1.63it/s] {'loss': 0.2647, 'grad_norm': 0.492392361164093, 'learning_rate': 9.581823078370864e-06, 'epoch': 0.65}
22%|██▏ | 2513/11526 [26:12<1:32:20, 1.63it/s] 22%|██▏ | 2514/11526 [26:13<1:32:19, 1.63it/s] {'loss': 0.2881, 'grad_norm': 0.5867494344711304, 'learning_rate': 9.581216621249237e-06, 'epoch': 0.65}
22%|██▏ | 2514/11526 [26:13<1:32:19, 1.63it/s] 22%|██▏ | 2515/11526 [26:13<1:32:15, 1.63it/s] {'loss': 0.2306, 'grad_norm': 0.5205506682395935, 'learning_rate': 9.580609743912353e-06, 'epoch': 0.65}
22%|██▏ | 2515/11526 [26:13<1:32:15, 1.63it/s] 22%|██▏ | 2516/11526 [26:14<1:32:13, 1.63it/s] {'loss': 0.3199, 'grad_norm': 0.66487056016922, 'learning_rate': 9.580002446415883e-06, 'epoch': 0.65}
22%|██▏ | 2516/11526 [26:14<1:32:13, 1.63it/s] 22%|██▏ | 2517/11526 [26:15<1:32:15, 1.63it/s] {'loss': 0.3116, 'grad_norm': 0.6260663270950317, 'learning_rate': 9.579394728815527e-06, 'epoch': 0.66}
22%|██▏ | 2517/11526 [26:15<1:32:15, 1.63it/s] 22%|██▏ | 2518/11526 [26:15<1:32:10, 1.63it/s] {'loss': 0.3043, 'grad_norm': 0.5799584984779358, 'learning_rate': 9.578786591167032e-06, 'epoch': 0.66}
22%|██▏ | 2518/11526 [26:15<1:32:10, 1.63it/s] 22%|██▏ | 2519/11526 [26:16<1:32:13, 1.63it/s] {'loss': 0.2882, 'grad_norm': 0.5068584680557251, 'learning_rate': 9.57817803352618e-06, 'epoch': 0.66}
22%|██▏ | 2519/11526 [26:16<1:32:13, 1.63it/s] 22%|██▏ | 2520/11526 [26:16<1:32:12, 1.63it/s] {'loss': 0.2116, 'grad_norm': 0.48329415917396545, 'learning_rate': 9.57756905594879e-06, 'epoch': 0.66}
22%|██▏ | 2520/11526 [26:17<1:32:12, 1.63it/s] 22%|██▏ | 2521/11526 [26:17<1:32:10, 1.63it/s] {'loss': 0.2589, 'grad_norm': 0.5267789959907532, 'learning_rate': 9.57695965849072e-06, 'epoch': 0.66}
22%|██▏ | 2521/11526 [26:17<1:32:10, 1.63it/s] 22%|██▏ | 2522/11526 [26:18<1:32:11, 1.63it/s] {'loss': 0.3756, 'grad_norm': 0.613336443901062, 'learning_rate': 9.576349841207865e-06, 'epoch': 0.66}
22%|██▏ | 2522/11526 [26:18<1:32:11, 1.63it/s] 22%|██▏ | 2523/11526 [26:18<1:32:13, 1.63it/s] {'loss': 0.2351, 'grad_norm': 0.5165157914161682, 'learning_rate': 9.575739604156169e-06, 'epoch': 0.66}
22%|██▏ | 2523/11526 [26:18<1:32:13, 1.63it/s] 22%|██▏ | 2524/11526 [26:19<1:32:12, 1.63it/s] {'loss': 0.3011, 'grad_norm': 0.5586745738983154, 'learning_rate': 9.5751289473916e-06, 'epoch': 0.66}
22%|██▏ | 2524/11526 [26:19<1:32:12, 1.63it/s] 22%|██▏ | 2525/11526 [26:19<1:32:13, 1.63it/s] {'loss': 0.2444, 'grad_norm': 0.5572899580001831, 'learning_rate': 9.574517870970172e-06, 'epoch': 0.66}
22%|██▏ | 2525/11526 [26:20<1:32:13, 1.63it/s] 22%|██▏ | 2526/11526 [26:20<1:32:11, 1.63it/s] {'loss': 0.3671, 'grad_norm': 0.6846582889556885, 'learning_rate': 9.573906374947938e-06, 'epoch': 0.66}
22%|██▏ | 2526/11526 [26:20<1:32:11, 1.63it/s] 22%|██▏ | 2527/11526 [26:21<1:32:12, 1.63it/s] {'loss': 0.2752, 'grad_norm': 0.5905472040176392, 'learning_rate': 9.573294459380986e-06, 'epoch': 0.66}
22%|██▏ | 2527/11526 [26:21<1:32:12, 1.63it/s] 22%|██▏ | 2528/11526 [26:21<1:32:09, 1.63it/s] {'loss': 0.3276, 'grad_norm': 0.6591384410858154, 'learning_rate': 9.572682124325446e-06, 'epoch': 0.66}
22%|██▏ | 2528/11526 [26:21<1:32:09, 1.63it/s] 22%|██▏ | 2529/11526 [26:22<1:32:08, 1.63it/s] {'loss': 0.3254, 'grad_norm': 0.6735184192657471, 'learning_rate': 9.572069369837483e-06, 'epoch': 0.66}
22%|██▏ | 2529/11526 [26:22<1:32:08, 1.63it/s] 22%|██▏ | 2530/11526 [26:23<1:32:08, 1.63it/s] {'loss': 0.314, 'grad_norm': 0.6693761944770813, 'learning_rate': 9.571456195973303e-06, 'epoch': 0.66}
22%|██▏ | 2530/11526 [26:23<1:32:08, 1.63it/s] 22%|██▏ | 2531/11526 [26:23<1:32:04, 1.63it/s] {'loss': 0.2161, 'grad_norm': 0.5255808234214783, 'learning_rate': 9.570842602789152e-06, 'epoch': 0.66}
22%|██▏ | 2531/11526 [26:23<1:32:04, 1.63it/s] 22%|██▏ | 2532/11526 [26:24<1:32:10, 1.63it/s] {'loss': 0.4455, 'grad_norm': 0.7926754355430603, 'learning_rate': 9.57022859034131e-06, 'epoch': 0.66}
22%|██▏ | 2532/11526 [26:24<1:32:10, 1.63it/s] 22%|██▏ | 2533/11526 [26:24<1:32:05, 1.63it/s] {'loss': 0.242, 'grad_norm': 0.5910010933876038, 'learning_rate': 9.569614158686097e-06, 'epoch': 0.66}
22%|██▏ | 2533/11526 [26:24<1:32:05, 1.63it/s] 22%|██▏ | 2534/11526 [26:25<1:32:03, 1.63it/s] {'loss': 0.2714, 'grad_norm': 0.6154428124427795, 'learning_rate': 9.568999307879874e-06, 'epoch': 0.66}
22%|██▏ | 2534/11526 [26:25<1:32:03, 1.63it/s] 22%|██▏ | 2535/11526 [26:26<1:32:02, 1.63it/s] {'loss': 0.2342, 'grad_norm': 0.550249457359314, 'learning_rate': 9.568384037979039e-06, 'epoch': 0.66}
22%|██▏ | 2535/11526 [26:26<1:32:02, 1.63it/s] 22%|██▏ | 2536/11526 [26:26<1:32:05, 1.63it/s] {'loss': 0.2202, 'grad_norm': 0.5453953146934509, 'learning_rate': 9.567768349040025e-06, 'epoch': 0.66}
22%|██▏ | 2536/11526 [26:26<1:32:05, 1.63it/s] 22%|██▏ | 2537/11526 [26:27<1:32:39, 1.62it/s] {'loss': 0.2397, 'grad_norm': 0.5371474027633667, 'learning_rate': 9.56715224111931e-06, 'epoch': 0.66}
22%|██▏ | 2537/11526 [26:27<1:32:39, 1.62it/s] 22%|██▏ | 2538/11526 [26:27<1:32:29, 1.62it/s] {'loss': 0.2459, 'grad_norm': 0.5643278360366821, 'learning_rate': 9.566535714273404e-06, 'epoch': 0.66}
22%|██▏ | 2538/11526 [26:28<1:32:29, 1.62it/s] 22%|██▏ | 2539/11526 [26:28<1:32:19, 1.62it/s] {'loss': 0.3651, 'grad_norm': 0.6953021287918091, 'learning_rate': 9.565918768558862e-06, 'epoch': 0.66}
22%|██▏ | 2539/11526 [26:28<1:32:19, 1.62it/s] 22%|██▏ | 2540/11526 [26:29<1:32:20, 1.62it/s] {'loss': 0.2494, 'grad_norm': 0.5325703024864197, 'learning_rate': 9.565301404032269e-06, 'epoch': 0.66}
22%|██▏ | 2540/11526 [26:29<1:32:20, 1.62it/s] 22%|██▏ | 2541/11526 [26:29<1:32:12, 1.62it/s] {'loss': 0.2702, 'grad_norm': 0.5779030323028564, 'learning_rate': 9.564683620750257e-06, 'epoch': 0.66}
22%|██▏ | 2541/11526 [26:29<1:32:12, 1.62it/s] 22%|██▏ | 2542/11526 [26:30<1:32:18, 1.62it/s] {'loss': 0.2348, 'grad_norm': 0.4918253719806671, 'learning_rate': 9.56406541876949e-06, 'epoch': 0.66}
22%|██▏ | 2542/11526 [26:30<1:32:18, 1.62it/s] 22%|██▏ | 2543/11526 [26:31<1:32:10, 1.62it/s] {'loss': 0.2634, 'grad_norm': 0.47001272439956665, 'learning_rate': 9.563446798146678e-06, 'epoch': 0.66}
22%|██▏ | 2543/11526 [26:31<1:32:10, 1.62it/s] 22%|██▏ | 2544/11526 [26:31<1:32:05, 1.63it/s] {'loss': 0.266, 'grad_norm': 0.5640379190444946, 'learning_rate': 9.562827758938558e-06, 'epoch': 0.66}
22%|██▏ | 2544/11526 [26:31<1:32:05, 1.63it/s] 22%|██▏ | 2545/11526 [26:32<1:32:03, 1.63it/s] {'loss': 0.2954, 'grad_norm': 0.5291218757629395, 'learning_rate': 9.562208301201914e-06, 'epoch': 0.66}
22%|██▏ | 2545/11526 [26:32<1:32:03, 1.63it/s] 22%|██▏ | 2546/11526 [26:32<1:31:59, 1.63it/s] {'loss': 0.2345, 'grad_norm': 0.5820351839065552, 'learning_rate': 9.56158842499357e-06, 'epoch': 0.66}
22%|██▏ | 2546/11526 [26:32<1:31:59, 1.63it/s] 22%|██▏ | 2547/11526 [26:33<1:32:02, 1.63it/s] {'loss': 0.2271, 'grad_norm': 0.5564321279525757, 'learning_rate': 9.560968130370376e-06, 'epoch': 0.66}
22%|██▏ | 2547/11526 [26:33<1:32:02, 1.63it/s] 22%|██▏ | 2548/11526 [26:34<1:31:57, 1.63it/s] {'loss': 0.2942, 'grad_norm': 0.7432230710983276, 'learning_rate': 9.560347417389238e-06, 'epoch': 0.66}
22%|██▏ | 2548/11526 [26:34<1:31:57, 1.63it/s] 22%|██▏ | 2549/11526 [26:34<1:31:56, 1.63it/s] {'loss': 0.301, 'grad_norm': 0.5467342138290405, 'learning_rate': 9.559726286107088e-06, 'epoch': 0.66}
22%|██▏ | 2549/11526 [26:34<1:31:56, 1.63it/s] 22%|██▏ | 2550/11526 [26:35<1:32:00, 1.63it/s] {'loss': 0.2278, 'grad_norm': 0.5228471755981445, 'learning_rate': 9.559104736580897e-06, 'epoch': 0.66}
22%|██▏ | 2550/11526 [26:35<1:32:00, 1.63it/s] 22%|██▏ | 2551/11526 [26:35<1:31:58, 1.63it/s] {'loss': 0.2521, 'grad_norm': 0.5332677960395813, 'learning_rate': 9.558482768867679e-06, 'epoch': 0.66}
22%|██▏ | 2551/11526 [26:36<1:31:58, 1.63it/s] 22%|██▏ | 2552/11526 [26:36<1:32:23, 1.62it/s] {'loss': 0.3076, 'grad_norm': 0.6308724880218506, 'learning_rate': 9.557860383024486e-06, 'epoch': 0.66}
22%|██▏ | 2552/11526 [26:36<1:32:23, 1.62it/s] 22%|██▏ | 2553/11526 [26:37<1:32:12, 1.62it/s] {'loss': 0.3005, 'grad_norm': 0.6405683159828186, 'learning_rate': 9.557237579108406e-06, 'epoch': 0.66}
22%|██▏ | 2553/11526 [26:37<1:32:12, 1.62it/s] 22%|██▏ | 2554/11526 [26:37<1:32:06, 1.62it/s] {'loss': 0.2567, 'grad_norm': 0.7137377858161926, 'learning_rate': 9.556614357176566e-06, 'epoch': 0.66}
22%|██▏ | 2554/11526 [26:37<1:32:06, 1.62it/s] 22%|██▏ | 2555/11526 [26:38<1:32:02, 1.62it/s] {'loss': 0.2434, 'grad_norm': 0.5693735480308533, 'learning_rate': 9.55599071728613e-06, 'epoch': 0.67}
22%|██▏ | 2555/11526 [26:38<1:32:02, 1.62it/s] 22%|██▏ | 2556/11526 [26:39<1:31:57, 1.63it/s] {'loss': 0.263, 'grad_norm': 0.6603737473487854, 'learning_rate': 9.555366659494303e-06, 'epoch': 0.67}
22%|██▏ | 2556/11526 [26:39<1:31:57, 1.63it/s] 22%|██▏ | 2557/11526 [26:39<1:31:59, 1.62it/s] {'loss': 0.2071, 'grad_norm': 0.5040447115898132, 'learning_rate': 9.554742183858327e-06, 'epoch': 0.67}
22%|██▏ | 2557/11526 [26:39<1:31:59, 1.62it/s] 22%|██▏ | 2558/11526 [26:40<1:31:55, 1.63it/s] {'loss': 0.2684, 'grad_norm': 0.5417655110359192, 'learning_rate': 9.554117290435482e-06, 'epoch': 0.67}
22%|██▏ | 2558/11526 [26:40<1:31:55, 1.63it/s] 22%|██▏ | 2559/11526 [26:40<1:31:54, 1.63it/s] {'loss': 0.2165, 'grad_norm': 0.5181728005409241, 'learning_rate': 9.55349197928309e-06, 'epoch': 0.67}
22%|██▏ | 2559/11526 [26:40<1:31:54, 1.63it/s] 22%|██▏ | 2560/11526 [26:41<1:32:00, 1.62it/s] {'loss': 0.2755, 'grad_norm': 0.5668189525604248, 'learning_rate': 9.552866250458501e-06, 'epoch': 0.67}
22%|██▏ | 2560/11526 [26:41<1:32:00, 1.62it/s] 22%|██▏ | 2561/11526 [26:42<1:31:51, 1.63it/s] {'loss': 0.2872, 'grad_norm': 0.6825908422470093, 'learning_rate': 9.552240104019118e-06, 'epoch': 0.67}
22%|██▏ | 2561/11526 [26:42<1:31:51, 1.63it/s] 22%|██▏ | 2562/11526 [26:42<1:31:58, 1.62it/s] {'loss': 0.2199, 'grad_norm': 0.5104976296424866, 'learning_rate': 9.551613540022372e-06, 'epoch': 0.67}
22%|██▏ | 2562/11526 [26:42<1:31:58, 1.62it/s] 22%|██▏ | 2563/11526 [26:43<1:31:56, 1.62it/s] {'loss': 0.2942, 'grad_norm': 0.5788955092430115, 'learning_rate': 9.550986558525732e-06, 'epoch': 0.67}
22%|██▏ | 2563/11526 [26:43<1:31:56, 1.62it/s] 22%|██▏ | 2564/11526 [26:43<1:31:53, 1.63it/s] {'loss': 0.3181, 'grad_norm': 0.6398277282714844, 'learning_rate': 9.550359159586712e-06, 'epoch': 0.67}
22%|██▏ | 2564/11526 [26:44<1:31:53, 1.63it/s] 22%|██▏ | 2565/11526 [26:44<1:31:55, 1.62it/s] {'loss': 0.3716, 'grad_norm': 0.6511211395263672, 'learning_rate': 9.549731343262859e-06, 'epoch': 0.67}
22%|██▏ | 2565/11526 [26:44<1:31:55, 1.62it/s] 22%|██▏ | 2566/11526 [26:45<1:31:57, 1.62it/s] {'loss': 0.2731, 'grad_norm': 0.5370916724205017, 'learning_rate': 9.54910310961176e-06, 'epoch': 0.67}
22%|██▏ | 2566/11526 [26:45<1:31:57, 1.62it/s] 22%|██▏ | 2567/11526 [26:45<1:31:55, 1.62it/s] {'loss': 0.2524, 'grad_norm': 0.47354671359062195, 'learning_rate': 9.548474458691042e-06, 'epoch': 0.67}
22%|██▏ | 2567/11526 [26:45<1:31:55, 1.62it/s] 22%|██▏ | 2568/11526 [26:46<1:31:57, 1.62it/s] {'loss': 0.2688, 'grad_norm': 0.5392929911613464, 'learning_rate': 9.547845390558366e-06, 'epoch': 0.67}
22%|██▏ | 2568/11526 [26:46<1:31:57, 1.62it/s] 22%|██▏ | 2569/11526 [26:47<1:31:55, 1.62it/s] {'loss': 0.2644, 'grad_norm': 0.5154317617416382, 'learning_rate': 9.547215905271434e-06, 'epoch': 0.67}
22%|██▏ | 2569/11526 [26:47<1:31:55, 1.62it/s] 22%|██▏ | 2570/11526 [26:47<1:32:14, 1.62it/s] {'loss': 0.287, 'grad_norm': 0.652003288269043, 'learning_rate': 9.546586002887987e-06, 'epoch': 0.67}
22%|██▏ | 2570/11526 [26:47<1:32:14, 1.62it/s] 22%|██▏ | 2571/11526 [26:48<1:32:02, 1.62it/s] {'loss': 0.1768, 'grad_norm': 0.44061216711997986, 'learning_rate': 9.545955683465803e-06, 'epoch': 0.67}
22%|██▏ | 2571/11526 [26:48<1:32:02, 1.62it/s] 22%|██▏ | 2572/11526 [26:48<1:32:19, 1.62it/s] {'loss': 0.2844, 'grad_norm': 0.6388780474662781, 'learning_rate': 9.545324947062697e-06, 'epoch': 0.67}
22%|██▏ | 2572/11526 [26:49<1:32:19, 1.62it/s] 22%|██▏ | 2573/11526 [26:49<1:32:06, 1.62it/s] {'loss': 0.2267, 'grad_norm': 0.5017347931861877, 'learning_rate': 9.544693793736527e-06, 'epoch': 0.67}
22%|██▏ | 2573/11526 [26:49<1:32:06, 1.62it/s] 22%|██▏ | 2574/11526 [26:50<1:32:02, 1.62it/s] {'loss': 0.2587, 'grad_norm': 0.5986546874046326, 'learning_rate': 9.544062223545183e-06, 'epoch': 0.67}
22%|██▏ | 2574/11526 [26:50<1:32:02, 1.62it/s] 22%|██▏ | 2575/11526 [26:50<1:32:03, 1.62it/s] {'loss': 0.2246, 'grad_norm': 0.5520259141921997, 'learning_rate': 9.543430236546598e-06, 'epoch': 0.67}
22%|██▏ | 2575/11526 [26:50<1:32:03, 1.62it/s] 22%|██▏ | 2576/11526 [26:51<1:31:51, 1.62it/s] {'loss': 0.6452, 'grad_norm': 0.7008392810821533, 'learning_rate': 9.54279783279874e-06, 'epoch': 0.67}
22%|██▏ | 2576/11526 [26:51<1:31:51, 1.62it/s] 22%|██▏ | 2577/11526 [26:51<1:31:57, 1.62it/s] {'loss': 0.2954, 'grad_norm': 0.6220493912696838, 'learning_rate': 9.542165012359618e-06, 'epoch': 0.67}
22%|██▏ | 2577/11526 [26:52<1:31:57, 1.62it/s] 22%|██▏ | 2578/11526 [26:52<1:31:49, 1.62it/s] {'loss': 0.309, 'grad_norm': 0.6458481550216675, 'learning_rate': 9.541531775287276e-06, 'epoch': 0.67}
22%|██▏ | 2578/11526 [26:52<1:31:49, 1.62it/s] 22%|██▏ | 2579/11526 [26:53<1:31:45, 1.63it/s] {'loss': 0.3496, 'grad_norm': 0.7115090489387512, 'learning_rate': 9.5408981216398e-06, 'epoch': 0.67}
22%|██▏ | 2579/11526 [26:53<1:31:45, 1.63it/s] 22%|██▏ | 2580/11526 [26:53<1:31:45, 1.62it/s] {'loss': 0.2674, 'grad_norm': 0.6594437956809998, 'learning_rate': 9.540264051475313e-06, 'epoch': 0.67}
22%|██▏ | 2580/11526 [26:53<1:31:45, 1.62it/s] 22%|██▏ | 2581/11526 [26:54<1:31:43, 1.63it/s] {'loss': 0.2565, 'grad_norm': 0.602155864238739, 'learning_rate': 9.539629564851971e-06, 'epoch': 0.67}
22%|██▏ | 2581/11526 [26:54<1:31:43, 1.63it/s] 22%|██▏ | 2582/11526 [26:55<1:31:50, 1.62it/s] {'loss': 0.3513, 'grad_norm': 0.7415337562561035, 'learning_rate': 9.538994661827979e-06, 'epoch': 0.67}
22%|██▏ | 2582/11526 [26:55<1:31:50, 1.62it/s] 22%|██▏ | 2583/11526 [26:55<1:31:43, 1.63it/s] {'loss': 0.2507, 'grad_norm': 0.5730432271957397, 'learning_rate': 9.538359342461568e-06, 'epoch': 0.67}
22%|██▏ | 2583/11526 [26:55<1:31:43, 1.63it/s] 22%|██▏ | 2584/11526 [26:56<1:31:38, 1.63it/s] {'loss': 0.3249, 'grad_norm': 0.5870863795280457, 'learning_rate': 9.537723606811018e-06, 'epoch': 0.67}
22%|██▏ | 2584/11526 [26:56<1:31:38, 1.63it/s] 22%|██▏ | 2585/11526 [26:56<1:31:39, 1.63it/s] {'loss': 0.2315, 'grad_norm': 0.5741580128669739, 'learning_rate': 9.53708745493464e-06, 'epoch': 0.67}
22%|██▏ | 2585/11526 [26:57<1:31:39, 1.63it/s] 22%|██▏ | 2586/11526 [26:57<1:31:35, 1.63it/s] {'loss': 0.2536, 'grad_norm': 0.4964587688446045, 'learning_rate': 9.536450886890785e-06, 'epoch': 0.67}
22%|██▏ | 2586/11526 [26:57<1:31:35, 1.63it/s] 22%|██▏ | 2587/11526 [26:58<1:31:41, 1.62it/s] {'loss': 0.2087, 'grad_norm': 0.5048887133598328, 'learning_rate': 9.535813902737842e-06, 'epoch': 0.67}
22%|██▏ | 2587/11526 [26:58<1:31:41, 1.62it/s] 22%|██▏ | 2588/11526 [26:58<1:31:41, 1.62it/s] {'loss': 0.2827, 'grad_norm': 0.5907896161079407, 'learning_rate': 9.535176502534242e-06, 'epoch': 0.67}
22%|██▏ | 2588/11526 [26:58<1:31:41, 1.62it/s] 22%|██▏ | 2589/11526 [26:59<1:31:36, 1.63it/s] {'loss': 0.2515, 'grad_norm': 0.6664296388626099, 'learning_rate': 9.53453868633845e-06, 'epoch': 0.67}
22%|██▏ | 2589/11526 [26:59<1:31:36, 1.63it/s] 22%|██▏ | 2590/11526 [26:59<1:31:47, 1.62it/s] {'loss': 0.3104, 'grad_norm': 0.6072967052459717, 'learning_rate': 9.533900454208967e-06, 'epoch': 0.67}
22%|██▏ | 2590/11526 [27:00<1:31:47, 1.62it/s] 22%|██▏ | 2591/11526 [27:00<1:31:34, 1.63it/s] {'loss': 0.2281, 'grad_norm': 0.5442471504211426, 'learning_rate': 9.533261806204339e-06, 'epoch': 0.67}
22%|██▏ | 2591/11526 [27:00<1:31:34, 1.63it/s] 22%|██▏ | 2592/11526 [27:01<1:31:40, 1.62it/s] {'loss': 0.3027, 'grad_norm': 0.5911776423454285, 'learning_rate': 9.532622742383144e-06, 'epoch': 0.67}
22%|██▏ | 2592/11526 [27:01<1:31:40, 1.62it/s] 22%|██▏ | 2593/11526 [27:01<1:31:35, 1.63it/s] {'loss': 0.2771, 'grad_norm': 0.5816844701766968, 'learning_rate': 9.531983262804003e-06, 'epoch': 0.67}
22%|██▏ | 2593/11526 [27:01<1:31:35, 1.63it/s] 23%|██▎ | 2594/11526 [27:02<1:31:31, 1.63it/s] {'loss': 0.2151, 'grad_norm': 0.6282395124435425, 'learning_rate': 9.53134336752557e-06, 'epoch': 0.68}
23%|██▎ | 2594/11526 [27:02<1:31:31, 1.63it/s] 23%|██▎ | 2595/11526 [27:03<1:31:32, 1.63it/s] {'loss': 0.2688, 'grad_norm': 0.6403831243515015, 'learning_rate': 9.530703056606541e-06, 'epoch': 0.68}
23%|██▎ | 2595/11526 [27:03<1:31:32, 1.63it/s] 23%|██▎ | 2596/11526 [27:03<1:31:28, 1.63it/s] {'loss': 0.2452, 'grad_norm': 0.47468504309654236, 'learning_rate': 9.53006233010565e-06, 'epoch': 0.68}
23%|██▎ | 2596/11526 [27:03<1:31:28, 1.63it/s] 23%|██▎ | 2597/11526 [27:04<1:31:30, 1.63it/s] {'loss': 0.3636, 'grad_norm': 0.6014848351478577, 'learning_rate': 9.529421188081665e-06, 'epoch': 0.68}
23%|██▎ | 2597/11526 [27:04<1:31:30, 1.63it/s] 23%|██▎ | 2598/11526 [27:04<1:31:26, 1.63it/s] {'loss': 0.3237, 'grad_norm': 0.6486244797706604, 'learning_rate': 9.528779630593398e-06, 'epoch': 0.68}
23%|██▎ | 2598/11526 [27:05<1:31:26, 1.63it/s] 23%|██▎ | 2599/11526 [27:05<1:31:28, 1.63it/s] {'loss': 0.3264, 'grad_norm': 0.6599380970001221, 'learning_rate': 9.528137657699697e-06, 'epoch': 0.68}
23%|██▎ | 2599/11526 [27:05<1:31:28, 1.63it/s] 23%|██▎ | 2600/11526 [27:06<1:31:27, 1.63it/s] {'loss': 0.4011, 'grad_norm': 0.6258840560913086, 'learning_rate': 9.527495269459445e-06, 'epoch': 0.68}
23%|██▎ | 2600/11526 [27:06<1:31:27, 1.63it/s]
100%|██████████| 13/13 [00:01<00:00, 6.76it/s]
{'eval_loss': 0.6805815696716309, 'eval_runtime': 1.9537, 'eval_samples_per_second': 102.371, 'eval_steps_per_second': 6.654, 'epoch': 0.68}
23%|██▎ | 2600/11526 [27:08<1:31:27, 1.63it/s]
 23%|██▎ | 2601/11526 [27:08<2:58:48, 1.20s/it] {'loss': 0.2569, 'grad_norm': 0.5158833265304565, 'learning_rate': 9.526852465931568e-06, 'epoch': 0.68}
23%|██▎ | 2601/11526 [27:08<2:58:48, 1.20s/it] 23%|██▎ | 2602/11526 [27:09<2:32:32, 1.03s/it] {'loss': 0.1767, 'grad_norm': 0.49115127325057983, 'learning_rate': 9.526209247175024e-06, 'epoch': 0.68}
23%|██▎ | 2602/11526 [27:09<2:32:32, 1.03s/it] 23%|██▎ | 2603/11526 [27:09<2:14:11, 1.11it/s] {'loss': 0.2458, 'grad_norm': 0.6167973279953003, 'learning_rate': 9.525565613248815e-06, 'epoch': 0.68}
23%|██▎ | 2603/11526 [27:10<2:14:11, 1.11it/s] 23%|██▎ | 2604/11526 [27:10<2:01:21, 1.23it/s] {'loss': 0.254, 'grad_norm': 0.536146342754364, 'learning_rate': 9.52492156421198e-06, 'epoch': 0.68}
23%|██▎ | 2604/11526 [27:10<2:01:21, 1.23it/s] 23%|██▎ | 2605/11526 [27:11<1:52:21, 1.32it/s] {'loss': 0.2789, 'grad_norm': 0.6106979846954346, 'learning_rate': 9.524277100123592e-06, 'epoch': 0.68}
23%|██▎ | 2605/11526 [27:11<1:52:21, 1.32it/s] 23%|██▎ | 2606/11526 [27:11<1:46:04, 1.40it/s] {'loss': 0.2808, 'grad_norm': 0.5707802176475525, 'learning_rate': 9.523632221042767e-06, 'epoch': 0.68}
23%|██▎ | 2606/11526 [27:11<1:46:04, 1.40it/s] 23%|██▎ | 2607/11526 [27:12<1:41:44, 1.46it/s] {'loss': 0.2211, 'grad_norm': 0.5435354113578796, 'learning_rate': 9.522986927028657e-06, 'epoch': 0.68}
23%|██▎ | 2607/11526 [27:12<1:41:44, 1.46it/s] 23%|██▎ | 2608/11526 [27:12<1:38:34, 1.51it/s] {'loss': 0.2625, 'grad_norm': 0.6293509602546692, 'learning_rate': 9.52234121814045e-06, 'epoch': 0.68}
23%|██▎ | 2608/11526 [27:13<1:38:34, 1.51it/s] 23%|██▎ | 2609/11526 [27:13<1:36:25, 1.54it/s] {'loss': 0.3408, 'grad_norm': 0.7309309244155884, 'learning_rate': 9.521695094437376e-06, 'epoch': 0.68}
23%|██▎ | 2609/11526 [27:13<1:36:25, 1.54it/s] 23%|██▎ | 2610/11526 [27:14<1:34:56, 1.57it/s] {'loss': 0.2269, 'grad_norm': 0.5055480599403381, 'learning_rate': 9.5210485559787e-06, 'epoch': 0.68}
23%|██▎ | 2610/11526 [27:14<1:34:56, 1.57it/s] 23%|██▎ | 2611/11526 [27:14<1:33:48, 1.58it/s] {'loss': 0.3202, 'grad_norm': 0.5774530172348022, 'learning_rate': 9.520401602823727e-06, 'epoch': 0.68}
23%|██▎ | 2611/11526 [27:14<1:33:48, 1.58it/s] 23%|██▎ | 2612/11526 [27:15<1:33:04, 1.60it/s] {'loss': 0.3952, 'grad_norm': 0.6983982920646667, 'learning_rate': 9.5197542350318e-06, 'epoch': 0.68}
23%|██▎ | 2612/11526 [27:15<1:33:04, 1.60it/s] 23%|██▎ | 2613/11526 [27:16<1:32:29, 1.61it/s] {'loss': 0.352, 'grad_norm': 0.6737574338912964, 'learning_rate': 9.519106452662296e-06, 'epoch': 0.68}
23%|██▎ | 2613/11526 [27:16<1:32:29, 1.61it/s] 23%|██▎ | 2614/11526 [27:16<1:32:04, 1.61it/s] {'loss': 0.2441, 'grad_norm': 0.5574196577072144, 'learning_rate': 9.518458255774636e-06, 'epoch': 0.68}
23%|██▎ | 2614/11526 [27:16<1:32:04, 1.61it/s] 23%|██▎ | 2615/11526 [27:17<1:31:51, 1.62it/s] {'loss': 0.3787, 'grad_norm': 0.6093575358390808, 'learning_rate': 9.517809644428277e-06, 'epoch': 0.68}
23%|██▎ | 2615/11526 [27:17<1:31:51, 1.62it/s] 23%|██▎ | 2616/11526 [27:17<1:31:38, 1.62it/s] {'loss': 0.2432, 'grad_norm': 0.5033789277076721, 'learning_rate': 9.517160618682712e-06, 'epoch': 0.68}
23%|██▎ | 2616/11526 [27:18<1:31:38, 1.62it/s] 23%|██▎ | 2617/11526 [27:18<1:31:40, 1.62it/s] {'loss': 0.3414, 'grad_norm': 0.5658825635910034, 'learning_rate': 9.516511178597474e-06, 'epoch': 0.68}
23%|██▎ | 2617/11526 [27:18<1:31:40, 1.62it/s] 23%|██▎ | 2618/11526 [27:19<1:31:32, 1.62it/s] {'loss': 0.2194, 'grad_norm': 0.5037969946861267, 'learning_rate': 9.515861324232132e-06, 'epoch': 0.68}
23%|██▎ | 2618/11526 [27:19<1:31:32, 1.62it/s] 23%|██▎ | 2619/11526 [27:19<1:31:26, 1.62it/s] {'loss': 0.3075, 'grad_norm': 0.6078910231590271, 'learning_rate': 9.515211055646295e-06, 'epoch': 0.68}
23%|██▎ | 2619/11526 [27:19<1:31:26, 1.62it/s] 23%|██▎ | 2620/11526 [27:20<1:31:23, 1.62it/s] {'loss': 0.2826, 'grad_norm': 0.667625367641449, 'learning_rate': 9.51456037289961e-06, 'epoch': 0.68}
23%|██▎ | 2620/11526 [27:20<1:31:23, 1.62it/s] 23%|██▎ | 2621/11526 [27:20<1:31:19, 1.63it/s] {'loss': 0.3708, 'grad_norm': 0.6895810961723328, 'learning_rate': 9.513909276051761e-06, 'epoch': 0.68}
23%|██▎ | 2621/11526 [27:21<1:31:19, 1.63it/s] 23%|██▎ | 2622/11526 [27:21<1:31:27, 1.62it/s] {'loss': 0.2797, 'grad_norm': 0.5221375226974487, 'learning_rate': 9.51325776516247e-06, 'epoch': 0.68}
23%|██▎ | 2622/11526 [27:21<1:31:27, 1.62it/s] 23%|██▎ | 2623/11526 [27:22<1:31:17, 1.63it/s] {'loss': 0.2169, 'grad_norm': 0.5107182860374451, 'learning_rate': 9.512605840291496e-06, 'epoch': 0.68}
23%|██▎ | 2623/11526 [27:22<1:31:17, 1.63it/s] 23%|██▎ | 2624/11526 [27:22<1:31:11, 1.63it/s] {'loss': 0.2348, 'grad_norm': 0.5062435865402222, 'learning_rate': 9.51195350149864e-06, 'epoch': 0.68}
23%|██▎ | 2624/11526 [27:22<1:31:11, 1.63it/s] 23%|██▎ | 2625/11526 [27:23<1:31:12, 1.63it/s] {'loss': 0.2349, 'grad_norm': 0.5584383606910706, 'learning_rate': 9.511300748843736e-06, 'epoch': 0.68}
23%|██▎ | 2625/11526 [27:23<1:31:12, 1.63it/s] 23%|██▎ | 2626/11526 [27:24<1:31:09, 1.63it/s] {'loss': 0.2407, 'grad_norm': 0.5831106901168823, 'learning_rate': 9.51064758238666e-06, 'epoch': 0.68}
23%|██▎ | 2626/11526 [27:24<1:31:09, 1.63it/s] 23%|██▎ | 2627/11526 [27:24<1:31:15, 1.63it/s] {'loss': 0.2683, 'grad_norm': 0.6070579290390015, 'learning_rate': 9.509994002187323e-06, 'epoch': 0.68}
23%|██▎ | 2627/11526 [27:24<1:31:15, 1.63it/s] 23%|██▎ | 2628/11526 [27:25<1:31:11, 1.63it/s] {'loss': 0.2255, 'grad_norm': 0.5469780564308167, 'learning_rate': 9.509340008305675e-06, 'epoch': 0.68}
23%|██▎ | 2628/11526 [27:25<1:31:11, 1.63it/s] 23%|██▎ | 2629/11526 [27:25<1:31:07, 1.63it/s] {'loss': 0.3455, 'grad_norm': 0.7017973065376282, 'learning_rate': 9.508685600801704e-06, 'epoch': 0.68}
23%|██▎ | 2629/11526 [27:26<1:31:07, 1.63it/s] 23%|██▎ | 2630/11526 [27:26<1:31:10, 1.63it/s] {'loss': 0.3048, 'grad_norm': 0.5655391216278076, 'learning_rate': 9.508030779735437e-06, 'epoch': 0.68}
23%|██▎ | 2630/11526 [27:26<1:31:10, 1.63it/s] 23%|██▎ | 2631/11526 [27:27<1:31:04, 1.63it/s] {'loss': 0.2689, 'grad_norm': 0.5967617034912109, 'learning_rate': 9.507375545166937e-06, 'epoch': 0.68}
23%|██▎ | 2631/11526 [27:27<1:31:04, 1.63it/s] 23%|██▎ | 2632/11526 [27:27<1:31:02, 1.63it/s] {'loss': 0.338, 'grad_norm': 0.6387160420417786, 'learning_rate': 9.506719897156307e-06, 'epoch': 0.69}
23%|██▎ | 2632/11526 [27:27<1:31:02, 1.63it/s] 23%|██▎ | 2633/11526 [27:28<1:31:01, 1.63it/s] {'loss': 0.2276, 'grad_norm': 0.5055999755859375, 'learning_rate': 9.506063835763685e-06, 'epoch': 0.69}
23%|██▎ | 2633/11526 [27:28<1:31:01, 1.63it/s] 23%|██▎ | 2634/11526 [27:28<1:31:02, 1.63it/s] {'loss': 0.2892, 'grad_norm': 0.5948834419250488, 'learning_rate': 9.505407361049249e-06, 'epoch': 0.69}
23%|██▎ | 2634/11526 [27:29<1:31:02, 1.63it/s] 23%|██▎ | 2635/11526 [27:29<1:31:06, 1.63it/s] {'loss': 0.2067, 'grad_norm': 0.4756108522415161, 'learning_rate': 9.504750473073216e-06, 'epoch': 0.69}
23%|██▎ | 2635/11526 [27:29<1:31:06, 1.63it/s] 23%|██▎ | 2636/11526 [27:30<1:31:03, 1.63it/s] {'loss': 0.3137, 'grad_norm': 0.6438934206962585, 'learning_rate': 9.504093171895838e-06, 'epoch': 0.69}
23%|██▎ | 2636/11526 [27:30<1:31:03, 1.63it/s] 23%|██▎ | 2637/11526 [27:30<1:31:06, 1.63it/s] {'loss': 0.2897, 'grad_norm': 0.5516982674598694, 'learning_rate': 9.503435457577409e-06, 'epoch': 0.69}
23%|██▎ | 2637/11526 [27:30<1:31:06, 1.63it/s] 23%|██▎ | 2638/11526 [27:31<1:31:01, 1.63it/s] {'loss': 0.2737, 'grad_norm': 0.672874927520752, 'learning_rate': 9.502777330178253e-06, 'epoch': 0.69}
23%|██▎ | 2638/11526 [27:31<1:31:01, 1.63it/s] 23%|██▎ | 2639/11526 [27:32<1:31:03, 1.63it/s] {'loss': 0.3196, 'grad_norm': 0.5676605105400085, 'learning_rate': 9.502118789758744e-06, 'epoch': 0.69}
23%|██▎ | 2639/11526 [27:32<1:31:03, 1.63it/s] 23%|██▎ | 2640/11526 [27:32<1:31:11, 1.62it/s] {'loss': 0.3174, 'grad_norm': 0.5143129229545593, 'learning_rate': 9.50145983637928e-06, 'epoch': 0.69}
23%|██▎ | 2640/11526 [27:32<1:31:11, 1.62it/s] 23%|██▎ | 2641/11526 [27:33<1:31:06, 1.63it/s] {'loss': 0.2189, 'grad_norm': 0.5144158005714417, 'learning_rate': 9.500800470100312e-06, 'epoch': 0.69}
23%|██▎ | 2641/11526 [27:33<1:31:06, 1.63it/s] 23%|██▎ | 2642/11526 [27:33<1:31:07, 1.62it/s] {'loss': 0.2323, 'grad_norm': 0.528559684753418, 'learning_rate': 9.50014069098231e-06, 'epoch': 0.69}
23%|██▎ | 2642/11526 [27:34<1:31:07, 1.62it/s] 23%|██▎ | 2643/11526 [27:34<1:31:02, 1.63it/s] {'loss': 0.2457, 'grad_norm': 0.5094040036201477, 'learning_rate': 9.499480499085805e-06, 'epoch': 0.69}
23%|██▎ | 2643/11526 [27:34<1:31:02, 1.63it/s] 23%|██▎ | 2644/11526 [27:35<1:30:59, 1.63it/s] {'loss': 0.2763, 'grad_norm': 0.5783657431602478, 'learning_rate': 9.498819894471346e-06, 'epoch': 0.69}
23%|██▎ | 2644/11526 [27:35<1:30:59, 1.63it/s] 23%|██▎ | 2645/11526 [27:35<1:31:08, 1.62it/s] {'loss': 0.2189, 'grad_norm': 0.45144718885421753, 'learning_rate': 9.498158877199528e-06, 'epoch': 0.69}
23%|██▎ | 2645/11526 [27:35<1:31:08, 1.62it/s] 23%|██▎ | 2646/11526 [27:36<1:31:06, 1.62it/s] {'loss': 0.2227, 'grad_norm': 0.46886005997657776, 'learning_rate': 9.497497447330985e-06, 'epoch': 0.69}
23%|██▎ | 2646/11526 [27:36<1:31:06, 1.62it/s] 23%|██▎ | 2647/11526 [27:36<1:31:13, 1.62it/s] {'loss': 0.3622, 'grad_norm': 0.6501901745796204, 'learning_rate': 9.496835604926387e-06, 'epoch': 0.69}
23%|██▎ | 2647/11526 [27:37<1:31:13, 1.62it/s] 23%|██▎ | 2648/11526 [27:37<1:31:14, 1.62it/s] {'loss': 0.2544, 'grad_norm': 0.661710798740387, 'learning_rate': 9.49617335004644e-06, 'epoch': 0.69}
23%|██▎ | 2648/11526 [27:37<1:31:14, 1.62it/s] 23%|██▎ | 2649/11526 [27:38<1:31:09, 1.62it/s] {'loss': 0.2747, 'grad_norm': 0.6030400395393372, 'learning_rate': 9.495510682751891e-06, 'epoch': 0.69}
23%|██▎ | 2649/11526 [27:38<1:31:09, 1.62it/s] 23%|██▎ | 2650/11526 [27:38<1:31:15, 1.62it/s] {'loss': 0.2792, 'grad_norm': 0.5886674523353577, 'learning_rate': 9.494847603103524e-06, 'epoch': 0.69}
23%|██▎ | 2650/11526 [27:38<1:31:15, 1.62it/s] 23%|██▎ | 2651/11526 [27:39<1:31:07, 1.62it/s] {'loss': 0.4244, 'grad_norm': 0.9581487774848938, 'learning_rate': 9.494184111162162e-06, 'epoch': 0.69}
23%|██▎ | 2651/11526 [27:39<1:31:07, 1.62it/s] 23%|██▎ | 2652/11526 [27:40<1:31:13, 1.62it/s] {'loss': 0.3205, 'grad_norm': 0.6019923090934753, 'learning_rate': 9.493520206988662e-06, 'epoch': 0.69}
23%|██▎ | 2652/11526 [27:40<1:31:13, 1.62it/s] 23%|██▎ | 2653/11526 [27:40<1:31:06, 1.62it/s] {'loss': 0.3081, 'grad_norm': 0.55750972032547, 'learning_rate': 9.492855890643921e-06, 'epoch': 0.69}
23%|██▎ | 2653/11526 [27:40<1:31:06, 1.62it/s] 23%|██▎ | 2654/11526 [27:41<1:31:00, 1.62it/s] {'loss': 0.3088, 'grad_norm': 0.6075685620307922, 'learning_rate': 9.492191162188873e-06, 'epoch': 0.69}
23%|██▎ | 2654/11526 [27:41<1:31:00, 1.62it/s] 23%|██▎ | 2655/11526 [27:41<1:31:00, 1.62it/s] {'loss': 0.3133, 'grad_norm': 0.605523407459259, 'learning_rate': 9.491526021684494e-06, 'epoch': 0.69}
23%|██▎ | 2655/11526 [27:42<1:31:00, 1.62it/s] 23%|██▎ | 2656/11526 [27:42<1:30:54, 1.63it/s] {'loss': 0.3252, 'grad_norm': 0.6901845932006836, 'learning_rate': 9.490860469191792e-06, 'epoch': 0.69}
23%|██▎ | 2656/11526 [27:42<1:30:54, 1.63it/s] 23%|██▎ | 2657/11526 [27:43<1:31:00, 1.62it/s] {'loss': 0.3042, 'grad_norm': 0.6316941380500793, 'learning_rate': 9.490194504771815e-06, 'epoch': 0.69}
23%|██▎ | 2657/11526 [27:43<1:31:00, 1.62it/s] 23%|██▎ | 2658/11526 [27:43<1:30:55, 1.63it/s] {'loss': 0.3129, 'grad_norm': 0.644494354724884, 'learning_rate': 9.489528128485653e-06, 'epoch': 0.69}
23%|██▎ | 2658/11526 [27:43<1:30:55, 1.63it/s] 23%|██▎ | 2659/11526 [27:44<1:30:50, 1.63it/s] {'loss': 0.3197, 'grad_norm': 0.6267033219337463, 'learning_rate': 9.488861340394423e-06, 'epoch': 0.69}
23%|██▎ | 2659/11526 [27:44<1:30:50, 1.63it/s] 23%|██▎ | 2660/11526 [27:44<1:30:48, 1.63it/s] {'loss': 0.3915, 'grad_norm': 0.7117418050765991, 'learning_rate': 9.488194140559292e-06, 'epoch': 0.69}
23%|██▎ | 2660/11526 [27:45<1:30:48, 1.63it/s] 23%|██▎ | 2661/11526 [27:45<1:30:46, 1.63it/s] {'loss': 0.2351, 'grad_norm': 0.5929414629936218, 'learning_rate': 9.487526529041457e-06, 'epoch': 0.69}
23%|██▎ | 2661/11526 [27:45<1:30:46, 1.63it/s] 23%|██▎ | 2662/11526 [27:46<1:31:08, 1.62it/s] {'loss': 0.2088, 'grad_norm': 0.5000252723693848, 'learning_rate': 9.486858505902156e-06, 'epoch': 0.69}
23%|██▎ | 2662/11526 [27:46<1:31:08, 1.62it/s] 23%|██▎ | 2663/11526 [27:46<1:31:00, 1.62it/s] {'loss': 0.2693, 'grad_norm': 0.573788046836853, 'learning_rate': 9.486190071202664e-06, 'epoch': 0.69}
23%|██▎ | 2663/11526 [27:46<1:31:00, 1.62it/s] 23%|██▎ | 2664/11526 [27:47<1:30:53, 1.63it/s] {'loss': 0.3088, 'grad_norm': 0.59837806224823, 'learning_rate': 9.485521225004292e-06, 'epoch': 0.69}
23%|██▎ | 2664/11526 [27:47<1:30:53, 1.63it/s] 23%|██▎ | 2665/11526 [27:48<1:30:55, 1.62it/s] {'loss': 0.3344, 'grad_norm': 0.6566981077194214, 'learning_rate': 9.484851967368393e-06, 'epoch': 0.69}
23%|██▎ | 2665/11526 [27:48<1:30:55, 1.62it/s] 23%|██▎ | 2666/11526 [27:48<1:30:51, 1.63it/s] {'loss': 0.3307, 'grad_norm': 0.6045358777046204, 'learning_rate': 9.484182298356355e-06, 'epoch': 0.69}
23%|██▎ | 2666/11526 [27:48<1:30:51, 1.63it/s] 23%|██▎ | 2667/11526 [27:49<1:30:56, 1.62it/s] {'loss': 0.2022, 'grad_norm': 0.5275006294250488, 'learning_rate': 9.4835122180296e-06, 'epoch': 0.69}
23%|██▎ | 2667/11526 [27:49<1:30:56, 1.62it/s] 23%|██▎ | 2668/11526 [27:49<1:30:50, 1.63it/s] {'loss': 0.2278, 'grad_norm': 0.46058544516563416, 'learning_rate': 9.482841726449595e-06, 'epoch': 0.69}
23%|██▎ | 2668/11526 [27:50<1:30:50, 1.63it/s] 23%|██▎ | 2669/11526 [27:50<1:30:46, 1.63it/s] {'loss': 0.3561, 'grad_norm': 0.6424175500869751, 'learning_rate': 9.482170823677842e-06, 'epoch': 0.69}
23%|██▎ | 2669/11526 [27:50<1:30:46, 1.63it/s] 23%|██▎ | 2670/11526 [27:51<1:30:46, 1.63it/s] {'loss': 0.2681, 'grad_norm': 0.5469070672988892, 'learning_rate': 9.481499509775878e-06, 'epoch': 0.69}
23%|██▎ | 2670/11526 [27:51<1:30:46, 1.63it/s] 23%|██▎ | 2671/11526 [27:51<1:30:42, 1.63it/s] {'loss': 0.3577, 'grad_norm': 0.6095420122146606, 'learning_rate': 9.480827784805278e-06, 'epoch': 0.7}
23%|██▎ | 2671/11526 [27:51<1:30:42, 1.63it/s] 23%|██▎ | 2672/11526 [27:52<1:30:49, 1.62it/s] {'loss': 0.2893, 'grad_norm': 0.5549034476280212, 'learning_rate': 9.48015564882766e-06, 'epoch': 0.7}
23%|██▎ | 2672/11526 [27:52<1:30:49, 1.62it/s] 23%|██▎ | 2673/11526 [27:52<1:30:45, 1.63it/s] {'loss': 0.2327, 'grad_norm': 0.5481250286102295, 'learning_rate': 9.479483101904677e-06, 'epoch': 0.7}
23%|██▎ | 2673/11526 [27:53<1:30:45, 1.63it/s] 23%|██▎ | 2674/11526 [27:53<1:30:41, 1.63it/s] {'loss': 0.2767, 'grad_norm': 0.5946663022041321, 'learning_rate': 9.478810144098015e-06, 'epoch': 0.7}
23%|██▎ | 2674/11526 [27:53<1:30:41, 1.63it/s] 23%|██▎ | 2675/11526 [27:54<1:30:43, 1.63it/s] {'loss': 0.2231, 'grad_norm': 0.45745041966438293, 'learning_rate': 9.478136775469404e-06, 'epoch': 0.7}
23%|██▎ | 2675/11526 [27:54<1:30:43, 1.63it/s] 23%|██▎ | 2676/11526 [27:54<1:30:38, 1.63it/s] {'loss': 0.2302, 'grad_norm': 0.5063327550888062, 'learning_rate': 9.477462996080607e-06, 'epoch': 0.7}
23%|██▎ | 2676/11526 [27:54<1:30:38, 1.63it/s] 23%|██▎ | 2677/11526 [27:55<1:30:42, 1.63it/s] {'loss': 0.1789, 'grad_norm': 0.46190959215164185, 'learning_rate': 9.47678880599343e-06, 'epoch': 0.7}
23%|██▎ | 2677/11526 [27:55<1:30:42, 1.63it/s] 23%|██▎ | 2678/11526 [27:56<1:30:43, 1.63it/s] {'loss': 0.2073, 'grad_norm': 0.4519427716732025, 'learning_rate': 9.47611420526971e-06, 'epoch': 0.7}
23%|██▎ | 2678/11526 [27:56<1:30:43, 1.63it/s] 23%|██▎ | 2679/11526 [27:56<1:30:42, 1.63it/s] {'loss': 0.266, 'grad_norm': 0.565636396408081, 'learning_rate': 9.475439193971327e-06, 'epoch': 0.7}
23%|██▎ | 2679/11526 [27:56<1:30:42, 1.63it/s] 23%|██▎ | 2680/11526 [27:57<1:30:45, 1.62it/s] {'loss': 0.2331, 'grad_norm': 0.5052024722099304, 'learning_rate': 9.474763772160198e-06, 'epoch': 0.7}
23%|██▎ | 2680/11526 [27:57<1:30:45, 1.62it/s] 23%|██▎ | 2681/11526 [27:57<1:30:38, 1.63it/s] {'loss': 0.2276, 'grad_norm': 0.5164679884910583, 'learning_rate': 9.474087939898276e-06, 'epoch': 0.7}
23%|██▎ | 2681/11526 [27:58<1:30:38, 1.63it/s] 23%|██▎ | 2682/11526 [27:58<1:30:40, 1.63it/s] {'loss': 0.2516, 'grad_norm': 0.5303423404693604, 'learning_rate': 9.47341169724755e-06, 'epoch': 0.7}
23%|██▎ | 2682/11526 [27:58<1:30:40, 1.63it/s] 23%|██▎ | 2683/11526 [27:59<1:30:35, 1.63it/s] {'loss': 0.2725, 'grad_norm': 0.6206834316253662, 'learning_rate': 9.472735044270052e-06, 'epoch': 0.7}
23%|██▎ | 2683/11526 [27:59<1:30:35, 1.63it/s] 23%|██▎ | 2684/11526 [27:59<1:30:34, 1.63it/s] {'loss': 0.2566, 'grad_norm': 0.5716359615325928, 'learning_rate': 9.472057981027846e-06, 'epoch': 0.7}
23%|██▎ | 2684/11526 [27:59<1:30:34, 1.63it/s] 23%|██▎ | 2685/11526 [28:00<1:30:36, 1.63it/s] {'loss': 0.2426, 'grad_norm': 0.5405532121658325, 'learning_rate': 9.471380507583038e-06, 'epoch': 0.7}
23%|██▎ | 2685/11526 [28:00<1:30:36, 1.63it/s] 23%|██▎ | 2686/11526 [28:00<1:30:33, 1.63it/s] {'loss': 0.354, 'grad_norm': 0.7111836075782776, 'learning_rate': 9.470702623997767e-06, 'epoch': 0.7}
23%|██▎ | 2686/11526 [28:01<1:30:33, 1.63it/s] 23%|██▎ | 2687/11526 [28:01<1:30:32, 1.63it/s] {'loss': 0.3951, 'grad_norm': 0.6595993041992188, 'learning_rate': 9.470024330334216e-06, 'epoch': 0.7}
23%|██▎ | 2687/11526 [28:01<1:30:32, 1.63it/s] 23%|██▎ | 2688/11526 [28:02<1:30:31, 1.63it/s] {'loss': 0.3002, 'grad_norm': 0.5531798601150513, 'learning_rate': 9.469345626654597e-06, 'epoch': 0.7}
23%|██▎ | 2688/11526 [28:02<1:30:31, 1.63it/s] 23%|██▎ | 2689/11526 [28:02<1:30:25, 1.63it/s] {'loss': 0.2433, 'grad_norm': 0.5693020820617676, 'learning_rate': 9.468666513021171e-06, 'epoch': 0.7}
23%|██▎ | 2689/11526 [28:02<1:30:25, 1.63it/s] 23%|██▎ | 2690/11526 [28:03<1:30:35, 1.63it/s] {'loss': 0.2747, 'grad_norm': 1.8414231538772583, 'learning_rate': 9.467986989496225e-06, 'epoch': 0.7}
23%|██▎ | 2690/11526 [28:03<1:30:35, 1.63it/s] 23%|██▎ | 2691/11526 [28:04<1:30:36, 1.63it/s] {'loss': 0.2637, 'grad_norm': 0.6385213732719421, 'learning_rate': 9.467307056142092e-06, 'epoch': 0.7}
23%|██▎ | 2691/11526 [28:04<1:30:36, 1.63it/s] 23%|██▎ | 2692/11526 [28:04<1:30:33, 1.63it/s] {'loss': 0.2574, 'grad_norm': 0.5387895107269287, 'learning_rate': 9.466626713021137e-06, 'epoch': 0.7}
23%|██▎ | 2692/11526 [28:04<1:30:33, 1.63it/s] 23%|██▎ | 2693/11526 [28:05<1:30:32, 1.63it/s] {'loss': 0.2964, 'grad_norm': 0.5511511564254761, 'learning_rate': 9.465945960195766e-06, 'epoch': 0.7}
23%|██▎ | 2693/11526 [28:05<1:30:32, 1.63it/s] 23%|██▎ | 2694/11526 [28:05<1:30:31, 1.63it/s] {'loss': 0.2711, 'grad_norm': 0.6490522027015686, 'learning_rate': 9.46526479772842e-06, 'epoch': 0.7}
23%|██▎ | 2694/11526 [28:06<1:30:31, 1.63it/s] 23%|██▎ | 2695/11526 [28:06<1:30:32, 1.63it/s] {'loss': 0.2462, 'grad_norm': 0.5374130606651306, 'learning_rate': 9.464583225681582e-06, 'epoch': 0.7}
23%|██▎ | 2695/11526 [28:06<1:30:32, 1.63it/s] 23%|██▎ | 2696/11526 [28:07<1:30:30, 1.63it/s] {'loss': 0.3341, 'grad_norm': 0.6130473017692566, 'learning_rate': 9.463901244117767e-06, 'epoch': 0.7}
23%|██▎ | 2696/11526 [28:07<1:30:30, 1.63it/s] 23%|██▎ | 2697/11526 [28:07<1:30:32, 1.63it/s] {'loss': 0.2459, 'grad_norm': 0.5827908515930176, 'learning_rate': 9.463218853099531e-06, 'epoch': 0.7}
23%|██▎ | 2697/11526 [28:07<1:30:32, 1.63it/s] 23%|██▎ | 2698/11526 [28:08<1:30:26, 1.63it/s] {'loss': 0.3015, 'grad_norm': 0.5690516829490662, 'learning_rate': 9.462536052689469e-06, 'epoch': 0.7}
23%|██▎ | 2698/11526 [28:08<1:30:26, 1.63it/s] 23%|██▎ | 2699/11526 [28:08<1:30:24, 1.63it/s] {'loss': 0.2965, 'grad_norm': 0.6145632266998291, 'learning_rate': 9.46185284295021e-06, 'epoch': 0.7}
23%|██▎ | 2699/11526 [28:09<1:30:24, 1.63it/s] 23%|██▎ | 2700/11526 [28:09<1:30:31, 1.63it/s] {'loss': 0.2666, 'grad_norm': 0.5665719509124756, 'learning_rate': 9.46116922394442e-06, 'epoch': 0.7}
23%|██▎ | 2700/11526 [28:09<1:30:31, 1.63it/s] 23%|██▎ | 2701/11526 [28:10<1:30:28, 1.63it/s] {'loss': 0.2732, 'grad_norm': 0.5651576519012451, 'learning_rate': 9.460485195734805e-06, 'epoch': 0.7}
23%|██▎ | 2701/11526 [28:10<1:30:28, 1.63it/s] 23%|██▎ | 2702/11526 [28:10<1:30:28, 1.63it/s] {'loss': 0.2106, 'grad_norm': 0.5136215686798096, 'learning_rate': 9.459800758384111e-06, 'epoch': 0.7}
23%|██▎ | 2702/11526 [28:10<1:30:28, 1.63it/s] 23%|██▎ | 2703/11526 [28:11<1:30:23, 1.63it/s] {'loss': 0.2287, 'grad_norm': 0.5207316875457764, 'learning_rate': 9.459115911955112e-06, 'epoch': 0.7}
23%|██▎ | 2703/11526 [28:11<1:30:23, 1.63it/s] 23%|██▎ | 2704/11526 [28:12<1:30:20, 1.63it/s] {'loss': 0.2757, 'grad_norm': 0.6208451986312866, 'learning_rate': 9.458430656510634e-06, 'epoch': 0.7}
23%|██▎ | 2704/11526 [28:12<1:30:20, 1.63it/s] 23%|██▎ | 2705/11526 [28:12<1:30:22, 1.63it/s] {'loss': 0.2218, 'grad_norm': 0.46891799569129944, 'learning_rate': 9.457744992113526e-06, 'epoch': 0.7}
23%|██▎ | 2705/11526 [28:12<1:30:22, 1.63it/s] 23%|██▎ | 2706/11526 [28:13<1:30:21, 1.63it/s] {'loss': 0.2098, 'grad_norm': 0.5086029767990112, 'learning_rate': 9.457058918826687e-06, 'epoch': 0.7}
23%|██▎ | 2706/11526 [28:13<1:30:21, 1.63it/s] 23%|██▎ | 2707/11526 [28:13<1:30:18, 1.63it/s] {'loss': 0.3007, 'grad_norm': 0.5248028635978699, 'learning_rate': 9.456372436713044e-06, 'epoch': 0.7}
23%|██▎ | 2707/11526 [28:14<1:30:18, 1.63it/s] 23%|██▎ | 2708/11526 [28:14<1:30:15, 1.63it/s] {'loss': 0.2952, 'grad_norm': 0.5515304207801819, 'learning_rate': 9.455685545835562e-06, 'epoch': 0.7}
23%|██▎ | 2708/11526 [28:14<1:30:15, 1.63it/s] 24%|██▎ | 2709/11526 [28:15<1:30:14, 1.63it/s] {'loss': 0.2693, 'grad_norm': 0.5767040848731995, 'learning_rate': 9.454998246257253e-06, 'epoch': 0.71}
24%|██▎ | 2709/11526 [28:15<1:30:14, 1.63it/s] 24%|██▎ | 2710/11526 [28:15<1:30:17, 1.63it/s] {'loss': 0.1968, 'grad_norm': 0.5134610533714294, 'learning_rate': 9.454310538041157e-06, 'epoch': 0.71}
24%|██▎ | 2710/11526 [28:15<1:30:17, 1.63it/s] 24%|██▎ | 2711/11526 [28:16<1:30:15, 1.63it/s] {'loss': 0.2883, 'grad_norm': 0.6335015892982483, 'learning_rate': 9.453622421250353e-06, 'epoch': 0.71}
24%|██▎ | 2711/11526 [28:16<1:30:15, 1.63it/s] 24%|██▎ | 2712/11526 [28:16<1:30:13, 1.63it/s] {'loss': 0.2606, 'grad_norm': 0.5333151817321777, 'learning_rate': 9.45293389594796e-06, 'epoch': 0.71}
24%|██▎ | 2712/11526 [28:17<1:30:13, 1.63it/s] 24%|██▎ | 2713/11526 [28:17<1:30:13, 1.63it/s] {'loss': 0.2411, 'grad_norm': 0.5133199095726013, 'learning_rate': 9.452244962197135e-06, 'epoch': 0.71}
24%|██▎ | 2713/11526 [28:17<1:30:13, 1.63it/s] 24%|██▎ | 2714/11526 [28:18<1:30:13, 1.63it/s] {'loss': 0.2578, 'grad_norm': 0.5454739928245544, 'learning_rate': 9.45155562006107e-06, 'epoch': 0.71}
24%|██▎ | 2714/11526 [28:18<1:30:13, 1.63it/s] 24%|██▎ | 2715/11526 [28:18<1:30:14, 1.63it/s] {'loss': 0.326, 'grad_norm': 0.5895969867706299, 'learning_rate': 9.450865869602996e-06, 'epoch': 0.71}
24%|██▎ | 2715/11526 [28:18<1:30:14, 1.63it/s] 24%|██▎ | 2716/11526 [28:19<1:30:11, 1.63it/s] {'loss': 0.2977, 'grad_norm': 0.618649959564209, 'learning_rate': 9.450175710886179e-06, 'epoch': 0.71}
24%|██▎ | 2716/11526 [28:19<1:30:11, 1.63it/s] 24%|██▎ | 2717/11526 [28:20<1:30:09, 1.63it/s] {'loss': 0.2327, 'grad_norm': 0.5424466133117676, 'learning_rate': 9.449485143973924e-06, 'epoch': 0.71}
24%|██▎ | 2717/11526 [28:20<1:30:09, 1.63it/s] 24%|██▎ | 2718/11526 [28:20<1:30:08, 1.63it/s] {'loss': 0.2526, 'grad_norm': 0.5713294148445129, 'learning_rate': 9.448794168929576e-06, 'epoch': 0.71}
24%|██▎ | 2718/11526 [28:20<1:30:08, 1.63it/s] 24%|██▎ | 2719/11526 [28:21<1:30:05, 1.63it/s] {'loss': 0.2544, 'grad_norm': 0.5772705674171448, 'learning_rate': 9.448102785816515e-06, 'epoch': 0.71}
24%|██▎ | 2719/11526 [28:21<1:30:05, 1.63it/s] 24%|██▎ | 2720/11526 [28:21<1:30:06, 1.63it/s] {'loss': 0.2921, 'grad_norm': 0.633676290512085, 'learning_rate': 9.447410994698159e-06, 'epoch': 0.71}
24%|██▎ | 2720/11526 [28:21<1:30:06, 1.63it/s] 24%|██▎ | 2721/11526 [28:22<1:30:07, 1.63it/s] {'loss': 0.3475, 'grad_norm': 0.7320243716239929, 'learning_rate': 9.44671879563796e-06, 'epoch': 0.71}
24%|██▎ | 2721/11526 [28:22<1:30:07, 1.63it/s] 24%|██▎ | 2722/11526 [28:23<1:30:03, 1.63it/s] {'loss': 0.2453, 'grad_norm': 0.5239514112472534, 'learning_rate': 9.446026188699413e-06, 'epoch': 0.71}
24%|██▎ | 2722/11526 [28:23<1:30:03, 1.63it/s] 24%|██▎ | 2723/11526 [28:23<1:30:03, 1.63it/s] {'loss': 0.2614, 'grad_norm': 0.5893319249153137, 'learning_rate': 9.445333173946047e-06, 'epoch': 0.71}
24%|██▎ | 2723/11526 [28:23<1:30:03, 1.63it/s] 24%|██▎ | 2724/11526 [28:24<1:30:04, 1.63it/s] {'loss': 0.2449, 'grad_norm': 0.5111149549484253, 'learning_rate': 9.44463975144143e-06, 'epoch': 0.71}
24%|██▎ | 2724/11526 [28:24<1:30:04, 1.63it/s] 24%|██▎ | 2725/11526 [28:24<1:30:04, 1.63it/s] {'loss': 0.2643, 'grad_norm': 0.5985398292541504, 'learning_rate': 9.443945921249164e-06, 'epoch': 0.71}
24%|██▎ | 2725/11526 [28:25<1:30:04, 1.63it/s] 24%|██▎ | 2726/11526 [28:25<1:30:01, 1.63it/s] {'loss': 0.2915, 'grad_norm': 0.49801158905029297, 'learning_rate': 9.443251683432893e-06, 'epoch': 0.71}
24%|██▎ | 2726/11526 [28:25<1:30:01, 1.63it/s] 24%|██▎ | 2727/11526 [28:26<1:30:01, 1.63it/s] {'loss': 0.3047, 'grad_norm': 0.5340232253074646, 'learning_rate': 9.4425570380563e-06, 'epoch': 0.71}
24%|██▎ | 2727/11526 [28:26<1:30:01, 1.63it/s] 24%|██▎ | 2728/11526 [28:26<1:29:59, 1.63it/s] {'loss': 0.2487, 'grad_norm': 0.5622614026069641, 'learning_rate': 9.441861985183094e-06, 'epoch': 0.71}
24%|██▎ | 2728/11526 [28:26<1:29:59, 1.63it/s] 24%|██▎ | 2729/11526 [28:27<1:30:01, 1.63it/s] {'loss': 0.3103, 'grad_norm': 0.5992222428321838, 'learning_rate': 9.441166524877036e-06, 'epoch': 0.71}
24%|██▎ | 2729/11526 [28:27<1:30:01, 1.63it/s] 24%|██▎ | 2730/11526 [28:28<1:30:00, 1.63it/s] {'loss': 0.2667, 'grad_norm': 0.5860572457313538, 'learning_rate': 9.440470657201915e-06, 'epoch': 0.71}
24%|██▎ | 2730/11526 [28:28<1:30:00, 1.63it/s] 24%|██▎ | 2731/11526 [28:28<1:29:57, 1.63it/s] {'loss': 0.281, 'grad_norm': 0.5827460289001465, 'learning_rate': 9.43977438222156e-06, 'epoch': 0.71}
24%|██▎ | 2731/11526 [28:28<1:29:57, 1.63it/s] 24%|██▎ | 2732/11526 [28:29<1:29:58, 1.63it/s] {'loss': 0.2468, 'grad_norm': 0.5219461917877197, 'learning_rate': 9.439077699999838e-06, 'epoch': 0.71}
24%|██▎ | 2732/11526 [28:29<1:29:58, 1.63it/s] 24%|██▎ | 2733/11526 [28:29<1:30:00, 1.63it/s] {'loss': 0.3133, 'grad_norm': 0.6245459914207458, 'learning_rate': 9.438380610600652e-06, 'epoch': 0.71}
24%|██▎ | 2733/11526 [28:29<1:30:00, 1.63it/s] 24%|██▎ | 2734/11526 [28:30<1:30:02, 1.63it/s] {'loss': 0.3178, 'grad_norm': 0.7219768166542053, 'learning_rate': 9.437683114087941e-06, 'epoch': 0.71}
24%|██▎ | 2734/11526 [28:30<1:30:02, 1.63it/s] 24%|██▎ | 2735/11526 [28:31<1:30:00, 1.63it/s] {'loss': 0.3111, 'grad_norm': 0.6441379189491272, 'learning_rate': 9.436985210525687e-06, 'epoch': 0.71}
24%|██▎ | 2735/11526 [28:31<1:30:00, 1.63it/s] 24%|██▎ | 2736/11526 [28:31<1:30:01, 1.63it/s] {'loss': 0.2667, 'grad_norm': 0.5163134932518005, 'learning_rate': 9.436286899977905e-06, 'epoch': 0.71}
24%|██▎ | 2736/11526 [28:31<1:30:01, 1.63it/s] 24%|██▎ | 2737/11526 [28:32<1:29:57, 1.63it/s] {'loss': 0.2445, 'grad_norm': 0.5817526578903198, 'learning_rate': 9.435588182508646e-06, 'epoch': 0.71}
24%|██▎ | 2737/11526 [28:32<1:29:57, 1.63it/s] 24%|██▍ | 2738/11526 [28:32<1:30:00, 1.63it/s] {'loss': 0.2836, 'grad_norm': 0.5765065550804138, 'learning_rate': 9.434889058182002e-06, 'epoch': 0.71}
24%|██▍ | 2738/11526 [28:33<1:30:00, 1.63it/s] 24%|██▍ | 2739/11526 [28:33<1:29:59, 1.63it/s] {'loss': 0.2879, 'grad_norm': 0.6494408249855042, 'learning_rate': 9.4341895270621e-06, 'epoch': 0.71}
24%|██▍ | 2739/11526 [28:33<1:29:59, 1.63it/s] 24%|██▍ | 2740/11526 [28:34<1:29:57, 1.63it/s] {'loss': 0.2176, 'grad_norm': 0.5095300674438477, 'learning_rate': 9.433489589213103e-06, 'epoch': 0.71}
24%|██▍ | 2740/11526 [28:34<1:29:57, 1.63it/s] 24%|██▍ | 2741/11526 [28:34<1:29:59, 1.63it/s] {'loss': 0.2685, 'grad_norm': 0.5593273639678955, 'learning_rate': 9.432789244699218e-06, 'epoch': 0.71}
24%|██▍ | 2741/11526 [28:34<1:29:59, 1.63it/s] 24%|██▍ | 2742/11526 [28:35<1:29:57, 1.63it/s] {'loss': 0.24, 'grad_norm': 0.6050967574119568, 'learning_rate': 9.432088493584682e-06, 'epoch': 0.71}
24%|██▍ | 2742/11526 [28:35<1:29:57, 1.63it/s] 24%|██▍ | 2743/11526 [28:35<1:29:54, 1.63it/s] {'loss': 0.243, 'grad_norm': 0.49076634645462036, 'learning_rate': 9.431387335933769e-06, 'epoch': 0.71}
24%|██▍ | 2743/11526 [28:36<1:29:54, 1.63it/s] 24%|██▍ | 2744/11526 [28:36<1:29:52, 1.63it/s] {'loss': 0.275, 'grad_norm': 0.5874615907669067, 'learning_rate': 9.430685771810797e-06, 'epoch': 0.71}
24%|██▍ | 2744/11526 [28:36<1:29:52, 1.63it/s] 24%|██▍ | 2745/11526 [28:37<1:29:51, 1.63it/s] {'loss': 0.2782, 'grad_norm': 0.6004157662391663, 'learning_rate': 9.429983801280118e-06, 'epoch': 0.71}
24%|██▍ | 2745/11526 [28:37<1:29:51, 1.63it/s] 24%|██▍ | 2746/11526 [28:37<1:29:50, 1.63it/s] {'loss': 0.2607, 'grad_norm': 0.5977951288223267, 'learning_rate': 9.429281424406118e-06, 'epoch': 0.71}
24%|██▍ | 2746/11526 [28:37<1:29:50, 1.63it/s] 24%|██▍ | 2747/11526 [28:38<1:29:58, 1.63it/s] {'loss': 0.3217, 'grad_norm': 0.584911584854126, 'learning_rate': 9.428578641253226e-06, 'epoch': 0.71}
24%|██▍ | 2747/11526 [28:38<1:29:58, 1.63it/s] 24%|██▍ | 2748/11526 [28:39<1:29:54, 1.63it/s] {'loss': 0.2333, 'grad_norm': 0.5358606576919556, 'learning_rate': 9.427875451885902e-06, 'epoch': 0.72}
24%|██▍ | 2748/11526 [28:39<1:29:54, 1.63it/s] 24%|██▍ | 2749/11526 [28:39<1:29:51, 1.63it/s] {'loss': 0.2698, 'grad_norm': 0.5329960584640503, 'learning_rate': 9.427171856368647e-06, 'epoch': 0.72}
24%|██▍ | 2749/11526 [28:39<1:29:51, 1.63it/s] 24%|██▍ | 2750/11526 [28:40<1:29:49, 1.63it/s] {'loss': 0.2236, 'grad_norm': 0.536318838596344, 'learning_rate': 9.426467854766002e-06, 'epoch': 0.72}
24%|██▍ | 2750/11526 [28:40<1:29:49, 1.63it/s] 24%|██▍ | 2751/11526 [28:40<1:29:49, 1.63it/s] {'loss': 0.2036, 'grad_norm': 0.502009391784668, 'learning_rate': 9.425763447142538e-06, 'epoch': 0.72}
24%|██▍ | 2751/11526 [28:41<1:29:49, 1.63it/s] 24%|██▍ | 2752/11526 [28:41<1:29:52, 1.63it/s] {'loss': 0.2666, 'grad_norm': 0.5540882349014282, 'learning_rate': 9.42505863356287e-06, 'epoch': 0.72}
24%|██▍ | 2752/11526 [28:41<1:29:52, 1.63it/s] 24%|██▍ | 2753/11526 [28:42<1:29:49, 1.63it/s] {'loss': 0.2727, 'grad_norm': 0.6310216188430786, 'learning_rate': 9.424353414091645e-06, 'epoch': 0.72}
24%|██▍ | 2753/11526 [28:42<1:29:49, 1.63it/s] 24%|██▍ | 2754/11526 [28:42<1:29:47, 1.63it/s] {'loss': 0.2104, 'grad_norm': 0.5097343325614929, 'learning_rate': 9.423647788793555e-06, 'epoch': 0.72}
24%|██▍ | 2754/11526 [28:42<1:29:47, 1.63it/s] 24%|██▍ | 2755/11526 [28:43<1:29:49, 1.63it/s] {'loss': 0.3214, 'grad_norm': 0.6064619421958923, 'learning_rate': 9.422941757733318e-06, 'epoch': 0.72}
24%|██▍ | 2755/11526 [28:43<1:29:49, 1.63it/s] 24%|██▍ | 2756/11526 [28:43<1:29:45, 1.63it/s] {'loss': 0.2797, 'grad_norm': 0.6283191442489624, 'learning_rate': 9.422235320975697e-06, 'epoch': 0.72}
24%|██▍ | 2756/11526 [28:44<1:29:45, 1.63it/s] 24%|██▍ | 2757/11526 [28:44<1:29:43, 1.63it/s] {'loss': 0.2194, 'grad_norm': 0.5210425853729248, 'learning_rate': 9.421528478585492e-06, 'epoch': 0.72}
24%|██▍ | 2757/11526 [28:44<1:29:43, 1.63it/s] 24%|██▍ | 2758/11526 [28:45<1:29:41, 1.63it/s] {'loss': 0.3541, 'grad_norm': 0.6775555610656738, 'learning_rate': 9.420821230627534e-06, 'epoch': 0.72}
24%|██▍ | 2758/11526 [28:45<1:29:41, 1.63it/s] 24%|██▍ | 2759/11526 [28:45<1:29:41, 1.63it/s] {'loss': 0.1714, 'grad_norm': 0.4345233142375946, 'learning_rate': 9.420113577166702e-06, 'epoch': 0.72}
24%|██▍ | 2759/11526 [28:45<1:29:41, 1.63it/s] 24%|██▍ | 2760/11526 [28:46<1:29:41, 1.63it/s] {'loss': 0.2259, 'grad_norm': 0.5533421039581299, 'learning_rate': 9.419405518267904e-06, 'epoch': 0.72}
24%|██▍ | 2760/11526 [28:46<1:29:41, 1.63it/s] 24%|██▍ | 2761/11526 [28:47<1:29:38, 1.63it/s] {'loss': 0.313, 'grad_norm': 0.6464202404022217, 'learning_rate': 9.418697053996086e-06, 'epoch': 0.72}
24%|██▍ | 2761/11526 [28:47<1:29:38, 1.63it/s] 24%|██▍ | 2762/11526 [28:47<1:29:38, 1.63it/s] {'loss': 0.2329, 'grad_norm': 0.4883823096752167, 'learning_rate': 9.417988184416234e-06, 'epoch': 0.72}
24%|██▍ | 2762/11526 [28:47<1:29:38, 1.63it/s] 24%|██▍ | 2763/11526 [28:48<1:29:40, 1.63it/s] {'loss': 0.2528, 'grad_norm': 0.5461239814758301, 'learning_rate': 9.417278909593366e-06, 'epoch': 0.72}
24%|██▍ | 2763/11526 [28:48<1:29:40, 1.63it/s] 24%|██▍ | 2764/11526 [28:48<1:29:41, 1.63it/s] {'loss': 0.2579, 'grad_norm': 0.47375041246414185, 'learning_rate': 9.416569229592545e-06, 'epoch': 0.72}
24%|██▍ | 2764/11526 [28:49<1:29:41, 1.63it/s] 24%|██▍ | 2765/11526 [28:49<1:29:38, 1.63it/s] {'loss': 0.2068, 'grad_norm': 0.5144727826118469, 'learning_rate': 9.415859144478864e-06, 'epoch': 0.72}
24%|██▍ | 2765/11526 [28:49<1:29:38, 1.63it/s] 24%|██▍ | 2766/11526 [28:50<1:29:42, 1.63it/s] {'loss': 0.2294, 'grad_norm': 0.531468391418457, 'learning_rate': 9.415148654317456e-06, 'epoch': 0.72}
24%|██▍ | 2766/11526 [28:50<1:29:42, 1.63it/s] 24%|██▍ | 2767/11526 [28:50<1:29:39, 1.63it/s] {'loss': 0.2768, 'grad_norm': 0.5810679793357849, 'learning_rate': 9.414437759173495e-06, 'epoch': 0.72}
24%|██▍ | 2767/11526 [28:50<1:29:39, 1.63it/s] 24%|██▍ | 2768/11526 [28:51<1:29:40, 1.63it/s] {'loss': 0.3656, 'grad_norm': 0.5467879176139832, 'learning_rate': 9.413726459112185e-06, 'epoch': 0.72}
24%|██▍ | 2768/11526 [28:51<1:29:40, 1.63it/s] 24%|██▍ | 2769/11526 [28:51<1:29:37, 1.63it/s] {'loss': 0.3592, 'grad_norm': 0.5825479030609131, 'learning_rate': 9.41301475419877e-06, 'epoch': 0.72}
24%|██▍ | 2769/11526 [28:52<1:29:37, 1.63it/s] 24%|██▍ | 2770/11526 [28:52<1:29:38, 1.63it/s] {'loss': 0.2172, 'grad_norm': 0.5234613418579102, 'learning_rate': 9.412302644498532e-06, 'epoch': 0.72}
24%|██▍ | 2770/11526 [28:52<1:29:38, 1.63it/s] 24%|██▍ | 2771/11526 [28:53<1:29:35, 1.63it/s] {'loss': 0.3147, 'grad_norm': 0.7082445621490479, 'learning_rate': 9.411590130076792e-06, 'epoch': 0.72}
24%|██▍ | 2771/11526 [28:53<1:29:35, 1.63it/s] 24%|██▍ | 2772/11526 [28:53<1:29:34, 1.63it/s] {'loss': 0.2585, 'grad_norm': 0.5906031131744385, 'learning_rate': 9.410877210998903e-06, 'epoch': 0.72}
24%|██▍ | 2772/11526 [28:53<1:29:34, 1.63it/s] 24%|██▍ | 2773/11526 [28:54<1:29:33, 1.63it/s] {'loss': 0.275, 'grad_norm': 0.5006771087646484, 'learning_rate': 9.41016388733026e-06, 'epoch': 0.72}
24%|██▍ | 2773/11526 [28:54<1:29:33, 1.63it/s] 24%|██▍ | 2774/11526 [28:55<1:29:30, 1.63it/s] {'loss': 0.318, 'grad_norm': 0.708152711391449, 'learning_rate': 9.409450159136293e-06, 'epoch': 0.72}
24%|██▍ | 2774/11526 [28:55<1:29:30, 1.63it/s] 24%|██▍ | 2775/11526 [28:55<1:29:30, 1.63it/s] {'loss': 0.3187, 'grad_norm': 0.8195157647132874, 'learning_rate': 9.408736026482469e-06, 'epoch': 0.72}
24%|██▍ | 2775/11526 [28:55<1:29:30, 1.63it/s] 24%|██▍ | 2776/11526 [28:56<1:29:29, 1.63it/s] {'loss': 0.3552, 'grad_norm': 0.5943922996520996, 'learning_rate': 9.408021489434291e-06, 'epoch': 0.72}
24%|██▍ | 2776/11526 [28:56<1:29:29, 1.63it/s] 24%|██▍ | 2777/11526 [28:56<1:29:32, 1.63it/s] {'loss': 0.2188, 'grad_norm': 0.5907689929008484, 'learning_rate': 9.407306548057302e-06, 'epoch': 0.72}
24%|██▍ | 2777/11526 [28:56<1:29:32, 1.63it/s] 24%|██▍ | 2778/11526 [28:57<1:29:36, 1.63it/s] {'loss': 0.208, 'grad_norm': 0.5128298401832581, 'learning_rate': 9.406591202417079e-06, 'epoch': 0.72}
24%|██▍ | 2778/11526 [28:57<1:29:36, 1.63it/s] 24%|██▍ | 2779/11526 [28:58<1:29:36, 1.63it/s] {'loss': 0.2161, 'grad_norm': 0.49965962767601013, 'learning_rate': 9.405875452579239e-06, 'epoch': 0.72}
24%|██▍ | 2779/11526 [28:58<1:29:36, 1.63it/s] 24%|██▍ | 2780/11526 [28:58<1:29:37, 1.63it/s] {'loss': 0.2329, 'grad_norm': 0.525865375995636, 'learning_rate': 9.405159298609435e-06, 'epoch': 0.72}
24%|██▍ | 2780/11526 [28:58<1:29:37, 1.63it/s] 24%|██▍ | 2781/11526 [28:59<1:29:30, 1.63it/s] {'loss': 0.3035, 'grad_norm': 0.6320933699607849, 'learning_rate': 9.404442740573355e-06, 'epoch': 0.72}
24%|██▍ | 2781/11526 [28:59<1:29:30, 1.63it/s] 24%|██▍ | 2782/11526 [28:59<1:29:33, 1.63it/s] {'loss': 0.3213, 'grad_norm': 0.678416907787323, 'learning_rate': 9.403725778536726e-06, 'epoch': 0.72}
24%|██▍ | 2782/11526 [29:00<1:29:33, 1.63it/s] 24%|██▍ | 2783/11526 [29:00<1:29:31, 1.63it/s] {'loss': 0.2158, 'grad_norm': 0.4638051986694336, 'learning_rate': 9.403008412565313e-06, 'epoch': 0.72}
24%|██▍ | 2783/11526 [29:00<1:29:31, 1.63it/s] 24%|██▍ | 2784/11526 [29:01<1:29:30, 1.63it/s] {'loss': 0.2455, 'grad_norm': 0.5626169443130493, 'learning_rate': 9.402290642724917e-06, 'epoch': 0.72}
24%|██▍ | 2784/11526 [29:01<1:29:30, 1.63it/s] 24%|██▍ | 2785/11526 [29:01<1:29:31, 1.63it/s] {'loss': 0.1935, 'grad_norm': 0.5454785823822021, 'learning_rate': 9.401572469081374e-06, 'epoch': 0.72}
24%|██▍ | 2785/11526 [29:01<1:29:31, 1.63it/s] 24%|██▍ | 2786/11526 [29:02<1:29:26, 1.63it/s] {'loss': 0.3053, 'grad_norm': 0.5928676128387451, 'learning_rate': 9.40085389170056e-06, 'epoch': 0.73}
24%|██▍ | 2786/11526 [29:02<1:29:26, 1.63it/s] 24%|██▍ | 2787/11526 [29:03<1:29:26, 1.63it/s] {'loss': 0.2541, 'grad_norm': 0.5874133110046387, 'learning_rate': 9.400134910648389e-06, 'epoch': 0.73}
24%|██▍ | 2787/11526 [29:03<1:29:26, 1.63it/s] 24%|██▍ | 2788/11526 [29:03<1:29:25, 1.63it/s] {'loss': 0.2006, 'grad_norm': 0.4911229610443115, 'learning_rate': 9.399415525990807e-06, 'epoch': 0.73}
24%|██▍ | 2788/11526 [29:03<1:29:25, 1.63it/s] 24%|██▍ | 2789/11526 [29:04<1:29:25, 1.63it/s] {'loss': 0.3266, 'grad_norm': 0.6529443860054016, 'learning_rate': 9.398695737793802e-06, 'epoch': 0.73}
24%|██▍ | 2789/11526 [29:04<1:29:25, 1.63it/s] 24%|██▍ | 2790/11526 [29:04<1:29:24, 1.63it/s] {'loss': 0.1862, 'grad_norm': 0.43829381465911865, 'learning_rate': 9.397975546123395e-06, 'epoch': 0.73}
24%|██▍ | 2790/11526 [29:04<1:29:24, 1.63it/s] 24%|██▍ | 2791/11526 [29:05<1:29:26, 1.63it/s] {'loss': 0.2194, 'grad_norm': 0.5320106744766235, 'learning_rate': 9.397254951045649e-06, 'epoch': 0.73}
24%|██▍ | 2791/11526 [29:05<1:29:26, 1.63it/s] 24%|██▍ | 2792/11526 [29:06<1:29:24, 1.63it/s] {'loss': 0.2939, 'grad_norm': 0.5630029439926147, 'learning_rate': 9.396533952626659e-06, 'epoch': 0.73}
24%|██▍ | 2792/11526 [29:06<1:29:24, 1.63it/s] 24%|██▍ | 2793/11526 [29:06<1:29:23, 1.63it/s] {'loss': 0.296, 'grad_norm': 0.7668532133102417, 'learning_rate': 9.395812550932559e-06, 'epoch': 0.73}
24%|██▍ | 2793/11526 [29:06<1:29:23, 1.63it/s] 24%|██▍ | 2794/11526 [29:07<1:29:24, 1.63it/s] {'loss': 0.2334, 'grad_norm': 0.5746144652366638, 'learning_rate': 9.395090746029522e-06, 'epoch': 0.73}
24%|██▍ | 2794/11526 [29:07<1:29:24, 1.63it/s] 24%|██▍ | 2795/11526 [29:07<1:29:21, 1.63it/s] {'loss': 0.2544, 'grad_norm': 0.5988734364509583, 'learning_rate': 9.394368537983754e-06, 'epoch': 0.73}
24%|██▍ | 2795/11526 [29:08<1:29:21, 1.63it/s] 24%|██▍ | 2796/11526 [29:08<1:29:22, 1.63it/s] {'loss': 0.2528, 'grad_norm': 0.6238309144973755, 'learning_rate': 9.3936459268615e-06, 'epoch': 0.73}
24%|██▍ | 2796/11526 [29:08<1:29:22, 1.63it/s] 24%|██▍ | 2797/11526 [29:09<1:29:20, 1.63it/s] {'loss': 0.2305, 'grad_norm': 0.5359101295471191, 'learning_rate': 9.392922912729043e-06, 'epoch': 0.73}
24%|██▍ | 2797/11526 [29:09<1:29:20, 1.63it/s] 24%|██▍ | 2798/11526 [29:09<1:29:21, 1.63it/s] {'loss': 0.2293, 'grad_norm': 0.5716822147369385, 'learning_rate': 9.392199495652703e-06, 'epoch': 0.73}
24%|██▍ | 2798/11526 [29:09<1:29:21, 1.63it/s] 24%|██▍ | 2799/11526 [29:10<1:29:20, 1.63it/s] {'loss': 0.2404, 'grad_norm': 0.5568745136260986, 'learning_rate': 9.391475675698834e-06, 'epoch': 0.73}
24%|██▍ | 2799/11526 [29:10<1:29:20, 1.63it/s] 24%|██▍ | 2800/11526 [29:10<1:29:21, 1.63it/s] {'loss': 0.268, 'grad_norm': 0.5937018990516663, 'learning_rate': 9.390751452933829e-06, 'epoch': 0.73}
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.6657302975654602, 'eval_runtime': 1.9557, 'eval_samples_per_second': 102.266, 'eval_steps_per_second': 6.647, 'epoch': 0.73}
 24%|██▍ | 2801/11526 [29:13<2:54:52, 1.20s/it] {'loss': 0.3059, 'grad_norm': 0.5785420536994934, 'learning_rate': 9.390026827424119e-06, 'epoch': 0.73}
24%|██▍ | 2801/11526 [29:13<2:54:52, 1.20s/it] 24%|██▍ | 2802/11526 [29:14<2:29:10, 1.03s/it] {'loss': 0.1982, 'grad_norm': 0.5152596831321716, 'learning_rate': 9.38930179923617e-06, 'epoch': 0.73}
24%|██▍ | 2802/11526 [29:14<2:29:10, 1.03s/it] 24%|██▍ | 2803/11526 [29:14<2:11:10, 1.11it/s] {'loss': 0.2748, 'grad_norm': 0.6446214318275452, 'learning_rate': 9.388576368436485e-06, 'epoch': 0.73}
24%|██▍ | 2803/11526 [29:14<2:11:10, 1.11it/s] 24%|██▍ | 2804/11526 [29:15<1:58:35, 1.23it/s] {'loss': 0.311, 'grad_norm': 0.6372789740562439, 'learning_rate': 9.387850535091608e-06, 'epoch': 0.73}
24%|██▍ | 2804/11526 [29:15<1:58:35, 1.23it/s] 24%|██▍ | 2805/11526 [29:16<1:49:46, 1.32it/s] {'loss': 0.2363, 'grad_norm': 0.5253145098686218, 'learning_rate': 9.387124299268113e-06, 'epoch': 0.73}
24%|██▍ | 2805/11526 [29:16<1:49:46, 1.32it/s] 24%|██▍ | 2806/11526 [29:16<1:43:37, 1.40it/s] {'loss': 0.2548, 'grad_norm': 0.5018353462219238, 'learning_rate': 9.386397661032615e-06, 'epoch': 0.73}
24%|██▍ | 2806/11526 [29:16<1:43:37, 1.40it/s] 24%|██▍ | 2807/11526 [29:17<1:39:13, 1.46it/s] {'loss': 0.2271, 'grad_norm': 0.5110366344451904, 'learning_rate': 9.385670620451766e-06, 'epoch': 0.73}
24%|██▍ | 2807/11526 [29:17<1:39:13, 1.46it/s] 24%|██▍ | 2808/11526 [29:17<1:36:16, 1.51it/s] {'loss': 0.2672, 'grad_norm': 0.5213480591773987, 'learning_rate': 9.384943177592255e-06, 'epoch': 0.73}
24%|██▍ | 2808/11526 [29:17<1:36:16, 1.51it/s] 24%|██▍ | 2809/11526 [29:18<1:34:10, 1.54it/s] {'loss': 0.2958, 'grad_norm': 0.6404922604560852, 'learning_rate': 9.384215332520805e-06, 'epoch': 0.73}
24%|██▍ | 2809/11526 [29:18<1:34:10, 1.54it/s] 24%|██▍ | 2810/11526 [29:19<1:32:39, 1.57it/s] {'loss': 0.2759, 'grad_norm': 0.5190278887748718, 'learning_rate': 9.38348708530418e-06, 'epoch': 0.73}
24%|██▍ | 2810/11526 [29:19<1:32:39, 1.57it/s] 24%|██▍ | 2811/11526 [29:19<1:31:36, 1.59it/s] {'loss': 0.2224, 'grad_norm': 0.46640852093696594, 'learning_rate': 9.382758436009179e-06, 'epoch': 0.73}
24%|██▍ | 2811/11526 [29:19<1:31:36, 1.59it/s] 24%|██▍ | 2812/11526 [29:20<1:30:53, 1.60it/s] {'loss': 0.2632, 'grad_norm': 0.6300980448722839, 'learning_rate': 9.382029384702638e-06, 'epoch': 0.73}
24%|██▍ | 2812/11526 [29:20<1:30:53, 1.60it/s] 24%|██▍ | 2813/11526 [29:20<1:30:21, 1.61it/s] {'loss': 0.3116, 'grad_norm': 0.6505565643310547, 'learning_rate': 9.381299931451427e-06, 'epoch': 0.73}
24%|██▍ | 2813/11526 [29:21<1:30:21, 1.61it/s] 24%|██▍ | 2814/11526 [29:21<1:29:59, 1.61it/s] {'loss': 0.1991, 'grad_norm': 0.45023831725120544, 'learning_rate': 9.38057007632246e-06, 'epoch': 0.73}
24%|██▍ | 2814/11526 [29:21<1:29:59, 1.61it/s] 24%|██▍ | 2815/11526 [29:22<1:29:46, 1.62it/s] {'loss': 0.2209, 'grad_norm': 0.5075018405914307, 'learning_rate': 9.37983981938268e-06, 'epoch': 0.73}
24%|██▍ | 2815/11526 [29:22<1:29:46, 1.62it/s] 24%|██▍ | 2816/11526 [29:22<1:29:33, 1.62it/s] {'loss': 0.2687, 'grad_norm': 0.5889638662338257, 'learning_rate': 9.379109160699071e-06, 'epoch': 0.73}
24%|██▍ | 2816/11526 [29:22<1:29:33, 1.62it/s] 24%|██▍ | 2817/11526 [29:23<1:29:28, 1.62it/s] {'loss': 0.3467, 'grad_norm': 0.7642179727554321, 'learning_rate': 9.378378100338655e-06, 'epoch': 0.73}
24%|██▍ | 2817/11526 [29:23<1:29:28, 1.62it/s] 24%|██▍ | 2818/11526 [29:24<1:29:26, 1.62it/s] {'loss': 0.4262, 'grad_norm': 0.7011123895645142, 'learning_rate': 9.377646638368487e-06, 'epoch': 0.73}
24%|██▍ | 2818/11526 [29:24<1:29:26, 1.62it/s] 24%|██▍ | 2819/11526 [29:24<1:29:24, 1.62it/s] {'loss': 0.2241, 'grad_norm': 0.4752791225910187, 'learning_rate': 9.376914774855662e-06, 'epoch': 0.73}
24%|██▍ | 2819/11526 [29:24<1:29:24, 1.62it/s] 24%|██▍ | 2820/11526 [29:25<1:29:21, 1.62it/s] {'loss': 0.2562, 'grad_norm': 0.5050241947174072, 'learning_rate': 9.37618250986731e-06, 'epoch': 0.73}
24%|██▍ | 2820/11526 [29:25<1:29:21, 1.62it/s] 24%|██▍ | 2821/11526 [29:25<1:29:14, 1.63it/s] {'loss': 0.2466, 'grad_norm': 0.49970918893814087, 'learning_rate': 9.375449843470599e-06, 'epoch': 0.73}
24%|██▍ | 2821/11526 [29:25<1:29:14, 1.63it/s] 24%|██▍ | 2822/11526 [29:26<1:29:20, 1.62it/s] {'loss': 0.2817, 'grad_norm': 0.5536237359046936, 'learning_rate': 9.374716775732733e-06, 'epoch': 0.73}
24%|██▍ | 2822/11526 [29:26<1:29:20, 1.62it/s] 24%|██▍ | 2823/11526 [29:27<1:29:15, 1.63it/s] {'loss': 0.2715, 'grad_norm': 0.5560149550437927, 'learning_rate': 9.373983306720953e-06, 'epoch': 0.73}
24%|██▍ | 2823/11526 [29:27<1:29:15, 1.63it/s] 25%|██▍ | 2824/11526 [29:27<1:29:10, 1.63it/s] {'loss': 0.2515, 'grad_norm': 0.5619730353355408, 'learning_rate': 9.373249436502537e-06, 'epoch': 0.74}
25%|██▍ | 2824/11526 [29:27<1:29:10, 1.63it/s] 25%|██▍ | 2825/11526 [29:28<1:29:07, 1.63it/s] {'loss': 0.2805, 'grad_norm': 0.5594207048416138, 'learning_rate': 9.3725151651448e-06, 'epoch': 0.74}
25%|██▍ | 2825/11526 [29:28<1:29:07, 1.63it/s] 25%|██▍ | 2826/11526 [29:28<1:29:04, 1.63it/s] {'loss': 0.2565, 'grad_norm': 0.5651585459709167, 'learning_rate': 9.371780492715092e-06, 'epoch': 0.74}
25%|██▍ | 2826/11526 [29:29<1:29:04, 1.63it/s] 25%|██▍ | 2827/11526 [29:29<1:29:07, 1.63it/s] {'loss': 0.2323, 'grad_norm': 0.551776111125946, 'learning_rate': 9.371045419280806e-06, 'epoch': 0.74}
25%|██▍ | 2827/11526 [29:29<1:29:07, 1.63it/s] 25%|██▍ | 2828/11526 [29:30<1:29:04, 1.63it/s] {'loss': 0.3027, 'grad_norm': 0.6278568506240845, 'learning_rate': 9.370309944909361e-06, 'epoch': 0.74}
25%|██▍ | 2828/11526 [29:30<1:29:04, 1.63it/s] 25%|██▍ | 2829/11526 [29:30<1:29:04, 1.63it/s] {'loss': 0.2567, 'grad_norm': 0.4868076741695404, 'learning_rate': 9.369574069668224e-06, 'epoch': 0.74}
25%|██▍ | 2829/11526 [29:30<1:29:04, 1.63it/s] 25%|██▍ | 2830/11526 [29:31<1:29:03, 1.63it/s] {'loss': 0.2396, 'grad_norm': 0.5140426754951477, 'learning_rate': 9.36883779362489e-06, 'epoch': 0.74}
25%|██▍ | 2830/11526 [29:31<1:29:03, 1.63it/s] 25%|██▍ | 2831/11526 [29:32<1:29:01, 1.63it/s] {'loss': 0.2206, 'grad_norm': 0.522959291934967, 'learning_rate': 9.368101116846898e-06, 'epoch': 0.74}
25%|██▍ | 2831/11526 [29:32<1:29:01, 1.63it/s] 25%|██▍ | 2832/11526 [29:32<1:29:01, 1.63it/s] {'loss': 0.3074, 'grad_norm': 0.5768859386444092, 'learning_rate': 9.367364039401815e-06, 'epoch': 0.74}
25%|██▍ | 2832/11526 [29:32<1:29:01, 1.63it/s] 25%|██▍ | 2833/11526 [29:33<1:28:59, 1.63it/s] {'loss': 0.3002, 'grad_norm': 0.6318361163139343, 'learning_rate': 9.366626561357254e-06, 'epoch': 0.74}
25%|██▍ | 2833/11526 [29:33<1:28:59, 1.63it/s] 25%|██▍ | 2834/11526 [29:33<1:28:58, 1.63it/s] {'loss': 0.2583, 'grad_norm': 0.6193525791168213, 'learning_rate': 9.365888682780862e-06, 'epoch': 0.74}
25%|██▍ | 2834/11526 [29:33<1:28:58, 1.63it/s] 25%|██▍ | 2835/11526 [29:34<1:28:54, 1.63it/s] {'loss': 0.286, 'grad_norm': 0.5068265199661255, 'learning_rate': 9.365150403740316e-06, 'epoch': 0.74}
25%|██▍ | 2835/11526 [29:34<1:28:54, 1.63it/s] 25%|██▍ | 2836/11526 [29:35<1:28:54, 1.63it/s] {'loss': 0.3166, 'grad_norm': 0.6237710118293762, 'learning_rate': 9.36441172430334e-06, 'epoch': 0.74}
25%|██▍ | 2836/11526 [29:35<1:28:54, 1.63it/s] 25%|██▍ | 2837/11526 [29:35<1:28:56, 1.63it/s] {'loss': 0.3336, 'grad_norm': 0.6647444367408752, 'learning_rate': 9.363672644537688e-06, 'epoch': 0.74}
25%|██▍ | 2837/11526 [29:35<1:28:56, 1.63it/s] 25%|██▍ | 2838/11526 [29:36<1:28:56, 1.63it/s] {'loss': 0.1883, 'grad_norm': 0.48184654116630554, 'learning_rate': 9.362933164511152e-06, 'epoch': 0.74}
25%|██▍ | 2838/11526 [29:36<1:28:56, 1.63it/s] 25%|██▍ | 2839/11526 [29:36<1:28:58, 1.63it/s] {'loss': 0.2274, 'grad_norm': 0.5521653294563293, 'learning_rate': 9.362193284291563e-06, 'epoch': 0.74}
25%|██▍ | 2839/11526 [29:37<1:28:58, 1.63it/s] 25%|██▍ | 2840/11526 [29:37<1:28:56, 1.63it/s] {'loss': 0.2327, 'grad_norm': 0.5091734528541565, 'learning_rate': 9.361453003946787e-06, 'epoch': 0.74}
25%|██▍ | 2840/11526 [29:37<1:28:56, 1.63it/s] 25%|██▍ | 2841/11526 [29:38<1:28:54, 1.63it/s] {'loss': 0.2347, 'grad_norm': 0.5357245802879333, 'learning_rate': 9.360712323544723e-06, 'epoch': 0.74}
25%|██▍ | 2841/11526 [29:38<1:28:54, 1.63it/s] 25%|██▍ | 2842/11526 [29:38<1:28:55, 1.63it/s] {'loss': 0.2766, 'grad_norm': 0.5695508718490601, 'learning_rate': 9.359971243153316e-06, 'epoch': 0.74}
25%|██▍ | 2842/11526 [29:38<1:28:55, 1.63it/s] 25%|██▍ | 2843/11526 [29:39<1:28:50, 1.63it/s] {'loss': 0.237, 'grad_norm': 0.5316282510757446, 'learning_rate': 9.359229762840538e-06, 'epoch': 0.74}
25%|██▍ | 2843/11526 [29:39<1:28:50, 1.63it/s] 25%|██▍ | 2844/11526 [29:39<1:28:51, 1.63it/s] {'loss': 0.3316, 'grad_norm': 0.6161412000656128, 'learning_rate': 9.358487882674404e-06, 'epoch': 0.74}
25%|██▍ | 2844/11526 [29:40<1:28:51, 1.63it/s] 25%|██▍ | 2845/11526 [29:40<1:28:51, 1.63it/s] {'loss': 0.2565, 'grad_norm': 0.55083829164505, 'learning_rate': 9.357745602722962e-06, 'epoch': 0.74}
25%|██▍ | 2845/11526 [29:40<1:28:51, 1.63it/s] 25%|██▍ | 2846/11526 [29:41<1:28:50, 1.63it/s] {'loss': 0.2524, 'grad_norm': 0.5897186994552612, 'learning_rate': 9.357002923054299e-06, 'epoch': 0.74}
25%|██▍ | 2846/11526 [29:41<1:28:50, 1.63it/s] 25%|██▍ | 2847/11526 [29:41<1:28:55, 1.63it/s] {'loss': 0.2129, 'grad_norm': 0.46575963497161865, 'learning_rate': 9.356259843736537e-06, 'epoch': 0.74}
25%|██▍ | 2847/11526 [29:41<1:28:55, 1.63it/s] 25%|██▍ | 2848/11526 [29:42<1:28:52, 1.63it/s] {'loss': 0.2201, 'grad_norm': 0.4618547558784485, 'learning_rate': 9.355516364837837e-06, 'epoch': 0.74}
25%|██▍ | 2848/11526 [29:42<1:28:52, 1.63it/s] 25%|██▍ | 2849/11526 [29:43<1:28:48, 1.63it/s] {'loss': 0.2407, 'grad_norm': 0.5728344321250916, 'learning_rate': 9.354772486426394e-06, 'epoch': 0.74}
25%|██▍ | 2849/11526 [29:43<1:28:48, 1.63it/s] 25%|██▍ | 2850/11526 [29:43<1:28:50, 1.63it/s] {'loss': 0.2715, 'grad_norm': 0.4851275086402893, 'learning_rate': 9.35402820857044e-06, 'epoch': 0.74}
25%|██▍ | 2850/11526 [29:43<1:28:50, 1.63it/s] 25%|██▍ | 2851/11526 [29:44<1:28:47, 1.63it/s] {'loss': 0.2356, 'grad_norm': 0.466187983751297, 'learning_rate': 9.353283531338246e-06, 'epoch': 0.74}
25%|██▍ | 2851/11526 [29:44<1:28:47, 1.63it/s] 25%|██▍ | 2852/11526 [29:44<1:28:54, 1.63it/s] {'loss': 0.3272, 'grad_norm': 0.6190059781074524, 'learning_rate': 9.352538454798117e-06, 'epoch': 0.74}
25%|██▍ | 2852/11526 [29:45<1:28:54, 1.63it/s] 25%|██▍ | 2853/11526 [29:45<1:28:51, 1.63it/s] {'loss': 0.3419, 'grad_norm': 0.7135983109474182, 'learning_rate': 9.351792979018396e-06, 'epoch': 0.74}
25%|██▍ | 2853/11526 [29:45<1:28:51, 1.63it/s] 25%|██▍ | 2854/11526 [29:46<1:28:48, 1.63it/s] {'loss': 0.2917, 'grad_norm': 0.6021162867546082, 'learning_rate': 9.351047104067462e-06, 'epoch': 0.74}
25%|██▍ | 2854/11526 [29:46<1:28:48, 1.63it/s] 25%|██▍ | 2855/11526 [29:46<1:28:53, 1.63it/s] {'loss': 0.2279, 'grad_norm': 0.5527094602584839, 'learning_rate': 9.350300830013732e-06, 'epoch': 0.74}
25%|██▍ | 2855/11526 [29:46<1:28:53, 1.63it/s] 25%|██▍ | 2856/11526 [29:47<1:28:46, 1.63it/s] {'loss': 0.3171, 'grad_norm': 0.6137954592704773, 'learning_rate': 9.349554156925657e-06, 'epoch': 0.74}
25%|██▍ | 2856/11526 [29:47<1:28:46, 1.63it/s] 25%|██▍ | 2857/11526 [29:47<1:28:49, 1.63it/s] {'loss': 0.3101, 'grad_norm': 0.5946828126907349, 'learning_rate': 9.348807084871727e-06, 'epoch': 0.74}
25%|██▍ | 2857/11526 [29:48<1:28:49, 1.63it/s] 25%|██▍ | 2858/11526 [29:48<1:28:45, 1.63it/s] {'loss': 0.2283, 'grad_norm': 0.5120584964752197, 'learning_rate': 9.348059613920468e-06, 'epoch': 0.74}
25%|██▍ | 2858/11526 [29:48<1:28:45, 1.63it/s] 25%|██▍ | 2859/11526 [29:49<1:28:43, 1.63it/s] {'loss': 0.2338, 'grad_norm': 0.531356930732727, 'learning_rate': 9.347311744140441e-06, 'epoch': 0.74}
25%|██▍ | 2859/11526 [29:49<1:28:43, 1.63it/s] 25%|██▍ | 2860/11526 [29:49<1:28:48, 1.63it/s] {'loss': 0.2708, 'grad_norm': 0.5577950477600098, 'learning_rate': 9.346563475600247e-06, 'epoch': 0.74}
25%|██▍ | 2860/11526 [29:49<1:28:48, 1.63it/s] 25%|██▍ | 2861/11526 [29:50<1:28:47, 1.63it/s] {'loss': 0.2749, 'grad_norm': 0.5008726119995117, 'learning_rate': 9.345814808368519e-06, 'epoch': 0.74}
25%|██▍ | 2861/11526 [29:50<1:28:47, 1.63it/s] 25%|██▍ | 2862/11526 [29:51<1:28:51, 1.62it/s] {'loss': 0.3268, 'grad_norm': 0.6654064655303955, 'learning_rate': 9.34506574251393e-06, 'epoch': 0.74}
25%|██▍ | 2862/11526 [29:51<1:28:51, 1.62it/s] 25%|██▍ | 2863/11526 [29:51<1:28:48, 1.63it/s] {'loss': 0.3589, 'grad_norm': 0.5986437201499939, 'learning_rate': 9.34431627810519e-06, 'epoch': 0.75}
25%|██▍ | 2863/11526 [29:51<1:28:48, 1.63it/s] 25%|██▍ | 2864/11526 [29:52<1:28:43, 1.63it/s] {'loss': 0.2485, 'grad_norm': 0.5361539721488953, 'learning_rate': 9.343566415211041e-06, 'epoch': 0.75}
25%|██▍ | 2864/11526 [29:52<1:28:43, 1.63it/s] 25%|██▍ | 2865/11526 [29:52<1:29:01, 1.62it/s] {'loss': 0.1977, 'grad_norm': 0.5429577231407166, 'learning_rate': 9.342816153900267e-06, 'epoch': 0.75}
25%|██▍ | 2865/11526 [29:53<1:29:01, 1.62it/s] 25%|██▍ | 2866/11526 [29:53<1:28:54, 1.62it/s] {'loss': 0.3165, 'grad_norm': 0.5791149139404297, 'learning_rate': 9.342065494241686e-06, 'epoch': 0.75}
25%|██▍ | 2866/11526 [29:53<1:28:54, 1.62it/s] 25%|██▍ | 2867/11526 [29:54<1:28:58, 1.62it/s] {'loss': 0.2197, 'grad_norm': 0.42681679129600525, 'learning_rate': 9.341314436304152e-06, 'epoch': 0.75}
25%|██▍ | 2867/11526 [29:54<1:28:58, 1.62it/s] 25%|██▍ | 2868/11526 [29:54<1:28:49, 1.62it/s] {'loss': 0.3334, 'grad_norm': 0.7058205008506775, 'learning_rate': 9.340562980156558e-06, 'epoch': 0.75}
25%|██▍ | 2868/11526 [29:54<1:28:49, 1.62it/s] 25%|██▍ | 2869/11526 [29:55<1:28:43, 1.63it/s] {'loss': 0.258, 'grad_norm': 0.6134601831436157, 'learning_rate': 9.33981112586783e-06, 'epoch': 0.75}
25%|██▍ | 2869/11526 [29:55<1:28:43, 1.63it/s] 25%|██▍ | 2870/11526 [29:55<1:28:47, 1.62it/s] {'loss': 0.3215, 'grad_norm': 2.4459071159362793, 'learning_rate': 9.339058873506933e-06, 'epoch': 0.75}
25%|██▍ | 2870/11526 [29:56<1:28:47, 1.62it/s] 25%|██▍ | 2871/11526 [29:56<1:28:43, 1.63it/s] {'loss': 0.2706, 'grad_norm': 0.5898390412330627, 'learning_rate': 9.338306223142868e-06, 'epoch': 0.75}
25%|██▍ | 2871/11526 [29:56<1:28:43, 1.63it/s] 25%|██▍ | 2872/11526 [29:57<1:28:48, 1.62it/s] {'loss': 0.2513, 'grad_norm': 0.5420616865158081, 'learning_rate': 9.337553174844673e-06, 'epoch': 0.75}
25%|██▍ | 2872/11526 [29:57<1:28:48, 1.62it/s] 25%|██▍ | 2873/11526 [29:57<1:28:43, 1.63it/s] {'loss': 0.294, 'grad_norm': 0.5766288042068481, 'learning_rate': 9.33679972868142e-06, 'epoch': 0.75}
25%|██▍ | 2873/11526 [29:57<1:28:43, 1.63it/s] 25%|██▍ | 2874/11526 [29:58<1:28:37, 1.63it/s] {'loss': 0.3519, 'grad_norm': 0.5749242901802063, 'learning_rate': 9.336045884722222e-06, 'epoch': 0.75}
25%|██▍ | 2874/11526 [29:58<1:28:37, 1.63it/s] 25%|██▍ | 2875/11526 [29:59<1:28:37, 1.63it/s] {'loss': 0.2713, 'grad_norm': 0.5332265496253967, 'learning_rate': 9.335291643036221e-06, 'epoch': 0.75}
25%|██▍ | 2875/11526 [29:59<1:28:37, 1.63it/s] 25%|██▍ | 2876/11526 [29:59<1:28:33, 1.63it/s] {'loss': 0.265, 'grad_norm': 0.5913083553314209, 'learning_rate': 9.334537003692608e-06, 'epoch': 0.75}
25%|██▍ | 2876/11526 [29:59<1:28:33, 1.63it/s] 25%|██▍ | 2877/11526 [30:00<1:28:35, 1.63it/s] {'loss': 0.2016, 'grad_norm': 0.4915767014026642, 'learning_rate': 9.333781966760595e-06, 'epoch': 0.75}
25%|██▍ | 2877/11526 [30:00<1:28:35, 1.63it/s] 25%|██▍ | 2878/11526 [30:00<1:28:33, 1.63it/s] {'loss': 0.3099, 'grad_norm': 0.5370355844497681, 'learning_rate': 9.333026532309444e-06, 'epoch': 0.75}
25%|██▍ | 2878/11526 [30:01<1:28:33, 1.63it/s] 25%|██▍ | 2879/11526 [30:01<1:28:30, 1.63it/s] {'loss': 0.3206, 'grad_norm': 0.5754830241203308, 'learning_rate': 9.332270700408446e-06, 'epoch': 0.75}
25%|██▍ | 2879/11526 [30:01<1:28:30, 1.63it/s] 25%|██▍ | 2880/11526 [30:02<1:28:27, 1.63it/s] {'loss': 0.2427, 'grad_norm': 0.5332116484642029, 'learning_rate': 9.331514471126927e-06, 'epoch': 0.75}
25%|██▍ | 2880/11526 [30:02<1:28:27, 1.63it/s] 25%|██▍ | 2881/11526 [30:02<1:28:27, 1.63it/s] {'loss': 0.2211, 'grad_norm': 0.4793088436126709, 'learning_rate': 9.330757844534257e-06, 'epoch': 0.75}
25%|██▍ | 2881/11526 [30:02<1:28:27, 1.63it/s] 25%|██▌ | 2882/11526 [30:03<1:28:28, 1.63it/s] {'loss': 0.2044, 'grad_norm': 0.5243709087371826, 'learning_rate': 9.330000820699839e-06, 'epoch': 0.75}
25%|██▌ | 2882/11526 [30:03<1:28:28, 1.63it/s] 25%|██▌ | 2883/11526 [30:03<1:28:29, 1.63it/s] {'loss': 0.2445, 'grad_norm': 0.4656815528869629, 'learning_rate': 9.329243399693106e-06, 'epoch': 0.75}
25%|██▌ | 2883/11526 [30:04<1:28:29, 1.63it/s] 25%|██▌ | 2884/11526 [30:04<1:28:28, 1.63it/s] {'loss': 0.2992, 'grad_norm': 0.545238733291626, 'learning_rate': 9.328485581583536e-06, 'epoch': 0.75}
25%|██▌ | 2884/11526 [30:04<1:28:28, 1.63it/s] 25%|██▌ | 2885/11526 [30:05<1:28:27, 1.63it/s] {'loss': 0.2331, 'grad_norm': 0.5134063363075256, 'learning_rate': 9.32772736644064e-06, 'epoch': 0.75}
25%|██▌ | 2885/11526 [30:05<1:28:27, 1.63it/s] 25%|██▌ | 2886/11526 [30:05<1:28:26, 1.63it/s] {'loss': 0.3513, 'grad_norm': 0.6591767072677612, 'learning_rate': 9.326968754333968e-06, 'epoch': 0.75}
25%|██▌ | 2886/11526 [30:05<1:28:26, 1.63it/s] 25%|██▌ | 2887/11526 [30:06<1:28:27, 1.63it/s] {'loss': 0.2622, 'grad_norm': 0.5410744547843933, 'learning_rate': 9.326209745333101e-06, 'epoch': 0.75}
25%|██▌ | 2887/11526 [30:06<1:28:27, 1.63it/s] 25%|██▌ | 2888/11526 [30:07<1:28:25, 1.63it/s] {'loss': 0.3209, 'grad_norm': 0.5687559843063354, 'learning_rate': 9.325450339507662e-06, 'epoch': 0.75}
25%|██▌ | 2888/11526 [30:07<1:28:25, 1.63it/s] 25%|██▌ | 2889/11526 [30:07<1:28:23, 1.63it/s] {'loss': 0.3201, 'grad_norm': 0.6196237802505493, 'learning_rate': 9.324690536927307e-06, 'epoch': 0.75}
25%|██▌ | 2889/11526 [30:07<1:28:23, 1.63it/s] 25%|██▌ | 2890/11526 [30:08<1:28:22, 1.63it/s] {'loss': 0.1952, 'grad_norm': 0.43836069107055664, 'learning_rate': 9.32393033766173e-06, 'epoch': 0.75}
25%|██▌ | 2890/11526 [30:08<1:28:22, 1.63it/s] 25%|██▌ | 2891/11526 [30:08<1:28:23, 1.63it/s] {'loss': 0.2252, 'grad_norm': 0.5405975580215454, 'learning_rate': 9.32316974178066e-06, 'epoch': 0.75}
25%|██▌ | 2891/11526 [30:09<1:28:23, 1.63it/s] 25%|██▌ | 2892/11526 [30:09<1:28:30, 1.63it/s] {'loss': 0.4097, 'grad_norm': 0.6298275589942932, 'learning_rate': 9.322408749353863e-06, 'epoch': 0.75}
25%|██▌ | 2892/11526 [30:09<1:28:30, 1.63it/s] 25%|██▌ | 2893/11526 [30:10<1:28:27, 1.63it/s] {'loss': 0.2702, 'grad_norm': 0.6661752462387085, 'learning_rate': 9.32164736045114e-06, 'epoch': 0.75}
25%|██▌ | 2893/11526 [30:10<1:28:27, 1.63it/s] 25%|██▌ | 2894/11526 [30:10<1:28:24, 1.63it/s] {'loss': 0.3442, 'grad_norm': 0.7327930331230164, 'learning_rate': 9.320885575142335e-06, 'epoch': 0.75}
25%|██▌ | 2894/11526 [30:10<1:28:24, 1.63it/s] 25%|██▌ | 2895/11526 [30:11<1:28:21, 1.63it/s] {'loss': 0.2912, 'grad_norm': 0.44607868790626526, 'learning_rate': 9.32012339349732e-06, 'epoch': 0.75}
25%|██▌ | 2895/11526 [30:11<1:28:21, 1.63it/s] 25%|██▌ | 2896/11526 [30:11<1:28:18, 1.63it/s] {'loss': 0.3477, 'grad_norm': 0.7365370392799377, 'learning_rate': 9.319360815586005e-06, 'epoch': 0.75}
25%|██▌ | 2896/11526 [30:12<1:28:18, 1.63it/s] 25%|██▌ | 2897/11526 [30:12<1:28:24, 1.63it/s] {'loss': 0.3379, 'grad_norm': 0.6488296985626221, 'learning_rate': 9.31859784147834e-06, 'epoch': 0.75}
25%|██▌ | 2897/11526 [30:12<1:28:24, 1.63it/s] 25%|██▌ | 2898/11526 [30:13<1:28:22, 1.63it/s] {'loss': 0.2594, 'grad_norm': 0.5244240760803223, 'learning_rate': 9.31783447124431e-06, 'epoch': 0.75}
25%|██▌ | 2898/11526 [30:13<1:28:22, 1.63it/s] 25%|██▌ | 2899/11526 [30:13<1:28:17, 1.63it/s] {'loss': 0.2895, 'grad_norm': 0.502099871635437, 'learning_rate': 9.317070704953937e-06, 'epoch': 0.75}
25%|██▌ | 2899/11526 [30:13<1:28:17, 1.63it/s] 25%|██▌ | 2900/11526 [30:14<1:28:17, 1.63it/s] {'loss': 0.2865, 'grad_norm': 0.6359516978263855, 'learning_rate': 9.316306542677273e-06, 'epoch': 0.75}
25%|██▌ | 2900/11526 [30:14<1:28:17, 1.63it/s] 25%|██▌ | 2901/11526 [30:15<1:28:16, 1.63it/s] {'loss': 0.3038, 'grad_norm': 0.605526864528656, 'learning_rate': 9.315541984484414e-06, 'epoch': 0.76}
25%|██▌ | 2901/11526 [30:15<1:28:16, 1.63it/s] 25%|██▌ | 2902/11526 [30:15<1:28:22, 1.63it/s] {'loss': 0.2511, 'grad_norm': 0.573546290397644, 'learning_rate': 9.314777030445491e-06, 'epoch': 0.76}
25%|██▌ | 2902/11526 [30:15<1:28:22, 1.63it/s] 25%|██▌ | 2903/11526 [30:16<1:28:32, 1.62it/s] {'loss': 0.2875, 'grad_norm': 0.5146523714065552, 'learning_rate': 9.314011680630668e-06, 'epoch': 0.76}
25%|██▌ | 2903/11526 [30:16<1:28:32, 1.62it/s] 25%|██▌ | 2904/11526 [30:16<1:28:30, 1.62it/s] {'loss': 0.3819, 'grad_norm': 0.6581248641014099, 'learning_rate': 9.31324593511015e-06, 'epoch': 0.76}
25%|██▌ | 2904/11526 [30:17<1:28:30, 1.62it/s] 25%|██▌ | 2905/11526 [30:17<1:28:24, 1.63it/s] {'loss': 0.3419, 'grad_norm': 0.6775704622268677, 'learning_rate': 9.31247979395417e-06, 'epoch': 0.76}
25%|██▌ | 2905/11526 [30:17<1:28:24, 1.63it/s] 25%|██▌ | 2906/11526 [30:18<1:28:32, 1.62it/s] {'loss': 0.2026, 'grad_norm': 0.5327399373054504, 'learning_rate': 9.311713257233008e-06, 'epoch': 0.76}
25%|██▌ | 2906/11526 [30:18<1:28:32, 1.62it/s] 25%|██▌ | 2907/11526 [30:18<1:28:49, 1.62it/s] {'loss': 0.3109, 'grad_norm': 0.6610994338989258, 'learning_rate': 9.310946325016974e-06, 'epoch': 0.76}
25%|██▌ | 2907/11526 [30:18<1:28:49, 1.62it/s] 25%|██▌ | 2908/11526 [30:19<1:28:39, 1.62it/s] {'loss': 0.2678, 'grad_norm': 0.49222972989082336, 'learning_rate': 9.310178997376414e-06, 'epoch': 0.76}
25%|██▌ | 2908/11526 [30:19<1:28:39, 1.62it/s] 25%|██▌ | 2909/11526 [30:19<1:28:39, 1.62it/s] {'loss': 0.2433, 'grad_norm': 0.44907617568969727, 'learning_rate': 9.309411274381711e-06, 'epoch': 0.76}
25%|██▌ | 2909/11526 [30:20<1:28:39, 1.62it/s] 25%|██▌ | 2910/11526 [30:20<1:28:27, 1.62it/s] {'loss': 0.2885, 'grad_norm': 0.5357388257980347, 'learning_rate': 9.30864315610329e-06, 'epoch': 0.76}
25%|██▌ | 2910/11526 [30:20<1:28:27, 1.62it/s] 25%|██▌ | 2911/11526 [30:21<1:28:21, 1.63it/s] {'loss': 0.2534, 'grad_norm': 0.5373389720916748, 'learning_rate': 9.307874642611601e-06, 'epoch': 0.76}
25%|██▌ | 2911/11526 [30:21<1:28:21, 1.63it/s] 25%|██▌ | 2912/11526 [30:21<1:28:24, 1.62it/s] {'loss': 0.2583, 'grad_norm': 0.5197340846061707, 'learning_rate': 9.30710573397714e-06, 'epoch': 0.76}
25%|██▌ | 2912/11526 [30:21<1:28:24, 1.62it/s] 25%|██▌ | 2913/11526 [30:22<1:28:20, 1.62it/s] {'loss': 0.2776, 'grad_norm': 0.5483844876289368, 'learning_rate': 9.306336430270433e-06, 'epoch': 0.76}
25%|██▌ | 2913/11526 [30:22<1:28:20, 1.62it/s] 25%|██▌ | 2914/11526 [30:23<1:28:17, 1.63it/s] {'loss': 0.2236, 'grad_norm': 0.4876668155193329, 'learning_rate': 9.305566731562048e-06, 'epoch': 0.76}
25%|██▌ | 2914/11526 [30:23<1:28:17, 1.63it/s] 25%|██▌ | 2915/11526 [30:23<1:28:19, 1.62it/s] {'loss': 0.2811, 'grad_norm': 0.5018674731254578, 'learning_rate': 9.304796637922585e-06, 'epoch': 0.76}
25%|██▌ | 2915/11526 [30:23<1:28:19, 1.62it/s] 25%|██▌ | 2916/11526 [30:24<1:28:18, 1.62it/s] {'loss': 0.3797, 'grad_norm': 0.7595043778419495, 'learning_rate': 9.30402614942268e-06, 'epoch': 0.76}
25%|██▌ | 2916/11526 [30:24<1:28:18, 1.62it/s] 25%|██▌ | 2917/11526 [30:24<1:28:22, 1.62it/s] {'loss': 0.299, 'grad_norm': 0.6091175079345703, 'learning_rate': 9.303255266133008e-06, 'epoch': 0.76}
25%|██▌ | 2917/11526 [30:25<1:28:22, 1.62it/s] 25%|██▌ | 2918/11526 [30:25<1:28:17, 1.62it/s] {'loss': 0.3037, 'grad_norm': 0.6256495714187622, 'learning_rate': 9.30248398812428e-06, 'epoch': 0.76}
25%|██▌ | 2918/11526 [30:25<1:28:17, 1.62it/s] 25%|██▌ | 2919/11526 [30:26<1:28:14, 1.63it/s] {'loss': 0.257, 'grad_norm': 0.5249691009521484, 'learning_rate': 9.301712315467239e-06, 'epoch': 0.76}
25%|██▌ | 2919/11526 [30:26<1:28:14, 1.63it/s] 25%|██▌ | 2920/11526 [30:26<1:28:11, 1.63it/s] {'loss': 0.3272, 'grad_norm': 0.5936935544013977, 'learning_rate': 9.300940248232669e-06, 'epoch': 0.76}
25%|██▌ | 2920/11526 [30:26<1:28:11, 1.63it/s] 25%|██▌ | 2921/11526 [30:27<1:28:07, 1.63it/s] {'loss': 0.2432, 'grad_norm': 0.5392624139785767, 'learning_rate': 9.300167786491387e-06, 'epoch': 0.76}
25%|██▌ | 2921/11526 [30:27<1:28:07, 1.63it/s] 25%|██▌ | 2922/11526 [30:28<1:32:58, 1.54it/s] {'loss': 0.228, 'grad_norm': 0.5518317818641663, 'learning_rate': 9.299394930314251e-06, 'epoch': 0.76}
25%|██▌ | 2922/11526 [30:28<1:32:58, 1.54it/s] 25%|██▌ | 2923/11526 [30:28<1:31:27, 1.57it/s] {'loss': 0.2601, 'grad_norm': 0.562703013420105, 'learning_rate': 9.298621679772148e-06, 'epoch': 0.76}
25%|██▌ | 2923/11526 [30:28<1:31:27, 1.57it/s] 25%|██▌ | 2924/11526 [30:29<1:30:25, 1.59it/s] {'loss': 0.2483, 'grad_norm': 0.5180319547653198, 'learning_rate': 9.297848034936007e-06, 'epoch': 0.76}
25%|██▌ | 2924/11526 [30:29<1:30:25, 1.59it/s] 25%|██▌ | 2925/11526 [30:29<1:29:46, 1.60it/s] {'loss': 0.2892, 'grad_norm': 0.6179478764533997, 'learning_rate': 9.29707399587679e-06, 'epoch': 0.76}
25%|██▌ | 2925/11526 [30:30<1:29:46, 1.60it/s] 25%|██▌ | 2926/11526 [30:30<1:29:32, 1.60it/s] {'loss': 0.3699, 'grad_norm': 0.593598484992981, 'learning_rate': 9.296299562665496e-06, 'epoch': 0.76}
25%|██▌ | 2926/11526 [30:30<1:29:32, 1.60it/s] 25%|██▌ | 2927/11526 [30:31<1:29:03, 1.61it/s] {'loss': 0.2653, 'grad_norm': 0.5299161672592163, 'learning_rate': 9.295524735373163e-06, 'epoch': 0.76}
25%|██▌ | 2927/11526 [30:31<1:29:03, 1.61it/s] 25%|██▌ | 2928/11526 [30:31<1:28:45, 1.61it/s] {'loss': 0.2083, 'grad_norm': 0.5487064123153687, 'learning_rate': 9.29474951407086e-06, 'epoch': 0.76}
25%|██▌ | 2928/11526 [30:31<1:28:45, 1.61it/s] 25%|██▌ | 2929/11526 [30:32<1:28:32, 1.62it/s] {'loss': 0.1947, 'grad_norm': 0.4676041007041931, 'learning_rate': 9.293973898829695e-06, 'epoch': 0.76}
25%|██▌ | 2929/11526 [30:32<1:28:32, 1.62it/s] 25%|██▌ | 2930/11526 [30:32<1:28:25, 1.62it/s] {'loss': 0.2602, 'grad_norm': 0.5500452518463135, 'learning_rate': 9.293197889720812e-06, 'epoch': 0.76}
25%|██▌ | 2930/11526 [30:33<1:28:25, 1.62it/s] 25%|██▌ | 2931/11526 [30:33<1:28:17, 1.62it/s] {'loss': 0.2353, 'grad_norm': 0.5514093041419983, 'learning_rate': 9.292421486815392e-06, 'epoch': 0.76}
25%|██▌ | 2931/11526 [30:33<1:28:17, 1.62it/s] 25%|██▌ | 2932/11526 [30:34<1:28:11, 1.62it/s] {'loss': 0.2356, 'grad_norm': 0.5587570667266846, 'learning_rate': 9.29164469018465e-06, 'epoch': 0.76}
25%|██▌ | 2932/11526 [30:34<1:28:11, 1.62it/s] 25%|██▌ | 2933/11526 [30:34<1:28:08, 1.62it/s] {'loss': 0.2582, 'grad_norm': 0.5402334928512573, 'learning_rate': 9.290867499899839e-06, 'epoch': 0.76}
25%|██▌ | 2933/11526 [30:34<1:28:08, 1.62it/s] 25%|██▌ | 2934/11526 [30:35<1:28:05, 1.63it/s] {'loss': 0.2048, 'grad_norm': 0.4875383973121643, 'learning_rate': 9.290089916032245e-06, 'epoch': 0.76}
25%|██▌ | 2934/11526 [30:35<1:28:05, 1.63it/s] 25%|██▌ | 2935/11526 [30:36<1:28:03, 1.63it/s] {'loss': 0.2725, 'grad_norm': 0.5300759673118591, 'learning_rate': 9.289311938653197e-06, 'epoch': 0.76}
25%|██▌ | 2935/11526 [30:36<1:28:03, 1.63it/s] 25%|██▌ | 2936/11526 [30:36<1:28:00, 1.63it/s] {'loss': 0.2421, 'grad_norm': 0.5436717867851257, 'learning_rate': 9.288533567834052e-06, 'epoch': 0.76}
25%|██▌ | 2936/11526 [30:36<1:28:00, 1.63it/s] 25%|██▌ | 2937/11526 [30:37<1:27:56, 1.63it/s] {'loss': 0.2382, 'grad_norm': 0.5997971296310425, 'learning_rate': 9.287754803646207e-06, 'epoch': 0.76}
25%|██▌ | 2937/11526 [30:37<1:27:56, 1.63it/s] 25%|██▌ | 2938/11526 [30:37<1:27:58, 1.63it/s] {'loss': 0.2491, 'grad_norm': 0.5709542036056519, 'learning_rate': 9.286975646161094e-06, 'epoch': 0.76}
25%|██▌ | 2938/11526 [30:38<1:27:58, 1.63it/s] 25%|██▌ | 2939/11526 [30:38<1:27:53, 1.63it/s] {'loss': 0.2278, 'grad_norm': 0.5008200407028198, 'learning_rate': 9.286196095450185e-06, 'epoch': 0.76}
25%|██▌ | 2939/11526 [30:38<1:27:53, 1.63it/s] 26%|██▌ | 2940/11526 [30:39<1:27:58, 1.63it/s] {'loss': 0.3002, 'grad_norm': 0.561231255531311, 'learning_rate': 9.28541615158498e-06, 'epoch': 0.77}
26%|██▌ | 2940/11526 [30:39<1:27:58, 1.63it/s] 26%|██▌ | 2941/11526 [30:39<1:27:58, 1.63it/s] {'loss': 0.2259, 'grad_norm': 0.539849042892456, 'learning_rate': 9.284635814637027e-06, 'epoch': 0.77}
26%|██▌ | 2941/11526 [30:39<1:27:58, 1.63it/s] 26%|██▌ | 2942/11526 [30:40<1:27:55, 1.63it/s] {'loss': 0.2559, 'grad_norm': 0.5239291787147522, 'learning_rate': 9.283855084677893e-06, 'epoch': 0.77}
26%|██▌ | 2942/11526 [30:40<1:27:55, 1.63it/s] 26%|██▌ | 2943/11526 [30:40<1:27:55, 1.63it/s] {'loss': 0.2519, 'grad_norm': 0.542546808719635, 'learning_rate': 9.283073961779202e-06, 'epoch': 0.77}
26%|██▌ | 2943/11526 [30:41<1:27:55, 1.63it/s] 26%|██▌ | 2944/11526 [30:41<1:27:50, 1.63it/s] {'loss': 0.2825, 'grad_norm': 0.5658879280090332, 'learning_rate': 9.282292446012594e-06, 'epoch': 0.77}
26%|██▌ | 2944/11526 [30:41<1:27:50, 1.63it/s] 26%|██▌ | 2945/11526 [30:42<1:27:56, 1.63it/s] {'loss': 0.2426, 'grad_norm': 0.6272329688072205, 'learning_rate': 9.281510537449759e-06, 'epoch': 0.77}
26%|██▌ | 2945/11526 [30:42<1:27:56, 1.63it/s] 26%|██▌ | 2946/11526 [30:42<1:27:56, 1.63it/s] {'loss': 0.2319, 'grad_norm': 0.5164383053779602, 'learning_rate': 9.280728236162415e-06, 'epoch': 0.77}
26%|██▌ | 2946/11526 [30:42<1:27:56, 1.63it/s] 26%|██▌ | 2947/11526 [30:43<1:27:55, 1.63it/s] {'loss': 0.265, 'grad_norm': 0.529619038105011, 'learning_rate': 9.279945542222321e-06, 'epoch': 0.77}
26%|██▌ | 2947/11526 [30:43<1:27:55, 1.63it/s] 26%|██▌ | 2948/11526 [30:44<1:27:56, 1.63it/s] {'loss': 0.2697, 'grad_norm': 0.6109445095062256, 'learning_rate': 9.279162455701271e-06, 'epoch': 0.77}
26%|██▌ | 2948/11526 [30:44<1:27:56, 1.63it/s] 26%|██▌ | 2949/11526 [30:44<1:27:54, 1.63it/s] {'loss': 0.2404, 'grad_norm': 0.49050015211105347, 'learning_rate': 9.278378976671095e-06, 'epoch': 0.77}
26%|██▌ | 2949/11526 [30:44<1:27:54, 1.63it/s] 26%|██▌ | 2950/11526 [30:45<1:27:55, 1.63it/s] {'loss': 0.2961, 'grad_norm': 0.5995020866394043, 'learning_rate': 9.277595105203654e-06, 'epoch': 0.77}
26%|██▌ | 2950/11526 [30:45<1:27:55, 1.63it/s] 26%|██▌ | 2951/11526 [30:45<1:27:53, 1.63it/s] {'loss': 0.2448, 'grad_norm': 0.5898842215538025, 'learning_rate': 9.276810841370852e-06, 'epoch': 0.77}
26%|██▌ | 2951/11526 [30:46<1:27:53, 1.63it/s] 26%|██▌ | 2952/11526 [30:46<1:27:48, 1.63it/s] {'loss': 0.3344, 'grad_norm': 0.5694189667701721, 'learning_rate': 9.276026185244625e-06, 'epoch': 0.77}
26%|██▌ | 2952/11526 [30:46<1:27:48, 1.63it/s] 26%|██▌ | 2953/11526 [30:47<1:27:55, 1.63it/s] {'loss': 0.1885, 'grad_norm': 0.43999406695365906, 'learning_rate': 9.275241136896946e-06, 'epoch': 0.77}
26%|██▌ | 2953/11526 [30:47<1:27:55, 1.63it/s] 26%|██▌ | 2954/11526 [30:47<1:27:48, 1.63it/s] {'loss': 0.2176, 'grad_norm': 0.5122827887535095, 'learning_rate': 9.274455696399826e-06, 'epoch': 0.77}
26%|██▌ | 2954/11526 [30:47<1:27:48, 1.63it/s] 26%|██▌ | 2955/11526 [30:48<1:27:52, 1.63it/s] {'loss': 0.3264, 'grad_norm': 0.6886528730392456, 'learning_rate': 9.27366986382531e-06, 'epoch': 0.77}
26%|██▌ | 2955/11526 [30:48<1:27:52, 1.63it/s] 26%|██▌ | 2956/11526 [30:48<1:27:50, 1.63it/s] {'loss': 0.1677, 'grad_norm': 0.53089839220047, 'learning_rate': 9.272883639245476e-06, 'epoch': 0.77}
26%|██▌ | 2956/11526 [30:49<1:27:50, 1.63it/s] 26%|██▌ | 2957/11526 [30:49<1:27:47, 1.63it/s] {'loss': 0.2554, 'grad_norm': 0.5658058524131775, 'learning_rate': 9.272097022732444e-06, 'epoch': 0.77}
26%|██▌ | 2957/11526 [30:49<1:27:47, 1.63it/s] 26%|██▌ | 2958/11526 [30:50<1:27:54, 1.62it/s] {'loss': 0.2771, 'grad_norm': 0.6663472652435303, 'learning_rate': 9.271310014358365e-06, 'epoch': 0.77}
26%|██▌ | 2958/11526 [30:50<1:27:54, 1.62it/s] 26%|██▌ | 2959/11526 [30:50<1:27:50, 1.63it/s] {'loss': 0.2519, 'grad_norm': 0.5272754430770874, 'learning_rate': 9.270522614195429e-06, 'epoch': 0.77}
26%|██▌ | 2959/11526 [30:50<1:27:50, 1.63it/s] 26%|██▌ | 2960/11526 [30:51<1:27:57, 1.62it/s] {'loss': 0.1995, 'grad_norm': 0.47067776322364807, 'learning_rate': 9.269734822315862e-06, 'epoch': 0.77}
26%|██▌ | 2960/11526 [30:51<1:27:57, 1.62it/s] 26%|██▌ | 2961/11526 [30:52<1:27:49, 1.63it/s] {'loss': 0.2097, 'grad_norm': 0.5261691808700562, 'learning_rate': 9.268946638791921e-06, 'epoch': 0.77}
26%|██▌ | 2961/11526 [30:52<1:27:49, 1.63it/s] 26%|██▌ | 2962/11526 [30:52<1:27:45, 1.63it/s] {'loss': 0.2809, 'grad_norm': 0.6042687892913818, 'learning_rate': 9.268158063695908e-06, 'epoch': 0.77}
26%|██▌ | 2962/11526 [30:52<1:27:45, 1.63it/s] 26%|██▌ | 2963/11526 [30:53<1:27:48, 1.63it/s] {'loss': 0.2377, 'grad_norm': 0.5699784159660339, 'learning_rate': 9.267369097100148e-06, 'epoch': 0.77}
26%|██▌ | 2963/11526 [30:53<1:27:48, 1.63it/s] 26%|██▌ | 2964/11526 [30:53<1:27:44, 1.63it/s] {'loss': 0.2603, 'grad_norm': 0.5299711227416992, 'learning_rate': 9.266579739077017e-06, 'epoch': 0.77}
26%|██▌ | 2964/11526 [30:54<1:27:44, 1.63it/s] 26%|██▌ | 2965/11526 [30:54<1:27:46, 1.63it/s] {'loss': 0.2287, 'grad_norm': 0.5365992188453674, 'learning_rate': 9.265789989698917e-06, 'epoch': 0.77}
26%|██▌ | 2965/11526 [30:54<1:27:46, 1.63it/s] 26%|██▌ | 2966/11526 [30:55<1:27:49, 1.62it/s] {'loss': 0.2272, 'grad_norm': 0.4875604212284088, 'learning_rate': 9.264999849038287e-06, 'epoch': 0.77}
26%|██▌ | 2966/11526 [30:55<1:27:49, 1.62it/s] 26%|██▌ | 2967/11526 [30:55<1:27:45, 1.63it/s] {'loss': 0.3327, 'grad_norm': 0.5189756751060486, 'learning_rate': 9.264209317167602e-06, 'epoch': 0.77}
26%|██▌ | 2967/11526 [30:55<1:27:45, 1.63it/s] 26%|██▌ | 2968/11526 [30:56<1:27:45, 1.63it/s] {'loss': 0.3199, 'grad_norm': 0.5775659680366516, 'learning_rate': 9.26341839415938e-06, 'epoch': 0.77}
26%|██▌ | 2968/11526 [30:56<1:27:45, 1.63it/s] 26%|██▌ | 2969/11526 [30:56<1:27:41, 1.63it/s] {'loss': 0.3383, 'grad_norm': 0.5440739989280701, 'learning_rate': 9.262627080086163e-06, 'epoch': 0.77}
26%|██▌ | 2969/11526 [30:57<1:27:41, 1.63it/s] 26%|██▌ | 2970/11526 [30:57<1:27:50, 1.62it/s] {'loss': 0.2315, 'grad_norm': 0.603252112865448, 'learning_rate': 9.261835375020536e-06, 'epoch': 0.77}
26%|██▌ | 2970/11526 [30:57<1:27:50, 1.62it/s] 26%|██▌ | 2971/11526 [30:58<1:27:45, 1.62it/s] {'loss': 0.262, 'grad_norm': 0.6548337936401367, 'learning_rate': 9.26104327903512e-06, 'epoch': 0.77}
26%|██▌ | 2971/11526 [30:58<1:27:45, 1.62it/s] 26%|██▌ | 2972/11526 [30:58<1:27:41, 1.63it/s] {'loss': 0.296, 'grad_norm': 0.6120018362998962, 'learning_rate': 9.260250792202572e-06, 'epoch': 0.77}
26%|██▌ | 2972/11526 [30:58<1:27:41, 1.63it/s] 26%|██▌ | 2973/11526 [30:59<1:27:44, 1.62it/s] {'loss': 0.3135, 'grad_norm': 0.5405257940292358, 'learning_rate': 9.25945791459558e-06, 'epoch': 0.77}
26%|██▌ | 2973/11526 [30:59<1:27:44, 1.62it/s] 26%|██▌ | 2974/11526 [31:00<1:27:40, 1.63it/s] {'loss': 0.2704, 'grad_norm': 0.5283586382865906, 'learning_rate': 9.258664646286872e-06, 'epoch': 0.77}
26%|██▌ | 2974/11526 [31:00<1:27:40, 1.63it/s] 26%|██▌ | 2975/11526 [31:00<1:32:34, 1.54it/s] {'loss': 0.1927, 'grad_norm': 0.4964679479598999, 'learning_rate': 9.257870987349213e-06, 'epoch': 0.77}
26%|██▌ | 2975/11526 [31:00<1:32:34, 1.54it/s] 26%|██▌ | 2976/11526 [31:01<1:31:02, 1.57it/s] {'loss': 0.2194, 'grad_norm': 0.5207916498184204, 'learning_rate': 9.257076937855402e-06, 'epoch': 0.77}
26%|██▌ | 2976/11526 [31:01<1:31:02, 1.57it/s] 26%|██▌ | 2977/11526 [31:02<1:29:58, 1.58it/s] {'loss': 0.2057, 'grad_norm': 0.496451735496521, 'learning_rate': 9.25628249787827e-06, 'epoch': 0.77}
26%|██▌ | 2977/11526 [31:02<1:29:58, 1.58it/s] 26%|██▌ | 2978/11526 [31:02<1:29:10, 1.60it/s] {'loss': 0.3555, 'grad_norm': 0.7259489893913269, 'learning_rate': 9.255487667490691e-06, 'epoch': 0.78}
26%|██▌ | 2978/11526 [31:02<1:29:10, 1.60it/s] 26%|██▌ | 2979/11526 [31:03<1:28:39, 1.61it/s] {'loss': 0.2503, 'grad_norm': 0.5827690362930298, 'learning_rate': 9.25469244676557e-06, 'epoch': 0.78}
26%|██▌ | 2979/11526 [31:03<1:28:39, 1.61it/s] 26%|██▌ | 2980/11526 [31:03<1:28:16, 1.61it/s] {'loss': 0.25, 'grad_norm': 0.5382062792778015, 'learning_rate': 9.253896835775851e-06, 'epoch': 0.78}
26%|██▌ | 2980/11526 [31:04<1:28:16, 1.61it/s] 26%|██▌ | 2981/11526 [31:04<1:32:41, 1.54it/s] {'loss': 0.2449, 'grad_norm': 0.5798316597938538, 'learning_rate': 9.25310083459451e-06, 'epoch': 0.78}
26%|██▌ | 2981/11526 [31:04<1:32:41, 1.54it/s] 26%|██▌ | 2982/11526 [31:05<1:31:06, 1.56it/s] {'loss': 0.2247, 'grad_norm': 0.531351625919342, 'learning_rate': 9.25230444329456e-06, 'epoch': 0.78}
26%|██▌ | 2982/11526 [31:05<1:31:06, 1.56it/s] 26%|██▌ | 2983/11526 [31:05<1:30:02, 1.58it/s] {'loss': 0.3276, 'grad_norm': 0.6539353728294373, 'learning_rate': 9.251507661949052e-06, 'epoch': 0.78}
26%|██▌ | 2983/11526 [31:05<1:30:02, 1.58it/s] 26%|██▌ | 2984/11526 [31:06<1:29:16, 1.59it/s] {'loss': 0.3119, 'grad_norm': 0.504713773727417, 'learning_rate': 9.250710490631073e-06, 'epoch': 0.78}
26%|██▌ | 2984/11526 [31:06<1:29:16, 1.59it/s] 26%|██▌ | 2985/11526 [31:07<1:28:41, 1.61it/s] {'loss': 0.2092, 'grad_norm': 0.49391892552375793, 'learning_rate': 9.24991292941374e-06, 'epoch': 0.78}
26%|██▌ | 2985/11526 [31:07<1:28:41, 1.61it/s] 26%|██▌ | 2986/11526 [31:07<1:28:17, 1.61it/s] {'loss': 0.2219, 'grad_norm': 0.5561579465866089, 'learning_rate': 9.249114978370214e-06, 'epoch': 0.78}
26%|██▌ | 2986/11526 [31:07<1:28:17, 1.61it/s] 26%|██▌ | 2987/11526 [31:08<1:28:01, 1.62it/s] {'loss': 0.222, 'grad_norm': 0.4886794984340668, 'learning_rate': 9.248316637573686e-06, 'epoch': 0.78}
26%|██▌ | 2987/11526 [31:08<1:28:01, 1.62it/s] 26%|██▌ | 2988/11526 [31:08<1:27:56, 1.62it/s] {'loss': 0.1987, 'grad_norm': 0.4607357978820801, 'learning_rate': 9.247517907097382e-06, 'epoch': 0.78}
26%|██▌ | 2988/11526 [31:09<1:27:56, 1.62it/s] 26%|██▌ | 2989/11526 [31:09<1:27:46, 1.62it/s] {'loss': 0.2769, 'grad_norm': 0.6536540985107422, 'learning_rate': 9.246718787014571e-06, 'epoch': 0.78}
26%|██▌ | 2989/11526 [31:09<1:27:46, 1.62it/s] 26%|██▌ | 2990/11526 [31:10<1:27:42, 1.62it/s] {'loss': 0.1871, 'grad_norm': 0.44145941734313965, 'learning_rate': 9.245919277398548e-06, 'epoch': 0.78}
26%|██▌ | 2990/11526 [31:10<1:27:42, 1.62it/s] 26%|██▌ | 2991/11526 [31:10<1:27:40, 1.62it/s] {'loss': 0.3791, 'grad_norm': 0.6169655323028564, 'learning_rate': 9.24511937832265e-06, 'epoch': 0.78}
26%|██▌ | 2991/11526 [31:10<1:27:40, 1.62it/s] 26%|██▌ | 2992/11526 [31:11<1:27:33, 1.62it/s] {'loss': 0.2095, 'grad_norm': 0.5082868933677673, 'learning_rate': 9.244319089860249e-06, 'epoch': 0.78}
26%|██▌ | 2992/11526 [31:11<1:27:33, 1.62it/s] 26%|██▌ | 2993/11526 [31:11<1:27:35, 1.62it/s] {'loss': 0.2212, 'grad_norm': 0.4417838156223297, 'learning_rate': 9.243518412084752e-06, 'epoch': 0.78}
26%|██▌ | 2993/11526 [31:12<1:27:35, 1.62it/s] 26%|██▌ | 2994/11526 [31:12<1:27:30, 1.62it/s] {'loss': 0.252, 'grad_norm': 0.5127540826797485, 'learning_rate': 9.242717345069603e-06, 'epoch': 0.78}
26%|██▌ | 2994/11526 [31:12<1:27:30, 1.62it/s] 26%|██▌ | 2995/11526 [31:13<1:27:26, 1.63it/s] {'loss': 0.2234, 'grad_norm': 0.4395052194595337, 'learning_rate': 9.24191588888828e-06, 'epoch': 0.78}
26%|██▌ | 2995/11526 [31:13<1:27:26, 1.63it/s] 26%|██▌ | 2996/11526 [31:13<1:27:27, 1.63it/s] {'loss': 0.2228, 'grad_norm': 0.5334908962249756, 'learning_rate': 9.241114043614294e-06, 'epoch': 0.78}
26%|██▌ | 2996/11526 [31:13<1:27:27, 1.63it/s] 26%|██▌ | 2997/11526 [31:14<1:27:25, 1.63it/s] {'loss': 0.2897, 'grad_norm': 0.6149044632911682, 'learning_rate': 9.240311809321196e-06, 'epoch': 0.78}
26%|██▌ | 2997/11526 [31:14<1:27:25, 1.63it/s] 26%|██▌ | 2998/11526 [31:15<1:27:26, 1.63it/s] {'loss': 0.4147, 'grad_norm': 0.738267183303833, 'learning_rate': 9.239509186082574e-06, 'epoch': 0.78}
26%|██▌ | 2998/11526 [31:15<1:27:26, 1.63it/s] 26%|██▌ | 2999/11526 [31:15<1:27:20, 1.63it/s] {'loss': 0.2539, 'grad_norm': 0.4786839783191681, 'learning_rate': 9.238706173972048e-06, 'epoch': 0.78}
26%|██▌ | 2999/11526 [31:15<1:27:20, 1.63it/s] 26%|██▌ | 3000/11526 [31:16<1:27:22, 1.63it/s] {'loss': 0.2503, 'grad_norm': 0.5503491759300232, 'learning_rate': 9.237902773063275e-06, 'epoch': 0.78}
26%|██▌ | 3000/11526 [31:16<1:27:22, 1.63it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.6594894528388977, 'eval_runtime': 1.9556, 'eval_samples_per_second': 102.272, 'eval_steps_per_second': 6.648, 'epoch': 0.78}
26%|██▌ | 3000/11526 [31:18<1:27:22, 1.63it/s] 26%|██▌ | 3001/11526 [31:18<2:50:52, 1.20s/it] {'loss': 0.335, 'grad_norm': 0.5406580567359924, 'learning_rate': 9.237098983429946e-06, 'epoch': 0.78}
26%|██▌ | 3001/11526 [31:18<2:50:52, 1.20s/it] 26%|██▌ | 3002/11526 [31:19<2:25:44, 1.03s/it] {'loss': 0.2591, 'grad_norm': 0.45107099413871765, 'learning_rate': 9.23629480514579e-06, 'epoch': 0.78}
26%|██▌ | 3002/11526 [31:19<2:25:44, 1.03s/it] 26%|██▌ | 3003/11526 [31:20<2:08:17, 1.11it/s] {'loss': 0.2275, 'grad_norm': 0.49860942363739014, 'learning_rate': 9.235490238284572e-06, 'epoch': 0.78}
26%|██▌ | 3003/11526 [31:20<2:08:17, 1.11it/s] 26%|██▌ | 3004/11526 [31:20<1:55:56, 1.23it/s] {'loss': 0.2657, 'grad_norm': 0.6567229628562927, 'learning_rate': 9.23468528292009e-06, 'epoch': 0.78}
26%|██▌ | 3004/11526 [31:20<1:55:56, 1.23it/s] 26%|██▌ | 3005/11526 [31:21<1:47:17, 1.32it/s] {'loss': 0.2531, 'grad_norm': 0.5648027658462524, 'learning_rate': 9.233879939126178e-06, 'epoch': 0.78}
26%|██▌ | 3005/11526 [31:21<1:47:17, 1.32it/s] 26%|██▌ | 3006/11526 [31:21<1:41:09, 1.40it/s] {'loss': 0.2144, 'grad_norm': 0.4942297637462616, 'learning_rate': 9.233074206976709e-06, 'epoch': 0.78}
26%|██▌ | 3006/11526 [31:22<1:41:09, 1.40it/s] 26%|██▌ | 3007/11526 [31:22<1:36:59, 1.46it/s] {'loss': 0.3405, 'grad_norm': 0.6270516514778137, 'learning_rate': 9.232268086545588e-06, 'epoch': 0.78}
26%|██▌ | 3007/11526 [31:22<1:36:59, 1.46it/s] 26%|██▌ | 3008/11526 [31:23<1:34:10, 1.51it/s] {'loss': 0.2873, 'grad_norm': 0.6046134829521179, 'learning_rate': 9.23146157790676e-06, 'epoch': 0.78}
26%|██▌ | 3008/11526 [31:23<1:34:10, 1.51it/s] 26%|██▌ | 3009/11526 [31:23<1:32:02, 1.54it/s] {'loss': 0.2621, 'grad_norm': 0.5031048655509949, 'learning_rate': 9.230654681134196e-06, 'epoch': 0.78}
26%|██▌ | 3009/11526 [31:23<1:32:02, 1.54it/s] 26%|██▌ | 3010/11526 [31:24<1:30:33, 1.57it/s] {'loss': 0.3089, 'grad_norm': 0.5513750910758972, 'learning_rate': 9.229847396301916e-06, 'epoch': 0.78}
26%|██▌ | 3010/11526 [31:24<1:30:33, 1.57it/s] 26%|██▌ | 3011/11526 [31:24<1:29:39, 1.58it/s] {'loss': 0.2472, 'grad_norm': 0.5145773887634277, 'learning_rate': 9.229039723483965e-06, 'epoch': 0.78}
26%|██▌ | 3011/11526 [31:25<1:29:39, 1.58it/s] 26%|██▌ | 3012/11526 [31:25<1:28:53, 1.60it/s] {'loss': 0.2812, 'grad_norm': 0.5883171558380127, 'learning_rate': 9.22823166275443e-06, 'epoch': 0.78}
26%|██▌ | 3012/11526 [31:25<1:28:53, 1.60it/s] 26%|██▌ | 3013/11526 [31:26<1:28:24, 1.60it/s] {'loss': 0.3075, 'grad_norm': 0.5376749038696289, 'learning_rate': 9.227423214187428e-06, 'epoch': 0.78}
26%|██▌ | 3013/11526 [31:26<1:28:24, 1.60it/s] 26%|██▌ | 3014/11526 [31:26<1:27:59, 1.61it/s] {'loss': 0.1752, 'grad_norm': 0.4383312463760376, 'learning_rate': 9.226614377857116e-06, 'epoch': 0.78}
26%|██▌ | 3014/11526 [31:26<1:27:59, 1.61it/s] 26%|██▌ | 3015/11526 [31:27<1:27:45, 1.62it/s] {'loss': 0.3321, 'grad_norm': 0.5766792893409729, 'learning_rate': 9.225805153837684e-06, 'epoch': 0.78}
26%|██▌ | 3015/11526 [31:27<1:27:45, 1.62it/s] 26%|██▌ | 3016/11526 [31:28<1:27:37, 1.62it/s] {'loss': 0.2718, 'grad_norm': 0.5413677096366882, 'learning_rate': 9.22499554220336e-06, 'epoch': 0.79}
26%|██▌ | 3016/11526 [31:28<1:27:37, 1.62it/s] 26%|██▌ | 3017/11526 [31:28<1:27:25, 1.62it/s] {'loss': 0.2497, 'grad_norm': 0.5399598479270935, 'learning_rate': 9.224185543028407e-06, 'epoch': 0.79}
26%|██▌ | 3017/11526 [31:28<1:27:25, 1.62it/s] 26%|██▌ | 3018/11526 [31:29<1:27:23, 1.62it/s] {'loss': 0.2411, 'grad_norm': 0.5327816009521484, 'learning_rate': 9.22337515638712e-06, 'epoch': 0.79}
26%|██▌ | 3018/11526 [31:29<1:27:23, 1.62it/s] 26%|██▌ | 3019/11526 [31:29<1:27:16, 1.62it/s] {'loss': 0.2401, 'grad_norm': 0.532244861125946, 'learning_rate': 9.222564382353834e-06, 'epoch': 0.79}
26%|██▌ | 3019/11526 [31:30<1:27:16, 1.62it/s] 26%|██▌ | 3020/11526 [31:30<1:27:12, 1.63it/s] {'loss': 0.29, 'grad_norm': 0.5520691871643066, 'learning_rate': 9.221753221002915e-06, 'epoch': 0.79}
26%|██▌ | 3020/11526 [31:30<1:27:12, 1.63it/s] 26%|██▌ | 3021/11526 [31:31<1:27:08, 1.63it/s] {'loss': 0.2491, 'grad_norm': 0.5836203694343567, 'learning_rate': 9.220941672408773e-06, 'epoch': 0.79}
26%|██▌ | 3021/11526 [31:31<1:27:08, 1.63it/s] 26%|██▌ | 3022/11526 [31:31<1:27:05, 1.63it/s] {'loss': 0.3036, 'grad_norm': 0.5937520265579224, 'learning_rate': 9.220129736645844e-06, 'epoch': 0.79}
26%|██▌ | 3022/11526 [31:31<1:27:05, 1.63it/s] 26%|██▌ | 3023/11526 [31:32<1:27:08, 1.63it/s] {'loss': 0.292, 'grad_norm': 0.5508491396903992, 'learning_rate': 9.219317413788605e-06, 'epoch': 0.79}
26%|██▌ | 3023/11526 [31:32<1:27:08, 1.63it/s] 26%|██▌ | 3024/11526 [31:32<1:27:08, 1.63it/s] {'loss': 0.3463, 'grad_norm': 0.682892382144928, 'learning_rate': 9.218504703911563e-06, 'epoch': 0.79}
26%|██▌ | 3024/11526 [31:33<1:27:08, 1.63it/s] 26%|██▌ | 3025/11526 [31:33<1:27:05, 1.63it/s] {'loss': 0.3257, 'grad_norm': 0.6754419803619385, 'learning_rate': 9.21769160708927e-06, 'epoch': 0.79}
26%|██▌ | 3025/11526 [31:33<1:27:05, 1.63it/s] 26%|██▋ | 3026/11526 [31:34<1:27:02, 1.63it/s] {'loss': 0.2248, 'grad_norm': 0.5045442581176758, 'learning_rate': 9.216878123396301e-06, 'epoch': 0.79}
26%|██▋ | 3026/11526 [31:34<1:27:02, 1.63it/s] 26%|██▋ | 3027/11526 [31:34<1:27:02, 1.63it/s] {'loss': 0.2941, 'grad_norm': 0.5559493899345398, 'learning_rate': 9.216064252907282e-06, 'epoch': 0.79}
26%|██▋ | 3027/11526 [31:34<1:27:02, 1.63it/s] 26%|██▋ | 3028/11526 [31:35<1:27:02, 1.63it/s] {'loss': 0.2906, 'grad_norm': 0.6001330614089966, 'learning_rate': 9.21524999569686e-06, 'epoch': 0.79}
26%|██▋ | 3028/11526 [31:35<1:27:02, 1.63it/s] 26%|██▋ | 3029/11526 [31:36<1:26:58, 1.63it/s] {'loss': 0.3173, 'grad_norm': 0.57319575548172, 'learning_rate': 9.214435351839724e-06, 'epoch': 0.79}
26%|██▋ | 3029/11526 [31:36<1:26:58, 1.63it/s] 26%|██▋ | 3030/11526 [31:36<1:27:01, 1.63it/s] {'loss': 0.3044, 'grad_norm': 0.5773330330848694, 'learning_rate': 9.2136203214106e-06, 'epoch': 0.79}
26%|██▋ | 3030/11526 [31:36<1:27:01, 1.63it/s] 26%|██▋ | 3031/11526 [31:37<1:27:04, 1.63it/s] {'loss': 0.2678, 'grad_norm': 0.4843858778476715, 'learning_rate': 9.212804904484243e-06, 'epoch': 0.79}
26%|██▋ | 3031/11526 [31:37<1:27:04, 1.63it/s] 26%|██▋ | 3032/11526 [31:37<1:27:03, 1.63it/s] {'loss': 0.2812, 'grad_norm': 0.5805982351303101, 'learning_rate': 9.211989101135452e-06, 'epoch': 0.79}
26%|██▋ | 3032/11526 [31:38<1:27:03, 1.63it/s] 26%|██▋ | 3033/11526 [31:38<1:27:07, 1.62it/s] {'loss': 0.2286, 'grad_norm': 0.5032035708427429, 'learning_rate': 9.211172911439055e-06, 'epoch': 0.79}
26%|██▋ | 3033/11526 [31:38<1:27:07, 1.62it/s] 26%|██▋ | 3034/11526 [31:39<1:27:04, 1.63it/s] {'loss': 0.2291, 'grad_norm': 0.590433657169342, 'learning_rate': 9.210356335469918e-06, 'epoch': 0.79}
26%|██▋ | 3034/11526 [31:39<1:27:04, 1.63it/s] 26%|██▋ | 3035/11526 [31:39<1:27:01, 1.63it/s] {'loss': 0.3189, 'grad_norm': 0.7342865467071533, 'learning_rate': 9.209539373302941e-06, 'epoch': 0.79}
26%|██▋ | 3035/11526 [31:39<1:27:01, 1.63it/s] 26%|██▋ | 3036/11526 [31:40<1:27:02, 1.63it/s] {'loss': 0.3048, 'grad_norm': 0.5270050168037415, 'learning_rate': 9.208722025013063e-06, 'epoch': 0.79}
26%|██▋ | 3036/11526 [31:40<1:27:02, 1.63it/s] 26%|██▋ | 3037/11526 [31:40<1:26:57, 1.63it/s] {'loss': 0.2001, 'grad_norm': 0.5089142322540283, 'learning_rate': 9.207904290675254e-06, 'epoch': 0.79}
26%|██▋ | 3037/11526 [31:41<1:26:57, 1.63it/s] 26%|██▋ | 3038/11526 [31:41<1:26:54, 1.63it/s] {'loss': 0.261, 'grad_norm': 0.5540294647216797, 'learning_rate': 9.207086170364521e-06, 'epoch': 0.79}
26%|██▋ | 3038/11526 [31:41<1:26:54, 1.63it/s] 26%|██▋ | 3039/11526 [31:42<1:26:52, 1.63it/s] {'loss': 0.3292, 'grad_norm': 0.5905458331108093, 'learning_rate': 9.206267664155906e-06, 'epoch': 0.79}
26%|██▋ | 3039/11526 [31:42<1:26:52, 1.63it/s] 26%|██▋ | 3040/11526 [31:42<1:26:51, 1.63it/s] {'loss': 0.2704, 'grad_norm': 0.4961971044540405, 'learning_rate': 9.20544877212449e-06, 'epoch': 0.79}
26%|██▋ | 3040/11526 [31:42<1:26:51, 1.63it/s] 26%|██▋ | 3041/11526 [31:43<1:26:51, 1.63it/s] {'loss': 0.3137, 'grad_norm': 0.6309118866920471, 'learning_rate': 9.204629494345383e-06, 'epoch': 0.79}
26%|██▋ | 3041/11526 [31:43<1:26:51, 1.63it/s] 26%|██▋ | 3042/11526 [31:44<1:26:50, 1.63it/s] {'loss': 0.1933, 'grad_norm': 0.47745710611343384, 'learning_rate': 9.203809830893737e-06, 'epoch': 0.79}
26%|██▋ | 3042/11526 [31:44<1:26:50, 1.63it/s] 26%|██▋ | 3043/11526 [31:44<1:26:51, 1.63it/s] {'loss': 0.2886, 'grad_norm': 0.551207423210144, 'learning_rate': 9.202989781844734e-06, 'epoch': 0.79}
26%|██▋ | 3043/11526 [31:44<1:26:51, 1.63it/s] 26%|██▋ | 3044/11526 [31:45<1:26:51, 1.63it/s] {'loss': 0.2634, 'grad_norm': 0.518082320690155, 'learning_rate': 9.202169347273595e-06, 'epoch': 0.79}
26%|██▋ | 3044/11526 [31:45<1:26:51, 1.63it/s] 26%|██▋ | 3045/11526 [31:45<1:26:53, 1.63it/s] {'loss': 0.2644, 'grad_norm': 0.5826083421707153, 'learning_rate': 9.201348527255573e-06, 'epoch': 0.79}
26%|██▋ | 3045/11526 [31:45<1:26:53, 1.63it/s] 26%|██▋ | 3046/11526 [31:46<1:26:51, 1.63it/s] {'loss': 0.2558, 'grad_norm': 0.5957307815551758, 'learning_rate': 9.20052732186596e-06, 'epoch': 0.79}
26%|██▋ | 3046/11526 [31:46<1:26:51, 1.63it/s] 26%|██▋ | 3047/11526 [31:47<1:26:48, 1.63it/s] {'loss': 0.2746, 'grad_norm': 0.5831025242805481, 'learning_rate': 9.19970573118008e-06, 'epoch': 0.79}
26%|██▋ | 3047/11526 [31:47<1:26:48, 1.63it/s] 26%|██▋ | 3048/11526 [31:47<1:26:49, 1.63it/s] {'loss': 0.2455, 'grad_norm': 0.5762686729431152, 'learning_rate': 9.198883755273295e-06, 'epoch': 0.79}
26%|██▋ | 3048/11526 [31:47<1:26:49, 1.63it/s] 26%|██▋ | 3049/11526 [31:48<1:26:48, 1.63it/s] {'loss': 0.2463, 'grad_norm': 0.5101103186607361, 'learning_rate': 9.198061394221003e-06, 'epoch': 0.79}
26%|██▋ | 3049/11526 [31:48<1:26:48, 1.63it/s] 26%|██▋ | 3050/11526 [31:48<1:26:46, 1.63it/s] {'loss': 0.2035, 'grad_norm': 0.46043843030929565, 'learning_rate': 9.197238648098631e-06, 'epoch': 0.79}
26%|██▋ | 3050/11526 [31:49<1:26:46, 1.63it/s] 26%|██▋ | 3051/11526 [31:49<1:26:46, 1.63it/s] {'loss': 0.1938, 'grad_norm': 0.4717422127723694, 'learning_rate': 9.196415516981652e-06, 'epoch': 0.79}
26%|██▋ | 3051/11526 [31:49<1:26:46, 1.63it/s] 26%|██▋ | 3052/11526 [31:50<1:26:44, 1.63it/s] {'loss': 0.2938, 'grad_norm': 0.6311392188072205, 'learning_rate': 9.195592000945565e-06, 'epoch': 0.79}
26%|██▋ | 3052/11526 [31:50<1:26:44, 1.63it/s] 26%|██▋ | 3053/11526 [31:50<1:26:44, 1.63it/s] {'loss': 0.2065, 'grad_norm': 0.4700833261013031, 'learning_rate': 9.194768100065905e-06, 'epoch': 0.79}
26%|██▋ | 3053/11526 [31:50<1:26:44, 1.63it/s] 26%|██▋ | 3054/11526 [31:51<1:26:43, 1.63it/s] {'loss': 0.2952, 'grad_norm': 0.5599157810211182, 'learning_rate': 9.19394381441825e-06, 'epoch': 0.79}
26%|██▋ | 3054/11526 [31:51<1:26:43, 1.63it/s] 27%|██▋ | 3055/11526 [31:52<1:26:41, 1.63it/s] {'loss': 0.1907, 'grad_norm': 0.46523725986480713, 'learning_rate': 9.193119144078206e-06, 'epoch': 0.8}
27%|██▋ | 3055/11526 [31:52<1:26:41, 1.63it/s] 27%|██▋ | 3056/11526 [31:52<1:26:38, 1.63it/s] {'loss': 0.2076, 'grad_norm': 0.48545917868614197, 'learning_rate': 9.192294089121414e-06, 'epoch': 0.8}
27%|██▋ | 3056/11526 [31:52<1:26:38, 1.63it/s] 27%|██▋ | 3057/11526 [31:53<1:26:37, 1.63it/s] {'loss': 0.2392, 'grad_norm': 0.5620201826095581, 'learning_rate': 9.191468649623558e-06, 'epoch': 0.8}
27%|██▋ | 3057/11526 [31:53<1:26:37, 1.63it/s] 27%|██▋ | 3058/11526 [31:53<1:26:39, 1.63it/s] {'loss': 0.2415, 'grad_norm': 0.5757526159286499, 'learning_rate': 9.190642825660346e-06, 'epoch': 0.8}
27%|██▋ | 3058/11526 [31:53<1:26:39, 1.63it/s] 27%|██▋ | 3059/11526 [31:54<1:26:37, 1.63it/s] {'loss': 0.2923, 'grad_norm': 0.6092456579208374, 'learning_rate': 9.189816617307532e-06, 'epoch': 0.8}
27%|██▋ | 3059/11526 [31:54<1:26:37, 1.63it/s] 27%|██▋ | 3060/11526 [31:55<1:26:36, 1.63it/s] {'loss': 0.198, 'grad_norm': 0.5076879858970642, 'learning_rate': 9.188990024640899e-06, 'epoch': 0.8}
27%|██▋ | 3060/11526 [31:55<1:26:36, 1.63it/s] 27%|██▋ | 3061/11526 [31:55<1:26:36, 1.63it/s] {'loss': 0.3693, 'grad_norm': 0.6216142177581787, 'learning_rate': 9.188163047736265e-06, 'epoch': 0.8}
27%|██▋ | 3061/11526 [31:55<1:26:36, 1.63it/s] 27%|██▋ | 3062/11526 [31:56<1:26:35, 1.63it/s] {'loss': 0.2287, 'grad_norm': 0.45958295464515686, 'learning_rate': 9.187335686669487e-06, 'epoch': 0.8}
27%|██▋ | 3062/11526 [31:56<1:26:35, 1.63it/s] 27%|██▋ | 3063/11526 [31:56<1:26:34, 1.63it/s] {'loss': 0.2683, 'grad_norm': 0.5562298893928528, 'learning_rate': 9.186507941516455e-06, 'epoch': 0.8}
27%|██▋ | 3063/11526 [31:57<1:26:34, 1.63it/s] 27%|██▋ | 3064/11526 [31:57<1:26:35, 1.63it/s] {'loss': 0.2837, 'grad_norm': 0.6862987875938416, 'learning_rate': 9.185679812353096e-06, 'epoch': 0.8}
27%|██▋ | 3064/11526 [31:57<1:26:35, 1.63it/s] 27%|██▋ | 3065/11526 [31:58<1:26:33, 1.63it/s] {'loss': 0.3075, 'grad_norm': 0.629828691482544, 'learning_rate': 9.184851299255368e-06, 'epoch': 0.8}
27%|██▋ | 3065/11526 [31:58<1:26:33, 1.63it/s] 27%|██▋ | 3066/11526 [31:58<1:26:32, 1.63it/s] {'loss': 0.289, 'grad_norm': 0.5513049960136414, 'learning_rate': 9.184022402299268e-06, 'epoch': 0.8}
27%|██▋ | 3066/11526 [31:58<1:26:32, 1.63it/s] 27%|██▋ | 3067/11526 [31:59<1:26:33, 1.63it/s] {'loss': 0.3061, 'grad_norm': 0.5400452017784119, 'learning_rate': 9.183193121560827e-06, 'epoch': 0.8}
27%|██▋ | 3067/11526 [31:59<1:26:33, 1.63it/s] 27%|██▋ | 3068/11526 [31:59<1:26:38, 1.63it/s] {'loss': 0.2229, 'grad_norm': 0.5219743251800537, 'learning_rate': 9.182363457116112e-06, 'epoch': 0.8}
27%|██▋ | 3068/11526 [32:00<1:26:38, 1.63it/s] 27%|██▋ | 3069/11526 [32:00<1:26:34, 1.63it/s] {'loss': 0.2062, 'grad_norm': 0.5105881690979004, 'learning_rate': 9.181533409041223e-06, 'epoch': 0.8}
27%|██▋ | 3069/11526 [32:00<1:26:34, 1.63it/s] 27%|██▋ | 3070/11526 [32:01<1:26:33, 1.63it/s] {'loss': 0.3086, 'grad_norm': 0.528667688369751, 'learning_rate': 9.180702977412298e-06, 'epoch': 0.8}
27%|██▋ | 3070/11526 [32:01<1:26:33, 1.63it/s] 27%|██▋ | 3071/11526 [32:01<1:26:30, 1.63it/s] {'loss': 0.2204, 'grad_norm': 0.5586403608322144, 'learning_rate': 9.179872162305509e-06, 'epoch': 0.8}
27%|██▋ | 3071/11526 [32:01<1:26:30, 1.63it/s] 27%|██▋ | 3072/11526 [32:02<1:26:34, 1.63it/s] {'loss': 0.2427, 'grad_norm': 0.5858162045478821, 'learning_rate': 9.179040963797063e-06, 'epoch': 0.8}
27%|██▋ | 3072/11526 [32:02<1:26:34, 1.63it/s] 27%|██▋ | 3073/11526 [32:03<1:26:40, 1.63it/s] {'loss': 0.2267, 'grad_norm': 0.4600662291049957, 'learning_rate': 9.178209381963202e-06, 'epoch': 0.8}
27%|██▋ | 3073/11526 [32:03<1:26:40, 1.63it/s] 27%|██▋ | 3074/11526 [32:03<1:26:38, 1.63it/s] {'loss': 0.2525, 'grad_norm': 0.542016863822937, 'learning_rate': 9.177377416880203e-06, 'epoch': 0.8}
27%|██▋ | 3074/11526 [32:03<1:26:38, 1.63it/s] 27%|██▋ | 3075/11526 [32:04<1:26:38, 1.63it/s] {'loss': 0.237, 'grad_norm': 0.53066486120224, 'learning_rate': 9.176545068624379e-06, 'epoch': 0.8}
27%|██▋ | 3075/11526 [32:04<1:26:38, 1.63it/s] 27%|██▋ | 3076/11526 [32:04<1:26:39, 1.63it/s] {'loss': 0.2998, 'grad_norm': 0.6115075945854187, 'learning_rate': 9.175712337272077e-06, 'epoch': 0.8}
27%|██▋ | 3076/11526 [32:05<1:26:39, 1.63it/s] 27%|██▋ | 3077/11526 [32:05<1:26:33, 1.63it/s] {'loss': 0.2161, 'grad_norm': 0.47668981552124023, 'learning_rate': 9.174879222899683e-06, 'epoch': 0.8}
27%|██▋ | 3077/11526 [32:05<1:26:33, 1.63it/s] 27%|██▋ | 3078/11526 [32:06<1:26:35, 1.63it/s] {'loss': 0.2375, 'grad_norm': 0.4943140149116516, 'learning_rate': 9.174045725583612e-06, 'epoch': 0.8}
27%|██▋ | 3078/11526 [32:06<1:26:35, 1.63it/s] 27%|██▋ | 3079/11526 [32:06<1:26:33, 1.63it/s] {'loss': 0.2492, 'grad_norm': 0.5493106842041016, 'learning_rate': 9.173211845400316e-06, 'epoch': 0.8}
27%|██▋ | 3079/11526 [32:06<1:26:33, 1.63it/s] 27%|██▋ | 3080/11526 [32:07<1:26:30, 1.63it/s] {'loss': 0.3226, 'grad_norm': 0.5331341028213501, 'learning_rate': 9.172377582426286e-06, 'epoch': 0.8}
27%|██▋ | 3080/11526 [32:07<1:26:30, 1.63it/s] 27%|██▋ | 3081/11526 [32:07<1:26:32, 1.63it/s] {'loss': 0.2552, 'grad_norm': 0.516236424446106, 'learning_rate': 9.171542936738044e-06, 'epoch': 0.8}
27%|██▋ | 3081/11526 [32:08<1:26:32, 1.63it/s] 27%|██▋ | 3082/11526 [32:08<1:26:29, 1.63it/s] {'loss': 0.2656, 'grad_norm': 0.6299492120742798, 'learning_rate': 9.17070790841215e-06, 'epoch': 0.8}
27%|██▋ | 3082/11526 [32:08<1:26:29, 1.63it/s] 27%|██▋ | 3083/11526 [32:09<1:26:28, 1.63it/s] {'loss': 0.2475, 'grad_norm': 0.5201883316040039, 'learning_rate': 9.169872497525195e-06, 'epoch': 0.8}
27%|██▋ | 3083/11526 [32:09<1:26:28, 1.63it/s] 27%|██▋ | 3084/11526 [32:09<1:26:24, 1.63it/s] {'loss': 0.256, 'grad_norm': 0.5477860569953918, 'learning_rate': 9.169036704153808e-06, 'epoch': 0.8}
27%|██▋ | 3084/11526 [32:09<1:26:24, 1.63it/s] 27%|██▋ | 3085/11526 [32:10<1:26:23, 1.63it/s] {'loss': 0.2563, 'grad_norm': 0.5671538710594177, 'learning_rate': 9.168200528374656e-06, 'epoch': 0.8}
27%|██▋ | 3085/11526 [32:10<1:26:23, 1.63it/s] 27%|██▋ | 3086/11526 [32:11<1:26:31, 1.63it/s] {'loss': 0.2421, 'grad_norm': 0.5450835227966309, 'learning_rate': 9.167363970264434e-06, 'epoch': 0.8}
27%|██▋ | 3086/11526 [32:11<1:26:31, 1.63it/s] 27%|██▋ | 3087/11526 [32:11<1:26:26, 1.63it/s] {'loss': 0.2264, 'grad_norm': 0.5242211222648621, 'learning_rate': 9.166527029899878e-06, 'epoch': 0.8}
27%|██▋ | 3087/11526 [32:11<1:26:26, 1.63it/s] 27%|██▋ | 3088/11526 [32:12<1:26:29, 1.63it/s] {'loss': 0.2871, 'grad_norm': 0.49054744839668274, 'learning_rate': 9.165689707357756e-06, 'epoch': 0.8}
27%|██▋ | 3088/11526 [32:12<1:26:29, 1.63it/s] 27%|██▋ | 3089/11526 [32:12<1:26:28, 1.63it/s] {'loss': 0.2493, 'grad_norm': 0.5359742045402527, 'learning_rate': 9.164852002714872e-06, 'epoch': 0.8}
27%|██▋ | 3089/11526 [32:13<1:26:28, 1.63it/s] 27%|██▋ | 3090/11526 [32:13<1:26:26, 1.63it/s] {'loss': 0.2711, 'grad_norm': 0.5434713959693909, 'learning_rate': 9.164013916048065e-06, 'epoch': 0.8}
27%|██▋ | 3090/11526 [32:13<1:26:26, 1.63it/s] 27%|██▋ | 3091/11526 [32:14<1:26:35, 1.62it/s] {'loss': 0.2198, 'grad_norm': 0.44997361302375793, 'learning_rate': 9.16317544743421e-06, 'epoch': 0.8}
27%|██▋ | 3091/11526 [32:14<1:26:35, 1.62it/s] 27%|██▋ | 3092/11526 [32:14<1:26:25, 1.63it/s] {'loss': 0.2544, 'grad_norm': 0.5382542610168457, 'learning_rate': 9.162336596950216e-06, 'epoch': 0.8}
27%|██▋ | 3092/11526 [32:14<1:26:25, 1.63it/s] 27%|██▋ | 3093/11526 [32:15<1:26:29, 1.62it/s] {'loss': 0.2399, 'grad_norm': 0.5288941860198975, 'learning_rate': 9.161497364673024e-06, 'epoch': 0.81}
27%|██▋ | 3093/11526 [32:15<1:26:29, 1.62it/s] 27%|██▋ | 3094/11526 [32:15<1:26:26, 1.63it/s] {'loss': 0.3464, 'grad_norm': 0.6502922773361206, 'learning_rate': 9.160657750679618e-06, 'epoch': 0.81}
27%|██▋ | 3094/11526 [32:16<1:26:26, 1.63it/s] 27%|██▋ | 3095/11526 [32:16<1:26:22, 1.63it/s] {'loss': 0.2938, 'grad_norm': 0.5739909410476685, 'learning_rate': 9.15981775504701e-06, 'epoch': 0.81}
27%|██▋ | 3095/11526 [32:16<1:26:22, 1.63it/s] 27%|██▋ | 3096/11526 [32:17<1:26:19, 1.63it/s] {'loss': 0.294, 'grad_norm': 0.6305001974105835, 'learning_rate': 9.15897737785225e-06, 'epoch': 0.81}
27%|██▋ | 3096/11526 [32:17<1:26:19, 1.63it/s] 27%|██▋ | 3097/11526 [32:17<1:26:17, 1.63it/s] {'loss': 0.34, 'grad_norm': 0.5803483128547668, 'learning_rate': 9.158136619172419e-06, 'epoch': 0.81}
27%|██▋ | 3097/11526 [32:17<1:26:17, 1.63it/s] 27%|██▋ | 3098/11526 [32:18<1:26:21, 1.63it/s] {'loss': 0.2072, 'grad_norm': 0.49617239832878113, 'learning_rate': 9.15729547908464e-06, 'epoch': 0.81}
27%|██▋ | 3098/11526 [32:18<1:26:21, 1.63it/s] 27%|██▋ | 3099/11526 [32:19<1:26:19, 1.63it/s] {'loss': 0.2571, 'grad_norm': 0.5607039928436279, 'learning_rate': 9.156453957666064e-06, 'epoch': 0.81}
27%|██▋ | 3099/11526 [32:19<1:26:19, 1.63it/s] 27%|██▋ | 3100/11526 [32:19<1:26:18, 1.63it/s] {'loss': 0.2376, 'grad_norm': 0.5051054358482361, 'learning_rate': 9.155612054993884e-06, 'epoch': 0.81}
27%|██▋ | 3100/11526 [32:19<1:26:18, 1.63it/s] 27%|██▋ | 3101/11526 [32:20<1:26:22, 1.63it/s] {'loss': 0.2692, 'grad_norm': 0.6027132868766785, 'learning_rate': 9.15476977114532e-06, 'epoch': 0.81}
27%|██▋ | 3101/11526 [32:20<1:26:22, 1.63it/s] 27%|██▋ | 3102/11526 [32:20<1:26:17, 1.63it/s] {'loss': 0.2988, 'grad_norm': 0.585300862789154, 'learning_rate': 9.153927106197633e-06, 'epoch': 0.81}
27%|██▋ | 3102/11526 [32:21<1:26:17, 1.63it/s] 27%|██▋ | 3103/11526 [32:21<1:26:22, 1.63it/s] {'loss': 0.2952, 'grad_norm': 0.6092053055763245, 'learning_rate': 9.153084060228119e-06, 'epoch': 0.81}
27%|██▋ | 3103/11526 [32:21<1:26:22, 1.63it/s] 27%|██▋ | 3104/11526 [32:22<1:26:20, 1.63it/s] {'loss': 0.2315, 'grad_norm': 0.5192955136299133, 'learning_rate': 9.152240633314102e-06, 'epoch': 0.81}
27%|██▋ | 3104/11526 [32:22<1:26:20, 1.63it/s] 27%|██▋ | 3105/11526 [32:22<1:26:15, 1.63it/s] {'loss': 0.2225, 'grad_norm': 0.5004680156707764, 'learning_rate': 9.151396825532951e-06, 'epoch': 0.81}
27%|██▋ | 3105/11526 [32:22<1:26:15, 1.63it/s] 27%|██▋ | 3106/11526 [32:23<1:26:18, 1.63it/s] {'loss': 0.1834, 'grad_norm': 0.4984295666217804, 'learning_rate': 9.150552636962062e-06, 'epoch': 0.81}
27%|██▋ | 3106/11526 [32:23<1:26:18, 1.63it/s] 27%|██▋ | 3107/11526 [32:23<1:26:15, 1.63it/s] {'loss': 0.257, 'grad_norm': 0.5178962349891663, 'learning_rate': 9.14970806767887e-06, 'epoch': 0.81}
27%|██▋ | 3107/11526 [32:24<1:26:15, 1.63it/s] 27%|██▋ | 3108/11526 [32:24<1:26:15, 1.63it/s] {'loss': 0.2367, 'grad_norm': 0.5368848443031311, 'learning_rate': 9.148863117760843e-06, 'epoch': 0.81}
27%|██▋ | 3108/11526 [32:24<1:26:15, 1.63it/s] 27%|██▋ | 3109/11526 [32:25<1:26:14, 1.63it/s] {'loss': 0.1991, 'grad_norm': 0.5321548581123352, 'learning_rate': 9.148017787285484e-06, 'epoch': 0.81}
27%|██▋ | 3109/11526 [32:25<1:26:14, 1.63it/s] 27%|██▋ | 3110/11526 [32:25<1:26:12, 1.63it/s] {'loss': 0.2655, 'grad_norm': 0.49675384163856506, 'learning_rate': 9.147172076330333e-06, 'epoch': 0.81}
27%|██▋ | 3110/11526 [32:25<1:26:12, 1.63it/s] 27%|██▋ | 3111/11526 [32:26<1:26:10, 1.63it/s] {'loss': 0.215, 'grad_norm': 0.49317431449890137, 'learning_rate': 9.146325984972962e-06, 'epoch': 0.81}
27%|██▋ | 3111/11526 [32:26<1:26:10, 1.63it/s] 27%|██▋ | 3112/11526 [32:27<1:26:08, 1.63it/s] {'loss': 0.2883, 'grad_norm': 0.6464446783065796, 'learning_rate': 9.14547951329098e-06, 'epoch': 0.81}
27%|██▋ | 3112/11526 [32:27<1:26:08, 1.63it/s] 27%|██▋ | 3113/11526 [32:27<1:26:12, 1.63it/s] {'loss': 0.3419, 'grad_norm': 0.6227864027023315, 'learning_rate': 9.144632661362032e-06, 'epoch': 0.81}
27%|██▋ | 3113/11526 [32:27<1:26:12, 1.63it/s] 27%|██▋ | 3114/11526 [32:28<1:26:10, 1.63it/s] {'loss': 0.2399, 'grad_norm': 0.5570346713066101, 'learning_rate': 9.143785429263791e-06, 'epoch': 0.81}
27%|██▋ | 3114/11526 [32:28<1:26:10, 1.63it/s] 27%|██▋ | 3115/11526 [32:28<1:26:11, 1.63it/s] {'loss': 0.2277, 'grad_norm': 0.5251136422157288, 'learning_rate': 9.142937817073976e-06, 'epoch': 0.81}
27%|██▋ | 3115/11526 [32:29<1:26:11, 1.63it/s] 27%|██▋ | 3116/11526 [32:29<1:26:10, 1.63it/s] {'loss': 0.2739, 'grad_norm': 0.5700234174728394, 'learning_rate': 9.142089824870332e-06, 'epoch': 0.81}
27%|██▋ | 3116/11526 [32:29<1:26:10, 1.63it/s] 27%|██▋ | 3117/11526 [32:30<1:26:09, 1.63it/s] {'loss': 0.2703, 'grad_norm': 0.6232224702835083, 'learning_rate': 9.14124145273064e-06, 'epoch': 0.81}
27%|██▋ | 3117/11526 [32:30<1:26:09, 1.63it/s] 27%|██▋ | 3118/11526 [32:30<1:26:16, 1.62it/s] {'loss': 0.2448, 'grad_norm': 0.5943734049797058, 'learning_rate': 9.140392700732722e-06, 'epoch': 0.81}
27%|██▋ | 3118/11526 [32:30<1:26:16, 1.62it/s] 27%|██▋ | 3119/11526 [32:31<1:26:11, 1.63it/s] {'loss': 0.2704, 'grad_norm': 0.5020868182182312, 'learning_rate': 9.139543568954425e-06, 'epoch': 0.81}
27%|██▋ | 3119/11526 [32:31<1:26:11, 1.63it/s] 27%|██▋ | 3120/11526 [32:31<1:26:13, 1.62it/s] {'loss': 0.2815, 'grad_norm': 0.5663296580314636, 'learning_rate': 9.13869405747364e-06, 'epoch': 0.81}
27%|██▋ | 3120/11526 [32:32<1:26:13, 1.62it/s] 27%|██▋ | 3121/11526 [32:32<1:26:14, 1.62it/s] {'loss': 0.376, 'grad_norm': 0.7276212573051453, 'learning_rate': 9.137844166368289e-06, 'epoch': 0.81}
27%|██▋ | 3121/11526 [32:32<1:26:14, 1.62it/s] 27%|██▋ | 3122/11526 [32:33<1:26:09, 1.63it/s] {'loss': 0.3315, 'grad_norm': 0.5942776203155518, 'learning_rate': 9.136993895716325e-06, 'epoch': 0.81}
27%|██▋ | 3122/11526 [32:33<1:26:09, 1.63it/s] 27%|██▋ | 3123/11526 [32:33<1:26:15, 1.62it/s] {'loss': 0.1947, 'grad_norm': 0.46644124388694763, 'learning_rate': 9.136143245595744e-06, 'epoch': 0.81}
27%|██▋ | 3123/11526 [32:33<1:26:15, 1.62it/s] 27%|██▋ | 3124/11526 [32:34<1:26:09, 1.63it/s] {'loss': 0.3445, 'grad_norm': 0.7719324231147766, 'learning_rate': 9.13529221608457e-06, 'epoch': 0.81}
27%|██▋ | 3124/11526 [32:34<1:26:09, 1.63it/s] 27%|██▋ | 3125/11526 [32:35<1:26:04, 1.63it/s] {'loss': 0.2232, 'grad_norm': 0.5575551986694336, 'learning_rate': 9.134440807260867e-06, 'epoch': 0.81}
27%|██▋ | 3125/11526 [32:35<1:26:04, 1.63it/s] 27%|██▋ | 3126/11526 [32:35<1:26:04, 1.63it/s] {'loss': 0.3535, 'grad_norm': 0.5991417765617371, 'learning_rate': 9.133589019202727e-06, 'epoch': 0.81}
27%|██▋ | 3126/11526 [32:35<1:26:04, 1.63it/s] 27%|██▋ | 3127/11526 [32:36<1:26:00, 1.63it/s] {'loss': 0.2548, 'grad_norm': 0.5902590155601501, 'learning_rate': 9.132736851988285e-06, 'epoch': 0.81}
27%|██▋ | 3127/11526 [32:36<1:26:00, 1.63it/s] 27%|██▋ | 3128/11526 [32:36<1:26:06, 1.63it/s] {'loss': 0.2678, 'grad_norm': 0.5684567093849182, 'learning_rate': 9.131884305695701e-06, 'epoch': 0.81}
27%|██▋ | 3128/11526 [32:37<1:26:06, 1.63it/s] 27%|██▋ | 3129/11526 [32:37<1:26:05, 1.63it/s] {'loss': 0.2739, 'grad_norm': 0.5239400863647461, 'learning_rate': 9.131031380403184e-06, 'epoch': 0.81}
27%|██▋ | 3129/11526 [32:37<1:26:05, 1.63it/s] 27%|██▋ | 3130/11526 [32:38<1:26:01, 1.63it/s] {'loss': 0.2223, 'grad_norm': 0.5192832350730896, 'learning_rate': 9.13017807618896e-06, 'epoch': 0.81}
27%|██▋ | 3130/11526 [32:38<1:26:01, 1.63it/s] 27%|██▋ | 3131/11526 [32:38<1:25:58, 1.63it/s] {'loss': 0.2392, 'grad_norm': 0.5560265779495239, 'learning_rate': 9.129324393131306e-06, 'epoch': 0.81}
27%|██▋ | 3131/11526 [32:38<1:25:58, 1.63it/s] 27%|██▋ | 3132/11526 [32:39<1:25:57, 1.63it/s] {'loss': 0.2994, 'grad_norm': 0.49263229966163635, 'learning_rate': 9.128470331308522e-06, 'epoch': 0.82}
27%|██▋ | 3132/11526 [32:39<1:25:57, 1.63it/s] 27%|██▋ | 3133/11526 [32:39<1:26:20, 1.62it/s] {'loss': 0.2238, 'grad_norm': 0.5506243705749512, 'learning_rate': 9.12761589079895e-06, 'epoch': 0.82}
27%|██▋ | 3133/11526 [32:40<1:26:20, 1.62it/s] 27%|██▋ | 3134/11526 [32:40<1:26:13, 1.62it/s] {'loss': 0.2448, 'grad_norm': 0.5434053540229797, 'learning_rate': 9.126761071680963e-06, 'epoch': 0.82}
27%|██▋ | 3134/11526 [32:40<1:26:13, 1.62it/s] 27%|██▋ | 3135/11526 [32:41<1:26:06, 1.62it/s] {'loss': 0.209, 'grad_norm': 0.5177617073059082, 'learning_rate': 9.12590587403297e-06, 'epoch': 0.82}
27%|██▋ | 3135/11526 [32:41<1:26:06, 1.62it/s] 27%|██▋ | 3136/11526 [32:41<1:26:01, 1.63it/s] {'loss': 0.2772, 'grad_norm': 0.5220098495483398, 'learning_rate': 9.125050297933416e-06, 'epoch': 0.82}
27%|██▋ | 3136/11526 [32:41<1:26:01, 1.63it/s] 27%|██▋ | 3137/11526 [32:42<1:25:57, 1.63it/s] {'loss': 0.2836, 'grad_norm': 0.5917682647705078, 'learning_rate': 9.124194343460778e-06, 'epoch': 0.82}
27%|██▋ | 3137/11526 [32:42<1:25:57, 1.63it/s] 27%|██▋ | 3138/11526 [32:43<1:25:57, 1.63it/s] {'loss': 0.2313, 'grad_norm': 0.46823087334632874, 'learning_rate': 9.123338010693568e-06, 'epoch': 0.82}
27%|██▋ | 3138/11526 [32:43<1:25:57, 1.63it/s] 27%|██▋ | 3139/11526 [32:43<1:25:54, 1.63it/s] {'loss': 0.2646, 'grad_norm': 0.5076177716255188, 'learning_rate': 9.122481299710334e-06, 'epoch': 0.82}
27%|██▋ | 3139/11526 [32:43<1:25:54, 1.63it/s] 27%|██▋ | 3140/11526 [32:44<1:25:52, 1.63it/s] {'loss': 0.2567, 'grad_norm': 0.5416399240493774, 'learning_rate': 9.121624210589662e-06, 'epoch': 0.82}
27%|██▋ | 3140/11526 [32:44<1:25:52, 1.63it/s] 27%|██▋ | 3141/11526 [32:44<1:25:53, 1.63it/s] {'loss': 0.2573, 'grad_norm': 0.5810614824295044, 'learning_rate': 9.120766743410163e-06, 'epoch': 0.82}
27%|██▋ | 3141/11526 [32:45<1:25:53, 1.63it/s] 27%|██▋ | 3142/11526 [32:45<1:25:51, 1.63it/s] {'loss': 0.1624, 'grad_norm': 0.4119635820388794, 'learning_rate': 9.119908898250494e-06, 'epoch': 0.82}
27%|██▋ | 3142/11526 [32:45<1:25:51, 1.63it/s] 27%|██▋ | 3143/11526 [32:46<1:25:48, 1.63it/s] {'loss': 0.3095, 'grad_norm': 0.6810997128486633, 'learning_rate': 9.119050675189341e-06, 'epoch': 0.82}
27%|██▋ | 3143/11526 [32:46<1:25:48, 1.63it/s] 27%|██▋ | 3144/11526 [32:46<1:25:48, 1.63it/s] {'loss': 0.2132, 'grad_norm': 0.4594711363315582, 'learning_rate': 9.118192074305421e-06, 'epoch': 0.82}
27%|██▋ | 3144/11526 [32:46<1:25:48, 1.63it/s] 27%|██▋ | 3145/11526 [32:47<1:25:44, 1.63it/s] {'loss': 0.2644, 'grad_norm': 0.5650225877761841, 'learning_rate': 9.117333095677493e-06, 'epoch': 0.82}
27%|██▋ | 3145/11526 [32:47<1:25:44, 1.63it/s] 27%|██▋ | 3146/11526 [32:47<1:25:53, 1.63it/s] {'loss': 0.2893, 'grad_norm': 0.6043537855148315, 'learning_rate': 9.116473739384348e-06, 'epoch': 0.82}
27%|██▋ | 3146/11526 [32:48<1:25:53, 1.63it/s] 27%|██▋ | 3147/11526 [32:48<1:25:50, 1.63it/s] {'loss': 0.271, 'grad_norm': 0.5432782769203186, 'learning_rate': 9.115614005504809e-06, 'epoch': 0.82}
27%|██▋ | 3147/11526 [32:48<1:25:50, 1.63it/s] 27%|██▋ | 3148/11526 [32:49<1:25:49, 1.63it/s] {'loss': 0.2796, 'grad_norm': 0.5915676951408386, 'learning_rate': 9.114753894117736e-06, 'epoch': 0.82}
27%|██▋ | 3148/11526 [32:49<1:25:49, 1.63it/s] 27%|██▋ | 3149/11526 [32:49<1:25:47, 1.63it/s] {'loss': 0.2593, 'grad_norm': 0.5959764122962952, 'learning_rate': 9.113893405302025e-06, 'epoch': 0.82}
27%|██▋ | 3149/11526 [32:49<1:25:47, 1.63it/s] 27%|██▋ | 3150/11526 [32:50<1:25:47, 1.63it/s] {'loss': 0.2798, 'grad_norm': 0.6452190279960632, 'learning_rate': 9.113032539136603e-06, 'epoch': 0.82}
27%|██▋ | 3150/11526 [32:50<1:25:47, 1.63it/s] 27%|██▋ | 3151/11526 [32:51<1:25:57, 1.62it/s] {'loss': 0.2431, 'grad_norm': 0.49686938524246216, 'learning_rate': 9.112171295700435e-06, 'epoch': 0.82}
27%|██▋ | 3151/11526 [32:51<1:25:57, 1.62it/s] 27%|██▋ | 3152/11526 [32:51<1:25:54, 1.62it/s] {'loss': 0.2356, 'grad_norm': 0.6296645402908325, 'learning_rate': 9.111309675072518e-06, 'epoch': 0.82}
27%|██▋ | 3152/11526 [32:51<1:25:54, 1.62it/s] 27%|██▋ | 3153/11526 [32:52<1:25:54, 1.62it/s] {'loss': 0.192, 'grad_norm': 0.4592069983482361, 'learning_rate': 9.110447677331885e-06, 'epoch': 0.82}
27%|██▋ | 3153/11526 [32:52<1:25:54, 1.62it/s] 27%|██▋ | 3154/11526 [32:52<1:25:48, 1.63it/s] {'loss': 0.2251, 'grad_norm': 0.5207000970840454, 'learning_rate': 9.109585302557605e-06, 'epoch': 0.82}
27%|██▋ | 3154/11526 [32:53<1:25:48, 1.63it/s] 27%|██▋ | 3155/11526 [32:53<1:25:45, 1.63it/s] {'loss': 0.3039, 'grad_norm': 0.6676435470581055, 'learning_rate': 9.108722550828776e-06, 'epoch': 0.82}
27%|██▋ | 3155/11526 [32:53<1:25:45, 1.63it/s] 27%|██▋ | 3156/11526 [32:54<1:25:42, 1.63it/s] {'loss': 0.2727, 'grad_norm': 0.5718393325805664, 'learning_rate': 9.10785942222454e-06, 'epoch': 0.82}
27%|██▋ | 3156/11526 [32:54<1:25:42, 1.63it/s] 27%|██▋ | 3157/11526 [32:54<1:25:39, 1.63it/s] {'loss': 0.3244, 'grad_norm': 0.5793758034706116, 'learning_rate': 9.106995916824062e-06, 'epoch': 0.82}
27%|██▋ | 3157/11526 [32:54<1:25:39, 1.63it/s] 27%|██▋ | 3158/11526 [32:55<1:25:40, 1.63it/s] {'loss': 0.2558, 'grad_norm': 0.5267273187637329, 'learning_rate': 9.106132034706552e-06, 'epoch': 0.82}
27%|██▋ | 3158/11526 [32:55<1:25:40, 1.63it/s] 27%|██▋ | 3159/11526 [32:55<1:25:41, 1.63it/s] {'loss': 0.215, 'grad_norm': 0.4736555218696594, 'learning_rate': 9.105267775951247e-06, 'epoch': 0.82}
27%|██▋ | 3159/11526 [32:56<1:25:41, 1.63it/s] 27%|██▋ | 3160/11526 [32:56<1:25:41, 1.63it/s] {'loss': 0.241, 'grad_norm': 0.5227603912353516, 'learning_rate': 9.104403140637427e-06, 'epoch': 0.82}
27%|██▋ | 3160/11526 [32:56<1:25:41, 1.63it/s] 27%|██▋ | 3161/11526 [32:57<1:25:42, 1.63it/s] {'loss': 0.3074, 'grad_norm': 0.6509437561035156, 'learning_rate': 9.103538128844397e-06, 'epoch': 0.82}
27%|██▋ | 3161/11526 [32:57<1:25:42, 1.63it/s] 27%|██▋ | 3162/11526 [32:57<1:25:40, 1.63it/s] {'loss': 0.1984, 'grad_norm': 0.4775879383087158, 'learning_rate': 9.1026727406515e-06, 'epoch': 0.82}
27%|██▋ | 3162/11526 [32:57<1:25:40, 1.63it/s] 27%|██▋ | 3163/11526 [32:58<1:25:36, 1.63it/s] {'loss': 0.2101, 'grad_norm': 0.5413249135017395, 'learning_rate': 9.101806976138117e-06, 'epoch': 0.82}
27%|██▋ | 3163/11526 [32:58<1:25:36, 1.63it/s] 27%|██▋ | 3164/11526 [32:59<1:25:34, 1.63it/s] {'loss': 0.2819, 'grad_norm': 0.5784923434257507, 'learning_rate': 9.10094083538366e-06, 'epoch': 0.82}
27%|██▋ | 3164/11526 [32:59<1:25:34, 1.63it/s] 27%|██▋ | 3165/11526 [32:59<1:25:34, 1.63it/s] {'loss': 0.2326, 'grad_norm': 0.5413036942481995, 'learning_rate': 9.100074318467576e-06, 'epoch': 0.82}
27%|██▋ | 3165/11526 [32:59<1:25:34, 1.63it/s] 27%|██▋ | 3166/11526 [33:00<1:25:32, 1.63it/s] {'loss': 0.2418, 'grad_norm': 0.5726471543312073, 'learning_rate': 9.099207425469347e-06, 'epoch': 0.82}
27%|██▋ | 3166/11526 [33:00<1:25:32, 1.63it/s] 27%|██▋ | 3167/11526 [33:00<1:25:30, 1.63it/s] {'loss': 0.2679, 'grad_norm': 0.6169052124023438, 'learning_rate': 9.098340156468488e-06, 'epoch': 0.82}
27%|██▋ | 3167/11526 [33:00<1:25:30, 1.63it/s] 27%|██▋ | 3168/11526 [33:01<1:25:30, 1.63it/s] {'loss': 0.3169, 'grad_norm': 0.6371598243713379, 'learning_rate': 9.097472511544555e-06, 'epoch': 0.82}
27%|██▋ | 3168/11526 [33:01<1:25:30, 1.63it/s] 27%|██▋ | 3169/11526 [33:02<1:25:29, 1.63it/s] {'loss': 0.2775, 'grad_norm': 0.708266019821167, 'learning_rate': 9.096604490777125e-06, 'epoch': 0.82}
27%|██▋ | 3169/11526 [33:02<1:25:29, 1.63it/s] 28%|██▊ | 3170/11526 [33:02<1:25:27, 1.63it/s] {'loss': 0.2466, 'grad_norm': 0.5710569620132446, 'learning_rate': 9.095736094245826e-06, 'epoch': 0.83}
28%|██▊ | 3170/11526 [33:02<1:25:27, 1.63it/s] 28%|██▊ | 3171/11526 [33:03<1:25:28, 1.63it/s] {'loss': 0.2814, 'grad_norm': 0.607520341873169, 'learning_rate': 9.094867322030307e-06, 'epoch': 0.83}
28%|██▊ | 3171/11526 [33:03<1:25:28, 1.63it/s] 28%|██▊ | 3172/11526 [33:03<1:25:28, 1.63it/s] {'loss': 0.2967, 'grad_norm': 0.5832104682922363, 'learning_rate': 9.093998174210258e-06, 'epoch': 0.83}
28%|██▊ | 3172/11526 [33:04<1:25:28, 1.63it/s] 28%|██▊ | 3173/11526 [33:04<1:25:30, 1.63it/s] {'loss': 0.2732, 'grad_norm': 0.5754496455192566, 'learning_rate': 9.093128650865403e-06, 'epoch': 0.83}
28%|██▊ | 3173/11526 [33:04<1:25:30, 1.63it/s] 28%|██▊ | 3174/11526 [33:05<1:25:28, 1.63it/s] {'loss': 0.2969, 'grad_norm': 0.5981727838516235, 'learning_rate': 9.0922587520755e-06, 'epoch': 0.83}
28%|██▊ | 3174/11526 [33:05<1:25:28, 1.63it/s] 28%|██▊ | 3175/11526 [33:05<1:25:30, 1.63it/s] {'loss': 0.1984, 'grad_norm': 0.42986297607421875, 'learning_rate': 9.09138847792034e-06, 'epoch': 0.83}
28%|██▊ | 3175/11526 [33:05<1:25:30, 1.63it/s] 28%|██▊ | 3176/11526 [33:06<1:25:28, 1.63it/s] {'loss': 0.3155, 'grad_norm': 0.6588969230651855, 'learning_rate': 9.09051782847975e-06, 'epoch': 0.83}
28%|██▊ | 3176/11526 [33:06<1:25:28, 1.63it/s] 28%|██▊ | 3177/11526 [33:06<1:25:26, 1.63it/s] {'loss': 0.3136, 'grad_norm': 0.6654079556465149, 'learning_rate': 9.089646803833589e-06, 'epoch': 0.83}
28%|██▊ | 3177/11526 [33:07<1:25:26, 1.63it/s] 28%|██▊ | 3178/11526 [33:07<1:25:25, 1.63it/s] {'loss': 0.2829, 'grad_norm': 0.5329433679580688, 'learning_rate': 9.088775404061757e-06, 'epoch': 0.83}
28%|██▊ | 3178/11526 [33:07<1:25:25, 1.63it/s] 28%|██▊ | 3179/11526 [33:08<1:25:23, 1.63it/s] {'loss': 0.267, 'grad_norm': 0.5385951399803162, 'learning_rate': 9.087903629244176e-06, 'epoch': 0.83}
28%|██▊ | 3179/11526 [33:08<1:25:23, 1.63it/s] 28%|██▊ | 3180/11526 [33:08<1:25:45, 1.62it/s] {'loss': 0.184, 'grad_norm': 0.48305338621139526, 'learning_rate': 9.087031479460819e-06, 'epoch': 0.83}
28%|██▊ | 3180/11526 [33:08<1:25:45, 1.62it/s] 28%|██▊ | 3181/11526 [33:09<1:25:38, 1.62it/s] {'loss': 0.3075, 'grad_norm': 0.6122949719429016, 'learning_rate': 9.086158954791679e-06, 'epoch': 0.83}
28%|██▊ | 3181/11526 [33:09<1:25:38, 1.62it/s] 28%|██▊ | 3182/11526 [33:10<1:25:33, 1.63it/s] {'loss': 0.3094, 'grad_norm': 0.5428199768066406, 'learning_rate': 9.08528605531679e-06, 'epoch': 0.83}
28%|██▊ | 3182/11526 [33:10<1:25:33, 1.63it/s] 28%|██▊ | 3183/11526 [33:10<1:25:32, 1.63it/s] {'loss': 0.2419, 'grad_norm': 0.48257219791412354, 'learning_rate': 9.08441278111622e-06, 'epoch': 0.83}
28%|██▊ | 3183/11526 [33:10<1:25:32, 1.63it/s] 28%|██▊ | 3184/11526 [33:11<1:25:27, 1.63it/s] {'loss': 0.241, 'grad_norm': 0.5147779583930969, 'learning_rate': 9.083539132270072e-06, 'epoch': 0.83}
28%|██▊ | 3184/11526 [33:11<1:25:27, 1.63it/s] 28%|██▊ | 3185/11526 [33:11<1:25:22, 1.63it/s] {'loss': 0.2788, 'grad_norm': 0.5268240571022034, 'learning_rate': 9.08266510885848e-06, 'epoch': 0.83}
28%|██▊ | 3185/11526 [33:12<1:25:22, 1.63it/s] 28%|██▊ | 3186/11526 [33:12<1:25:20, 1.63it/s] {'loss': 0.2884, 'grad_norm': 0.6283797025680542, 'learning_rate': 9.081790710961613e-06, 'epoch': 0.83}
28%|██▊ | 3186/11526 [33:12<1:25:20, 1.63it/s] 28%|██▊ | 3187/11526 [33:13<1:25:20, 1.63it/s] {'loss': 0.2298, 'grad_norm': 0.5493772625923157, 'learning_rate': 9.080915938659678e-06, 'epoch': 0.83}
28%|██▊ | 3187/11526 [33:13<1:25:20, 1.63it/s] 28%|██▊ | 3188/11526 [33:13<1:25:26, 1.63it/s] {'loss': 0.2413, 'grad_norm': 0.5544422268867493, 'learning_rate': 9.080040792032917e-06, 'epoch': 0.83}
28%|██▊ | 3188/11526 [33:13<1:25:26, 1.63it/s] 28%|██▊ | 3189/11526 [33:14<1:25:24, 1.63it/s] {'loss': 0.2956, 'grad_norm': 0.5578621625900269, 'learning_rate': 9.079165271161595e-06, 'epoch': 0.83}
28%|██▊ | 3189/11526 [33:14<1:25:24, 1.63it/s] 28%|██▊ | 3190/11526 [33:14<1:25:22, 1.63it/s] {'loss': 0.3134, 'grad_norm': 0.6515772342681885, 'learning_rate': 9.078289376126028e-06, 'epoch': 0.83}
28%|██▊ | 3190/11526 [33:15<1:25:22, 1.63it/s] 28%|██▊ | 3191/11526 [33:15<1:25:19, 1.63it/s] {'loss': 0.2237, 'grad_norm': 0.5063353180885315, 'learning_rate': 9.077413107006554e-06, 'epoch': 0.83}
28%|██▊ | 3191/11526 [33:15<1:25:19, 1.63it/s] 28%|██▊ | 3192/11526 [33:16<1:25:16, 1.63it/s] {'loss': 0.2923, 'grad_norm': 0.6466684341430664, 'learning_rate': 9.07653646388355e-06, 'epoch': 0.83}
28%|██▊ | 3192/11526 [33:16<1:25:16, 1.63it/s] 28%|██▊ | 3193/11526 [33:16<1:25:22, 1.63it/s] {'loss': 0.3226, 'grad_norm': 0.6029257774353027, 'learning_rate': 9.075659446837427e-06, 'epoch': 0.83}
28%|██▊ | 3193/11526 [33:16<1:25:22, 1.63it/s] 28%|██▊ | 3194/11526 [33:17<1:25:19, 1.63it/s] {'loss': 0.2461, 'grad_norm': 0.5563582181930542, 'learning_rate': 9.07478205594863e-06, 'epoch': 0.83}
28%|██▊ | 3194/11526 [33:17<1:25:19, 1.63it/s] 28%|██▊ | 3195/11526 [33:18<1:25:18, 1.63it/s] {'loss': 0.3006, 'grad_norm': 0.6352213621139526, 'learning_rate': 9.07390429129764e-06, 'epoch': 0.83}
28%|██▊ | 3195/11526 [33:18<1:25:18, 1.63it/s] 28%|██▊ | 3196/11526 [33:18<1:25:17, 1.63it/s] {'loss': 0.2273, 'grad_norm': 0.4829563796520233, 'learning_rate': 9.073026152964966e-06, 'epoch': 0.83}
28%|██▊ | 3196/11526 [33:18<1:25:17, 1.63it/s] 28%|██▊ | 3197/11526 [33:19<1:25:18, 1.63it/s] {'loss': 0.2448, 'grad_norm': 0.5691375732421875, 'learning_rate': 9.072147641031161e-06, 'epoch': 0.83}
28%|██▊ | 3197/11526 [33:19<1:25:18, 1.63it/s] 28%|██▊ | 3198/11526 [33:19<1:25:18, 1.63it/s] {'loss': 0.1981, 'grad_norm': 0.44344860315322876, 'learning_rate': 9.071268755576802e-06, 'epoch': 0.83}
28%|██▊ | 3198/11526 [33:20<1:25:18, 1.63it/s] 28%|██▊ | 3199/11526 [33:20<1:25:18, 1.63it/s] {'loss': 0.2514, 'grad_norm': 0.5468016862869263, 'learning_rate': 9.07038949668251e-06, 'epoch': 0.83}
28%|██▊ | 3199/11526 [33:20<1:25:18, 1.63it/s] 28%|██▊ | 3200/11526 [33:21<1:25:14, 1.63it/s] {'loss': 0.2839, 'grad_norm': 0.6560117602348328, 'learning_rate': 9.069509864428935e-06, 'epoch': 0.83}
28%|██▊ | 3200/11526 [33:21<1:25:14, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.23it/s]
31%|███ | 4/13 [00:00<00:01, 8.38it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.78it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.40it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.16it/s]
62%|██████▏ | 8/13 [00:01<00:00, 6.99it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.89it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.81it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.76it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.73it/s]
100%|██████████| 13/13 [00:01<00:00, 6.74it/s]
{'eval_loss': 0.654864490032196, 'eval_runtime': 1.9583, 'eval_samples_per_second': 102.131, 'eval_steps_per_second': 6.639, 'epoch': 0.83}
28%|██▊ | 3200/11526 [33:23<1:25:14, 1.63it/s]
 28%|██▊ | 3201/11526 [33:23<2:47:01, 1.20s/it] {'loss': 0.2926, 'grad_norm': 0.5747640132904053, 'learning_rate': 9.068629858896761e-06, 'epoch': 0.83}
28%|██▊ | 3201/11526 [33:23<2:47:01, 1.20s/it] 28%|██▊ | 3202/11526 [33:24<2:22:28, 1.03s/it] {'loss': 0.2996, 'grad_norm': 0.5593655109405518, 'learning_rate': 9.067749480166705e-06, 'epoch': 0.83}
28%|██▊ | 3202/11526 [33:24<2:22:28, 1.03s/it] 28%|██▊ | 3203/11526 [33:24<2:05:14, 1.11it/s] {'loss': 0.2725, 'grad_norm': 0.5512081384658813, 'learning_rate': 9.066868728319522e-06, 'epoch': 0.83}
28%|██▊ | 3203/11526 [33:25<2:05:14, 1.11it/s] 28%|██▊ | 3204/11526 [33:25<1:53:12, 1.23it/s] {'loss': 0.2047, 'grad_norm': 0.5087103247642517, 'learning_rate': 9.065987603436e-06, 'epoch': 0.83}
28%|██▊ | 3204/11526 [33:25<1:53:12, 1.23it/s] 28%|██▊ | 3205/11526 [33:26<1:44:47, 1.32it/s] {'loss': 0.2632, 'grad_norm': 0.532639741897583, 'learning_rate': 9.065106105596962e-06, 'epoch': 0.83}
28%|██▊ | 3205/11526 [33:26<1:44:47, 1.32it/s] 28%|██▊ | 3206/11526 [33:26<1:38:53, 1.40it/s] {'loss': 0.2092, 'grad_norm': 0.4861185550689697, 'learning_rate': 9.06422423488326e-06, 'epoch': 0.83}
28%|██▊ | 3206/11526 [33:26<1:38:53, 1.40it/s] 28%|██▊ | 3207/11526 [33:27<1:34:43, 1.46it/s] {'loss': 0.2346, 'grad_norm': 0.5568384528160095, 'learning_rate': 9.063341991375788e-06, 'epoch': 0.83}
28%|██▊ | 3207/11526 [33:27<1:34:43, 1.46it/s] 28%|██▊ | 3208/11526 [33:28<1:31:50, 1.51it/s] {'loss': 0.2184, 'grad_norm': 0.5893059968948364, 'learning_rate': 9.062459375155468e-06, 'epoch': 0.83}
28%|██▊ | 3208/11526 [33:28<1:31:50, 1.51it/s] 28%|██▊ | 3209/11526 [33:28<1:29:46, 1.54it/s] {'loss': 0.2722, 'grad_norm': 0.6342519521713257, 'learning_rate': 9.061576386303261e-06, 'epoch': 0.84}
28%|██▊ | 3209/11526 [33:28<1:29:46, 1.54it/s] 28%|██▊ | 3210/11526 [33:29<1:28:42, 1.56it/s] {'loss': 0.3198, 'grad_norm': 0.623525857925415, 'learning_rate': 9.060693024900158e-06, 'epoch': 0.84}
28%|██▊ | 3210/11526 [33:29<1:28:42, 1.56it/s] 28%|██▊ | 3211/11526 [33:29<1:27:39, 1.58it/s] {'loss': 0.216, 'grad_norm': 0.5448970794677734, 'learning_rate': 9.059809291027186e-06, 'epoch': 0.84}
28%|██▊ | 3211/11526 [33:29<1:27:39, 1.58it/s] 28%|██▊ | 3212/11526 [33:30<1:26:50, 1.60it/s] {'loss': 0.2549, 'grad_norm': 0.5479270815849304, 'learning_rate': 9.058925184765408e-06, 'epoch': 0.84}
28%|██▊ | 3212/11526 [33:30<1:26:50, 1.60it/s] 28%|██▊ | 3213/11526 [33:31<1:26:18, 1.61it/s] {'loss': 0.2619, 'grad_norm': 0.5389341115951538, 'learning_rate': 9.058040706195917e-06, 'epoch': 0.84}
28%|██▊ | 3213/11526 [33:31<1:26:18, 1.61it/s] 28%|██▊ | 3214/11526 [33:31<1:25:56, 1.61it/s] {'loss': 0.3942, 'grad_norm': 0.6843936443328857, 'learning_rate': 9.057155855399841e-06, 'epoch': 0.84}
28%|██▊ | 3214/11526 [33:31<1:25:56, 1.61it/s] 28%|██▊ | 3215/11526 [33:32<1:25:39, 1.62it/s] {'loss': 0.2276, 'grad_norm': 0.4944741725921631, 'learning_rate': 9.056270632458348e-06, 'epoch': 0.84}
28%|██▊ | 3215/11526 [33:32<1:25:39, 1.62it/s] 28%|██▊ | 3216/11526 [33:32<1:25:29, 1.62it/s] {'loss': 0.1866, 'grad_norm': 0.44892463088035583, 'learning_rate': 9.055385037452633e-06, 'epoch': 0.84}
28%|██▊ | 3216/11526 [33:33<1:25:29, 1.62it/s] 28%|██▊ | 3217/11526 [33:33<1:25:21, 1.62it/s] {'loss': 0.2323, 'grad_norm': 0.48843133449554443, 'learning_rate': 9.054499070463929e-06, 'epoch': 0.84}
28%|██▊ | 3217/11526 [33:33<1:25:21, 1.62it/s] 28%|██▊ | 3218/11526 [33:34<1:25:20, 1.62it/s] {'loss': 0.2704, 'grad_norm': 0.7017242312431335, 'learning_rate': 9.0536127315735e-06, 'epoch': 0.84}
28%|██▊ | 3218/11526 [33:34<1:25:20, 1.62it/s] 28%|██▊ | 3219/11526 [33:34<1:25:13, 1.62it/s] {'loss': 0.2523, 'grad_norm': 0.5479088425636292, 'learning_rate': 9.052726020862649e-06, 'epoch': 0.84}
28%|██▊ | 3219/11526 [33:34<1:25:13, 1.62it/s] 28%|██▊ | 3220/11526 [33:35<1:25:09, 1.63it/s] {'loss': 0.2384, 'grad_norm': 0.5245539546012878, 'learning_rate': 9.051838938412704e-06, 'epoch': 0.84}
28%|██▊ | 3220/11526 [33:35<1:25:09, 1.63it/s] 28%|██▊ | 3221/11526 [33:35<1:25:05, 1.63it/s] {'loss': 0.2531, 'grad_norm': 0.5286369323730469, 'learning_rate': 9.050951484305041e-06, 'epoch': 0.84}
28%|██▊ | 3221/11526 [33:36<1:25:05, 1.63it/s] 28%|██▊ | 3222/11526 [33:36<1:25:02, 1.63it/s] {'loss': 0.3103, 'grad_norm': 0.7145279049873352, 'learning_rate': 9.050063658621058e-06, 'epoch': 0.84}
28%|██▊ | 3222/11526 [33:36<1:25:02, 1.63it/s] 28%|██▊ | 3223/11526 [33:37<1:25:04, 1.63it/s] {'loss': 0.3159, 'grad_norm': 0.5922104716300964, 'learning_rate': 9.049175461442192e-06, 'epoch': 0.84}
28%|██▊ | 3223/11526 [33:37<1:25:04, 1.63it/s] 28%|██▊ | 3224/11526 [33:37<1:25:00, 1.63it/s] {'loss': 0.2829, 'grad_norm': 0.5589531064033508, 'learning_rate': 9.048286892849914e-06, 'epoch': 0.84}
28%|██▊ | 3224/11526 [33:37<1:25:00, 1.63it/s] 28%|██▊ | 3225/11526 [33:38<1:24:58, 1.63it/s] {'loss': 0.2893, 'grad_norm': 0.5831744074821472, 'learning_rate': 9.04739795292573e-06, 'epoch': 0.84}
28%|██▊ | 3225/11526 [33:38<1:24:58, 1.63it/s] 28%|██▊ | 3226/11526 [33:39<1:24:59, 1.63it/s] {'loss': 0.2497, 'grad_norm': 0.5552953481674194, 'learning_rate': 9.046508641751174e-06, 'epoch': 0.84}
28%|██▊ | 3226/11526 [33:39<1:24:59, 1.63it/s] 28%|██▊ | 3227/11526 [33:39<1:24:56, 1.63it/s] {'loss': 0.2404, 'grad_norm': 0.5239242911338806, 'learning_rate': 9.045618959407824e-06, 'epoch': 0.84}
28%|██▊ | 3227/11526 [33:39<1:24:56, 1.63it/s] 28%|██▊ | 3228/11526 [33:40<1:25:00, 1.63it/s] {'loss': 0.2791, 'grad_norm': 0.5634572505950928, 'learning_rate': 9.044728905977282e-06, 'epoch': 0.84}
28%|██▊ | 3228/11526 [33:40<1:25:00, 1.63it/s] 28%|██▊ | 3229/11526 [33:40<1:24:57, 1.63it/s] {'loss': 0.2491, 'grad_norm': 0.5793245434761047, 'learning_rate': 9.043838481541194e-06, 'epoch': 0.84}
28%|██▊ | 3229/11526 [33:41<1:24:57, 1.63it/s] 28%|██▊ | 3230/11526 [33:41<1:24:55, 1.63it/s] {'loss': 0.2278, 'grad_norm': 0.47686511278152466, 'learning_rate': 9.042947686181231e-06, 'epoch': 0.84}
28%|██▊ | 3230/11526 [33:41<1:24:55, 1.63it/s] 28%|██▊ | 3231/11526 [33:42<1:24:52, 1.63it/s] {'loss': 0.2963, 'grad_norm': 0.6190134882926941, 'learning_rate': 9.042056519979104e-06, 'epoch': 0.84}
28%|██▊ | 3231/11526 [33:42<1:24:52, 1.63it/s] 28%|██▊ | 3232/11526 [33:42<1:24:50, 1.63it/s] {'loss': 0.2575, 'grad_norm': 0.6390113830566406, 'learning_rate': 9.041164983016552e-06, 'epoch': 0.84}
28%|██▊ | 3232/11526 [33:42<1:24:50, 1.63it/s] 28%|██▊ | 3233/11526 [33:43<1:24:52, 1.63it/s] {'loss': 0.2387, 'grad_norm': 0.5095664262771606, 'learning_rate': 9.040273075375356e-06, 'epoch': 0.84}
28%|██▊ | 3233/11526 [33:43<1:24:52, 1.63it/s] 28%|██▊ | 3234/11526 [33:43<1:24:50, 1.63it/s] {'loss': 0.2443, 'grad_norm': 0.6280965209007263, 'learning_rate': 9.039380797137325e-06, 'epoch': 0.84}
28%|██▊ | 3234/11526 [33:44<1:24:50, 1.63it/s] 28%|██▊ | 3235/11526 [33:44<1:24:48, 1.63it/s] {'loss': 0.2812, 'grad_norm': 0.6229792833328247, 'learning_rate': 9.038488148384303e-06, 'epoch': 0.84}
28%|██▊ | 3235/11526 [33:44<1:24:48, 1.63it/s] 28%|██▊ | 3236/11526 [33:45<1:24:49, 1.63it/s] {'loss': 0.2573, 'grad_norm': 0.5534254312515259, 'learning_rate': 9.037595129198172e-06, 'epoch': 0.84}
28%|██▊ | 3236/11526 [33:45<1:24:49, 1.63it/s] 28%|██▊ | 3237/11526 [33:45<1:24:47, 1.63it/s] {'loss': 0.2125, 'grad_norm': 0.4795273244380951, 'learning_rate': 9.036701739660842e-06, 'epoch': 0.84}
28%|██▊ | 3237/11526 [33:45<1:24:47, 1.63it/s] 28%|██▊ | 3238/11526 [33:46<1:24:47, 1.63it/s] {'loss': 0.2163, 'grad_norm': 0.5065281987190247, 'learning_rate': 9.035807979854261e-06, 'epoch': 0.84}
28%|██▊ | 3238/11526 [33:46<1:24:47, 1.63it/s] 28%|██▊ | 3239/11526 [33:47<1:24:47, 1.63it/s] {'loss': 0.2333, 'grad_norm': 0.5835736393928528, 'learning_rate': 9.034913849860409e-06, 'epoch': 0.84}
28%|██▊ | 3239/11526 [33:47<1:24:47, 1.63it/s] 28%|██▊ | 3240/11526 [33:47<1:24:45, 1.63it/s] {'loss': 0.2676, 'grad_norm': 0.5810648798942566, 'learning_rate': 9.0340193497613e-06, 'epoch': 0.84}
28%|██▊ | 3240/11526 [33:47<1:24:45, 1.63it/s] 28%|██▊ | 3241/11526 [33:48<1:24:51, 1.63it/s] {'loss': 0.2628, 'grad_norm': 0.6205868721008301, 'learning_rate': 9.033124479638985e-06, 'epoch': 0.84}
28%|██▊ | 3241/11526 [33:48<1:24:51, 1.63it/s] 28%|██▊ | 3242/11526 [33:48<1:24:52, 1.63it/s] {'loss': 0.2499, 'grad_norm': 0.5347071886062622, 'learning_rate': 9.032229239575545e-06, 'epoch': 0.84}
28%|██▊ | 3242/11526 [33:49<1:24:52, 1.63it/s] 28%|██▊ | 3243/11526 [33:49<1:24:51, 1.63it/s] {'loss': 0.2699, 'grad_norm': 0.6165748834609985, 'learning_rate': 9.031333629653096e-06, 'epoch': 0.84}
28%|██▊ | 3243/11526 [33:49<1:24:51, 1.63it/s] 28%|██▊ | 3244/11526 [33:50<1:24:49, 1.63it/s] {'loss': 0.3451, 'grad_norm': 0.5739336609840393, 'learning_rate': 9.03043764995379e-06, 'epoch': 0.84}
28%|██▊ | 3244/11526 [33:50<1:24:49, 1.63it/s] 28%|██▊ | 3245/11526 [33:50<1:24:51, 1.63it/s] {'loss': 0.2314, 'grad_norm': 0.5382508039474487, 'learning_rate': 9.02954130055981e-06, 'epoch': 0.84}
28%|██▊ | 3245/11526 [33:50<1:24:51, 1.63it/s] 28%|██▊ | 3246/11526 [33:51<1:24:52, 1.63it/s] {'loss': 0.2749, 'grad_norm': 0.5808917284011841, 'learning_rate': 9.028644581553374e-06, 'epoch': 0.84}
28%|██▊ | 3246/11526 [33:51<1:24:52, 1.63it/s] 28%|██▊ | 3247/11526 [33:51<1:24:48, 1.63it/s] {'loss': 0.2629, 'grad_norm': 0.5040305256843567, 'learning_rate': 9.027747493016737e-06, 'epoch': 0.85}
28%|██▊ | 3247/11526 [33:52<1:24:48, 1.63it/s] 28%|██▊ | 3248/11526 [33:52<1:24:46, 1.63it/s] {'loss': 0.1865, 'grad_norm': 0.45728766918182373, 'learning_rate': 9.026850035032181e-06, 'epoch': 0.85}
28%|██▊ | 3248/11526 [33:52<1:24:46, 1.63it/s] 28%|██▊ | 3249/11526 [33:53<1:24:43, 1.63it/s] {'loss': 0.2741, 'grad_norm': 0.6145495176315308, 'learning_rate': 9.025952207682028e-06, 'epoch': 0.85}
28%|██▊ | 3249/11526 [33:53<1:24:43, 1.63it/s] 28%|██▊ | 3250/11526 [33:53<1:24:41, 1.63it/s] {'loss': 0.2305, 'grad_norm': 0.5549578070640564, 'learning_rate': 9.025054011048634e-06, 'epoch': 0.85}
28%|██▊ | 3250/11526 [33:53<1:24:41, 1.63it/s] 28%|██▊ | 3251/11526 [33:54<1:24:45, 1.63it/s] {'loss': 0.2725, 'grad_norm': 0.5832191705703735, 'learning_rate': 9.024155445214382e-06, 'epoch': 0.85}
28%|██▊ | 3251/11526 [33:54<1:24:45, 1.63it/s] 28%|██▊ | 3252/11526 [33:55<1:24:44, 1.63it/s] {'loss': 0.2246, 'grad_norm': 0.4970328211784363, 'learning_rate': 9.023256510261697e-06, 'epoch': 0.85}
28%|██▊ | 3252/11526 [33:55<1:24:44, 1.63it/s] 28%|██▊ | 3253/11526 [33:55<1:24:48, 1.63it/s] {'loss': 0.2677, 'grad_norm': 0.5766865015029907, 'learning_rate': 9.022357206273034e-06, 'epoch': 0.85}
28%|██▊ | 3253/11526 [33:55<1:24:48, 1.63it/s] 28%|██▊ | 3254/11526 [33:56<1:24:44, 1.63it/s] {'loss': 0.2423, 'grad_norm': 0.577963650226593, 'learning_rate': 9.02145753333088e-06, 'epoch': 0.85}
28%|██▊ | 3254/11526 [33:56<1:24:44, 1.63it/s] 28%|██▊ | 3255/11526 [33:56<1:24:40, 1.63it/s] {'loss': 0.2879, 'grad_norm': 0.5770542621612549, 'learning_rate': 9.020557491517761e-06, 'epoch': 0.85}
28%|██▊ | 3255/11526 [33:57<1:24:40, 1.63it/s] 28%|██▊ | 3256/11526 [33:57<1:24:42, 1.63it/s] {'loss': 0.3043, 'grad_norm': 0.605155885219574, 'learning_rate': 9.019657080916233e-06, 'epoch': 0.85}
28%|██▊ | 3256/11526 [33:57<1:24:42, 1.63it/s] 28%|██▊ | 3257/11526 [33:58<1:24:42, 1.63it/s] {'loss': 0.2865, 'grad_norm': 0.563761830329895, 'learning_rate': 9.018756301608887e-06, 'epoch': 0.85}
28%|██▊ | 3257/11526 [33:58<1:24:42, 1.63it/s] 28%|██▊ | 3258/11526 [33:58<1:24:42, 1.63it/s] {'loss': 0.1975, 'grad_norm': 0.44002076983451843, 'learning_rate': 9.017855153678347e-06, 'epoch': 0.85}
28%|██▊ | 3258/11526 [33:58<1:24:42, 1.63it/s] 28%|██▊ | 3259/11526 [33:59<1:24:40, 1.63it/s] {'loss': 0.2229, 'grad_norm': 0.527032196521759, 'learning_rate': 9.016953637207272e-06, 'epoch': 0.85}
28%|██▊ | 3259/11526 [33:59<1:24:40, 1.63it/s] 28%|██▊ | 3260/11526 [33:59<1:24:37, 1.63it/s] {'loss': 0.2037, 'grad_norm': 0.5014447569847107, 'learning_rate': 9.016051752278352e-06, 'epoch': 0.85}
28%|██▊ | 3260/11526 [34:00<1:24:37, 1.63it/s] 28%|██▊ | 3261/11526 [34:00<1:24:40, 1.63it/s] {'loss': 0.2497, 'grad_norm': 0.5213087201118469, 'learning_rate': 9.015149498974318e-06, 'epoch': 0.85}
28%|██▊ | 3261/11526 [34:00<1:24:40, 1.63it/s] 28%|██▊ | 3262/11526 [34:01<1:24:37, 1.63it/s] {'loss': 0.258, 'grad_norm': 0.5345727205276489, 'learning_rate': 9.014246877377925e-06, 'epoch': 0.85}
28%|██▊ | 3262/11526 [34:01<1:24:37, 1.63it/s] 28%|██▊ | 3263/11526 [34:01<1:24:34, 1.63it/s] {'loss': 0.2877, 'grad_norm': 0.5493413209915161, 'learning_rate': 9.013343887571968e-06, 'epoch': 0.85}
28%|██▊ | 3263/11526 [34:01<1:24:34, 1.63it/s] 28%|██▊ | 3264/11526 [34:02<1:24:32, 1.63it/s] {'loss': 0.3698, 'grad_norm': 0.6425647735595703, 'learning_rate': 9.012440529639275e-06, 'epoch': 0.85}
28%|██▊ | 3264/11526 [34:02<1:24:32, 1.63it/s] 28%|██▊ | 3265/11526 [34:03<1:24:34, 1.63it/s] {'loss': 0.2637, 'grad_norm': 0.5490221381187439, 'learning_rate': 9.011536803662706e-06, 'epoch': 0.85}
28%|██▊ | 3265/11526 [34:03<1:24:34, 1.63it/s] 28%|██▊ | 3266/11526 [34:03<1:24:38, 1.63it/s] {'loss': 0.2423, 'grad_norm': 0.48906293511390686, 'learning_rate': 9.010632709725158e-06, 'epoch': 0.85}
28%|██▊ | 3266/11526 [34:03<1:24:38, 1.63it/s] 28%|██▊ | 3267/11526 [34:04<1:24:34, 1.63it/s] {'loss': 0.2923, 'grad_norm': 0.5867401361465454, 'learning_rate': 9.009728247909557e-06, 'epoch': 0.85}
28%|██▊ | 3267/11526 [34:04<1:24:34, 1.63it/s] 28%|██▊ | 3268/11526 [34:04<1:24:33, 1.63it/s] {'loss': 0.324, 'grad_norm': 0.6185171008110046, 'learning_rate': 9.008823418298868e-06, 'epoch': 0.85}
28%|██▊ | 3268/11526 [34:05<1:24:33, 1.63it/s] 28%|██▊ | 3269/11526 [34:05<1:24:31, 1.63it/s] {'loss': 0.2565, 'grad_norm': 0.5854234099388123, 'learning_rate': 9.007918220976086e-06, 'epoch': 0.85}
28%|██▊ | 3269/11526 [34:05<1:24:31, 1.63it/s] 28%|██▊ | 3270/11526 [34:06<1:24:29, 1.63it/s] {'loss': 0.2402, 'grad_norm': 0.5365552306175232, 'learning_rate': 9.00701265602424e-06, 'epoch': 0.85}
28%|██▊ | 3270/11526 [34:06<1:24:29, 1.63it/s] 28%|██▊ | 3271/11526 [34:06<1:24:31, 1.63it/s] {'loss': 0.3175, 'grad_norm': 0.6167342066764832, 'learning_rate': 9.006106723526394e-06, 'epoch': 0.85}
28%|██▊ | 3271/11526 [34:06<1:24:31, 1.63it/s] 28%|██▊ | 3272/11526 [34:07<1:24:32, 1.63it/s] {'loss': 0.2558, 'grad_norm': 0.5223198533058167, 'learning_rate': 9.005200423565645e-06, 'epoch': 0.85}
28%|██▊ | 3272/11526 [34:07<1:24:32, 1.63it/s] 28%|██▊ | 3273/11526 [34:07<1:24:34, 1.63it/s] {'loss': 0.2718, 'grad_norm': 0.574465274810791, 'learning_rate': 9.004293756225125e-06, 'epoch': 0.85}
28%|██▊ | 3273/11526 [34:08<1:24:34, 1.63it/s] 28%|██▊ | 3274/11526 [34:08<1:24:30, 1.63it/s] {'loss': 0.2253, 'grad_norm': 0.5110756158828735, 'learning_rate': 9.003386721587999e-06, 'epoch': 0.85}
28%|██▊ | 3274/11526 [34:08<1:24:30, 1.63it/s] 28%|██▊ | 3275/11526 [34:09<1:24:31, 1.63it/s] {'loss': 0.2207, 'grad_norm': 0.47871336340904236, 'learning_rate': 9.002479319737463e-06, 'epoch': 0.85}
28%|██▊ | 3275/11526 [34:09<1:24:31, 1.63it/s] 28%|██▊ | 3276/11526 [34:09<1:24:27, 1.63it/s] {'loss': 0.3303, 'grad_norm': 0.6472916603088379, 'learning_rate': 9.001571550756751e-06, 'epoch': 0.85}
28%|██▊ | 3276/11526 [34:09<1:24:27, 1.63it/s] 28%|██▊ | 3277/11526 [34:10<1:24:26, 1.63it/s] {'loss': 0.2044, 'grad_norm': 0.4936976730823517, 'learning_rate': 9.000663414729129e-06, 'epoch': 0.85}
28%|██▊ | 3277/11526 [34:10<1:24:26, 1.63it/s] 28%|██▊ | 3278/11526 [34:11<1:24:24, 1.63it/s] {'loss': 0.2065, 'grad_norm': 0.4544784724712372, 'learning_rate': 8.999754911737896e-06, 'epoch': 0.85}
28%|██▊ | 3278/11526 [34:11<1:24:24, 1.63it/s] 28%|██▊ | 3279/11526 [34:11<1:24:24, 1.63it/s] {'loss': 0.2102, 'grad_norm': 0.47186580300331116, 'learning_rate': 8.998846041866384e-06, 'epoch': 0.85}
28%|██▊ | 3279/11526 [34:11<1:24:24, 1.63it/s] 28%|██▊ | 3280/11526 [34:12<1:24:23, 1.63it/s] {'loss': 0.336, 'grad_norm': 0.6494742035865784, 'learning_rate': 8.997936805197962e-06, 'epoch': 0.85}
28%|██▊ | 3280/11526 [34:12<1:24:23, 1.63it/s] 28%|██▊ | 3281/11526 [34:12<1:24:22, 1.63it/s] {'loss': 0.2387, 'grad_norm': 0.565132737159729, 'learning_rate': 8.997027201816026e-06, 'epoch': 0.85}
28%|██▊ | 3281/11526 [34:12<1:24:22, 1.63it/s] 28%|██▊ | 3282/11526 [34:13<1:24:20, 1.63it/s] {'loss': 0.2704, 'grad_norm': 0.5348547101020813, 'learning_rate': 8.996117231804015e-06, 'epoch': 0.85}
28%|██▊ | 3282/11526 [34:13<1:24:20, 1.63it/s] 28%|██▊ | 3283/11526 [34:14<1:24:28, 1.63it/s] {'loss': 0.2585, 'grad_norm': 0.6466012001037598, 'learning_rate': 8.995206895245395e-06, 'epoch': 0.85}
28%|██▊ | 3283/11526 [34:14<1:24:28, 1.63it/s] 28%|██▊ | 3284/11526 [34:14<1:24:27, 1.63it/s] {'loss': 0.1981, 'grad_norm': 0.4417588710784912, 'learning_rate': 8.994296192223667e-06, 'epoch': 0.85}
28%|██▊ | 3284/11526 [34:14<1:24:27, 1.63it/s] 29%|██▊ | 3285/11526 [34:15<1:24:28, 1.63it/s] {'loss': 0.2098, 'grad_norm': 0.4699929356575012, 'learning_rate': 8.993385122822364e-06, 'epoch': 0.86}
29%|██▊ | 3285/11526 [34:15<1:24:28, 1.63it/s] 29%|██▊ | 3286/11526 [34:15<1:24:29, 1.63it/s] {'loss': 0.2879, 'grad_norm': 0.5274131298065186, 'learning_rate': 8.992473687125057e-06, 'epoch': 0.86}
29%|██▊ | 3286/11526 [34:16<1:24:29, 1.63it/s] 29%|██▊ | 3287/11526 [34:16<1:24:31, 1.62it/s] {'loss': 0.3249, 'grad_norm': 0.6180456876754761, 'learning_rate': 8.991561885215348e-06, 'epoch': 0.86}
29%|██▊ | 3287/11526 [34:16<1:24:31, 1.62it/s] 29%|██▊ | 3288/11526 [34:17<1:24:29, 1.63it/s] {'loss': 0.2144, 'grad_norm': 0.5479853749275208, 'learning_rate': 8.99064971717687e-06, 'epoch': 0.86}
29%|██▊ | 3288/11526 [34:17<1:24:29, 1.63it/s] 29%|██▊ | 3289/11526 [34:17<1:24:26, 1.63it/s] {'loss': 0.2229, 'grad_norm': 0.4895799458026886, 'learning_rate': 8.989737183093294e-06, 'epoch': 0.86}
29%|██▊ | 3289/11526 [34:17<1:24:26, 1.63it/s] 29%|██▊ | 3290/11526 [34:18<1:24:23, 1.63it/s] {'loss': 0.3134, 'grad_norm': 0.6213431358337402, 'learning_rate': 8.988824283048324e-06, 'epoch': 0.86}
29%|██▊ | 3290/11526 [34:18<1:24:23, 1.63it/s] 29%|██▊ | 3291/11526 [34:19<1:24:21, 1.63it/s] {'loss': 0.3558, 'grad_norm': 0.6578519940376282, 'learning_rate': 8.987911017125696e-06, 'epoch': 0.86}
29%|██▊ | 3291/11526 [34:19<1:24:21, 1.63it/s] 29%|██▊ | 3292/11526 [34:19<1:24:16, 1.63it/s] {'loss': 0.3553, 'grad_norm': 0.7388201355934143, 'learning_rate': 8.986997385409179e-06, 'epoch': 0.86}
29%|██▊ | 3292/11526 [34:19<1:24:16, 1.63it/s] 29%|██▊ | 3293/11526 [34:20<1:24:19, 1.63it/s] {'loss': 0.2719, 'grad_norm': 0.5228049755096436, 'learning_rate': 8.986083387982576e-06, 'epoch': 0.86}
29%|██▊ | 3293/11526 [34:20<1:24:19, 1.63it/s] 29%|██▊ | 3294/11526 [34:20<1:24:18, 1.63it/s] {'loss': 0.2319, 'grad_norm': 0.5237506031990051, 'learning_rate': 8.985169024929725e-06, 'epoch': 0.86}
29%|██▊ | 3294/11526 [34:20<1:24:18, 1.63it/s] 29%|██▊ | 3295/11526 [34:21<1:24:14, 1.63it/s] {'loss': 0.3473, 'grad_norm': 0.5160418152809143, 'learning_rate': 8.984254296334496e-06, 'epoch': 0.86}
29%|██▊ | 3295/11526 [34:21<1:24:14, 1.63it/s] 29%|██▊ | 3296/11526 [34:22<1:24:15, 1.63it/s] {'loss': 0.3722, 'grad_norm': 0.6714357733726501, 'learning_rate': 8.983339202280795e-06, 'epoch': 0.86}
29%|██▊ | 3296/11526 [34:22<1:24:15, 1.63it/s] 29%|██▊ | 3297/11526 [34:22<1:24:14, 1.63it/s] {'loss': 0.2528, 'grad_norm': 0.4824368953704834, 'learning_rate': 8.982423742852555e-06, 'epoch': 0.86}
29%|██▊ | 3297/11526 [34:22<1:24:14, 1.63it/s] 29%|██▊ | 3298/11526 [34:23<1:24:16, 1.63it/s] {'loss': 0.2407, 'grad_norm': 0.5887393355369568, 'learning_rate': 8.981507918133751e-06, 'epoch': 0.86}
29%|██▊ | 3298/11526 [34:23<1:24:16, 1.63it/s] 29%|██▊ | 3299/11526 [34:23<1:24:13, 1.63it/s] {'loss': 0.3109, 'grad_norm': 0.549519419670105, 'learning_rate': 8.98059172820839e-06, 'epoch': 0.86}
29%|██▊ | 3299/11526 [34:24<1:24:13, 1.63it/s] 29%|██▊ | 3300/11526 [34:24<1:24:11, 1.63it/s] {'loss': 0.282, 'grad_norm': 0.6507381200790405, 'learning_rate': 8.979675173160505e-06, 'epoch': 0.86}
29%|██▊ | 3300/11526 [34:24<1:24:11, 1.63it/s] 29%|██▊ | 3301/11526 [34:25<1:24:15, 1.63it/s] {'loss': 0.2333, 'grad_norm': 0.5324950218200684, 'learning_rate': 8.978758253074169e-06, 'epoch': 0.86}
29%|██▊ | 3301/11526 [34:25<1:24:15, 1.63it/s] 29%|██▊ | 3302/11526 [34:25<1:24:13, 1.63it/s] {'loss': 0.2317, 'grad_norm': 0.4423257112503052, 'learning_rate': 8.97784096803349e-06, 'epoch': 0.86}
29%|██▊ | 3302/11526 [34:25<1:24:13, 1.63it/s] 29%|██▊ | 3303/11526 [34:26<1:24:16, 1.63it/s] {'loss': 0.3167, 'grad_norm': 0.5868343114852905, 'learning_rate': 8.976923318122602e-06, 'epoch': 0.86}
29%|██▊ | 3303/11526 [34:26<1:24:16, 1.63it/s] 29%|██▊ | 3304/11526 [34:26<1:24:14, 1.63it/s] {'loss': 0.2832, 'grad_norm': 0.5108069777488708, 'learning_rate': 8.97600530342568e-06, 'epoch': 0.86}
29%|██▊ | 3304/11526 [34:27<1:24:14, 1.63it/s] 29%|██▊ | 3305/11526 [34:27<1:24:10, 1.63it/s] {'loss': 0.2821, 'grad_norm': 0.57464998960495, 'learning_rate': 8.97508692402693e-06, 'epoch': 0.86}
29%|██▊ | 3305/11526 [34:27<1:24:10, 1.63it/s] 29%|██▊ | 3306/11526 [34:28<1:24:10, 1.63it/s] {'loss': 0.2579, 'grad_norm': 0.5224084854125977, 'learning_rate': 8.97416818001059e-06, 'epoch': 0.86}
29%|██▊ | 3306/11526 [34:28<1:24:10, 1.63it/s] 29%|██▊ | 3307/11526 [34:28<1:24:08, 1.63it/s] {'loss': 0.2266, 'grad_norm': 0.542919397354126, 'learning_rate': 8.97324907146093e-06, 'epoch': 0.86}
29%|██▊ | 3307/11526 [34:28<1:24:08, 1.63it/s] 29%|██▊ | 3308/11526 [34:29<1:24:35, 1.62it/s] {'loss': 0.2286, 'grad_norm': 0.4641464948654175, 'learning_rate': 8.972329598462262e-06, 'epoch': 0.86}
29%|██▊ | 3308/11526 [34:29<1:24:35, 1.62it/s] 29%|██▊ | 3309/11526 [34:30<1:24:26, 1.62it/s] {'loss': 0.2159, 'grad_norm': 0.4637403190135956, 'learning_rate': 8.971409761098919e-06, 'epoch': 0.86}
29%|██▊ | 3309/11526 [34:30<1:24:26, 1.62it/s] 29%|██▊ | 3310/11526 [34:30<1:24:18, 1.62it/s] {'loss': 0.2431, 'grad_norm': 0.4890817701816559, 'learning_rate': 8.970489559455278e-06, 'epoch': 0.86}
29%|██▊ | 3310/11526 [34:30<1:24:18, 1.62it/s] 29%|██▊ | 3311/11526 [34:31<1:24:16, 1.62it/s] {'loss': 0.3312, 'grad_norm': 0.5488962531089783, 'learning_rate': 8.969568993615743e-06, 'epoch': 0.86}
29%|██▊ | 3311/11526 [34:31<1:24:16, 1.62it/s] 29%|██▊ | 3312/11526 [34:31<1:24:13, 1.63it/s] {'loss': 0.2763, 'grad_norm': 0.566710352897644, 'learning_rate': 8.968648063664754e-06, 'epoch': 0.86}
29%|██▊ | 3312/11526 [34:32<1:24:13, 1.63it/s] 29%|██▊ | 3313/11526 [34:32<1:24:16, 1.62it/s] {'loss': 0.2232, 'grad_norm': 0.5113885998725891, 'learning_rate': 8.967726769686783e-06, 'epoch': 0.86}
29%|██▊ | 3313/11526 [34:32<1:24:16, 1.62it/s] 29%|██▉ | 3314/11526 [34:33<1:24:11, 1.63it/s] {'loss': 0.2361, 'grad_norm': 0.4826493263244629, 'learning_rate': 8.966805111766337e-06, 'epoch': 0.86}
29%|██▉ | 3314/11526 [34:33<1:24:11, 1.63it/s] 29%|██▉ | 3315/11526 [34:33<1:24:08, 1.63it/s] {'loss': 0.2017, 'grad_norm': 0.4731743335723877, 'learning_rate': 8.965883089987957e-06, 'epoch': 0.86}
29%|██▉ | 3315/11526 [34:33<1:24:08, 1.63it/s] 29%|██▉ | 3316/11526 [34:34<1:24:08, 1.63it/s] {'loss': 0.332, 'grad_norm': 0.5640020370483398, 'learning_rate': 8.964960704436215e-06, 'epoch': 0.86}
29%|██▉ | 3316/11526 [34:34<1:24:08, 1.63it/s] 29%|██▉ | 3317/11526 [34:34<1:24:06, 1.63it/s] {'loss': 0.1914, 'grad_norm': 0.4671497344970703, 'learning_rate': 8.964037955195716e-06, 'epoch': 0.86}
29%|██▉ | 3317/11526 [34:35<1:24:06, 1.63it/s] 29%|██▉ | 3318/11526 [34:35<1:24:09, 1.63it/s] {'loss': 0.2571, 'grad_norm': 0.5524727702140808, 'learning_rate': 8.963114842351104e-06, 'epoch': 0.86}
29%|██▉ | 3318/11526 [34:35<1:24:09, 1.63it/s] 29%|██▉ | 3319/11526 [34:36<1:24:09, 1.63it/s] {'loss': 0.2028, 'grad_norm': 0.46040964126586914, 'learning_rate': 8.962191365987047e-06, 'epoch': 0.86}
29%|██▉ | 3319/11526 [34:36<1:24:09, 1.63it/s] 29%|██▉ | 3320/11526 [34:36<1:24:04, 1.63it/s] {'loss': 0.3828, 'grad_norm': 0.6848766207695007, 'learning_rate': 8.961267526188256e-06, 'epoch': 0.86}
29%|██▉ | 3320/11526 [34:36<1:24:04, 1.63it/s] 29%|██▉ | 3321/11526 [34:37<1:24:02, 1.63it/s] {'loss': 0.2516, 'grad_norm': 0.6254339218139648, 'learning_rate': 8.960343323039468e-06, 'epoch': 0.86}
29%|██▉ | 3321/11526 [34:37<1:24:02, 1.63it/s] 29%|██▉ | 3322/11526 [34:38<1:24:00, 1.63it/s] {'loss': 0.3521, 'grad_norm': 0.6242287158966064, 'learning_rate': 8.959418756625457e-06, 'epoch': 0.86}
29%|██▉ | 3322/11526 [34:38<1:24:00, 1.63it/s] 29%|██▉ | 3323/11526 [34:38<1:24:05, 1.63it/s] {'loss': 0.2372, 'grad_norm': 0.5276476144790649, 'learning_rate': 8.95849382703103e-06, 'epoch': 0.86}
29%|██▉ | 3323/11526 [34:38<1:24:05, 1.63it/s] 29%|██▉ | 3324/11526 [34:39<1:24:02, 1.63it/s] {'loss': 0.4231, 'grad_norm': 0.6450731158256531, 'learning_rate': 8.957568534341026e-06, 'epoch': 0.87}
29%|██▉ | 3324/11526 [34:39<1:24:02, 1.63it/s] 29%|██▉ | 3325/11526 [34:39<1:24:03, 1.63it/s] {'loss': 0.2768, 'grad_norm': 0.5985851287841797, 'learning_rate': 8.956642878640317e-06, 'epoch': 0.87}
29%|██▉ | 3325/11526 [34:40<1:24:03, 1.63it/s] 29%|██▉ | 3326/11526 [34:40<1:24:03, 1.63it/s] {'loss': 0.3488, 'grad_norm': 0.7288268804550171, 'learning_rate': 8.955716860013812e-06, 'epoch': 0.87}
29%|██▉ | 3326/11526 [34:40<1:24:03, 1.63it/s] 29%|██▉ | 3327/11526 [34:41<1:24:03, 1.63it/s] {'loss': 0.226, 'grad_norm': 0.462102472782135, 'learning_rate': 8.95479047854645e-06, 'epoch': 0.87}
29%|██▉ | 3327/11526 [34:41<1:24:03, 1.63it/s] 29%|██▉ | 3328/11526 [34:41<1:24:06, 1.62it/s] {'loss': 0.271, 'grad_norm': 0.584722638130188, 'learning_rate': 8.953863734323202e-06, 'epoch': 0.87}
29%|██▉ | 3328/11526 [34:41<1:24:06, 1.62it/s] 29%|██▉ | 3329/11526 [34:42<1:24:04, 1.62it/s] {'loss': 0.2469, 'grad_norm': 0.6225091218948364, 'learning_rate': 8.952936627429077e-06, 'epoch': 0.87}
29%|██▉ | 3329/11526 [34:42<1:24:04, 1.62it/s] 29%|██▉ | 3330/11526 [34:42<1:24:00, 1.63it/s] {'loss': 0.2672, 'grad_norm': 0.5576671361923218, 'learning_rate': 8.952009157949113e-06, 'epoch': 0.87}
29%|██▉ | 3330/11526 [34:43<1:24:00, 1.63it/s] 29%|██▉ | 3331/11526 [34:43<1:24:04, 1.62it/s] {'loss': 0.3614, 'grad_norm': 0.5954854488372803, 'learning_rate': 8.951081325968383e-06, 'epoch': 0.87}
29%|██▉ | 3331/11526 [34:43<1:24:04, 1.62it/s] 29%|██▉ | 3332/11526 [34:44<1:24:00, 1.63it/s] {'loss': 0.3034, 'grad_norm': 0.6375239491462708, 'learning_rate': 8.950153131571992e-06, 'epoch': 0.87}
29%|██▉ | 3332/11526 [34:44<1:24:00, 1.63it/s] 29%|██▉ | 3333/11526 [34:44<1:24:00, 1.63it/s] {'loss': 0.2728, 'grad_norm': 0.5618076324462891, 'learning_rate': 8.94922457484508e-06, 'epoch': 0.87}
29%|██▉ | 3333/11526 [34:44<1:24:00, 1.63it/s] 29%|██▉ | 3334/11526 [34:45<1:23:58, 1.63it/s] {'loss': 0.2799, 'grad_norm': 0.6043860912322998, 'learning_rate': 8.948295655872822e-06, 'epoch': 0.87}
29%|██▉ | 3334/11526 [34:45<1:23:58, 1.63it/s] 29%|██▉ | 3335/11526 [34:46<1:23:51, 1.63it/s] {'loss': 0.3738, 'grad_norm': 0.5668084025382996, 'learning_rate': 8.94736637474042e-06, 'epoch': 0.87}
29%|██▉ | 3335/11526 [34:46<1:23:51, 1.63it/s] 29%|██▉ | 3336/11526 [34:46<1:23:57, 1.63it/s] {'loss': 0.227, 'grad_norm': 0.47024354338645935, 'learning_rate': 8.946436731533117e-06, 'epoch': 0.87}
29%|██▉ | 3336/11526 [34:46<1:23:57, 1.63it/s] 29%|██▉ | 3337/11526 [34:47<1:23:55, 1.63it/s] {'loss': 0.2635, 'grad_norm': 0.5324330925941467, 'learning_rate': 8.945506726336179e-06, 'epoch': 0.87}
29%|██▉ | 3337/11526 [34:47<1:23:55, 1.63it/s] 29%|██▉ | 3338/11526 [34:47<1:23:58, 1.63it/s] {'loss': 0.2393, 'grad_norm': 0.5441247224807739, 'learning_rate': 8.944576359234918e-06, 'epoch': 0.87}
29%|██▉ | 3338/11526 [34:48<1:23:58, 1.63it/s] 29%|██▉ | 3339/11526 [34:48<1:23:53, 1.63it/s] {'loss': 0.2525, 'grad_norm': 0.5044384002685547, 'learning_rate': 8.943645630314668e-06, 'epoch': 0.87}
29%|██▉ | 3339/11526 [34:48<1:23:53, 1.63it/s] 29%|██▉ | 3340/11526 [34:49<1:23:53, 1.63it/s] {'loss': 0.2703, 'grad_norm': 0.6360739469528198, 'learning_rate': 8.942714539660805e-06, 'epoch': 0.87}
29%|██▉ | 3340/11526 [34:49<1:23:53, 1.63it/s] 29%|██▉ | 3341/11526 [34:49<1:23:57, 1.62it/s] {'loss': 0.2443, 'grad_norm': 0.5289915800094604, 'learning_rate': 8.941783087358728e-06, 'epoch': 0.87}
29%|██▉ | 3341/11526 [34:49<1:23:57, 1.62it/s] 29%|██▉ | 3342/11526 [34:50<1:23:54, 1.63it/s] {'loss': 0.2602, 'grad_norm': 0.5462681651115417, 'learning_rate': 8.94085127349388e-06, 'epoch': 0.87}
29%|██▉ | 3342/11526 [34:50<1:23:54, 1.63it/s] 29%|██▉ | 3343/11526 [34:50<1:23:52, 1.63it/s] {'loss': 0.2106, 'grad_norm': 0.5229120850563049, 'learning_rate': 8.93991909815173e-06, 'epoch': 0.87}
29%|██▉ | 3343/11526 [34:51<1:23:52, 1.63it/s] 29%|██▉ | 3344/11526 [34:51<1:23:49, 1.63it/s] {'loss': 0.2458, 'grad_norm': 0.5348454117774963, 'learning_rate': 8.938986561417781e-06, 'epoch': 0.87}
29%|██▉ | 3344/11526 [34:51<1:23:49, 1.63it/s] 29%|██▉ | 3345/11526 [34:52<1:23:47, 1.63it/s] {'loss': 0.2577, 'grad_norm': 0.5670090317726135, 'learning_rate': 8.938053663377578e-06, 'epoch': 0.87}
29%|██▉ | 3345/11526 [34:52<1:23:47, 1.63it/s] 29%|██▉ | 3346/11526 [34:52<1:23:46, 1.63it/s] {'loss': 0.3201, 'grad_norm': 0.642723023891449, 'learning_rate': 8.93712040411668e-06, 'epoch': 0.87}
29%|██▉ | 3346/11526 [34:52<1:23:46, 1.63it/s] 29%|██▉ | 3347/11526 [34:53<1:23:41, 1.63it/s] {'loss': 0.2957, 'grad_norm': 0.6216418147087097, 'learning_rate': 8.936186783720703e-06, 'epoch': 0.87}
29%|██▉ | 3347/11526 [34:53<1:23:41, 1.63it/s] 29%|██▉ | 3348/11526 [34:54<1:23:43, 1.63it/s] {'loss': 0.2451, 'grad_norm': 0.5174314379692078, 'learning_rate': 8.935252802275277e-06, 'epoch': 0.87}
29%|██▉ | 3348/11526 [34:54<1:23:43, 1.63it/s] 29%|██▉ | 3349/11526 [34:54<1:23:43, 1.63it/s] {'loss': 0.2444, 'grad_norm': 0.5113734602928162, 'learning_rate': 8.934318459866072e-06, 'epoch': 0.87}
29%|██▉ | 3349/11526 [34:54<1:23:43, 1.63it/s] 29%|██▉ | 3350/11526 [34:55<1:23:43, 1.63it/s] {'loss': 0.1916, 'grad_norm': 0.4054253101348877, 'learning_rate': 8.933383756578792e-06, 'epoch': 0.87}
29%|██▉ | 3350/11526 [34:55<1:23:43, 1.63it/s] 29%|██▉ | 3351/11526 [34:55<1:23:44, 1.63it/s] {'loss': 0.3116, 'grad_norm': 0.6944946050643921, 'learning_rate': 8.932448692499175e-06, 'epoch': 0.87}
29%|██▉ | 3351/11526 [34:56<1:23:44, 1.63it/s] 29%|██▉ | 3352/11526 [34:56<1:23:43, 1.63it/s] {'loss': 0.278, 'grad_norm': 0.5656095743179321, 'learning_rate': 8.931513267712987e-06, 'epoch': 0.87}
29%|██▉ | 3352/11526 [34:56<1:23:43, 1.63it/s] 29%|██▉ | 3353/11526 [34:57<1:23:45, 1.63it/s] {'loss': 0.2489, 'grad_norm': 0.5217074155807495, 'learning_rate': 8.930577482306035e-06, 'epoch': 0.87}
29%|██▉ | 3353/11526 [34:57<1:23:45, 1.63it/s] 29%|██▉ | 3354/11526 [34:57<1:23:43, 1.63it/s] {'loss': 0.2243, 'grad_norm': 0.4764023423194885, 'learning_rate': 8.929641336364151e-06, 'epoch': 0.87}
29%|██▉ | 3354/11526 [34:57<1:23:43, 1.63it/s] 29%|██▉ | 3355/11526 [34:58<1:23:40, 1.63it/s] {'loss': 0.2173, 'grad_norm': 0.48321473598480225, 'learning_rate': 8.928704829973206e-06, 'epoch': 0.87}
29%|██▉ | 3355/11526 [34:58<1:23:40, 1.63it/s] 29%|██▉ | 3356/11526 [34:58<1:23:39, 1.63it/s] {'loss': 0.2419, 'grad_norm': 0.5273038744926453, 'learning_rate': 8.9277679632191e-06, 'epoch': 0.87}
29%|██▉ | 3356/11526 [34:59<1:23:39, 1.63it/s] 29%|██▉ | 3357/11526 [34:59<1:23:36, 1.63it/s] {'loss': 0.2284, 'grad_norm': 0.4935547113418579, 'learning_rate': 8.92683073618777e-06, 'epoch': 0.87}
29%|██▉ | 3357/11526 [34:59<1:23:36, 1.63it/s] 29%|██▉ | 3358/11526 [35:00<1:23:41, 1.63it/s] {'loss': 0.2808, 'grad_norm': 0.6395642757415771, 'learning_rate': 8.925893148965182e-06, 'epoch': 0.87}
29%|██▉ | 3358/11526 [35:00<1:23:41, 1.63it/s] 29%|██▉ | 3359/11526 [35:00<1:23:39, 1.63it/s] {'loss': 0.2529, 'grad_norm': 0.6237903237342834, 'learning_rate': 8.924955201637334e-06, 'epoch': 0.87}
29%|██▉ | 3359/11526 [35:00<1:23:39, 1.63it/s] 29%|██▉ | 3360/11526 [35:01<1:23:36, 1.63it/s] {'loss': 0.2789, 'grad_norm': 0.6038985252380371, 'learning_rate': 8.924016894290264e-06, 'epoch': 0.87}
29%|██▉ | 3360/11526 [35:01<1:23:36, 1.63it/s] 29%|██▉ | 3361/11526 [35:02<1:23:41, 1.63it/s] {'loss': 0.251, 'grad_norm': 0.4995173513889313, 'learning_rate': 8.923078227010038e-06, 'epoch': 0.87}
29%|██▉ | 3361/11526 [35:02<1:23:41, 1.63it/s] 29%|██▉ | 3362/11526 [35:02<1:23:37, 1.63it/s] {'loss': 0.2711, 'grad_norm': 0.5278076529502869, 'learning_rate': 8.922139199882758e-06, 'epoch': 0.88}
29%|██▉ | 3362/11526 [35:02<1:23:37, 1.63it/s] 29%|██▉ | 3363/11526 [35:03<1:23:35, 1.63it/s] {'loss': 0.2455, 'grad_norm': 0.4772874414920807, 'learning_rate': 8.92119981299455e-06, 'epoch': 0.88}
29%|██▉ | 3363/11526 [35:03<1:23:35, 1.63it/s] 29%|██▉ | 3364/11526 [35:03<1:23:36, 1.63it/s] {'loss': 0.2483, 'grad_norm': 0.5135446190834045, 'learning_rate': 8.920260066431587e-06, 'epoch': 0.88}
29%|██▉ | 3364/11526 [35:04<1:23:36, 1.63it/s] 29%|██▉ | 3365/11526 [35:04<1:23:36, 1.63it/s] {'loss': 0.2646, 'grad_norm': 0.5959073901176453, 'learning_rate': 8.919319960280066e-06, 'epoch': 0.88}
29%|██▉ | 3365/11526 [35:04<1:23:36, 1.63it/s] 29%|██▉ | 3366/11526 [35:05<1:23:36, 1.63it/s] {'loss': 0.2749, 'grad_norm': 0.5951917171478271, 'learning_rate': 8.918379494626218e-06, 'epoch': 0.88}
29%|██▉ | 3366/11526 [35:05<1:23:36, 1.63it/s] 29%|██▉ | 3367/11526 [35:05<1:23:36, 1.63it/s] {'loss': 0.2516, 'grad_norm': 0.4223868250846863, 'learning_rate': 8.917438669556307e-06, 'epoch': 0.88}
29%|██▉ | 3367/11526 [35:05<1:23:36, 1.63it/s] 29%|██▉ | 3368/11526 [35:06<1:23:34, 1.63it/s] {'loss': 0.3805, 'grad_norm': 0.6772180199623108, 'learning_rate': 8.916497485156632e-06, 'epoch': 0.88}
29%|██▉ | 3368/11526 [35:06<1:23:34, 1.63it/s] 29%|██▉ | 3369/11526 [35:06<1:23:30, 1.63it/s] {'loss': 0.2875, 'grad_norm': 0.5707951188087463, 'learning_rate': 8.915555941513525e-06, 'epoch': 0.88}
29%|██▉ | 3369/11526 [35:07<1:23:30, 1.63it/s] 29%|██▉ | 3370/11526 [35:07<1:23:33, 1.63it/s] {'loss': 0.2369, 'grad_norm': 0.5634805560112, 'learning_rate': 8.914614038713347e-06, 'epoch': 0.88}
29%|██▉ | 3370/11526 [35:07<1:23:33, 1.63it/s] 29%|██▉ | 3371/11526 [35:08<1:23:32, 1.63it/s] {'loss': 0.2188, 'grad_norm': 0.5917531251907349, 'learning_rate': 8.913671776842496e-06, 'epoch': 0.88}
29%|██▉ | 3371/11526 [35:08<1:23:32, 1.63it/s] 29%|██▉ | 3372/11526 [35:08<1:23:31, 1.63it/s] {'loss': 0.3105, 'grad_norm': 0.5988631248474121, 'learning_rate': 8.912729155987403e-06, 'epoch': 0.88}
29%|██▉ | 3372/11526 [35:08<1:23:31, 1.63it/s] 29%|██▉ | 3373/11526 [35:09<1:23:30, 1.63it/s] {'loss': 0.2633, 'grad_norm': 0.57813560962677, 'learning_rate': 8.911786176234529e-06, 'epoch': 0.88}
29%|██▉ | 3373/11526 [35:09<1:23:30, 1.63it/s] 29%|██▉ | 3374/11526 [35:10<1:23:31, 1.63it/s] {'loss': 0.2487, 'grad_norm': 0.5092580914497375, 'learning_rate': 8.91084283767037e-06, 'epoch': 0.88}
29%|██▉ | 3374/11526 [35:10<1:23:31, 1.63it/s] 29%|██▉ | 3375/11526 [35:10<1:23:27, 1.63it/s] {'loss': 0.2584, 'grad_norm': 0.5822776556015015, 'learning_rate': 8.909899140381454e-06, 'epoch': 0.88}
29%|██▉ | 3375/11526 [35:10<1:23:27, 1.63it/s] 29%|██▉ | 3376/11526 [35:11<1:23:26, 1.63it/s] {'loss': 0.2491, 'grad_norm': 0.5218001008033752, 'learning_rate': 8.90895508445434e-06, 'epoch': 0.88}
29%|██▉ | 3376/11526 [35:11<1:23:26, 1.63it/s] 29%|██▉ | 3377/11526 [35:11<1:23:27, 1.63it/s] {'loss': 0.2883, 'grad_norm': 0.5840132236480713, 'learning_rate': 8.908010669975628e-06, 'epoch': 0.88}
29%|██▉ | 3377/11526 [35:12<1:23:27, 1.63it/s] 29%|██▉ | 3378/11526 [35:12<1:23:24, 1.63it/s] {'loss': 0.2989, 'grad_norm': 0.6598412990570068, 'learning_rate': 8.907065897031941e-06, 'epoch': 0.88}
29%|██▉ | 3378/11526 [35:12<1:23:24, 1.63it/s] 29%|██▉ | 3379/11526 [35:13<1:23:23, 1.63it/s] {'loss': 0.2097, 'grad_norm': 0.5414140224456787, 'learning_rate': 8.906120765709938e-06, 'epoch': 0.88}
29%|██▉ | 3379/11526 [35:13<1:23:23, 1.63it/s] 29%|██▉ | 3380/11526 [35:13<1:23:23, 1.63it/s] {'loss': 0.2564, 'grad_norm': 0.5808610916137695, 'learning_rate': 8.905175276096314e-06, 'epoch': 0.88}
29%|██▉ | 3380/11526 [35:13<1:23:23, 1.63it/s] 29%|██▉ | 3381/11526 [35:14<1:23:19, 1.63it/s] {'loss': 0.2022, 'grad_norm': 0.4767516553401947, 'learning_rate': 8.904229428277794e-06, 'epoch': 0.88}
29%|██▉ | 3381/11526 [35:14<1:23:19, 1.63it/s] 29%|██▉ | 3382/11526 [35:14<1:23:21, 1.63it/s] {'loss': 0.2101, 'grad_norm': 0.5224301815032959, 'learning_rate': 8.90328322234114e-06, 'epoch': 0.88}
29%|██▉ | 3382/11526 [35:15<1:23:21, 1.63it/s] 29%|██▉ | 3383/11526 [35:15<1:23:20, 1.63it/s] {'loss': 0.2584, 'grad_norm': 0.5774487853050232, 'learning_rate': 8.902336658373136e-06, 'epoch': 0.88}
29%|██▉ | 3383/11526 [35:15<1:23:20, 1.63it/s] 29%|██▉ | 3384/11526 [35:16<1:23:18, 1.63it/s] {'loss': 0.2205, 'grad_norm': 0.5055240988731384, 'learning_rate': 8.901389736460611e-06, 'epoch': 0.88}
29%|██▉ | 3384/11526 [35:16<1:23:18, 1.63it/s] 29%|██▉ | 3385/11526 [35:16<1:23:19, 1.63it/s] {'loss': 0.2179, 'grad_norm': 0.48320987820625305, 'learning_rate': 8.900442456690423e-06, 'epoch': 0.88}
29%|██▉ | 3385/11526 [35:16<1:23:19, 1.63it/s] 29%|██▉ | 3386/11526 [35:17<1:23:18, 1.63it/s] {'loss': 0.2144, 'grad_norm': 0.4984889030456543, 'learning_rate': 8.89949481914946e-06, 'epoch': 0.88}
29%|██▉ | 3386/11526 [35:17<1:23:18, 1.63it/s] 29%|██▉ | 3387/11526 [35:18<1:23:17, 1.63it/s] {'loss': 0.2228, 'grad_norm': 0.4963016211986542, 'learning_rate': 8.898546823924644e-06, 'epoch': 0.88}
29%|██▉ | 3387/11526 [35:18<1:23:17, 1.63it/s] 29%|██▉ | 3388/11526 [35:18<1:23:17, 1.63it/s] {'loss': 0.2302, 'grad_norm': 0.4862391948699951, 'learning_rate': 8.897598471102933e-06, 'epoch': 0.88}
29%|██▉ | 3388/11526 [35:18<1:23:17, 1.63it/s] 29%|██▉ | 3389/11526 [35:19<1:23:18, 1.63it/s] {'loss': 0.351, 'grad_norm': 0.6968088746070862, 'learning_rate': 8.896649760771311e-06, 'epoch': 0.88}
29%|██▉ | 3389/11526 [35:19<1:23:18, 1.63it/s] 29%|██▉ | 3390/11526 [35:19<1:23:15, 1.63it/s] {'loss': 0.2308, 'grad_norm': 0.4921184182167053, 'learning_rate': 8.895700693016804e-06, 'epoch': 0.88}
29%|██▉ | 3390/11526 [35:19<1:23:15, 1.63it/s] 29%|██▉ | 3391/11526 [35:20<1:23:15, 1.63it/s] {'loss': 0.2082, 'grad_norm': 0.5437926650047302, 'learning_rate': 8.894751267926463e-06, 'epoch': 0.88}
29%|██▉ | 3391/11526 [35:20<1:23:15, 1.63it/s] 29%|██▉ | 3392/11526 [35:21<1:23:14, 1.63it/s] {'loss': 0.1685, 'grad_norm': 0.4578608274459839, 'learning_rate': 8.893801485587377e-06, 'epoch': 0.88}
29%|██▉ | 3392/11526 [35:21<1:23:14, 1.63it/s] 29%|██▉ | 3393/11526 [35:21<1:23:13, 1.63it/s] {'loss': 0.2778, 'grad_norm': 0.5503886938095093, 'learning_rate': 8.892851346086661e-06, 'epoch': 0.88}
29%|██▉ | 3393/11526 [35:21<1:23:13, 1.63it/s] 29%|██▉ | 3394/11526 [35:22<1:23:12, 1.63it/s] {'loss': 0.2917, 'grad_norm': 0.6014608144760132, 'learning_rate': 8.891900849511473e-06, 'epoch': 0.88}
29%|██▉ | 3394/11526 [35:22<1:23:12, 1.63it/s] 29%|██▉ | 3395/11526 [35:22<1:23:12, 1.63it/s] {'loss': 0.2455, 'grad_norm': 0.531826376914978, 'learning_rate': 8.890949995948994e-06, 'epoch': 0.88}
29%|██▉ | 3395/11526 [35:23<1:23:12, 1.63it/s] 29%|██▉ | 3396/11526 [35:23<1:23:13, 1.63it/s] {'loss': 0.271, 'grad_norm': 0.4659925401210785, 'learning_rate': 8.889998785486443e-06, 'epoch': 0.88}
29%|██▉ | 3396/11526 [35:23<1:23:13, 1.63it/s] 29%|██▉ | 3397/11526 [35:24<1:23:14, 1.63it/s] {'loss': 0.2433, 'grad_norm': 0.5870956778526306, 'learning_rate': 8.88904721821107e-06, 'epoch': 0.88}
29%|██▉ | 3397/11526 [35:24<1:23:14, 1.63it/s] 29%|██▉ | 3398/11526 [35:24<1:23:13, 1.63it/s] {'loss': 0.3004, 'grad_norm': 0.5768120884895325, 'learning_rate': 8.888095294210159e-06, 'epoch': 0.88}
29%|██▉ | 3398/11526 [35:24<1:23:13, 1.63it/s] 29%|██▉ | 3399/11526 [35:25<1:23:11, 1.63it/s] {'loss': 0.2346, 'grad_norm': 0.4798045754432678, 'learning_rate': 8.887143013571024e-06, 'epoch': 0.88}
29%|██▉ | 3399/11526 [35:25<1:23:11, 1.63it/s] 29%|██▉ | 3400/11526 [35:26<1:23:10, 1.63it/s] {'loss': 0.2113, 'grad_norm': 0.4450205862522125, 'learning_rate': 8.886190376381017e-06, 'epoch': 0.88}
29%|██▉ | 3400/11526 [35:26<1:23:10, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.27it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.6406479477882385, 'eval_runtime': 1.955, 'eval_samples_per_second': 102.302, 'eval_steps_per_second': 6.65, 'epoch': 0.88}
29%|██▉ | 3400/11526 [35:28<1:23:10, 1.63it/s]
 30%|██▉ | 3401/11526 [35:28<2:42:47, 1.20s/it] {'loss': 0.2579, 'grad_norm': 0.5783776640892029, 'learning_rate': 8.885237382727514e-06, 'epoch': 0.89}
30%|██▉ | 3401/11526 [35:28<2:42:47, 1.20s/it] 30%|██▉ | 3402/11526 [35:29<2:18:53, 1.03s/it] {'loss': 0.2113, 'grad_norm': 0.5266005396842957, 'learning_rate': 8.884284032697933e-06, 'epoch': 0.89}
30%|██▉ | 3402/11526 [35:29<2:18:53, 1.03s/it] 30%|██▉ | 3403/11526 [35:29<2:02:07, 1.11it/s] {'loss': 0.2822, 'grad_norm': 0.6089188456535339, 'learning_rate': 8.883330326379722e-06, 'epoch': 0.89}
30%|██▉ | 3403/11526 [35:29<2:02:07, 1.11it/s] 30%|██▉ | 3404/11526 [35:30<1:50:24, 1.23it/s] {'loss': 0.2061, 'grad_norm': 0.5159859657287598, 'learning_rate': 8.882376263860357e-06, 'epoch': 0.89}
30%|██▉ | 3404/11526 [35:30<1:50:24, 1.23it/s] 30%|██▉ | 3405/11526 [35:31<1:42:11, 1.32it/s] {'loss': 0.2973, 'grad_norm': 0.6374790668487549, 'learning_rate': 8.881421845227351e-06, 'epoch': 0.89}
30%|██▉ | 3405/11526 [35:31<1:42:11, 1.32it/s] 30%|██▉ | 3406/11526 [35:31<1:36:24, 1.40it/s] {'loss': 0.2966, 'grad_norm': 0.6066188216209412, 'learning_rate': 8.88046707056825e-06, 'epoch': 0.89}
30%|██▉ | 3406/11526 [35:31<1:36:24, 1.40it/s] 30%|██▉ | 3407/11526 [35:32<1:32:26, 1.46it/s] {'loss': 0.2431, 'grad_norm': 0.5734796524047852, 'learning_rate': 8.879511939970629e-06, 'epoch': 0.89}
30%|██▉ | 3407/11526 [35:32<1:32:26, 1.46it/s] 30%|██▉ | 3408/11526 [35:32<1:29:36, 1.51it/s] {'loss': 0.2922, 'grad_norm': 0.5677675604820251, 'learning_rate': 8.8785564535221e-06, 'epoch': 0.89}
30%|██▉ | 3408/11526 [35:33<1:29:36, 1.51it/s] 30%|██▉ | 3409/11526 [35:33<1:27:39, 1.54it/s] {'loss': 0.2456, 'grad_norm': 0.6469428539276123, 'learning_rate': 8.877600611310305e-06, 'epoch': 0.89}
30%|██▉ | 3409/11526 [35:33<1:27:39, 1.54it/s] 30%|██▉ | 3410/11526 [35:34<1:26:16, 1.57it/s] {'loss': 0.2194, 'grad_norm': 0.45197367668151855, 'learning_rate': 8.876644413422918e-06, 'epoch': 0.89}
30%|██▉ | 3410/11526 [35:34<1:26:16, 1.57it/s] 30%|██▉ | 3411/11526 [35:34<1:25:19, 1.59it/s] {'loss': 0.2062, 'grad_norm': 0.4286777675151825, 'learning_rate': 8.87568785994765e-06, 'epoch': 0.89}
30%|██▉ | 3411/11526 [35:34<1:25:19, 1.59it/s] 30%|██▉ | 3412/11526 [35:35<1:24:36, 1.60it/s] {'loss': 0.3052, 'grad_norm': 0.5592882037162781, 'learning_rate': 8.874730950972239e-06, 'epoch': 0.89}
30%|██▉ | 3412/11526 [35:35<1:24:36, 1.60it/s] 30%|██▉ | 3413/11526 [35:35<1:24:11, 1.61it/s] {'loss': 0.2479, 'grad_norm': 0.5153293013572693, 'learning_rate': 8.873773686584459e-06, 'epoch': 0.89}
30%|██▉ | 3413/11526 [35:36<1:24:11, 1.61it/s] 30%|██▉ | 3414/11526 [35:36<1:23:51, 1.61it/s] {'loss': 0.214, 'grad_norm': 0.46586039662361145, 'learning_rate': 8.872816066872116e-06, 'epoch': 0.89}
30%|██▉ | 3414/11526 [35:36<1:23:51, 1.61it/s] 30%|██▉ | 3415/11526 [35:37<1:23:33, 1.62it/s] {'loss': 0.2309, 'grad_norm': 0.48219043016433716, 'learning_rate': 8.871858091923047e-06, 'epoch': 0.89}
30%|██▉ | 3415/11526 [35:37<1:23:33, 1.62it/s] 30%|██▉ | 3416/11526 [35:37<1:23:21, 1.62it/s] {'loss': 0.2011, 'grad_norm': 0.44164276123046875, 'learning_rate': 8.870899761825125e-06, 'epoch': 0.89}
30%|██▉ | 3416/11526 [35:37<1:23:21, 1.62it/s] 30%|██▉ | 3417/11526 [35:38<1:23:13, 1.62it/s] {'loss': 0.2622, 'grad_norm': 0.6196989417076111, 'learning_rate': 8.869941076666251e-06, 'epoch': 0.89}
30%|██▉ | 3417/11526 [35:38<1:23:13, 1.62it/s] 30%|██▉ | 3418/11526 [35:39<1:23:07, 1.63it/s] {'loss': 0.3094, 'grad_norm': 0.6543906927108765, 'learning_rate': 8.868982036534364e-06, 'epoch': 0.89}
30%|██▉ | 3418/11526 [35:39<1:23:07, 1.63it/s] 30%|██▉ | 3419/11526 [35:39<1:23:03, 1.63it/s] {'loss': 0.2322, 'grad_norm': 0.558591365814209, 'learning_rate': 8.86802264151743e-06, 'epoch': 0.89}
30%|██▉ | 3419/11526 [35:39<1:23:03, 1.63it/s] 30%|██▉ | 3420/11526 [35:40<1:23:03, 1.63it/s] {'loss': 0.1909, 'grad_norm': 0.44510772824287415, 'learning_rate': 8.86706289170345e-06, 'epoch': 0.89}
30%|██▉ | 3420/11526 [35:40<1:23:03, 1.63it/s] 30%|██▉ | 3421/11526 [35:40<1:22:59, 1.63it/s] {'loss': 0.2386, 'grad_norm': 0.5556600093841553, 'learning_rate': 8.866102787180461e-06, 'epoch': 0.89}
30%|██▉ | 3421/11526 [35:40<1:22:59, 1.63it/s] 30%|██▉ | 3422/11526 [35:41<1:22:59, 1.63it/s] {'loss': 0.2782, 'grad_norm': 0.5801070928573608, 'learning_rate': 8.865142328036527e-06, 'epoch': 0.89}
30%|██▉ | 3422/11526 [35:41<1:22:59, 1.63it/s] 30%|██▉ | 3423/11526 [35:42<1:22:57, 1.63it/s] {'loss': 0.2045, 'grad_norm': 0.5559228658676147, 'learning_rate': 8.864181514359746e-06, 'epoch': 0.89}
30%|██▉ | 3423/11526 [35:42<1:22:57, 1.63it/s] 30%|██▉ | 3424/11526 [35:42<1:22:55, 1.63it/s] {'loss': 0.2169, 'grad_norm': 0.48113489151000977, 'learning_rate': 8.863220346238251e-06, 'epoch': 0.89}
30%|██▉ | 3424/11526 [35:42<1:22:55, 1.63it/s] 30%|██▉ | 3425/11526 [35:43<1:22:56, 1.63it/s] {'loss': 0.1966, 'grad_norm': 0.42545342445373535, 'learning_rate': 8.862258823760205e-06, 'epoch': 0.89}
30%|██▉ | 3425/11526 [35:43<1:22:56, 1.63it/s] 30%|██▉ | 3426/11526 [35:43<1:22:53, 1.63it/s] {'loss': 0.2365, 'grad_norm': 0.5311763286590576, 'learning_rate': 8.861296947013803e-06, 'epoch': 0.89}
30%|██▉ | 3426/11526 [35:44<1:22:53, 1.63it/s] 30%|██▉ | 3427/11526 [35:44<1:22:55, 1.63it/s] {'loss': 0.2949, 'grad_norm': 0.6805049777030945, 'learning_rate': 8.860334716087276e-06, 'epoch': 0.89}
30%|██▉ | 3427/11526 [35:44<1:22:55, 1.63it/s] 30%|██▉ | 3428/11526 [35:45<1:22:52, 1.63it/s] {'loss': 0.1696, 'grad_norm': 0.3997363746166229, 'learning_rate': 8.859372131068884e-06, 'epoch': 0.89}
30%|██▉ | 3428/11526 [35:45<1:22:52, 1.63it/s] 30%|██▉ | 3429/11526 [35:45<1:22:56, 1.63it/s] {'loss': 0.3017, 'grad_norm': 0.641312301158905, 'learning_rate': 8.858409192046922e-06, 'epoch': 0.89}
30%|██▉ | 3429/11526 [35:45<1:22:56, 1.63it/s] 30%|██▉ | 3430/11526 [35:46<1:22:52, 1.63it/s] {'loss': 0.2758, 'grad_norm': 0.5318644642829895, 'learning_rate': 8.857445899109716e-06, 'epoch': 0.89}
30%|██▉ | 3430/11526 [35:46<1:22:52, 1.63it/s] 30%|██▉ | 3431/11526 [35:46<1:22:47, 1.63it/s] {'loss': 0.2378, 'grad_norm': 0.5316352844238281, 'learning_rate': 8.856482252345623e-06, 'epoch': 0.89}
30%|██▉ | 3431/11526 [35:47<1:22:47, 1.63it/s] 30%|██▉ | 3432/11526 [35:47<1:22:47, 1.63it/s] {'loss': 0.2517, 'grad_norm': 0.5973569750785828, 'learning_rate': 8.855518251843035e-06, 'epoch': 0.89}
30%|██▉ | 3432/11526 [35:47<1:22:47, 1.63it/s] 30%|██▉ | 3433/11526 [35:48<1:22:47, 1.63it/s] {'loss': 0.2805, 'grad_norm': 0.5977531671524048, 'learning_rate': 8.854553897690377e-06, 'epoch': 0.89}
30%|██▉ | 3433/11526 [35:48<1:22:47, 1.63it/s] 30%|██▉ | 3434/11526 [35:48<1:22:46, 1.63it/s] {'loss': 0.2433, 'grad_norm': 0.5363415479660034, 'learning_rate': 8.853589189976105e-06, 'epoch': 0.89}
30%|██▉ | 3434/11526 [35:48<1:22:46, 1.63it/s] 30%|██▉ | 3435/11526 [35:49<1:22:46, 1.63it/s] {'loss': 0.3001, 'grad_norm': 0.6430956125259399, 'learning_rate': 8.852624128788705e-06, 'epoch': 0.89}
30%|██▉ | 3435/11526 [35:49<1:22:46, 1.63it/s] 30%|██▉ | 3436/11526 [35:50<1:22:46, 1.63it/s] {'loss': 0.1966, 'grad_norm': 0.45123377442359924, 'learning_rate': 8.851658714216698e-06, 'epoch': 0.89}
30%|██▉ | 3436/11526 [35:50<1:22:46, 1.63it/s] 30%|██▉ | 3437/11526 [35:50<1:22:46, 1.63it/s] {'loss': 0.2959, 'grad_norm': 0.6183458566665649, 'learning_rate': 8.85069294634864e-06, 'epoch': 0.89}
30%|██▉ | 3437/11526 [35:50<1:22:46, 1.63it/s] 30%|██▉ | 3438/11526 [35:51<1:22:45, 1.63it/s] {'loss': 0.2199, 'grad_norm': 0.4813840091228485, 'learning_rate': 8.849726825273117e-06, 'epoch': 0.89}
30%|██▉ | 3438/11526 [35:51<1:22:45, 1.63it/s] 30%|██▉ | 3439/11526 [35:51<1:22:46, 1.63it/s] {'loss': 0.221, 'grad_norm': 0.5176568031311035, 'learning_rate': 8.848760351078742e-06, 'epoch': 0.9}
30%|██▉ | 3439/11526 [35:52<1:22:46, 1.63it/s] 30%|██▉ | 3440/11526 [35:52<1:22:46, 1.63it/s] {'loss': 0.2175, 'grad_norm': 0.4947095513343811, 'learning_rate': 8.847793523854173e-06, 'epoch': 0.9}
30%|██▉ | 3440/11526 [35:52<1:22:46, 1.63it/s] 30%|██▉ | 3441/11526 [35:53<1:22:46, 1.63it/s] {'loss': 0.2508, 'grad_norm': 0.6136111617088318, 'learning_rate': 8.846826343688086e-06, 'epoch': 0.9}
30%|██▉ | 3441/11526 [35:53<1:22:46, 1.63it/s] 30%|██▉ | 3442/11526 [35:53<1:22:44, 1.63it/s] {'loss': 0.2304, 'grad_norm': 0.4331652522087097, 'learning_rate': 8.8458588106692e-06, 'epoch': 0.9}
30%|██▉ | 3442/11526 [35:53<1:22:44, 1.63it/s] 30%|██▉ | 3443/11526 [35:54<1:22:43, 1.63it/s] {'loss': 0.2099, 'grad_norm': 0.4416663646697998, 'learning_rate': 8.844890924886263e-06, 'epoch': 0.9}
30%|██▉ | 3443/11526 [35:54<1:22:43, 1.63it/s] 30%|██▉ | 3444/11526 [35:54<1:22:43, 1.63it/s] {'loss': 0.2454, 'grad_norm': 0.5727618932723999, 'learning_rate': 8.843922686428052e-06, 'epoch': 0.9}
30%|██▉ | 3444/11526 [35:55<1:22:43, 1.63it/s] 30%|██▉ | 3445/11526 [35:55<1:22:43, 1.63it/s] {'loss': 0.2849, 'grad_norm': 0.5543092489242554, 'learning_rate': 8.842954095383383e-06, 'epoch': 0.9}
30%|██▉ | 3445/11526 [35:55<1:22:43, 1.63it/s] 30%|██▉ | 3446/11526 [35:56<1:22:41, 1.63it/s] {'loss': 0.2569, 'grad_norm': 0.5430343747138977, 'learning_rate': 8.841985151841098e-06, 'epoch': 0.9}
30%|██▉ | 3446/11526 [35:56<1:22:41, 1.63it/s] 30%|██▉ | 3447/11526 [35:56<1:22:42, 1.63it/s] {'loss': 0.2178, 'grad_norm': 0.5123645663261414, 'learning_rate': 8.841015855890074e-06, 'epoch': 0.9}
30%|██▉ | 3447/11526 [35:56<1:22:42, 1.63it/s] 30%|██▉ | 3448/11526 [35:57<1:22:40, 1.63it/s] {'loss': 0.2039, 'grad_norm': 0.517353355884552, 'learning_rate': 8.840046207619225e-06, 'epoch': 0.9}
30%|██▉ | 3448/11526 [35:57<1:22:40, 1.63it/s] 30%|██▉ | 3449/11526 [35:58<1:23:06, 1.62it/s] {'loss': 0.2832, 'grad_norm': 0.5339198708534241, 'learning_rate': 8.839076207117485e-06, 'epoch': 0.9}
30%|██▉ | 3449/11526 [35:58<1:23:06, 1.62it/s] 30%|██▉ | 3450/11526 [35:58<1:23:01, 1.62it/s] {'loss': 0.352, 'grad_norm': 0.6353195309638977, 'learning_rate': 8.838105854473833e-06, 'epoch': 0.9}
30%|██▉ | 3450/11526 [35:58<1:23:01, 1.62it/s] 30%|██▉ | 3451/11526 [35:59<1:22:55, 1.62it/s] {'loss': 0.2696, 'grad_norm': 0.5070340633392334, 'learning_rate': 8.837135149777273e-06, 'epoch': 0.9}
30%|██▉ | 3451/11526 [35:59<1:22:55, 1.62it/s] 30%|██▉ | 3452/11526 [35:59<1:22:49, 1.62it/s] {'loss': 0.2189, 'grad_norm': 0.517085611820221, 'learning_rate': 8.836164093116846e-06, 'epoch': 0.9}
30%|██▉ | 3452/11526 [36:00<1:22:49, 1.62it/s] 30%|██▉ | 3453/11526 [36:00<1:23:11, 1.62it/s] {'loss': 0.2304, 'grad_norm': 0.5114079713821411, 'learning_rate': 8.835192684581621e-06, 'epoch': 0.9}
30%|██▉ | 3453/11526 [36:00<1:23:11, 1.62it/s] 30%|██▉ | 3454/11526 [36:01<1:23:05, 1.62it/s] {'loss': 0.3285, 'grad_norm': 0.5794717073440552, 'learning_rate': 8.8342209242607e-06, 'epoch': 0.9}
30%|██▉ | 3454/11526 [36:01<1:23:05, 1.62it/s] 30%|██▉ | 3455/11526 [36:01<1:22:59, 1.62it/s] {'loss': 0.3231, 'grad_norm': 0.6313707828521729, 'learning_rate': 8.833248812243224e-06, 'epoch': 0.9}
30%|██▉ | 3455/11526 [36:01<1:22:59, 1.62it/s] 30%|██▉ | 3456/11526 [36:02<1:23:00, 1.62it/s] {'loss': 0.2283, 'grad_norm': 0.5483278036117554, 'learning_rate': 8.832276348618354e-06, 'epoch': 0.9}
30%|██▉ | 3456/11526 [36:02<1:23:00, 1.62it/s] 30%|██▉ | 3457/11526 [36:02<1:22:54, 1.62it/s] {'loss': 0.2602, 'grad_norm': 0.5143687725067139, 'learning_rate': 8.831303533475294e-06, 'epoch': 0.9}
30%|██▉ | 3457/11526 [36:03<1:22:54, 1.62it/s] 30%|███ | 3458/11526 [36:03<1:23:12, 1.62it/s] {'loss': 0.2307, 'grad_norm': 0.5824894905090332, 'learning_rate': 8.830330366903273e-06, 'epoch': 0.9}
30%|███ | 3458/11526 [36:03<1:23:12, 1.62it/s] 30%|███ | 3459/11526 [36:04<1:23:00, 1.62it/s] {'loss': 0.2635, 'grad_norm': 0.5783587694168091, 'learning_rate': 8.829356848991557e-06, 'epoch': 0.9}
30%|███ | 3459/11526 [36:04<1:23:00, 1.62it/s] 30%|███ | 3460/11526 [36:04<1:22:52, 1.62it/s] {'loss': 0.2601, 'grad_norm': 0.5410026907920837, 'learning_rate': 8.828382979829444e-06, 'epoch': 0.9}
30%|███ | 3460/11526 [36:04<1:22:52, 1.62it/s] 30%|███ | 3461/11526 [36:05<1:22:49, 1.62it/s] {'loss': 0.3395, 'grad_norm': 0.6499840617179871, 'learning_rate': 8.827408759506261e-06, 'epoch': 0.9}
30%|███ | 3461/11526 [36:05<1:22:49, 1.62it/s] 30%|███ | 3462/11526 [36:06<1:22:46, 1.62it/s] {'loss': 0.2812, 'grad_norm': 0.5451678037643433, 'learning_rate': 8.82643418811137e-06, 'epoch': 0.9}
30%|███ | 3462/11526 [36:06<1:22:46, 1.62it/s] 30%|███ | 3463/11526 [36:06<1:22:45, 1.62it/s] {'loss': 0.1955, 'grad_norm': 0.5798733830451965, 'learning_rate': 8.825459265734163e-06, 'epoch': 0.9}
30%|███ | 3463/11526 [36:06<1:22:45, 1.62it/s] 30%|███ | 3464/11526 [36:07<1:22:42, 1.62it/s] {'loss': 0.2649, 'grad_norm': 0.6007682681083679, 'learning_rate': 8.824483992464066e-06, 'epoch': 0.9}
30%|███ | 3464/11526 [36:07<1:22:42, 1.62it/s] 30%|███ | 3465/11526 [36:07<1:22:36, 1.63it/s] {'loss': 0.3468, 'grad_norm': 0.6760718822479248, 'learning_rate': 8.823508368390537e-06, 'epoch': 0.9}
30%|███ | 3465/11526 [36:08<1:22:36, 1.63it/s] 30%|███ | 3466/11526 [36:08<1:22:38, 1.63it/s] {'loss': 0.2658, 'grad_norm': 0.574600100517273, 'learning_rate': 8.822532393603066e-06, 'epoch': 0.9}
30%|███ | 3466/11526 [36:08<1:22:38, 1.63it/s] 30%|███ | 3467/11526 [36:09<1:22:37, 1.63it/s] {'loss': 0.2473, 'grad_norm': 0.46598005294799805, 'learning_rate': 8.821556068191175e-06, 'epoch': 0.9}
30%|███ | 3467/11526 [36:09<1:22:37, 1.63it/s] 30%|███ | 3468/11526 [36:09<1:22:44, 1.62it/s] {'loss': 0.2627, 'grad_norm': 0.49881765246391296, 'learning_rate': 8.820579392244418e-06, 'epoch': 0.9}
30%|███ | 3468/11526 [36:09<1:22:44, 1.62it/s] 30%|███ | 3469/11526 [36:10<1:22:40, 1.62it/s] {'loss': 0.2574, 'grad_norm': 0.5565898418426514, 'learning_rate': 8.819602365852378e-06, 'epoch': 0.9}
30%|███ | 3469/11526 [36:10<1:22:40, 1.62it/s] 30%|███ | 3470/11526 [36:11<1:22:38, 1.62it/s] {'loss': 0.2521, 'grad_norm': 0.4806572198867798, 'learning_rate': 8.818624989104679e-06, 'epoch': 0.9}
30%|███ | 3470/11526 [36:11<1:22:38, 1.62it/s] 30%|███ | 3471/11526 [36:11<1:22:35, 1.63it/s] {'loss': 0.2597, 'grad_norm': 0.5391694903373718, 'learning_rate': 8.817647262090968e-06, 'epoch': 0.9}
30%|███ | 3471/11526 [36:11<1:22:35, 1.63it/s] 30%|███ | 3472/11526 [36:12<1:22:33, 1.63it/s] {'loss': 0.2803, 'grad_norm': 0.6407533884048462, 'learning_rate': 8.816669184900928e-06, 'epoch': 0.9}
30%|███ | 3472/11526 [36:12<1:22:33, 1.63it/s] 30%|███ | 3473/11526 [36:12<1:22:37, 1.62it/s] {'loss': 0.2176, 'grad_norm': 0.5385170578956604, 'learning_rate': 8.815690757624276e-06, 'epoch': 0.9}
30%|███ | 3473/11526 [36:12<1:22:37, 1.62it/s] 30%|███ | 3474/11526 [36:13<1:22:33, 1.63it/s] {'loss': 0.2973, 'grad_norm': 0.5845932960510254, 'learning_rate': 8.814711980350757e-06, 'epoch': 0.9}
30%|███ | 3474/11526 [36:13<1:22:33, 1.63it/s] 30%|███ | 3475/11526 [36:14<1:22:29, 1.63it/s] {'loss': 0.2345, 'grad_norm': 0.465877503156662, 'learning_rate': 8.813732853170149e-06, 'epoch': 0.9}
30%|███ | 3475/11526 [36:14<1:22:29, 1.63it/s] 30%|███ | 3476/11526 [36:14<1:22:31, 1.63it/s] {'loss': 0.2214, 'grad_norm': 0.518582284450531, 'learning_rate': 8.812753376172266e-06, 'epoch': 0.9}
30%|███ | 3476/11526 [36:14<1:22:31, 1.63it/s] 30%|███ | 3477/11526 [36:15<1:22:27, 1.63it/s] {'loss': 0.2487, 'grad_norm': 0.6255497336387634, 'learning_rate': 8.81177354944695e-06, 'epoch': 0.9}
30%|███ | 3477/11526 [36:15<1:22:27, 1.63it/s] 30%|███ | 3478/11526 [36:15<1:22:34, 1.62it/s] {'loss': 0.241, 'grad_norm': 0.5241779685020447, 'learning_rate': 8.810793373084074e-06, 'epoch': 0.91}
30%|███ | 3478/11526 [36:16<1:22:34, 1.62it/s] 30%|███ | 3479/11526 [36:16<1:22:28, 1.63it/s] {'loss': 0.2911, 'grad_norm': 0.5430248975753784, 'learning_rate': 8.809812847173547e-06, 'epoch': 0.91}
30%|███ | 3479/11526 [36:16<1:22:28, 1.63it/s] 30%|███ | 3480/11526 [36:17<1:22:25, 1.63it/s] {'loss': 0.283, 'grad_norm': 0.711655855178833, 'learning_rate': 8.808831971805311e-06, 'epoch': 0.91}
30%|███ | 3480/11526 [36:17<1:22:25, 1.63it/s] 30%|███ | 3481/11526 [36:17<1:22:24, 1.63it/s] {'loss': 0.2036, 'grad_norm': 0.46426257491111755, 'learning_rate': 8.807850747069335e-06, 'epoch': 0.91}
30%|███ | 3481/11526 [36:17<1:22:24, 1.63it/s] 30%|███ | 3482/11526 [36:18<1:22:27, 1.63it/s] {'loss': 0.2846, 'grad_norm': 0.49124595522880554, 'learning_rate': 8.806869173055621e-06, 'epoch': 0.91}
30%|███ | 3482/11526 [36:18<1:22:27, 1.63it/s] 30%|███ | 3483/11526 [36:18<1:22:32, 1.62it/s] {'loss': 0.1726, 'grad_norm': 0.4793456494808197, 'learning_rate': 8.805887249854207e-06, 'epoch': 0.91}
30%|███ | 3483/11526 [36:19<1:22:32, 1.62it/s] 30%|███ | 3484/11526 [36:19<1:22:28, 1.63it/s] {'loss': 0.321, 'grad_norm': 0.7678573727607727, 'learning_rate': 8.80490497755516e-06, 'epoch': 0.91}
30%|███ | 3484/11526 [36:19<1:22:28, 1.63it/s] 30%|███ | 3485/11526 [36:20<1:22:24, 1.63it/s] {'loss': 0.2428, 'grad_norm': 0.5330924987792969, 'learning_rate': 8.80392235624858e-06, 'epoch': 0.91}
30%|███ | 3485/11526 [36:20<1:22:24, 1.63it/s] 30%|███ | 3486/11526 [36:20<1:22:21, 1.63it/s] {'loss': 0.2481, 'grad_norm': 0.5250867009162903, 'learning_rate': 8.802939386024597e-06, 'epoch': 0.91}
30%|███ | 3486/11526 [36:20<1:22:21, 1.63it/s] 30%|███ | 3487/11526 [36:21<1:22:21, 1.63it/s] {'loss': 0.2694, 'grad_norm': 0.6357471942901611, 'learning_rate': 8.801956066973377e-06, 'epoch': 0.91}
30%|███ | 3487/11526 [36:21<1:22:21, 1.63it/s] 30%|███ | 3488/11526 [36:22<1:22:19, 1.63it/s] {'loss': 0.3244, 'grad_norm': 0.6307570934295654, 'learning_rate': 8.800972399185113e-06, 'epoch': 0.91}
30%|███ | 3488/11526 [36:22<1:22:19, 1.63it/s] 30%|███ | 3489/11526 [36:22<1:22:20, 1.63it/s] {'loss': 0.3251, 'grad_norm': 0.6207303404808044, 'learning_rate': 8.799988382750034e-06, 'epoch': 0.91}
30%|███ | 3489/11526 [36:22<1:22:20, 1.63it/s] 30%|███ | 3490/11526 [36:23<1:22:19, 1.63it/s] {'loss': 0.2421, 'grad_norm': 0.5510879755020142, 'learning_rate': 8.7990040177584e-06, 'epoch': 0.91}
30%|███ | 3490/11526 [36:23<1:22:19, 1.63it/s] 30%|███ | 3491/11526 [36:23<1:22:20, 1.63it/s] {'loss': 0.3261, 'grad_norm': 0.6086142063140869, 'learning_rate': 8.798019304300503e-06, 'epoch': 0.91}
30%|███ | 3491/11526 [36:24<1:22:20, 1.63it/s] 30%|███ | 3492/11526 [36:24<1:22:20, 1.63it/s] {'loss': 0.2407, 'grad_norm': 0.5158511400222778, 'learning_rate': 8.797034242466663e-06, 'epoch': 0.91}
30%|███ | 3492/11526 [36:24<1:22:20, 1.63it/s] 30%|███ | 3493/11526 [36:25<1:22:21, 1.63it/s] {'loss': 0.2267, 'grad_norm': 0.49556395411491394, 'learning_rate': 8.79604883234724e-06, 'epoch': 0.91}
30%|███ | 3493/11526 [36:25<1:22:21, 1.63it/s] 30%|███ | 3494/11526 [36:25<1:22:16, 1.63it/s] {'loss': 0.2998, 'grad_norm': 0.6148397922515869, 'learning_rate': 8.79506307403262e-06, 'epoch': 0.91}
30%|███ | 3494/11526 [36:25<1:22:16, 1.63it/s] 30%|███ | 3495/11526 [36:26<1:22:16, 1.63it/s] {'loss': 0.3565, 'grad_norm': 0.6940520405769348, 'learning_rate': 8.79407696761322e-06, 'epoch': 0.91}
30%|███ | 3495/11526 [36:26<1:22:16, 1.63it/s] 30%|███ | 3496/11526 [36:26<1:22:14, 1.63it/s] {'loss': 0.2688, 'grad_norm': 0.6145391464233398, 'learning_rate': 8.793090513179493e-06, 'epoch': 0.91}
30%|███ | 3496/11526 [36:27<1:22:14, 1.63it/s] 30%|███ | 3497/11526 [36:27<1:22:14, 1.63it/s] {'loss': 0.2451, 'grad_norm': 0.5058197975158691, 'learning_rate': 8.792103710821925e-06, 'epoch': 0.91}
30%|███ | 3497/11526 [36:27<1:22:14, 1.63it/s] 30%|███ | 3498/11526 [36:28<1:22:18, 1.63it/s] {'loss': 0.1819, 'grad_norm': 0.4975220561027527, 'learning_rate': 8.791116560631026e-06, 'epoch': 0.91}
30%|███ | 3498/11526 [36:28<1:22:18, 1.63it/s] 30%|███ | 3499/11526 [36:28<1:22:16, 1.63it/s] {'loss': 0.2045, 'grad_norm': 0.4923948645591736, 'learning_rate': 8.790129062697347e-06, 'epoch': 0.91}
30%|███ | 3499/11526 [36:28<1:22:16, 1.63it/s] 30%|███ | 3500/11526 [36:29<1:22:16, 1.63it/s] {'loss': 0.2367, 'grad_norm': 0.4857613742351532, 'learning_rate': 8.789141217111464e-06, 'epoch': 0.91}
30%|███ | 3500/11526 [36:29<1:22:16, 1.63it/s] 30%|███ | 3501/11526 [36:30<1:22:19, 1.62it/s] {'loss': 0.2808, 'grad_norm': 0.5876249074935913, 'learning_rate': 8.78815302396399e-06, 'epoch': 0.91}
30%|███ | 3501/11526 [36:30<1:22:19, 1.62it/s] 30%|███ | 3502/11526 [36:30<1:22:15, 1.63it/s] {'loss': 0.2736, 'grad_norm': 0.6019162535667419, 'learning_rate': 8.787164483345568e-06, 'epoch': 0.91}
30%|███ | 3502/11526 [36:30<1:22:15, 1.63it/s] 30%|███ | 3503/11526 [36:31<1:22:39, 1.62it/s] {'loss': 0.2337, 'grad_norm': 0.5100170373916626, 'learning_rate': 8.786175595346869e-06, 'epoch': 0.91}
30%|███ | 3503/11526 [36:31<1:22:39, 1.62it/s] 30%|███ | 3504/11526 [36:31<1:22:28, 1.62it/s] {'loss': 0.2408, 'grad_norm': 0.5055410265922546, 'learning_rate': 8.785186360058604e-06, 'epoch': 0.91}
30%|███ | 3504/11526 [36:32<1:22:28, 1.62it/s] 30%|███ | 3505/11526 [36:32<1:22:21, 1.62it/s] {'loss': 0.3197, 'grad_norm': 0.5757748484611511, 'learning_rate': 8.784196777571508e-06, 'epoch': 0.91}
30%|███ | 3505/11526 [36:32<1:22:21, 1.62it/s] 30%|███ | 3506/11526 [36:33<1:22:25, 1.62it/s] {'loss': 0.2207, 'grad_norm': 0.5512620806694031, 'learning_rate': 8.783206847976353e-06, 'epoch': 0.91}
30%|███ | 3506/11526 [36:33<1:22:25, 1.62it/s] 30%|███ | 3507/11526 [36:33<1:22:19, 1.62it/s] {'loss': 0.2518, 'grad_norm': 0.5979750752449036, 'learning_rate': 8.782216571363941e-06, 'epoch': 0.91}
30%|███ | 3507/11526 [36:33<1:22:19, 1.62it/s] 30%|███ | 3508/11526 [36:34<1:22:24, 1.62it/s] {'loss': 0.2445, 'grad_norm': 0.5641458034515381, 'learning_rate': 8.781225947825104e-06, 'epoch': 0.91}
30%|███ | 3508/11526 [36:34<1:22:24, 1.62it/s] 30%|███ | 3509/11526 [36:34<1:22:16, 1.62it/s] {'loss': 0.2693, 'grad_norm': 0.6108233332633972, 'learning_rate': 8.780234977450709e-06, 'epoch': 0.91}
30%|███ | 3509/11526 [36:35<1:22:16, 1.62it/s] 30%|███ | 3510/11526 [36:35<1:22:12, 1.63it/s] {'loss': 0.2279, 'grad_norm': 0.475714772939682, 'learning_rate': 8.779243660331653e-06, 'epoch': 0.91}
30%|███ | 3510/11526 [36:35<1:22:12, 1.63it/s] 30%|███ | 3511/11526 [36:36<1:22:14, 1.62it/s] {'loss': 0.2746, 'grad_norm': 0.5775438547134399, 'learning_rate': 8.778251996558867e-06, 'epoch': 0.91}
30%|███ | 3511/11526 [36:36<1:22:14, 1.62it/s] 30%|███ | 3512/11526 [36:36<1:22:10, 1.63it/s] {'loss': 0.2796, 'grad_norm': 0.5833345055580139, 'learning_rate': 8.77725998622331e-06, 'epoch': 0.91}
30%|███ | 3512/11526 [36:36<1:22:10, 1.63it/s] 30%|███ | 3513/11526 [36:37<1:22:10, 1.63it/s] {'loss': 0.2867, 'grad_norm': 0.5163644552230835, 'learning_rate': 8.776267629415974e-06, 'epoch': 0.91}
30%|███ | 3513/11526 [36:37<1:22:10, 1.63it/s] 30%|███ | 3514/11526 [36:38<1:22:07, 1.63it/s] {'loss': 0.2212, 'grad_norm': 0.5258894562721252, 'learning_rate': 8.775274926227886e-06, 'epoch': 0.91}
30%|███ | 3514/11526 [36:38<1:22:07, 1.63it/s] 30%|███ | 3515/11526 [36:38<1:22:02, 1.63it/s] {'loss': 0.2533, 'grad_norm': 0.5352962017059326, 'learning_rate': 8.7742818767501e-06, 'epoch': 0.91}
30%|███ | 3515/11526 [36:38<1:22:02, 1.63it/s] 31%|███ | 3516/11526 [36:39<1:22:03, 1.63it/s] {'loss': 0.2467, 'grad_norm': 0.5265458226203918, 'learning_rate': 8.773288481073707e-06, 'epoch': 0.92}
31%|███ | 3516/11526 [36:39<1:22:03, 1.63it/s] 31%|███ | 3517/11526 [36:39<1:22:04, 1.63it/s] {'loss': 0.3793, 'grad_norm': 0.6767374873161316, 'learning_rate': 8.772294739289825e-06, 'epoch': 0.92}
31%|███ | 3517/11526 [36:40<1:22:04, 1.63it/s] 31%|███ | 3518/11526 [36:40<1:22:25, 1.62it/s] {'loss': 0.2984, 'grad_norm': 0.6501851081848145, 'learning_rate': 8.771300651489606e-06, 'epoch': 0.92}
31%|███ | 3518/11526 [36:40<1:22:25, 1.62it/s] 31%|███ | 3519/11526 [36:41<1:22:14, 1.62it/s] {'loss': 0.3072, 'grad_norm': 0.7016598582267761, 'learning_rate': 8.770306217764233e-06, 'epoch': 0.92}
31%|███ | 3519/11526 [36:41<1:22:14, 1.62it/s] 31%|███ | 3520/11526 [36:41<1:22:09, 1.62it/s] {'loss': 0.2708, 'grad_norm': 0.5517572164535522, 'learning_rate': 8.769311438204922e-06, 'epoch': 0.92}
31%|███ | 3520/11526 [36:41<1:22:09, 1.62it/s] 31%|███ | 3521/11526 [36:42<1:22:10, 1.62it/s] {'loss': 0.1822, 'grad_norm': 0.42094656825065613, 'learning_rate': 8.76831631290292e-06, 'epoch': 0.92}
31%|███ | 3521/11526 [36:42<1:22:10, 1.62it/s] 31%|███ | 3522/11526 [36:42<1:22:01, 1.63it/s] {'loss': 0.3716, 'grad_norm': 0.656853199005127, 'learning_rate': 8.767320841949504e-06, 'epoch': 0.92}
31%|███ | 3522/11526 [36:43<1:22:01, 1.63it/s] 31%|███ | 3523/11526 [36:43<1:22:09, 1.62it/s] {'loss': 0.2553, 'grad_norm': 0.5281513929367065, 'learning_rate': 8.766325025435986e-06, 'epoch': 0.92}
31%|███ | 3523/11526 [36:43<1:22:09, 1.62it/s] 31%|███ | 3524/11526 [36:44<1:22:03, 1.63it/s] {'loss': 0.2558, 'grad_norm': 0.5522525906562805, 'learning_rate': 8.765328863453706e-06, 'epoch': 0.92}
31%|███ | 3524/11526 [36:44<1:22:03, 1.63it/s] 31%|███ | 3525/11526 [36:44<1:21:58, 1.63it/s] {'loss': 0.2197, 'grad_norm': 0.5639029741287231, 'learning_rate': 8.76433235609404e-06, 'epoch': 0.92}
31%|███ | 3525/11526 [36:44<1:21:58, 1.63it/s] 31%|███ | 3526/11526 [36:45<1:21:57, 1.63it/s] {'loss': 0.2389, 'grad_norm': 0.5141273140907288, 'learning_rate': 8.763335503448391e-06, 'epoch': 0.92}
31%|███ | 3526/11526 [36:45<1:21:57, 1.63it/s] 31%|███ | 3527/11526 [36:46<1:21:52, 1.63it/s] {'loss': 0.2438, 'grad_norm': 0.5100824236869812, 'learning_rate': 8.762338305608198e-06, 'epoch': 0.92}
31%|███ | 3527/11526 [36:46<1:21:52, 1.63it/s] 31%|███ | 3528/11526 [36:46<1:21:56, 1.63it/s] {'loss': 0.2561, 'grad_norm': 0.6101739406585693, 'learning_rate': 8.761340762664928e-06, 'epoch': 0.92}
31%|███ | 3528/11526 [36:46<1:21:56, 1.63it/s] 31%|███ | 3529/11526 [36:47<1:21:53, 1.63it/s] {'loss': 0.2458, 'grad_norm': 0.5745782852172852, 'learning_rate': 8.760342874710082e-06, 'epoch': 0.92}
31%|███ | 3529/11526 [36:47<1:21:53, 1.63it/s] 31%|███ | 3530/11526 [36:47<1:21:52, 1.63it/s] {'loss': 0.2536, 'grad_norm': 0.47153952717781067, 'learning_rate': 8.759344641835194e-06, 'epoch': 0.92}
31%|███ | 3530/11526 [36:48<1:21:52, 1.63it/s] 31%|███ | 3531/11526 [36:48<1:21:54, 1.63it/s] {'loss': 0.227, 'grad_norm': 0.5798426270484924, 'learning_rate': 8.758346064131824e-06, 'epoch': 0.92}
31%|███ | 3531/11526 [36:48<1:21:54, 1.63it/s] 31%|███ | 3532/11526 [36:49<1:21:56, 1.63it/s] {'loss': 0.2524, 'grad_norm': 0.5248987674713135, 'learning_rate': 8.757347141691568e-06, 'epoch': 0.92}
31%|███ | 3532/11526 [36:49<1:21:56, 1.63it/s] 31%|███ | 3533/11526 [36:49<1:21:57, 1.63it/s] {'loss': 0.2771, 'grad_norm': 0.5951041579246521, 'learning_rate': 8.756347874606056e-06, 'epoch': 0.92}
31%|███ | 3533/11526 [36:49<1:21:57, 1.63it/s] 31%|███ | 3534/11526 [36:50<1:21:53, 1.63it/s] {'loss': 0.3526, 'grad_norm': 0.6641544103622437, 'learning_rate': 8.755348262966942e-06, 'epoch': 0.92}
31%|███ | 3534/11526 [36:50<1:21:53, 1.63it/s] 31%|███ | 3535/11526 [36:50<1:21:50, 1.63it/s] {'loss': 0.3416, 'grad_norm': 0.6003164052963257, 'learning_rate': 8.75434830686592e-06, 'epoch': 0.92}
31%|███ | 3535/11526 [36:51<1:21:50, 1.63it/s] 31%|███ | 3536/11526 [36:51<1:21:46, 1.63it/s] {'loss': 0.2166, 'grad_norm': 0.5126489996910095, 'learning_rate': 8.75334800639471e-06, 'epoch': 0.92}
31%|███ | 3536/11526 [36:51<1:21:46, 1.63it/s] 31%|███ | 3537/11526 [36:52<1:21:45, 1.63it/s] {'loss': 0.2906, 'grad_norm': 0.5984213948249817, 'learning_rate': 8.752347361645065e-06, 'epoch': 0.92}
31%|███ | 3537/11526 [36:52<1:21:45, 1.63it/s] 31%|███ | 3538/11526 [36:52<1:21:50, 1.63it/s] {'loss': 0.4373, 'grad_norm': 0.7689931392669678, 'learning_rate': 8.751346372708769e-06, 'epoch': 0.92}
31%|███ | 3538/11526 [36:52<1:21:50, 1.63it/s] 31%|███ | 3539/11526 [36:53<1:21:45, 1.63it/s] {'loss': 0.2354, 'grad_norm': 0.5632514357566833, 'learning_rate': 8.750345039677642e-06, 'epoch': 0.92}
31%|███ | 3539/11526 [36:53<1:21:45, 1.63it/s] 31%|███ | 3540/11526 [36:54<1:21:48, 1.63it/s] {'loss': 0.153, 'grad_norm': 0.3749343156814575, 'learning_rate': 8.749343362643525e-06, 'epoch': 0.92}
31%|███ | 3540/11526 [36:54<1:21:48, 1.63it/s] 31%|███ | 3541/11526 [36:54<1:21:48, 1.63it/s] {'loss': 0.2844, 'grad_norm': 0.4777382016181946, 'learning_rate': 8.748341341698305e-06, 'epoch': 0.92}
31%|███ | 3541/11526 [36:54<1:21:48, 1.63it/s] 31%|███ | 3542/11526 [36:55<1:21:48, 1.63it/s] {'loss': 0.2203, 'grad_norm': 0.5142214298248291, 'learning_rate': 8.747338976933889e-06, 'epoch': 0.92}
31%|███ | 3542/11526 [36:55<1:21:48, 1.63it/s] 31%|███ | 3543/11526 [36:55<1:21:54, 1.62it/s] {'loss': 0.2222, 'grad_norm': 0.5310955047607422, 'learning_rate': 8.746336268442222e-06, 'epoch': 0.92}
31%|███ | 3543/11526 [36:56<1:21:54, 1.62it/s] 31%|███ | 3544/11526 [36:56<1:21:51, 1.63it/s] {'loss': 0.7552, 'grad_norm': 0.6741878986358643, 'learning_rate': 8.745333216315275e-06, 'epoch': 0.92}
31%|███ | 3544/11526 [36:56<1:21:51, 1.63it/s] 31%|███ | 3545/11526 [36:57<1:21:47, 1.63it/s] {'loss': 0.3628, 'grad_norm': 0.6379654407501221, 'learning_rate': 8.744329820645055e-06, 'epoch': 0.92}
31%|███ | 3545/11526 [36:57<1:21:47, 1.63it/s] 31%|███ | 3546/11526 [36:57<1:21:48, 1.63it/s] {'loss': 0.2263, 'grad_norm': 0.4724216163158417, 'learning_rate': 8.743326081523602e-06, 'epoch': 0.92}
31%|███ | 3546/11526 [36:57<1:21:48, 1.63it/s] 31%|███ | 3547/11526 [36:58<1:21:45, 1.63it/s] {'loss': 0.2729, 'grad_norm': 0.5684082508087158, 'learning_rate': 8.74232199904298e-06, 'epoch': 0.92}
31%|███ | 3547/11526 [36:58<1:21:45, 1.63it/s] 31%|███ | 3548/11526 [36:58<1:21:48, 1.63it/s] {'loss': 0.2279, 'grad_norm': 0.521946907043457, 'learning_rate': 8.74131757329529e-06, 'epoch': 0.92}
31%|███ | 3548/11526 [36:59<1:21:48, 1.63it/s] 31%|███ | 3549/11526 [36:59<1:21:44, 1.63it/s] {'loss': 0.2355, 'grad_norm': 0.5117738842964172, 'learning_rate': 8.740312804372668e-06, 'epoch': 0.92}
31%|███ | 3549/11526 [36:59<1:21:44, 1.63it/s] 31%|███ | 3550/11526 [37:00<1:21:42, 1.63it/s] {'loss': 0.3375, 'grad_norm': 0.6192073225975037, 'learning_rate': 8.739307692367273e-06, 'epoch': 0.92}
31%|███ | 3550/11526 [37:00<1:21:42, 1.63it/s] 31%|███ | 3551/11526 [37:00<1:21:43, 1.63it/s] {'loss': 0.2554, 'grad_norm': 0.5339572429656982, 'learning_rate': 8.7383022373713e-06, 'epoch': 0.92}
31%|███ | 3551/11526 [37:00<1:21:43, 1.63it/s] 31%|███ | 3552/11526 [37:01<1:21:43, 1.63it/s] {'loss': 0.2683, 'grad_norm': 0.4881262481212616, 'learning_rate': 8.737296439476976e-06, 'epoch': 0.92}
31%|███ | 3552/11526 [37:01<1:21:43, 1.63it/s] 31%|███ | 3553/11526 [37:02<1:21:50, 1.62it/s] {'loss': 0.2762, 'grad_norm': 0.5334048271179199, 'learning_rate': 8.736290298776558e-06, 'epoch': 0.92}
31%|███ | 3553/11526 [37:02<1:21:50, 1.62it/s] 31%|███ | 3554/11526 [37:02<1:21:43, 1.63it/s] {'loss': 0.218, 'grad_norm': 0.46116530895233154, 'learning_rate': 8.735283815362337e-06, 'epoch': 0.93}
31%|███ | 3554/11526 [37:02<1:21:43, 1.63it/s] 31%|███ | 3555/11526 [37:03<1:21:40, 1.63it/s] {'loss': 0.2588, 'grad_norm': 0.568915069103241, 'learning_rate': 8.734276989326628e-06, 'epoch': 0.93}
31%|███ | 3555/11526 [37:03<1:21:40, 1.63it/s] 31%|███ | 3556/11526 [37:03<1:21:45, 1.62it/s] {'loss': 0.2248, 'grad_norm': 0.49977216124534607, 'learning_rate': 8.73326982076179e-06, 'epoch': 0.93}
31%|███ | 3556/11526 [37:04<1:21:45, 1.62it/s] 31%|███ | 3557/11526 [37:04<1:21:40, 1.63it/s] {'loss': 0.3011, 'grad_norm': 0.5456662178039551, 'learning_rate': 8.732262309760202e-06, 'epoch': 0.93}
31%|███ | 3557/11526 [37:04<1:21:40, 1.63it/s] 31%|███ | 3558/11526 [37:05<1:21:47, 1.62it/s] {'loss': 0.3199, 'grad_norm': 0.6235488057136536, 'learning_rate': 8.731254456414278e-06, 'epoch': 0.93}
31%|███ | 3558/11526 [37:05<1:21:47, 1.62it/s] 31%|███ | 3559/11526 [37:05<1:21:41, 1.63it/s] {'loss': 0.215, 'grad_norm': 0.46225953102111816, 'learning_rate': 8.730246260816465e-06, 'epoch': 0.93}
31%|███ | 3559/11526 [37:05<1:21:41, 1.63it/s] 31%|███ | 3560/11526 [37:06<1:21:38, 1.63it/s] {'loss': 0.262, 'grad_norm': 0.5320008993148804, 'learning_rate': 8.729237723059242e-06, 'epoch': 0.93}
31%|███ | 3560/11526 [37:06<1:21:38, 1.63it/s] 31%|███ | 3561/11526 [37:06<1:21:37, 1.63it/s] {'loss': 0.2316, 'grad_norm': 0.5199187397956848, 'learning_rate': 8.728228843235116e-06, 'epoch': 0.93}
31%|███ | 3561/11526 [37:07<1:21:37, 1.63it/s] 31%|███ | 3562/11526 [37:07<1:21:43, 1.62it/s] {'loss': 0.2523, 'grad_norm': 0.5521278381347656, 'learning_rate': 8.727219621436629e-06, 'epoch': 0.93}
31%|███ | 3562/11526 [37:07<1:21:43, 1.62it/s] 31%|███ | 3563/11526 [37:08<1:22:01, 1.62it/s] {'loss': 0.2772, 'grad_norm': 0.5623727440834045, 'learning_rate': 8.72621005775635e-06, 'epoch': 0.93}
31%|███ | 3563/11526 [37:08<1:22:01, 1.62it/s] 31%|███ | 3564/11526 [37:08<1:21:55, 1.62it/s] {'loss': 0.2113, 'grad_norm': 0.4535275399684906, 'learning_rate': 8.725200152286882e-06, 'epoch': 0.93}
31%|███ | 3564/11526 [37:08<1:21:55, 1.62it/s] 31%|███ | 3565/11526 [37:09<1:21:46, 1.62it/s] {'loss': 0.3373, 'grad_norm': 0.5762205123901367, 'learning_rate': 8.724189905120861e-06, 'epoch': 0.93}
31%|███ | 3565/11526 [37:09<1:21:46, 1.62it/s] 31%|███ | 3566/11526 [37:10<1:21:46, 1.62it/s] {'loss': 0.2505, 'grad_norm': 0.5387022495269775, 'learning_rate': 8.723179316350953e-06, 'epoch': 0.93}
31%|███ | 3566/11526 [37:10<1:21:46, 1.62it/s] 31%|███ | 3567/11526 [37:10<1:21:40, 1.62it/s] {'loss': 0.2454, 'grad_norm': 0.5823970437049866, 'learning_rate': 8.722168386069855e-06, 'epoch': 0.93}
31%|███ | 3567/11526 [37:10<1:21:40, 1.62it/s] 31%|███ | 3568/11526 [37:11<1:21:41, 1.62it/s] {'loss': 0.2741, 'grad_norm': 0.6014996767044067, 'learning_rate': 8.721157114370293e-06, 'epoch': 0.93}
31%|███ | 3568/11526 [37:11<1:21:41, 1.62it/s] 31%|███ | 3569/11526 [37:11<1:21:36, 1.62it/s] {'loss': 0.2713, 'grad_norm': 0.5755122900009155, 'learning_rate': 8.720145501345028e-06, 'epoch': 0.93}
31%|███ | 3569/11526 [37:12<1:21:36, 1.62it/s] 31%|███ | 3570/11526 [37:12<1:21:34, 1.63it/s] {'loss': 0.2475, 'grad_norm': 0.5292026996612549, 'learning_rate': 8.719133547086852e-06, 'epoch': 0.93}
31%|███ | 3570/11526 [37:12<1:21:34, 1.63it/s] 31%|███ | 3571/11526 [37:13<1:21:35, 1.62it/s] {'loss': 0.2033, 'grad_norm': 0.47738662362098694, 'learning_rate': 8.718121251688584e-06, 'epoch': 0.93}
31%|███ | 3571/11526 [37:13<1:21:35, 1.62it/s] 31%|███ | 3572/11526 [37:13<1:21:33, 1.63it/s] {'loss': 0.2916, 'grad_norm': 0.5887089967727661, 'learning_rate': 8.717108615243081e-06, 'epoch': 0.93}
31%|███ | 3572/11526 [37:13<1:21:33, 1.63it/s] 31%|███ | 3573/11526 [37:14<1:25:57, 1.54it/s] {'loss': 0.3561, 'grad_norm': 0.6625359058380127, 'learning_rate': 8.716095637843227e-06, 'epoch': 0.93}
31%|███ | 3573/11526 [37:14<1:25:57, 1.54it/s] 31%|███ | 3574/11526 [37:15<1:24:35, 1.57it/s] {'loss': 0.1842, 'grad_norm': 0.4243215024471283, 'learning_rate': 8.715082319581938e-06, 'epoch': 0.93}
31%|███ | 3574/11526 [37:15<1:24:35, 1.57it/s] 31%|███ | 3575/11526 [37:15<1:23:38, 1.58it/s] {'loss': 0.2502, 'grad_norm': 0.5031136870384216, 'learning_rate': 8.714068660552158e-06, 'epoch': 0.93}
31%|███ | 3575/11526 [37:15<1:23:38, 1.58it/s] 31%|███ | 3576/11526 [37:16<1:23:02, 1.60it/s] {'loss': 0.1844, 'grad_norm': 0.4648086130619049, 'learning_rate': 8.713054660846871e-06, 'epoch': 0.93}
31%|███ | 3576/11526 [37:16<1:23:02, 1.60it/s] 31%|███ | 3577/11526 [37:16<1:22:32, 1.60it/s] {'loss': 0.2522, 'grad_norm': 0.576806366443634, 'learning_rate': 8.712040320559082e-06, 'epoch': 0.93}
31%|███ | 3577/11526 [37:17<1:22:32, 1.60it/s] 31%|███ | 3578/11526 [37:17<1:22:08, 1.61it/s] {'loss': 0.3256, 'grad_norm': 0.6583822965621948, 'learning_rate': 8.711025639781834e-06, 'epoch': 0.93}
31%|███ | 3578/11526 [37:17<1:22:08, 1.61it/s] 31%|███ | 3579/11526 [37:18<1:21:52, 1.62it/s] {'loss': 0.2206, 'grad_norm': 0.48127248883247375, 'learning_rate': 8.7100106186082e-06, 'epoch': 0.93}
31%|███ | 3579/11526 [37:18<1:21:52, 1.62it/s] 31%|███ | 3580/11526 [37:18<1:21:50, 1.62it/s] {'loss': 0.3274, 'grad_norm': 0.571951687335968, 'learning_rate': 8.708995257131283e-06, 'epoch': 0.93}
31%|███ | 3580/11526 [37:18<1:21:50, 1.62it/s] 31%|███ | 3581/11526 [37:19<1:21:40, 1.62it/s] {'loss': 0.3258, 'grad_norm': 0.6044633984565735, 'learning_rate': 8.70797955544422e-06, 'epoch': 0.93}
31%|███ | 3581/11526 [37:19<1:21:40, 1.62it/s] 31%|███ | 3582/11526 [37:20<1:21:36, 1.62it/s] {'loss': 0.2253, 'grad_norm': 0.4863988757133484, 'learning_rate': 8.706963513640171e-06, 'epoch': 0.93}
31%|███ | 3582/11526 [37:20<1:21:36, 1.62it/s] 31%|███ | 3583/11526 [37:20<1:21:31, 1.62it/s] {'loss': 0.2251, 'grad_norm': 0.45906615257263184, 'learning_rate': 8.70594713181234e-06, 'epoch': 0.93}
31%|███ | 3583/11526 [37:20<1:21:31, 1.62it/s] 31%|███ | 3584/11526 [37:21<1:21:26, 1.63it/s] {'loss': 0.2513, 'grad_norm': 0.5439960956573486, 'learning_rate': 8.70493041005395e-06, 'epoch': 0.93}
31%|███ | 3584/11526 [37:21<1:21:26, 1.63it/s] 31%|███ | 3585/11526 [37:21<1:21:24, 1.63it/s] {'loss': 0.2617, 'grad_norm': 0.5623352527618408, 'learning_rate': 8.703913348458262e-06, 'epoch': 0.93}
31%|███ | 3585/11526 [37:21<1:21:24, 1.63it/s] 31%|███ | 3586/11526 [37:22<1:21:26, 1.63it/s] {'loss': 0.2352, 'grad_norm': 0.514956533908844, 'learning_rate': 8.70289594711857e-06, 'epoch': 0.93}
31%|███ | 3586/11526 [37:22<1:21:26, 1.63it/s] 31%|███ | 3587/11526 [37:23<1:21:21, 1.63it/s] {'loss': 0.2154, 'grad_norm': 0.49200835824012756, 'learning_rate': 8.70187820612819e-06, 'epoch': 0.93}
31%|███ | 3587/11526 [37:23<1:21:21, 1.63it/s] 31%|███ | 3588/11526 [37:23<1:21:18, 1.63it/s] {'loss': 0.3174, 'grad_norm': 0.5226626396179199, 'learning_rate': 8.70086012558048e-06, 'epoch': 0.93}
31%|███ | 3588/11526 [37:23<1:21:18, 1.63it/s] 31%|███ | 3589/11526 [37:24<1:21:17, 1.63it/s] {'loss': 0.2366, 'grad_norm': 0.5292640328407288, 'learning_rate': 8.699841705568823e-06, 'epoch': 0.93}
31%|███ | 3589/11526 [37:24<1:21:17, 1.63it/s] 31%|███ | 3590/11526 [37:24<1:21:14, 1.63it/s] {'loss': 0.2487, 'grad_norm': 0.5717898607254028, 'learning_rate': 8.698822946186631e-06, 'epoch': 0.93}
31%|███ | 3590/11526 [37:25<1:21:14, 1.63it/s] 31%|███ | 3591/11526 [37:25<1:21:18, 1.63it/s] {'loss': 0.2359, 'grad_norm': 0.5037286281585693, 'learning_rate': 8.697803847527355e-06, 'epoch': 0.93}
31%|███ | 3591/11526 [37:25<1:21:18, 1.63it/s] 31%|███ | 3592/11526 [37:26<1:21:16, 1.63it/s] {'loss': 0.237, 'grad_norm': 0.5287806391716003, 'learning_rate': 8.69678440968447e-06, 'epoch': 0.93}
31%|███ | 3592/11526 [37:26<1:21:16, 1.63it/s] 31%|███ | 3593/11526 [37:26<1:21:12, 1.63it/s] {'loss': 0.2013, 'grad_norm': 0.48734551668167114, 'learning_rate': 8.695764632751485e-06, 'epoch': 0.94}
31%|███ | 3593/11526 [37:26<1:21:12, 1.63it/s] 31%|███ | 3594/11526 [37:27<1:21:11, 1.63it/s] {'loss': 0.2774, 'grad_norm': 0.5710761547088623, 'learning_rate': 8.69474451682194e-06, 'epoch': 0.94}
31%|███ | 3594/11526 [37:27<1:21:11, 1.63it/s] 31%|███ | 3595/11526 [37:28<1:21:08, 1.63it/s] {'loss': 0.2556, 'grad_norm': 0.5093226432800293, 'learning_rate': 8.693724061989407e-06, 'epoch': 0.94}
31%|███ | 3595/11526 [37:28<1:21:08, 1.63it/s] 31%|███ | 3596/11526 [37:28<1:21:15, 1.63it/s] {'loss': 0.2294, 'grad_norm': 0.5767619013786316, 'learning_rate': 8.692703268347484e-06, 'epoch': 0.94}
31%|███ | 3596/11526 [37:28<1:21:15, 1.63it/s] 31%|███ | 3597/11526 [37:29<1:21:12, 1.63it/s] {'loss': 0.231, 'grad_norm': 0.6011772155761719, 'learning_rate': 8.691682135989807e-06, 'epoch': 0.94}
31%|███ | 3597/11526 [37:29<1:21:12, 1.63it/s] 31%|███ | 3598/11526 [37:29<1:21:12, 1.63it/s] {'loss': 0.2839, 'grad_norm': 0.5737415552139282, 'learning_rate': 8.69066066501004e-06, 'epoch': 0.94}
31%|███ | 3598/11526 [37:29<1:21:12, 1.63it/s] 31%|███ | 3599/11526 [37:30<1:21:09, 1.63it/s] {'loss': 0.2381, 'grad_norm': 0.5336970090866089, 'learning_rate': 8.689638855501879e-06, 'epoch': 0.94}
31%|███ | 3599/11526 [37:30<1:21:09, 1.63it/s] 31%|███ | 3600/11526 [37:31<1:21:10, 1.63it/s] {'loss': 0.2001, 'grad_norm': 0.39722365140914917, 'learning_rate': 8.688616707559048e-06, 'epoch': 0.94}
31%|███ | 3600/11526 [37:31<1:21:10, 1.63it/s]
{'eval_loss': 0.6292176246643066, 'eval_runtime': 1.9651, 'eval_samples_per_second': 101.778, 'eval_steps_per_second': 6.616, 'epoch': 0.94}
31%|███ | 3601/11526 [37:33<2:39:18, 1.21s/it] {'loss': 0.3419, 'grad_norm': 0.6861042976379395, 'learning_rate': 8.687594221275306e-06, 'epoch': 0.94}
31%|███ | 3601/11526 [37:33<2:39:18, 1.21s/it] 31%|███▏ | 3602/11526 [37:34<2:15:50, 1.03s/it] {'loss': 0.2756, 'grad_norm': 0.5234807133674622, 'learning_rate': 8.686571396744439e-06, 'epoch': 0.94}
31%|███▏ | 3602/11526 [37:34<2:15:50, 1.03s/it] 31%|███▏ | 3603/11526 [37:34<1:59:22, 1.11it/s] {'loss': 0.2471, 'grad_norm': 0.5969184637069702, 'learning_rate': 8.685548234060268e-06, 'epoch': 0.94}
31%|███▏ | 3603/11526 [37:35<1:59:22, 1.11it/s] 31%|███▏ | 3604/11526 [37:35<1:47:53, 1.22it/s] {'loss': 0.248, 'grad_norm': 0.5990720987319946, 'learning_rate': 8.684524733316643e-06, 'epoch': 0.94}
31%|███▏ | 3604/11526 [37:35<1:47:53, 1.22it/s] 31%|███▏ | 3605/11526 [37:36<1:39:47, 1.32it/s] {'loss': 0.2735, 'grad_norm': 0.5654877424240112, 'learning_rate': 8.683500894607448e-06, 'epoch': 0.94}
31%|███▏ | 3605/11526 [37:36<1:39:47, 1.32it/s] 31%|███▏ | 3606/11526 [37:36<1:34:10, 1.40it/s] {'loss': 0.2537, 'grad_norm': 0.525377631187439, 'learning_rate': 8.68247671802659e-06, 'epoch': 0.94}
31%|███▏ | 3606/11526 [37:36<1:34:10, 1.40it/s] 31%|███▏ | 3607/11526 [37:37<1:30:13, 1.46it/s] {'loss': 0.2981, 'grad_norm': 0.7345564365386963, 'learning_rate': 8.681452203668015e-06, 'epoch': 0.94}
31%|███▏ | 3607/11526 [37:37<1:30:13, 1.46it/s] 31%|███▏ | 3608/11526 [37:37<1:27:26, 1.51it/s] {'loss': 0.2319, 'grad_norm': 0.5317774415016174, 'learning_rate': 8.6804273516257e-06, 'epoch': 0.94}
31%|███▏ | 3608/11526 [37:38<1:27:26, 1.51it/s] 31%|███▏ | 3609/11526 [37:38<1:25:27, 1.54it/s] {'loss': 0.279, 'grad_norm': 0.6020731329917908, 'learning_rate': 8.679402161993644e-06, 'epoch': 0.94}
31%|███▏ | 3609/11526 [37:38<1:25:27, 1.54it/s] 31%|███▏ | 3610/11526 [37:39<1:24:08, 1.57it/s] {'loss': 0.284, 'grad_norm': 0.5904935598373413, 'learning_rate': 8.678376634865887e-06, 'epoch': 0.94}
31%|███▏ | 3610/11526 [37:39<1:24:08, 1.57it/s] 31%|███▏ | 3611/11526 [37:39<1:23:12, 1.59it/s] {'loss': 0.2473, 'grad_norm': 0.625020444393158, 'learning_rate': 8.677350770336498e-06, 'epoch': 0.94}
31%|███▏ | 3611/11526 [37:39<1:23:12, 1.59it/s] 31%|███▏ | 3612/11526 [37:40<1:22:32, 1.60it/s] {'loss': 0.266, 'grad_norm': 0.5037685036659241, 'learning_rate': 8.676324568499574e-06, 'epoch': 0.94}
31%|███▏ | 3612/11526 [37:40<1:22:32, 1.60it/s] 31%|███▏ | 3613/11526 [37:41<1:22:06, 1.61it/s] {'loss': 0.2363, 'grad_norm': 0.5059702396392822, 'learning_rate': 8.675298029449241e-06, 'epoch': 0.94}
31%|███▏ | 3613/11526 [37:41<1:22:06, 1.61it/s] 31%|███▏ | 3614/11526 [37:41<1:21:46, 1.61it/s] {'loss': 0.18, 'grad_norm': 0.42219749093055725, 'learning_rate': 8.674271153279663e-06, 'epoch': 0.94}
31%|███▏ | 3614/11526 [37:41<1:21:46, 1.61it/s] 31%|███▏ | 3615/11526 [37:42<1:21:30, 1.62it/s] {'loss': 0.2385, 'grad_norm': 0.526631236076355, 'learning_rate': 8.673243940085029e-06, 'epoch': 0.94}
31%|███▏ | 3615/11526 [37:42<1:21:30, 1.62it/s] 31%|███▏ | 3616/11526 [37:42<1:21:19, 1.62it/s] {'loss': 0.2594, 'grad_norm': 0.581261396408081, 'learning_rate': 8.672216389959558e-06, 'epoch': 0.94}
31%|███▏ | 3616/11526 [37:43<1:21:19, 1.62it/s] 31%|███▏ | 3617/11526 [37:43<1:21:10, 1.62it/s] {'loss': 0.1956, 'grad_norm': 0.45388734340667725, 'learning_rate': 8.671188502997507e-06, 'epoch': 0.94}
31%|███▏ | 3617/11526 [37:43<1:21:10, 1.62it/s] 31%|███▏ | 3618/11526 [37:44<1:21:05, 1.63it/s] {'loss': 0.1959, 'grad_norm': 0.4532140791416168, 'learning_rate': 8.670160279293159e-06, 'epoch': 0.94}
31%|███▏ | 3618/11526 [37:44<1:21:05, 1.63it/s] 31%|███▏ | 3619/11526 [37:44<1:21:00, 1.63it/s] {'loss': 0.3833, 'grad_norm': 0.5985592007637024, 'learning_rate': 8.669131718940828e-06, 'epoch': 0.94}
31%|███▏ | 3619/11526 [37:44<1:21:00, 1.63it/s] 31%|███▏ | 3620/11526 [37:45<1:21:01, 1.63it/s] {'loss': 0.2338, 'grad_norm': 0.5133522748947144, 'learning_rate': 8.668102822034858e-06, 'epoch': 0.94}
31%|███▏ | 3620/11526 [37:45<1:21:01, 1.63it/s] 31%|███▏ | 3621/11526 [37:45<1:20:59, 1.63it/s] {'loss': 0.2335, 'grad_norm': 0.45985448360443115, 'learning_rate': 8.667073588669628e-06, 'epoch': 0.94}
31%|███▏ | 3621/11526 [37:46<1:20:59, 1.63it/s] 31%|███▏ | 3622/11526 [37:46<1:20:57, 1.63it/s] {'loss': 0.2301, 'grad_norm': 0.5036062598228455, 'learning_rate': 8.666044018939543e-06, 'epoch': 0.94}
31%|███▏ | 3622/11526 [37:46<1:20:57, 1.63it/s] 31%|███▏ | 3623/11526 [37:47<1:20:58, 1.63it/s] {'loss': 0.2306, 'grad_norm': 0.571769654750824, 'learning_rate': 8.66501411293904e-06, 'epoch': 0.94}
31%|███▏ | 3623/11526 [37:47<1:20:58, 1.63it/s] 31%|███▏ | 3624/11526 [37:47<1:20:58, 1.63it/s] {'loss': 0.2178, 'grad_norm': 0.48071011900901794, 'learning_rate': 8.663983870762593e-06, 'epoch': 0.94}
31%|███▏ | 3624/11526 [37:48<1:20:58, 1.63it/s] 31%|███▏ | 3625/11526 [37:48<1:25:23, 1.54it/s] {'loss': 0.2048, 'grad_norm': 0.5052418112754822, 'learning_rate': 8.662953292504695e-06, 'epoch': 0.94}
31%|███▏ | 3625/11526 [37:48<1:25:23, 1.54it/s] 31%|███▏ | 3626/11526 [37:49<1:24:11, 1.56it/s] {'loss': 0.3113, 'grad_norm': 0.7328222393989563, 'learning_rate': 8.661922378259882e-06, 'epoch': 0.94}
31%|███▏ | 3626/11526 [37:49<1:24:11, 1.56it/s] 31%|███▏ | 3627/11526 [37:49<1:23:08, 1.58it/s] {'loss': 0.2454, 'grad_norm': 0.5176063179969788, 'learning_rate': 8.660891128122711e-06, 'epoch': 0.94}
31%|███▏ | 3627/11526 [37:49<1:23:08, 1.58it/s] 31%|███▏ | 3628/11526 [37:50<1:22:27, 1.60it/s] {'loss': 0.395, 'grad_norm': 0.6605504155158997, 'learning_rate': 8.659859542187778e-06, 'epoch': 0.94}
31%|███▏ | 3628/11526 [37:50<1:22:27, 1.60it/s] 31%|███▏ | 3629/11526 [37:50<1:21:57, 1.61it/s] {'loss': 0.2654, 'grad_norm': 0.5197849869728088, 'learning_rate': 8.658827620549703e-06, 'epoch': 0.94}
31%|███▏ | 3629/11526 [37:51<1:21:57, 1.61it/s] 31%|███▏ | 3630/11526 [37:51<1:21:37, 1.61it/s] {'loss': 0.2608, 'grad_norm': 0.5348836779594421, 'learning_rate': 8.657795363303141e-06, 'epoch': 0.94}
31%|███▏ | 3630/11526 [37:51<1:21:37, 1.61it/s] 32%|███▏ | 3631/11526 [37:52<1:25:49, 1.53it/s] {'loss': 0.2893, 'grad_norm': 0.5681641697883606, 'learning_rate': 8.656762770542776e-06, 'epoch': 0.95}
32%|███▏ | 3631/11526 [37:52<1:25:49, 1.53it/s] 32%|███▏ | 3632/11526 [37:52<1:24:19, 1.56it/s] {'loss': 0.2773, 'grad_norm': 0.5685955882072449, 'learning_rate': 8.655729842363323e-06, 'epoch': 0.95}
32%|███▏ | 3632/11526 [37:53<1:24:19, 1.56it/s] 32%|███▏ | 3633/11526 [37:53<1:23:13, 1.58it/s] {'loss': 0.3004, 'grad_norm': 0.5976361036300659, 'learning_rate': 8.654696578859529e-06, 'epoch': 0.95}
32%|███▏ | 3633/11526 [37:53<1:23:13, 1.58it/s] 32%|███▏ | 3634/11526 [37:54<1:22:27, 1.60it/s] {'loss': 0.2505, 'grad_norm': 0.5063525438308716, 'learning_rate': 8.653662980126171e-06, 'epoch': 0.95}
32%|███▏ | 3634/11526 [37:54<1:22:27, 1.60it/s] 32%|███▏ | 3635/11526 [37:54<1:21:55, 1.61it/s] {'loss': 0.2146, 'grad_norm': 0.510444700717926, 'learning_rate': 8.652629046258057e-06, 'epoch': 0.95}
32%|███▏ | 3635/11526 [37:54<1:21:55, 1.61it/s] 32%|███▏ | 3636/11526 [37:55<1:21:36, 1.61it/s] {'loss': 0.2894, 'grad_norm': 0.5714998841285706, 'learning_rate': 8.651594777350022e-06, 'epoch': 0.95}
32%|███▏ | 3636/11526 [37:55<1:21:36, 1.61it/s] 32%|███▏ | 3637/11526 [37:55<1:21:20, 1.62it/s] {'loss': 0.2337, 'grad_norm': 0.4934414327144623, 'learning_rate': 8.650560173496937e-06, 'epoch': 0.95}
32%|███▏ | 3637/11526 [37:56<1:21:20, 1.62it/s] 32%|███▏ | 3638/11526 [37:56<1:21:09, 1.62it/s] {'loss': 0.2329, 'grad_norm': 0.5472906231880188, 'learning_rate': 8.649525234793705e-06, 'epoch': 0.95}
32%|███▏ | 3638/11526 [37:56<1:21:09, 1.62it/s] 32%|███▏ | 3639/11526 [37:57<1:21:02, 1.62it/s] {'loss': 0.2392, 'grad_norm': 0.5294278860092163, 'learning_rate': 8.648489961335251e-06, 'epoch': 0.95}
32%|███▏ | 3639/11526 [37:57<1:21:02, 1.62it/s] 32%|███▏ | 3640/11526 [37:57<1:20:56, 1.62it/s] {'loss': 0.2101, 'grad_norm': 0.4442051947116852, 'learning_rate': 8.64745435321654e-06, 'epoch': 0.95}
32%|███▏ | 3640/11526 [37:57<1:20:56, 1.62it/s] 32%|███▏ | 3641/11526 [37:58<1:20:56, 1.62it/s] {'loss': 0.2223, 'grad_norm': 0.5311015248298645, 'learning_rate': 8.646418410532561e-06, 'epoch': 0.95}
32%|███▏ | 3641/11526 [37:58<1:20:56, 1.62it/s] 32%|███▏ | 3642/11526 [37:59<1:20:50, 1.63it/s] {'loss': 0.2429, 'grad_norm': 0.4692399501800537, 'learning_rate': 8.64538213337834e-06, 'epoch': 0.95}
32%|███▏ | 3642/11526 [37:59<1:20:50, 1.63it/s] 32%|███▏ | 3643/11526 [37:59<1:20:48, 1.63it/s] {'loss': 0.2696, 'grad_norm': 0.5316739082336426, 'learning_rate': 8.64434552184893e-06, 'epoch': 0.95}
32%|███▏ | 3643/11526 [37:59<1:20:48, 1.63it/s] 32%|███▏ | 3644/11526 [38:00<1:20:44, 1.63it/s] {'loss': 0.257, 'grad_norm': 0.47679397463798523, 'learning_rate': 8.643308576039409e-06, 'epoch': 0.95}
32%|███▏ | 3644/11526 [38:00<1:20:44, 1.63it/s] 32%|███▏ | 3645/11526 [38:00<1:20:40, 1.63it/s] {'loss': 0.2043, 'grad_norm': 0.5338336825370789, 'learning_rate': 8.642271296044897e-06, 'epoch': 0.95}
32%|███▏ | 3645/11526 [38:01<1:20:40, 1.63it/s] 32%|███▏ | 3646/11526 [38:01<1:20:40, 1.63it/s] {'loss': 0.2572, 'grad_norm': 0.545141339302063, 'learning_rate': 8.641233681960539e-06, 'epoch': 0.95}
32%|███▏ | 3646/11526 [38:01<1:20:40, 1.63it/s] 32%|███▏ | 3647/11526 [38:02<1:20:39, 1.63it/s] {'loss': 0.258, 'grad_norm': 0.5474610328674316, 'learning_rate': 8.640195733881511e-06, 'epoch': 0.95}
32%|███▏ | 3647/11526 [38:02<1:20:39, 1.63it/s] 32%|███▏ | 3648/11526 [38:02<1:20:41, 1.63it/s] {'loss': 0.3129, 'grad_norm': 0.5587754845619202, 'learning_rate': 8.639157451903017e-06, 'epoch': 0.95}
32%|███▏ | 3648/11526 [38:02<1:20:41, 1.63it/s] 32%|███▏ | 3649/11526 [38:03<1:20:40, 1.63it/s] {'loss': 0.3136, 'grad_norm': 0.5090718865394592, 'learning_rate': 8.638118836120295e-06, 'epoch': 0.95}
32%|███▏ | 3649/11526 [38:03<1:20:40, 1.63it/s] 32%|███▏ | 3650/11526 [38:03<1:20:37, 1.63it/s] {'loss': 0.2057, 'grad_norm': 0.4868636429309845, 'learning_rate': 8.637079886628614e-06, 'epoch': 0.95}
32%|███▏ | 3650/11526 [38:04<1:20:37, 1.63it/s] 32%|███▏ | 3651/11526 [38:04<1:20:43, 1.63it/s] {'loss': 0.2554, 'grad_norm': 0.5911749005317688, 'learning_rate': 8.636040603523273e-06, 'epoch': 0.95}
32%|███▏ | 3651/11526 [38:04<1:20:43, 1.63it/s] 32%|███▏ | 3652/11526 [38:05<1:20:42, 1.63it/s] {'loss': 0.2464, 'grad_norm': 0.5125033259391785, 'learning_rate': 8.635000986899598e-06, 'epoch': 0.95}
32%|███▏ | 3652/11526 [38:05<1:20:42, 1.63it/s] 32%|███▏ | 3653/11526 [38:05<1:20:39, 1.63it/s] {'loss': 0.2099, 'grad_norm': 0.4742435812950134, 'learning_rate': 8.63396103685295e-06, 'epoch': 0.95}
32%|███▏ | 3653/11526 [38:05<1:20:39, 1.63it/s] 32%|███▏ | 3654/11526 [38:06<1:20:38, 1.63it/s] {'loss': 0.2028, 'grad_norm': 0.4530431032180786, 'learning_rate': 8.63292075347872e-06, 'epoch': 0.95}
32%|███▏ | 3654/11526 [38:06<1:20:38, 1.63it/s] 32%|███▏ | 3655/11526 [38:07<1:20:36, 1.63it/s] {'loss': 0.193, 'grad_norm': 0.4499170482158661, 'learning_rate': 8.631880136872328e-06, 'epoch': 0.95}
32%|███▏ | 3655/11526 [38:07<1:20:36, 1.63it/s] 32%|███▏ | 3656/11526 [38:07<1:20:45, 1.62it/s] {'loss': 0.289, 'grad_norm': 0.6263868808746338, 'learning_rate': 8.630839187129225e-06, 'epoch': 0.95}
32%|███▏ | 3656/11526 [38:07<1:20:45, 1.62it/s] 32%|███▏ | 3657/11526 [38:08<1:20:39, 1.63it/s] {'loss': 0.3052, 'grad_norm': 0.6323683261871338, 'learning_rate': 8.629797904344894e-06, 'epoch': 0.95}
32%|███▏ | 3657/11526 [38:08<1:20:39, 1.63it/s] 32%|███▏ | 3658/11526 [38:08<1:20:38, 1.63it/s] {'loss': 0.2318, 'grad_norm': 0.5588894486427307, 'learning_rate': 8.628756288614844e-06, 'epoch': 0.95}
32%|███▏ | 3658/11526 [38:09<1:20:38, 1.63it/s] 32%|███▏ | 3659/11526 [38:09<1:20:35, 1.63it/s] {'loss': 0.1886, 'grad_norm': 0.4987980127334595, 'learning_rate': 8.627714340034623e-06, 'epoch': 0.95}
32%|███▏ | 3659/11526 [38:09<1:20:35, 1.63it/s] 32%|███▏ | 3660/11526 [38:10<1:20:32, 1.63it/s] {'loss': 0.1993, 'grad_norm': 0.443468302488327, 'learning_rate': 8.626672058699802e-06, 'epoch': 0.95}
32%|███▏ | 3660/11526 [38:10<1:20:32, 1.63it/s] 32%|███▏ | 3661/11526 [38:10<1:20:35, 1.63it/s] {'loss': 0.271, 'grad_norm': 0.6373046040534973, 'learning_rate': 8.625629444705982e-06, 'epoch': 0.95}
32%|███▏ | 3661/11526 [38:10<1:20:35, 1.63it/s] 32%|███▏ | 3662/11526 [38:11<1:20:32, 1.63it/s] {'loss': 0.2438, 'grad_norm': 0.48588547110557556, 'learning_rate': 8.624586498148803e-06, 'epoch': 0.95}
32%|███▏ | 3662/11526 [38:11<1:20:32, 1.63it/s] 32%|███▏ | 3663/11526 [38:11<1:20:27, 1.63it/s] {'loss': 0.2694, 'grad_norm': 0.5552598834037781, 'learning_rate': 8.623543219123926e-06, 'epoch': 0.95}
32%|███▏ | 3663/11526 [38:12<1:20:27, 1.63it/s] 32%|███▏ | 3664/11526 [38:12<1:20:27, 1.63it/s] {'loss': 0.2881, 'grad_norm': 0.6620440483093262, 'learning_rate': 8.622499607727049e-06, 'epoch': 0.95}
32%|███▏ | 3664/11526 [38:12<1:20:27, 1.63it/s] 32%|███▏ | 3665/11526 [38:13<1:20:27, 1.63it/s] {'loss': 0.3154, 'grad_norm': 0.625004768371582, 'learning_rate': 8.621455664053897e-06, 'epoch': 0.95}
32%|███▏ | 3665/11526 [38:13<1:20:27, 1.63it/s] 32%|███▏ | 3666/11526 [38:13<1:20:33, 1.63it/s] {'loss': 0.1935, 'grad_norm': 0.45975038409233093, 'learning_rate': 8.620411388200226e-06, 'epoch': 0.95}
32%|███▏ | 3666/11526 [38:13<1:20:33, 1.63it/s] 32%|███▏ | 3667/11526 [38:14<1:20:33, 1.63it/s] {'loss': 0.2323, 'grad_norm': 0.5334672927856445, 'learning_rate': 8.619366780261822e-06, 'epoch': 0.95}
32%|███▏ | 3667/11526 [38:14<1:20:33, 1.63it/s] 32%|███▏ | 3668/11526 [38:15<1:20:30, 1.63it/s] {'loss': 0.2294, 'grad_norm': 0.4663825035095215, 'learning_rate': 8.618321840334505e-06, 'epoch': 0.95}
32%|███▏ | 3668/11526 [38:15<1:20:30, 1.63it/s] 32%|███▏ | 3669/11526 [38:15<1:20:28, 1.63it/s] {'loss': 0.2227, 'grad_norm': 0.4888138175010681, 'learning_rate': 8.61727656851412e-06, 'epoch': 0.95}
32%|███▏ | 3669/11526 [38:15<1:20:28, 1.63it/s] 32%|███▏ | 3670/11526 [38:16<1:20:27, 1.63it/s] {'loss': 0.2766, 'grad_norm': 0.6534618139266968, 'learning_rate': 8.616230964896548e-06, 'epoch': 0.96}
32%|███▏ | 3670/11526 [38:16<1:20:27, 1.63it/s] 32%|███▏ | 3671/11526 [38:16<1:20:32, 1.63it/s] {'loss': 0.2372, 'grad_norm': 0.5110915303230286, 'learning_rate': 8.615185029577695e-06, 'epoch': 0.96}
32%|███▏ | 3671/11526 [38:17<1:20:32, 1.63it/s] 32%|███▏ | 3672/11526 [38:17<1:20:30, 1.63it/s] {'loss': 0.1871, 'grad_norm': 0.4395880699157715, 'learning_rate': 8.614138762653504e-06, 'epoch': 0.96}
32%|███▏ | 3672/11526 [38:17<1:20:30, 1.63it/s] 32%|███▏ | 3673/11526 [38:18<1:20:28, 1.63it/s] {'loss': 0.2115, 'grad_norm': 0.5440385937690735, 'learning_rate': 8.61309216421994e-06, 'epoch': 0.96}
32%|███▏ | 3673/11526 [38:18<1:20:28, 1.63it/s] 32%|███▏ | 3674/11526 [38:18<1:20:25, 1.63it/s] {'loss': 0.297, 'grad_norm': 0.5811224579811096, 'learning_rate': 8.612045234373005e-06, 'epoch': 0.96}
32%|███▏ | 3674/11526 [38:18<1:20:25, 1.63it/s] 32%|███▏ | 3675/11526 [38:19<1:20:21, 1.63it/s] {'loss': 0.2647, 'grad_norm': 0.5808017253875732, 'learning_rate': 8.61099797320873e-06, 'epoch': 0.96}
32%|███▏ | 3675/11526 [38:19<1:20:21, 1.63it/s] 32%|███▏ | 3676/11526 [38:19<1:20:24, 1.63it/s] {'loss': 0.2336, 'grad_norm': 0.5150797367095947, 'learning_rate': 8.609950380823176e-06, 'epoch': 0.96}
32%|███▏ | 3676/11526 [38:20<1:20:24, 1.63it/s] 32%|███▏ | 3677/11526 [38:20<1:20:24, 1.63it/s] {'loss': 0.3057, 'grad_norm': 0.567832887172699, 'learning_rate': 8.608902457312431e-06, 'epoch': 0.96}
32%|███▏ | 3677/11526 [38:20<1:20:24, 1.63it/s] 32%|███▏ | 3678/11526 [38:21<1:20:20, 1.63it/s] {'loss': 0.2815, 'grad_norm': 0.5268412828445435, 'learning_rate': 8.60785420277262e-06, 'epoch': 0.96}
32%|███▏ | 3678/11526 [38:21<1:20:20, 1.63it/s] 32%|███▏ | 3679/11526 [38:21<1:20:18, 1.63it/s] {'loss': 0.2459, 'grad_norm': 0.5466567277908325, 'learning_rate': 8.606805617299894e-06, 'epoch': 0.96}
32%|███▏ | 3679/11526 [38:21<1:20:18, 1.63it/s] 32%|███▏ | 3680/11526 [38:22<1:20:17, 1.63it/s] {'loss': 0.2701, 'grad_norm': 0.6188588738441467, 'learning_rate': 8.605756700990435e-06, 'epoch': 0.96}
32%|███▏ | 3680/11526 [38:22<1:20:17, 1.63it/s] 32%|███▏ | 3681/11526 [38:23<1:20:23, 1.63it/s] {'loss': 0.2376, 'grad_norm': 0.46557989716529846, 'learning_rate': 8.604707453940455e-06, 'epoch': 0.96}
32%|███▏ | 3681/11526 [38:23<1:20:23, 1.63it/s] 32%|███▏ | 3682/11526 [38:23<1:20:21, 1.63it/s] {'loss': 0.1991, 'grad_norm': 0.6118673086166382, 'learning_rate': 8.603657876246198e-06, 'epoch': 0.96}
32%|███▏ | 3682/11526 [38:23<1:20:21, 1.63it/s] 32%|███▏ | 3683/11526 [38:24<1:20:19, 1.63it/s] {'loss': 0.2327, 'grad_norm': 0.5290951728820801, 'learning_rate': 8.602607968003935e-06, 'epoch': 0.96}
32%|███▏ | 3683/11526 [38:24<1:20:19, 1.63it/s] 32%|███▏ | 3684/11526 [38:24<1:20:16, 1.63it/s] {'loss': 0.2845, 'grad_norm': 0.5128681063652039, 'learning_rate': 8.601557729309974e-06, 'epoch': 0.96}
32%|███▏ | 3684/11526 [38:25<1:20:16, 1.63it/s] 32%|███▏ | 3685/11526 [38:25<1:20:15, 1.63it/s] {'loss': 0.26, 'grad_norm': 0.5873117446899414, 'learning_rate': 8.600507160260644e-06, 'epoch': 0.96}
32%|███▏ | 3685/11526 [38:25<1:20:15, 1.63it/s] 32%|███▏ | 3686/11526 [38:26<1:20:22, 1.63it/s] {'loss': 0.2594, 'grad_norm': 0.6057473421096802, 'learning_rate': 8.599456260952312e-06, 'epoch': 0.96}
32%|███▏ | 3686/11526 [38:26<1:20:22, 1.63it/s] 32%|███▏ | 3687/11526 [38:26<1:20:18, 1.63it/s] {'loss': 0.247, 'grad_norm': 0.6262389421463013, 'learning_rate': 8.598405031481371e-06, 'epoch': 0.96}
32%|███▏ | 3687/11526 [38:26<1:20:18, 1.63it/s] 32%|███▏ | 3688/11526 [38:27<1:20:15, 1.63it/s] {'loss': 0.2571, 'grad_norm': 0.5432509183883667, 'learning_rate': 8.597353471944246e-06, 'epoch': 0.96}
32%|███▏ | 3688/11526 [38:27<1:20:15, 1.63it/s] 32%|███▏ | 3689/11526 [38:27<1:20:12, 1.63it/s] {'loss': 0.25, 'grad_norm': 0.5750232934951782, 'learning_rate': 8.596301582437394e-06, 'epoch': 0.96}
32%|███▏ | 3689/11526 [38:28<1:20:12, 1.63it/s] 32%|███▏ | 3690/11526 [38:28<1:20:12, 1.63it/s] {'loss': 0.1917, 'grad_norm': 0.4950275123119354, 'learning_rate': 8.595249363057299e-06, 'epoch': 0.96}
32%|███▏ | 3690/11526 [38:28<1:20:12, 1.63it/s] 32%|███▏ | 3691/11526 [38:29<1:20:16, 1.63it/s] {'loss': 0.2357, 'grad_norm': 0.5060938596725464, 'learning_rate': 8.594196813900475e-06, 'epoch': 0.96}
32%|███▏ | 3691/11526 [38:29<1:20:16, 1.63it/s] 32%|███▏ | 3692/11526 [38:29<1:20:13, 1.63it/s] {'loss': 0.2978, 'grad_norm': 0.5686876773834229, 'learning_rate': 8.593143935063469e-06, 'epoch': 0.96}
32%|███▏ | 3692/11526 [38:29<1:20:13, 1.63it/s] 32%|███▏ | 3693/11526 [38:30<1:20:11, 1.63it/s] {'loss': 0.2412, 'grad_norm': 0.5522571802139282, 'learning_rate': 8.592090726642856e-06, 'epoch': 0.96}
32%|███▏ | 3693/11526 [38:30<1:20:11, 1.63it/s] 32%|███▏ | 3694/11526 [38:31<1:20:08, 1.63it/s] {'loss': 0.2872, 'grad_norm': 0.6172462701797485, 'learning_rate': 8.591037188735247e-06, 'epoch': 0.96}
32%|███▏ | 3694/11526 [38:31<1:20:08, 1.63it/s] 32%|███▏ | 3695/11526 [38:31<1:20:11, 1.63it/s] {'loss': 0.2918, 'grad_norm': 0.534767746925354, 'learning_rate': 8.589983321437271e-06, 'epoch': 0.96}
32%|███▏ | 3695/11526 [38:31<1:20:11, 1.63it/s] 32%|███▏ | 3696/11526 [38:32<1:20:11, 1.63it/s] {'loss': 0.2467, 'grad_norm': 0.4739910364151001, 'learning_rate': 8.588929124845602e-06, 'epoch': 0.96}
32%|███▏ | 3696/11526 [38:32<1:20:11, 1.63it/s] 32%|███▏ | 3697/11526 [38:32<1:20:09, 1.63it/s] {'loss': 0.2386, 'grad_norm': 0.5492808818817139, 'learning_rate': 8.587874599056932e-06, 'epoch': 0.96}
32%|███▏ | 3697/11526 [38:33<1:20:09, 1.63it/s] 32%|███▏ | 3698/11526 [38:33<1:20:10, 1.63it/s] {'loss': 0.2628, 'grad_norm': 0.5451539754867554, 'learning_rate': 8.58681974416799e-06, 'epoch': 0.96}
32%|███▏ | 3698/11526 [38:33<1:20:10, 1.63it/s] 32%|███▏ | 3699/11526 [38:34<1:20:08, 1.63it/s] {'loss': 0.209, 'grad_norm': 0.449885755777359, 'learning_rate': 8.585764560275532e-06, 'epoch': 0.96}
32%|███▏ | 3699/11526 [38:34<1:20:08, 1.63it/s] 32%|███▏ | 3700/11526 [38:34<1:20:06, 1.63it/s] {'loss': 0.2195, 'grad_norm': 0.4838949143886566, 'learning_rate': 8.584709047476346e-06, 'epoch': 0.96}
32%|███▏ | 3700/11526 [38:34<1:20:06, 1.63it/s] 32%|███▏ | 3701/11526 [38:35<1:20:11, 1.63it/s] {'loss': 0.2256, 'grad_norm': 0.5289916396141052, 'learning_rate': 8.58365320586725e-06, 'epoch': 0.96}
32%|███▏ | 3701/11526 [38:35<1:20:11, 1.63it/s] 32%|███▏ | 3702/11526 [38:35<1:20:07, 1.63it/s] {'loss': 0.3142, 'grad_norm': 0.6188779473304749, 'learning_rate': 8.582597035545094e-06, 'epoch': 0.96}
32%|███▏ | 3702/11526 [38:36<1:20:07, 1.63it/s] 32%|███▏ | 3703/11526 [38:36<1:20:05, 1.63it/s] {'loss': 0.2757, 'grad_norm': 0.6289758086204529, 'learning_rate': 8.581540536606751e-06, 'epoch': 0.96}
32%|███▏ | 3703/11526 [38:36<1:20:05, 1.63it/s] 32%|███▏ | 3704/11526 [38:37<1:20:04, 1.63it/s] {'loss': 0.2787, 'grad_norm': 0.5337526798248291, 'learning_rate': 8.580483709149135e-06, 'epoch': 0.96}
32%|███▏ | 3704/11526 [38:37<1:20:04, 1.63it/s] 32%|███▏ | 3705/11526 [38:37<1:20:02, 1.63it/s] {'loss': 0.2333, 'grad_norm': 0.5628654360771179, 'learning_rate': 8.579426553269179e-06, 'epoch': 0.96}
32%|███▏ | 3705/11526 [38:37<1:20:02, 1.63it/s] 32%|███▏ | 3706/11526 [38:38<1:20:04, 1.63it/s] {'loss': 0.2648, 'grad_norm': 0.5101370215415955, 'learning_rate': 8.578369069063854e-06, 'epoch': 0.96}
32%|███▏ | 3706/11526 [38:38<1:20:04, 1.63it/s] 32%|███▏ | 3707/11526 [38:39<1:20:05, 1.63it/s] {'loss': 0.2719, 'grad_norm': 0.5497829914093018, 'learning_rate': 8.577311256630156e-06, 'epoch': 0.96}
32%|███▏ | 3707/11526 [38:39<1:20:05, 1.63it/s] 32%|███▏ | 3708/11526 [38:39<1:20:02, 1.63it/s] {'loss': 0.2191, 'grad_norm': 0.546440839767456, 'learning_rate': 8.57625311606512e-06, 'epoch': 0.97}
32%|███▏ | 3708/11526 [38:39<1:20:02, 1.63it/s] 32%|███▏ | 3709/11526 [38:40<1:20:03, 1.63it/s] {'loss': 0.2112, 'grad_norm': 0.47619563341140747, 'learning_rate': 8.575194647465796e-06, 'epoch': 0.97}
32%|███▏ | 3709/11526 [38:40<1:20:03, 1.63it/s] 32%|███▏ | 3710/11526 [38:40<1:20:02, 1.63it/s] {'loss': 0.2619, 'grad_norm': 0.5849548578262329, 'learning_rate': 8.574135850929277e-06, 'epoch': 0.97}
32%|███▏ | 3710/11526 [38:40<1:20:02, 1.63it/s] 32%|███▏ | 3711/11526 [38:41<1:20:05, 1.63it/s] {'loss': 0.2538, 'grad_norm': 0.5688178539276123, 'learning_rate': 8.573076726552684e-06, 'epoch': 0.97}
32%|███▏ | 3711/11526 [38:41<1:20:05, 1.63it/s] 32%|███▏ | 3712/11526 [38:42<1:20:03, 1.63it/s] {'loss': 0.3007, 'grad_norm': 0.504641056060791, 'learning_rate': 8.572017274433163e-06, 'epoch': 0.97}
32%|███▏ | 3712/11526 [38:42<1:20:03, 1.63it/s] 32%|███▏ | 3713/11526 [38:42<1:20:02, 1.63it/s] {'loss': 0.3362, 'grad_norm': 0.6409885883331299, 'learning_rate': 8.570957494667894e-06, 'epoch': 0.97}
32%|███▏ | 3713/11526 [38:42<1:20:02, 1.63it/s] 32%|███▏ | 3714/11526 [38:43<1:19:57, 1.63it/s] {'loss': 0.2597, 'grad_norm': 0.5519269704818726, 'learning_rate': 8.569897387354083e-06, 'epoch': 0.97}
32%|███▏ | 3714/11526 [38:43<1:19:57, 1.63it/s] 32%|███▏ | 3715/11526 [38:43<1:19:55, 1.63it/s] {'loss': 0.1928, 'grad_norm': 0.4150357246398926, 'learning_rate': 8.568836952588975e-06, 'epoch': 0.97}
32%|███▏ | 3715/11526 [38:44<1:19:55, 1.63it/s] 32%|███▏ | 3716/11526 [38:44<1:19:55, 1.63it/s] {'loss': 0.2312, 'grad_norm': 0.4350292384624481, 'learning_rate': 8.567776190469835e-06, 'epoch': 0.97}
32%|███▏ | 3716/11526 [38:44<1:19:55, 1.63it/s] 32%|███▏ | 3717/11526 [38:45<1:19:54, 1.63it/s] {'loss': 0.2264, 'grad_norm': 0.5333831310272217, 'learning_rate': 8.566715101093961e-06, 'epoch': 0.97}
32%|███▏ | 3717/11526 [38:45<1:19:54, 1.63it/s] 32%|███▏ | 3718/11526 [38:45<1:19:54, 1.63it/s] {'loss': 0.2377, 'grad_norm': 0.49481093883514404, 'learning_rate': 8.565653684558685e-06, 'epoch': 0.97}
32%|███▏ | 3718/11526 [38:45<1:19:54, 1.63it/s] 32%|███▏ | 3719/11526 [38:46<1:19:52, 1.63it/s] {'loss': 0.2882, 'grad_norm': 0.5888106226921082, 'learning_rate': 8.564591940961367e-06, 'epoch': 0.97}
32%|███▏ | 3719/11526 [38:46<1:19:52, 1.63it/s] 32%|███▏ | 3720/11526 [38:46<1:19:51, 1.63it/s] {'loss': 0.2564, 'grad_norm': 0.6113447546958923, 'learning_rate': 8.563529870399394e-06, 'epoch': 0.97}
32%|███▏ | 3720/11526 [38:47<1:19:51, 1.63it/s] 32%|███▏ | 3721/11526 [38:47<1:19:53, 1.63it/s] {'loss': 0.2906, 'grad_norm': 0.5702401399612427, 'learning_rate': 8.562467472970187e-06, 'epoch': 0.97}
32%|███▏ | 3721/11526 [38:47<1:19:53, 1.63it/s] 32%|███▏ | 3722/11526 [38:48<1:19:56, 1.63it/s] {'loss': 0.3107, 'grad_norm': 0.6572955250740051, 'learning_rate': 8.561404748771191e-06, 'epoch': 0.97}
32%|███▏ | 3722/11526 [38:48<1:19:56, 1.63it/s] 32%|███▏ | 3723/11526 [38:48<1:19:55, 1.63it/s] {'loss': 0.2398, 'grad_norm': 0.5347450971603394, 'learning_rate': 8.560341697899889e-06, 'epoch': 0.97}
32%|███▏ | 3723/11526 [38:48<1:19:55, 1.63it/s] 32%|███▏ | 3724/11526 [38:49<1:19:54, 1.63it/s] {'loss': 0.3061, 'grad_norm': 0.6162471175193787, 'learning_rate': 8.559278320453788e-06, 'epoch': 0.97}
32%|███▏ | 3724/11526 [38:49<1:19:54, 1.63it/s] 32%|███▏ | 3725/11526 [38:50<1:19:51, 1.63it/s] {'loss': 0.3053, 'grad_norm': 0.6356250047683716, 'learning_rate': 8.558214616530429e-06, 'epoch': 0.97}
32%|███▏ | 3725/11526 [38:50<1:19:51, 1.63it/s] 32%|███▏ | 3726/11526 [38:50<1:19:56, 1.63it/s] {'loss': 0.3227, 'grad_norm': 0.6402420401573181, 'learning_rate': 8.55715058622738e-06, 'epoch': 0.97}
32%|███▏ | 3726/11526 [38:50<1:19:56, 1.63it/s] 32%|███▏ | 3727/11526 [38:51<1:19:52, 1.63it/s] {'loss': 0.2478, 'grad_norm': 0.5822762250900269, 'learning_rate': 8.556086229642239e-06, 'epoch': 0.97}
32%|███▏ | 3727/11526 [38:51<1:19:52, 1.63it/s] 32%|███▏ | 3728/11526 [38:51<1:19:52, 1.63it/s] {'loss': 0.2408, 'grad_norm': 0.5000674724578857, 'learning_rate': 8.555021546872637e-06, 'epoch': 0.97}
32%|███▏ | 3728/11526 [38:52<1:19:52, 1.63it/s] 32%|███▏ | 3729/11526 [38:52<1:19:51, 1.63it/s] {'loss': 0.4285, 'grad_norm': 0.6075376868247986, 'learning_rate': 8.55395653801623e-06, 'epoch': 0.97}
32%|███▏ | 3729/11526 [38:52<1:19:51, 1.63it/s] 32%|███▏ | 3730/11526 [38:53<1:19:48, 1.63it/s] {'loss': 0.2489, 'grad_norm': 0.5012380480766296, 'learning_rate': 8.55289120317071e-06, 'epoch': 0.97}
32%|███▏ | 3730/11526 [38:53<1:19:48, 1.63it/s] 32%|███▏ | 3731/11526 [38:53<1:19:51, 1.63it/s] {'loss': 0.2247, 'grad_norm': 0.5004627108573914, 'learning_rate': 8.551825542433792e-06, 'epoch': 0.97}
32%|███▏ | 3731/11526 [38:53<1:19:51, 1.63it/s] 32%|███▏ | 3732/11526 [38:54<1:19:50, 1.63it/s] {'loss': 0.2958, 'grad_norm': 0.5623750686645508, 'learning_rate': 8.550759555903228e-06, 'epoch': 0.97}
32%|███▏ | 3732/11526 [38:54<1:19:50, 1.63it/s] 32%|███▏ | 3733/11526 [38:54<1:19:45, 1.63it/s] {'loss': 0.2846, 'grad_norm': 0.5778385400772095, 'learning_rate': 8.549693243676793e-06, 'epoch': 0.97}
32%|███▏ | 3733/11526 [38:55<1:19:45, 1.63it/s] 32%|███▏ | 3734/11526 [38:55<1:19:45, 1.63it/s] {'loss': 0.3014, 'grad_norm': 0.6068590879440308, 'learning_rate': 8.548626605852297e-06, 'epoch': 0.97}
32%|███▏ | 3734/11526 [38:55<1:19:45, 1.63it/s] 32%|███▏ | 3735/11526 [38:56<1:19:44, 1.63it/s] {'loss': 0.244, 'grad_norm': 0.4617166817188263, 'learning_rate': 8.547559642527578e-06, 'epoch': 0.97}
32%|███▏ | 3735/11526 [38:56<1:19:44, 1.63it/s] 32%|███▏ | 3736/11526 [38:56<1:19:45, 1.63it/s] {'loss': 0.3109, 'grad_norm': 0.6629009246826172, 'learning_rate': 8.546492353800504e-06, 'epoch': 0.97}
32%|███▏ | 3736/11526 [38:56<1:19:45, 1.63it/s] 32%|███▏ | 3737/11526 [38:57<1:19:43, 1.63it/s] {'loss': 0.2555, 'grad_norm': 0.5104737877845764, 'learning_rate': 8.545424739768973e-06, 'epoch': 0.97}
32%|███▏ | 3737/11526 [38:57<1:19:43, 1.63it/s] 32%|███▏ | 3738/11526 [38:58<1:19:43, 1.63it/s] {'loss': 0.2678, 'grad_norm': 0.49302011728286743, 'learning_rate': 8.544356800530912e-06, 'epoch': 0.97}
32%|███▏ | 3738/11526 [38:58<1:19:43, 1.63it/s] 32%|███▏ | 3739/11526 [38:58<1:19:40, 1.63it/s] {'loss': 0.2771, 'grad_norm': 0.5704540610313416, 'learning_rate': 8.543288536184279e-06, 'epoch': 0.97}
32%|███▏ | 3739/11526 [38:58<1:19:40, 1.63it/s] 32%|███▏ | 3740/11526 [38:59<1:19:39, 1.63it/s] {'loss': 0.2374, 'grad_norm': 0.5356829166412354, 'learning_rate': 8.54221994682706e-06, 'epoch': 0.97}
32%|███▏ | 3740/11526 [38:59<1:19:39, 1.63it/s] 32%|███▏ | 3741/11526 [38:59<1:19:42, 1.63it/s] {'loss': 0.216, 'grad_norm': 0.48583850264549255, 'learning_rate': 8.541151032557273e-06, 'epoch': 0.97}
32%|███▏ | 3741/11526 [39:00<1:19:42, 1.63it/s] 32%|███▏ | 3742/11526 [39:00<1:19:42, 1.63it/s] {'loss': 0.2735, 'grad_norm': 0.5261116027832031, 'learning_rate': 8.540081793472966e-06, 'epoch': 0.97}
32%|███▏ | 3742/11526 [39:00<1:19:42, 1.63it/s] 32%|███▏ | 3743/11526 [39:01<1:19:41, 1.63it/s] {'loss': 0.3179, 'grad_norm': 0.6487370729446411, 'learning_rate': 8.539012229672215e-06, 'epoch': 0.97}
32%|███▏ | 3743/11526 [39:01<1:19:41, 1.63it/s] 32%|███▏ | 3744/11526 [39:01<1:19:41, 1.63it/s] {'loss': 0.1966, 'grad_norm': 0.47726500034332275, 'learning_rate': 8.537942341253125e-06, 'epoch': 0.97}
32%|███▏ | 3744/11526 [39:01<1:19:41, 1.63it/s] 32%|███▏ | 3745/11526 [39:02<1:19:36, 1.63it/s] {'loss': 0.2823, 'grad_norm': 0.5562405586242676, 'learning_rate': 8.536872128313833e-06, 'epoch': 0.97}
32%|███▏ | 3745/11526 [39:02<1:19:36, 1.63it/s] 33%|███▎ | 3746/11526 [39:02<1:19:40, 1.63it/s] {'loss': 0.2367, 'grad_norm': 0.5290451645851135, 'learning_rate': 8.535801590952505e-06, 'epoch': 0.98}
33%|███▎ | 3746/11526 [39:03<1:19:40, 1.63it/s] 33%|███▎ | 3747/11526 [39:03<1:19:40, 1.63it/s] {'loss': 0.2886, 'grad_norm': 0.5983887910842896, 'learning_rate': 8.53473072926734e-06, 'epoch': 0.98}
33%|███▎ | 3747/11526 [39:03<1:19:40, 1.63it/s] 33%|███▎ | 3748/11526 [39:04<1:19:37, 1.63it/s] {'loss': 0.275, 'grad_norm': 0.5990630984306335, 'learning_rate': 8.533659543356559e-06, 'epoch': 0.98}
33%|███▎ | 3748/11526 [39:04<1:19:37, 1.63it/s] 33%|███▎ | 3749/11526 [39:04<1:19:37, 1.63it/s] {'loss': 0.2443, 'grad_norm': 0.5639902949333191, 'learning_rate': 8.532588033318418e-06, 'epoch': 0.98}
33%|███▎ | 3749/11526 [39:04<1:19:37, 1.63it/s] 33%|███▎ | 3750/11526 [39:05<1:19:35, 1.63it/s] {'loss': 0.2235, 'grad_norm': 0.550783634185791, 'learning_rate': 8.531516199251206e-06, 'epoch': 0.98}
33%|███▎ | 3750/11526 [39:05<1:19:35, 1.63it/s] 33%|███▎ | 3751/11526 [39:06<1:19:42, 1.63it/s] {'loss': 0.2313, 'grad_norm': 0.5523108839988708, 'learning_rate': 8.530444041253231e-06, 'epoch': 0.98}
33%|███▎ | 3751/11526 [39:06<1:19:42, 1.63it/s] 33%|███▎ | 3752/11526 [39:06<1:19:38, 1.63it/s] {'loss': 0.2473, 'grad_norm': 0.4838404357433319, 'learning_rate': 8.529371559422843e-06, 'epoch': 0.98}
33%|███▎ | 3752/11526 [39:06<1:19:38, 1.63it/s] 33%|███▎ | 3753/11526 [39:07<1:19:40, 1.63it/s] {'loss': 0.2179, 'grad_norm': 0.4472960829734802, 'learning_rate': 8.528298753858416e-06, 'epoch': 0.98}
33%|███▎ | 3753/11526 [39:07<1:19:40, 1.63it/s] 33%|███▎ | 3754/11526 [39:07<1:19:38, 1.63it/s] {'loss': 0.2427, 'grad_norm': 0.5212552547454834, 'learning_rate': 8.527225624658349e-06, 'epoch': 0.98}
33%|███▎ | 3754/11526 [39:08<1:19:38, 1.63it/s] 33%|███▎ | 3755/11526 [39:08<1:19:41, 1.63it/s] {'loss': 0.2421, 'grad_norm': 0.4729224741458893, 'learning_rate': 8.52615217192108e-06, 'epoch': 0.98}
33%|███▎ | 3755/11526 [39:08<1:19:41, 1.63it/s] 33%|███▎ | 3756/11526 [39:09<1:19:45, 1.62it/s] {'loss': 0.2165, 'grad_norm': 0.4813438057899475, 'learning_rate': 8.52507839574507e-06, 'epoch': 0.98}
33%|███▎ | 3756/11526 [39:09<1:19:45, 1.62it/s] 33%|███▎ | 3757/11526 [39:09<1:19:39, 1.63it/s] {'loss': 0.1944, 'grad_norm': 0.48992446064949036, 'learning_rate': 8.524004296228814e-06, 'epoch': 0.98}
33%|███▎ | 3757/11526 [39:09<1:19:39, 1.63it/s] 33%|███▎ | 3758/11526 [39:10<1:19:36, 1.63it/s] {'loss': 0.2134, 'grad_norm': 0.4414837062358856, 'learning_rate': 8.522929873470833e-06, 'epoch': 0.98}
33%|███▎ | 3758/11526 [39:10<1:19:36, 1.63it/s] 33%|███▎ | 3759/11526 [39:10<1:19:34, 1.63it/s] {'loss': 0.2399, 'grad_norm': 0.45965269207954407, 'learning_rate': 8.52185512756968e-06, 'epoch': 0.98}
33%|███▎ | 3759/11526 [39:11<1:19:34, 1.63it/s] 33%|███▎ | 3760/11526 [39:11<1:19:32, 1.63it/s] {'loss': 0.2779, 'grad_norm': 0.49723130464553833, 'learning_rate': 8.520780058623935e-06, 'epoch': 0.98}
33%|███▎ | 3760/11526 [39:11<1:19:32, 1.63it/s] 33%|███▎ | 3761/11526 [39:12<1:19:39, 1.62it/s] {'loss': 0.3159, 'grad_norm': 0.566037118434906, 'learning_rate': 8.51970466673221e-06, 'epoch': 0.98}
33%|███▎ | 3761/11526 [39:12<1:19:39, 1.62it/s] 33%|███▎ | 3762/11526 [39:12<1:19:36, 1.63it/s] {'loss': 0.2306, 'grad_norm': 0.42194607853889465, 'learning_rate': 8.518628951993147e-06, 'epoch': 0.98}
33%|███▎ | 3762/11526 [39:12<1:19:36, 1.63it/s] 33%|███▎ | 3763/11526 [39:13<1:19:32, 1.63it/s] {'loss': 0.2771, 'grad_norm': 0.553989589214325, 'learning_rate': 8.517552914505417e-06, 'epoch': 0.98}
33%|███▎ | 3763/11526 [39:13<1:19:32, 1.63it/s] 33%|███▎ | 3764/11526 [39:14<1:19:27, 1.63it/s] {'loss': 0.2326, 'grad_norm': 0.4139954745769501, 'learning_rate': 8.516476554367719e-06, 'epoch': 0.98}
33%|███▎ | 3764/11526 [39:14<1:19:27, 1.63it/s] 33%|███▎ | 3765/11526 [39:14<1:19:25, 1.63it/s] {'loss': 0.2289, 'grad_norm': 0.5443884134292603, 'learning_rate': 8.515399871678784e-06, 'epoch': 0.98}
33%|███▎ | 3765/11526 [39:14<1:19:25, 1.63it/s] 33%|███▎ | 3766/11526 [39:15<1:19:30, 1.63it/s] {'loss': 0.3302, 'grad_norm': 0.6584514379501343, 'learning_rate': 8.514322866537371e-06, 'epoch': 0.98}
33%|███▎ | 3766/11526 [39:15<1:19:30, 1.63it/s] 33%|███▎ | 3767/11526 [39:15<1:19:33, 1.63it/s] {'loss': 0.2701, 'grad_norm': 0.5252086520195007, 'learning_rate': 8.513245539042269e-06, 'epoch': 0.98}
33%|███▎ | 3767/11526 [39:16<1:19:33, 1.63it/s] 33%|███▎ | 3768/11526 [39:16<1:19:28, 1.63it/s] {'loss': 0.2576, 'grad_norm': 0.5510846376419067, 'learning_rate': 8.512167889292296e-06, 'epoch': 0.98}
33%|███▎ | 3768/11526 [39:16<1:19:28, 1.63it/s] 33%|███▎ | 3769/11526 [39:17<1:19:27, 1.63it/s] {'loss': 0.3034, 'grad_norm': 0.5748445391654968, 'learning_rate': 8.511089917386302e-06, 'epoch': 0.98}
33%|███▎ | 3769/11526 [39:17<1:19:27, 1.63it/s] 33%|███▎ | 3770/11526 [39:17<1:19:24, 1.63it/s] {'loss': 0.2436, 'grad_norm': 0.5501213669776917, 'learning_rate': 8.510011623423163e-06, 'epoch': 0.98}
33%|███▎ | 3770/11526 [39:17<1:19:24, 1.63it/s] 33%|███▎ | 3771/11526 [39:18<1:19:27, 1.63it/s] {'loss': 0.2754, 'grad_norm': 0.5628907084465027, 'learning_rate': 8.508933007501786e-06, 'epoch': 0.98}
33%|███▎ | 3771/11526 [39:18<1:19:27, 1.63it/s] 33%|███▎ | 3772/11526 [39:18<1:19:22, 1.63it/s] {'loss': 0.2605, 'grad_norm': 0.5949823260307312, 'learning_rate': 8.50785406972111e-06, 'epoch': 0.98}
33%|███▎ | 3772/11526 [39:19<1:19:22, 1.63it/s] 33%|███▎ | 3773/11526 [39:19<1:19:22, 1.63it/s] {'loss': 0.2659, 'grad_norm': 0.5572293996810913, 'learning_rate': 8.506774810180099e-06, 'epoch': 0.98}
33%|███▎ | 3773/11526 [39:19<1:19:22, 1.63it/s] 33%|███▎ | 3774/11526 [39:20<1:19:21, 1.63it/s] {'loss': 0.2176, 'grad_norm': 0.5046249628067017, 'learning_rate': 8.505695228977748e-06, 'epoch': 0.98}
33%|███▎ | 3774/11526 [39:20<1:19:21, 1.63it/s] 33%|███▎ | 3775/11526 [39:20<1:19:18, 1.63it/s] {'loss': 0.2398, 'grad_norm': 0.5055828094482422, 'learning_rate': 8.504615326213085e-06, 'epoch': 0.98}
33%|███▎ | 3775/11526 [39:20<1:19:18, 1.63it/s] 33%|███▎ | 3776/11526 [39:21<1:19:26, 1.63it/s] {'loss': 0.2746, 'grad_norm': 0.5850760340690613, 'learning_rate': 8.503535101985166e-06, 'epoch': 0.98}
33%|███▎ | 3776/11526 [39:21<1:19:26, 1.63it/s] 33%|███▎ | 3777/11526 [39:22<1:19:22, 1.63it/s] {'loss': 0.2301, 'grad_norm': 0.5632278323173523, 'learning_rate': 8.502454556393071e-06, 'epoch': 0.98}
33%|███▎ | 3777/11526 [39:22<1:19:22, 1.63it/s] 33%|███▎ | 3778/11526 [39:22<1:19:19, 1.63it/s] {'loss': 0.2585, 'grad_norm': 0.5543249845504761, 'learning_rate': 8.501373689535916e-06, 'epoch': 0.98}
33%|███▎ | 3778/11526 [39:22<1:19:19, 1.63it/s] 33%|███▎ | 3779/11526 [39:23<1:19:17, 1.63it/s] {'loss': 0.3087, 'grad_norm': 0.6444487571716309, 'learning_rate': 8.500292501512845e-06, 'epoch': 0.98}
33%|███▎ | 3779/11526 [39:23<1:19:17, 1.63it/s] 33%|███▎ | 3780/11526 [39:23<1:19:17, 1.63it/s] {'loss': 0.2358, 'grad_norm': 0.515963613986969, 'learning_rate': 8.499210992423031e-06, 'epoch': 0.98}
33%|███▎ | 3780/11526 [39:24<1:19:17, 1.63it/s] 33%|███▎ | 3781/11526 [39:24<1:19:20, 1.63it/s] {'loss': 0.2719, 'grad_norm': 0.5201125741004944, 'learning_rate': 8.498129162365674e-06, 'epoch': 0.98}
33%|███▎ | 3781/11526 [39:24<1:19:20, 1.63it/s] 33%|███▎ | 3782/11526 [39:25<1:19:18, 1.63it/s] {'loss': 0.2772, 'grad_norm': 0.5689443349838257, 'learning_rate': 8.497047011440005e-06, 'epoch': 0.98}
33%|███▎ | 3782/11526 [39:25<1:19:18, 1.63it/s] 33%|███▎ | 3783/11526 [39:25<1:19:16, 1.63it/s] {'loss': 0.3206, 'grad_norm': 0.6712009310722351, 'learning_rate': 8.495964539745289e-06, 'epoch': 0.98}
33%|███▎ | 3783/11526 [39:25<1:19:16, 1.63it/s] 33%|███▎ | 3784/11526 [39:26<1:19:16, 1.63it/s] {'loss': 0.3182, 'grad_norm': 0.6467244029045105, 'learning_rate': 8.494881747380814e-06, 'epoch': 0.98}
33%|███▎ | 3784/11526 [39:26<1:19:16, 1.63it/s] 33%|███▎ | 3785/11526 [39:26<1:19:14, 1.63it/s] {'loss': 0.206, 'grad_norm': 0.4547690153121948, 'learning_rate': 8.493798634445898e-06, 'epoch': 0.99}
33%|███▎ | 3785/11526 [39:27<1:19:14, 1.63it/s] 33%|███▎ | 3786/11526 [39:27<1:19:14, 1.63it/s] {'loss': 0.2548, 'grad_norm': 0.5037268996238708, 'learning_rate': 8.492715201039894e-06, 'epoch': 0.99}
33%|███▎ | 3786/11526 [39:27<1:19:14, 1.63it/s] 33%|███▎ | 3787/11526 [39:28<1:19:12, 1.63it/s] {'loss': 0.2358, 'grad_norm': 0.5487033724784851, 'learning_rate': 8.491631447262178e-06, 'epoch': 0.99}
33%|███▎ | 3787/11526 [39:28<1:19:12, 1.63it/s] 33%|███▎ | 3788/11526 [39:28<1:19:10, 1.63it/s] {'loss': 0.2761, 'grad_norm': 0.5337626338005066, 'learning_rate': 8.49054737321216e-06, 'epoch': 0.99}
33%|███▎ | 3788/11526 [39:28<1:19:10, 1.63it/s] 33%|███▎ | 3789/11526 [39:29<1:19:09, 1.63it/s] {'loss': 0.2829, 'grad_norm': 0.5650826692581177, 'learning_rate': 8.489462978989278e-06, 'epoch': 0.99}
33%|███▎ | 3789/11526 [39:29<1:19:09, 1.63it/s] 33%|███▎ | 3790/11526 [39:30<1:19:10, 1.63it/s] {'loss': 0.1845, 'grad_norm': 0.4954807460308075, 'learning_rate': 8.488378264692996e-06, 'epoch': 0.99}
33%|███▎ | 3790/11526 [39:30<1:19:10, 1.63it/s] 33%|███▎ | 3791/11526 [39:30<1:19:11, 1.63it/s] {'loss': 0.2109, 'grad_norm': 0.4646914005279541, 'learning_rate': 8.487293230422811e-06, 'epoch': 0.99}
33%|███▎ | 3791/11526 [39:30<1:19:11, 1.63it/s] 33%|███▎ | 3792/11526 [39:31<1:19:09, 1.63it/s] {'loss': 0.2773, 'grad_norm': 0.606476902961731, 'learning_rate': 8.48620787627825e-06, 'epoch': 0.99}
33%|███▎ | 3792/11526 [39:31<1:19:09, 1.63it/s] 33%|███▎ | 3793/11526 [39:31<1:19:07, 1.63it/s] {'loss': 0.2347, 'grad_norm': 0.5617648363113403, 'learning_rate': 8.485122202358866e-06, 'epoch': 0.99}
33%|███▎ | 3793/11526 [39:31<1:19:07, 1.63it/s] 33%|███▎ | 3794/11526 [39:32<1:19:10, 1.63it/s] {'loss': 0.2268, 'grad_norm': 0.5414064526557922, 'learning_rate': 8.484036208764244e-06, 'epoch': 0.99}
33%|███▎ | 3794/11526 [39:32<1:19:10, 1.63it/s] 33%|███▎ | 3795/11526 [39:33<1:19:06, 1.63it/s] {'loss': 0.2251, 'grad_norm': 0.5355178117752075, 'learning_rate': 8.482949895593999e-06, 'epoch': 0.99}
33%|███▎ | 3795/11526 [39:33<1:19:06, 1.63it/s] 33%|███▎ | 3796/11526 [39:33<1:19:13, 1.63it/s] {'loss': 0.2469, 'grad_norm': 0.6509897708892822, 'learning_rate': 8.48186326294777e-06, 'epoch': 0.99}
33%|███▎ | 3796/11526 [39:33<1:19:13, 1.63it/s] 33%|███▎ | 3797/11526 [39:34<1:19:13, 1.63it/s] {'loss': 0.2042, 'grad_norm': 0.4999769330024719, 'learning_rate': 8.480776310925234e-06, 'epoch': 0.99}
33%|███▎ | 3797/11526 [39:34<1:19:13, 1.63it/s] 33%|███▎ | 3798/11526 [39:34<1:19:12, 1.63it/s] {'loss': 0.225, 'grad_norm': 0.5276270508766174, 'learning_rate': 8.479689039626088e-06, 'epoch': 0.99}
33%|███▎ | 3798/11526 [39:35<1:19:12, 1.63it/s] 33%|███▎ | 3799/11526 [39:35<1:19:11, 1.63it/s] {'loss': 0.2177, 'grad_norm': 0.4849066138267517, 'learning_rate': 8.478601449150066e-06, 'epoch': 0.99}
33%|███▎ | 3799/11526 [39:35<1:19:11, 1.63it/s] 33%|███▎ | 3800/11526 [39:36<1:19:08, 1.63it/s] {'loss': 0.2002, 'grad_norm': 0.4548960328102112, 'learning_rate': 8.477513539596925e-06, 'epoch': 0.99}
33%|███▎ | 3800/11526 [39:36<1:19:08, 1.63it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.6247916221618652, 'eval_runtime': 1.9548, 'eval_samples_per_second': 102.313, 'eval_steps_per_second': 6.65, 'epoch': 0.99}
 33%|███▎ | 3801/11526 [39:38<2:34:57, 1.20s/it] {'loss': 0.26, 'grad_norm': 0.5488936901092529, 'learning_rate': 8.476425311066459e-06, 'epoch': 0.99}
33%|███▎ | 3801/11526 [39:38<2:34:57, 1.20s/it] 33%|███▎ | 3802/11526 [39:39<2:12:12, 1.03s/it] {'loss': 0.218, 'grad_norm': 0.49437591433525085, 'learning_rate': 8.47533676365848e-06, 'epoch': 0.99}
33%|███▎ | 3802/11526 [39:39<2:12:12, 1.03s/it] 33%|███▎ | 3803/11526 [39:39<1:56:14, 1.11it/s] {'loss': 0.1818, 'grad_norm': 0.4582686722278595, 'learning_rate': 8.474247897472842e-06, 'epoch': 0.99}
33%|███▎ | 3803/11526 [39:40<1:56:14, 1.11it/s] 33%|███▎ | 3804/11526 [39:40<1:45:03, 1.22it/s] {'loss': 0.3753, 'grad_norm': 0.6336904764175415, 'learning_rate': 8.473158712609417e-06, 'epoch': 0.99}
33%|███▎ | 3804/11526 [39:40<1:45:03, 1.22it/s] 33%|███▎ | 3805/11526 [39:41<1:37:14, 1.32it/s] {'loss': 0.2795, 'grad_norm': 0.6017728447914124, 'learning_rate': 8.472069209168115e-06, 'epoch': 0.99}
33%|███▎ | 3805/11526 [39:41<1:37:14, 1.32it/s] 33%|███▎ | 3806/11526 [39:41<1:31:47, 1.40it/s] {'loss': 0.224, 'grad_norm': 0.4671497344970703, 'learning_rate': 8.470979387248869e-06, 'epoch': 0.99}
33%|███▎ | 3806/11526 [39:41<1:31:47, 1.40it/s] 33%|███▎ | 3807/11526 [39:42<1:27:56, 1.46it/s] {'loss': 0.2461, 'grad_norm': 0.47942644357681274, 'learning_rate': 8.469889246951644e-06, 'epoch': 0.99}
33%|███▎ | 3807/11526 [39:42<1:27:56, 1.46it/s] 33%|███▎ | 3808/11526 [39:43<1:25:14, 1.51it/s] {'loss': 0.3078, 'grad_norm': 0.6417940258979797, 'learning_rate': 8.468798788376436e-06, 'epoch': 0.99}
33%|███▎ | 3808/11526 [39:43<1:25:14, 1.51it/s] 33%|███▎ | 3809/11526 [39:43<1:23:21, 1.54it/s] {'loss': 0.2542, 'grad_norm': 0.5567340850830078, 'learning_rate': 8.467708011623266e-06, 'epoch': 0.99}
33%|███▎ | 3809/11526 [39:43<1:23:21, 1.54it/s] 33%|███▎ | 3810/11526 [39:44<1:22:00, 1.57it/s] {'loss': 0.2479, 'grad_norm': 0.47228506207466125, 'learning_rate': 8.466616916792185e-06, 'epoch': 0.99}
33%|███▎ | 3810/11526 [39:44<1:22:00, 1.57it/s] 33%|███▎ | 3811/11526 [39:44<1:21:03, 1.59it/s] {'loss': 0.3321, 'grad_norm': 0.6476002335548401, 'learning_rate': 8.465525503983278e-06, 'epoch': 0.99}
33%|███▎ | 3811/11526 [39:45<1:21:03, 1.59it/s] 33%|███▎ | 3812/11526 [39:45<1:20:27, 1.60it/s] {'loss': 0.2956, 'grad_norm': 0.5145086050033569, 'learning_rate': 8.464433773296652e-06, 'epoch': 0.99}
33%|███▎ | 3812/11526 [39:45<1:20:27, 1.60it/s] 33%|███▎ | 3813/11526 [39:46<1:19:57, 1.61it/s] {'loss': 0.2561, 'grad_norm': 0.6086496710777283, 'learning_rate': 8.46334172483245e-06, 'epoch': 0.99}
33%|███▎ | 3813/11526 [39:46<1:19:57, 1.61it/s] 33%|███▎ | 3814/11526 [39:46<1:19:38, 1.61it/s] {'loss': 0.2155, 'grad_norm': 0.49508756399154663, 'learning_rate': 8.462249358690839e-06, 'epoch': 0.99}
33%|███▎ | 3814/11526 [39:46<1:19:38, 1.61it/s] 33%|███▎ | 3815/11526 [39:47<1:19:25, 1.62it/s] {'loss': 0.302, 'grad_norm': 0.5455553531646729, 'learning_rate': 8.461156674972018e-06, 'epoch': 0.99}
33%|███▎ | 3815/11526 [39:47<1:19:25, 1.62it/s] 33%|███▎ | 3816/11526 [39:47<1:19:16, 1.62it/s] {'loss': 0.2518, 'grad_norm': 0.5789459943771362, 'learning_rate': 8.460063673776213e-06, 'epoch': 0.99}
33%|███▎ | 3816/11526 [39:48<1:19:16, 1.62it/s] 33%|███▎ | 3817/11526 [39:48<1:19:12, 1.62it/s] {'loss': 0.1816, 'grad_norm': 0.388903945684433, 'learning_rate': 8.458970355203681e-06, 'epoch': 0.99}
33%|███▎ | 3817/11526 [39:48<1:19:12, 1.62it/s] 33%|███▎ | 3818/11526 [39:49<1:19:04, 1.62it/s] {'loss': 0.205, 'grad_norm': 0.4841565787792206, 'learning_rate': 8.457876719354708e-06, 'epoch': 0.99}
33%|███▎ | 3818/11526 [39:49<1:19:04, 1.62it/s] 33%|███▎ | 3819/11526 [39:49<1:18:58, 1.63it/s] {'loss': 0.254, 'grad_norm': 0.5059241652488708, 'learning_rate': 8.456782766329607e-06, 'epoch': 0.99}
33%|███▎ | 3819/11526 [39:49<1:18:58, 1.63it/s] 33%|███▎ | 3820/11526 [39:50<1:18:58, 1.63it/s] {'loss': 0.268, 'grad_norm': 0.5625717043876648, 'learning_rate': 8.455688496228723e-06, 'epoch': 0.99}
33%|███▎ | 3820/11526 [39:50<1:18:58, 1.63it/s] 33%|███▎ | 3821/11526 [39:51<1:18:55, 1.63it/s] {'loss': 0.2071, 'grad_norm': 0.4680609107017517, 'learning_rate': 8.454593909152427e-06, 'epoch': 0.99}
33%|███▎ | 3821/11526 [39:51<1:18:55, 1.63it/s] 33%|███▎ | 3822/11526 [39:51<1:18:52, 1.63it/s] {'loss': 0.2293, 'grad_norm': 0.5513056516647339, 'learning_rate': 8.453499005201123e-06, 'epoch': 0.99}
33%|███▎ | 3822/11526 [39:51<1:18:52, 1.63it/s] 33%|███▎ | 3823/11526 [39:52<1:18:48, 1.63it/s] {'loss': 0.2254, 'grad_norm': 0.4537743031978607, 'learning_rate': 8.45240378447524e-06, 'epoch': 1.0}
33%|███▎ | 3823/11526 [39:52<1:18:48, 1.63it/s] 33%|███▎ | 3824/11526 [39:52<1:18:51, 1.63it/s] {'loss': 0.2465, 'grad_norm': 0.47801801562309265, 'learning_rate': 8.451308247075238e-06, 'epoch': 1.0}
33%|███▎ | 3824/11526 [39:52<1:18:51, 1.63it/s] 33%|███▎ | 3825/11526 [39:53<1:18:48, 1.63it/s] {'loss': 0.2571, 'grad_norm': 0.5338993072509766, 'learning_rate': 8.450212393101608e-06, 'epoch': 1.0}
33%|███▎ | 3825/11526 [39:53<1:18:48, 1.63it/s] 33%|███▎ | 3826/11526 [39:54<1:18:47, 1.63it/s] {'loss': 0.3348, 'grad_norm': 0.5926414728164673, 'learning_rate': 8.449116222654866e-06, 'epoch': 1.0}
33%|███▎ | 3826/11526 [39:54<1:18:47, 1.63it/s] 33%|███▎ | 3827/11526 [39:54<1:18:47, 1.63it/s] {'loss': 0.2974, 'grad_norm': 0.5782370567321777, 'learning_rate': 8.448019735835558e-06, 'epoch': 1.0}
33%|███▎ | 3827/11526 [39:54<1:18:47, 1.63it/s] 33%|███▎ | 3828/11526 [39:55<1:18:47, 1.63it/s] {'loss': 0.2595, 'grad_norm': 0.5859867930412292, 'learning_rate': 8.446922932744263e-06, 'epoch': 1.0}
33%|███▎ | 3828/11526 [39:55<1:18:47, 1.63it/s] 33%|███▎ | 3829/11526 [39:55<1:18:46, 1.63it/s] {'loss': 0.2192, 'grad_norm': 0.5648504495620728, 'learning_rate': 8.445825813481582e-06, 'epoch': 1.0}
33%|███▎ | 3829/11526 [39:56<1:18:46, 1.63it/s] 33%|███▎ | 3830/11526 [39:56<1:18:46, 1.63it/s] {'loss': 0.282, 'grad_norm': 0.6144405007362366, 'learning_rate': 8.444728378148155e-06, 'epoch': 1.0}
33%|███▎ | 3830/11526 [39:56<1:18:46, 1.63it/s] 33%|███▎ | 3831/11526 [39:57<1:18:46, 1.63it/s] {'loss': 0.303, 'grad_norm': 0.5212909579277039, 'learning_rate': 8.443630626844639e-06, 'epoch': 1.0}
33%|███▎ | 3831/11526 [39:57<1:18:46, 1.63it/s] 33%|███▎ | 3832/11526 [39:57<1:18:46, 1.63it/s] {'loss': 0.1979, 'grad_norm': 0.4818902313709259, 'learning_rate': 8.44253255967173e-06, 'epoch': 1.0}
33%|███▎ | 3832/11526 [39:57<1:18:46, 1.63it/s] 33%|███▎ | 3833/11526 [39:58<1:18:45, 1.63it/s] {'loss': 0.2424, 'grad_norm': 0.4684607982635498, 'learning_rate': 8.441434176730146e-06, 'epoch': 1.0}
33%|███▎ | 3833/11526 [39:58<1:18:45, 1.63it/s] 33%|███▎ | 3834/11526 [39:58<1:18:41, 1.63it/s] {'loss': 0.2422, 'grad_norm': 0.5151761174201965, 'learning_rate': 8.440335478120637e-06, 'epoch': 1.0}
33%|███▎ | 3834/11526 [39:59<1:18:41, 1.63it/s] 33%|███▎ | 3835/11526 [39:59<1:18:43, 1.63it/s] {'loss': 0.2144, 'grad_norm': 0.4758394956588745, 'learning_rate': 8.439236463943988e-06, 'epoch': 1.0}
33%|███▎ | 3835/11526 [39:59<1:18:43, 1.63it/s] 33%|███▎ | 3836/11526 [40:00<1:18:50, 1.63it/s] {'loss': 0.2977, 'grad_norm': 0.5975261330604553, 'learning_rate': 8.438137134300998e-06, 'epoch': 1.0}
33%|███▎ | 3836/11526 [40:00<1:18:50, 1.63it/s] 33%|███▎ | 3837/11526 [40:00<1:18:47, 1.63it/s] {'loss': 0.2614, 'grad_norm': 0.5510948300361633, 'learning_rate': 8.437037489292509e-06, 'epoch': 1.0}
33%|███▎ | 3837/11526 [40:00<1:18:47, 1.63it/s] 33%|███▎ | 3838/11526 [40:01<1:18:45, 1.63it/s] {'loss': 0.2348, 'grad_norm': 0.4992464482784271, 'learning_rate': 8.435937529019385e-06, 'epoch': 1.0}
33%|███▎ | 3838/11526 [40:01<1:18:45, 1.63it/s] 33%|███▎ | 3839/11526 [40:02<1:18:45, 1.63it/s] {'loss': 0.2097, 'grad_norm': 0.455365389585495, 'learning_rate': 8.434837253582522e-06, 'epoch': 1.0}
33%|███▎ | 3839/11526 [40:02<1:18:45, 1.63it/s] 33%|███▎ | 3840/11526 [40:02<1:18:42, 1.63it/s] {'loss': 0.2648, 'grad_norm': 0.5137496590614319, 'learning_rate': 8.433736663082841e-06, 'epoch': 1.0}
33%|███▎ | 3840/11526 [40:02<1:18:42, 1.63it/s] 33%|███▎ | 3841/11526 [40:03<1:18:52, 1.62it/s] {'loss': 0.2113, 'grad_norm': 0.4409901797771454, 'learning_rate': 8.4326357576213e-06, 'epoch': 1.0}
33%|███▎ | 3841/11526 [40:03<1:18:52, 1.62it/s] 33%|███▎ | 3842/11526 [40:03<1:18:43, 1.63it/s] {'loss': 0.3341, 'grad_norm': 0.7030341625213623, 'learning_rate': 8.431534537298874e-06, 'epoch': 1.0}
33%|███▎ | 3842/11526 [40:04<1:18:43, 1.63it/s] 33%|███▎ | 3843/11526 [40:04<1:18:49, 1.62it/s] {'loss': 0.2902, 'grad_norm': 0.5617229342460632, 'learning_rate': 8.430433002216575e-06, 'epoch': 1.0}
33%|███▎ | 3843/11526 [40:04<1:18:49, 1.62it/s] 33%|███▎ | 3844/11526 [40:05<1:18:44, 1.63it/s] {'loss': 0.174, 'grad_norm': 0.4442766308784485, 'learning_rate': 8.429331152475443e-06, 'epoch': 1.0}
33%|███▎ | 3844/11526 [40:05<1:18:44, 1.63it/s] 33%|███▎ | 3845/11526 [40:05<1:18:42, 1.63it/s] {'loss': 0.1974, 'grad_norm': 0.49135875701904297, 'learning_rate': 8.428228988176546e-06, 'epoch': 1.0}
33%|███▎ | 3845/11526 [40:05<1:18:42, 1.63it/s] 33%|███▎ | 3846/11526 [40:06<1:18:45, 1.63it/s] {'loss': 0.2483, 'grad_norm': 0.49681544303894043, 'learning_rate': 8.42712650942098e-06, 'epoch': 1.0}
33%|███▎ | 3846/11526 [40:06<1:18:45, 1.63it/s] 33%|███▎ | 3847/11526 [40:06<1:18:43, 1.63it/s] {'loss': 0.1768, 'grad_norm': 0.43857887387275696, 'learning_rate': 8.426023716309871e-06, 'epoch': 1.0}
33%|███▎ | 3847/11526 [40:07<1:18:43, 1.63it/s] 33%|███▎ | 3848/11526 [40:07<1:18:39, 1.63it/s] {'loss': 0.1874, 'grad_norm': 0.4602213501930237, 'learning_rate': 8.424920608944374e-06, 'epoch': 1.0}
33%|███▎ | 3848/11526 [40:07<1:18:39, 1.63it/s] 33%|███▎ | 3849/11526 [40:08<1:18:37, 1.63it/s] {'loss': 0.1973, 'grad_norm': 0.47424834966659546, 'learning_rate': 8.42381718742567e-06, 'epoch': 1.0}
33%|███▎ | 3849/11526 [40:08<1:18:37, 1.63it/s] 33%|███▎ | 3850/11526 [40:08<1:18:35, 1.63it/s] {'loss': 0.2069, 'grad_norm': 0.4478027820587158, 'learning_rate': 8.422713451854976e-06, 'epoch': 1.0}
33%|███▎ | 3850/11526 [40:08<1:18:35, 1.63it/s] 33%|███▎ | 3851/11526 [40:09<1:18:37, 1.63it/s] {'loss': 0.1859, 'grad_norm': 0.4331139326095581, 'learning_rate': 8.421609402333529e-06, 'epoch': 1.0}
33%|███▎ | 3851/11526 [40:09<1:18:37, 1.63it/s] 33%|███▎ | 3852/11526 [40:10<1:18:35, 1.63it/s] {'loss': 0.2122, 'grad_norm': 0.5311474204063416, 'learning_rate': 8.420505038962596e-06, 'epoch': 1.0}
33%|███▎ | 3852/11526 [40:10<1:18:35, 1.63it/s] 33%|███▎ | 3853/11526 [40:10<1:18:32, 1.63it/s] {'loss': 0.2716, 'grad_norm': 0.5631378293037415, 'learning_rate': 8.419400361843482e-06, 'epoch': 1.0}
33%|███▎ | 3853/11526 [40:10<1:18:32, 1.63it/s] 33%|███▎ | 3854/11526 [40:11<1:18:30, 1.63it/s] {'loss': 0.2672, 'grad_norm': 0.5686108469963074, 'learning_rate': 8.418295371077511e-06, 'epoch': 1.0}
33%|███▎ | 3854/11526 [40:11<1:18:30, 1.63it/s] 33%|███▎ | 3855/11526 [40:11<1:18:27, 1.63it/s] {'loss': 0.2474, 'grad_norm': 0.546911358833313, 'learning_rate': 8.41719006676604e-06, 'epoch': 1.0}
33%|███▎ | 3855/11526 [40:12<1:18:27, 1.63it/s] 33%|███▎ | 3856/11526 [40:12<1:18:32, 1.63it/s] {'loss': 0.2035, 'grad_norm': 0.5447494387626648, 'learning_rate': 8.416084449010451e-06, 'epoch': 1.0}
33%|███▎ | 3856/11526 [40:12<1:18:32, 1.63it/s] 33%|███▎ | 3857/11526 [40:13<1:18:30, 1.63it/s] {'loss': 0.3619, 'grad_norm': 0.7383764386177063, 'learning_rate': 8.414978517912161e-06, 'epoch': 1.0}
33%|███▎ | 3857/11526 [40:13<1:18:30, 1.63it/s] 33%|███▎ | 3858/11526 [40:13<1:18:27, 1.63it/s] {'loss': 0.1598, 'grad_norm': 0.48736679553985596, 'learning_rate': 8.41387227357261e-06, 'epoch': 1.0}
33%|███▎ | 3858/11526 [40:13<1:18:27, 1.63it/s] 33%|███▎ | 3859/11526 [40:14<1:18:27, 1.63it/s] {'loss': 0.167, 'grad_norm': 0.4942989945411682, 'learning_rate': 8.412765716093273e-06, 'epoch': 1.0}
33%|███▎ | 3859/11526 [40:14<1:18:27, 1.63it/s] 33%|███▎ | 3860/11526 [40:14<1:18:27, 1.63it/s] {'loss': 0.2383, 'grad_norm': 0.6189345717430115, 'learning_rate': 8.411658845575642e-06, 'epoch': 1.0}
33%|███▎ | 3860/11526 [40:15<1:18:27, 1.63it/s] 33%|███▎ | 3861/11526 [40:15<1:18:28, 1.63it/s] {'loss': 0.2116, 'grad_norm': 0.5575351715087891, 'learning_rate': 8.410551662121253e-06, 'epoch': 1.0}
33%|███▎ | 3861/11526 [40:15<1:18:28, 1.63it/s] 34%|███▎ | 3862/11526 [40:16<1:18:32, 1.63it/s] {'loss': 0.2266, 'grad_norm': 0.5540524125099182, 'learning_rate': 8.40944416583166e-06, 'epoch': 1.01}
34%|███▎ | 3862/11526 [40:16<1:18:32, 1.63it/s] 34%|███▎ | 3863/11526 [40:16<1:18:31, 1.63it/s] {'loss': 0.2111, 'grad_norm': 0.5912890434265137, 'learning_rate': 8.408336356808449e-06, 'epoch': 1.01}
34%|███▎ | 3863/11526 [40:16<1:18:31, 1.63it/s] 34%|███▎ | 3864/11526 [40:17<1:18:29, 1.63it/s] {'loss': 0.3176, 'grad_norm': 0.6425330638885498, 'learning_rate': 8.407228235153234e-06, 'epoch': 1.01}
34%|███▎ | 3864/11526 [40:17<1:18:29, 1.63it/s] 34%|███▎ | 3865/11526 [40:18<1:18:29, 1.63it/s] {'loss': 0.2552, 'grad_norm': 0.6199027299880981, 'learning_rate': 8.40611980096766e-06, 'epoch': 1.01}
34%|███▎ | 3865/11526 [40:18<1:18:29, 1.63it/s] 34%|███▎ | 3866/11526 [40:18<1:18:33, 1.62it/s] {'loss': 0.2854, 'grad_norm': 0.5503511428833008, 'learning_rate': 8.405011054353396e-06, 'epoch': 1.01}
34%|███▎ | 3866/11526 [40:18<1:18:33, 1.62it/s] 34%|███▎ | 3867/11526 [40:19<1:18:29, 1.63it/s] {'loss': 0.1826, 'grad_norm': 0.4747142791748047, 'learning_rate': 8.403901995412147e-06, 'epoch': 1.01}
34%|███▎ | 3867/11526 [40:19<1:18:29, 1.63it/s] 34%|███▎ | 3868/11526 [40:19<1:18:27, 1.63it/s] {'loss': 0.2441, 'grad_norm': 0.6490676999092102, 'learning_rate': 8.402792624245637e-06, 'epoch': 1.01}
34%|███▎ | 3868/11526 [40:20<1:18:27, 1.63it/s] 34%|███▎ | 3869/11526 [40:20<1:18:27, 1.63it/s] {'loss': 0.2185, 'grad_norm': 0.5771580338478088, 'learning_rate': 8.401682940955628e-06, 'epoch': 1.01}
34%|███▎ | 3869/11526 [40:20<1:18:27, 1.63it/s] 34%|███▎ | 3870/11526 [40:21<1:18:24, 1.63it/s] {'loss': 0.2146, 'grad_norm': 0.484714150428772, 'learning_rate': 8.400572945643905e-06, 'epoch': 1.01}
34%|███▎ | 3870/11526 [40:21<1:18:24, 1.63it/s] 34%|███▎ | 3871/11526 [40:21<1:18:27, 1.63it/s] {'loss': 0.2466, 'grad_norm': 0.5797418355941772, 'learning_rate': 8.399462638412282e-06, 'epoch': 1.01}
34%|███▎ | 3871/11526 [40:21<1:18:27, 1.63it/s] 34%|███▎ | 3872/11526 [40:22<1:18:25, 1.63it/s] {'loss': 0.1859, 'grad_norm': 0.49791109561920166, 'learning_rate': 8.398352019362605e-06, 'epoch': 1.01}
34%|███▎ | 3872/11526 [40:22<1:18:25, 1.63it/s] 34%|███▎ | 3873/11526 [40:22<1:18:23, 1.63it/s] {'loss': 0.2713, 'grad_norm': 0.589189350605011, 'learning_rate': 8.397241088596743e-06, 'epoch': 1.01}
34%|███▎ | 3873/11526 [40:23<1:18:23, 1.63it/s] 34%|███▎ | 3874/11526 [40:23<1:18:20, 1.63it/s] {'loss': 0.2253, 'grad_norm': 0.5627443790435791, 'learning_rate': 8.3961298462166e-06, 'epoch': 1.01}
34%|███▎ | 3874/11526 [40:23<1:18:20, 1.63it/s] 34%|███▎ | 3875/11526 [40:24<1:18:17, 1.63it/s] {'loss': 0.167, 'grad_norm': 0.5182158350944519, 'learning_rate': 8.395018292324107e-06, 'epoch': 1.01}
34%|███▎ | 3875/11526 [40:24<1:18:17, 1.63it/s] 34%|███▎ | 3876/11526 [40:24<1:18:19, 1.63it/s] {'loss': 0.1719, 'grad_norm': 0.5318343043327332, 'learning_rate': 8.393906427021214e-06, 'epoch': 1.01}
34%|███▎ | 3876/11526 [40:24<1:18:19, 1.63it/s] 34%|███▎ | 3877/11526 [40:25<1:18:18, 1.63it/s] {'loss': 0.1783, 'grad_norm': 0.4579927325248718, 'learning_rate': 8.392794250409916e-06, 'epoch': 1.01}
34%|███▎ | 3877/11526 [40:25<1:18:18, 1.63it/s] 34%|███▎ | 3878/11526 [40:26<1:18:16, 1.63it/s] {'loss': 0.2037, 'grad_norm': 0.5170977711677551, 'learning_rate': 8.391681762592225e-06, 'epoch': 1.01}
34%|███▎ | 3878/11526 [40:26<1:18:16, 1.63it/s] 34%|███▎ | 3879/11526 [40:26<1:18:17, 1.63it/s] {'loss': 0.2007, 'grad_norm': 0.48385873436927795, 'learning_rate': 8.390568963670185e-06, 'epoch': 1.01}
34%|███▎ | 3879/11526 [40:26<1:18:17, 1.63it/s] 34%|███▎ | 3880/11526 [40:27<1:18:17, 1.63it/s] {'loss': 0.2257, 'grad_norm': 0.5944651365280151, 'learning_rate': 8.389455853745868e-06, 'epoch': 1.01}
34%|███▎ | 3880/11526 [40:27<1:18:17, 1.63it/s] 34%|███▎ | 3881/11526 [40:27<1:18:20, 1.63it/s] {'loss': 0.1965, 'grad_norm': 0.5529300570487976, 'learning_rate': 8.388342432921374e-06, 'epoch': 1.01}
34%|███▎ | 3881/11526 [40:28<1:18:20, 1.63it/s] 34%|███▎ | 3882/11526 [40:28<1:18:18, 1.63it/s] {'loss': 0.1921, 'grad_norm': 0.5397601127624512, 'learning_rate': 8.387228701298835e-06, 'epoch': 1.01}
34%|███▎ | 3882/11526 [40:28<1:18:18, 1.63it/s] 34%|███▎ | 3883/11526 [40:29<1:18:16, 1.63it/s] {'loss': 0.2205, 'grad_norm': 0.5710594654083252, 'learning_rate': 8.386114658980407e-06, 'epoch': 1.01}
34%|███▎ | 3883/11526 [40:29<1:18:16, 1.63it/s] 34%|███▎ | 3884/11526 [40:29<1:18:13, 1.63it/s] {'loss': 0.1942, 'grad_norm': 0.5468589663505554, 'learning_rate': 8.385000306068274e-06, 'epoch': 1.01}
34%|███▎ | 3884/11526 [40:29<1:18:13, 1.63it/s] 34%|███▎ | 3885/11526 [40:30<1:18:12, 1.63it/s] {'loss': 0.1814, 'grad_norm': 0.5207247734069824, 'learning_rate': 8.383885642664655e-06, 'epoch': 1.01}
34%|███▎ | 3885/11526 [40:30<1:18:12, 1.63it/s] 34%|███▎ | 3886/11526 [40:30<1:18:20, 1.63it/s] {'loss': 0.2318, 'grad_norm': 0.6221569776535034, 'learning_rate': 8.38277066887179e-06, 'epoch': 1.01}
34%|███▎ | 3886/11526 [40:31<1:18:20, 1.63it/s] 34%|███▎ | 3887/11526 [40:31<1:18:16, 1.63it/s] {'loss': 0.2057, 'grad_norm': 0.5670037865638733, 'learning_rate': 8.381655384791952e-06, 'epoch': 1.01}
34%|███▎ | 3887/11526 [40:31<1:18:16, 1.63it/s] 34%|███▎ | 3888/11526 [40:32<1:18:13, 1.63it/s] {'loss': 0.1811, 'grad_norm': 0.4916304051876068, 'learning_rate': 8.380539790527443e-06, 'epoch': 1.01}
34%|███▎ | 3888/11526 [40:32<1:18:13, 1.63it/s] 34%|███▎ | 3889/11526 [40:32<1:18:15, 1.63it/s] {'loss': 0.2067, 'grad_norm': 0.528251051902771, 'learning_rate': 8.379423886180588e-06, 'epoch': 1.01}
34%|███▎ | 3889/11526 [40:32<1:18:15, 1.63it/s] 34%|███▎ | 3890/11526 [40:33<1:18:13, 1.63it/s] {'loss': 0.2182, 'grad_norm': 0.583684504032135, 'learning_rate': 8.378307671853747e-06, 'epoch': 1.01}
34%|███▎ | 3890/11526 [40:33<1:18:13, 1.63it/s] 34%|███▍ | 3891/11526 [40:34<1:18:15, 1.63it/s] {'loss': 0.2165, 'grad_norm': 0.5338238477706909, 'learning_rate': 8.377191147649303e-06, 'epoch': 1.01}
34%|███▍ | 3891/11526 [40:34<1:18:15, 1.63it/s] 34%|███▍ | 3892/11526 [40:34<1:18:12, 1.63it/s] {'loss': 0.2212, 'grad_norm': 0.5077522397041321, 'learning_rate': 8.376074313669675e-06, 'epoch': 1.01}
34%|███▍ | 3892/11526 [40:34<1:18:12, 1.63it/s] 34%|███▍ | 3893/11526 [40:35<1:18:14, 1.63it/s] {'loss': 0.2202, 'grad_norm': 0.5157799124717712, 'learning_rate': 8.3749571700173e-06, 'epoch': 1.01}
34%|███▍ | 3893/11526 [40:35<1:18:14, 1.63it/s] 34%|███▍ | 3894/11526 [40:35<1:18:12, 1.63it/s] {'loss': 0.2173, 'grad_norm': 0.546218752861023, 'learning_rate': 8.37383971679465e-06, 'epoch': 1.01}
34%|███▍ | 3894/11526 [40:36<1:18:12, 1.63it/s] 34%|███▍ | 3895/11526 [40:36<1:18:12, 1.63it/s] {'loss': 0.144, 'grad_norm': 0.40249666571617126, 'learning_rate': 8.372721954104227e-06, 'epoch': 1.01}
34%|███▍ | 3895/11526 [40:36<1:18:12, 1.63it/s] 34%|███▍ | 3896/11526 [40:37<1:18:16, 1.62it/s] {'loss': 0.1843, 'grad_norm': 0.4975471496582031, 'learning_rate': 8.371603882048554e-06, 'epoch': 1.01}
34%|███▍ | 3896/11526 [40:37<1:18:16, 1.62it/s] 34%|███▍ | 3897/11526 [40:37<1:18:10, 1.63it/s] {'loss': 0.1641, 'grad_norm': 0.5159737467765808, 'learning_rate': 8.370485500730189e-06, 'epoch': 1.01}
34%|███▍ | 3897/11526 [40:37<1:18:10, 1.63it/s] 34%|███▍ | 3898/11526 [40:38<1:18:07, 1.63it/s] {'loss': 0.2053, 'grad_norm': 0.5857053399085999, 'learning_rate': 8.369366810251717e-06, 'epoch': 1.01}
34%|███▍ | 3898/11526 [40:38<1:18:07, 1.63it/s] 34%|███▍ | 3899/11526 [40:38<1:18:06, 1.63it/s] {'loss': 0.1781, 'grad_norm': 0.5189196467399597, 'learning_rate': 8.368247810715752e-06, 'epoch': 1.01}
34%|███▍ | 3899/11526 [40:39<1:18:06, 1.63it/s] 34%|███▍ | 3900/11526 [40:39<1:18:03, 1.63it/s] {'loss': 0.1951, 'grad_norm': 0.5444295406341553, 'learning_rate': 8.367128502224931e-06, 'epoch': 1.02}
34%|███▍ | 3900/11526 [40:39<1:18:03, 1.63it/s] 34%|███▍ | 3901/11526 [40:40<1:18:11, 1.63it/s] {'loss': 0.2326, 'grad_norm': 0.49123910069465637, 'learning_rate': 8.366008884881926e-06, 'epoch': 1.02}
34%|███▍ | 3901/11526 [40:40<1:18:11, 1.63it/s] 34%|███▍ | 3902/11526 [40:40<1:18:12, 1.62it/s] {'loss': 0.2727, 'grad_norm': 0.600300669670105, 'learning_rate': 8.364888958789437e-06, 'epoch': 1.02}
34%|███▍ | 3902/11526 [40:40<1:18:12, 1.62it/s] 34%|███▍ | 3903/11526 [40:41<1:18:09, 1.63it/s] {'loss': 0.1595, 'grad_norm': 0.4135107696056366, 'learning_rate': 8.363768724050186e-06, 'epoch': 1.02}
34%|███▍ | 3903/11526 [40:41<1:18:09, 1.63it/s] 34%|███▍ | 3904/11526 [40:42<1:18:05, 1.63it/s] {'loss': 0.1645, 'grad_norm': 0.45465636253356934, 'learning_rate': 8.362648180766926e-06, 'epoch': 1.02}
34%|███▍ | 3904/11526 [40:42<1:18:05, 1.63it/s] 34%|███▍ | 3905/11526 [40:42<1:18:03, 1.63it/s] {'loss': 0.1942, 'grad_norm': 0.5661675930023193, 'learning_rate': 8.361527329042445e-06, 'epoch': 1.02}
34%|███▍ | 3905/11526 [40:42<1:18:03, 1.63it/s] 34%|███▍ | 3906/11526 [40:43<1:18:07, 1.63it/s] {'loss': 0.2491, 'grad_norm': 0.5230737924575806, 'learning_rate': 8.36040616897955e-06, 'epoch': 1.02}
34%|███▍ | 3906/11526 [40:43<1:18:07, 1.63it/s] 34%|███▍ | 3907/11526 [40:43<1:18:03, 1.63it/s] {'loss': 0.1716, 'grad_norm': 0.4780765473842621, 'learning_rate': 8.359284700681081e-06, 'epoch': 1.02}
34%|███▍ | 3907/11526 [40:44<1:18:03, 1.63it/s] 34%|███▍ | 3908/11526 [40:44<1:18:02, 1.63it/s] {'loss': 0.1935, 'grad_norm': 0.48647090792655945, 'learning_rate': 8.358162924249906e-06, 'epoch': 1.02}
34%|███▍ | 3908/11526 [40:44<1:18:02, 1.63it/s] 34%|███▍ | 3909/11526 [40:45<1:17:59, 1.63it/s] {'loss': 0.2404, 'grad_norm': 0.5898938775062561, 'learning_rate': 8.35704083978892e-06, 'epoch': 1.02}
34%|███▍ | 3909/11526 [40:45<1:17:59, 1.63it/s] 34%|███▍ | 3910/11526 [40:45<1:17:56, 1.63it/s] {'loss': 0.3134, 'grad_norm': 0.6376853585243225, 'learning_rate': 8.355918447401048e-06, 'epoch': 1.02}
34%|███▍ | 3910/11526 [40:45<1:17:56, 1.63it/s] 34%|███▍ | 3911/11526 [40:46<1:17:58, 1.63it/s] {'loss': 0.1864, 'grad_norm': 0.4887515604496002, 'learning_rate': 8.354795747189243e-06, 'epoch': 1.02}
34%|███▍ | 3911/11526 [40:46<1:17:58, 1.63it/s] 34%|███▍ | 3912/11526 [40:46<1:17:56, 1.63it/s] {'loss': 0.244, 'grad_norm': 0.5969088077545166, 'learning_rate': 8.353672739256481e-06, 'epoch': 1.02}
34%|███▍ | 3912/11526 [40:47<1:17:56, 1.63it/s] 34%|███▍ | 3913/11526 [40:47<1:17:54, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.4705663323402405, 'learning_rate': 8.352549423705776e-06, 'epoch': 1.02}
34%|███▍ | 3913/11526 [40:47<1:17:54, 1.63it/s] 34%|███▍ | 3914/11526 [40:48<1:17:53, 1.63it/s] {'loss': 0.2002, 'grad_norm': 0.527810275554657, 'learning_rate': 8.351425800640161e-06, 'epoch': 1.02}
34%|███▍ | 3914/11526 [40:48<1:17:53, 1.63it/s] 34%|███▍ | 3915/11526 [40:48<1:17:51, 1.63it/s] {'loss': 0.2082, 'grad_norm': 0.5468966960906982, 'learning_rate': 8.350301870162705e-06, 'epoch': 1.02}
34%|███▍ | 3915/11526 [40:48<1:17:51, 1.63it/s] 34%|███▍ | 3916/11526 [40:49<1:17:52, 1.63it/s] {'loss': 0.2186, 'grad_norm': 0.556243896484375, 'learning_rate': 8.349177632376497e-06, 'epoch': 1.02}
34%|███▍ | 3916/11526 [40:49<1:17:52, 1.63it/s] 34%|███▍ | 3917/11526 [40:50<1:17:52, 1.63it/s] {'loss': 0.1715, 'grad_norm': 0.5180574655532837, 'learning_rate': 8.348053087384663e-06, 'epoch': 1.02}
34%|███▍ | 3917/11526 [40:50<1:17:52, 1.63it/s] 34%|███▍ | 3918/11526 [40:50<1:17:49, 1.63it/s] {'loss': 0.1974, 'grad_norm': 0.5555828809738159, 'learning_rate': 8.346928235290346e-06, 'epoch': 1.02}
34%|███▍ | 3918/11526 [40:50<1:17:49, 1.63it/s] 34%|███▍ | 3919/11526 [40:51<1:17:50, 1.63it/s] {'loss': 0.184, 'grad_norm': 0.5049116015434265, 'learning_rate': 8.345803076196733e-06, 'epoch': 1.02}
34%|███▍ | 3919/11526 [40:51<1:17:50, 1.63it/s] 34%|███▍ | 3920/11526 [40:51<1:17:49, 1.63it/s] {'loss': 0.2519, 'grad_norm': 0.5962488055229187, 'learning_rate': 8.344677610207021e-06, 'epoch': 1.02}
34%|███▍ | 3920/11526 [40:51<1:17:49, 1.63it/s] 34%|███▍ | 3921/11526 [40:52<1:17:50, 1.63it/s] {'loss': 0.2989, 'grad_norm': 0.671721875667572, 'learning_rate': 8.343551837424451e-06, 'epoch': 1.02}
34%|███▍ | 3921/11526 [40:52<1:17:50, 1.63it/s] 34%|███▍ | 3922/11526 [40:53<1:17:51, 1.63it/s] {'loss': 0.2762, 'grad_norm': 0.6163680553436279, 'learning_rate': 8.342425757952281e-06, 'epoch': 1.02}
34%|███▍ | 3922/11526 [40:53<1:17:51, 1.63it/s] 34%|███▍ | 3923/11526 [40:53<1:17:53, 1.63it/s] {'loss': 0.2006, 'grad_norm': 0.5211683511734009, 'learning_rate': 8.341299371893801e-06, 'epoch': 1.02}
34%|███▍ | 3923/11526 [40:53<1:17:53, 1.63it/s] 34%|███▍ | 3924/11526 [40:54<1:17:56, 1.63it/s] {'loss': 0.2244, 'grad_norm': 0.502079427242279, 'learning_rate': 8.340172679352335e-06, 'epoch': 1.02}
34%|███▍ | 3924/11526 [40:54<1:17:56, 1.63it/s] 34%|███▍ | 3925/11526 [40:54<1:17:53, 1.63it/s] {'loss': 0.2039, 'grad_norm': 0.5900307893753052, 'learning_rate': 8.339045680431223e-06, 'epoch': 1.02}
34%|███▍ | 3925/11526 [40:55<1:17:53, 1.63it/s] 34%|███▍ | 3926/11526 [40:55<1:17:55, 1.63it/s] {'loss': 0.2251, 'grad_norm': 0.5906327962875366, 'learning_rate': 8.337918375233845e-06, 'epoch': 1.02}
34%|███▍ | 3926/11526 [40:55<1:17:55, 1.63it/s] 34%|███▍ | 3927/11526 [40:56<1:17:53, 1.63it/s] {'loss': 0.1973, 'grad_norm': 0.6492132544517517, 'learning_rate': 8.336790763863601e-06, 'epoch': 1.02}
34%|███▍ | 3927/11526 [40:56<1:17:53, 1.63it/s] 34%|███▍ | 3928/11526 [40:56<1:17:48, 1.63it/s] {'loss': 0.1928, 'grad_norm': 0.5714696049690247, 'learning_rate': 8.335662846423924e-06, 'epoch': 1.02}
34%|███▍ | 3928/11526 [40:56<1:17:48, 1.63it/s] 34%|███▍ | 3929/11526 [40:57<1:17:48, 1.63it/s] {'loss': 0.2035, 'grad_norm': 0.5190034508705139, 'learning_rate': 8.334534623018268e-06, 'epoch': 1.02}
34%|███▍ | 3929/11526 [40:57<1:17:48, 1.63it/s] 34%|███▍ | 3930/11526 [40:58<1:17:47, 1.63it/s] {'loss': 0.2362, 'grad_norm': 0.5899050235748291, 'learning_rate': 8.333406093750127e-06, 'epoch': 1.02}
34%|███▍ | 3930/11526 [40:58<1:17:47, 1.63it/s] 34%|███▍ | 3931/11526 [40:58<1:17:53, 1.63it/s] {'loss': 0.2452, 'grad_norm': 0.5364802479743958, 'learning_rate': 8.332277258723012e-06, 'epoch': 1.02}
34%|███▍ | 3931/11526 [40:58<1:17:53, 1.63it/s] 34%|███▍ | 3932/11526 [40:59<1:17:50, 1.63it/s] {'loss': 0.1988, 'grad_norm': 0.4889678359031677, 'learning_rate': 8.331148118040467e-06, 'epoch': 1.02}
34%|███▍ | 3932/11526 [40:59<1:17:50, 1.63it/s] 34%|███▍ | 3933/11526 [40:59<1:17:50, 1.63it/s] {'loss': 0.1595, 'grad_norm': 0.4396674633026123, 'learning_rate': 8.330018671806061e-06, 'epoch': 1.02}
34%|███▍ | 3933/11526 [40:59<1:17:50, 1.63it/s] 34%|███▍ | 3934/11526 [41:00<1:17:49, 1.63it/s] {'loss': 0.2833, 'grad_norm': 0.6801697611808777, 'learning_rate': 8.328888920123396e-06, 'epoch': 1.02}
34%|███▍ | 3934/11526 [41:00<1:17:49, 1.63it/s] 34%|███▍ | 3935/11526 [41:01<1:17:45, 1.63it/s] {'loss': 0.2103, 'grad_norm': 0.5650310516357422, 'learning_rate': 8.3277588630961e-06, 'epoch': 1.02}
34%|███▍ | 3935/11526 [41:01<1:17:45, 1.63it/s] 34%|███▍ | 3936/11526 [41:01<1:17:53, 1.62it/s] {'loss': 0.1722, 'grad_norm': 0.46239280700683594, 'learning_rate': 8.326628500827826e-06, 'epoch': 1.02}
34%|███▍ | 3936/11526 [41:01<1:17:53, 1.62it/s] 34%|███▍ | 3937/11526 [41:02<1:17:47, 1.63it/s] {'loss': 0.2019, 'grad_norm': 0.618800699710846, 'learning_rate': 8.32549783342226e-06, 'epoch': 1.02}
34%|███▍ | 3937/11526 [41:02<1:17:47, 1.63it/s] 34%|███▍ | 3938/11526 [41:02<1:17:44, 1.63it/s] {'loss': 0.2037, 'grad_norm': 0.5710545182228088, 'learning_rate': 8.32436686098311e-06, 'epoch': 1.02}
34%|███▍ | 3938/11526 [41:03<1:17:44, 1.63it/s] 34%|███▍ | 3939/11526 [41:03<1:17:41, 1.63it/s] {'loss': 0.2004, 'grad_norm': 0.49538201093673706, 'learning_rate': 8.323235583614118e-06, 'epoch': 1.03}
34%|███▍ | 3939/11526 [41:03<1:17:41, 1.63it/s] 34%|███▍ | 3940/11526 [41:04<1:17:40, 1.63it/s] {'loss': 0.196, 'grad_norm': 0.4918733239173889, 'learning_rate': 8.322104001419048e-06, 'epoch': 1.03}
34%|███▍ | 3940/11526 [41:04<1:17:40, 1.63it/s] 34%|███▍ | 3941/11526 [41:04<1:17:42, 1.63it/s] {'loss': 0.2586, 'grad_norm': 0.5543307662010193, 'learning_rate': 8.320972114501698e-06, 'epoch': 1.03}
34%|███▍ | 3941/11526 [41:04<1:17:42, 1.63it/s] 34%|███▍ | 3942/11526 [41:05<1:17:41, 1.63it/s] {'loss': 0.2503, 'grad_norm': 0.5484928488731384, 'learning_rate': 8.31983992296589e-06, 'epoch': 1.03}
34%|███▍ | 3942/11526 [41:05<1:17:41, 1.63it/s] 34%|███▍ | 3943/11526 [41:06<1:17:57, 1.62it/s] {'loss': 0.2162, 'grad_norm': 0.6127955913543701, 'learning_rate': 8.318707426915474e-06, 'epoch': 1.03}
34%|███▍ | 3943/11526 [41:06<1:17:57, 1.62it/s] 34%|███▍ | 3944/11526 [41:06<1:17:50, 1.62it/s] {'loss': 0.1738, 'grad_norm': 0.4556379020214081, 'learning_rate': 8.317574626454331e-06, 'epoch': 1.03}
34%|███▍ | 3944/11526 [41:06<1:17:50, 1.62it/s] 34%|███▍ | 3945/11526 [41:07<1:17:45, 1.62it/s] {'loss': 0.2255, 'grad_norm': 0.6022009253501892, 'learning_rate': 8.316441521686367e-06, 'epoch': 1.03}
34%|███▍ | 3945/11526 [41:07<1:17:45, 1.62it/s] 34%|███▍ | 3946/11526 [41:07<1:17:44, 1.63it/s] {'loss': 0.2379, 'grad_norm': 0.7765471339225769, 'learning_rate': 8.315308112715518e-06, 'epoch': 1.03}
34%|███▍ | 3946/11526 [41:07<1:17:44, 1.63it/s] 34%|███▍ | 3947/11526 [41:08<1:17:41, 1.63it/s] {'loss': 0.1748, 'grad_norm': 0.5018616318702698, 'learning_rate': 8.314174399645745e-06, 'epoch': 1.03}
34%|███▍ | 3947/11526 [41:08<1:17:41, 1.63it/s] 34%|███▍ | 3948/11526 [41:09<1:17:38, 1.63it/s] {'loss': 0.2234, 'grad_norm': 0.6029800772666931, 'learning_rate': 8.313040382581038e-06, 'epoch': 1.03}
34%|███▍ | 3948/11526 [41:09<1:17:38, 1.63it/s] 34%|███▍ | 3949/11526 [41:09<1:17:35, 1.63it/s] {'loss': 0.2718, 'grad_norm': 0.6574292778968811, 'learning_rate': 8.311906061625417e-06, 'epoch': 1.03}
34%|███▍ | 3949/11526 [41:09<1:17:35, 1.63it/s] 34%|███▍ | 3950/11526 [41:10<1:17:33, 1.63it/s] {'loss': 0.2691, 'grad_norm': 0.6708565354347229, 'learning_rate': 8.310771436882929e-06, 'epoch': 1.03}
34%|███▍ | 3950/11526 [41:10<1:17:33, 1.63it/s] 34%|███▍ | 3951/11526 [41:10<1:17:39, 1.63it/s] {'loss': 0.224, 'grad_norm': 0.6194794774055481, 'learning_rate': 8.309636508457647e-06, 'epoch': 1.03}
34%|███▍ | 3951/11526 [41:11<1:17:39, 1.63it/s] 34%|███▍ | 3952/11526 [41:11<1:17:35, 1.63it/s] {'loss': 0.2634, 'grad_norm': 0.6697297096252441, 'learning_rate': 8.308501276453673e-06, 'epoch': 1.03}
34%|███▍ | 3952/11526 [41:11<1:17:35, 1.63it/s] 34%|███▍ | 3953/11526 [41:12<1:17:35, 1.63it/s] {'loss': 0.1876, 'grad_norm': 0.49038323760032654, 'learning_rate': 8.307365740975137e-06, 'epoch': 1.03}
34%|███▍ | 3953/11526 [41:12<1:17:35, 1.63it/s] 34%|███▍ | 3954/11526 [41:12<1:17:34, 1.63it/s] {'loss': 0.2031, 'grad_norm': 0.48489072918891907, 'learning_rate': 8.306229902126197e-06, 'epoch': 1.03}
34%|███▍ | 3954/11526 [41:12<1:17:34, 1.63it/s] 34%|███▍ | 3955/11526 [41:13<1:17:31, 1.63it/s] {'loss': 0.2143, 'grad_norm': 0.5115652084350586, 'learning_rate': 8.30509376001104e-06, 'epoch': 1.03}
34%|███▍ | 3955/11526 [41:13<1:17:31, 1.63it/s] 34%|███▍ | 3956/11526 [41:13<1:17:34, 1.63it/s] {'loss': 0.2186, 'grad_norm': 0.5249896049499512, 'learning_rate': 8.303957314733877e-06, 'epoch': 1.03}
34%|███▍ | 3956/11526 [41:14<1:17:34, 1.63it/s] 34%|███▍ | 3957/11526 [41:14<1:17:31, 1.63it/s] {'loss': 0.2109, 'grad_norm': 0.5597401857376099, 'learning_rate': 8.30282056639895e-06, 'epoch': 1.03}
34%|███▍ | 3957/11526 [41:14<1:17:31, 1.63it/s] 34%|███▍ | 3958/11526 [41:15<1:17:29, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.4645538628101349, 'learning_rate': 8.30168351511053e-06, 'epoch': 1.03}
34%|███▍ | 3958/11526 [41:15<1:17:29, 1.63it/s] 34%|███▍ | 3959/11526 [41:15<1:17:33, 1.63it/s] {'loss': 0.2836, 'grad_norm': 0.6078776717185974, 'learning_rate': 8.300546160972911e-06, 'epoch': 1.03}
34%|███▍ | 3959/11526 [41:15<1:17:33, 1.63it/s] 34%|███▍ | 3960/11526 [41:16<1:17:33, 1.63it/s] {'loss': 0.2212, 'grad_norm': 0.5036238431930542, 'learning_rate': 8.29940850409042e-06, 'epoch': 1.03}
34%|███▍ | 3960/11526 [41:16<1:17:33, 1.63it/s] 34%|███▍ | 3961/11526 [41:17<1:17:30, 1.63it/s] {'loss': 0.2023, 'grad_norm': 0.5104913711547852, 'learning_rate': 8.298270544567407e-06, 'epoch': 1.03}
34%|███▍ | 3961/11526 [41:17<1:17:30, 1.63it/s] 34%|███▍ | 3962/11526 [41:17<1:17:27, 1.63it/s] {'loss': 0.2077, 'grad_norm': 0.5337479710578918, 'learning_rate': 8.297132282508254e-06, 'epoch': 1.03}
34%|███▍ | 3962/11526 [41:17<1:17:27, 1.63it/s] 34%|███▍ | 3963/11526 [41:18<1:17:26, 1.63it/s] {'loss': 0.2289, 'grad_norm': 0.5776016712188721, 'learning_rate': 8.295993718017369e-06, 'epoch': 1.03}
34%|███▍ | 3963/11526 [41:18<1:17:26, 1.63it/s] 34%|███▍ | 3964/11526 [41:18<1:17:24, 1.63it/s] {'loss': 0.2313, 'grad_norm': 0.5513113141059875, 'learning_rate': 8.294854851199184e-06, 'epoch': 1.03}
34%|███▍ | 3964/11526 [41:19<1:17:24, 1.63it/s] 34%|███▍ | 3965/11526 [41:19<1:17:25, 1.63it/s] {'loss': 0.2079, 'grad_norm': 0.511696994304657, 'learning_rate': 8.293715682158166e-06, 'epoch': 1.03}
34%|███▍ | 3965/11526 [41:19<1:17:25, 1.63it/s] 34%|███▍ | 3966/11526 [41:20<1:17:33, 1.62it/s] {'loss': 0.2171, 'grad_norm': 0.5629448890686035, 'learning_rate': 8.292576210998806e-06, 'epoch': 1.03}
34%|███▍ | 3966/11526 [41:20<1:17:33, 1.62it/s] 34%|███▍ | 3967/11526 [41:20<1:17:28, 1.63it/s] {'loss': 0.2536, 'grad_norm': 0.5985271334648132, 'learning_rate': 8.291436437825622e-06, 'epoch': 1.03}
34%|███▍ | 3967/11526 [41:20<1:17:28, 1.63it/s] 34%|███▍ | 3968/11526 [41:21<1:17:26, 1.63it/s] {'loss': 0.1698, 'grad_norm': 0.512270987033844, 'learning_rate': 8.29029636274316e-06, 'epoch': 1.03}
34%|███▍ | 3968/11526 [41:21<1:17:26, 1.63it/s] 34%|███▍ | 3969/11526 [41:21<1:17:24, 1.63it/s] {'loss': 0.3237, 'grad_norm': 0.682799756526947, 'learning_rate': 8.289155985855995e-06, 'epoch': 1.03}
34%|███▍ | 3969/11526 [41:22<1:17:24, 1.63it/s] 34%|███▍ | 3970/11526 [41:22<1:17:23, 1.63it/s] {'loss': 0.1551, 'grad_norm': 0.4479008913040161, 'learning_rate': 8.288015307268729e-06, 'epoch': 1.03}
34%|███▍ | 3970/11526 [41:22<1:17:23, 1.63it/s] 34%|███▍ | 3971/11526 [41:23<1:17:27, 1.63it/s] {'loss': 0.2095, 'grad_norm': 0.5197000503540039, 'learning_rate': 8.28687432708599e-06, 'epoch': 1.03}
34%|███▍ | 3971/11526 [41:23<1:17:27, 1.63it/s] 34%|███▍ | 3972/11526 [41:23<1:17:26, 1.63it/s] {'loss': 0.1628, 'grad_norm': 0.5283327698707581, 'learning_rate': 8.285733045412434e-06, 'epoch': 1.03}
34%|███▍ | 3972/11526 [41:23<1:17:26, 1.63it/s] 34%|███▍ | 3973/11526 [41:24<1:17:22, 1.63it/s] {'loss': 0.2202, 'grad_norm': 0.5163598656654358, 'learning_rate': 8.284591462352752e-06, 'epoch': 1.03}
34%|███▍ | 3973/11526 [41:24<1:17:22, 1.63it/s] 34%|███▍ | 3974/11526 [41:25<1:17:22, 1.63it/s] {'loss': 0.2185, 'grad_norm': 0.5456569194793701, 'learning_rate': 8.283449578011651e-06, 'epoch': 1.03}
34%|███▍ | 3974/11526 [41:25<1:17:22, 1.63it/s] 34%|███▍ | 3975/11526 [41:25<1:17:21, 1.63it/s] {'loss': 0.206, 'grad_norm': 0.5601513981819153, 'learning_rate': 8.282307392493871e-06, 'epoch': 1.03}
34%|███▍ | 3975/11526 [41:25<1:17:21, 1.63it/s] 34%|███▍ | 3976/11526 [41:26<1:17:40, 1.62it/s] {'loss': 0.1579, 'grad_norm': 0.5123019218444824, 'learning_rate': 8.281164905904184e-06, 'epoch': 1.03}
34%|███▍ | 3976/11526 [41:26<1:17:40, 1.62it/s] 35%|███▍ | 3977/11526 [41:26<1:17:27, 1.62it/s] {'loss': 0.2384, 'grad_norm': 0.5890372395515442, 'learning_rate': 8.280022118347381e-06, 'epoch': 1.04}
35%|███▍ | 3977/11526 [41:27<1:17:27, 1.62it/s] 35%|███▍ | 3978/11526 [41:27<1:17:25, 1.62it/s] {'loss': 0.2147, 'grad_norm': 0.5247375965118408, 'learning_rate': 8.278879029928289e-06, 'epoch': 1.04}
35%|███▍ | 3978/11526 [41:27<1:17:25, 1.62it/s] 35%|███▍ | 3979/11526 [41:28<1:17:20, 1.63it/s] {'loss': 0.1467, 'grad_norm': 0.5203347206115723, 'learning_rate': 8.277735640751754e-06, 'epoch': 1.04}
35%|███▍ | 3979/11526 [41:28<1:17:20, 1.63it/s] 35%|███▍ | 3980/11526 [41:28<1:17:16, 1.63it/s] {'loss': 0.1779, 'grad_norm': 0.5427337288856506, 'learning_rate': 8.276591950922656e-06, 'epoch': 1.04}
35%|███▍ | 3980/11526 [41:28<1:17:16, 1.63it/s] 35%|███▍ | 3981/11526 [41:29<1:17:34, 1.62it/s] {'loss': 0.1888, 'grad_norm': 0.5319368243217468, 'learning_rate': 8.275447960545903e-06, 'epoch': 1.04}
35%|███▍ | 3981/11526 [41:29<1:17:34, 1.62it/s] 35%|███▍ | 3982/11526 [41:29<1:17:27, 1.62it/s] {'loss': 0.2811, 'grad_norm': 0.5943263173103333, 'learning_rate': 8.274303669726427e-06, 'epoch': 1.04}
35%|███▍ | 3982/11526 [41:30<1:17:27, 1.62it/s] 35%|███▍ | 3983/11526 [41:30<1:17:22, 1.62it/s] {'loss': 0.2528, 'grad_norm': 0.7272209525108337, 'learning_rate': 8.273159078569186e-06, 'epoch': 1.04}
35%|███▍ | 3983/11526 [41:30<1:17:22, 1.62it/s] 35%|███▍ | 3984/11526 [41:31<1:17:18, 1.63it/s] {'loss': 0.2342, 'grad_norm': 0.638358473777771, 'learning_rate': 8.272014187179173e-06, 'epoch': 1.04}
35%|███▍ | 3984/11526 [41:31<1:17:18, 1.63it/s] 35%|███▍ | 3985/11526 [41:31<1:17:14, 1.63it/s] {'loss': 0.2147, 'grad_norm': 0.5522480010986328, 'learning_rate': 8.270868995661402e-06, 'epoch': 1.04}
35%|███▍ | 3985/11526 [41:31<1:17:14, 1.63it/s] 35%|███▍ | 3986/11526 [41:32<1:17:18, 1.63it/s] {'loss': 0.1869, 'grad_norm': 0.5053954124450684, 'learning_rate': 8.269723504120914e-06, 'epoch': 1.04}
35%|███▍ | 3986/11526 [41:32<1:17:18, 1.63it/s] 35%|███▍ | 3987/11526 [41:33<1:17:14, 1.63it/s] {'loss': 0.2056, 'grad_norm': 0.47597596049308777, 'learning_rate': 8.268577712662782e-06, 'epoch': 1.04}
35%|███▍ | 3987/11526 [41:33<1:17:14, 1.63it/s] 35%|███▍ | 3988/11526 [41:33<1:17:10, 1.63it/s] {'loss': 0.1557, 'grad_norm': 0.4939345419406891, 'learning_rate': 8.267431621392108e-06, 'epoch': 1.04}
35%|███▍ | 3988/11526 [41:33<1:17:10, 1.63it/s] 35%|███▍ | 3989/11526 [41:34<1:17:08, 1.63it/s] {'loss': 0.1695, 'grad_norm': 0.5053800940513611, 'learning_rate': 8.266285230414012e-06, 'epoch': 1.04}
35%|███▍ | 3989/11526 [41:34<1:17:08, 1.63it/s] 35%|███▍ | 3990/11526 [41:34<1:17:06, 1.63it/s] {'loss': 0.2162, 'grad_norm': 0.5085758566856384, 'learning_rate': 8.26513853983365e-06, 'epoch': 1.04}
35%|███▍ | 3990/11526 [41:35<1:17:06, 1.63it/s] 35%|███▍ | 3991/11526 [41:35<1:17:13, 1.63it/s] {'loss': 0.1777, 'grad_norm': 0.4962286651134491, 'learning_rate': 8.263991549756207e-06, 'epoch': 1.04}
35%|███▍ | 3991/11526 [41:35<1:17:13, 1.63it/s] 35%|███▍ | 3992/11526 [41:36<1:17:10, 1.63it/s] {'loss': 0.2453, 'grad_norm': 0.6095378994941711, 'learning_rate': 8.262844260286884e-06, 'epoch': 1.04}
35%|███▍ | 3992/11526 [41:36<1:17:10, 1.63it/s] 35%|███▍ | 3993/11526 [41:36<1:17:09, 1.63it/s] {'loss': 0.1937, 'grad_norm': 0.5457762479782104, 'learning_rate': 8.26169667153092e-06, 'epoch': 1.04}
35%|███▍ | 3993/11526 [41:36<1:17:09, 1.63it/s] 35%|███▍ | 3994/11526 [41:37<1:17:06, 1.63it/s] {'loss': 0.2394, 'grad_norm': 0.586613655090332, 'learning_rate': 8.260548783593582e-06, 'epoch': 1.04}
35%|███▍ | 3994/11526 [41:37<1:17:06, 1.63it/s] 35%|███▍ | 3995/11526 [41:37<1:17:05, 1.63it/s] {'loss': 0.1732, 'grad_norm': 0.4979362487792969, 'learning_rate': 8.259400596580158e-06, 'epoch': 1.04}
35%|███▍ | 3995/11526 [41:38<1:17:05, 1.63it/s] 35%|███▍ | 3996/11526 [41:38<1:17:09, 1.63it/s] {'loss': 0.2223, 'grad_norm': 0.5459935665130615, 'learning_rate': 8.258252110595964e-06, 'epoch': 1.04}
35%|███▍ | 3996/11526 [41:38<1:17:09, 1.63it/s] 35%|███▍ | 3997/11526 [41:39<1:17:06, 1.63it/s] {'loss': 0.2362, 'grad_norm': 0.6335797905921936, 'learning_rate': 8.257103325746348e-06, 'epoch': 1.04}
35%|███▍ | 3997/11526 [41:39<1:17:06, 1.63it/s] 35%|███▍ | 3998/11526 [41:39<1:17:03, 1.63it/s] {'loss': 0.1919, 'grad_norm': 0.5264695286750793, 'learning_rate': 8.255954242136683e-06, 'epoch': 1.04}
35%|███▍ | 3998/11526 [41:39<1:17:03, 1.63it/s] 35%|███▍ | 3999/11526 [41:40<1:17:02, 1.63it/s] {'loss': 0.2185, 'grad_norm': 0.5455175638198853, 'learning_rate': 8.25480485987237e-06, 'epoch': 1.04}
35%|███▍ | 3999/11526 [41:40<1:17:02, 1.63it/s] 35%|███▍ | 4000/11526 [41:41<1:17:01, 1.63it/s] {'loss': 0.2094, 'grad_norm': 0.5456580519676208, 'learning_rate': 8.253655179058835e-06, 'epoch': 1.04}
35%|███▍ | 4000/11526 [41:41<1:17:01, 1.63it/s]
100%|██████████| 13/13 [00:01<00:00, 6.76it/s]
{'eval_loss': 0.6233740448951721, 'eval_runtime': 1.9541, 'eval_samples_per_second': 102.347, 'eval_steps_per_second': 6.653, 'epoch': 1.04}
35%|███▍ | 4000/11526 [41:43<1:17:01, 1.63it/s]
 35%|███▍ | 4001/11526 [41:43<2:30:52, 1.20s/it] {'loss': 0.2031, 'grad_norm': 0.47147247195243835, 'learning_rate': 8.252505199801535e-06, 'epoch': 1.04}
35%|███▍ | 4001/11526 [41:43<2:30:52, 1.20s/it] 35%|███▍ | 4002/11526 [41:44<2:08:41, 1.03s/it] {'loss': 0.1407, 'grad_norm': 0.44049596786499023, 'learning_rate': 8.251354922205954e-06, 'epoch': 1.04}
35%|███▍ | 4002/11526 [41:44<2:08:41, 1.03s/it] 35%|███▍ | 4003/11526 [41:44<1:53:09, 1.11it/s] {'loss': 0.1684, 'grad_norm': 0.4560467004776001, 'learning_rate': 8.250204346377597e-06, 'epoch': 1.04}
35%|███▍ | 4003/11526 [41:44<1:53:09, 1.11it/s] 35%|███▍ | 4004/11526 [41:45<1:42:21, 1.22it/s] {'loss': 0.1839, 'grad_norm': 0.6259543299674988, 'learning_rate': 8.249053472422006e-06, 'epoch': 1.04}
35%|███▍ | 4004/11526 [41:45<1:42:21, 1.22it/s] 35%|███▍ | 4005/11526 [41:46<1:34:43, 1.32it/s] {'loss': 0.1561, 'grad_norm': 0.4687540829181671, 'learning_rate': 8.247902300444743e-06, 'epoch': 1.04}
35%|███▍ | 4005/11526 [41:46<1:34:43, 1.32it/s] 35%|███▍ | 4006/11526 [41:46<1:29:23, 1.40it/s] {'loss': 0.1856, 'grad_norm': 0.5413231253623962, 'learning_rate': 8.246750830551402e-06, 'epoch': 1.04}
35%|███▍ | 4006/11526 [41:46<1:29:23, 1.40it/s] 35%|███▍ | 4007/11526 [41:47<1:25:43, 1.46it/s] {'loss': 0.1727, 'grad_norm': 0.48110222816467285, 'learning_rate': 8.245599062847602e-06, 'epoch': 1.04}
35%|███▍ | 4007/11526 [41:47<1:25:43, 1.46it/s] 35%|███▍ | 4008/11526 [41:47<1:23:05, 1.51it/s] {'loss': 0.1817, 'grad_norm': 0.47991615533828735, 'learning_rate': 8.24444699743899e-06, 'epoch': 1.04}
35%|███▍ | 4008/11526 [41:48<1:23:05, 1.51it/s] 35%|███▍ | 4009/11526 [41:48<1:21:14, 1.54it/s] {'loss': 0.2136, 'grad_norm': 0.518352746963501, 'learning_rate': 8.243294634431238e-06, 'epoch': 1.04}
35%|███▍ | 4009/11526 [41:48<1:21:14, 1.54it/s] 35%|███▍ | 4010/11526 [41:49<1:19:57, 1.57it/s] {'loss': 0.2289, 'grad_norm': 0.5759556889533997, 'learning_rate': 8.242141973930049e-06, 'epoch': 1.04}
35%|███▍ | 4010/11526 [41:49<1:19:57, 1.57it/s] 35%|███▍ | 4011/11526 [41:49<1:19:02, 1.58it/s] {'loss': 0.2537, 'grad_norm': 0.5723068118095398, 'learning_rate': 8.24098901604115e-06, 'epoch': 1.04}
35%|███▍ | 4011/11526 [41:49<1:19:02, 1.58it/s] 35%|███▍ | 4012/11526 [41:50<1:18:20, 1.60it/s] {'loss': 0.1662, 'grad_norm': 0.6413497924804688, 'learning_rate': 8.239835760870299e-06, 'epoch': 1.04}
35%|███▍ | 4012/11526 [41:50<1:18:20, 1.60it/s] 35%|███▍ | 4013/11526 [41:50<1:17:55, 1.61it/s] {'loss': 0.1928, 'grad_norm': 0.6021493077278137, 'learning_rate': 8.23868220852328e-06, 'epoch': 1.04}
35%|███▍ | 4013/11526 [41:51<1:17:55, 1.61it/s] 35%|███▍ | 4014/11526 [41:51<1:17:35, 1.61it/s] {'loss': 0.2431, 'grad_norm': 0.6363776922225952, 'learning_rate': 8.237528359105898e-06, 'epoch': 1.04}
35%|███▍ | 4014/11526 [41:51<1:17:35, 1.61it/s] 35%|███▍ | 4015/11526 [41:52<1:17:21, 1.62it/s] {'loss': 0.218, 'grad_norm': 0.5081867575645447, 'learning_rate': 8.236374212723995e-06, 'epoch': 1.05}
35%|███▍ | 4015/11526 [41:52<1:17:21, 1.62it/s] 35%|███▍ | 4016/11526 [41:52<1:17:13, 1.62it/s] {'loss': 0.2066, 'grad_norm': 0.5754663348197937, 'learning_rate': 8.235219769483436e-06, 'epoch': 1.05}
35%|███▍ | 4016/11526 [41:52<1:17:13, 1.62it/s] 35%|███▍ | 4017/11526 [41:53<1:17:06, 1.62it/s] {'loss': 0.2526, 'grad_norm': 0.574370265007019, 'learning_rate': 8.234065029490112e-06, 'epoch': 1.05}
35%|███▍ | 4017/11526 [41:53<1:17:06, 1.62it/s] 35%|███▍ | 4018/11526 [41:54<1:16:57, 1.63it/s] {'loss': 0.1535, 'grad_norm': 0.4491112530231476, 'learning_rate': 8.232909992849942e-06, 'epoch': 1.05}
35%|███▍ | 4018/11526 [41:54<1:16:57, 1.63it/s] 35%|███▍ | 4019/11526 [41:54<1:16:55, 1.63it/s] {'loss': 0.1952, 'grad_norm': 0.4720725417137146, 'learning_rate': 8.231754659668872e-06, 'epoch': 1.05}
35%|███▍ | 4019/11526 [41:54<1:16:55, 1.63it/s] 35%|███▍ | 4020/11526 [41:55<1:16:55, 1.63it/s] {'loss': 0.185, 'grad_norm': 0.5227355360984802, 'learning_rate': 8.230599030052879e-06, 'epoch': 1.05}
35%|███▍ | 4020/11526 [41:55<1:16:55, 1.63it/s] 35%|███▍ | 4021/11526 [41:55<1:16:57, 1.63it/s] {'loss': 0.1926, 'grad_norm': 0.48150452971458435, 'learning_rate': 8.22944310410796e-06, 'epoch': 1.05}
35%|███▍ | 4021/11526 [41:56<1:16:57, 1.63it/s] 35%|███▍ | 4022/11526 [41:56<1:16:54, 1.63it/s] {'loss': 0.2122, 'grad_norm': 0.5254482626914978, 'learning_rate': 8.228286881940145e-06, 'epoch': 1.05}
35%|███▍ | 4022/11526 [41:56<1:16:54, 1.63it/s] 35%|███▍ | 4023/11526 [41:57<1:16:52, 1.63it/s] {'loss': 0.2208, 'grad_norm': 0.7056509256362915, 'learning_rate': 8.22713036365549e-06, 'epoch': 1.05}
35%|███▍ | 4023/11526 [41:57<1:16:52, 1.63it/s] 35%|███▍ | 4024/11526 [41:57<1:16:49, 1.63it/s] {'loss': 0.1968, 'grad_norm': 0.5254235863685608, 'learning_rate': 8.225973549360074e-06, 'epoch': 1.05}
35%|███▍ | 4024/11526 [41:57<1:16:49, 1.63it/s] 35%|███▍ | 4025/11526 [41:58<1:16:47, 1.63it/s] {'loss': 0.2128, 'grad_norm': 0.5479786396026611, 'learning_rate': 8.22481643916001e-06, 'epoch': 1.05}
35%|███▍ | 4025/11526 [41:58<1:16:47, 1.63it/s] 35%|███▍ | 4026/11526 [41:58<1:16:53, 1.63it/s] {'loss': 0.1984, 'grad_norm': 0.5378801822662354, 'learning_rate': 8.223659033161435e-06, 'epoch': 1.05}
35%|███▍ | 4026/11526 [41:59<1:16:53, 1.63it/s] 35%|███▍ | 4027/11526 [41:59<1:16:48, 1.63it/s] {'loss': 0.1999, 'grad_norm': 0.6451026201248169, 'learning_rate': 8.222501331470512e-06, 'epoch': 1.05}
35%|███▍ | 4027/11526 [41:59<1:16:48, 1.63it/s] 35%|███▍ | 4028/11526 [42:00<1:16:46, 1.63it/s] {'loss': 0.2858, 'grad_norm': 0.5766314268112183, 'learning_rate': 8.22134333419343e-06, 'epoch': 1.05}
35%|███▍ | 4028/11526 [42:00<1:16:46, 1.63it/s] 35%|███▍ | 4029/11526 [42:00<1:16:46, 1.63it/s] {'loss': 0.2069, 'grad_norm': 0.5732244849205017, 'learning_rate': 8.220185041436408e-06, 'epoch': 1.05}
35%|███▍ | 4029/11526 [42:00<1:16:46, 1.63it/s] 35%|███▍ | 4030/11526 [42:01<1:16:43, 1.63it/s] {'loss': 0.1866, 'grad_norm': 0.5309785008430481, 'learning_rate': 8.219026453305694e-06, 'epoch': 1.05}
35%|███▍ | 4030/11526 [42:01<1:16:43, 1.63it/s] 35%|███▍ | 4031/11526 [42:02<1:16:50, 1.63it/s] {'loss': 0.1907, 'grad_norm': 0.5356010794639587, 'learning_rate': 8.217867569907558e-06, 'epoch': 1.05}
35%|███▍ | 4031/11526 [42:02<1:16:50, 1.63it/s] 35%|███▍ | 4032/11526 [42:02<1:16:47, 1.63it/s] {'loss': 0.1887, 'grad_norm': 0.5160710215568542, 'learning_rate': 8.216708391348297e-06, 'epoch': 1.05}
35%|███▍ | 4032/11526 [42:02<1:16:47, 1.63it/s] 35%|███▍ | 4033/11526 [42:03<1:16:42, 1.63it/s] {'loss': 0.2252, 'grad_norm': 0.4937271177768707, 'learning_rate': 8.215548917734243e-06, 'epoch': 1.05}
35%|███▍ | 4033/11526 [42:03<1:16:42, 1.63it/s] 35%|███▍ | 4034/11526 [42:03<1:16:42, 1.63it/s] {'loss': 0.2246, 'grad_norm': 0.6185543537139893, 'learning_rate': 8.214389149171745e-06, 'epoch': 1.05}
35%|███▍ | 4034/11526 [42:04<1:16:42, 1.63it/s] 35%|███▌ | 4035/11526 [42:04<1:16:41, 1.63it/s] {'loss': 0.2175, 'grad_norm': 0.5866132378578186, 'learning_rate': 8.213229085767186e-06, 'epoch': 1.05}
35%|███▌ | 4035/11526 [42:04<1:16:41, 1.63it/s] 35%|███▌ | 4036/11526 [42:05<1:16:42, 1.63it/s] {'loss': 0.2488, 'grad_norm': 0.7421290874481201, 'learning_rate': 8.212068727626972e-06, 'epoch': 1.05}
35%|███▌ | 4036/11526 [42:05<1:16:42, 1.63it/s] 35%|███▌ | 4037/11526 [42:05<1:16:41, 1.63it/s] {'loss': 0.1996, 'grad_norm': 0.5091370344161987, 'learning_rate': 8.21090807485754e-06, 'epoch': 1.05}
35%|███▌ | 4037/11526 [42:05<1:16:41, 1.63it/s] 35%|███▌ | 4038/11526 [42:06<1:16:39, 1.63it/s] {'loss': 0.2979, 'grad_norm': 0.6444469094276428, 'learning_rate': 8.209747127565348e-06, 'epoch': 1.05}
35%|███▌ | 4038/11526 [42:06<1:16:39, 1.63it/s] 35%|███▌ | 4039/11526 [42:06<1:16:38, 1.63it/s] {'loss': 0.2297, 'grad_norm': 0.5934173464775085, 'learning_rate': 8.208585885856887e-06, 'epoch': 1.05}
35%|███▌ | 4039/11526 [42:07<1:16:38, 1.63it/s] 35%|███▌ | 4040/11526 [42:07<1:16:37, 1.63it/s] {'loss': 0.1587, 'grad_norm': 0.4841199219226837, 'learning_rate': 8.207424349838672e-06, 'epoch': 1.05}
35%|███▌ | 4040/11526 [42:07<1:16:37, 1.63it/s] 35%|███▌ | 4041/11526 [42:08<1:16:37, 1.63it/s] {'loss': 0.1936, 'grad_norm': 0.49921658635139465, 'learning_rate': 8.206262519617248e-06, 'epoch': 1.05}
35%|███▌ | 4041/11526 [42:08<1:16:37, 1.63it/s] 35%|███▌ | 4042/11526 [42:08<1:16:35, 1.63it/s] {'loss': 0.2, 'grad_norm': 0.5129127502441406, 'learning_rate': 8.205100395299182e-06, 'epoch': 1.05}
35%|███▌ | 4042/11526 [42:08<1:16:35, 1.63it/s] 35%|███▌ | 4043/11526 [42:09<1:16:35, 1.63it/s] {'loss': 0.1713, 'grad_norm': 0.47060322761535645, 'learning_rate': 8.203937976991072e-06, 'epoch': 1.05}
35%|███▌ | 4043/11526 [42:09<1:16:35, 1.63it/s] 35%|███▌ | 4044/11526 [42:10<1:16:35, 1.63it/s] {'loss': 0.196, 'grad_norm': 0.5215697884559631, 'learning_rate': 8.20277526479954e-06, 'epoch': 1.05}
35%|███▌ | 4044/11526 [42:10<1:16:35, 1.63it/s] 35%|███▌ | 4045/11526 [42:10<1:16:35, 1.63it/s] {'loss': 0.1949, 'grad_norm': 0.49276623129844666, 'learning_rate': 8.201612258831238e-06, 'epoch': 1.05}
35%|███▌ | 4045/11526 [42:10<1:16:35, 1.63it/s] 35%|███▌ | 4046/11526 [42:11<1:16:40, 1.63it/s] {'loss': 0.2157, 'grad_norm': 0.5749131441116333, 'learning_rate': 8.200448959192844e-06, 'epoch': 1.05}
35%|███▌ | 4046/11526 [42:11<1:16:40, 1.63it/s] 35%|███▌ | 4047/11526 [42:11<1:16:37, 1.63it/s] {'loss': 0.216, 'grad_norm': 0.5589984655380249, 'learning_rate': 8.199285365991061e-06, 'epoch': 1.05}
35%|███▌ | 4047/11526 [42:12<1:16:37, 1.63it/s] 35%|███▌ | 4048/11526 [42:12<1:16:35, 1.63it/s] {'loss': 0.1708, 'grad_norm': 0.5369610786437988, 'learning_rate': 8.19812147933262e-06, 'epoch': 1.05}
35%|███▌ | 4048/11526 [42:12<1:16:35, 1.63it/s] 35%|███▌ | 4049/11526 [42:13<1:16:35, 1.63it/s] {'loss': 0.1775, 'grad_norm': 0.4778631329536438, 'learning_rate': 8.196957299324281e-06, 'epoch': 1.05}
35%|███▌ | 4049/11526 [42:13<1:16:35, 1.63it/s] 35%|███▌ | 4050/11526 [42:13<1:16:33, 1.63it/s] {'loss': 0.2034, 'grad_norm': 0.542945146560669, 'learning_rate': 8.19579282607283e-06, 'epoch': 1.05}
35%|███▌ | 4050/11526 [42:13<1:16:33, 1.63it/s] 35%|███▌ | 4051/11526 [42:14<1:16:35, 1.63it/s] {'loss': 0.224, 'grad_norm': 0.6120182275772095, 'learning_rate': 8.194628059685077e-06, 'epoch': 1.05}
35%|███▌ | 4051/11526 [42:14<1:16:35, 1.63it/s] 35%|███▌ | 4052/11526 [42:14<1:16:33, 1.63it/s] {'loss': 0.2215, 'grad_norm': 0.5631906986236572, 'learning_rate': 8.19346300026786e-06, 'epoch': 1.05}
35%|███▌ | 4052/11526 [42:15<1:16:33, 1.63it/s] 35%|███▌ | 4053/11526 [42:15<1:16:32, 1.63it/s] {'loss': 0.2464, 'grad_norm': 0.5998072624206543, 'learning_rate': 8.192297647928049e-06, 'epoch': 1.05}
35%|███▌ | 4053/11526 [42:15<1:16:32, 1.63it/s] 35%|███▌ | 4054/11526 [42:16<1:16:29, 1.63it/s] {'loss': 0.1974, 'grad_norm': 0.546891987323761, 'learning_rate': 8.191132002772533e-06, 'epoch': 1.06}
35%|███▌ | 4054/11526 [42:16<1:16:29, 1.63it/s] 35%|███▌ | 4055/11526 [42:16<1:16:31, 1.63it/s] {'loss': 0.2428, 'grad_norm': 0.6906431317329407, 'learning_rate': 8.189966064908233e-06, 'epoch': 1.06}
35%|███▌ | 4055/11526 [42:16<1:16:31, 1.63it/s] 35%|███▌ | 4056/11526 [42:17<1:16:35, 1.63it/s] {'loss': 0.234, 'grad_norm': 0.5568280816078186, 'learning_rate': 8.188799834442096e-06, 'epoch': 1.06}
35%|███▌ | 4056/11526 [42:17<1:16:35, 1.63it/s] 35%|███▌ | 4057/11526 [42:18<1:16:33, 1.63it/s] {'loss': 0.2011, 'grad_norm': 0.49609190225601196, 'learning_rate': 8.187633311481094e-06, 'epoch': 1.06}
35%|███▌ | 4057/11526 [42:18<1:16:33, 1.63it/s] 35%|███▌ | 4058/11526 [42:18<1:16:28, 1.63it/s] {'loss': 0.2242, 'grad_norm': 0.5150290131568909, 'learning_rate': 8.186466496132228e-06, 'epoch': 1.06}
35%|███▌ | 4058/11526 [42:18<1:16:28, 1.63it/s] 35%|███▌ | 4059/11526 [42:19<1:16:26, 1.63it/s] {'loss': 0.2555, 'grad_norm': 0.838394284248352, 'learning_rate': 8.185299388502525e-06, 'epoch': 1.06}
35%|███▌ | 4059/11526 [42:19<1:16:26, 1.63it/s] 35%|███▌ | 4060/11526 [42:19<1:16:27, 1.63it/s] {'loss': 0.2207, 'grad_norm': 0.536166787147522, 'learning_rate': 8.184131988699038e-06, 'epoch': 1.06}
35%|███▌ | 4060/11526 [42:20<1:16:27, 1.63it/s] 35%|███▌ | 4061/11526 [42:20<1:16:34, 1.62it/s] {'loss': 0.1868, 'grad_norm': 0.532267689704895, 'learning_rate': 8.182964296828848e-06, 'epoch': 1.06}
35%|███▌ | 4061/11526 [42:20<1:16:34, 1.62it/s] 35%|███▌ | 4062/11526 [42:21<1:16:30, 1.63it/s] {'loss': 0.2099, 'grad_norm': 0.612815797328949, 'learning_rate': 8.181796312999063e-06, 'epoch': 1.06}
35%|███▌ | 4062/11526 [42:21<1:16:30, 1.63it/s] 35%|███▌ | 4063/11526 [42:21<1:16:26, 1.63it/s] {'loss': 0.2642, 'grad_norm': 0.6639155149459839, 'learning_rate': 8.180628037316815e-06, 'epoch': 1.06}
35%|███▌ | 4063/11526 [42:21<1:16:26, 1.63it/s] 35%|███▌ | 4064/11526 [42:22<1:16:26, 1.63it/s] {'loss': 0.1809, 'grad_norm': 0.4971505403518677, 'learning_rate': 8.179459469889269e-06, 'epoch': 1.06}
35%|███▌ | 4064/11526 [42:22<1:16:26, 1.63it/s] 35%|███▌ | 4065/11526 [42:22<1:16:25, 1.63it/s] {'loss': 0.231, 'grad_norm': 0.5860890746116638, 'learning_rate': 8.178290610823607e-06, 'epoch': 1.06}
35%|███▌ | 4065/11526 [42:23<1:16:25, 1.63it/s] 35%|███▌ | 4066/11526 [42:23<1:16:32, 1.62it/s] {'loss': 0.2074, 'grad_norm': 0.5334957242012024, 'learning_rate': 8.177121460227048e-06, 'epoch': 1.06}
35%|███▌ | 4066/11526 [42:23<1:16:32, 1.62it/s] 35%|███▌ | 4067/11526 [42:24<1:16:26, 1.63it/s] {'loss': 0.2058, 'grad_norm': 0.5527533292770386, 'learning_rate': 8.175952018206832e-06, 'epoch': 1.06}
35%|███▌ | 4067/11526 [42:24<1:16:26, 1.63it/s] 35%|███▌ | 4068/11526 [42:24<1:16:23, 1.63it/s] {'loss': 0.1824, 'grad_norm': 0.45751404762268066, 'learning_rate': 8.174782284870227e-06, 'epoch': 1.06}
35%|███▌ | 4068/11526 [42:24<1:16:23, 1.63it/s] 35%|███▌ | 4069/11526 [42:25<1:16:19, 1.63it/s] {'loss': 0.2918, 'grad_norm': 0.6868885159492493, 'learning_rate': 8.173612260324526e-06, 'epoch': 1.06}
35%|███▌ | 4069/11526 [42:25<1:16:19, 1.63it/s] 35%|███▌ | 4070/11526 [42:26<1:16:20, 1.63it/s] {'loss': 0.2776, 'grad_norm': 0.6049869060516357, 'learning_rate': 8.17244194467705e-06, 'epoch': 1.06}
35%|███▌ | 4070/11526 [42:26<1:16:20, 1.63it/s] 35%|███▌ | 4071/11526 [42:26<1:16:30, 1.62it/s] {'loss': 0.1468, 'grad_norm': 0.4493536353111267, 'learning_rate': 8.17127133803515e-06, 'epoch': 1.06}
35%|███▌ | 4071/11526 [42:26<1:16:30, 1.62it/s] 35%|███▌ | 4072/11526 [42:27<1:16:23, 1.63it/s] {'loss': 0.2274, 'grad_norm': 0.49712884426116943, 'learning_rate': 8.1701004405062e-06, 'epoch': 1.06}
35%|███▌ | 4072/11526 [42:27<1:16:23, 1.63it/s] 35%|███▌ | 4073/11526 [42:27<1:16:21, 1.63it/s] {'loss': 0.2016, 'grad_norm': 0.5752489566802979, 'learning_rate': 8.168929252197599e-06, 'epoch': 1.06}
35%|███▌ | 4073/11526 [42:27<1:16:21, 1.63it/s] 35%|███▌ | 4074/11526 [42:28<1:16:17, 1.63it/s] {'loss': 0.2575, 'grad_norm': 0.6167989373207092, 'learning_rate': 8.167757773216776e-06, 'epoch': 1.06}
35%|███▌ | 4074/11526 [42:28<1:16:17, 1.63it/s] 35%|███▌ | 4075/11526 [42:29<1:16:15, 1.63it/s] {'loss': 0.1962, 'grad_norm': 0.5565755367279053, 'learning_rate': 8.166586003671189e-06, 'epoch': 1.06}
35%|███▌ | 4075/11526 [42:29<1:16:15, 1.63it/s] 35%|███▌ | 4076/11526 [42:29<1:16:17, 1.63it/s] {'loss': 0.2041, 'grad_norm': 0.5708906054496765, 'learning_rate': 8.165413943668314e-06, 'epoch': 1.06}
35%|███▌ | 4076/11526 [42:29<1:16:17, 1.63it/s] 35%|███▌ | 4077/11526 [42:30<1:16:18, 1.63it/s] {'loss': 0.2277, 'grad_norm': 0.6418193578720093, 'learning_rate': 8.164241593315662e-06, 'epoch': 1.06}
35%|███▌ | 4077/11526 [42:30<1:16:18, 1.63it/s] 35%|███▌ | 4078/11526 [42:30<1:16:22, 1.63it/s] {'loss': 0.2809, 'grad_norm': 0.7057561278343201, 'learning_rate': 8.163068952720769e-06, 'epoch': 1.06}
35%|███▌ | 4078/11526 [42:31<1:16:22, 1.63it/s] 35%|███▌ | 4079/11526 [42:31<1:16:20, 1.63it/s] {'loss': 0.1987, 'grad_norm': 0.514682948589325, 'learning_rate': 8.16189602199119e-06, 'epoch': 1.06}
35%|███▌ | 4079/11526 [42:31<1:16:20, 1.63it/s] 35%|███▌ | 4080/11526 [42:32<1:16:17, 1.63it/s] {'loss': 0.2022, 'grad_norm': 0.4736998975276947, 'learning_rate': 8.160722801234524e-06, 'epoch': 1.06}
35%|███▌ | 4080/11526 [42:32<1:16:17, 1.63it/s] 35%|███▌ | 4081/11526 [42:32<1:16:18, 1.63it/s] {'loss': 0.2638, 'grad_norm': 0.5582015514373779, 'learning_rate': 8.159549290558374e-06, 'epoch': 1.06}
35%|███▌ | 4081/11526 [42:32<1:16:18, 1.63it/s] 35%|███▌ | 4082/11526 [42:33<1:16:19, 1.63it/s] {'loss': 0.1654, 'grad_norm': 0.46217650175094604, 'learning_rate': 8.158375490070388e-06, 'epoch': 1.06}
35%|███▌ | 4082/11526 [42:33<1:16:19, 1.63it/s] 35%|███▌ | 4083/11526 [42:34<1:16:14, 1.63it/s] {'loss': 0.2539, 'grad_norm': 0.6746267080307007, 'learning_rate': 8.15720139987823e-06, 'epoch': 1.06}
35%|███▌ | 4083/11526 [42:34<1:16:14, 1.63it/s] 35%|███▌ | 4084/11526 [42:34<1:16:09, 1.63it/s] {'loss': 0.2096, 'grad_norm': 0.5428937077522278, 'learning_rate': 8.156027020089599e-06, 'epoch': 1.06}
35%|███▌ | 4084/11526 [42:34<1:16:09, 1.63it/s] 35%|███▌ | 4085/11526 [42:35<1:16:06, 1.63it/s] {'loss': 0.2322, 'grad_norm': 0.6594823598861694, 'learning_rate': 8.15485235081221e-06, 'epoch': 1.06}
35%|███▌ | 4085/11526 [42:35<1:16:06, 1.63it/s] 35%|███▌ | 4086/11526 [42:35<1:16:31, 1.62it/s] {'loss': 0.2065, 'grad_norm': 0.5312978029251099, 'learning_rate': 8.153677392153813e-06, 'epoch': 1.06}
35%|███▌ | 4086/11526 [42:35<1:16:31, 1.62it/s] 35%|███▌ | 4087/11526 [42:36<1:16:23, 1.62it/s] {'loss': 0.2486, 'grad_norm': 0.5602365136146545, 'learning_rate': 8.152502144222183e-06, 'epoch': 1.06}
35%|███▌ | 4087/11526 [42:36<1:16:23, 1.62it/s] 35%|███▌ | 4088/11526 [42:37<1:16:15, 1.63it/s] {'loss': 0.1962, 'grad_norm': 0.4904923439025879, 'learning_rate': 8.15132660712512e-06, 'epoch': 1.06}
35%|███▌ | 4088/11526 [42:37<1:16:15, 1.63it/s] 35%|███▌ | 4089/11526 [42:37<1:16:12, 1.63it/s] {'loss': 0.2108, 'grad_norm': 0.5410940647125244, 'learning_rate': 8.150150780970449e-06, 'epoch': 1.06}
35%|███▌ | 4089/11526 [42:37<1:16:12, 1.63it/s] 35%|███▌ | 4090/11526 [42:38<1:16:09, 1.63it/s] {'loss': 0.1724, 'grad_norm': 0.4883768558502197, 'learning_rate': 8.148974665866024e-06, 'epoch': 1.06}
35%|███▌ | 4090/11526 [42:38<1:16:09, 1.63it/s] 35%|███▌ | 4091/11526 [42:38<1:16:13, 1.63it/s] {'loss': 0.2196, 'grad_norm': 0.4955219328403473, 'learning_rate': 8.147798261919728e-06, 'epoch': 1.06}
35%|███▌ | 4091/11526 [42:39<1:16:13, 1.63it/s] 36%|███▌ | 4092/11526 [42:39<1:16:11, 1.63it/s] {'loss': 0.2137, 'grad_norm': 0.5488519668579102, 'learning_rate': 8.146621569239463e-06, 'epoch': 1.07}
36%|███▌ | 4092/11526 [42:39<1:16:11, 1.63it/s] 36%|███▌ | 4093/11526 [42:40<1:16:09, 1.63it/s] {'loss': 0.2058, 'grad_norm': 0.5064741969108582, 'learning_rate': 8.145444587933165e-06, 'epoch': 1.07}
36%|███▌ | 4093/11526 [42:40<1:16:09, 1.63it/s] 36%|███▌ | 4094/11526 [42:40<1:16:07, 1.63it/s] {'loss': 0.2072, 'grad_norm': 0.5086061358451843, 'learning_rate': 8.144267318108792e-06, 'epoch': 1.07}
36%|███▌ | 4094/11526 [42:40<1:16:07, 1.63it/s] 36%|███▌ | 4095/11526 [42:41<1:16:06, 1.63it/s] {'loss': 0.2284, 'grad_norm': 0.8488316535949707, 'learning_rate': 8.143089759874333e-06, 'epoch': 1.07}
36%|███▌ | 4095/11526 [42:41<1:16:06, 1.63it/s] 36%|███▌ | 4096/11526 [42:42<1:16:11, 1.63it/s] {'loss': 0.2046, 'grad_norm': 0.6049647927284241, 'learning_rate': 8.141911913337794e-06, 'epoch': 1.07}
36%|███▌ | 4096/11526 [42:42<1:16:11, 1.63it/s] 36%|███▌ | 4097/11526 [42:42<1:16:07, 1.63it/s] {'loss': 0.1998, 'grad_norm': 0.4598557651042938, 'learning_rate': 8.140733778607219e-06, 'epoch': 1.07}
36%|███▌ | 4097/11526 [42:42<1:16:07, 1.63it/s] 36%|███▌ | 4098/11526 [42:43<1:16:04, 1.63it/s] {'loss': 0.1749, 'grad_norm': 0.5307749509811401, 'learning_rate': 8.139555355790673e-06, 'epoch': 1.07}
36%|███▌ | 4098/11526 [42:43<1:16:04, 1.63it/s] 36%|███▌ | 4099/11526 [42:43<1:16:01, 1.63it/s] {'loss': 0.1853, 'grad_norm': 0.5255457162857056, 'learning_rate': 8.138376644996246e-06, 'epoch': 1.07}
36%|███▌ | 4099/11526 [42:43<1:16:01, 1.63it/s] 36%|███▌ | 4100/11526 [42:44<1:15:59, 1.63it/s] {'loss': 0.207, 'grad_norm': 0.5778472423553467, 'learning_rate': 8.137197646332055e-06, 'epoch': 1.07}
36%|███▌ | 4100/11526 [42:44<1:15:59, 1.63it/s] 36%|███▌ | 4101/11526 [42:45<1:16:08, 1.63it/s] {'loss': 0.2128, 'grad_norm': 0.624174952507019, 'learning_rate': 8.136018359906248e-06, 'epoch': 1.07}
36%|███▌ | 4101/11526 [42:45<1:16:08, 1.63it/s] 36%|███▌ | 4102/11526 [42:45<1:16:07, 1.63it/s] {'loss': 0.2037, 'grad_norm': 0.594325602054596, 'learning_rate': 8.134838785826993e-06, 'epoch': 1.07}
36%|███▌ | 4102/11526 [42:45<1:16:07, 1.63it/s] 36%|███▌ | 4103/11526 [42:46<1:16:04, 1.63it/s] {'loss': 0.182, 'grad_norm': 0.5045993328094482, 'learning_rate': 8.133658924202489e-06, 'epoch': 1.07}
36%|███▌ | 4103/11526 [42:46<1:16:04, 1.63it/s] 36%|███▌ | 4104/11526 [42:46<1:16:01, 1.63it/s] {'loss': 0.2458, 'grad_norm': 0.5623756647109985, 'learning_rate': 8.132478775140955e-06, 'epoch': 1.07}
36%|███▌ | 4104/11526 [42:47<1:16:01, 1.63it/s] 36%|███▌ | 4105/11526 [42:47<1:15:58, 1.63it/s] {'loss': 0.1795, 'grad_norm': 0.5429458618164062, 'learning_rate': 8.131298338750648e-06, 'epoch': 1.07}
36%|███▌ | 4105/11526 [42:47<1:15:58, 1.63it/s] 36%|███▌ | 4106/11526 [42:48<1:16:04, 1.63it/s] {'loss': 0.1727, 'grad_norm': 0.4667350649833679, 'learning_rate': 8.130117615139841e-06, 'epoch': 1.07}
36%|███▌ | 4106/11526 [42:48<1:16:04, 1.63it/s] 36%|███▌ | 4107/11526 [42:48<1:16:01, 1.63it/s] {'loss': 0.2495, 'grad_norm': 0.5642751455307007, 'learning_rate': 8.128936604416836e-06, 'epoch': 1.07}
36%|███▌ | 4107/11526 [42:48<1:16:01, 1.63it/s] 36%|███▌ | 4108/11526 [42:49<1:15:59, 1.63it/s] {'loss': 0.1849, 'grad_norm': 0.5607107877731323, 'learning_rate': 8.127755306689961e-06, 'epoch': 1.07}
36%|███▌ | 4108/11526 [42:49<1:15:59, 1.63it/s] 36%|███▌ | 4109/11526 [42:49<1:15:57, 1.63it/s] {'loss': 0.2003, 'grad_norm': 0.47000110149383545, 'learning_rate': 8.126573722067577e-06, 'epoch': 1.07}
36%|███▌ | 4109/11526 [42:50<1:15:57, 1.63it/s] 36%|███▌ | 4110/11526 [42:50<1:15:55, 1.63it/s] {'loss': 0.1659, 'grad_norm': 0.43776896595954895, 'learning_rate': 8.125391850658059e-06, 'epoch': 1.07}
36%|███▌ | 4110/11526 [42:50<1:15:55, 1.63it/s] 36%|███▌ | 4111/11526 [42:51<1:16:02, 1.63it/s] {'loss': 0.1527, 'grad_norm': 0.4354403018951416, 'learning_rate': 8.124209692569817e-06, 'epoch': 1.07}
36%|███▌ | 4111/11526 [42:51<1:16:02, 1.63it/s] 36%|███▌ | 4112/11526 [42:51<1:15:57, 1.63it/s] {'loss': 0.1923, 'grad_norm': 0.6118038892745972, 'learning_rate': 8.123027247911287e-06, 'epoch': 1.07}
36%|███▌ | 4112/11526 [42:51<1:15:57, 1.63it/s] 36%|███▌ | 4113/11526 [42:52<1:15:53, 1.63it/s] {'loss': 0.2431, 'grad_norm': 0.5973016619682312, 'learning_rate': 8.12184451679093e-06, 'epoch': 1.07}
36%|███▌ | 4113/11526 [42:52<1:15:53, 1.63it/s] 36%|███▌ | 4114/11526 [42:53<1:15:53, 1.63it/s] {'loss': 0.2252, 'grad_norm': 0.6007474660873413, 'learning_rate': 8.12066149931723e-06, 'epoch': 1.07}
36%|███▌ | 4114/11526 [42:53<1:15:53, 1.63it/s] 36%|███▌ | 4115/11526 [42:53<1:15:50, 1.63it/s] {'loss': 0.2287, 'grad_norm': 0.539337158203125, 'learning_rate': 8.119478195598702e-06, 'epoch': 1.07}
36%|███▌ | 4115/11526 [42:53<1:15:50, 1.63it/s] 36%|███▌ | 4116/11526 [42:54<1:15:56, 1.63it/s] {'loss': 0.1931, 'grad_norm': 0.5239818692207336, 'learning_rate': 8.118294605743884e-06, 'epoch': 1.07}
36%|███▌ | 4116/11526 [42:54<1:15:56, 1.63it/s] 36%|███▌ | 4117/11526 [42:54<1:15:53, 1.63it/s] {'loss': 0.1964, 'grad_norm': 0.5875909328460693, 'learning_rate': 8.117110729861343e-06, 'epoch': 1.07}
36%|███▌ | 4117/11526 [42:55<1:15:53, 1.63it/s] 36%|███▌ | 4118/11526 [42:55<1:15:52, 1.63it/s] {'loss': 0.183, 'grad_norm': 0.5107603073120117, 'learning_rate': 8.11592656805967e-06, 'epoch': 1.07}
36%|███▌ | 4118/11526 [42:55<1:15:52, 1.63it/s] 36%|███▌ | 4119/11526 [42:56<1:15:49, 1.63it/s] {'loss': 0.1923, 'grad_norm': 0.46346166729927063, 'learning_rate': 8.114742120447482e-06, 'epoch': 1.07}
36%|███▌ | 4119/11526 [42:56<1:15:49, 1.63it/s] 36%|███▌ | 4120/11526 [42:56<1:15:47, 1.63it/s] {'loss': 0.259, 'grad_norm': 0.7498629093170166, 'learning_rate': 8.113557387133427e-06, 'epoch': 1.07}
36%|███▌ | 4120/11526 [42:56<1:15:47, 1.63it/s] 36%|███▌ | 4121/11526 [42:57<1:15:50, 1.63it/s] {'loss': 0.1536, 'grad_norm': 0.40540266036987305, 'learning_rate': 8.112372368226172e-06, 'epoch': 1.07}
36%|███▌ | 4121/11526 [42:57<1:15:50, 1.63it/s] 36%|███▌ | 4122/11526 [42:57<1:15:49, 1.63it/s] {'loss': 0.1768, 'grad_norm': 0.5285024642944336, 'learning_rate': 8.111187063834414e-06, 'epoch': 1.07}
36%|███▌ | 4122/11526 [42:58<1:15:49, 1.63it/s] 36%|███▌ | 4123/11526 [42:58<1:15:47, 1.63it/s] {'loss': 0.2339, 'grad_norm': 0.5749674439430237, 'learning_rate': 8.110001474066878e-06, 'epoch': 1.07}
36%|███▌ | 4123/11526 [42:58<1:15:47, 1.63it/s] 36%|███▌ | 4124/11526 [42:59<1:15:44, 1.63it/s] {'loss': 0.2008, 'grad_norm': 0.5151695609092712, 'learning_rate': 8.10881559903231e-06, 'epoch': 1.07}
36%|███▌ | 4124/11526 [42:59<1:15:44, 1.63it/s] 36%|███▌ | 4125/11526 [42:59<1:15:46, 1.63it/s] {'loss': 0.169, 'grad_norm': 0.5240480303764343, 'learning_rate': 8.107629438839489e-06, 'epoch': 1.07}
36%|███▌ | 4125/11526 [42:59<1:15:46, 1.63it/s] 36%|███▌ | 4126/11526 [43:00<1:15:50, 1.63it/s] {'loss': 0.2104, 'grad_norm': 0.5367041230201721, 'learning_rate': 8.106442993597212e-06, 'epoch': 1.07}
36%|███▌ | 4126/11526 [43:00<1:15:50, 1.63it/s] 36%|███▌ | 4127/11526 [43:01<1:15:46, 1.63it/s] {'loss': 0.247, 'grad_norm': 0.6630675792694092, 'learning_rate': 8.105256263414309e-06, 'epoch': 1.07}
36%|███▌ | 4127/11526 [43:01<1:15:46, 1.63it/s] 36%|███▌ | 4128/11526 [43:01<1:15:44, 1.63it/s] {'loss': 0.2128, 'grad_norm': 0.5502515435218811, 'learning_rate': 8.104069248399635e-06, 'epoch': 1.07}
36%|███▌ | 4128/11526 [43:01<1:15:44, 1.63it/s] 36%|███▌ | 4129/11526 [43:02<1:15:42, 1.63it/s] {'loss': 0.2057, 'grad_norm': 0.5450975894927979, 'learning_rate': 8.102881948662067e-06, 'epoch': 1.07}
36%|███▌ | 4129/11526 [43:02<1:15:42, 1.63it/s] 36%|███▌ | 4130/11526 [43:02<1:15:41, 1.63it/s] {'loss': 0.2164, 'grad_norm': 0.6102941036224365, 'learning_rate': 8.101694364310511e-06, 'epoch': 1.07}
36%|███▌ | 4130/11526 [43:03<1:15:41, 1.63it/s] 36%|███▌ | 4131/11526 [43:03<1:15:48, 1.63it/s] {'loss': 0.2131, 'grad_norm': 0.5794852375984192, 'learning_rate': 8.100506495453902e-06, 'epoch': 1.08}
36%|███▌ | 4131/11526 [43:03<1:15:48, 1.63it/s] 36%|███▌ | 4132/11526 [43:04<1:15:46, 1.63it/s] {'loss': 0.1885, 'grad_norm': 0.6166860461235046, 'learning_rate': 8.099318342201195e-06, 'epoch': 1.08}
36%|███▌ | 4132/11526 [43:04<1:15:46, 1.63it/s] 36%|███▌ | 4133/11526 [43:04<1:15:43, 1.63it/s] {'loss': 0.2092, 'grad_norm': 0.5747907161712646, 'learning_rate': 8.098129904661377e-06, 'epoch': 1.08}
36%|███▌ | 4133/11526 [43:04<1:15:43, 1.63it/s] 36%|███▌ | 4134/11526 [43:05<1:15:41, 1.63it/s] {'loss': 0.2498, 'grad_norm': 0.6161974668502808, 'learning_rate': 8.096941182943452e-06, 'epoch': 1.08}
36%|███▌ | 4134/11526 [43:05<1:15:41, 1.63it/s] 36%|███▌ | 4135/11526 [43:05<1:15:43, 1.63it/s] {'loss': 0.2052, 'grad_norm': 0.5631926655769348, 'learning_rate': 8.095752177156463e-06, 'epoch': 1.08}
36%|███▌ | 4135/11526 [43:06<1:15:43, 1.63it/s] 36%|███▌ | 4136/11526 [43:06<1:15:48, 1.62it/s] {'loss': 0.2307, 'grad_norm': 0.6750310063362122, 'learning_rate': 8.094562887409471e-06, 'epoch': 1.08}
36%|███▌ | 4136/11526 [43:06<1:15:48, 1.62it/s] 36%|███▌ | 4137/11526 [43:07<1:15:45, 1.63it/s] {'loss': 0.165, 'grad_norm': 0.48189154267311096, 'learning_rate': 8.093373313811566e-06, 'epoch': 1.08}
36%|███▌ | 4137/11526 [43:07<1:15:45, 1.63it/s] 36%|███▌ | 4138/11526 [43:07<1:15:43, 1.63it/s] {'loss': 0.2428, 'grad_norm': 0.6424674391746521, 'learning_rate': 8.092183456471856e-06, 'epoch': 1.08}
36%|███▌ | 4138/11526 [43:07<1:15:43, 1.63it/s] 36%|███▌ | 4139/11526 [43:08<1:15:39, 1.63it/s] {'loss': 0.2156, 'grad_norm': 0.5708689093589783, 'learning_rate': 8.090993315499488e-06, 'epoch': 1.08}
36%|███▌ | 4139/11526 [43:08<1:15:39, 1.63it/s] 36%|███▌ | 4140/11526 [43:09<1:15:36, 1.63it/s] {'loss': 0.171, 'grad_norm': 0.5700467228889465, 'learning_rate': 8.089802891003626e-06, 'epoch': 1.08}
36%|███▌ | 4140/11526 [43:09<1:15:36, 1.63it/s] 36%|███▌ | 4141/11526 [43:09<1:15:45, 1.62it/s] {'loss': 0.2279, 'grad_norm': 0.5969499349594116, 'learning_rate': 8.08861218309346e-06, 'epoch': 1.08}
36%|███▌ | 4141/11526 [43:09<1:15:45, 1.62it/s] 36%|███▌ | 4142/11526 [43:10<1:15:40, 1.63it/s] {'loss': 0.1569, 'grad_norm': 0.44344502687454224, 'learning_rate': 8.087421191878213e-06, 'epoch': 1.08}
36%|███▌ | 4142/11526 [43:10<1:15:40, 1.63it/s] 36%|███▌ | 4143/11526 [43:10<1:15:40, 1.63it/s] {'loss': 0.1876, 'grad_norm': 0.5383709669113159, 'learning_rate': 8.086229917467128e-06, 'epoch': 1.08}
36%|███▌ | 4143/11526 [43:11<1:15:40, 1.63it/s] 36%|███▌ | 4144/11526 [43:11<1:15:37, 1.63it/s] {'loss': 0.1971, 'grad_norm': 0.5209817886352539, 'learning_rate': 8.085038359969475e-06, 'epoch': 1.08}
36%|███▌ | 4144/11526 [43:11<1:15:37, 1.63it/s] 36%|███▌ | 4145/11526 [43:12<1:15:34, 1.63it/s] {'loss': 0.1969, 'grad_norm': 0.5245858430862427, 'learning_rate': 8.083846519494549e-06, 'epoch': 1.08}
36%|███▌ | 4145/11526 [43:12<1:15:34, 1.63it/s] 36%|███▌ | 4146/11526 [43:12<1:15:39, 1.63it/s] {'loss': 0.2033, 'grad_norm': 0.4819725453853607, 'learning_rate': 8.082654396151676e-06, 'epoch': 1.08}
36%|███▌ | 4146/11526 [43:12<1:15:39, 1.63it/s] 36%|███▌ | 4147/11526 [43:13<1:15:36, 1.63it/s] {'loss': 0.2458, 'grad_norm': 0.5565529465675354, 'learning_rate': 8.081461990050202e-06, 'epoch': 1.08}
36%|███▌ | 4147/11526 [43:13<1:15:36, 1.63it/s] 36%|███▌ | 4148/11526 [43:13<1:15:32, 1.63it/s] {'loss': 0.1732, 'grad_norm': 0.425310343503952, 'learning_rate': 8.080269301299499e-06, 'epoch': 1.08}
36%|███▌ | 4148/11526 [43:14<1:15:32, 1.63it/s] 36%|███▌ | 4149/11526 [43:14<1:15:30, 1.63it/s] {'loss': 0.1718, 'grad_norm': 0.45756351947784424, 'learning_rate': 8.079076330008972e-06, 'epoch': 1.08}
36%|███▌ | 4149/11526 [43:14<1:15:30, 1.63it/s] 36%|███▌ | 4150/11526 [43:15<1:15:31, 1.63it/s] {'loss': 0.1763, 'grad_norm': 0.5121030211448669, 'learning_rate': 8.077883076288044e-06, 'epoch': 1.08}
36%|███▌ | 4150/11526 [43:15<1:15:31, 1.63it/s] 36%|███▌ | 4151/11526 [43:15<1:15:38, 1.63it/s] {'loss': 0.2096, 'grad_norm': 0.5508302450180054, 'learning_rate': 8.076689540246167e-06, 'epoch': 1.08}
36%|███▌ | 4151/11526 [43:15<1:15:38, 1.63it/s] 36%|███▌ | 4152/11526 [43:16<1:15:36, 1.63it/s] {'loss': 0.2021, 'grad_norm': 0.611681342124939, 'learning_rate': 8.075495721992821e-06, 'epoch': 1.08}
36%|███▌ | 4152/11526 [43:16<1:15:36, 1.63it/s] 36%|███▌ | 4153/11526 [43:17<1:15:33, 1.63it/s] {'loss': 0.1857, 'grad_norm': 0.5096912384033203, 'learning_rate': 8.074301621637508e-06, 'epoch': 1.08}
36%|███▌ | 4153/11526 [43:17<1:15:33, 1.63it/s] 36%|███▌ | 4154/11526 [43:17<1:15:29, 1.63it/s] {'loss': 0.2601, 'grad_norm': 0.5492280125617981, 'learning_rate': 8.073107239289758e-06, 'epoch': 1.08}
36%|███▌ | 4154/11526 [43:17<1:15:29, 1.63it/s] 36%|███▌ | 4155/11526 [43:18<1:15:27, 1.63it/s] {'loss': 0.1867, 'grad_norm': 0.4976823925971985, 'learning_rate': 8.071912575059127e-06, 'epoch': 1.08}
36%|███▌ | 4155/11526 [43:18<1:15:27, 1.63it/s] 36%|███▌ | 4156/11526 [43:18<1:15:31, 1.63it/s] {'loss': 0.1708, 'grad_norm': 0.4484266936779022, 'learning_rate': 8.070717629055199e-06, 'epoch': 1.08}
36%|███▌ | 4156/11526 [43:19<1:15:31, 1.63it/s] 36%|███▌ | 4157/11526 [43:19<1:15:29, 1.63it/s] {'loss': 0.2405, 'grad_norm': 0.5448426008224487, 'learning_rate': 8.069522401387573e-06, 'epoch': 1.08}
36%|███▌ | 4157/11526 [43:19<1:15:29, 1.63it/s] 36%|███▌ | 4158/11526 [43:20<1:15:27, 1.63it/s] {'loss': 0.2125, 'grad_norm': 0.5023510456085205, 'learning_rate': 8.068326892165891e-06, 'epoch': 1.08}
36%|███▌ | 4158/11526 [43:20<1:15:27, 1.63it/s] 36%|███▌ | 4159/11526 [43:20<1:15:27, 1.63it/s] {'loss': 0.2027, 'grad_norm': 0.5795854330062866, 'learning_rate': 8.067131101499808e-06, 'epoch': 1.08}
36%|███▌ | 4159/11526 [43:20<1:15:27, 1.63it/s] 36%|███▌ | 4160/11526 [43:21<1:15:26, 1.63it/s] {'loss': 0.1941, 'grad_norm': 0.4677744209766388, 'learning_rate': 8.065935029499007e-06, 'epoch': 1.08}
36%|███▌ | 4160/11526 [43:21<1:15:26, 1.63it/s] 36%|███▌ | 4161/11526 [43:21<1:15:30, 1.63it/s] {'loss': 0.198, 'grad_norm': 0.5462692975997925, 'learning_rate': 8.064738676273202e-06, 'epoch': 1.08}
36%|███▌ | 4161/11526 [43:22<1:15:30, 1.63it/s] 36%|███▌ | 4162/11526 [43:22<1:15:24, 1.63it/s] {'loss': 0.196, 'grad_norm': 0.5843931436538696, 'learning_rate': 8.06354204193213e-06, 'epoch': 1.08}
36%|███▌ | 4162/11526 [43:22<1:15:24, 1.63it/s] 36%|███▌ | 4163/11526 [43:23<1:15:26, 1.63it/s] {'loss': 0.2749, 'grad_norm': 0.6358931064605713, 'learning_rate': 8.062345126585549e-06, 'epoch': 1.08}
36%|███▌ | 4163/11526 [43:23<1:15:26, 1.63it/s] 36%|███▌ | 4164/11526 [43:23<1:15:24, 1.63it/s] {'loss': 0.1595, 'grad_norm': 0.504209578037262, 'learning_rate': 8.061147930343249e-06, 'epoch': 1.08}
36%|███▌ | 4164/11526 [43:23<1:15:24, 1.63it/s] 36%|███▌ | 4165/11526 [43:24<1:15:25, 1.63it/s] {'loss': 0.2216, 'grad_norm': 0.5716229677200317, 'learning_rate': 8.059950453315044e-06, 'epoch': 1.08}
36%|███▌ | 4165/11526 [43:24<1:15:25, 1.63it/s] 36%|███▌ | 4166/11526 [43:25<1:15:29, 1.62it/s] {'loss': 0.196, 'grad_norm': 0.5556299686431885, 'learning_rate': 8.058752695610772e-06, 'epoch': 1.08}
36%|███▌ | 4166/11526 [43:25<1:15:29, 1.62it/s] 36%|███▌ | 4167/11526 [43:25<1:15:24, 1.63it/s] {'loss': 0.1874, 'grad_norm': 0.48068463802337646, 'learning_rate': 8.057554657340301e-06, 'epoch': 1.08}
36%|███▌ | 4167/11526 [43:25<1:15:24, 1.63it/s] 36%|███▌ | 4168/11526 [43:26<1:15:18, 1.63it/s] {'loss': 0.2484, 'grad_norm': 0.6174185276031494, 'learning_rate': 8.056356338613519e-06, 'epoch': 1.08}
36%|███▌ | 4168/11526 [43:26<1:15:18, 1.63it/s] 36%|███▌ | 4169/11526 [43:26<1:15:21, 1.63it/s] {'loss': 0.2046, 'grad_norm': 0.5507436990737915, 'learning_rate': 8.055157739540344e-06, 'epoch': 1.09}
36%|███▌ | 4169/11526 [43:27<1:15:21, 1.63it/s] 36%|███▌ | 4170/11526 [43:27<1:15:19, 1.63it/s] {'loss': 0.1964, 'grad_norm': 0.4598886966705322, 'learning_rate': 8.053958860230718e-06, 'epoch': 1.09}
36%|███▌ | 4170/11526 [43:27<1:15:19, 1.63it/s] 36%|███▌ | 4171/11526 [43:28<1:15:37, 1.62it/s] {'loss': 0.1784, 'grad_norm': 0.5622082352638245, 'learning_rate': 8.05275970079461e-06, 'epoch': 1.09}
36%|███▌ | 4171/11526 [43:28<1:15:37, 1.62it/s] 36%|███▌ | 4172/11526 [43:28<1:15:31, 1.62it/s] {'loss': 0.1767, 'grad_norm': 0.47884324193000793, 'learning_rate': 8.05156026134201e-06, 'epoch': 1.09}
36%|███▌ | 4172/11526 [43:28<1:15:31, 1.62it/s] 36%|███▌ | 4173/11526 [43:29<1:15:25, 1.62it/s] {'loss': 0.1922, 'grad_norm': 0.49880918860435486, 'learning_rate': 8.050360541982943e-06, 'epoch': 1.09}
36%|███▌ | 4173/11526 [43:29<1:15:25, 1.62it/s] 36%|███▌ | 4174/11526 [43:29<1:15:24, 1.63it/s] {'loss': 0.2097, 'grad_norm': 0.5041133165359497, 'learning_rate': 8.049160542827449e-06, 'epoch': 1.09}
36%|███▌ | 4174/11526 [43:30<1:15:24, 1.63it/s] 36%|███▌ | 4175/11526 [43:30<1:15:23, 1.63it/s] {'loss': 0.2435, 'grad_norm': 0.6193109750747681, 'learning_rate': 8.047960263985603e-06, 'epoch': 1.09}
36%|███▌ | 4175/11526 [43:30<1:15:23, 1.63it/s] 36%|███▌ | 4176/11526 [43:31<1:15:27, 1.62it/s] {'loss': 0.2231, 'grad_norm': 0.5070488452911377, 'learning_rate': 8.046759705567498e-06, 'epoch': 1.09}
36%|███▌ | 4176/11526 [43:31<1:15:27, 1.62it/s] 36%|███▌ | 4177/11526 [43:31<1:15:22, 1.62it/s] {'loss': 0.2005, 'grad_norm': 0.5317742824554443, 'learning_rate': 8.045558867683258e-06, 'epoch': 1.09}
36%|███▌ | 4177/11526 [43:31<1:15:22, 1.62it/s] 36%|███▌ | 4178/11526 [43:32<1:15:21, 1.63it/s] {'loss': 0.2698, 'grad_norm': 0.5937094688415527, 'learning_rate': 8.04435775044303e-06, 'epoch': 1.09}
36%|███▌ | 4178/11526 [43:32<1:15:21, 1.63it/s] 36%|███▋ | 4179/11526 [43:33<1:15:20, 1.63it/s] {'loss': 0.2155, 'grad_norm': 0.5251777768135071, 'learning_rate': 8.043156353956987e-06, 'epoch': 1.09}
36%|███▋ | 4179/11526 [43:33<1:15:20, 1.63it/s] 36%|███▋ | 4180/11526 [43:33<1:15:22, 1.62it/s] {'loss': 0.1995, 'grad_norm': 0.5418022274971008, 'learning_rate': 8.04195467833533e-06, 'epoch': 1.09}
36%|███▋ | 4180/11526 [43:33<1:15:22, 1.62it/s] 36%|███▋ | 4181/11526 [43:34<1:15:26, 1.62it/s] {'loss': 0.255, 'grad_norm': 0.6645930409431458, 'learning_rate': 8.04075272368828e-06, 'epoch': 1.09}
36%|███▋ | 4181/11526 [43:34<1:15:26, 1.62it/s] 36%|███▋ | 4182/11526 [43:34<1:15:18, 1.63it/s] {'loss': 0.1687, 'grad_norm': 0.4646974802017212, 'learning_rate': 8.039550490126091e-06, 'epoch': 1.09}
36%|███▋ | 4182/11526 [43:35<1:15:18, 1.63it/s] 36%|███▋ | 4183/11526 [43:35<1:15:15, 1.63it/s] {'loss': 0.2092, 'grad_norm': 0.5609748363494873, 'learning_rate': 8.038347977759034e-06, 'epoch': 1.09}
36%|███▋ | 4183/11526 [43:35<1:15:15, 1.63it/s] 36%|███▋ | 4184/11526 [43:36<1:15:14, 1.63it/s] {'loss': 0.2072, 'grad_norm': 0.5351585149765015, 'learning_rate': 8.037145186697415e-06, 'epoch': 1.09}
36%|███▋ | 4184/11526 [43:36<1:15:14, 1.63it/s] 36%|███▋ | 4185/11526 [43:36<1:15:11, 1.63it/s] {'loss': 0.2498, 'grad_norm': 0.5782595276832581, 'learning_rate': 8.035942117051558e-06, 'epoch': 1.09}
36%|███▋ | 4185/11526 [43:36<1:15:11, 1.63it/s] 36%|███▋ | 4186/11526 [43:37<1:15:21, 1.62it/s] {'loss': 0.2054, 'grad_norm': 0.5381998419761658, 'learning_rate': 8.034738768931817e-06, 'epoch': 1.09}
36%|███▋ | 4186/11526 [43:37<1:15:21, 1.62it/s] 36%|███▋ | 4187/11526 [43:37<1:15:16, 1.62it/s] {'loss': 0.2493, 'grad_norm': 0.5716237425804138, 'learning_rate': 8.03353514244857e-06, 'epoch': 1.09}
36%|███▋ | 4187/11526 [43:38<1:15:16, 1.62it/s] 36%|███▋ | 4188/11526 [43:38<1:15:11, 1.63it/s] {'loss': 0.2055, 'grad_norm': 0.5933632254600525, 'learning_rate': 8.032331237712217e-06, 'epoch': 1.09}
36%|███▋ | 4188/11526 [43:38<1:15:11, 1.63it/s] 36%|███▋ | 4189/11526 [43:39<1:15:08, 1.63it/s] {'loss': 0.1771, 'grad_norm': 0.5224254727363586, 'learning_rate': 8.031127054833192e-06, 'epoch': 1.09}
36%|███▋ | 4189/11526 [43:39<1:15:08, 1.63it/s] 36%|███▋ | 4190/11526 [43:39<1:15:06, 1.63it/s] {'loss': 0.2252, 'grad_norm': 0.5843708515167236, 'learning_rate': 8.029922593921945e-06, 'epoch': 1.09}
36%|███▋ | 4190/11526 [43:39<1:15:06, 1.63it/s] 36%|███▋ | 4191/11526 [43:40<1:15:28, 1.62it/s] {'loss': 0.1933, 'grad_norm': 0.5632019639015198, 'learning_rate': 8.028717855088959e-06, 'epoch': 1.09}
36%|███▋ | 4191/11526 [43:40<1:15:28, 1.62it/s] 36%|███▋ | 4192/11526 [43:41<1:15:20, 1.62it/s] {'loss': 0.2013, 'grad_norm': 0.6316463947296143, 'learning_rate': 8.027512838444738e-06, 'epoch': 1.09}
36%|███▋ | 4192/11526 [43:41<1:15:20, 1.62it/s] 36%|███▋ | 4193/11526 [43:41<1:15:14, 1.62it/s] {'loss': 0.164, 'grad_norm': 0.5760238766670227, 'learning_rate': 8.026307544099817e-06, 'epoch': 1.09}
36%|███▋ | 4193/11526 [43:41<1:15:14, 1.62it/s] 36%|███▋ | 4194/11526 [43:42<1:15:09, 1.63it/s] {'loss': 0.2071, 'grad_norm': 0.5111436247825623, 'learning_rate': 8.025101972164745e-06, 'epoch': 1.09}
36%|███▋ | 4194/11526 [43:42<1:15:09, 1.63it/s] 36%|███▋ | 4195/11526 [43:42<1:15:05, 1.63it/s] {'loss': 0.1742, 'grad_norm': 0.4864507019519806, 'learning_rate': 8.02389612275011e-06, 'epoch': 1.09}
36%|███▋ | 4195/11526 [43:43<1:15:05, 1.63it/s] 36%|███▋ | 4196/11526 [43:43<1:15:07, 1.63it/s] {'loss': 0.2111, 'grad_norm': 0.6231271624565125, 'learning_rate': 8.022689995966517e-06, 'epoch': 1.09}
36%|███▋ | 4196/11526 [43:43<1:15:07, 1.63it/s] 36%|███▋ | 4197/11526 [43:44<1:15:08, 1.63it/s] {'loss': 0.3231, 'grad_norm': 0.7387405633926392, 'learning_rate': 8.021483591924599e-06, 'epoch': 1.09}
36%|███▋ | 4197/11526 [43:44<1:15:08, 1.63it/s] 36%|███▋ | 4198/11526 [43:44<1:15:04, 1.63it/s] {'loss': 0.1681, 'grad_norm': 0.501737654209137, 'learning_rate': 8.020276910735015e-06, 'epoch': 1.09}
36%|███▋ | 4198/11526 [43:44<1:15:04, 1.63it/s] 36%|███▋ | 4199/11526 [43:45<1:15:03, 1.63it/s] {'loss': 0.1521, 'grad_norm': 0.5254444479942322, 'learning_rate': 8.019069952508447e-06, 'epoch': 1.09}
36%|███▋ | 4199/11526 [43:45<1:15:03, 1.63it/s] 36%|███▋ | 4200/11526 [43:45<1:15:00, 1.63it/s] {'loss': 0.2545, 'grad_norm': 0.6146236658096313, 'learning_rate': 8.017862717355606e-06, 'epoch': 1.09}
36%|███▋ | 4200/11526 [43:46<1:15:00, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.35it/s]
31%|███ | 4/13 [00:00<00:01, 8.40it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.80it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.18it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.91it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.78it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.76it/s]
{'eval_loss': 0.620903730392456, 'eval_runtime': 1.955, 'eval_samples_per_second': 102.301, 'eval_steps_per_second': 6.65, 'epoch': 1.09}
36%|███▋ | 4201/11526 [43:48<2:26:53, 1.20s/it] {'loss': 0.1992, 'grad_norm': 0.5479962825775146, 'learning_rate': 8.016655205387225e-06, 'epoch': 1.09}
36%|███▋ | 4201/11526 [43:48<2:26:53, 1.20s/it] 36%|███▋ | 4202/11526 [43:49<2:05:15, 1.03s/it] {'loss': 0.2122, 'grad_norm': 0.5480123162269592, 'learning_rate': 8.015447416714065e-06, 'epoch': 1.09}
36%|███▋ | 4202/11526 [43:49<2:05:15, 1.03s/it] 36%|███▋ | 4203/11526 [43:49<1:50:08, 1.11it/s] {'loss': 0.1848, 'grad_norm': 0.5215761065483093, 'learning_rate': 8.014239351446911e-06, 'epoch': 1.09}
36%|███▋ | 4203/11526 [43:49<1:50:08, 1.11it/s] 36%|███▋ | 4204/11526 [43:50<1:39:34, 1.23it/s] {'loss': 0.1847, 'grad_norm': 0.499186635017395, 'learning_rate': 8.013031009696573e-06, 'epoch': 1.09}
36%|███▋ | 4204/11526 [43:50<1:39:34, 1.23it/s] 36%|███▋ | 4205/11526 [43:50<1:32:09, 1.32it/s] {'loss': 0.1716, 'grad_norm': 0.4681195020675659, 'learning_rate': 8.011822391573887e-06, 'epoch': 1.09}
36%|███▋ | 4205/11526 [43:51<1:32:09, 1.32it/s] 36%|███▋ | 4206/11526 [43:51<1:26:57, 1.40it/s] {'loss': 0.2592, 'grad_norm': 0.5897737145423889, 'learning_rate': 8.010613497189715e-06, 'epoch': 1.09}
36%|███▋ | 4206/11526 [43:51<1:26:57, 1.40it/s] 37%|███▋ | 4207/11526 [43:52<1:23:21, 1.46it/s] {'loss': 0.1975, 'grad_norm': 0.5553279519081116, 'learning_rate': 8.009404326654943e-06, 'epoch': 1.1}
37%|███▋ | 4207/11526 [43:52<1:23:21, 1.46it/s] 37%|███▋ | 4208/11526 [43:52<1:20:50, 1.51it/s] {'loss': 0.2327, 'grad_norm': 0.5920388698577881, 'learning_rate': 8.008194880080484e-06, 'epoch': 1.1}
37%|███▋ | 4208/11526 [43:52<1:20:50, 1.51it/s] 37%|███▋ | 4209/11526 [43:53<1:19:02, 1.54it/s] {'loss': 0.191, 'grad_norm': 0.4773310422897339, 'learning_rate': 8.006985157577277e-06, 'epoch': 1.1}
37%|███▋ | 4209/11526 [43:53<1:19:02, 1.54it/s] 37%|███▋ | 4210/11526 [43:54<1:17:46, 1.57it/s] {'loss': 0.1612, 'grad_norm': 0.466590017080307, 'learning_rate': 8.005775159256277e-06, 'epoch': 1.1}
37%|███▋ | 4210/11526 [43:54<1:17:46, 1.57it/s] 37%|███▋ | 4211/11526 [43:54<1:16:54, 1.59it/s] {'loss': 0.1878, 'grad_norm': 0.515770435333252, 'learning_rate': 8.004564885228481e-06, 'epoch': 1.1}
37%|███▋ | 4211/11526 [43:54<1:16:54, 1.59it/s] 37%|███▋ | 4212/11526 [43:55<1:16:15, 1.60it/s] {'loss': 0.2284, 'grad_norm': 0.5593000054359436, 'learning_rate': 8.003354335604897e-06, 'epoch': 1.1}
37%|███▋ | 4212/11526 [43:55<1:16:15, 1.60it/s] 37%|███▋ | 4213/11526 [43:55<1:15:50, 1.61it/s] {'loss': 0.1879, 'grad_norm': 0.5249115228652954, 'learning_rate': 8.002143510496566e-06, 'epoch': 1.1}
37%|███▋ | 4213/11526 [43:56<1:15:50, 1.61it/s] 37%|███▋ | 4214/11526 [43:56<1:15:32, 1.61it/s] {'loss': 0.1847, 'grad_norm': 0.5472351312637329, 'learning_rate': 8.000932410014549e-06, 'epoch': 1.1}
37%|███▋ | 4214/11526 [43:56<1:15:32, 1.61it/s] 37%|███▋ | 4215/11526 [43:57<1:15:18, 1.62it/s] {'loss': 0.2147, 'grad_norm': 0.5633552670478821, 'learning_rate': 7.999721034269938e-06, 'epoch': 1.1}
37%|███▋ | 4215/11526 [43:57<1:15:18, 1.62it/s] 37%|███▋ | 4216/11526 [43:57<1:15:13, 1.62it/s] {'loss': 0.2341, 'grad_norm': 0.5335580110549927, 'learning_rate': 7.998509383373847e-06, 'epoch': 1.1}
37%|███▋ | 4216/11526 [43:57<1:15:13, 1.62it/s] 37%|███▋ | 4217/11526 [43:58<1:15:05, 1.62it/s] {'loss': 0.2419, 'grad_norm': 0.6289557218551636, 'learning_rate': 7.997297457437413e-06, 'epoch': 1.1}
37%|███▋ | 4217/11526 [43:58<1:15:05, 1.62it/s] 37%|███▋ | 4218/11526 [43:58<1:14:59, 1.62it/s] {'loss': 0.2551, 'grad_norm': 0.6230116486549377, 'learning_rate': 7.996085256571804e-06, 'epoch': 1.1}
37%|███▋ | 4218/11526 [43:59<1:14:59, 1.62it/s] 37%|███▋ | 4219/11526 [43:59<1:14:56, 1.63it/s] {'loss': 0.216, 'grad_norm': 0.5848832130432129, 'learning_rate': 7.994872780888206e-06, 'epoch': 1.1}
37%|███▋ | 4219/11526 [43:59<1:14:56, 1.63it/s] 37%|███▋ | 4220/11526 [44:00<1:14:53, 1.63it/s] {'loss': 0.2533, 'grad_norm': 0.6973192691802979, 'learning_rate': 7.993660030497838e-06, 'epoch': 1.1}
37%|███▋ | 4220/11526 [44:00<1:14:53, 1.63it/s] 37%|███▋ | 4221/11526 [44:00<1:14:58, 1.62it/s] {'loss': 0.2423, 'grad_norm': 0.6309161186218262, 'learning_rate': 7.992447005511937e-06, 'epoch': 1.1}
37%|███▋ | 4221/11526 [44:00<1:14:58, 1.62it/s] 37%|███▋ | 4222/11526 [44:01<1:14:53, 1.63it/s] {'loss': 0.1868, 'grad_norm': 0.582487165927887, 'learning_rate': 7.991233706041771e-06, 'epoch': 1.1}
37%|███▋ | 4222/11526 [44:01<1:14:53, 1.63it/s] 37%|███▋ | 4223/11526 [44:02<1:18:54, 1.54it/s] {'loss': 0.1837, 'grad_norm': 0.4556950628757477, 'learning_rate': 7.990020132198632e-06, 'epoch': 1.1}
37%|███▋ | 4223/11526 [44:02<1:18:54, 1.54it/s] 37%|███▋ | 4224/11526 [44:02<1:17:39, 1.57it/s] {'loss': 0.1342, 'grad_norm': 0.4087528884410858, 'learning_rate': 7.988806284093833e-06, 'epoch': 1.1}
37%|███▋ | 4224/11526 [44:02<1:17:39, 1.57it/s] 37%|███▋ | 4225/11526 [44:03<1:16:44, 1.59it/s] {'loss': 0.2673, 'grad_norm': 0.5988442897796631, 'learning_rate': 7.987592161838715e-06, 'epoch': 1.1}
37%|███▋ | 4225/11526 [44:03<1:16:44, 1.59it/s] 37%|███▋ | 4226/11526 [44:03<1:16:11, 1.60it/s] {'loss': 0.184, 'grad_norm': 0.4777023196220398, 'learning_rate': 7.986377765544646e-06, 'epoch': 1.1}
37%|███▋ | 4226/11526 [44:04<1:16:11, 1.60it/s] 37%|███▋ | 4227/11526 [44:04<1:15:46, 1.61it/s] {'loss': 0.1663, 'grad_norm': 0.4741024971008301, 'learning_rate': 7.985163095323016e-06, 'epoch': 1.1}
37%|███▋ | 4227/11526 [44:04<1:15:46, 1.61it/s] 37%|███▋ | 4228/11526 [44:05<1:15:24, 1.61it/s] {'loss': 0.2091, 'grad_norm': 0.5136925578117371, 'learning_rate': 7.983948151285242e-06, 'epoch': 1.1}
37%|███▋ | 4228/11526 [44:05<1:15:24, 1.61it/s] 37%|███▋ | 4229/11526 [44:05<1:15:10, 1.62it/s] {'loss': 0.2197, 'grad_norm': 0.5729277729988098, 'learning_rate': 7.982732933542767e-06, 'epoch': 1.1}
37%|███▋ | 4229/11526 [44:05<1:15:10, 1.62it/s] 37%|███▋ | 4230/11526 [44:06<1:14:59, 1.62it/s] {'loss': 0.2179, 'grad_norm': 0.5770925879478455, 'learning_rate': 7.981517442207055e-06, 'epoch': 1.1}
37%|███▋ | 4230/11526 [44:06<1:14:59, 1.62it/s] 37%|███▋ | 4231/11526 [44:07<1:14:56, 1.62it/s] {'loss': 0.2841, 'grad_norm': 0.6410744190216064, 'learning_rate': 7.980301677389598e-06, 'epoch': 1.1}
37%|███▋ | 4231/11526 [44:07<1:14:56, 1.62it/s] 37%|███▋ | 4232/11526 [44:07<1:14:48, 1.63it/s] {'loss': 0.2074, 'grad_norm': 0.5246779322624207, 'learning_rate': 7.979085639201916e-06, 'epoch': 1.1}
37%|███▋ | 4232/11526 [44:07<1:14:48, 1.63it/s] 37%|███▋ | 4233/11526 [44:08<1:14:44, 1.63it/s] {'loss': 0.2366, 'grad_norm': 0.5738869905471802, 'learning_rate': 7.977869327755547e-06, 'epoch': 1.1}
37%|███▋ | 4233/11526 [44:08<1:14:44, 1.63it/s] 37%|███▋ | 4234/11526 [44:08<1:14:41, 1.63it/s] {'loss': 0.2153, 'grad_norm': 0.5289487242698669, 'learning_rate': 7.976652743162062e-06, 'epoch': 1.1}
37%|███▋ | 4234/11526 [44:09<1:14:41, 1.63it/s] 37%|███▋ | 4235/11526 [44:09<1:14:39, 1.63it/s] {'loss': 0.2013, 'grad_norm': 0.49654245376586914, 'learning_rate': 7.97543588553305e-06, 'epoch': 1.1}
37%|███▋ | 4235/11526 [44:09<1:14:39, 1.63it/s] 37%|███▋ | 4236/11526 [44:10<1:14:41, 1.63it/s] {'loss': 0.1945, 'grad_norm': 0.558733344078064, 'learning_rate': 7.97421875498013e-06, 'epoch': 1.1}
37%|███▋ | 4236/11526 [44:10<1:14:41, 1.63it/s] 37%|███▋ | 4237/11526 [44:10<1:14:38, 1.63it/s] {'loss': 0.2268, 'grad_norm': 0.6188108325004578, 'learning_rate': 7.973001351614943e-06, 'epoch': 1.1}
37%|███▋ | 4237/11526 [44:10<1:14:38, 1.63it/s] 37%|███▋ | 4238/11526 [44:11<1:14:34, 1.63it/s] {'loss': 0.2666, 'grad_norm': 0.5237649083137512, 'learning_rate': 7.971783675549154e-06, 'epoch': 1.1}
37%|███▋ | 4238/11526 [44:11<1:14:34, 1.63it/s] 37%|███▋ | 4239/11526 [44:11<1:14:36, 1.63it/s] {'loss': 0.1897, 'grad_norm': 0.5072370171546936, 'learning_rate': 7.97056572689446e-06, 'epoch': 1.1}
37%|███▋ | 4239/11526 [44:12<1:14:36, 1.63it/s] 37%|███▋ | 4240/11526 [44:12<1:14:34, 1.63it/s] {'loss': 0.2347, 'grad_norm': 0.5449449419975281, 'learning_rate': 7.969347505762574e-06, 'epoch': 1.1}
37%|███▋ | 4240/11526 [44:12<1:14:34, 1.63it/s] 37%|███▋ | 4241/11526 [44:13<1:14:37, 1.63it/s] {'loss': 0.2942, 'grad_norm': 0.6113754510879517, 'learning_rate': 7.96812901226524e-06, 'epoch': 1.1}
37%|███▋ | 4241/11526 [44:13<1:14:37, 1.63it/s] 37%|███▋ | 4242/11526 [44:13<1:14:37, 1.63it/s] {'loss': 0.1979, 'grad_norm': 0.5638747811317444, 'learning_rate': 7.966910246514225e-06, 'epoch': 1.1}
37%|███▋ | 4242/11526 [44:13<1:14:37, 1.63it/s] 37%|███▋ | 4243/11526 [44:14<1:14:34, 1.63it/s] {'loss': 0.2287, 'grad_norm': 0.5822088122367859, 'learning_rate': 7.965691208621322e-06, 'epoch': 1.1}
37%|███▋ | 4243/11526 [44:14<1:14:34, 1.63it/s] 37%|███▋ | 4244/11526 [44:15<1:14:33, 1.63it/s] {'loss': 0.1492, 'grad_norm': 0.436822772026062, 'learning_rate': 7.964471898698347e-06, 'epoch': 1.1}
37%|███▋ | 4244/11526 [44:15<1:14:33, 1.63it/s] 37%|███▋ | 4245/11526 [44:15<1:14:32, 1.63it/s] {'loss': 0.1872, 'grad_norm': 0.49332088232040405, 'learning_rate': 7.963252316857144e-06, 'epoch': 1.1}
37%|███▋ | 4245/11526 [44:15<1:14:32, 1.63it/s] 37%|███▋ | 4246/11526 [44:16<1:14:33, 1.63it/s] {'loss': 0.178, 'grad_norm': 0.5015227198600769, 'learning_rate': 7.962032463209576e-06, 'epoch': 1.11}
37%|███▋ | 4246/11526 [44:16<1:14:33, 1.63it/s] 37%|███▋ | 4247/11526 [44:16<1:14:33, 1.63it/s] {'loss': 0.1933, 'grad_norm': 0.48383983969688416, 'learning_rate': 7.960812337867539e-06, 'epoch': 1.11}
37%|███▋ | 4247/11526 [44:17<1:14:33, 1.63it/s] 37%|███▋ | 4248/11526 [44:17<1:14:33, 1.63it/s] {'loss': 0.2277, 'grad_norm': 0.5084644556045532, 'learning_rate': 7.959591940942946e-06, 'epoch': 1.11}
37%|███▋ | 4248/11526 [44:17<1:14:33, 1.63it/s] 37%|███▋ | 4249/11526 [44:18<1:14:37, 1.63it/s] {'loss': 0.2034, 'grad_norm': 0.5018383264541626, 'learning_rate': 7.958371272547742e-06, 'epoch': 1.11}
37%|███▋ | 4249/11526 [44:18<1:14:37, 1.63it/s] 37%|███▋ | 4250/11526 [44:18<1:14:34, 1.63it/s] {'loss': 0.1853, 'grad_norm': 0.4355396628379822, 'learning_rate': 7.957150332793892e-06, 'epoch': 1.11}
37%|███▋ | 4250/11526 [44:18<1:14:34, 1.63it/s] 37%|███▋ | 4251/11526 [44:19<1:14:33, 1.63it/s] {'loss': 0.1948, 'grad_norm': 0.573942244052887, 'learning_rate': 7.955929121793389e-06, 'epoch': 1.11}
37%|███▋ | 4251/11526 [44:19<1:14:33, 1.63it/s] 37%|███▋ | 4252/11526 [44:19<1:14:32, 1.63it/s] {'loss': 0.2259, 'grad_norm': 0.5228246450424194, 'learning_rate': 7.954707639658246e-06, 'epoch': 1.11}
37%|███▋ | 4252/11526 [44:20<1:14:32, 1.63it/s] 37%|███▋ | 4253/11526 [44:20<1:14:30, 1.63it/s] {'loss': 0.2268, 'grad_norm': 0.5889917016029358, 'learning_rate': 7.95348588650051e-06, 'epoch': 1.11}
37%|███▋ | 4253/11526 [44:20<1:14:30, 1.63it/s] 37%|███▋ | 4254/11526 [44:21<1:14:28, 1.63it/s] {'loss': 0.2105, 'grad_norm': 0.5393950939178467, 'learning_rate': 7.95226386243224e-06, 'epoch': 1.11}
37%|███▋ | 4254/11526 [44:21<1:14:28, 1.63it/s] 37%|███▋ | 4255/11526 [44:21<1:14:26, 1.63it/s] {'loss': 0.2106, 'grad_norm': 0.5311200618743896, 'learning_rate': 7.951041567565534e-06, 'epoch': 1.11}
37%|███▋ | 4255/11526 [44:21<1:14:26, 1.63it/s] 37%|███▋ | 4256/11526 [44:22<1:14:47, 1.62it/s] {'loss': 0.2633, 'grad_norm': 0.6397219896316528, 'learning_rate': 7.949819002012505e-06, 'epoch': 1.11}
37%|███▋ | 4256/11526 [44:22<1:14:47, 1.62it/s] 37%|███▋ | 4257/11526 [44:23<1:14:39, 1.62it/s] {'loss': 0.1955, 'grad_norm': 0.47029006481170654, 'learning_rate': 7.948596165885292e-06, 'epoch': 1.11}
37%|███▋ | 4257/11526 [44:23<1:14:39, 1.62it/s] 37%|███▋ | 4258/11526 [44:23<1:14:33, 1.62it/s] {'loss': 0.244, 'grad_norm': 0.570986807346344, 'learning_rate': 7.947373059296061e-06, 'epoch': 1.11}
37%|███▋ | 4258/11526 [44:23<1:14:33, 1.62it/s] 37%|███▋ | 4259/11526 [44:24<1:14:27, 1.63it/s] {'loss': 0.2108, 'grad_norm': 0.5649303793907166, 'learning_rate': 7.946149682357004e-06, 'epoch': 1.11}
37%|███▋ | 4259/11526 [44:24<1:14:27, 1.63it/s] 37%|███▋ | 4260/11526 [44:24<1:14:25, 1.63it/s] {'loss': 0.1967, 'grad_norm': 0.51922208070755, 'learning_rate': 7.944926035180336e-06, 'epoch': 1.11}
37%|███▋ | 4260/11526 [44:25<1:14:25, 1.63it/s] 37%|███▋ | 4261/11526 [44:25<1:14:26, 1.63it/s] {'loss': 0.2175, 'grad_norm': 0.5750104188919067, 'learning_rate': 7.943702117878294e-06, 'epoch': 1.11}
37%|███▋ | 4261/11526 [44:25<1:14:26, 1.63it/s] 37%|███▋ | 4262/11526 [44:26<1:14:25, 1.63it/s] {'loss': 0.1795, 'grad_norm': 0.4742863178253174, 'learning_rate': 7.942477930563146e-06, 'epoch': 1.11}
37%|███▋ | 4262/11526 [44:26<1:14:25, 1.63it/s] 37%|███▋ | 4263/11526 [44:26<1:14:26, 1.63it/s] {'loss': 0.1867, 'grad_norm': 0.483107328414917, 'learning_rate': 7.94125347334718e-06, 'epoch': 1.11}
37%|███▋ | 4263/11526 [44:26<1:14:26, 1.63it/s] 37%|███▋ | 4264/11526 [44:27<1:14:26, 1.63it/s] {'loss': 0.2954, 'grad_norm': 0.7939332127571106, 'learning_rate': 7.940028746342712e-06, 'epoch': 1.11}
37%|███▋ | 4264/11526 [44:27<1:14:26, 1.63it/s] 37%|███▋ | 4265/11526 [44:27<1:14:23, 1.63it/s] {'loss': 0.2425, 'grad_norm': 0.542183518409729, 'learning_rate': 7.938803749662079e-06, 'epoch': 1.11}
37%|███▋ | 4265/11526 [44:28<1:14:23, 1.63it/s] 37%|███▋ | 4266/11526 [44:28<1:14:27, 1.63it/s] {'loss': 0.1777, 'grad_norm': 0.4919421970844269, 'learning_rate': 7.937578483417642e-06, 'epoch': 1.11}
37%|███▋ | 4266/11526 [44:28<1:14:27, 1.63it/s] 37%|███▋ | 4267/11526 [44:29<1:14:24, 1.63it/s] {'loss': 0.1762, 'grad_norm': 0.4907155930995941, 'learning_rate': 7.936352947721795e-06, 'epoch': 1.11}
37%|███▋ | 4267/11526 [44:29<1:14:24, 1.63it/s] 37%|███▋ | 4268/11526 [44:29<1:14:22, 1.63it/s] {'loss': 0.1709, 'grad_norm': 0.5597339272499084, 'learning_rate': 7.935127142686949e-06, 'epoch': 1.11}
37%|███▋ | 4268/11526 [44:29<1:14:22, 1.63it/s] 37%|███▋ | 4269/11526 [44:30<1:14:22, 1.63it/s] {'loss': 0.1896, 'grad_norm': 0.5929450392723083, 'learning_rate': 7.933901068425539e-06, 'epoch': 1.11}
37%|███▋ | 4269/11526 [44:30<1:14:22, 1.63it/s] 37%|███▋ | 4270/11526 [44:31<1:14:22, 1.63it/s] {'loss': 0.2956, 'grad_norm': 0.6669577956199646, 'learning_rate': 7.932674725050032e-06, 'epoch': 1.11}
37%|███▋ | 4270/11526 [44:31<1:14:22, 1.63it/s] 37%|███▋ | 4271/11526 [44:31<1:14:26, 1.62it/s] {'loss': 0.2697, 'grad_norm': 0.6219680309295654, 'learning_rate': 7.93144811267291e-06, 'epoch': 1.11}
37%|███▋ | 4271/11526 [44:31<1:14:26, 1.62it/s] 37%|███▋ | 4272/11526 [44:32<1:14:24, 1.62it/s] {'loss': 0.1978, 'grad_norm': 0.6296505331993103, 'learning_rate': 7.93022123140669e-06, 'epoch': 1.11}
37%|███▋ | 4272/11526 [44:32<1:14:24, 1.62it/s] 37%|███▋ | 4273/11526 [44:32<1:14:21, 1.63it/s] {'loss': 0.1849, 'grad_norm': 0.5420436263084412, 'learning_rate': 7.928994081363908e-06, 'epoch': 1.11}
37%|███▋ | 4273/11526 [44:33<1:14:21, 1.63it/s] 37%|███▋ | 4274/11526 [44:33<1:14:22, 1.63it/s] {'loss': 0.1901, 'grad_norm': 0.51070636510849, 'learning_rate': 7.927766662657122e-06, 'epoch': 1.11}
37%|███▋ | 4274/11526 [44:33<1:14:22, 1.63it/s] 37%|███▋ | 4275/11526 [44:34<1:14:20, 1.63it/s] {'loss': 0.2207, 'grad_norm': 0.5177673697471619, 'learning_rate': 7.926538975398919e-06, 'epoch': 1.11}
37%|███▋ | 4275/11526 [44:34<1:14:20, 1.63it/s] 37%|███▋ | 4276/11526 [44:34<1:18:18, 1.54it/s] {'loss': 0.1785, 'grad_norm': 0.48070356249809265, 'learning_rate': 7.925311019701909e-06, 'epoch': 1.11}
37%|███▋ | 4276/11526 [44:34<1:18:18, 1.54it/s] 37%|███▋ | 4277/11526 [44:35<1:17:03, 1.57it/s] {'loss': 0.1605, 'grad_norm': 0.4959013760089874, 'learning_rate': 7.924082795678729e-06, 'epoch': 1.11}
37%|███▋ | 4277/11526 [44:35<1:17:03, 1.57it/s] 37%|███▋ | 4278/11526 [44:36<1:16:10, 1.59it/s] {'loss': 0.1965, 'grad_norm': 0.5267845988273621, 'learning_rate': 7.922854303442038e-06, 'epoch': 1.11}
37%|███▋ | 4278/11526 [44:36<1:16:10, 1.59it/s] 37%|███▋ | 4279/11526 [44:36<1:15:34, 1.60it/s] {'loss': 0.1947, 'grad_norm': 0.5386367440223694, 'learning_rate': 7.92162554310452e-06, 'epoch': 1.11}
37%|███▋ | 4279/11526 [44:36<1:15:34, 1.60it/s] 37%|███▋ | 4280/11526 [44:37<1:15:10, 1.61it/s] {'loss': 0.1388, 'grad_norm': 0.41622066497802734, 'learning_rate': 7.920396514778883e-06, 'epoch': 1.11}
37%|███▋ | 4280/11526 [44:37<1:15:10, 1.61it/s] 37%|███▋ | 4281/11526 [44:37<1:14:53, 1.61it/s] {'loss': 0.2037, 'grad_norm': 0.5229920744895935, 'learning_rate': 7.919167218577862e-06, 'epoch': 1.11}
37%|███▋ | 4281/11526 [44:38<1:14:53, 1.61it/s] 37%|███▋ | 4282/11526 [44:38<1:18:44, 1.53it/s] {'loss': 0.2059, 'grad_norm': 0.5707792043685913, 'learning_rate': 7.917937654614213e-06, 'epoch': 1.11}
37%|███▋ | 4282/11526 [44:38<1:18:44, 1.53it/s] 37%|███▋ | 4283/11526 [44:39<1:17:20, 1.56it/s] {'loss': 0.3067, 'grad_norm': 0.7056237459182739, 'learning_rate': 7.91670782300072e-06, 'epoch': 1.11}
37%|███▋ | 4283/11526 [44:39<1:17:20, 1.56it/s] 37%|███▋ | 4284/11526 [44:39<1:16:26, 1.58it/s] {'loss': 0.1892, 'grad_norm': 0.498462975025177, 'learning_rate': 7.915477723850192e-06, 'epoch': 1.12}
37%|███▋ | 4284/11526 [44:40<1:16:26, 1.58it/s] 37%|███▋ | 4285/11526 [44:40<1:15:44, 1.59it/s] {'loss': 0.2073, 'grad_norm': 0.4786842465400696, 'learning_rate': 7.914247357275458e-06, 'epoch': 1.12}
37%|███▋ | 4285/11526 [44:40<1:15:44, 1.59it/s] 37%|███▋ | 4286/11526 [44:41<1:15:14, 1.60it/s] {'loss': 0.2311, 'grad_norm': 0.6341710090637207, 'learning_rate': 7.913016723389375e-06, 'epoch': 1.12}
37%|███▋ | 4286/11526 [44:41<1:15:14, 1.60it/s] 37%|███▋ | 4287/11526 [44:41<1:15:18, 1.60it/s] {'loss': 0.1945, 'grad_norm': 0.5369309782981873, 'learning_rate': 7.911785822304823e-06, 'epoch': 1.12}
37%|███▋ | 4287/11526 [44:41<1:15:18, 1.60it/s] 37%|███▋ | 4288/11526 [44:42<1:14:57, 1.61it/s] {'loss': 0.1657, 'grad_norm': 0.4543599486351013, 'learning_rate': 7.910554654134708e-06, 'epoch': 1.12}
37%|███▋ | 4288/11526 [44:42<1:14:57, 1.61it/s] 37%|███▋ | 4289/11526 [44:42<1:14:47, 1.61it/s] {'loss': 0.2229, 'grad_norm': 0.4712790250778198, 'learning_rate': 7.909323218991961e-06, 'epoch': 1.12}
37%|███▋ | 4289/11526 [44:43<1:14:47, 1.61it/s] 37%|███▋ | 4290/11526 [44:43<1:14:35, 1.62it/s] {'loss': 0.1827, 'grad_norm': 0.5022124648094177, 'learning_rate': 7.908091516989533e-06, 'epoch': 1.12}
37%|███▋ | 4290/11526 [44:43<1:14:35, 1.62it/s] 37%|███▋ | 4291/11526 [44:44<1:14:27, 1.62it/s] {'loss': 0.1864, 'grad_norm': 0.5879376530647278, 'learning_rate': 7.906859548240408e-06, 'epoch': 1.12}
37%|███▋ | 4291/11526 [44:44<1:14:27, 1.62it/s] 37%|███▋ | 4292/11526 [44:44<1:14:24, 1.62it/s] {'loss': 0.2459, 'grad_norm': 0.7138120532035828, 'learning_rate': 7.905627312857582e-06, 'epoch': 1.12}
37%|███▋ | 4292/11526 [44:44<1:14:24, 1.62it/s] 37%|███▋ | 4293/11526 [44:45<1:14:18, 1.62it/s] {'loss': 0.1509, 'grad_norm': 0.39992576837539673, 'learning_rate': 7.90439481095409e-06, 'epoch': 1.12}
37%|███▋ | 4293/11526 [44:45<1:14:18, 1.62it/s] 37%|███▋ | 4294/11526 [44:46<1:14:16, 1.62it/s] {'loss': 0.1787, 'grad_norm': 0.4882949888706207, 'learning_rate': 7.90316204264298e-06, 'epoch': 1.12}
37%|███▋ | 4294/11526 [44:46<1:14:16, 1.62it/s] 37%|███▋ | 4295/11526 [44:46<1:14:11, 1.62it/s] {'loss': 0.1763, 'grad_norm': 0.5186259150505066, 'learning_rate': 7.901929008037327e-06, 'epoch': 1.12}
37%|███▋ | 4295/11526 [44:46<1:14:11, 1.62it/s] 37%|███▋ | 4296/11526 [44:47<1:14:08, 1.63it/s] {'loss': 0.2497, 'grad_norm': 0.6407511830329895, 'learning_rate': 7.900695707250234e-06, 'epoch': 1.12}
37%|███▋ | 4296/11526 [44:47<1:14:08, 1.63it/s] 37%|███▋ | 4297/11526 [44:47<1:14:10, 1.62it/s] {'loss': 0.2022, 'grad_norm': 0.5070221424102783, 'learning_rate': 7.899462140394829e-06, 'epoch': 1.12}
37%|███▋ | 4297/11526 [44:48<1:14:10, 1.62it/s] 37%|███▋ | 4298/11526 [44:48<1:14:07, 1.63it/s] {'loss': 0.173, 'grad_norm': 0.49783432483673096, 'learning_rate': 7.898228307584257e-06, 'epoch': 1.12}
37%|███▋ | 4298/11526 [44:48<1:14:07, 1.63it/s] 37%|███▋ | 4299/11526 [44:49<1:14:07, 1.62it/s] {'loss': 0.2003, 'grad_norm': 0.5729155540466309, 'learning_rate': 7.896994208931696e-06, 'epoch': 1.12}
37%|███▋ | 4299/11526 [44:49<1:14:07, 1.62it/s] 37%|███▋ | 4300/11526 [44:49<1:14:06, 1.62it/s] {'loss': 0.2221, 'grad_norm': 0.7475684881210327, 'learning_rate': 7.895759844550342e-06, 'epoch': 1.12}
37%|███▋ | 4300/11526 [44:49<1:14:06, 1.62it/s] 37%|███▋ | 4301/11526 [44:50<1:14:02, 1.63it/s] {'loss': 0.1718, 'grad_norm': 0.5565648078918457, 'learning_rate': 7.894525214553419e-06, 'epoch': 1.12}
37%|███▋ | 4301/11526 [44:50<1:14:02, 1.63it/s] 37%|███▋ | 4302/11526 [44:50<1:14:00, 1.63it/s] {'loss': 0.2055, 'grad_norm': 0.5400852560997009, 'learning_rate': 7.893290319054174e-06, 'epoch': 1.12}
37%|███▋ | 4302/11526 [44:51<1:14:00, 1.63it/s] 37%|███▋ | 4303/11526 [44:51<1:14:00, 1.63it/s] {'loss': 0.2287, 'grad_norm': 0.6139136552810669, 'learning_rate': 7.892055158165878e-06, 'epoch': 1.12}
37%|███▋ | 4303/11526 [44:51<1:14:00, 1.63it/s] 37%|███▋ | 4304/11526 [44:52<1:14:21, 1.62it/s] {'loss': 0.2159, 'grad_norm': 0.5297180414199829, 'learning_rate': 7.89081973200183e-06, 'epoch': 1.12}
37%|███▋ | 4304/11526 [44:52<1:14:21, 1.62it/s] 37%|███▋ | 4305/11526 [44:52<1:14:14, 1.62it/s] {'loss': 0.2618, 'grad_norm': 0.5940263867378235, 'learning_rate': 7.889584040675348e-06, 'epoch': 1.12}
37%|███▋ | 4305/11526 [44:52<1:14:14, 1.62it/s] 37%|███▋ | 4306/11526 [44:53<1:14:08, 1.62it/s] {'loss': 0.2332, 'grad_norm': 0.5101885795593262, 'learning_rate': 7.888348084299775e-06, 'epoch': 1.12}
37%|███▋ | 4306/11526 [44:53<1:14:08, 1.62it/s] 37%|███▋ | 4307/11526 [44:54<1:14:10, 1.62it/s] {'loss': 0.1921, 'grad_norm': 0.5008015036582947, 'learning_rate': 7.887111862988483e-06, 'epoch': 1.12}
37%|███▋ | 4307/11526 [44:54<1:14:10, 1.62it/s] 37%|███▋ | 4308/11526 [44:54<1:14:05, 1.62it/s] {'loss': 0.2043, 'grad_norm': 0.5014247894287109, 'learning_rate': 7.885875376854862e-06, 'epoch': 1.12}
37%|███▋ | 4308/11526 [44:54<1:14:05, 1.62it/s] 37%|███▋ | 4309/11526 [44:55<1:14:07, 1.62it/s] {'loss': 0.2505, 'grad_norm': 0.5532881617546082, 'learning_rate': 7.884638626012334e-06, 'epoch': 1.12}
37%|███▋ | 4309/11526 [44:55<1:14:07, 1.62it/s] 37%|███▋ | 4310/11526 [44:55<1:14:00, 1.63it/s] {'loss': 0.1901, 'grad_norm': 0.47335851192474365, 'learning_rate': 7.883401610574338e-06, 'epoch': 1.12}
37%|███▋ | 4310/11526 [44:56<1:14:00, 1.63it/s] 37%|███▋ | 4311/11526 [44:56<1:13:57, 1.63it/s] {'loss': 0.213, 'grad_norm': 0.5620380640029907, 'learning_rate': 7.882164330654338e-06, 'epoch': 1.12}
37%|███▋ | 4311/11526 [44:56<1:13:57, 1.63it/s] 37%|███▋ | 4312/11526 [44:57<1:14:00, 1.62it/s] {'loss': 0.2185, 'grad_norm': 0.51800137758255, 'learning_rate': 7.880926786365827e-06, 'epoch': 1.12}
37%|███▋ | 4312/11526 [44:57<1:14:00, 1.62it/s] 37%|███▋ | 4313/11526 [44:57<1:13:57, 1.63it/s] {'loss': 0.1637, 'grad_norm': 0.5019928216934204, 'learning_rate': 7.879688977822321e-06, 'epoch': 1.12}
37%|███▋ | 4313/11526 [44:57<1:13:57, 1.63it/s] 37%|███▋ | 4314/11526 [44:58<1:14:00, 1.62it/s] {'loss': 0.1676, 'grad_norm': 0.48230722546577454, 'learning_rate': 7.878450905137358e-06, 'epoch': 1.12}
37%|███▋ | 4314/11526 [44:58<1:14:00, 1.62it/s] 37%|███▋ | 4315/11526 [44:58<1:13:56, 1.63it/s] {'loss': 0.2165, 'grad_norm': 0.5168959498405457, 'learning_rate': 7.8772125684245e-06, 'epoch': 1.12}
37%|███▋ | 4315/11526 [44:59<1:13:56, 1.63it/s] 37%|███▋ | 4316/11526 [44:59<1:13:52, 1.63it/s] {'loss': 0.2438, 'grad_norm': 0.6713461875915527, 'learning_rate': 7.875973967797333e-06, 'epoch': 1.12}
37%|███▋ | 4316/11526 [44:59<1:13:52, 1.63it/s] 37%|███▋ | 4317/11526 [45:00<1:13:49, 1.63it/s] {'loss': 0.183, 'grad_norm': 0.48327454924583435, 'learning_rate': 7.87473510336947e-06, 'epoch': 1.12}
37%|███▋ | 4317/11526 [45:00<1:13:49, 1.63it/s] 37%|███▋ | 4318/11526 [45:00<1:13:49, 1.63it/s] {'loss': 0.2597, 'grad_norm': 0.6359960436820984, 'learning_rate': 7.873495975254548e-06, 'epoch': 1.12}
37%|███▋ | 4318/11526 [45:00<1:13:49, 1.63it/s] 37%|███▋ | 4319/11526 [45:01<1:13:51, 1.63it/s] {'loss': 0.1651, 'grad_norm': 0.524323046207428, 'learning_rate': 7.872256583566224e-06, 'epoch': 1.12}
37%|███▋ | 4319/11526 [45:01<1:13:51, 1.63it/s] 37%|███▋ | 4320/11526 [45:02<1:13:48, 1.63it/s] {'loss': 0.1992, 'grad_norm': 0.44901928305625916, 'learning_rate': 7.871016928418185e-06, 'epoch': 1.12}
37%|███▋ | 4320/11526 [45:02<1:13:48, 1.63it/s] 37%|███▋ | 4321/11526 [45:02<1:13:48, 1.63it/s] {'loss': 0.1924, 'grad_norm': 0.5113806128501892, 'learning_rate': 7.869777009924137e-06, 'epoch': 1.12}
37%|███▋ | 4321/11526 [45:02<1:13:48, 1.63it/s] 37%|███▋ | 4322/11526 [45:03<1:14:12, 1.62it/s] {'loss': 0.1919, 'grad_norm': 0.4973048269748688, 'learning_rate': 7.868536828197816e-06, 'epoch': 1.12}
37%|███▋ | 4322/11526 [45:03<1:14:12, 1.62it/s] 38%|███▊ | 4323/11526 [45:03<1:14:04, 1.62it/s] {'loss': 0.166, 'grad_norm': 0.4713442027568817, 'learning_rate': 7.867296383352974e-06, 'epoch': 1.13}
38%|███▊ | 4323/11526 [45:04<1:14:04, 1.62it/s] 38%|███▊ | 4324/11526 [45:04<1:14:01, 1.62it/s] {'loss': 0.1731, 'grad_norm': 0.4761435389518738, 'learning_rate': 7.866055675503393e-06, 'epoch': 1.13}
38%|███▊ | 4324/11526 [45:04<1:14:01, 1.62it/s] 38%|███▊ | 4325/11526 [45:05<1:13:53, 1.62it/s] {'loss': 0.1611, 'grad_norm': 0.4532773196697235, 'learning_rate': 7.864814704762877e-06, 'epoch': 1.13}
38%|███▊ | 4325/11526 [45:05<1:13:53, 1.62it/s] 38%|███▊ | 4326/11526 [45:05<1:13:49, 1.63it/s] {'loss': 0.1512, 'grad_norm': 0.4271910786628723, 'learning_rate': 7.863573471245259e-06, 'epoch': 1.13}
38%|███▊ | 4326/11526 [45:05<1:13:49, 1.63it/s] 38%|███▊ | 4327/11526 [45:06<1:13:54, 1.62it/s] {'loss': 0.2042, 'grad_norm': 0.5248308777809143, 'learning_rate': 7.862331975064389e-06, 'epoch': 1.13}
38%|███▊ | 4327/11526 [45:06<1:13:54, 1.62it/s] 38%|███▊ | 4328/11526 [45:06<1:13:50, 1.62it/s] {'loss': 0.2127, 'grad_norm': 0.5539354681968689, 'learning_rate': 7.86109021633414e-06, 'epoch': 1.13}
38%|███▊ | 4328/11526 [45:07<1:13:50, 1.62it/s] 38%|███▊ | 4329/11526 [45:07<1:13:51, 1.62it/s] {'loss': 0.172, 'grad_norm': 0.45757022500038147, 'learning_rate': 7.859848195168422e-06, 'epoch': 1.13}
38%|███▊ | 4329/11526 [45:07<1:13:51, 1.62it/s] 38%|███▊ | 4330/11526 [45:08<1:13:47, 1.63it/s] {'loss': 0.1917, 'grad_norm': 0.4896520674228668, 'learning_rate': 7.858605911681155e-06, 'epoch': 1.13}
38%|███▊ | 4330/11526 [45:08<1:13:47, 1.63it/s] 38%|███▊ | 4331/11526 [45:08<1:13:41, 1.63it/s] {'loss': 0.2057, 'grad_norm': 0.5635772347450256, 'learning_rate': 7.857363365986288e-06, 'epoch': 1.13}
38%|███▊ | 4331/11526 [45:08<1:13:41, 1.63it/s] 38%|███▊ | 4332/11526 [45:09<1:13:45, 1.63it/s] {'loss': 0.1741, 'grad_norm': 0.5219250321388245, 'learning_rate': 7.856120558197794e-06, 'epoch': 1.13}
38%|███▊ | 4332/11526 [45:09<1:13:45, 1.63it/s] 38%|███▊ | 4333/11526 [45:10<1:13:42, 1.63it/s] {'loss': 0.2144, 'grad_norm': 0.5515991449356079, 'learning_rate': 7.854877488429674e-06, 'epoch': 1.13}
38%|███▊ | 4333/11526 [45:10<1:13:42, 1.63it/s] 38%|███▊ | 4334/11526 [45:10<1:13:44, 1.63it/s] {'loss': 0.2103, 'grad_norm': 0.5705272555351257, 'learning_rate': 7.853634156795946e-06, 'epoch': 1.13}
38%|███▊ | 4334/11526 [45:10<1:13:44, 1.63it/s] 38%|███▊ | 4335/11526 [45:11<1:13:41, 1.63it/s] {'loss': 0.2313, 'grad_norm': 0.6605604887008667, 'learning_rate': 7.852390563410657e-06, 'epoch': 1.13}
38%|███▊ | 4335/11526 [45:11<1:13:41, 1.63it/s] 38%|███▊ | 4336/11526 [45:11<1:13:39, 1.63it/s] {'loss': 0.19, 'grad_norm': 0.5872525572776794, 'learning_rate': 7.851146708387876e-06, 'epoch': 1.13}
38%|███▊ | 4336/11526 [45:12<1:13:39, 1.63it/s] 38%|███▊ | 4337/11526 [45:12<1:13:40, 1.63it/s] {'loss': 0.1942, 'grad_norm': 0.5306509137153625, 'learning_rate': 7.849902591841696e-06, 'epoch': 1.13}
38%|███▊ | 4337/11526 [45:12<1:13:40, 1.63it/s] 38%|███▊ | 4338/11526 [45:13<1:13:38, 1.63it/s] {'loss': 0.1652, 'grad_norm': 0.45145004987716675, 'learning_rate': 7.848658213886238e-06, 'epoch': 1.13}
38%|███▊ | 4338/11526 [45:13<1:13:38, 1.63it/s] 38%|███▊ | 4339/11526 [45:13<1:13:37, 1.63it/s] {'loss': 0.2059, 'grad_norm': 0.5978443026542664, 'learning_rate': 7.847413574635638e-06, 'epoch': 1.13}
38%|███▊ | 4339/11526 [45:13<1:13:37, 1.63it/s] 38%|███▊ | 4340/11526 [45:14<1:13:35, 1.63it/s] {'loss': 0.2315, 'grad_norm': 0.5679399371147156, 'learning_rate': 7.846168674204064e-06, 'epoch': 1.13}
38%|███▊ | 4340/11526 [45:14<1:13:35, 1.63it/s] 38%|███▊ | 4341/11526 [45:14<1:13:33, 1.63it/s] {'loss': 0.1971, 'grad_norm': 0.5870610475540161, 'learning_rate': 7.844923512705704e-06, 'epoch': 1.13}
38%|███▊ | 4341/11526 [45:15<1:13:33, 1.63it/s] 38%|███▊ | 4342/11526 [45:15<1:13:36, 1.63it/s] {'loss': 0.2173, 'grad_norm': 0.5161023139953613, 'learning_rate': 7.843678090254773e-06, 'epoch': 1.13}
38%|███▊ | 4342/11526 [45:15<1:13:36, 1.63it/s] 38%|███▊ | 4343/11526 [45:16<1:13:35, 1.63it/s] {'loss': 0.2143, 'grad_norm': 0.617017388343811, 'learning_rate': 7.84243240696551e-06, 'epoch': 1.13}
38%|███▊ | 4343/11526 [45:16<1:13:35, 1.63it/s] 38%|███▊ | 4344/11526 [45:16<1:13:38, 1.63it/s] {'loss': 0.2382, 'grad_norm': 0.6104421019554138, 'learning_rate': 7.84118646295217e-06, 'epoch': 1.13}
38%|███▊ | 4344/11526 [45:16<1:13:38, 1.63it/s] 38%|███▊ | 4345/11526 [45:17<1:13:34, 1.63it/s] {'loss': 0.164, 'grad_norm': 0.4612290561199188, 'learning_rate': 7.839940258329045e-06, 'epoch': 1.13}
38%|███▊ | 4345/11526 [45:17<1:13:34, 1.63it/s] 38%|███▊ | 4346/11526 [45:18<1:13:33, 1.63it/s] {'loss': 0.2561, 'grad_norm': 0.7731278538703918, 'learning_rate': 7.83869379321044e-06, 'epoch': 1.13}
38%|███▊ | 4346/11526 [45:18<1:13:33, 1.63it/s] 38%|███▊ | 4347/11526 [45:18<1:13:36, 1.63it/s] {'loss': 0.2302, 'grad_norm': 0.6224877834320068, 'learning_rate': 7.83744706771069e-06, 'epoch': 1.13}
38%|███▊ | 4347/11526 [45:18<1:13:36, 1.63it/s] 38%|███▊ | 4348/11526 [45:19<1:13:34, 1.63it/s] {'loss': 0.1942, 'grad_norm': 0.5027459263801575, 'learning_rate': 7.836200081944149e-06, 'epoch': 1.13}
38%|███▊ | 4348/11526 [45:19<1:13:34, 1.63it/s] 38%|███▊ | 4349/11526 [45:19<1:13:34, 1.63it/s] {'loss': 0.2196, 'grad_norm': 0.519388735294342, 'learning_rate': 7.8349528360252e-06, 'epoch': 1.13}
38%|███▊ | 4349/11526 [45:20<1:13:34, 1.63it/s] 38%|███▊ | 4350/11526 [45:20<1:13:32, 1.63it/s] {'loss': 0.2062, 'grad_norm': 0.5673927068710327, 'learning_rate': 7.833705330068244e-06, 'epoch': 1.13}
38%|███▊ | 4350/11526 [45:20<1:13:32, 1.63it/s] 38%|███▊ | 4351/11526 [45:21<1:13:30, 1.63it/s] {'loss': 0.2111, 'grad_norm': 0.5232966542243958, 'learning_rate': 7.832457564187715e-06, 'epoch': 1.13}
38%|███▊ | 4351/11526 [45:21<1:13:30, 1.63it/s] 38%|███▊ | 4352/11526 [45:21<1:13:32, 1.63it/s] {'loss': 0.2559, 'grad_norm': 0.6608912348747253, 'learning_rate': 7.83120953849806e-06, 'epoch': 1.13}
38%|███▊ | 4352/11526 [45:21<1:13:32, 1.63it/s] 38%|███▊ | 4353/11526 [45:22<1:13:29, 1.63it/s] {'loss': 0.2158, 'grad_norm': 0.5007588863372803, 'learning_rate': 7.829961253113761e-06, 'epoch': 1.13}
38%|███▊ | 4353/11526 [45:22<1:13:29, 1.63it/s] 38%|███▊ | 4354/11526 [45:22<1:13:34, 1.62it/s] {'loss': 0.199, 'grad_norm': 0.5302028656005859, 'learning_rate': 7.828712708149312e-06, 'epoch': 1.13}
38%|███▊ | 4354/11526 [45:23<1:13:34, 1.62it/s] 38%|███▊ | 4355/11526 [45:23<1:13:30, 1.63it/s] {'loss': 0.2576, 'grad_norm': 0.6502706408500671, 'learning_rate': 7.827463903719236e-06, 'epoch': 1.13}
38%|███▊ | 4355/11526 [45:23<1:13:30, 1.63it/s] 38%|███▊ | 4356/11526 [45:24<1:13:26, 1.63it/s] {'loss': 0.2752, 'grad_norm': 0.6079040169715881, 'learning_rate': 7.826214839938087e-06, 'epoch': 1.13}
38%|███▊ | 4356/11526 [45:24<1:13:26, 1.63it/s] 38%|███▊ | 4357/11526 [45:24<1:13:30, 1.63it/s] {'loss': 0.2282, 'grad_norm': 0.5878711938858032, 'learning_rate': 7.824965516920431e-06, 'epoch': 1.13}
38%|███▊ | 4357/11526 [45:24<1:13:30, 1.63it/s] 38%|███▊ | 4358/11526 [45:25<1:13:27, 1.63it/s] {'loss': 0.2197, 'grad_norm': 0.5318542122840881, 'learning_rate': 7.823715934780865e-06, 'epoch': 1.13}
38%|███▊ | 4358/11526 [45:25<1:13:27, 1.63it/s] 38%|███▊ | 4359/11526 [45:26<1:13:30, 1.63it/s] {'loss': 0.2503, 'grad_norm': 0.5974615216255188, 'learning_rate': 7.822466093634004e-06, 'epoch': 1.13}
38%|███▊ | 4359/11526 [45:26<1:13:30, 1.63it/s] 38%|███▊ | 4360/11526 [45:26<1:13:25, 1.63it/s] {'loss': 0.1868, 'grad_norm': 0.5022907853126526, 'learning_rate': 7.821215993594497e-06, 'epoch': 1.13}
38%|███▊ | 4360/11526 [45:26<1:13:25, 1.63it/s] 38%|███▊ | 4361/11526 [45:27<1:13:23, 1.63it/s] {'loss': 0.1927, 'grad_norm': 0.4932527542114258, 'learning_rate': 7.819965634777006e-06, 'epoch': 1.14}
38%|███▊ | 4361/11526 [45:27<1:13:23, 1.63it/s] 38%|███▊ | 4362/11526 [45:27<1:13:22, 1.63it/s] {'loss': 0.1824, 'grad_norm': 0.48502248525619507, 'learning_rate': 7.818715017296223e-06, 'epoch': 1.14}
38%|███▊ | 4362/11526 [45:27<1:13:22, 1.63it/s] 38%|███▊ | 4363/11526 [45:28<1:13:19, 1.63it/s] {'loss': 0.2078, 'grad_norm': 0.577257513999939, 'learning_rate': 7.81746414126686e-06, 'epoch': 1.14}
38%|███▊ | 4363/11526 [45:28<1:13:19, 1.63it/s] 38%|███▊ | 4364/11526 [45:29<1:13:21, 1.63it/s] {'loss': 0.2422, 'grad_norm': 0.6213303804397583, 'learning_rate': 7.816213006803655e-06, 'epoch': 1.14}
38%|███▊ | 4364/11526 [45:29<1:13:21, 1.63it/s] 38%|███▊ | 4365/11526 [45:29<1:13:17, 1.63it/s] {'loss': 0.1913, 'grad_norm': 0.4780694246292114, 'learning_rate': 7.81496161402137e-06, 'epoch': 1.14}
38%|███▊ | 4365/11526 [45:29<1:13:17, 1.63it/s] 38%|███▊ | 4366/11526 [45:30<1:13:19, 1.63it/s] {'loss': 0.1916, 'grad_norm': 0.547415018081665, 'learning_rate': 7.813709963034788e-06, 'epoch': 1.14}
38%|███▊ | 4366/11526 [45:30<1:13:19, 1.63it/s] 38%|███▊ | 4367/11526 [45:30<1:13:19, 1.63it/s] {'loss': 0.2043, 'grad_norm': 0.5402793884277344, 'learning_rate': 7.81245805395872e-06, 'epoch': 1.14}
38%|███▊ | 4367/11526 [45:31<1:13:19, 1.63it/s] 38%|███▊ | 4368/11526 [45:31<1:13:16, 1.63it/s] {'loss': 0.2173, 'grad_norm': 0.5724906921386719, 'learning_rate': 7.811205886907998e-06, 'epoch': 1.14}
38%|███▊ | 4368/11526 [45:31<1:13:16, 1.63it/s] 38%|███▊ | 4369/11526 [45:32<1:13:22, 1.63it/s] {'loss': 0.197, 'grad_norm': 0.521331250667572, 'learning_rate': 7.809953461997475e-06, 'epoch': 1.14}
38%|███▊ | 4369/11526 [45:32<1:13:22, 1.63it/s] 38%|███▊ | 4370/11526 [45:32<1:13:18, 1.63it/s] {'loss': 0.2161, 'grad_norm': 0.532658040523529, 'learning_rate': 7.808700779342033e-06, 'epoch': 1.14}
38%|███▊ | 4370/11526 [45:32<1:13:18, 1.63it/s] 38%|███▊ | 4371/11526 [45:33<1:13:14, 1.63it/s] {'loss': 0.198, 'grad_norm': 0.5332202911376953, 'learning_rate': 7.807447839056576e-06, 'epoch': 1.14}
38%|███▊ | 4371/11526 [45:33<1:13:14, 1.63it/s] 38%|███▊ | 4372/11526 [45:34<1:13:24, 1.62it/s] {'loss': 0.1687, 'grad_norm': 0.4684118926525116, 'learning_rate': 7.806194641256029e-06, 'epoch': 1.14}
38%|███▊ | 4372/11526 [45:34<1:13:24, 1.62it/s] 38%|███▊ | 4373/11526 [45:34<1:13:19, 1.63it/s] {'loss': 0.2229, 'grad_norm': 0.5474609732627869, 'learning_rate': 7.804941186055341e-06, 'epoch': 1.14}
38%|███▊ | 4373/11526 [45:34<1:13:19, 1.63it/s] 38%|███▊ | 4374/11526 [45:35<1:13:14, 1.63it/s] {'loss': 0.2082, 'grad_norm': 0.5785812139511108, 'learning_rate': 7.803687473569491e-06, 'epoch': 1.14}
38%|███▊ | 4374/11526 [45:35<1:13:14, 1.63it/s] 38%|███▊ | 4375/11526 [45:35<1:13:14, 1.63it/s] {'loss': 0.1725, 'grad_norm': 0.5072490572929382, 'learning_rate': 7.80243350391347e-06, 'epoch': 1.14}
38%|███▊ | 4375/11526 [45:35<1:13:14, 1.63it/s] 38%|███▊ | 4376/11526 [45:36<1:13:14, 1.63it/s] {'loss': 0.1649, 'grad_norm': 0.46083542704582214, 'learning_rate': 7.801179277202306e-06, 'epoch': 1.14}
38%|███▊ | 4376/11526 [45:36<1:13:14, 1.63it/s] 38%|███▊ | 4377/11526 [45:37<1:13:18, 1.63it/s] {'loss': 0.2059, 'grad_norm': 0.598456621170044, 'learning_rate': 7.79992479355104e-06, 'epoch': 1.14}
38%|███▊ | 4377/11526 [45:37<1:13:18, 1.63it/s] 38%|███▊ | 4378/11526 [45:37<1:13:16, 1.63it/s] {'loss': 0.1934, 'grad_norm': 0.589884340763092, 'learning_rate': 7.798670053074739e-06, 'epoch': 1.14}
38%|███▊ | 4378/11526 [45:37<1:13:16, 1.63it/s] 38%|███▊ | 4379/11526 [45:38<1:13:16, 1.63it/s] {'loss': 0.1793, 'grad_norm': 0.47608327865600586, 'learning_rate': 7.797415055888498e-06, 'epoch': 1.14}
38%|███▊ | 4379/11526 [45:38<1:13:16, 1.63it/s] 38%|███▊ | 4380/11526 [45:38<1:13:12, 1.63it/s] {'loss': 0.2222, 'grad_norm': 0.6111418008804321, 'learning_rate': 7.796159802107432e-06, 'epoch': 1.14}
38%|███▊ | 4380/11526 [45:39<1:13:12, 1.63it/s] 38%|███▊ | 4381/11526 [45:39<1:13:12, 1.63it/s] {'loss': 0.208, 'grad_norm': 0.538481593132019, 'learning_rate': 7.794904291846679e-06, 'epoch': 1.14}
38%|███▊ | 4381/11526 [45:39<1:13:12, 1.63it/s] 38%|███▊ | 4382/11526 [45:40<1:13:33, 1.62it/s] {'loss': 0.2055, 'grad_norm': 0.5985985398292542, 'learning_rate': 7.793648525221403e-06, 'epoch': 1.14}
38%|███▊ | 4382/11526 [45:40<1:13:33, 1.62it/s] 38%|███▊ | 4383/11526 [45:40<1:13:22, 1.62it/s] {'loss': 0.1954, 'grad_norm': 0.5547814965248108, 'learning_rate': 7.792392502346789e-06, 'epoch': 1.14}
38%|███▊ | 4383/11526 [45:40<1:13:22, 1.62it/s] 38%|███▊ | 4384/11526 [45:41<1:13:22, 1.62it/s] {'loss': 0.2029, 'grad_norm': 0.5150478482246399, 'learning_rate': 7.791136223338045e-06, 'epoch': 1.14}
38%|███▊ | 4384/11526 [45:41<1:13:22, 1.62it/s] 38%|███▊ | 4385/11526 [45:42<1:13:16, 1.62it/s] {'loss': 0.1997, 'grad_norm': 0.5503224730491638, 'learning_rate': 7.789879688310409e-06, 'epoch': 1.14}
38%|███▊ | 4385/11526 [45:42<1:13:16, 1.62it/s] 38%|███▊ | 4386/11526 [45:42<1:13:11, 1.63it/s] {'loss': 0.1507, 'grad_norm': 0.6405810117721558, 'learning_rate': 7.78862289737913e-06, 'epoch': 1.14}
38%|███▊ | 4386/11526 [45:42<1:13:11, 1.63it/s] 38%|███▊ | 4387/11526 [45:43<1:13:18, 1.62it/s] {'loss': 0.2372, 'grad_norm': 0.5825710892677307, 'learning_rate': 7.787365850659494e-06, 'epoch': 1.14}
38%|███▊ | 4387/11526 [45:43<1:13:18, 1.62it/s] 38%|███▊ | 4388/11526 [45:43<1:13:16, 1.62it/s] {'loss': 0.2515, 'grad_norm': 0.5498157739639282, 'learning_rate': 7.786108548266803e-06, 'epoch': 1.14}
38%|███▊ | 4388/11526 [45:43<1:13:16, 1.62it/s] 38%|███▊ | 4389/11526 [45:44<1:13:30, 1.62it/s] {'loss': 0.2883, 'grad_norm': 0.5491591095924377, 'learning_rate': 7.784850990316384e-06, 'epoch': 1.14}
38%|███▊ | 4389/11526 [45:44<1:13:30, 1.62it/s] 38%|███▊ | 4390/11526 [45:45<1:13:23, 1.62it/s] {'loss': 0.1456, 'grad_norm': 0.4553103744983673, 'learning_rate': 7.783593176923585e-06, 'epoch': 1.14}
38%|███▊ | 4390/11526 [45:45<1:13:23, 1.62it/s] 38%|███▊ | 4391/11526 [45:45<1:13:20, 1.62it/s] {'loss': 0.2217, 'grad_norm': 0.5242800116539001, 'learning_rate': 7.782335108203784e-06, 'epoch': 1.14}
38%|███▊ | 4391/11526 [45:45<1:13:20, 1.62it/s] 38%|███▊ | 4392/11526 [45:46<1:13:20, 1.62it/s] {'loss': 0.1658, 'grad_norm': 0.48544472455978394, 'learning_rate': 7.781076784272377e-06, 'epoch': 1.14}
38%|███▊ | 4392/11526 [45:46<1:13:20, 1.62it/s] 38%|███▊ | 4393/11526 [45:46<1:13:16, 1.62it/s] {'loss': 0.2355, 'grad_norm': 0.6806809902191162, 'learning_rate': 7.779818205244781e-06, 'epoch': 1.14}
38%|███▊ | 4393/11526 [45:47<1:13:16, 1.62it/s] 38%|███▊ | 4394/11526 [45:47<1:13:17, 1.62it/s] {'loss': 0.2721, 'grad_norm': 0.6232020854949951, 'learning_rate': 7.778559371236445e-06, 'epoch': 1.14}
38%|███▊ | 4394/11526 [45:47<1:13:17, 1.62it/s] 38%|███▊ | 4395/11526 [45:48<1:13:10, 1.62it/s] {'loss': 0.2386, 'grad_norm': 0.6352280974388123, 'learning_rate': 7.777300282362832e-06, 'epoch': 1.14}
38%|███▊ | 4395/11526 [45:48<1:13:10, 1.62it/s] 38%|███▊ | 4396/11526 [45:48<1:13:08, 1.62it/s] {'loss': 0.1469, 'grad_norm': 0.5146787166595459, 'learning_rate': 7.776040938739435e-06, 'epoch': 1.14}
38%|███▊ | 4396/11526 [45:48<1:13:08, 1.62it/s] 38%|███▊ | 4397/11526 [45:49<1:13:23, 1.62it/s] {'loss': 0.2197, 'grad_norm': 0.590212881565094, 'learning_rate': 7.77478134048177e-06, 'epoch': 1.14}
38%|███▊ | 4397/11526 [45:49<1:13:23, 1.62it/s] 38%|███▊ | 4398/11526 [45:50<1:13:15, 1.62it/s] {'loss': 0.1978, 'grad_norm': 0.5137468576431274, 'learning_rate': 7.77352148770537e-06, 'epoch': 1.14}
38%|███▊ | 4398/11526 [45:50<1:13:15, 1.62it/s] 38%|███▊ | 4399/11526 [45:50<1:13:12, 1.62it/s] {'loss': 0.2583, 'grad_norm': 0.6365323066711426, 'learning_rate': 7.772261380525801e-06, 'epoch': 1.14}
38%|███▊ | 4399/11526 [45:50<1:13:12, 1.62it/s] 38%|███▊ | 4400/11526 [45:51<1:13:11, 1.62it/s] {'loss': 0.2358, 'grad_norm': 0.6049301624298096, 'learning_rate': 7.771001019058645e-06, 'epoch': 1.15}
38%|███▊ | 4400/11526 [45:51<1:13:11, 1.62it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.34it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.6102961301803589, 'eval_runtime': 1.9552, 'eval_samples_per_second': 102.294, 'eval_steps_per_second': 6.649, 'epoch': 1.15}
38%|███▊ | 4400/11526 [45:53<1:13:11, 1.62it/s]
38%|███▊ | 4401/11526 [45:53<2:22:56, 1.20s/it] {'loss': 0.2256, 'grad_norm': 0.6101135015487671, 'learning_rate': 7.769740403419506e-06, 'epoch': 1.15}
38%|███▊ | 4401/11526 [45:53<2:22:56, 1.20s/it] 38%|███▊ | 4402/11526 [45:54<2:01:55, 1.03s/it] {'loss': 0.2139, 'grad_norm': 0.5685417056083679, 'learning_rate': 7.76847953372402e-06, 'epoch': 1.15}
38%|███▊ | 4402/11526 [45:54<2:01:55, 1.03s/it] 38%|███▊ | 4403/11526 [45:55<1:47:12, 1.11it/s] {'loss': 0.1739, 'grad_norm': 0.5134730935096741, 'learning_rate': 7.76721841008784e-06, 'epoch': 1.15}
38%|███▊ | 4403/11526 [45:55<1:47:12, 1.11it/s] 38%|███▊ | 4404/11526 [45:55<1:36:56, 1.22it/s] {'loss': 0.2336, 'grad_norm': 0.629422128200531, 'learning_rate': 7.76595703262664e-06, 'epoch': 1.15}
38%|███▊ | 4404/11526 [45:55<1:36:56, 1.22it/s] 38%|███▊ | 4405/11526 [45:56<1:29:44, 1.32it/s] {'loss': 0.2021, 'grad_norm': 0.5555732250213623, 'learning_rate': 7.764695401456126e-06, 'epoch': 1.15}
38%|███▊ | 4405/11526 [45:56<1:29:44, 1.32it/s] 38%|███▊ | 4406/11526 [45:56<1:24:38, 1.40it/s] {'loss': 0.1623, 'grad_norm': 0.4765293598175049, 'learning_rate': 7.763433516692018e-06, 'epoch': 1.15}
38%|███▊ | 4406/11526 [45:57<1:24:38, 1.40it/s] 38%|███▊ | 4407/11526 [45:57<1:21:12, 1.46it/s] {'loss': 0.1843, 'grad_norm': 0.5675995349884033, 'learning_rate': 7.762171378450066e-06, 'epoch': 1.15}
38%|███▊ | 4407/11526 [45:57<1:21:12, 1.46it/s] 38%|███▊ | 4408/11526 [45:58<1:18:40, 1.51it/s] {'loss': 0.1657, 'grad_norm': 0.48701298236846924, 'learning_rate': 7.760908986846036e-06, 'epoch': 1.15}
38%|███▊ | 4408/11526 [45:58<1:18:40, 1.51it/s] 38%|███▊ | 4409/11526 [45:58<1:17:00, 1.54it/s] {'loss': 0.2231, 'grad_norm': 0.521003782749176, 'learning_rate': 7.759646341995728e-06, 'epoch': 1.15}
38%|███▊ | 4409/11526 [45:58<1:17:00, 1.54it/s] 38%|███▊ | 4410/11526 [45:59<1:15:44, 1.57it/s] {'loss': 0.2062, 'grad_norm': 0.517455518245697, 'learning_rate': 7.758383444014954e-06, 'epoch': 1.15}
38%|███▊ | 4410/11526 [45:59<1:15:44, 1.57it/s] 38%|███▊ | 4411/11526 [45:59<1:14:50, 1.58it/s] {'loss': 0.1855, 'grad_norm': 0.5175067186355591, 'learning_rate': 7.757120293019556e-06, 'epoch': 1.15}
38%|███▊ | 4411/11526 [46:00<1:14:50, 1.58it/s] 38%|███▊ | 4412/11526 [46:00<1:14:23, 1.59it/s] {'loss': 0.1948, 'grad_norm': 0.548445999622345, 'learning_rate': 7.755856889125399e-06, 'epoch': 1.15}
38%|███▊ | 4412/11526 [46:00<1:14:23, 1.59it/s] 38%|███▊ | 4413/11526 [46:01<1:13:54, 1.60it/s] {'loss': 0.2039, 'grad_norm': 0.5751815438270569, 'learning_rate': 7.754593232448364e-06, 'epoch': 1.15}
38%|███▊ | 4413/11526 [46:01<1:13:54, 1.60it/s] 38%|███▊ | 4414/11526 [46:01<1:13:40, 1.61it/s] {'loss': 0.189, 'grad_norm': 0.49260884523391724, 'learning_rate': 7.753329323104369e-06, 'epoch': 1.15}
38%|███▊ | 4414/11526 [46:01<1:13:40, 1.61it/s] 38%|███▊ | 4415/11526 [46:02<1:13:25, 1.61it/s] {'loss': 0.2477, 'grad_norm': 0.5531640648841858, 'learning_rate': 7.75206516120934e-06, 'epoch': 1.15}
38%|███▊ | 4415/11526 [46:02<1:13:25, 1.61it/s] 38%|███▊ | 4416/11526 [46:03<1:13:14, 1.62it/s] {'loss': 0.1935, 'grad_norm': 0.5053983330726624, 'learning_rate': 7.750800746879238e-06, 'epoch': 1.15}
38%|███▊ | 4416/11526 [46:03<1:13:14, 1.62it/s] 38%|███▊ | 4417/11526 [46:03<1:13:07, 1.62it/s] {'loss': 0.1607, 'grad_norm': 0.43673595786094666, 'learning_rate': 7.749536080230039e-06, 'epoch': 1.15}
38%|███▊ | 4417/11526 [46:03<1:13:07, 1.62it/s] 38%|███▊ | 4418/11526 [46:04<1:13:02, 1.62it/s] {'loss': 0.2493, 'grad_norm': 0.6401036381721497, 'learning_rate': 7.748271161377748e-06, 'epoch': 1.15}
38%|███▊ | 4418/11526 [46:04<1:13:02, 1.62it/s] 38%|███▊ | 4419/11526 [46:04<1:13:00, 1.62it/s] {'loss': 0.181, 'grad_norm': 0.48167601227760315, 'learning_rate': 7.747005990438386e-06, 'epoch': 1.15}
38%|███▊ | 4419/11526 [46:05<1:13:00, 1.62it/s] 38%|███▊ | 4420/11526 [46:05<1:12:53, 1.62it/s] {'loss': 0.2139, 'grad_norm': 0.5425388813018799, 'learning_rate': 7.745740567528006e-06, 'epoch': 1.15}
38%|███▊ | 4420/11526 [46:05<1:12:53, 1.62it/s] 38%|███▊ | 4421/11526 [46:06<1:12:49, 1.63it/s] {'loss': 0.2311, 'grad_norm': 0.5419411659240723, 'learning_rate': 7.74447489276268e-06, 'epoch': 1.15}
38%|███▊ | 4421/11526 [46:06<1:12:49, 1.63it/s] 38%|███▊ | 4422/11526 [46:06<1:12:55, 1.62it/s] {'loss': 0.2066, 'grad_norm': 0.49792248010635376, 'learning_rate': 7.7432089662585e-06, 'epoch': 1.15}
38%|███▊ | 4422/11526 [46:06<1:12:55, 1.62it/s] 38%|███▊ | 4423/11526 [46:07<1:12:51, 1.62it/s] {'loss': 0.2179, 'grad_norm': 0.47621360421180725, 'learning_rate': 7.741942788131585e-06, 'epoch': 1.15}
38%|███▊ | 4423/11526 [46:07<1:12:51, 1.62it/s] 38%|███▊ | 4424/11526 [46:07<1:12:51, 1.62it/s] {'loss': 0.1733, 'grad_norm': 0.6248130798339844, 'learning_rate': 7.740676358498079e-06, 'epoch': 1.15}
38%|███▊ | 4424/11526 [46:08<1:12:51, 1.62it/s] 38%|███▊ | 4425/11526 [46:08<1:12:48, 1.63it/s] {'loss': 0.2092, 'grad_norm': 0.6022685170173645, 'learning_rate': 7.739409677474141e-06, 'epoch': 1.15}
38%|███▊ | 4425/11526 [46:08<1:12:48, 1.63it/s] 38%|███▊ | 4426/11526 [46:09<1:12:45, 1.63it/s] {'loss': 0.2084, 'grad_norm': 0.5190178155899048, 'learning_rate': 7.738142745175961e-06, 'epoch': 1.15}
38%|███▊ | 4426/11526 [46:09<1:12:45, 1.63it/s] 38%|███▊ | 4427/11526 [46:09<1:12:47, 1.63it/s] {'loss': 0.2479, 'grad_norm': 0.6280694007873535, 'learning_rate': 7.73687556171975e-06, 'epoch': 1.15}
38%|███▊ | 4427/11526 [46:09<1:12:47, 1.63it/s] 38%|███▊ | 4428/11526 [46:10<1:12:49, 1.62it/s] {'loss': 0.1736, 'grad_norm': 0.45959100127220154, 'learning_rate': 7.73560812722174e-06, 'epoch': 1.15}
38%|███▊ | 4428/11526 [46:10<1:12:49, 1.62it/s] 38%|███▊ | 4429/11526 [46:11<1:12:49, 1.62it/s] {'loss': 0.1861, 'grad_norm': 0.5587545037269592, 'learning_rate': 7.734340441798187e-06, 'epoch': 1.15}
38%|███▊ | 4429/11526 [46:11<1:12:49, 1.62it/s] 38%|███▊ | 4430/11526 [46:11<1:12:47, 1.62it/s] {'loss': 0.2633, 'grad_norm': 0.5651487112045288, 'learning_rate': 7.733072505565371e-06, 'epoch': 1.15}
38%|███▊ | 4430/11526 [46:11<1:12:47, 1.62it/s] 38%|███▊ | 4431/11526 [46:12<1:12:44, 1.63it/s] {'loss': 0.2291, 'grad_norm': 0.5376551747322083, 'learning_rate': 7.731804318639595e-06, 'epoch': 1.15}
38%|███▊ | 4431/11526 [46:12<1:12:44, 1.63it/s] 38%|███▊ | 4432/11526 [46:12<1:12:43, 1.63it/s] {'loss': 0.216, 'grad_norm': 0.6013729572296143, 'learning_rate': 7.730535881137183e-06, 'epoch': 1.15}
38%|███▊ | 4432/11526 [46:13<1:12:43, 1.63it/s] 38%|███▊ | 4433/11526 [46:13<1:12:42, 1.63it/s] {'loss': 0.204, 'grad_norm': 0.5180474519729614, 'learning_rate': 7.729267193174483e-06, 'epoch': 1.15}
38%|███▊ | 4433/11526 [46:13<1:12:42, 1.63it/s] 38%|███▊ | 4434/11526 [46:14<1:12:44, 1.62it/s] {'loss': 0.1704, 'grad_norm': 0.46633967757225037, 'learning_rate': 7.72799825486787e-06, 'epoch': 1.15}
38%|███▊ | 4434/11526 [46:14<1:12:44, 1.62it/s] 38%|███▊ | 4435/11526 [46:14<1:12:41, 1.63it/s] {'loss': 0.2026, 'grad_norm': 0.48473095893859863, 'learning_rate': 7.726729066333733e-06, 'epoch': 1.15}
38%|███▊ | 4435/11526 [46:14<1:12:41, 1.63it/s] 38%|███▊ | 4436/11526 [46:15<1:12:38, 1.63it/s] {'loss': 0.1906, 'grad_norm': 0.5915017127990723, 'learning_rate': 7.725459627688492e-06, 'epoch': 1.15}
38%|███▊ | 4436/11526 [46:15<1:12:38, 1.63it/s] 38%|███▊ | 4437/11526 [46:15<1:12:39, 1.63it/s] {'loss': 0.1922, 'grad_norm': 0.5080402493476868, 'learning_rate': 7.724189939048587e-06, 'epoch': 1.15}
38%|███▊ | 4437/11526 [46:16<1:12:39, 1.63it/s] 39%|███▊ | 4438/11526 [46:16<1:12:35, 1.63it/s] {'loss': 0.2155, 'grad_norm': 0.5094831585884094, 'learning_rate': 7.72292000053048e-06, 'epoch': 1.16}
39%|███▊ | 4438/11526 [46:16<1:12:35, 1.63it/s] 39%|███▊ | 4439/11526 [46:17<1:12:33, 1.63it/s] {'loss': 0.1887, 'grad_norm': 0.5044370293617249, 'learning_rate': 7.721649812250659e-06, 'epoch': 1.16}
39%|███▊ | 4439/11526 [46:17<1:12:33, 1.63it/s] 39%|███▊ | 4440/11526 [46:17<1:12:31, 1.63it/s] {'loss': 0.2071, 'grad_norm': 0.5817594528198242, 'learning_rate': 7.72037937432563e-06, 'epoch': 1.16}
39%|███▊ | 4440/11526 [46:17<1:12:31, 1.63it/s] 39%|███▊ | 4441/11526 [46:18<1:12:31, 1.63it/s] {'loss': 0.2069, 'grad_norm': 0.6018556356430054, 'learning_rate': 7.71910868687193e-06, 'epoch': 1.16}
39%|███▊ | 4441/11526 [46:18<1:12:31, 1.63it/s] 39%|███▊ | 4442/11526 [46:19<1:12:34, 1.63it/s] {'loss': 0.1936, 'grad_norm': 0.549115777015686, 'learning_rate': 7.717837750006106e-06, 'epoch': 1.16}
39%|███▊ | 4442/11526 [46:19<1:12:34, 1.63it/s] 39%|███▊ | 4443/11526 [46:19<1:12:32, 1.63it/s] {'loss': 0.222, 'grad_norm': 0.5688595771789551, 'learning_rate': 7.716566563844742e-06, 'epoch': 1.16}
39%|███▊ | 4443/11526 [46:19<1:12:32, 1.63it/s] 39%|███▊ | 4444/11526 [46:20<1:12:30, 1.63it/s] {'loss': 0.2073, 'grad_norm': 0.5217764377593994, 'learning_rate': 7.715295128504436e-06, 'epoch': 1.16}
39%|███▊ | 4444/11526 [46:20<1:12:30, 1.63it/s] 39%|███▊ | 4445/11526 [46:20<1:12:27, 1.63it/s] {'loss': 0.2065, 'grad_norm': 0.4953294098377228, 'learning_rate': 7.714023444101811e-06, 'epoch': 1.16}
39%|███▊ | 4445/11526 [46:21<1:12:27, 1.63it/s] 39%|███▊ | 4446/11526 [46:21<1:12:28, 1.63it/s] {'loss': 0.2527, 'grad_norm': 0.6007221341133118, 'learning_rate': 7.712751510753513e-06, 'epoch': 1.16}
39%|███▊ | 4446/11526 [46:21<1:12:28, 1.63it/s] 39%|███▊ | 4447/11526 [46:22<1:12:34, 1.63it/s] {'loss': 0.1869, 'grad_norm': 0.5421549677848816, 'learning_rate': 7.711479328576213e-06, 'epoch': 1.16}
39%|███▊ | 4447/11526 [46:22<1:12:34, 1.63it/s] 39%|███▊ | 4448/11526 [46:22<1:12:31, 1.63it/s] {'loss': 0.2077, 'grad_norm': 0.5417150259017944, 'learning_rate': 7.7102068976866e-06, 'epoch': 1.16}
39%|███▊ | 4448/11526 [46:22<1:12:31, 1.63it/s] 39%|███▊ | 4449/11526 [46:23<1:12:28, 1.63it/s] {'loss': 0.2213, 'grad_norm': 0.5477840304374695, 'learning_rate': 7.708934218201389e-06, 'epoch': 1.16}
39%|███▊ | 4449/11526 [46:23<1:12:28, 1.63it/s] 39%|███▊ | 4450/11526 [46:23<1:12:25, 1.63it/s] {'loss': 0.2541, 'grad_norm': 0.6906265020370483, 'learning_rate': 7.70766129023732e-06, 'epoch': 1.16}
39%|███▊ | 4450/11526 [46:24<1:12:25, 1.63it/s] 39%|███▊ | 4451/11526 [46:24<1:12:25, 1.63it/s] {'loss': 0.2079, 'grad_norm': 0.46381431818008423, 'learning_rate': 7.706388113911152e-06, 'epoch': 1.16}
39%|███▊ | 4451/11526 [46:24<1:12:25, 1.63it/s] 39%|███▊ | 4452/11526 [46:25<1:12:30, 1.63it/s] {'loss': 0.2366, 'grad_norm': 0.5638887286186218, 'learning_rate': 7.705114689339667e-06, 'epoch': 1.16}
39%|███▊ | 4452/11526 [46:25<1:12:30, 1.63it/s] 39%|███▊ | 4453/11526 [46:25<1:12:31, 1.63it/s] {'loss': 0.2065, 'grad_norm': 0.5751916170120239, 'learning_rate': 7.70384101663967e-06, 'epoch': 1.16}
39%|███▊ | 4453/11526 [46:25<1:12:31, 1.63it/s] 39%|███▊ | 4454/11526 [46:26<1:12:30, 1.63it/s] {'loss': 0.1867, 'grad_norm': 0.622690737247467, 'learning_rate': 7.702567095927994e-06, 'epoch': 1.16}
39%|███▊ | 4454/11526 [46:26<1:12:30, 1.63it/s] 39%|███▊ | 4455/11526 [46:27<1:12:28, 1.63it/s] {'loss': 0.2251, 'grad_norm': 0.8544338941574097, 'learning_rate': 7.701292927321483e-06, 'epoch': 1.16}
39%|███▊ | 4455/11526 [46:27<1:12:28, 1.63it/s] 39%|███▊ | 4456/11526 [46:27<1:12:26, 1.63it/s] {'loss': 0.1682, 'grad_norm': 0.42498287558555603, 'learning_rate': 7.700018510937015e-06, 'epoch': 1.16}
39%|███▊ | 4456/11526 [46:27<1:12:26, 1.63it/s] 39%|███▊ | 4457/11526 [46:28<1:12:27, 1.63it/s] {'loss': 0.2023, 'grad_norm': 0.5167121291160583, 'learning_rate': 7.698743846891488e-06, 'epoch': 1.16}
39%|███▊ | 4457/11526 [46:28<1:12:27, 1.63it/s] 39%|███▊ | 4458/11526 [46:28<1:12:26, 1.63it/s] {'loss': 0.1933, 'grad_norm': 0.4349765479564667, 'learning_rate': 7.697468935301822e-06, 'epoch': 1.16}
39%|███▊ | 4458/11526 [46:29<1:12:26, 1.63it/s] 39%|███▊ | 4459/11526 [46:29<1:12:32, 1.62it/s] {'loss': 0.1996, 'grad_norm': 0.6068305969238281, 'learning_rate': 7.696193776284954e-06, 'epoch': 1.16}
39%|███▊ | 4459/11526 [46:29<1:12:32, 1.62it/s] 39%|███▊ | 4460/11526 [46:30<1:12:27, 1.63it/s] {'loss': 0.1983, 'grad_norm': 0.5934961438179016, 'learning_rate': 7.694918369957856e-06, 'epoch': 1.16}
39%|███▊ | 4460/11526 [46:30<1:12:27, 1.63it/s] 39%|███▊ | 4461/11526 [46:30<1:12:21, 1.63it/s] {'loss': 0.2029, 'grad_norm': 0.5189311504364014, 'learning_rate': 7.693642716437508e-06, 'epoch': 1.16}
39%|███▊ | 4461/11526 [46:30<1:12:21, 1.63it/s] 39%|███▊ | 4462/11526 [46:31<1:12:50, 1.62it/s] {'loss': 0.1689, 'grad_norm': 0.4774225354194641, 'learning_rate': 7.692366815840927e-06, 'epoch': 1.16}
39%|███▊ | 4462/11526 [46:31<1:12:50, 1.62it/s] 39%|███▊ | 4463/11526 [46:31<1:12:39, 1.62it/s] {'loss': 0.2032, 'grad_norm': 0.5201148390769958, 'learning_rate': 7.691090668285139e-06, 'epoch': 1.16}
39%|███▊ | 4463/11526 [46:32<1:12:39, 1.62it/s] 39%|███▊ | 4464/11526 [46:32<1:12:37, 1.62it/s] {'loss': 0.3163, 'grad_norm': 0.7469417452812195, 'learning_rate': 7.689814273887206e-06, 'epoch': 1.16}
39%|███▊ | 4464/11526 [46:32<1:12:37, 1.62it/s] 39%|███▊ | 4465/11526 [46:33<1:12:31, 1.62it/s] {'loss': 0.2027, 'grad_norm': 0.549043595790863, 'learning_rate': 7.688537632764204e-06, 'epoch': 1.16}
39%|███▊ | 4465/11526 [46:33<1:12:31, 1.62it/s] 39%|███▊ | 4466/11526 [46:33<1:12:26, 1.62it/s] {'loss': 0.1722, 'grad_norm': 0.542693555355072, 'learning_rate': 7.68726074503323e-06, 'epoch': 1.16}
39%|███▊ | 4466/11526 [46:33<1:12:26, 1.62it/s] 39%|███▉ | 4467/11526 [46:34<1:12:34, 1.62it/s] {'loss': 0.2526, 'grad_norm': 0.578074038028717, 'learning_rate': 7.685983610811412e-06, 'epoch': 1.16}
39%|███▉ | 4467/11526 [46:34<1:12:34, 1.62it/s] 39%|███▉ | 4468/11526 [46:35<1:12:26, 1.62it/s] {'loss': 0.2496, 'grad_norm': 0.6156484484672546, 'learning_rate': 7.684706230215895e-06, 'epoch': 1.16}
39%|███▉ | 4468/11526 [46:35<1:12:26, 1.62it/s] 39%|███▉ | 4469/11526 [46:35<1:12:24, 1.62it/s] {'loss': 0.204, 'grad_norm': 0.5596733093261719, 'learning_rate': 7.683428603363849e-06, 'epoch': 1.16}
39%|███▉ | 4469/11526 [46:35<1:12:24, 1.62it/s] 39%|███▉ | 4470/11526 [46:36<1:12:21, 1.63it/s] {'loss': 0.1908, 'grad_norm': 0.5375909209251404, 'learning_rate': 7.68215073037246e-06, 'epoch': 1.16}
39%|███▉ | 4470/11526 [46:36<1:12:21, 1.63it/s] 39%|███▉ | 4471/11526 [46:36<1:12:17, 1.63it/s] {'loss': 0.2958, 'grad_norm': 0.7892978191375732, 'learning_rate': 7.680872611358947e-06, 'epoch': 1.16}
39%|███▉ | 4471/11526 [46:37<1:12:17, 1.63it/s] 39%|███▉ | 4472/11526 [46:37<1:12:15, 1.63it/s] {'loss': 0.2175, 'grad_norm': 0.6251571178436279, 'learning_rate': 7.679594246440545e-06, 'epoch': 1.16}
39%|███▉ | 4472/11526 [46:37<1:12:15, 1.63it/s] 39%|███▉ | 4473/11526 [46:38<1:12:17, 1.63it/s] {'loss': 0.243, 'grad_norm': 0.6215718388557434, 'learning_rate': 7.678315635734512e-06, 'epoch': 1.16}
39%|███▉ | 4473/11526 [46:38<1:12:17, 1.63it/s] 39%|███▉ | 4474/11526 [46:38<1:12:22, 1.62it/s] {'loss': 0.2322, 'grad_norm': 0.583587110042572, 'learning_rate': 7.67703677935813e-06, 'epoch': 1.16}
39%|███▉ | 4474/11526 [46:38<1:12:22, 1.62it/s] 39%|███▉ | 4475/11526 [46:39<1:12:16, 1.63it/s] {'loss': 0.2588, 'grad_norm': 0.5418509244918823, 'learning_rate': 7.675757677428702e-06, 'epoch': 1.16}
39%|███▉ | 4475/11526 [46:39<1:12:16, 1.63it/s] 39%|███▉ | 4476/11526 [46:39<1:12:15, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.43382394313812256, 'learning_rate': 7.674478330063558e-06, 'epoch': 1.17}
39%|███▉ | 4476/11526 [46:40<1:12:15, 1.63it/s] 39%|███▉ | 4477/11526 [46:40<1:12:19, 1.62it/s] {'loss': 0.3199, 'grad_norm': 0.7943550944328308, 'learning_rate': 7.67319873738004e-06, 'epoch': 1.17}
39%|███▉ | 4477/11526 [46:40<1:12:19, 1.62it/s] 39%|███▉ | 4478/11526 [46:41<1:12:15, 1.63it/s] {'loss': 0.2226, 'grad_norm': 0.5431994795799255, 'learning_rate': 7.671918899495528e-06, 'epoch': 1.17}
39%|███▉ | 4478/11526 [46:41<1:12:15, 1.63it/s] 39%|███▉ | 4479/11526 [46:41<1:12:18, 1.62it/s] {'loss': 0.1877, 'grad_norm': 0.5356970429420471, 'learning_rate': 7.67063881652741e-06, 'epoch': 1.17}
39%|███▉ | 4479/11526 [46:41<1:12:18, 1.62it/s] 39%|███▉ | 4480/11526 [46:42<1:12:15, 1.63it/s] {'loss': 0.21, 'grad_norm': 0.5279682874679565, 'learning_rate': 7.669358488593103e-06, 'epoch': 1.17}
39%|███▉ | 4480/11526 [46:42<1:12:15, 1.63it/s] 39%|███▉ | 4481/11526 [46:43<1:12:11, 1.63it/s] {'loss': 0.2238, 'grad_norm': 0.5548470616340637, 'learning_rate': 7.66807791581005e-06, 'epoch': 1.17}
39%|███▉ | 4481/11526 [46:43<1:12:11, 1.63it/s] 39%|███▉ | 4482/11526 [46:43<1:12:10, 1.63it/s] {'loss': 0.2556, 'grad_norm': 0.50179523229599, 'learning_rate': 7.666797098295711e-06, 'epoch': 1.17}
39%|███▉ | 4482/11526 [46:43<1:12:10, 1.63it/s] 39%|███▉ | 4483/11526 [46:44<1:12:09, 1.63it/s] {'loss': 0.215, 'grad_norm': 0.5194060206413269, 'learning_rate': 7.665516036167567e-06, 'epoch': 1.17}
39%|███▉ | 4483/11526 [46:44<1:12:09, 1.63it/s] 39%|███▉ | 4484/11526 [46:44<1:12:11, 1.63it/s] {'loss': 0.2074, 'grad_norm': 0.5486116409301758, 'learning_rate': 7.664234729543125e-06, 'epoch': 1.17}
39%|███▉ | 4484/11526 [46:45<1:12:11, 1.63it/s] 39%|███▉ | 4485/11526 [46:45<1:12:10, 1.63it/s] {'loss': 0.3136, 'grad_norm': 0.6642184257507324, 'learning_rate': 7.662953178539914e-06, 'epoch': 1.17}
39%|███▉ | 4485/11526 [46:45<1:12:10, 1.63it/s] 39%|███▉ | 4486/11526 [46:46<1:12:07, 1.63it/s] {'loss': 0.2372, 'grad_norm': 0.579173743724823, 'learning_rate': 7.661671383275489e-06, 'epoch': 1.17}
39%|███▉ | 4486/11526 [46:46<1:12:07, 1.63it/s] 39%|███▉ | 4487/11526 [46:46<1:12:10, 1.63it/s] {'loss': 0.2121, 'grad_norm': 0.5925828814506531, 'learning_rate': 7.660389343867418e-06, 'epoch': 1.17}
39%|███▉ | 4487/11526 [46:46<1:12:10, 1.63it/s] 39%|███▉ | 4488/11526 [46:47<1:12:02, 1.63it/s] {'loss': 0.2541, 'grad_norm': 0.5835579633712769, 'learning_rate': 7.659107060433299e-06, 'epoch': 1.17}
39%|███▉ | 4488/11526 [46:47<1:12:02, 1.63it/s] 39%|███▉ | 4489/11526 [46:47<1:12:05, 1.63it/s] {'loss': 0.2396, 'grad_norm': 0.5781698822975159, 'learning_rate': 7.657824533090753e-06, 'epoch': 1.17}
39%|███▉ | 4489/11526 [46:48<1:12:05, 1.63it/s] 39%|███▉ | 4490/11526 [46:48<1:12:04, 1.63it/s] {'loss': 0.2199, 'grad_norm': 0.5665683150291443, 'learning_rate': 7.656541761957416e-06, 'epoch': 1.17}
39%|███▉ | 4490/11526 [46:48<1:12:04, 1.63it/s] 39%|███▉ | 4491/11526 [46:49<1:12:00, 1.63it/s] {'loss': 0.2197, 'grad_norm': 0.51336669921875, 'learning_rate': 7.655258747150955e-06, 'epoch': 1.17}
39%|███▉ | 4491/11526 [46:49<1:12:00, 1.63it/s] 39%|███▉ | 4492/11526 [46:49<1:12:02, 1.63it/s] {'loss': 0.256, 'grad_norm': 0.5954098701477051, 'learning_rate': 7.653975488789054e-06, 'epoch': 1.17}
39%|███▉ | 4492/11526 [46:49<1:12:02, 1.63it/s] 39%|███▉ | 4493/11526 [46:50<1:12:06, 1.63it/s] {'loss': 0.2464, 'grad_norm': 0.5860292911529541, 'learning_rate': 7.652691986989422e-06, 'epoch': 1.17}
39%|███▉ | 4493/11526 [46:50<1:12:06, 1.63it/s] 39%|███▉ | 4494/11526 [46:51<1:12:04, 1.63it/s] {'loss': 0.2134, 'grad_norm': 0.573808491230011, 'learning_rate': 7.651408241869785e-06, 'epoch': 1.17}
39%|███▉ | 4494/11526 [46:51<1:12:04, 1.63it/s] 39%|███▉ | 4495/11526 [46:51<1:12:00, 1.63it/s] {'loss': 0.1817, 'grad_norm': 0.5015069842338562, 'learning_rate': 7.650124253547899e-06, 'epoch': 1.17}
39%|███▉ | 4495/11526 [46:51<1:12:00, 1.63it/s] 39%|███▉ | 4496/11526 [46:52<1:12:00, 1.63it/s] {'loss': 0.2183, 'grad_norm': 0.6096965074539185, 'learning_rate': 7.64884002214154e-06, 'epoch': 1.17}
39%|███▉ | 4496/11526 [46:52<1:12:00, 1.63it/s] 39%|███▉ | 4497/11526 [46:52<1:11:58, 1.63it/s] {'loss': 0.1752, 'grad_norm': 0.48319947719573975, 'learning_rate': 7.647555547768499e-06, 'epoch': 1.17}
39%|███▉ | 4497/11526 [46:53<1:11:58, 1.63it/s] 39%|███▉ | 4498/11526 [46:53<1:11:57, 1.63it/s] {'loss': 0.3103, 'grad_norm': 0.5918754935264587, 'learning_rate': 7.6462708305466e-06, 'epoch': 1.17}
39%|███▉ | 4498/11526 [46:53<1:11:57, 1.63it/s] 39%|███▉ | 4499/11526 [46:54<1:11:56, 1.63it/s] {'loss': 0.2189, 'grad_norm': 0.5103096961975098, 'learning_rate': 7.644985870593687e-06, 'epoch': 1.17}
39%|███▉ | 4499/11526 [46:54<1:11:56, 1.63it/s] 39%|███▉ | 4500/11526 [46:54<1:11:54, 1.63it/s] {'loss': 0.1684, 'grad_norm': 0.5003828406333923, 'learning_rate': 7.643700668027619e-06, 'epoch': 1.17}
39%|███▉ | 4500/11526 [46:54<1:11:54, 1.63it/s] 39%|███▉ | 4501/11526 [46:55<1:11:57, 1.63it/s] {'loss': 0.198, 'grad_norm': 0.5304433703422546, 'learning_rate': 7.642415222966283e-06, 'epoch': 1.17}
39%|███▉ | 4501/11526 [46:55<1:11:57, 1.63it/s] 39%|███▉ | 4502/11526 [46:55<1:11:57, 1.63it/s] {'loss': 0.1961, 'grad_norm': 0.5054289102554321, 'learning_rate': 7.641129535527587e-06, 'epoch': 1.17}
39%|███▉ | 4502/11526 [46:56<1:11:57, 1.63it/s] 39%|███▉ | 4503/11526 [46:56<1:11:56, 1.63it/s] {'loss': 0.1828, 'grad_norm': 0.5320274829864502, 'learning_rate': 7.639843605829465e-06, 'epoch': 1.17}
39%|███▉ | 4503/11526 [46:56<1:11:56, 1.63it/s] 39%|███▉ | 4504/11526 [46:57<1:11:56, 1.63it/s] {'loss': 0.2057, 'grad_norm': 0.523008406162262, 'learning_rate': 7.638557433989866e-06, 'epoch': 1.17}
39%|███▉ | 4504/11526 [46:57<1:11:56, 1.63it/s] 39%|███▉ | 4505/11526 [46:57<1:11:54, 1.63it/s] {'loss': 0.2472, 'grad_norm': 0.5716488361358643, 'learning_rate': 7.637271020126766e-06, 'epoch': 1.17}
39%|███▉ | 4505/11526 [46:57<1:11:54, 1.63it/s] 39%|███▉ | 4506/11526 [46:58<1:11:52, 1.63it/s] {'loss': 0.1692, 'grad_norm': 0.5232903957366943, 'learning_rate': 7.635984364358165e-06, 'epoch': 1.17}
39%|███▉ | 4506/11526 [46:58<1:11:52, 1.63it/s] 39%|███▉ | 4507/11526 [46:59<1:12:01, 1.62it/s] {'loss': 0.1805, 'grad_norm': 0.5197172164916992, 'learning_rate': 7.634697466802078e-06, 'epoch': 1.17}
39%|███▉ | 4507/11526 [46:59<1:12:01, 1.62it/s] 39%|███▉ | 4508/11526 [46:59<1:11:58, 1.63it/s] {'loss': 0.2213, 'grad_norm': 0.5484488606452942, 'learning_rate': 7.63341032757655e-06, 'epoch': 1.17}
39%|███▉ | 4508/11526 [46:59<1:11:58, 1.63it/s] 39%|███▉ | 4509/11526 [47:00<1:11:55, 1.63it/s] {'loss': 0.2264, 'grad_norm': 0.6355420351028442, 'learning_rate': 7.63212294679964e-06, 'epoch': 1.17}
39%|███▉ | 4509/11526 [47:00<1:11:55, 1.63it/s] 39%|███▉ | 4510/11526 [47:00<1:11:53, 1.63it/s] {'loss': 0.2082, 'grad_norm': 0.5491510629653931, 'learning_rate': 7.630835324589441e-06, 'epoch': 1.17}
39%|███▉ | 4510/11526 [47:01<1:11:53, 1.63it/s] 39%|███▉ | 4511/11526 [47:01<1:11:52, 1.63it/s] {'loss': 0.2281, 'grad_norm': 0.5669065117835999, 'learning_rate': 7.629547461064054e-06, 'epoch': 1.17}
39%|███▉ | 4511/11526 [47:01<1:11:52, 1.63it/s] 39%|███▉ | 4512/11526 [47:02<1:11:58, 1.62it/s] {'loss': 0.2748, 'grad_norm': 0.6554185152053833, 'learning_rate': 7.628259356341614e-06, 'epoch': 1.17}
39%|███▉ | 4512/11526 [47:02<1:11:58, 1.62it/s] 39%|███▉ | 4513/11526 [47:02<1:11:56, 1.62it/s] {'loss': 0.2341, 'grad_norm': 0.584840714931488, 'learning_rate': 7.626971010540272e-06, 'epoch': 1.17}
39%|███▉ | 4513/11526 [47:02<1:11:56, 1.62it/s] 39%|███▉ | 4514/11526 [47:03<1:11:54, 1.63it/s] {'loss': 0.2178, 'grad_norm': 0.5673198699951172, 'learning_rate': 7.625682423778201e-06, 'epoch': 1.17}
39%|███▉ | 4514/11526 [47:03<1:11:54, 1.63it/s] 39%|███▉ | 4515/11526 [47:03<1:11:50, 1.63it/s] {'loss': 0.227, 'grad_norm': 0.707813024520874, 'learning_rate': 7.624393596173598e-06, 'epoch': 1.18}
39%|███▉ | 4515/11526 [47:04<1:11:50, 1.63it/s] 39%|███▉ | 4516/11526 [47:04<1:11:50, 1.63it/s] {'loss': 0.2698, 'grad_norm': 0.5317213535308838, 'learning_rate': 7.623104527844685e-06, 'epoch': 1.18}
39%|███▉ | 4516/11526 [47:04<1:11:50, 1.63it/s] 39%|███▉ | 4517/11526 [47:05<1:11:52, 1.63it/s] {'loss': 0.2059, 'grad_norm': 0.5634903311729431, 'learning_rate': 7.621815218909696e-06, 'epoch': 1.18}
39%|███▉ | 4517/11526 [47:05<1:11:52, 1.63it/s] 39%|███▉ | 4518/11526 [47:05<1:11:51, 1.63it/s] {'loss': 0.2261, 'grad_norm': 0.5466993451118469, 'learning_rate': 7.620525669486902e-06, 'epoch': 1.18}
39%|███▉ | 4518/11526 [47:05<1:11:51, 1.63it/s] 39%|███▉ | 4519/11526 [47:06<1:11:53, 1.62it/s] {'loss': 0.1659, 'grad_norm': 0.49332791566848755, 'learning_rate': 7.61923587969458e-06, 'epoch': 1.18}
39%|███▉ | 4519/11526 [47:06<1:11:53, 1.62it/s] 39%|███▉ | 4520/11526 [47:07<1:11:49, 1.63it/s] {'loss': 0.199, 'grad_norm': 0.53328537940979, 'learning_rate': 7.617945849651042e-06, 'epoch': 1.18}
39%|███▉ | 4520/11526 [47:07<1:11:49, 1.63it/s] 39%|███▉ | 4521/11526 [47:07<1:11:46, 1.63it/s] {'loss': 0.2201, 'grad_norm': 0.5902628898620605, 'learning_rate': 7.616655579474615e-06, 'epoch': 1.18}
39%|███▉ | 4521/11526 [47:07<1:11:46, 1.63it/s] 39%|███▉ | 4522/11526 [47:08<1:11:45, 1.63it/s] {'loss': 0.2312, 'grad_norm': 0.6282020807266235, 'learning_rate': 7.615365069283649e-06, 'epoch': 1.18}
39%|███▉ | 4522/11526 [47:08<1:11:45, 1.63it/s] 39%|███▉ | 4523/11526 [47:08<1:11:43, 1.63it/s] {'loss': 0.1512, 'grad_norm': 0.4179324805736542, 'learning_rate': 7.6140743191965195e-06, 'epoch': 1.18}
39%|███▉ | 4523/11526 [47:09<1:11:43, 1.63it/s] 39%|███▉ | 4524/11526 [47:09<1:11:50, 1.62it/s] {'loss': 0.198, 'grad_norm': 0.5151168704032898, 'learning_rate': 7.612783329331619e-06, 'epoch': 1.18}
39%|███▉ | 4524/11526 [47:09<1:11:50, 1.62it/s] 39%|███▉ | 4525/11526 [47:10<1:11:45, 1.63it/s] {'loss': 0.2108, 'grad_norm': 0.5648974180221558, 'learning_rate': 7.611492099807365e-06, 'epoch': 1.18}
39%|███▉ | 4525/11526 [47:10<1:11:45, 1.63it/s] 39%|███▉ | 4526/11526 [47:10<1:11:45, 1.63it/s] {'loss': 0.216, 'grad_norm': 0.529696524143219, 'learning_rate': 7.610200630742197e-06, 'epoch': 1.18}
39%|███▉ | 4526/11526 [47:10<1:11:45, 1.63it/s] 39%|███▉ | 4527/11526 [47:11<1:11:46, 1.63it/s] {'loss': 0.1984, 'grad_norm': 0.5576267838478088, 'learning_rate': 7.608908922254577e-06, 'epoch': 1.18}
39%|███▉ | 4527/11526 [47:11<1:11:46, 1.63it/s] 39%|███▉ | 4528/11526 [47:11<1:11:43, 1.63it/s] {'loss': 0.2121, 'grad_norm': 0.46824896335601807, 'learning_rate': 7.607616974462985e-06, 'epoch': 1.18}
39%|███▉ | 4528/11526 [47:12<1:11:43, 1.63it/s] 39%|███▉ | 4529/11526 [47:12<1:11:45, 1.63it/s] {'loss': 0.1786, 'grad_norm': 0.5035425424575806, 'learning_rate': 7.606324787485928e-06, 'epoch': 1.18}
39%|███▉ | 4529/11526 [47:12<1:11:45, 1.63it/s] 39%|███▉ | 4530/11526 [47:13<1:11:45, 1.62it/s] {'loss': 0.2425, 'grad_norm': 0.5742655396461487, 'learning_rate': 7.605032361441933e-06, 'epoch': 1.18}
39%|███▉ | 4530/11526 [47:13<1:11:45, 1.62it/s] 39%|███▉ | 4531/11526 [47:13<1:11:42, 1.63it/s] {'loss': 0.1747, 'grad_norm': 0.4694085121154785, 'learning_rate': 7.603739696449547e-06, 'epoch': 1.18}
39%|███▉ | 4531/11526 [47:13<1:11:42, 1.63it/s] 39%|███▉ | 4532/11526 [47:14<1:11:45, 1.62it/s] {'loss': 0.2071, 'grad_norm': 0.5539829134941101, 'learning_rate': 7.60244679262734e-06, 'epoch': 1.18}
39%|███▉ | 4532/11526 [47:14<1:11:45, 1.62it/s] 39%|███▉ | 4533/11526 [47:15<1:11:41, 1.63it/s] {'loss': 0.2223, 'grad_norm': 0.5098680257797241, 'learning_rate': 7.601153650093908e-06, 'epoch': 1.18}
39%|███▉ | 4533/11526 [47:15<1:11:41, 1.63it/s] 39%|███▉ | 4534/11526 [47:15<1:11:43, 1.62it/s] {'loss': 0.3388, 'grad_norm': 0.5842590928077698, 'learning_rate': 7.599860268967863e-06, 'epoch': 1.18}
39%|███▉ | 4534/11526 [47:15<1:11:43, 1.62it/s] 39%|███▉ | 4535/11526 [47:16<1:11:36, 1.63it/s] {'loss': 0.2335, 'grad_norm': 0.60504150390625, 'learning_rate': 7.598566649367841e-06, 'epoch': 1.18}
39%|███▉ | 4535/11526 [47:16<1:11:36, 1.63it/s] 39%|███▉ | 4536/11526 [47:16<1:11:36, 1.63it/s] {'loss': 0.1852, 'grad_norm': 0.6234104633331299, 'learning_rate': 7.5972727914125e-06, 'epoch': 1.18}
39%|███▉ | 4536/11526 [47:17<1:11:36, 1.63it/s] 39%|███▉ | 4537/11526 [47:17<1:11:39, 1.63it/s] {'loss': 0.2269, 'grad_norm': 0.5558241009712219, 'learning_rate': 7.595978695220522e-06, 'epoch': 1.18}
39%|███▉ | 4537/11526 [47:17<1:11:39, 1.63it/s] 39%|███▉ | 4538/11526 [47:18<1:11:37, 1.63it/s] {'loss': 0.2076, 'grad_norm': 0.510050892829895, 'learning_rate': 7.5946843609106065e-06, 'epoch': 1.18}
39%|███▉ | 4538/11526 [47:18<1:11:37, 1.63it/s] 39%|███▉ | 4539/11526 [47:18<1:11:52, 1.62it/s] {'loss': 0.1476, 'grad_norm': 0.48276621103286743, 'learning_rate': 7.59338978860148e-06, 'epoch': 1.18}
39%|███▉ | 4539/11526 [47:18<1:11:52, 1.62it/s] 39%|███▉ | 4540/11526 [47:19<1:11:44, 1.62it/s] {'loss': 0.2347, 'grad_norm': 0.5723041296005249, 'learning_rate': 7.592094978411883e-06, 'epoch': 1.18}
39%|███▉ | 4540/11526 [47:19<1:11:44, 1.62it/s] 39%|███▉ | 4541/11526 [47:19<1:11:39, 1.62it/s] {'loss': 0.2712, 'grad_norm': 0.5870642066001892, 'learning_rate': 7.590799930460591e-06, 'epoch': 1.18}
39%|███▉ | 4541/11526 [47:20<1:11:39, 1.62it/s] 39%|███▉ | 4542/11526 [47:20<1:11:41, 1.62it/s] {'loss': 0.2, 'grad_norm': 0.5147961974143982, 'learning_rate': 7.5895046448663845e-06, 'epoch': 1.18}
39%|███▉ | 4542/11526 [47:20<1:11:41, 1.62it/s] 39%|███▉ | 4543/11526 [47:21<1:11:36, 1.63it/s] {'loss': 0.212, 'grad_norm': 0.5349381566047668, 'learning_rate': 7.588209121748079e-06, 'epoch': 1.18}
39%|███▉ | 4543/11526 [47:21<1:11:36, 1.63it/s] 39%|███▉ | 4544/11526 [47:21<1:11:39, 1.62it/s] {'loss': 0.2843, 'grad_norm': 0.6542953848838806, 'learning_rate': 7.586913361224506e-06, 'epoch': 1.18}
39%|███▉ | 4544/11526 [47:21<1:11:39, 1.62it/s] 39%|███▉ | 4545/11526 [47:22<1:11:34, 1.63it/s] {'loss': 0.2548, 'grad_norm': 0.6591137051582336, 'learning_rate': 7.585617363414524e-06, 'epoch': 1.18}
39%|███▉ | 4545/11526 [47:22<1:11:34, 1.63it/s] 39%|███▉ | 4546/11526 [47:23<1:11:32, 1.63it/s] {'loss': 0.2291, 'grad_norm': 0.5567545890808105, 'learning_rate': 7.584321128437002e-06, 'epoch': 1.18}
39%|███▉ | 4546/11526 [47:23<1:11:32, 1.63it/s] 39%|███▉ | 4547/11526 [47:23<1:11:30, 1.63it/s] {'loss': 0.2388, 'grad_norm': 0.5461793541908264, 'learning_rate': 7.583024656410842e-06, 'epoch': 1.18}
39%|███▉ | 4547/11526 [47:23<1:11:30, 1.63it/s] 39%|███▉ | 4548/11526 [47:24<1:11:29, 1.63it/s] {'loss': 0.1545, 'grad_norm': 0.4718206822872162, 'learning_rate': 7.581727947454966e-06, 'epoch': 1.18}
39%|███▉ | 4548/11526 [47:24<1:11:29, 1.63it/s] 39%|███▉ | 4549/11526 [47:24<1:11:32, 1.63it/s] {'loss': 0.1528, 'grad_norm': 0.490810364484787, 'learning_rate': 7.580431001688313e-06, 'epoch': 1.18}
39%|███▉ | 4549/11526 [47:25<1:11:32, 1.63it/s] 39%|███▉ | 4550/11526 [47:25<1:11:29, 1.63it/s] {'loss': 0.1758, 'grad_norm': 0.5148465633392334, 'learning_rate': 7.579133819229847e-06, 'epoch': 1.18}
39%|███▉ | 4550/11526 [47:25<1:11:29, 1.63it/s] 39%|███▉ | 4551/11526 [47:26<1:11:26, 1.63it/s] {'loss': 0.1778, 'grad_norm': 0.5031467080116272, 'learning_rate': 7.577836400198549e-06, 'epoch': 1.18}
39%|███▉ | 4551/11526 [47:26<1:11:26, 1.63it/s] 39%|███▉ | 4552/11526 [47:26<1:11:23, 1.63it/s] {'loss': 0.2406, 'grad_norm': 0.58241868019104, 'learning_rate': 7.576538744713432e-06, 'epoch': 1.18}
39%|███▉ | 4552/11526 [47:26<1:11:23, 1.63it/s] 40%|███▉ | 4553/11526 [47:27<1:11:24, 1.63it/s] {'loss': 0.1384, 'grad_norm': 0.41226571798324585, 'learning_rate': 7.575240852893521e-06, 'epoch': 1.19}
40%|███▉ | 4553/11526 [47:27<1:11:24, 1.63it/s] 40%|███▉ | 4554/11526 [47:27<1:11:27, 1.63it/s] {'loss': 0.2092, 'grad_norm': 0.6843805909156799, 'learning_rate': 7.573942724857866e-06, 'epoch': 1.19}
40%|███▉ | 4554/11526 [47:28<1:11:27, 1.63it/s] 40%|███▉ | 4555/11526 [47:28<1:11:23, 1.63it/s] {'loss': 0.2004, 'grad_norm': 0.5033668875694275, 'learning_rate': 7.572644360725538e-06, 'epoch': 1.19}
40%|███▉ | 4555/11526 [47:28<1:11:23, 1.63it/s] 40%|███▉ | 4556/11526 [47:29<1:11:19, 1.63it/s] {'loss': 0.1931, 'grad_norm': 0.5319515466690063, 'learning_rate': 7.5713457606156335e-06, 'epoch': 1.19}
40%|███▉ | 4556/11526 [47:29<1:11:19, 1.63it/s] 40%|███▉ | 4557/11526 [47:29<1:11:22, 1.63it/s] {'loss': 0.1745, 'grad_norm': 0.525076687335968, 'learning_rate': 7.570046924647265e-06, 'epoch': 1.19}
40%|███▉ | 4557/11526 [47:29<1:11:22, 1.63it/s] 40%|███▉ | 4558/11526 [47:30<1:11:21, 1.63it/s] {'loss': 0.198, 'grad_norm': 0.5339844822883606, 'learning_rate': 7.56874785293957e-06, 'epoch': 1.19}
40%|███▉ | 4558/11526 [47:30<1:11:21, 1.63it/s] 40%|███▉ | 4559/11526 [47:31<1:11:22, 1.63it/s] {'loss': 0.2255, 'grad_norm': 0.5841101408004761, 'learning_rate': 7.567448545611705e-06, 'epoch': 1.19}
40%|███▉ | 4559/11526 [47:31<1:11:22, 1.63it/s] 40%|███▉ | 4560/11526 [47:31<1:11:21, 1.63it/s] {'loss': 0.1868, 'grad_norm': 0.5132333040237427, 'learning_rate': 7.566149002782852e-06, 'epoch': 1.19}
40%|███▉ | 4560/11526 [47:31<1:11:21, 1.63it/s] 40%|███▉ | 4561/11526 [47:32<1:11:21, 1.63it/s] {'loss': 0.1919, 'grad_norm': 0.49144604802131653, 'learning_rate': 7.564849224572213e-06, 'epoch': 1.19}
40%|███▉ | 4561/11526 [47:32<1:11:21, 1.63it/s] 40%|███▉ | 4562/11526 [47:32<1:11:26, 1.62it/s] {'loss': 0.2717, 'grad_norm': 0.7378415465354919, 'learning_rate': 7.5635492110990075e-06, 'epoch': 1.19}
40%|███▉ | 4562/11526 [47:32<1:11:26, 1.62it/s] 40%|███▉ | 4563/11526 [47:33<1:11:23, 1.63it/s] {'loss': 0.3497, 'grad_norm': 0.6140624284744263, 'learning_rate': 7.562248962482483e-06, 'epoch': 1.19}
40%|███▉ | 4563/11526 [47:33<1:11:23, 1.63it/s] 40%|███▉ | 4564/11526 [47:34<1:11:20, 1.63it/s] {'loss': 0.1949, 'grad_norm': 0.48451557755470276, 'learning_rate': 7.560948478841905e-06, 'epoch': 1.19}
40%|███▉ | 4564/11526 [47:34<1:11:20, 1.63it/s] 40%|███▉ | 4565/11526 [47:34<1:11:16, 1.63it/s] {'loss': 0.2271, 'grad_norm': 0.5414625406265259, 'learning_rate': 7.559647760296562e-06, 'epoch': 1.19}
40%|███▉ | 4565/11526 [47:34<1:11:16, 1.63it/s] 40%|███▉ | 4566/11526 [47:35<1:11:14, 1.63it/s] {'loss': 0.3287, 'grad_norm': 0.6531484723091125, 'learning_rate': 7.5583468069657614e-06, 'epoch': 1.19}
40%|███▉ | 4566/11526 [47:35<1:11:14, 1.63it/s] 40%|███▉ | 4567/11526 [47:35<1:11:14, 1.63it/s] {'loss': 0.2607, 'grad_norm': 0.6999832987785339, 'learning_rate': 7.557045618968837e-06, 'epoch': 1.19}
40%|███▉ | 4567/11526 [47:36<1:11:14, 1.63it/s] 40%|███▉ | 4568/11526 [47:36<1:11:12, 1.63it/s] {'loss': 0.205, 'grad_norm': 0.5320615172386169, 'learning_rate': 7.555744196425138e-06, 'epoch': 1.19}
40%|███▉ | 4568/11526 [47:36<1:11:12, 1.63it/s] 40%|███▉ | 4569/11526 [47:37<1:11:12, 1.63it/s] {'loss': 0.2134, 'grad_norm': 0.5964834094047546, 'learning_rate': 7.554442539454041e-06, 'epoch': 1.19}
40%|███▉ | 4569/11526 [47:37<1:11:12, 1.63it/s] 40%|███▉ | 4570/11526 [47:37<1:11:15, 1.63it/s] {'loss': 0.165, 'grad_norm': 0.4900786280632019, 'learning_rate': 7.553140648174939e-06, 'epoch': 1.19}
40%|███▉ | 4570/11526 [47:37<1:11:15, 1.63it/s] 40%|███▉ | 4571/11526 [47:38<1:11:13, 1.63it/s] {'loss': 0.2401, 'grad_norm': 0.565691351890564, 'learning_rate': 7.55183852270725e-06, 'epoch': 1.19}
40%|███▉ | 4571/11526 [47:38<1:11:13, 1.63it/s] 40%|███▉ | 4572/11526 [47:39<1:11:22, 1.62it/s] {'loss': 0.2062, 'grad_norm': 0.483853816986084, 'learning_rate': 7.550536163170414e-06, 'epoch': 1.19}
40%|███▉ | 4572/11526 [47:39<1:11:22, 1.62it/s] 40%|███▉ | 4573/11526 [47:39<1:11:20, 1.62it/s] {'loss': 0.152, 'grad_norm': 0.432443767786026, 'learning_rate': 7.549233569683887e-06, 'epoch': 1.19}
40%|███▉ | 4573/11526 [47:39<1:11:20, 1.62it/s] 40%|███▉ | 4574/11526 [47:40<1:11:17, 1.63it/s] {'loss': 0.168, 'grad_norm': 0.5207617878913879, 'learning_rate': 7.547930742367153e-06, 'epoch': 1.19}
40%|███▉ | 4574/11526 [47:40<1:11:17, 1.63it/s] 40%|███▉ | 4575/11526 [47:40<1:11:17, 1.63it/s] {'loss': 0.1776, 'grad_norm': 0.475995808839798, 'learning_rate': 7.546627681339714e-06, 'epoch': 1.19}
40%|███▉ | 4575/11526 [47:40<1:11:17, 1.63it/s] 40%|███▉ | 4576/11526 [47:41<1:11:17, 1.62it/s] {'loss': 0.2503, 'grad_norm': 0.6145512461662292, 'learning_rate': 7.545324386721096e-06, 'epoch': 1.19}
40%|███▉ | 4576/11526 [47:41<1:11:17, 1.62it/s] 40%|███▉ | 4577/11526 [47:42<1:11:21, 1.62it/s] {'loss': 0.2589, 'grad_norm': 0.5689228773117065, 'learning_rate': 7.544020858630841e-06, 'epoch': 1.19}
40%|███▉ | 4577/11526 [47:42<1:11:21, 1.62it/s] 40%|███▉ | 4578/11526 [47:42<1:11:15, 1.63it/s] {'loss': 0.199, 'grad_norm': 0.5649188160896301, 'learning_rate': 7.5427170971885185e-06, 'epoch': 1.19}
40%|███▉ | 4578/11526 [47:42<1:11:15, 1.63it/s] 40%|███▉ | 4579/11526 [47:43<1:11:11, 1.63it/s] {'loss': 0.1832, 'grad_norm': 0.544620931148529, 'learning_rate': 7.541413102513717e-06, 'epoch': 1.19}
40%|███▉ | 4579/11526 [47:43<1:11:11, 1.63it/s] 40%|███▉ | 4580/11526 [47:43<1:11:09, 1.63it/s] {'loss': 0.2077, 'grad_norm': 0.5420085191726685, 'learning_rate': 7.540108874726047e-06, 'epoch': 1.19}
40%|███▉ | 4580/11526 [47:44<1:11:09, 1.63it/s] 40%|███▉ | 4581/11526 [47:44<1:11:07, 1.63it/s] {'loss': 0.175, 'grad_norm': 0.5624902248382568, 'learning_rate': 7.538804413945136e-06, 'epoch': 1.19}
40%|███▉ | 4581/11526 [47:44<1:11:07, 1.63it/s] 40%|███▉ | 4582/11526 [47:45<1:11:07, 1.63it/s] {'loss': 0.192, 'grad_norm': 0.4981358051300049, 'learning_rate': 7.537499720290638e-06, 'epoch': 1.19}
40%|███▉ | 4582/11526 [47:45<1:11:07, 1.63it/s] 40%|███▉ | 4583/11526 [47:45<1:11:05, 1.63it/s] {'loss': 0.175, 'grad_norm': 0.5304591059684753, 'learning_rate': 7.536194793882231e-06, 'epoch': 1.19}
40%|███▉ | 4583/11526 [47:45<1:11:05, 1.63it/s] 40%|███▉ | 4584/11526 [47:46<1:11:05, 1.63it/s] {'loss': 0.2452, 'grad_norm': 0.5755976438522339, 'learning_rate': 7.534889634839606e-06, 'epoch': 1.19}
40%|███▉ | 4584/11526 [47:46<1:11:05, 1.63it/s] 40%|███▉ | 4585/11526 [47:47<1:11:05, 1.63it/s] {'loss': 0.1641, 'grad_norm': 0.5154775381088257, 'learning_rate': 7.5335842432824794e-06, 'epoch': 1.19}
40%|███▉ | 4585/11526 [47:47<1:11:05, 1.63it/s] 40%|███▉ | 4586/11526 [47:47<1:11:03, 1.63it/s] {'loss': 0.1798, 'grad_norm': 0.561730682849884, 'learning_rate': 7.5322786193305905e-06, 'epoch': 1.19}
40%|███▉ | 4586/11526 [47:47<1:11:03, 1.63it/s] 40%|███▉ | 4587/11526 [47:48<1:11:02, 1.63it/s] {'loss': 0.2823, 'grad_norm': 0.6503381729125977, 'learning_rate': 7.530972763103701e-06, 'epoch': 1.19}
40%|███▉ | 4587/11526 [47:48<1:11:02, 1.63it/s] 40%|███▉ | 4588/11526 [47:48<1:11:00, 1.63it/s] {'loss': 0.2375, 'grad_norm': 0.5645028352737427, 'learning_rate': 7.529666674721588e-06, 'epoch': 1.19}
40%|███▉ | 4588/11526 [47:48<1:11:00, 1.63it/s] 40%|███▉ | 4589/11526 [47:49<1:10:58, 1.63it/s] {'loss': 0.2503, 'grad_norm': 0.6246272325515747, 'learning_rate': 7.528360354304054e-06, 'epoch': 1.19}
40%|███▉ | 4589/11526 [47:49<1:10:58, 1.63it/s] 40%|███▉ | 4590/11526 [47:50<1:10:58, 1.63it/s] {'loss': 0.1904, 'grad_norm': 0.5691344141960144, 'learning_rate': 7.5270538019709224e-06, 'epoch': 1.19}
40%|███▉ | 4590/11526 [47:50<1:10:58, 1.63it/s] 40%|███▉ | 4591/11526 [47:50<1:11:01, 1.63it/s] {'loss': 0.1828, 'grad_norm': 0.46865397691726685, 'learning_rate': 7.5257470178420376e-06, 'epoch': 1.19}
40%|███▉ | 4591/11526 [47:50<1:11:01, 1.63it/s] 40%|███▉ | 4592/11526 [47:51<1:10:58, 1.63it/s] {'loss': 0.1764, 'grad_norm': 0.5003287196159363, 'learning_rate': 7.5244400020372664e-06, 'epoch': 1.2}
40%|███▉ | 4592/11526 [47:51<1:10:58, 1.63it/s] 40%|███▉ | 4593/11526 [47:51<1:10:56, 1.63it/s] {'loss': 0.1729, 'grad_norm': 0.48022428154945374, 'learning_rate': 7.523132754676493e-06, 'epoch': 1.2}
40%|███▉ | 4593/11526 [47:52<1:10:56, 1.63it/s] 40%|███▉ | 4594/11526 [47:52<1:10:57, 1.63it/s] {'loss': 0.1944, 'grad_norm': 0.521143913269043, 'learning_rate': 7.521825275879629e-06, 'epoch': 1.2}
40%|███▉ | 4594/11526 [47:52<1:10:57, 1.63it/s] 40%|███▉ | 4595/11526 [47:53<1:10:55, 1.63it/s] {'loss': 0.1902, 'grad_norm': 0.5506625175476074, 'learning_rate': 7.520517565766601e-06, 'epoch': 1.2}
40%|███▉ | 4595/11526 [47:53<1:10:55, 1.63it/s] 40%|███▉ | 4596/11526 [47:53<1:10:56, 1.63it/s] {'loss': 0.2587, 'grad_norm': 0.6611841320991516, 'learning_rate': 7.519209624457362e-06, 'epoch': 1.2}
40%|███▉ | 4596/11526 [47:53<1:10:56, 1.63it/s] 40%|███▉ | 4597/11526 [47:54<1:10:54, 1.63it/s] {'loss': 0.2463, 'grad_norm': 0.6181154251098633, 'learning_rate': 7.51790145207188e-06, 'epoch': 1.2}
40%|███▉ | 4597/11526 [47:54<1:10:54, 1.63it/s] 40%|███▉ | 4598/11526 [47:54<1:10:55, 1.63it/s] {'loss': 0.2361, 'grad_norm': 0.6134901642799377, 'learning_rate': 7.516593048730154e-06, 'epoch': 1.2}
40%|███▉ | 4598/11526 [47:55<1:10:55, 1.63it/s] 40%|███▉ | 4599/11526 [47:55<1:10:52, 1.63it/s] {'loss': 0.2153, 'grad_norm': 0.5405537486076355, 'learning_rate': 7.515284414552193e-06, 'epoch': 1.2}
40%|███▉ | 4599/11526 [47:55<1:10:52, 1.63it/s] 40%|███▉ | 4600/11526 [47:56<1:10:54, 1.63it/s] {'loss': 0.2, 'grad_norm': 0.5460229516029358, 'learning_rate': 7.513975549658035e-06, 'epoch': 1.2}
40%|███▉ | 4600/11526 [47:56<1:10:54, 1.63it/s]
{'eval_loss': 0.6094086766242981, 'eval_runtime': 1.9576, 'eval_samples_per_second': 102.166, 'eval_steps_per_second': 6.641, 'epoch': 1.2}
 40%|███▉ | 4601/11526 [47:58<2:18:51, 1.20s/it] {'loss': 0.1878, 'grad_norm': 0.5320327281951904, 'learning_rate': 7.512666454167734e-06, 'epoch': 1.2}
40%|███▉ | 4601/11526 [47:58<2:18:51, 1.20s/it] 40%|███▉ | 4602/11526 [47:59<1:58:27, 1.03s/it] {'loss': 0.3348, 'grad_norm': 0.5941023826599121, 'learning_rate': 7.5113571282013695e-06, 'epoch': 1.2}
40%|███▉ | 4602/11526 [47:59<1:58:27, 1.03s/it] 40%|███▉ | 4603/11526 [48:00<1:44:08, 1.11it/s] {'loss': 0.2374, 'grad_norm': 0.6208440065383911, 'learning_rate': 7.510047571879042e-06, 'epoch': 1.2}
40%|███▉ | 4603/11526 [48:00<1:44:08, 1.11it/s] 40%|███▉ | 4604/11526 [48:00<1:34:07, 1.23it/s] {'loss': 0.2584, 'grad_norm': 0.5815252065658569, 'learning_rate': 7.508737785320868e-06, 'epoch': 1.2}
40%|███▉ | 4604/11526 [48:00<1:34:07, 1.23it/s] 40%|███▉ | 4605/11526 [48:01<1:27:10, 1.32it/s] {'loss': 0.1967, 'grad_norm': 0.5322471261024475, 'learning_rate': 7.507427768646991e-06, 'epoch': 1.2}
40%|███▉ | 4605/11526 [48:01<1:27:10, 1.32it/s] 40%|███▉ | 4606/11526 [48:01<1:22:15, 1.40it/s] {'loss': 0.2124, 'grad_norm': 0.574622631072998, 'learning_rate': 7.506117521977572e-06, 'epoch': 1.2}
40%|███▉ | 4606/11526 [48:01<1:22:15, 1.40it/s] 40%|███▉ | 4607/11526 [48:02<1:18:47, 1.46it/s] {'loss': 0.2328, 'grad_norm': 0.5051283836364746, 'learning_rate': 7.504807045432795e-06, 'epoch': 1.2}
40%|███▉ | 4607/11526 [48:02<1:18:47, 1.46it/s] 40%|███▉ | 4608/11526 [48:03<1:16:23, 1.51it/s] {'loss': 0.2171, 'grad_norm': 0.5325761437416077, 'learning_rate': 7.503496339132863e-06, 'epoch': 1.2}
40%|███▉ | 4608/11526 [48:03<1:16:23, 1.51it/s] 40%|███▉ | 4609/11526 [48:03<1:14:42, 1.54it/s] {'loss': 0.2149, 'grad_norm': 0.5063133239746094, 'learning_rate': 7.502185403198004e-06, 'epoch': 1.2}
40%|███▉ | 4609/11526 [48:03<1:14:42, 1.54it/s] 40%|███▉ | 4610/11526 [48:04<1:13:32, 1.57it/s] {'loss': 0.2163, 'grad_norm': 0.5421308875083923, 'learning_rate': 7.500874237748462e-06, 'epoch': 1.2}
40%|███▉ | 4610/11526 [48:04<1:13:32, 1.57it/s] 40%|████ | 4611/11526 [48:04<1:12:42, 1.59it/s] {'loss': 0.1679, 'grad_norm': 0.45588937401771545, 'learning_rate': 7.499562842904506e-06, 'epoch': 1.2}
40%|████ | 4611/11526 [48:05<1:12:42, 1.59it/s] 40%|████ | 4612/11526 [48:05<1:12:05, 1.60it/s] {'loss': 0.1782, 'grad_norm': 0.4925858974456787, 'learning_rate': 7.498251218786423e-06, 'epoch': 1.2}
40%|████ | 4612/11526 [48:05<1:12:05, 1.60it/s] 40%|████ | 4613/11526 [48:06<1:11:40, 1.61it/s] {'loss': 0.2353, 'grad_norm': 0.680331826210022, 'learning_rate': 7.496939365514524e-06, 'epoch': 1.2}
40%|████ | 4613/11526 [48:06<1:11:40, 1.61it/s] 40%|████ | 4614/11526 [48:06<1:11:22, 1.61it/s] {'loss': 0.2207, 'grad_norm': 0.5773674845695496, 'learning_rate': 7.495627283209142e-06, 'epoch': 1.2}
40%|████ | 4614/11526 [48:06<1:11:22, 1.61it/s] 40%|████ | 4615/11526 [48:07<1:11:11, 1.62it/s] {'loss': 0.2046, 'grad_norm': 0.5825287699699402, 'learning_rate': 7.4943149719906226e-06, 'epoch': 1.2}
40%|████ | 4615/11526 [48:07<1:11:11, 1.62it/s] 40%|████ | 4616/11526 [48:07<1:11:01, 1.62it/s] {'loss': 0.2661, 'grad_norm': 0.6089974045753479, 'learning_rate': 7.493002431979344e-06, 'epoch': 1.2}
40%|████ | 4616/11526 [48:08<1:11:01, 1.62it/s] 40%|████ | 4617/11526 [48:08<1:10:55, 1.62it/s] {'loss': 0.1801, 'grad_norm': 0.4956374764442444, 'learning_rate': 7.491689663295696e-06, 'epoch': 1.2}
40%|████ | 4617/11526 [48:08<1:10:55, 1.62it/s] 40%|████ | 4618/11526 [48:09<1:10:52, 1.62it/s] {'loss': 0.2147, 'grad_norm': 0.5319060683250427, 'learning_rate': 7.490376666060096e-06, 'epoch': 1.2}
40%|████ | 4618/11526 [48:09<1:10:52, 1.62it/s] 40%|████ | 4619/11526 [48:09<1:10:48, 1.63it/s] {'loss': 0.2252, 'grad_norm': 0.6092166304588318, 'learning_rate': 7.489063440392979e-06, 'epoch': 1.2}
40%|████ | 4619/11526 [48:09<1:10:48, 1.63it/s] 40%|████ | 4620/11526 [48:10<1:10:44, 1.63it/s] {'loss': 0.1837, 'grad_norm': 0.5519968271255493, 'learning_rate': 7.487749986414801e-06, 'epoch': 1.2}
40%|████ | 4620/11526 [48:10<1:10:44, 1.63it/s] 40%|████ | 4621/11526 [48:11<1:10:41, 1.63it/s] {'loss': 0.19, 'grad_norm': 0.5699295401573181, 'learning_rate': 7.486436304246039e-06, 'epoch': 1.2}
40%|████ | 4621/11526 [48:11<1:10:41, 1.63it/s] 40%|████ | 4622/11526 [48:11<1:10:38, 1.63it/s] {'loss': 0.2088, 'grad_norm': 0.5604458451271057, 'learning_rate': 7.4851223940071916e-06, 'epoch': 1.2}
40%|████ | 4622/11526 [48:11<1:10:38, 1.63it/s] 40%|████ | 4623/11526 [48:12<1:10:36, 1.63it/s] {'loss': 0.2941, 'grad_norm': 0.613429069519043, 'learning_rate': 7.483808255818779e-06, 'epoch': 1.2}
40%|████ | 4623/11526 [48:12<1:10:36, 1.63it/s] 40%|████ | 4624/11526 [48:12<1:10:36, 1.63it/s] {'loss': 0.1599, 'grad_norm': 0.4177069365978241, 'learning_rate': 7.482493889801341e-06, 'epoch': 1.2}
40%|████ | 4624/11526 [48:13<1:10:36, 1.63it/s] 40%|████ | 4625/11526 [48:13<1:10:35, 1.63it/s] {'loss': 0.1462, 'grad_norm': 0.39580702781677246, 'learning_rate': 7.481179296075438e-06, 'epoch': 1.2}
40%|████ | 4625/11526 [48:13<1:10:35, 1.63it/s] 40%|████ | 4626/11526 [48:14<1:10:36, 1.63it/s] {'loss': 0.2395, 'grad_norm': 0.6145123839378357, 'learning_rate': 7.4798644747616535e-06, 'epoch': 1.2}
40%|████ | 4626/11526 [48:14<1:10:36, 1.63it/s] 40%|████ | 4627/11526 [48:14<1:10:35, 1.63it/s] {'loss': 0.2005, 'grad_norm': 0.6064961552619934, 'learning_rate': 7.478549425980589e-06, 'epoch': 1.2}
40%|████ | 4627/11526 [48:14<1:10:35, 1.63it/s] 40%|████ | 4628/11526 [48:15<1:10:34, 1.63it/s] {'loss': 0.2111, 'grad_norm': 0.5447105169296265, 'learning_rate': 7.47723414985287e-06, 'epoch': 1.2}
40%|████ | 4628/11526 [48:15<1:10:34, 1.63it/s] 40%|████ | 4629/11526 [48:15<1:10:36, 1.63it/s] {'loss': 0.1994, 'grad_norm': 0.5092730522155762, 'learning_rate': 7.475918646499139e-06, 'epoch': 1.2}
40%|████ | 4629/11526 [48:16<1:10:36, 1.63it/s] 40%|████ | 4630/11526 [48:16<1:10:35, 1.63it/s] {'loss': 0.2621, 'grad_norm': 0.6783420443534851, 'learning_rate': 7.474602916040062e-06, 'epoch': 1.21}
40%|████ | 4630/11526 [48:16<1:10:35, 1.63it/s] 40%|████ | 4631/11526 [48:17<1:10:32, 1.63it/s] {'loss': 0.1876, 'grad_norm': 0.5526683926582336, 'learning_rate': 7.473286958596324e-06, 'epoch': 1.21}
40%|████ | 4631/11526 [48:17<1:10:32, 1.63it/s] 40%|████ | 4632/11526 [48:17<1:10:30, 1.63it/s] {'loss': 0.1826, 'grad_norm': 0.7025867700576782, 'learning_rate': 7.471970774288637e-06, 'epoch': 1.21}
40%|████ | 4632/11526 [48:17<1:10:30, 1.63it/s] 40%|████ | 4633/11526 [48:18<1:10:30, 1.63it/s] {'loss': 0.2323, 'grad_norm': 0.5782520771026611, 'learning_rate': 7.470654363237724e-06, 'epoch': 1.21}
40%|████ | 4633/11526 [48:18<1:10:30, 1.63it/s] 40%|████ | 4634/11526 [48:19<1:10:29, 1.63it/s] {'loss': 0.2208, 'grad_norm': 0.52076655626297, 'learning_rate': 7.469337725564334e-06, 'epoch': 1.21}
40%|████ | 4634/11526 [48:19<1:10:29, 1.63it/s] 40%|████ | 4635/11526 [48:19<1:10:28, 1.63it/s] {'loss': 0.2246, 'grad_norm': 0.599368691444397, 'learning_rate': 7.46802086138924e-06, 'epoch': 1.21}
40%|████ | 4635/11526 [48:19<1:10:28, 1.63it/s] 40%|████ | 4636/11526 [48:20<1:10:30, 1.63it/s] {'loss': 0.1979, 'grad_norm': 0.5038840174674988, 'learning_rate': 7.4667037708332305e-06, 'epoch': 1.21}
40%|████ | 4636/11526 [48:20<1:10:30, 1.63it/s] 40%|████ | 4637/11526 [48:20<1:10:27, 1.63it/s] {'loss': 0.2156, 'grad_norm': 0.5748270153999329, 'learning_rate': 7.465386454017115e-06, 'epoch': 1.21}
40%|████ | 4637/11526 [48:21<1:10:27, 1.63it/s] 40%|████ | 4638/11526 [48:21<1:10:28, 1.63it/s] {'loss': 0.1998, 'grad_norm': 0.5222418308258057, 'learning_rate': 7.464068911061726e-06, 'epoch': 1.21}
40%|████ | 4638/11526 [48:21<1:10:28, 1.63it/s] 40%|████ | 4639/11526 [48:22<1:10:27, 1.63it/s] {'loss': 0.224, 'grad_norm': 0.5353794097900391, 'learning_rate': 7.462751142087917e-06, 'epoch': 1.21}
40%|████ | 4639/11526 [48:22<1:10:27, 1.63it/s] 40%|████ | 4640/11526 [48:22<1:10:26, 1.63it/s] {'loss': 0.2524, 'grad_norm': 0.6139572858810425, 'learning_rate': 7.461433147216561e-06, 'epoch': 1.21}
40%|████ | 4640/11526 [48:22<1:10:26, 1.63it/s] 40%|████ | 4641/11526 [48:23<1:10:25, 1.63it/s] {'loss': 0.2537, 'grad_norm': 0.6088882684707642, 'learning_rate': 7.460114926568552e-06, 'epoch': 1.21}
40%|████ | 4641/11526 [48:23<1:10:25, 1.63it/s] 40%|████ | 4642/11526 [48:23<1:10:26, 1.63it/s] {'loss': 0.1769, 'grad_norm': 0.5058415532112122, 'learning_rate': 7.458796480264804e-06, 'epoch': 1.21}
40%|████ | 4642/11526 [48:24<1:10:26, 1.63it/s] 40%|████ | 4643/11526 [48:24<1:10:27, 1.63it/s] {'loss': 0.2266, 'grad_norm': 0.5386550426483154, 'learning_rate': 7.457477808426253e-06, 'epoch': 1.21}
40%|████ | 4643/11526 [48:24<1:10:27, 1.63it/s] 40%|████ | 4644/11526 [48:25<1:10:26, 1.63it/s] {'loss': 0.2246, 'grad_norm': 0.4827924966812134, 'learning_rate': 7.456158911173856e-06, 'epoch': 1.21}
40%|████ | 4644/11526 [48:25<1:10:26, 1.63it/s] 40%|████ | 4645/11526 [48:25<1:10:29, 1.63it/s] {'loss': 0.2371, 'grad_norm': 0.5768800377845764, 'learning_rate': 7.4548397886285875e-06, 'epoch': 1.21}
40%|████ | 4645/11526 [48:25<1:10:29, 1.63it/s] 40%|████ | 4646/11526 [48:26<1:10:29, 1.63it/s] {'loss': 0.204, 'grad_norm': 0.4776535928249359, 'learning_rate': 7.453520440911445e-06, 'epoch': 1.21}
40%|████ | 4646/11526 [48:26<1:10:29, 1.63it/s] 40%|████ | 4647/11526 [48:27<1:10:25, 1.63it/s] {'loss': 0.186, 'grad_norm': 0.5186001658439636, 'learning_rate': 7.45220086814345e-06, 'epoch': 1.21}
40%|████ | 4647/11526 [48:27<1:10:25, 1.63it/s] 40%|████ | 4648/11526 [48:27<1:10:23, 1.63it/s] {'loss': 0.2831, 'grad_norm': 0.7735000252723694, 'learning_rate': 7.450881070445638e-06, 'epoch': 1.21}
40%|████ | 4648/11526 [48:27<1:10:23, 1.63it/s] 40%|████ | 4649/11526 [48:28<1:10:26, 1.63it/s] {'loss': 0.1888, 'grad_norm': 0.4935811758041382, 'learning_rate': 7.449561047939069e-06, 'epoch': 1.21}
40%|████ | 4649/11526 [48:28<1:10:26, 1.63it/s] 40%|████ | 4650/11526 [48:28<1:10:23, 1.63it/s] {'loss': 0.2185, 'grad_norm': 0.5629934072494507, 'learning_rate': 7.448240800744824e-06, 'epoch': 1.21}
40%|████ | 4650/11526 [48:29<1:10:23, 1.63it/s] 40%|████ | 4651/11526 [48:29<1:10:20, 1.63it/s] {'loss': 0.2534, 'grad_norm': 0.5768975615501404, 'learning_rate': 7.4469203289840006e-06, 'epoch': 1.21}
40%|████ | 4651/11526 [48:29<1:10:20, 1.63it/s] 40%|████ | 4652/11526 [48:30<1:10:20, 1.63it/s] {'loss': 0.2678, 'grad_norm': 0.5857095718383789, 'learning_rate': 7.445599632777724e-06, 'epoch': 1.21}
40%|████ | 4652/11526 [48:30<1:10:20, 1.63it/s] 40%|████ | 4653/11526 [48:30<1:10:19, 1.63it/s] {'loss': 0.1906, 'grad_norm': 0.5279013514518738, 'learning_rate': 7.444278712247135e-06, 'epoch': 1.21}
40%|████ | 4653/11526 [48:30<1:10:19, 1.63it/s] 40%|████ | 4654/11526 [48:31<1:10:18, 1.63it/s] {'loss': 0.2238, 'grad_norm': 0.5745627880096436, 'learning_rate': 7.442957567513394e-06, 'epoch': 1.21}
40%|████ | 4654/11526 [48:31<1:10:18, 1.63it/s] 40%|████ | 4655/11526 [48:31<1:10:18, 1.63it/s] {'loss': 0.2124, 'grad_norm': 0.5146728157997131, 'learning_rate': 7.441636198697685e-06, 'epoch': 1.21}
40%|████ | 4655/11526 [48:32<1:10:18, 1.63it/s] 40%|████ | 4656/11526 [48:32<1:10:17, 1.63it/s] {'loss': 0.1985, 'grad_norm': 0.5313961505889893, 'learning_rate': 7.440314605921213e-06, 'epoch': 1.21}
40%|████ | 4656/11526 [48:32<1:10:17, 1.63it/s] 40%|████ | 4657/11526 [48:33<1:10:17, 1.63it/s] {'loss': 0.2097, 'grad_norm': 0.5526767373085022, 'learning_rate': 7.438992789305198e-06, 'epoch': 1.21}
40%|████ | 4657/11526 [48:33<1:10:17, 1.63it/s] 40%|████ | 4658/11526 [48:33<1:10:16, 1.63it/s] {'loss': 0.2252, 'grad_norm': 0.5071448087692261, 'learning_rate': 7.437670748970889e-06, 'epoch': 1.21}
40%|████ | 4658/11526 [48:33<1:10:16, 1.63it/s] 40%|████ | 4659/11526 [48:34<1:10:14, 1.63it/s] {'loss': 0.2274, 'grad_norm': 0.5622314214706421, 'learning_rate': 7.436348485039549e-06, 'epoch': 1.21}
40%|████ | 4659/11526 [48:34<1:10:14, 1.63it/s] 40%|████ | 4660/11526 [48:35<1:10:11, 1.63it/s] {'loss': 0.2344, 'grad_norm': 0.5718665719032288, 'learning_rate': 7.4350259976324636e-06, 'epoch': 1.21}
40%|████ | 4660/11526 [48:35<1:10:11, 1.63it/s] 40%|████ | 4661/11526 [48:35<1:10:13, 1.63it/s] {'loss': 0.1879, 'grad_norm': 0.5140644311904907, 'learning_rate': 7.433703286870938e-06, 'epoch': 1.21}
40%|████ | 4661/11526 [48:35<1:10:13, 1.63it/s] 40%|████ | 4662/11526 [48:36<1:10:12, 1.63it/s] {'loss': 0.2641, 'grad_norm': 0.6929147243499756, 'learning_rate': 7.4323803528763e-06, 'epoch': 1.21}
40%|████ | 4662/11526 [48:36<1:10:12, 1.63it/s] 40%|████ | 4663/11526 [48:36<1:10:13, 1.63it/s] {'loss': 0.2773, 'grad_norm': 0.6217169761657715, 'learning_rate': 7.431057195769898e-06, 'epoch': 1.21}
40%|████ | 4663/11526 [48:36<1:10:13, 1.63it/s] 40%|████ | 4664/11526 [48:37<1:10:12, 1.63it/s] {'loss': 0.1897, 'grad_norm': 0.565336287021637, 'learning_rate': 7.4297338156730945e-06, 'epoch': 1.21}
40%|████ | 4664/11526 [48:37<1:10:12, 1.63it/s] 40%|████ | 4665/11526 [48:38<1:10:11, 1.63it/s] {'loss': 0.1849, 'grad_norm': 0.49467918276786804, 'learning_rate': 7.428410212707284e-06, 'epoch': 1.21}
40%|████ | 4665/11526 [48:38<1:10:11, 1.63it/s] 40%|████ | 4666/11526 [48:38<1:10:11, 1.63it/s] {'loss': 0.1924, 'grad_norm': 0.575498104095459, 'learning_rate': 7.4270863869938695e-06, 'epoch': 1.21}
40%|████ | 4666/11526 [48:38<1:10:11, 1.63it/s] 40%|████ | 4667/11526 [48:39<1:10:12, 1.63it/s] {'loss': 0.2114, 'grad_norm': 0.5921309590339661, 'learning_rate': 7.425762338654284e-06, 'epoch': 1.21}
40%|████ | 4667/11526 [48:39<1:10:12, 1.63it/s] 40%|████ | 4668/11526 [48:39<1:10:09, 1.63it/s] {'loss': 0.2111, 'grad_norm': 0.5256004333496094, 'learning_rate': 7.424438067809973e-06, 'epoch': 1.21}
40%|████ | 4668/11526 [48:40<1:10:09, 1.63it/s] 41%|████ | 4669/11526 [48:40<1:10:11, 1.63it/s] {'loss': 0.2267, 'grad_norm': 0.6298640966415405, 'learning_rate': 7.423113574582409e-06, 'epoch': 1.22}
41%|████ | 4669/11526 [48:40<1:10:11, 1.63it/s] 41%|████ | 4670/11526 [48:41<1:10:11, 1.63it/s] {'loss': 0.1999, 'grad_norm': 0.5825590491294861, 'learning_rate': 7.421788859093082e-06, 'epoch': 1.22}
41%|████ | 4670/11526 [48:41<1:10:11, 1.63it/s] 41%|████ | 4671/11526 [48:41<1:10:10, 1.63it/s] {'loss': 0.1932, 'grad_norm': 0.5834134221076965, 'learning_rate': 7.4204639214635e-06, 'epoch': 1.22}
41%|████ | 4671/11526 [48:41<1:10:10, 1.63it/s] 41%|████ | 4672/11526 [48:42<1:10:06, 1.63it/s] {'loss': 0.2289, 'grad_norm': 0.5573509335517883, 'learning_rate': 7.419138761815195e-06, 'epoch': 1.22}
41%|████ | 4672/11526 [48:42<1:10:06, 1.63it/s] 41%|████ | 4673/11526 [48:42<1:10:06, 1.63it/s] {'loss': 0.1637, 'grad_norm': 0.4690439701080322, 'learning_rate': 7.417813380269718e-06, 'epoch': 1.22}
41%|████ | 4673/11526 [48:43<1:10:06, 1.63it/s] 41%|████ | 4674/11526 [48:43<1:10:04, 1.63it/s] {'loss': 0.2164, 'grad_norm': 0.5580940842628479, 'learning_rate': 7.416487776948643e-06, 'epoch': 1.22}
41%|████ | 4674/11526 [48:43<1:10:04, 1.63it/s] 41%|████ | 4675/11526 [48:44<1:10:04, 1.63it/s] {'loss': 0.2154, 'grad_norm': 0.556473970413208, 'learning_rate': 7.415161951973559e-06, 'epoch': 1.22}
41%|████ | 4675/11526 [48:44<1:10:04, 1.63it/s] 41%|████ | 4676/11526 [48:44<1:10:04, 1.63it/s] {'loss': 0.2009, 'grad_norm': 0.5511571168899536, 'learning_rate': 7.413835905466078e-06, 'epoch': 1.22}
41%|████ | 4676/11526 [48:44<1:10:04, 1.63it/s] 41%|████ | 4677/11526 [48:45<1:10:08, 1.63it/s] {'loss': 0.1667, 'grad_norm': 0.4250086545944214, 'learning_rate': 7.412509637547835e-06, 'epoch': 1.22}
41%|████ | 4677/11526 [48:45<1:10:08, 1.63it/s] 41%|████ | 4678/11526 [48:46<1:10:06, 1.63it/s] {'loss': 0.2121, 'grad_norm': 0.5514451861381531, 'learning_rate': 7.4111831483404816e-06, 'epoch': 1.22}
41%|████ | 4678/11526 [48:46<1:10:06, 1.63it/s] 41%|████ | 4679/11526 [48:46<1:10:09, 1.63it/s] {'loss': 0.2757, 'grad_norm': 0.6367925405502319, 'learning_rate': 7.40985643796569e-06, 'epoch': 1.22}
41%|████ | 4679/11526 [48:46<1:10:09, 1.63it/s] 41%|████ | 4680/11526 [48:47<1:10:06, 1.63it/s] {'loss': 0.221, 'grad_norm': 0.5340457558631897, 'learning_rate': 7.408529506545153e-06, 'epoch': 1.22}
41%|████ | 4680/11526 [48:47<1:10:06, 1.63it/s] 41%|████ | 4681/11526 [48:47<1:10:03, 1.63it/s] {'loss': 0.2364, 'grad_norm': 0.5977463126182556, 'learning_rate': 7.407202354200587e-06, 'epoch': 1.22}
41%|████ | 4681/11526 [48:48<1:10:03, 1.63it/s] 41%|████ | 4682/11526 [48:48<1:10:01, 1.63it/s] {'loss': 0.3284, 'grad_norm': 0.6582651734352112, 'learning_rate': 7.405874981053726e-06, 'epoch': 1.22}
41%|████ | 4682/11526 [48:48<1:10:01, 1.63it/s] 41%|████ | 4683/11526 [48:49<1:10:01, 1.63it/s] {'loss': 0.2647, 'grad_norm': 0.6767250895500183, 'learning_rate': 7.404547387226322e-06, 'epoch': 1.22}
41%|████ | 4683/11526 [48:49<1:10:01, 1.63it/s] 41%|████ | 4684/11526 [48:49<1:09:59, 1.63it/s] {'loss': 0.1867, 'grad_norm': 0.5120015740394592, 'learning_rate': 7.403219572840149e-06, 'epoch': 1.22}
41%|████ | 4684/11526 [48:49<1:09:59, 1.63it/s] 41%|████ | 4685/11526 [48:50<1:10:00, 1.63it/s] {'loss': 0.235, 'grad_norm': 0.5801060199737549, 'learning_rate': 7.401891538017004e-06, 'epoch': 1.22}
41%|████ | 4685/11526 [48:50<1:10:00, 1.63it/s] 41%|████ | 4686/11526 [48:50<1:10:00, 1.63it/s] {'loss': 0.2046, 'grad_norm': 0.5270127058029175, 'learning_rate': 7.400563282878702e-06, 'epoch': 1.22}
41%|████ | 4686/11526 [48:51<1:10:00, 1.63it/s] 41%|████ | 4687/11526 [48:51<1:09:59, 1.63it/s] {'loss': 0.2015, 'grad_norm': 0.5239956974983215, 'learning_rate': 7.399234807547076e-06, 'epoch': 1.22}
41%|████ | 4687/11526 [48:51<1:09:59, 1.63it/s] 41%|████ | 4688/11526 [48:52<1:09:59, 1.63it/s] {'loss': 0.2566, 'grad_norm': 0.555554211139679, 'learning_rate': 7.397906112143982e-06, 'epoch': 1.22}
41%|████ | 4688/11526 [48:52<1:09:59, 1.63it/s] 41%|████ | 4689/11526 [48:52<1:10:00, 1.63it/s] {'loss': 0.1755, 'grad_norm': 0.5202894806861877, 'learning_rate': 7.396577196791296e-06, 'epoch': 1.22}
41%|████ | 4689/11526 [48:52<1:10:00, 1.63it/s] 41%|████ | 4690/11526 [48:53<1:09:57, 1.63it/s] {'loss': 0.1956, 'grad_norm': 0.5219477415084839, 'learning_rate': 7.3952480616109146e-06, 'epoch': 1.22}
41%|████ | 4690/11526 [48:53<1:09:57, 1.63it/s] 41%|████ | 4691/11526 [48:54<1:09:55, 1.63it/s] {'loss': 0.2216, 'grad_norm': 0.5325248837471008, 'learning_rate': 7.3939187067247505e-06, 'epoch': 1.22}
41%|████ | 4691/11526 [48:54<1:09:55, 1.63it/s] 41%|████ | 4692/11526 [48:54<1:09:57, 1.63it/s] {'loss': 0.2318, 'grad_norm': 0.5742175579071045, 'learning_rate': 7.392589132254744e-06, 'epoch': 1.22}
41%|████ | 4692/11526 [48:54<1:09:57, 1.63it/s] 41%|████ | 4693/11526 [48:55<1:09:56, 1.63it/s] {'loss': 0.1944, 'grad_norm': 0.4756685495376587, 'learning_rate': 7.391259338322847e-06, 'epoch': 1.22}
41%|████ | 4693/11526 [48:55<1:09:56, 1.63it/s] 41%|████ | 4694/11526 [48:55<1:09:54, 1.63it/s] {'loss': 0.2601, 'grad_norm': 0.6465257406234741, 'learning_rate': 7.389929325051039e-06, 'epoch': 1.22}
41%|████ | 4694/11526 [48:56<1:09:54, 1.63it/s] 41%|████ | 4695/11526 [48:56<1:09:55, 1.63it/s] {'loss': 0.2522, 'grad_norm': 0.5770804286003113, 'learning_rate': 7.388599092561315e-06, 'epoch': 1.22}
41%|████ | 4695/11526 [48:56<1:09:55, 1.63it/s] 41%|████ | 4696/11526 [48:57<1:09:52, 1.63it/s] {'loss': 0.1996, 'grad_norm': 0.5324037671089172, 'learning_rate': 7.387268640975692e-06, 'epoch': 1.22}
41%|████ | 4696/11526 [48:57<1:09:52, 1.63it/s] 41%|████ | 4697/11526 [48:57<1:09:50, 1.63it/s] {'loss': 0.2405, 'grad_norm': 0.5635653138160706, 'learning_rate': 7.385937970416206e-06, 'epoch': 1.22}
41%|████ | 4697/11526 [48:57<1:09:50, 1.63it/s] 41%|████ | 4698/11526 [48:58<1:09:51, 1.63it/s] {'loss': 0.2027, 'grad_norm': 0.4909321367740631, 'learning_rate': 7.384607081004914e-06, 'epoch': 1.22}
41%|████ | 4698/11526 [48:58<1:09:51, 1.63it/s] 41%|████ | 4699/11526 [48:58<1:09:50, 1.63it/s] {'loss': 0.2019, 'grad_norm': 0.504164457321167, 'learning_rate': 7.383275972863893e-06, 'epoch': 1.22}
41%|████ | 4699/11526 [48:59<1:09:50, 1.63it/s] 41%|████ | 4700/11526 [48:59<1:09:49, 1.63it/s] {'loss': 0.1862, 'grad_norm': 0.5112355947494507, 'learning_rate': 7.381944646115238e-06, 'epoch': 1.22}
41%|████ | 4700/11526 [48:59<1:09:49, 1.63it/s] 41%|████ | 4701/11526 [49:00<1:09:50, 1.63it/s] {'loss': 0.1541, 'grad_norm': 0.4392402172088623, 'learning_rate': 7.380613100881069e-06, 'epoch': 1.22}
41%|████ | 4701/11526 [49:00<1:09:50, 1.63it/s] 41%|████ | 4702/11526 [49:00<1:09:54, 1.63it/s] {'loss': 0.1831, 'grad_norm': 0.5239894986152649, 'learning_rate': 7.379281337283521e-06, 'epoch': 1.22}
41%|████ | 4702/11526 [49:00<1:09:54, 1.63it/s] 41%|████ | 4703/11526 [49:01<1:09:50, 1.63it/s] {'loss': 0.2212, 'grad_norm': 0.5531394481658936, 'learning_rate': 7.3779493554447504e-06, 'epoch': 1.22}
41%|████ | 4703/11526 [49:01<1:09:50, 1.63it/s] 41%|████ | 4704/11526 [49:02<1:09:52, 1.63it/s] {'loss': 0.251, 'grad_norm': 0.6128922700881958, 'learning_rate': 7.376617155486935e-06, 'epoch': 1.22}
41%|████ | 4704/11526 [49:02<1:09:52, 1.63it/s] 41%|████ | 4705/11526 [49:02<1:09:49, 1.63it/s] {'loss': 0.1976, 'grad_norm': 0.5206466913223267, 'learning_rate': 7.3752847375322725e-06, 'epoch': 1.22}
41%|████ | 4705/11526 [49:02<1:09:49, 1.63it/s] 41%|████ | 4706/11526 [49:03<1:09:49, 1.63it/s] {'loss': 0.1599, 'grad_norm': 0.42066290974617004, 'learning_rate': 7.3739521017029796e-06, 'epoch': 1.22}
41%|████ | 4706/11526 [49:03<1:09:49, 1.63it/s] 41%|████ | 4707/11526 [49:03<1:09:47, 1.63it/s] {'loss': 0.2168, 'grad_norm': 0.5563850998878479, 'learning_rate': 7.37261924812129e-06, 'epoch': 1.23}
41%|████ | 4707/11526 [49:04<1:09:47, 1.63it/s] 41%|████ | 4708/11526 [49:04<1:09:46, 1.63it/s] {'loss': 0.2395, 'grad_norm': 0.5980252623558044, 'learning_rate': 7.371286176909466e-06, 'epoch': 1.23}
41%|████ | 4708/11526 [49:04<1:09:46, 1.63it/s] 41%|████ | 4709/11526 [49:05<1:09:43, 1.63it/s] {'loss': 0.21, 'grad_norm': 0.519023597240448, 'learning_rate': 7.369952888189781e-06, 'epoch': 1.23}
41%|████ | 4709/11526 [49:05<1:09:43, 1.63it/s] 41%|████ | 4710/11526 [49:05<1:09:45, 1.63it/s] {'loss': 0.2215, 'grad_norm': 0.5720754861831665, 'learning_rate': 7.368619382084532e-06, 'epoch': 1.23}
41%|████ | 4710/11526 [49:05<1:09:45, 1.63it/s] 41%|████ | 4711/11526 [49:06<1:09:45, 1.63it/s] {'loss': 0.188, 'grad_norm': 0.47019270062446594, 'learning_rate': 7.367285658716037e-06, 'epoch': 1.23}
41%|████ | 4711/11526 [49:06<1:09:45, 1.63it/s] 41%|████ | 4712/11526 [49:06<1:09:47, 1.63it/s] {'loss': 0.1817, 'grad_norm': 0.4384952187538147, 'learning_rate': 7.365951718206632e-06, 'epoch': 1.23}
41%|████ | 4712/11526 [49:07<1:09:47, 1.63it/s] 41%|████ | 4713/11526 [49:07<1:09:47, 1.63it/s] {'loss': 0.2028, 'grad_norm': 0.5185733437538147, 'learning_rate': 7.364617560678673e-06, 'epoch': 1.23}
41%|████ | 4713/11526 [49:07<1:09:47, 1.63it/s] 41%|████ | 4714/11526 [49:08<1:09:43, 1.63it/s] {'loss': 0.2556, 'grad_norm': 0.7220249176025391, 'learning_rate': 7.3632831862545386e-06, 'epoch': 1.23}
41%|████ | 4714/11526 [49:08<1:09:43, 1.63it/s] 41%|████ | 4715/11526 [49:08<1:09:44, 1.63it/s] {'loss': 0.2123, 'grad_norm': 0.5239399671554565, 'learning_rate': 7.3619485950566224e-06, 'epoch': 1.23}
41%|████ | 4715/11526 [49:08<1:09:44, 1.63it/s] 41%|████ | 4716/11526 [49:09<1:09:42, 1.63it/s] {'loss': 0.2042, 'grad_norm': 0.5419527888298035, 'learning_rate': 7.360613787207344e-06, 'epoch': 1.23}
41%|████ | 4716/11526 [49:09<1:09:42, 1.63it/s] 41%|████ | 4717/11526 [49:10<1:09:44, 1.63it/s] {'loss': 0.2448, 'grad_norm': 0.6695840954780579, 'learning_rate': 7.359278762829136e-06, 'epoch': 1.23}
41%|████ | 4717/11526 [49:10<1:09:44, 1.63it/s] 41%|████ | 4718/11526 [49:10<1:09:40, 1.63it/s] {'loss': 0.2887, 'grad_norm': 0.6339332461357117, 'learning_rate': 7.357943522044456e-06, 'epoch': 1.23}
41%|████ | 4718/11526 [49:10<1:09:40, 1.63it/s] 41%|████ | 4719/11526 [49:11<1:09:42, 1.63it/s] {'loss': 0.2708, 'grad_norm': 0.5914446711540222, 'learning_rate': 7.356608064975781e-06, 'epoch': 1.23}
41%|████ | 4719/11526 [49:11<1:09:42, 1.63it/s] 41%|████ | 4720/11526 [49:11<1:09:41, 1.63it/s] {'loss': 0.237, 'grad_norm': 0.5628048181533813, 'learning_rate': 7.355272391745605e-06, 'epoch': 1.23}
41%|████ | 4720/11526 [49:11<1:09:41, 1.63it/s] 41%|████ | 4721/11526 [49:12<1:09:38, 1.63it/s] {'loss': 0.172, 'grad_norm': 0.5079337954521179, 'learning_rate': 7.353936502476446e-06, 'epoch': 1.23}
41%|████ | 4721/11526 [49:12<1:09:38, 1.63it/s] 41%|████ | 4722/11526 [49:13<1:09:40, 1.63it/s] {'loss': 0.1865, 'grad_norm': 0.5278071761131287, 'learning_rate': 7.352600397290834e-06, 'epoch': 1.23}
41%|████ | 4722/11526 [49:13<1:09:40, 1.63it/s] 41%|████ | 4723/11526 [49:13<1:09:38, 1.63it/s] {'loss': 0.2014, 'grad_norm': 0.5650355219841003, 'learning_rate': 7.351264076311332e-06, 'epoch': 1.23}
41%|████ | 4723/11526 [49:13<1:09:38, 1.63it/s] 41%|████ | 4724/11526 [49:14<1:09:43, 1.63it/s] {'loss': 0.2184, 'grad_norm': 0.5585923790931702, 'learning_rate': 7.3499275396605084e-06, 'epoch': 1.23}
41%|████ | 4724/11526 [49:14<1:09:43, 1.63it/s] 41%|████ | 4725/11526 [49:14<1:09:41, 1.63it/s] {'loss': 0.2264, 'grad_norm': 0.6278653740882874, 'learning_rate': 7.348590787460962e-06, 'epoch': 1.23}
41%|████ | 4725/11526 [49:15<1:09:41, 1.63it/s] 41%|████ | 4726/11526 [49:15<1:09:41, 1.63it/s] {'loss': 0.1665, 'grad_norm': 0.5177180767059326, 'learning_rate': 7.3472538198353035e-06, 'epoch': 1.23}
41%|████ | 4726/11526 [49:15<1:09:41, 1.63it/s] 41%|████ | 4727/11526 [49:16<1:09:41, 1.63it/s] {'loss': 0.2043, 'grad_norm': 0.5273125767707825, 'learning_rate': 7.3459166369061694e-06, 'epoch': 1.23}
41%|████ | 4727/11526 [49:16<1:09:41, 1.63it/s] 41%|████ | 4728/11526 [49:16<1:09:42, 1.63it/s] {'loss': 0.2214, 'grad_norm': 0.6249099373817444, 'learning_rate': 7.344579238796216e-06, 'epoch': 1.23}
41%|████ | 4728/11526 [49:16<1:09:42, 1.63it/s] 41%|████ | 4729/11526 [49:17<1:09:39, 1.63it/s] {'loss': 0.2126, 'grad_norm': 0.5613365769386292, 'learning_rate': 7.343241625628112e-06, 'epoch': 1.23}
41%|████ | 4729/11526 [49:17<1:09:39, 1.63it/s] 41%|████ | 4730/11526 [49:18<1:09:38, 1.63it/s] {'loss': 0.2189, 'grad_norm': 0.48899295926094055, 'learning_rate': 7.341903797524556e-06, 'epoch': 1.23}
41%|████ | 4730/11526 [49:18<1:09:38, 1.63it/s] 41%|████ | 4731/11526 [49:18<1:09:35, 1.63it/s] {'loss': 0.1909, 'grad_norm': 0.47999516129493713, 'learning_rate': 7.3405657546082575e-06, 'epoch': 1.23}
41%|████ | 4731/11526 [49:18<1:09:35, 1.63it/s] 41%|████ | 4732/11526 [49:19<1:09:42, 1.62it/s] {'loss': 0.1733, 'grad_norm': 0.4969089925289154, 'learning_rate': 7.339227497001951e-06, 'epoch': 1.23}
41%|████ | 4732/11526 [49:19<1:09:42, 1.62it/s] 41%|████ | 4733/11526 [49:19<1:09:39, 1.63it/s] {'loss': 0.2476, 'grad_norm': 0.538421630859375, 'learning_rate': 7.3378890248283886e-06, 'epoch': 1.23}
41%|████ | 4733/11526 [49:19<1:09:39, 1.63it/s] 41%|████ | 4734/11526 [49:20<1:09:43, 1.62it/s] {'loss': 0.181, 'grad_norm': 0.48617202043533325, 'learning_rate': 7.336550338210343e-06, 'epoch': 1.23}
41%|████ | 4734/11526 [49:20<1:09:43, 1.62it/s] 41%|████ | 4735/11526 [49:21<1:09:39, 1.62it/s] {'loss': 0.2427, 'grad_norm': 0.5931749939918518, 'learning_rate': 7.335211437270606e-06, 'epoch': 1.23}
41%|████ | 4735/11526 [49:21<1:09:39, 1.62it/s] 41%|████ | 4736/11526 [49:21<1:09:35, 1.63it/s] {'loss': 0.1963, 'grad_norm': 0.5111019015312195, 'learning_rate': 7.333872322131989e-06, 'epoch': 1.23}
41%|████ | 4736/11526 [49:21<1:09:35, 1.63it/s] 41%|████ | 4737/11526 [49:22<1:09:36, 1.63it/s] {'loss': 0.2194, 'grad_norm': 0.5530562996864319, 'learning_rate': 7.332532992917323e-06, 'epoch': 1.23}
41%|████ | 4737/11526 [49:22<1:09:36, 1.63it/s] 41%|████ | 4738/11526 [49:22<1:09:35, 1.63it/s] {'loss': 0.2254, 'grad_norm': 0.5691665410995483, 'learning_rate': 7.3311934497494595e-06, 'epoch': 1.23}
41%|████ | 4738/11526 [49:23<1:09:35, 1.63it/s] 41%|████ | 4739/11526 [49:23<1:09:36, 1.62it/s] {'loss': 0.1553, 'grad_norm': 0.42949178814888, 'learning_rate': 7.32985369275127e-06, 'epoch': 1.23}
41%|████ | 4739/11526 [49:23<1:09:36, 1.62it/s] 41%|████ | 4740/11526 [49:24<1:09:33, 1.63it/s] {'loss': 0.237, 'grad_norm': 0.5565555691719055, 'learning_rate': 7.328513722045642e-06, 'epoch': 1.23}
41%|████ | 4740/11526 [49:24<1:09:33, 1.63it/s] 41%|████ | 4741/11526 [49:24<1:09:31, 1.63it/s] {'loss': 0.1748, 'grad_norm': 0.5898213386535645, 'learning_rate': 7.327173537755487e-06, 'epoch': 1.23}
41%|████ | 4741/11526 [49:24<1:09:31, 1.63it/s] 41%|████ | 4742/11526 [49:25<1:09:33, 1.63it/s] {'loss': 0.2035, 'grad_norm': 0.5695717930793762, 'learning_rate': 7.325833140003735e-06, 'epoch': 1.23}
41%|████ | 4742/11526 [49:25<1:09:33, 1.63it/s] 41%|████ | 4743/11526 [49:26<1:09:29, 1.63it/s] {'loss': 0.1661, 'grad_norm': 0.43334025144577026, 'learning_rate': 7.324492528913334e-06, 'epoch': 1.23}
41%|████ | 4743/11526 [49:26<1:09:29, 1.63it/s] 41%|████ | 4744/11526 [49:26<1:09:28, 1.63it/s] {'loss': 0.1963, 'grad_norm': 0.5083182454109192, 'learning_rate': 7.323151704607252e-06, 'epoch': 1.23}
41%|████ | 4744/11526 [49:26<1:09:28, 1.63it/s] 41%|████ | 4745/11526 [49:27<1:09:25, 1.63it/s] {'loss': 0.2186, 'grad_norm': 0.6123718023300171, 'learning_rate': 7.321810667208477e-06, 'epoch': 1.24}
41%|████ | 4745/11526 [49:27<1:09:25, 1.63it/s] 41%|████ | 4746/11526 [49:27<1:09:23, 1.63it/s] {'loss': 0.2161, 'grad_norm': 0.5606681704521179, 'learning_rate': 7.320469416840017e-06, 'epoch': 1.24}
41%|████ | 4746/11526 [49:27<1:09:23, 1.63it/s] 41%|████ | 4747/11526 [49:28<1:09:22, 1.63it/s] {'loss': 0.2198, 'grad_norm': 0.5044882297515869, 'learning_rate': 7.319127953624899e-06, 'epoch': 1.24}
41%|████ | 4747/11526 [49:28<1:09:22, 1.63it/s] 41%|████ | 4748/11526 [49:29<1:09:21, 1.63it/s] {'loss': 0.2452, 'grad_norm': 0.757128894329071, 'learning_rate': 7.317786277686171e-06, 'epoch': 1.24}
41%|████ | 4748/11526 [49:29<1:09:21, 1.63it/s] 41%|████ | 4749/11526 [49:29<1:09:21, 1.63it/s] {'loss': 0.164, 'grad_norm': 0.4574199318885803, 'learning_rate': 7.316444389146896e-06, 'epoch': 1.24}
41%|████ | 4749/11526 [49:29<1:09:21, 1.63it/s] 41%|████ | 4750/11526 [49:30<1:09:25, 1.63it/s] {'loss': 0.1645, 'grad_norm': 0.4742080271244049, 'learning_rate': 7.315102288130164e-06, 'epoch': 1.24}
41%|████ | 4750/11526 [49:30<1:09:25, 1.63it/s] 41%|████ | 4751/11526 [49:30<1:09:25, 1.63it/s] {'loss': 0.1783, 'grad_norm': 0.49787837266921997, 'learning_rate': 7.313759974759076e-06, 'epoch': 1.24}
41%|████ | 4751/11526 [49:31<1:09:25, 1.63it/s] 41%|████ | 4752/11526 [49:31<1:09:21, 1.63it/s] {'loss': 0.1752, 'grad_norm': 0.5027632713317871, 'learning_rate': 7.312417449156759e-06, 'epoch': 1.24}
41%|████ | 4752/11526 [49:31<1:09:21, 1.63it/s] 41%|████ | 4753/11526 [49:32<1:09:25, 1.63it/s] {'loss': 0.2001, 'grad_norm': 0.5511363744735718, 'learning_rate': 7.3110747114463555e-06, 'epoch': 1.24}
41%|████ | 4753/11526 [49:32<1:09:25, 1.63it/s] 41%|████ | 4754/11526 [49:32<1:09:20, 1.63it/s] {'loss': 0.2798, 'grad_norm': 0.580000102519989, 'learning_rate': 7.309731761751032e-06, 'epoch': 1.24}
41%|████ | 4754/11526 [49:32<1:09:20, 1.63it/s] 41%|████▏ | 4755/11526 [49:33<1:09:17, 1.63it/s] {'loss': 0.2858, 'grad_norm': 0.7936846017837524, 'learning_rate': 7.30838860019397e-06, 'epoch': 1.24}
41%|████▏ | 4755/11526 [49:33<1:09:17, 1.63it/s] 41%|████▏ | 4756/11526 [49:33<1:09:19, 1.63it/s] {'loss': 0.2291, 'grad_norm': 0.6306805610656738, 'learning_rate': 7.307045226898367e-06, 'epoch': 1.24}
41%|████▏ | 4756/11526 [49:34<1:09:19, 1.63it/s] 41%|████▏ | 4757/11526 [49:34<1:09:17, 1.63it/s] {'loss': 0.2011, 'grad_norm': 0.4747185707092285, 'learning_rate': 7.305701641987453e-06, 'epoch': 1.24}
41%|████▏ | 4757/11526 [49:34<1:09:17, 1.63it/s] 41%|████▏ | 4758/11526 [49:35<1:09:15, 1.63it/s] {'loss': 0.1855, 'grad_norm': 0.5015791058540344, 'learning_rate': 7.304357845584465e-06, 'epoch': 1.24}
41%|████▏ | 4758/11526 [49:35<1:09:15, 1.63it/s] 41%|████▏ | 4759/11526 [49:35<1:09:15, 1.63it/s] {'loss': 0.2536, 'grad_norm': 0.5904855132102966, 'learning_rate': 7.303013837812663e-06, 'epoch': 1.24}
41%|████▏ | 4759/11526 [49:35<1:09:15, 1.63it/s] 41%|████▏ | 4760/11526 [49:36<1:09:13, 1.63it/s] {'loss': 0.2726, 'grad_norm': 0.5991783142089844, 'learning_rate': 7.301669618795329e-06, 'epoch': 1.24}
41%|████▏ | 4760/11526 [49:36<1:09:13, 1.63it/s] 41%|████▏ | 4761/11526 [49:37<1:09:12, 1.63it/s] {'loss': 0.1358, 'grad_norm': 0.4020959138870239, 'learning_rate': 7.300325188655762e-06, 'epoch': 1.24}
41%|████▏ | 4761/11526 [49:37<1:09:12, 1.63it/s] 41%|████▏ | 4762/11526 [49:37<1:09:11, 1.63it/s] {'loss': 0.1905, 'grad_norm': 0.6750358939170837, 'learning_rate': 7.298980547517279e-06, 'epoch': 1.24}
41%|████▏ | 4762/11526 [49:37<1:09:11, 1.63it/s] 41%|████▏ | 4763/11526 [49:38<1:09:09, 1.63it/s] {'loss': 0.2205, 'grad_norm': 0.569786012172699, 'learning_rate': 7.297635695503221e-06, 'epoch': 1.24}
41%|████▏ | 4763/11526 [49:38<1:09:09, 1.63it/s] 41%|████▏ | 4764/11526 [49:38<1:09:09, 1.63it/s] {'loss': 0.1754, 'grad_norm': 0.4966314733028412, 'learning_rate': 7.296290632736942e-06, 'epoch': 1.24}
41%|████▏ | 4764/11526 [49:39<1:09:09, 1.63it/s] 41%|████▏ | 4765/11526 [49:39<1:09:08, 1.63it/s] {'loss': 0.3189, 'grad_norm': 0.6717597246170044, 'learning_rate': 7.294945359341823e-06, 'epoch': 1.24}
41%|████▏ | 4765/11526 [49:39<1:09:08, 1.63it/s] 41%|████▏ | 4766/11526 [49:40<1:09:07, 1.63it/s] {'loss': 0.2471, 'grad_norm': 0.552617609500885, 'learning_rate': 7.293599875441257e-06, 'epoch': 1.24}
41%|████▏ | 4766/11526 [49:40<1:09:07, 1.63it/s] 41%|████▏ | 4767/11526 [49:40<1:09:06, 1.63it/s] {'loss': 0.2602, 'grad_norm': 0.6641343235969543, 'learning_rate': 7.292254181158661e-06, 'epoch': 1.24}
41%|████▏ | 4767/11526 [49:40<1:09:06, 1.63it/s] 41%|████▏ | 4768/11526 [49:41<1:09:06, 1.63it/s] {'loss': 0.2179, 'grad_norm': 0.5759797692298889, 'learning_rate': 7.2909082766174674e-06, 'epoch': 1.24}
41%|████▏ | 4768/11526 [49:41<1:09:06, 1.63it/s] 41%|████▏ | 4769/11526 [49:41<1:09:04, 1.63it/s] {'loss': 0.1856, 'grad_norm': 0.5349439382553101, 'learning_rate': 7.289562161941134e-06, 'epoch': 1.24}
41%|████▏ | 4769/11526 [49:42<1:09:04, 1.63it/s] 41%|████▏ | 4770/11526 [49:42<1:09:06, 1.63it/s] {'loss': 0.2232, 'grad_norm': 0.6057730913162231, 'learning_rate': 7.288215837253132e-06, 'epoch': 1.24}
41%|████▏ | 4770/11526 [49:42<1:09:06, 1.63it/s] 41%|████▏ | 4771/11526 [49:43<1:09:07, 1.63it/s] {'loss': 0.2331, 'grad_norm': 0.5875633358955383, 'learning_rate': 7.286869302676952e-06, 'epoch': 1.24}
41%|████▏ | 4771/11526 [49:43<1:09:07, 1.63it/s] 41%|████▏ | 4772/11526 [49:43<1:09:07, 1.63it/s] {'loss': 0.2306, 'grad_norm': 0.5567277669906616, 'learning_rate': 7.285522558336109e-06, 'epoch': 1.24}
41%|████▏ | 4772/11526 [49:43<1:09:07, 1.63it/s] 41%|████▏ | 4773/11526 [49:44<1:09:06, 1.63it/s] {'loss': 0.1961, 'grad_norm': 0.5133978128433228, 'learning_rate': 7.284175604354133e-06, 'epoch': 1.24}
41%|████▏ | 4773/11526 [49:44<1:09:06, 1.63it/s] 41%|████▏ | 4774/11526 [49:45<1:09:09, 1.63it/s] {'loss': 0.216, 'grad_norm': 0.5534990429878235, 'learning_rate': 7.282828440854575e-06, 'epoch': 1.24}
41%|████▏ | 4774/11526 [49:45<1:09:09, 1.63it/s] 41%|████▏ | 4775/11526 [49:45<1:09:06, 1.63it/s] {'loss': 0.3181, 'grad_norm': 0.6837837100028992, 'learning_rate': 7.281481067961002e-06, 'epoch': 1.24}
41%|████▏ | 4775/11526 [49:45<1:09:06, 1.63it/s] 41%|████▏ | 4776/11526 [49:46<1:09:07, 1.63it/s] {'loss': 0.1824, 'grad_norm': 0.5060418844223022, 'learning_rate': 7.280133485797005e-06, 'epoch': 1.24}
41%|████▏ | 4776/11526 [49:46<1:09:07, 1.63it/s] 41%|████▏ | 4777/11526 [49:46<1:09:07, 1.63it/s] {'loss': 0.2157, 'grad_norm': 0.6644652485847473, 'learning_rate': 7.2787856944861915e-06, 'epoch': 1.24}
41%|████▏ | 4777/11526 [49:47<1:09:07, 1.63it/s] 41%|████▏ | 4778/11526 [49:47<1:09:04, 1.63it/s] {'loss': 0.1938, 'grad_norm': 0.5310492515563965, 'learning_rate': 7.277437694152187e-06, 'epoch': 1.24}
41%|████▏ | 4778/11526 [49:47<1:09:04, 1.63it/s] 41%|████▏ | 4779/11526 [49:48<1:09:04, 1.63it/s] {'loss': 0.2029, 'grad_norm': 0.5527928471565247, 'learning_rate': 7.27608948491864e-06, 'epoch': 1.24}
41%|████▏ | 4779/11526 [49:48<1:09:04, 1.63it/s] 41%|████▏ | 4780/11526 [49:48<1:09:03, 1.63it/s] {'loss': 0.2619, 'grad_norm': 0.697435736656189, 'learning_rate': 7.274741066909216e-06, 'epoch': 1.24}
41%|████▏ | 4780/11526 [49:48<1:09:03, 1.63it/s] 41%|████▏ | 4781/11526 [49:49<1:09:02, 1.63it/s] {'loss': 0.2126, 'grad_norm': 0.5793291330337524, 'learning_rate': 7.273392440247597e-06, 'epoch': 1.24}
41%|████▏ | 4781/11526 [49:49<1:09:02, 1.63it/s] 41%|████▏ | 4782/11526 [49:49<1:09:04, 1.63it/s] {'loss': 0.1739, 'grad_norm': 0.5119660496711731, 'learning_rate': 7.27204360505749e-06, 'epoch': 1.24}
41%|████▏ | 4782/11526 [49:50<1:09:04, 1.63it/s] 41%|████▏ | 4783/11526 [49:50<1:09:02, 1.63it/s] {'loss': 0.2082, 'grad_norm': 0.5198395848274231, 'learning_rate': 7.2706945614626144e-06, 'epoch': 1.24}
41%|████▏ | 4783/11526 [49:50<1:09:02, 1.63it/s] 42%|████▏ | 4784/11526 [49:51<1:09:03, 1.63it/s] {'loss': 0.1995, 'grad_norm': 0.6436270475387573, 'learning_rate': 7.269345309586715e-06, 'epoch': 1.25}
42%|████▏ | 4784/11526 [49:51<1:09:03, 1.63it/s] 42%|████▏ | 4785/11526 [49:51<1:09:00, 1.63it/s] {'loss': 0.2165, 'grad_norm': 0.46619483828544617, 'learning_rate': 7.267995849553553e-06, 'epoch': 1.25}
42%|████▏ | 4785/11526 [49:51<1:09:00, 1.63it/s] 42%|████▏ | 4786/11526 [49:52<1:08:59, 1.63it/s] {'loss': 0.2413, 'grad_norm': 0.6018812656402588, 'learning_rate': 7.266646181486905e-06, 'epoch': 1.25}
42%|████▏ | 4786/11526 [49:52<1:08:59, 1.63it/s] 42%|████▏ | 4787/11526 [49:53<1:09:04, 1.63it/s] {'loss': 0.2286, 'grad_norm': 0.5966364145278931, 'learning_rate': 7.265296305510573e-06, 'epoch': 1.25}
42%|████▏ | 4787/11526 [49:53<1:09:04, 1.63it/s] 42%|████▏ | 4788/11526 [49:53<1:08:59, 1.63it/s] {'loss': 0.2376, 'grad_norm': 0.5732958316802979, 'learning_rate': 7.2639462217483755e-06, 'epoch': 1.25}
42%|████▏ | 4788/11526 [49:53<1:08:59, 1.63it/s] 42%|████▏ | 4789/11526 [49:54<1:08:59, 1.63it/s] {'loss': 0.3501, 'grad_norm': 0.785270631313324, 'learning_rate': 7.2625959303241485e-06, 'epoch': 1.25}
42%|████▏ | 4789/11526 [49:54<1:08:59, 1.63it/s] 42%|████▏ | 4790/11526 [49:54<1:08:57, 1.63it/s] {'loss': 0.2241, 'grad_norm': 0.5287877917289734, 'learning_rate': 7.261245431361749e-06, 'epoch': 1.25}
42%|████▏ | 4790/11526 [49:54<1:08:57, 1.63it/s] 42%|████▏ | 4791/11526 [49:55<1:08:54, 1.63it/s] {'loss': 0.1801, 'grad_norm': 0.47190016508102417, 'learning_rate': 7.259894724985054e-06, 'epoch': 1.25}
42%|████▏ | 4791/11526 [49:55<1:08:54, 1.63it/s] 42%|████▏ | 4792/11526 [49:56<1:09:03, 1.63it/s] {'loss': 0.1867, 'grad_norm': 0.5898469090461731, 'learning_rate': 7.2585438113179565e-06, 'epoch': 1.25}
42%|████▏ | 4792/11526 [49:56<1:09:03, 1.63it/s] 42%|████▏ | 4793/11526 [49:56<1:09:00, 1.63it/s] {'loss': 0.2143, 'grad_norm': 0.5005614757537842, 'learning_rate': 7.257192690484369e-06, 'epoch': 1.25}
42%|████▏ | 4793/11526 [49:56<1:09:00, 1.63it/s] 42%|████▏ | 4794/11526 [49:57<1:08:57, 1.63it/s] {'loss': 0.1651, 'grad_norm': 0.448422908782959, 'learning_rate': 7.255841362608225e-06, 'epoch': 1.25}
42%|████▏ | 4794/11526 [49:57<1:08:57, 1.63it/s] 42%|████▏ | 4795/11526 [49:57<1:08:56, 1.63it/s] {'loss': 0.1657, 'grad_norm': 0.43700361251831055, 'learning_rate': 7.254489827813477e-06, 'epoch': 1.25}
42%|████▏ | 4795/11526 [49:58<1:08:56, 1.63it/s] 42%|████▏ | 4796/11526 [49:58<1:08:54, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.536827564239502, 'learning_rate': 7.253138086224094e-06, 'epoch': 1.25}
42%|████▏ | 4796/11526 [49:58<1:08:54, 1.63it/s] 42%|████▏ | 4797/11526 [49:59<1:09:00, 1.63it/s] {'loss': 0.2109, 'grad_norm': 0.5978624224662781, 'learning_rate': 7.251786137964066e-06, 'epoch': 1.25}
42%|████▏ | 4797/11526 [49:59<1:09:00, 1.63it/s] 42%|████▏ | 4798/11526 [49:59<1:08:56, 1.63it/s] {'loss': 0.2214, 'grad_norm': 0.5993003845214844, 'learning_rate': 7.2504339831574e-06, 'epoch': 1.25}
42%|████▏ | 4798/11526 [49:59<1:08:56, 1.63it/s] 42%|████▏ | 4799/11526 [50:00<1:08:55, 1.63it/s] {'loss': 0.162, 'grad_norm': 0.4393989145755768, 'learning_rate': 7.249081621928127e-06, 'epoch': 1.25}
42%|████▏ | 4799/11526 [50:00<1:08:55, 1.63it/s] 42%|████▏ | 4800/11526 [50:01<1:08:53, 1.63it/s] {'loss': 0.185, 'grad_norm': 0.5040348172187805, 'learning_rate': 7.247729054400289e-06, 'epoch': 1.25}
42%|████▏ | 4800/11526 [50:01<1:08:53, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.36it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.18it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.78it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5935013294219971, 'eval_runtime': 1.9539, 'eval_samples_per_second': 102.36, 'eval_steps_per_second': 6.653, 'epoch': 1.25}
42%|████▏ | 4800/11526 [50:03<1:08:53, 1.63it/s]
 42%|████▏ | 4801/11526 [50:03<2:14:43, 1.20s/it] {'loss': 0.2128, 'grad_norm': 0.5927520394325256, 'learning_rate': 7.246376280697954e-06, 'epoch': 1.25}
42%|████▏ | 4801/11526 [50:03<2:14:43, 1.20s/it] 42%|████▏ | 4802/11526 [50:04<1:54:55, 1.03s/it] {'loss': 0.1992, 'grad_norm': 0.5241811275482178, 'learning_rate': 7.245023300945203e-06, 'epoch': 1.25}
42%|████▏ | 4802/11526 [50:04<1:54:55, 1.03s/it] 42%|████▏ | 4803/11526 [50:04<1:41:03, 1.11it/s] {'loss': 0.2654, 'grad_norm': 0.6167775392532349, 'learning_rate': 7.243670115266144e-06, 'epoch': 1.25}
42%|████▏ | 4803/11526 [50:04<1:41:03, 1.11it/s] 42%|████▏ | 4804/11526 [50:05<1:31:28, 1.22it/s] {'loss': 0.189, 'grad_norm': 0.6127283573150635, 'learning_rate': 7.242316723784895e-06, 'epoch': 1.25}
42%|████▏ | 4804/11526 [50:05<1:31:28, 1.22it/s] 42%|████▏ | 4805/11526 [50:06<1:24:40, 1.32it/s] {'loss': 0.2139, 'grad_norm': 0.5143152475357056, 'learning_rate': 7.240963126625598e-06, 'epoch': 1.25}
42%|████▏ | 4805/11526 [50:06<1:24:40, 1.32it/s] 42%|████▏ | 4806/11526 [50:06<1:19:51, 1.40it/s] {'loss': 0.1753, 'grad_norm': 0.5208696722984314, 'learning_rate': 7.239609323912412e-06, 'epoch': 1.25}
42%|████▏ | 4806/11526 [50:06<1:19:51, 1.40it/s] 42%|████▏ | 4807/11526 [50:07<1:16:33, 1.46it/s] {'loss': 0.2135, 'grad_norm': 0.6229537129402161, 'learning_rate': 7.238255315769517e-06, 'epoch': 1.25}
42%|████▏ | 4807/11526 [50:07<1:16:33, 1.46it/s] 42%|████▏ | 4808/11526 [50:07<1:14:11, 1.51it/s] {'loss': 0.1646, 'grad_norm': 0.4263797104358673, 'learning_rate': 7.236901102321109e-06, 'epoch': 1.25}
42%|████▏ | 4808/11526 [50:08<1:14:11, 1.51it/s] 42%|████▏ | 4809/11526 [50:08<1:12:34, 1.54it/s] {'loss': 0.2076, 'grad_norm': 0.5563328266143799, 'learning_rate': 7.235546683691404e-06, 'epoch': 1.25}
42%|████▏ | 4809/11526 [50:08<1:12:34, 1.54it/s] 42%|████▏ | 4810/11526 [50:09<1:11:24, 1.57it/s] {'loss': 0.2217, 'grad_norm': 0.567256510257721, 'learning_rate': 7.234192060004636e-06, 'epoch': 1.25}
42%|████▏ | 4810/11526 [50:09<1:11:24, 1.57it/s] 42%|████▏ | 4811/11526 [50:09<1:10:37, 1.58it/s] {'loss': 0.2285, 'grad_norm': 0.5455598831176758, 'learning_rate': 7.2328372313850625e-06, 'epoch': 1.25}
42%|████▏ | 4811/11526 [50:09<1:10:37, 1.58it/s] 42%|████▏ | 4812/11526 [50:10<1:10:10, 1.59it/s] {'loss': 0.1565, 'grad_norm': 0.44866645336151123, 'learning_rate': 7.231482197956954e-06, 'epoch': 1.25}
42%|████▏ | 4812/11526 [50:10<1:10:10, 1.59it/s] 42%|████▏ | 4813/11526 [50:10<1:09:45, 1.60it/s] {'loss': 0.1671, 'grad_norm': 0.4284648895263672, 'learning_rate': 7.230126959844599e-06, 'epoch': 1.25}
42%|████▏ | 4813/11526 [50:11<1:09:45, 1.60it/s] 42%|████▏ | 4814/11526 [50:11<1:09:29, 1.61it/s] {'loss': 0.268, 'grad_norm': 0.5898144841194153, 'learning_rate': 7.228771517172313e-06, 'epoch': 1.25}
42%|████▏ | 4814/11526 [50:11<1:09:29, 1.61it/s] 42%|████▏ | 4815/11526 [50:12<1:09:14, 1.62it/s] {'loss': 0.265, 'grad_norm': 0.7120265960693359, 'learning_rate': 7.227415870064422e-06, 'epoch': 1.25}
42%|████▏ | 4815/11526 [50:12<1:09:14, 1.62it/s] 42%|████▏ | 4816/11526 [50:12<1:09:03, 1.62it/s] {'loss': 0.22, 'grad_norm': 0.5809049606323242, 'learning_rate': 7.226060018645274e-06, 'epoch': 1.25}
42%|████▏ | 4816/11526 [50:12<1:09:03, 1.62it/s] 42%|████▏ | 4817/11526 [50:13<1:08:55, 1.62it/s] {'loss': 0.1773, 'grad_norm': 0.46209943294525146, 'learning_rate': 7.224703963039233e-06, 'epoch': 1.25}
42%|████▏ | 4817/11526 [50:13<1:08:55, 1.62it/s] 42%|████▏ | 4818/11526 [50:14<1:08:49, 1.62it/s] {'loss': 0.1958, 'grad_norm': 0.579681396484375, 'learning_rate': 7.223347703370688e-06, 'epoch': 1.25}
42%|████▏ | 4818/11526 [50:14<1:08:49, 1.62it/s] 42%|████▏ | 4819/11526 [50:14<1:08:47, 1.62it/s] {'loss': 0.3111, 'grad_norm': 0.7045469284057617, 'learning_rate': 7.221991239764041e-06, 'epoch': 1.25}
42%|████▏ | 4819/11526 [50:14<1:08:47, 1.62it/s] 42%|████▏ | 4820/11526 [50:15<1:08:44, 1.63it/s] {'loss': 0.2041, 'grad_norm': 0.515392541885376, 'learning_rate': 7.220634572343716e-06, 'epoch': 1.25}
42%|████▏ | 4820/11526 [50:15<1:08:44, 1.63it/s] 42%|████▏ | 4821/11526 [50:15<1:08:42, 1.63it/s] {'loss': 0.1843, 'grad_norm': 0.5012373924255371, 'learning_rate': 7.219277701234152e-06, 'epoch': 1.25}
42%|████▏ | 4821/11526 [50:16<1:08:42, 1.63it/s] 42%|████▏ | 4822/11526 [50:16<1:08:40, 1.63it/s] {'loss': 0.2563, 'grad_norm': 0.5985570549964905, 'learning_rate': 7.21792062655981e-06, 'epoch': 1.26}
42%|████▏ | 4822/11526 [50:16<1:08:40, 1.63it/s] 42%|████▏ | 4823/11526 [50:17<1:08:39, 1.63it/s] {'loss': 0.2074, 'grad_norm': 0.5724107027053833, 'learning_rate': 7.21656334844517e-06, 'epoch': 1.26}
42%|████▏ | 4823/11526 [50:17<1:08:39, 1.63it/s] 42%|████▏ | 4824/11526 [50:17<1:08:37, 1.63it/s] {'loss': 0.1651, 'grad_norm': 0.5304181575775146, 'learning_rate': 7.2152058670147275e-06, 'epoch': 1.26}
42%|████▏ | 4824/11526 [50:17<1:08:37, 1.63it/s] 42%|████▏ | 4825/11526 [50:18<1:08:35, 1.63it/s] {'loss': 0.2046, 'grad_norm': 0.4949517846107483, 'learning_rate': 7.213848182392997e-06, 'epoch': 1.26}
42%|████▏ | 4825/11526 [50:18<1:08:35, 1.63it/s] 42%|████▏ | 4826/11526 [50:18<1:08:36, 1.63it/s] {'loss': 0.2177, 'grad_norm': 0.5678319931030273, 'learning_rate': 7.212490294704517e-06, 'epoch': 1.26}
42%|████▏ | 4826/11526 [50:19<1:08:36, 1.63it/s] 42%|████▏ | 4827/11526 [50:19<1:08:33, 1.63it/s] {'loss': 0.1863, 'grad_norm': 0.5127792954444885, 'learning_rate': 7.211132204073838e-06, 'epoch': 1.26}
42%|████▏ | 4827/11526 [50:19<1:08:33, 1.63it/s] 42%|████▏ | 4828/11526 [50:20<1:08:33, 1.63it/s] {'loss': 0.1554, 'grad_norm': 0.45070192217826843, 'learning_rate': 7.2097739106255326e-06, 'epoch': 1.26}
42%|████▏ | 4828/11526 [50:20<1:08:33, 1.63it/s] 42%|████▏ | 4829/11526 [50:20<1:08:31, 1.63it/s] {'loss': 0.1932, 'grad_norm': 0.5098623633384705, 'learning_rate': 7.208415414484191e-06, 'epoch': 1.26}
42%|████▏ | 4829/11526 [50:20<1:08:31, 1.63it/s] 42%|████▏ | 4830/11526 [50:21<1:08:29, 1.63it/s] {'loss': 0.1561, 'grad_norm': 0.4237440526485443, 'learning_rate': 7.207056715774423e-06, 'epoch': 1.26}
42%|████▏ | 4830/11526 [50:21<1:08:29, 1.63it/s] 42%|████▏ | 4831/11526 [50:22<1:08:28, 1.63it/s] {'loss': 0.2086, 'grad_norm': 0.5211352109909058, 'learning_rate': 7.205697814620853e-06, 'epoch': 1.26}
42%|████▏ | 4831/11526 [50:22<1:08:28, 1.63it/s] 42%|████▏ | 4832/11526 [50:22<1:08:29, 1.63it/s] {'loss': 0.1893, 'grad_norm': 0.4502962529659271, 'learning_rate': 7.204338711148131e-06, 'epoch': 1.26}
42%|████▏ | 4832/11526 [50:22<1:08:29, 1.63it/s] 42%|████▏ | 4833/11526 [50:23<1:08:27, 1.63it/s] {'loss': 0.216, 'grad_norm': 0.49540114402770996, 'learning_rate': 7.202979405480921e-06, 'epoch': 1.26}
42%|████▏ | 4833/11526 [50:23<1:08:27, 1.63it/s] 42%|████▏ | 4834/11526 [50:23<1:08:29, 1.63it/s] {'loss': 0.1737, 'grad_norm': 0.4422721862792969, 'learning_rate': 7.201619897743905e-06, 'epoch': 1.26}
42%|████▏ | 4834/11526 [50:23<1:08:29, 1.63it/s] 42%|████▏ | 4835/11526 [50:24<1:08:28, 1.63it/s] {'loss': 0.168, 'grad_norm': 0.6311444044113159, 'learning_rate': 7.200260188061786e-06, 'epoch': 1.26}
42%|████▏ | 4835/11526 [50:24<1:08:28, 1.63it/s] 42%|████▏ | 4836/11526 [50:25<1:08:26, 1.63it/s] {'loss': 0.1602, 'grad_norm': 0.44973042607307434, 'learning_rate': 7.198900276559281e-06, 'epoch': 1.26}
42%|████▏ | 4836/11526 [50:25<1:08:26, 1.63it/s] 42%|████▏ | 4837/11526 [50:25<1:08:25, 1.63it/s] {'loss': 0.1873, 'grad_norm': 0.5548732876777649, 'learning_rate': 7.197540163361134e-06, 'epoch': 1.26}
42%|████▏ | 4837/11526 [50:25<1:08:25, 1.63it/s] 42%|████▏ | 4838/11526 [50:26<1:08:25, 1.63it/s] {'loss': 0.1721, 'grad_norm': 0.47115474939346313, 'learning_rate': 7.1961798485920985e-06, 'epoch': 1.26}
42%|████▏ | 4838/11526 [50:26<1:08:25, 1.63it/s] 42%|████▏ | 4839/11526 [50:26<1:08:24, 1.63it/s] {'loss': 0.1768, 'grad_norm': 0.49736127257347107, 'learning_rate': 7.194819332376952e-06, 'epoch': 1.26}
42%|████▏ | 4839/11526 [50:27<1:08:24, 1.63it/s] 42%|████▏ | 4840/11526 [50:27<1:08:21, 1.63it/s] {'loss': 0.2183, 'grad_norm': 0.5828725695610046, 'learning_rate': 7.193458614840487e-06, 'epoch': 1.26}
42%|████▏ | 4840/11526 [50:27<1:08:21, 1.63it/s] 42%|████▏ | 4841/11526 [50:28<1:08:22, 1.63it/s] {'loss': 0.2097, 'grad_norm': 0.5793761610984802, 'learning_rate': 7.192097696107519e-06, 'epoch': 1.26}
42%|████▏ | 4841/11526 [50:28<1:08:22, 1.63it/s] 42%|████▏ | 4842/11526 [50:28<1:08:21, 1.63it/s] {'loss': 0.2234, 'grad_norm': 0.5303574800491333, 'learning_rate': 7.190736576302878e-06, 'epoch': 1.26}
42%|████▏ | 4842/11526 [50:28<1:08:21, 1.63it/s] 42%|████▏ | 4843/11526 [50:29<1:08:21, 1.63it/s] {'loss': 0.2441, 'grad_norm': 0.629282534122467, 'learning_rate': 7.189375255551413e-06, 'epoch': 1.26}
42%|████▏ | 4843/11526 [50:29<1:08:21, 1.63it/s] 42%|████▏ | 4844/11526 [50:30<1:08:24, 1.63it/s] {'loss': 0.2105, 'grad_norm': 0.5702300667762756, 'learning_rate': 7.188013733977993e-06, 'epoch': 1.26}
42%|████▏ | 4844/11526 [50:30<1:08:24, 1.63it/s] 42%|████▏ | 4845/11526 [50:30<1:08:20, 1.63it/s] {'loss': 0.2284, 'grad_norm': 0.5995764136314392, 'learning_rate': 7.1866520117075036e-06, 'epoch': 1.26}
42%|████▏ | 4845/11526 [50:30<1:08:20, 1.63it/s] 42%|████▏ | 4846/11526 [50:31<1:08:24, 1.63it/s] {'loss': 0.2435, 'grad_norm': 0.5299422144889832, 'learning_rate': 7.18529008886485e-06, 'epoch': 1.26}
42%|████▏ | 4846/11526 [50:31<1:08:24, 1.63it/s] 42%|████▏ | 4847/11526 [50:31<1:08:21, 1.63it/s] {'loss': 0.268, 'grad_norm': 0.6043120622634888, 'learning_rate': 7.183927965574955e-06, 'epoch': 1.26}
42%|████▏ | 4847/11526 [50:31<1:08:21, 1.63it/s] 42%|████▏ | 4848/11526 [50:32<1:08:21, 1.63it/s] {'loss': 0.2091, 'grad_norm': 0.56227707862854, 'learning_rate': 7.182565641962762e-06, 'epoch': 1.26}
42%|████▏ | 4848/11526 [50:32<1:08:21, 1.63it/s] 42%|████▏ | 4849/11526 [50:33<1:08:18, 1.63it/s] {'loss': 0.1803, 'grad_norm': 0.4827499985694885, 'learning_rate': 7.18120311815323e-06, 'epoch': 1.26}
42%|████▏ | 4849/11526 [50:33<1:08:18, 1.63it/s] 42%|████▏ | 4850/11526 [50:33<1:08:19, 1.63it/s] {'loss': 0.2162, 'grad_norm': 0.5221117734909058, 'learning_rate': 7.179840394271338e-06, 'epoch': 1.26}
42%|████▏ | 4850/11526 [50:33<1:08:19, 1.63it/s] 42%|████▏ | 4851/11526 [50:34<1:08:17, 1.63it/s] {'loss': 0.1748, 'grad_norm': 0.4627179801464081, 'learning_rate': 7.17847747044208e-06, 'epoch': 1.26}
42%|████▏ | 4851/11526 [50:34<1:08:17, 1.63it/s] 42%|████▏ | 4852/11526 [50:34<1:08:15, 1.63it/s] {'loss': 0.2176, 'grad_norm': 0.5492193698883057, 'learning_rate': 7.177114346790476e-06, 'epoch': 1.26}
42%|████▏ | 4852/11526 [50:35<1:08:15, 1.63it/s] 42%|████▏ | 4853/11526 [50:35<1:08:17, 1.63it/s] {'loss': 0.1796, 'grad_norm': 0.5440188050270081, 'learning_rate': 7.1757510234415565e-06, 'epoch': 1.26}
42%|████▏ | 4853/11526 [50:35<1:08:17, 1.63it/s] 42%|████▏ | 4854/11526 [50:36<1:08:16, 1.63it/s] {'loss': 0.2454, 'grad_norm': 0.5871135592460632, 'learning_rate': 7.174387500520372e-06, 'epoch': 1.26}
42%|████▏ | 4854/11526 [50:36<1:08:16, 1.63it/s] 42%|████▏ | 4855/11526 [50:36<1:08:17, 1.63it/s] {'loss': 0.139, 'grad_norm': 0.3663928210735321, 'learning_rate': 7.173023778151995e-06, 'epoch': 1.26}
42%|████▏ | 4855/11526 [50:36<1:08:17, 1.63it/s] 42%|████▏ | 4856/11526 [50:37<1:08:15, 1.63it/s] {'loss': 0.1832, 'grad_norm': 0.400791734457016, 'learning_rate': 7.171659856461512e-06, 'epoch': 1.26}
42%|████▏ | 4856/11526 [50:37<1:08:15, 1.63it/s] 42%|████▏ | 4857/11526 [50:37<1:08:14, 1.63it/s] {'loss': 0.2653, 'grad_norm': 0.5615357160568237, 'learning_rate': 7.1702957355740335e-06, 'epoch': 1.26}
42%|████▏ | 4857/11526 [50:38<1:08:14, 1.63it/s] 42%|████▏ | 4858/11526 [50:38<1:08:13, 1.63it/s] {'loss': 0.248, 'grad_norm': 0.6130920052528381, 'learning_rate': 7.168931415614679e-06, 'epoch': 1.26}
42%|████▏ | 4858/11526 [50:38<1:08:13, 1.63it/s] 42%|████▏ | 4859/11526 [50:39<1:08:13, 1.63it/s] {'loss': 0.1999, 'grad_norm': 0.6172446608543396, 'learning_rate': 7.1675668967085956e-06, 'epoch': 1.26}
42%|████▏ | 4859/11526 [50:39<1:08:13, 1.63it/s] 42%|████▏ | 4860/11526 [50:39<1:08:13, 1.63it/s] {'loss': 0.1824, 'grad_norm': 0.5425907969474792, 'learning_rate': 7.166202178980942e-06, 'epoch': 1.26}
42%|████▏ | 4860/11526 [50:39<1:08:13, 1.63it/s] 42%|████▏ | 4861/11526 [50:40<1:08:24, 1.62it/s] {'loss': 0.2008, 'grad_norm': 0.5719234943389893, 'learning_rate': 7.1648372625569016e-06, 'epoch': 1.27}
42%|████▏ | 4861/11526 [50:40<1:08:24, 1.62it/s] 42%|████▏ | 4862/11526 [50:41<1:08:21, 1.62it/s] {'loss': 0.1492, 'grad_norm': 0.4432767927646637, 'learning_rate': 7.163472147561668e-06, 'epoch': 1.27}
42%|████▏ | 4862/11526 [50:41<1:08:21, 1.62it/s] 42%|████▏ | 4863/11526 [50:41<1:08:18, 1.63it/s] {'loss': 0.1464, 'grad_norm': 0.42784228920936584, 'learning_rate': 7.162106834120461e-06, 'epoch': 1.27}
42%|████▏ | 4863/11526 [50:41<1:08:18, 1.63it/s] 42%|████▏ | 4864/11526 [50:42<1:08:18, 1.63it/s] {'loss': 0.2207, 'grad_norm': 0.5640027523040771, 'learning_rate': 7.160741322358513e-06, 'epoch': 1.27}
42%|████▏ | 4864/11526 [50:42<1:08:18, 1.63it/s] 42%|████▏ | 4865/11526 [50:42<1:08:15, 1.63it/s] {'loss': 0.2273, 'grad_norm': 0.5654593706130981, 'learning_rate': 7.159375612401077e-06, 'epoch': 1.27}
42%|████▏ | 4865/11526 [50:43<1:08:15, 1.63it/s] 42%|████▏ | 4866/11526 [50:43<1:08:12, 1.63it/s] {'loss': 0.2796, 'grad_norm': 0.731007993221283, 'learning_rate': 7.158009704373423e-06, 'epoch': 1.27}
42%|████▏ | 4866/11526 [50:43<1:08:12, 1.63it/s] 42%|████▏ | 4867/11526 [50:44<1:08:10, 1.63it/s] {'loss': 0.2256, 'grad_norm': 0.5288510322570801, 'learning_rate': 7.15664359840084e-06, 'epoch': 1.27}
42%|████▏ | 4867/11526 [50:44<1:08:10, 1.63it/s] 42%|████▏ | 4868/11526 [50:44<1:08:09, 1.63it/s] {'loss': 0.2546, 'grad_norm': 0.6116311550140381, 'learning_rate': 7.1552772946086354e-06, 'epoch': 1.27}
42%|████▏ | 4868/11526 [50:44<1:08:09, 1.63it/s] 42%|████▏ | 4869/11526 [50:45<1:08:12, 1.63it/s] {'loss': 0.3058, 'grad_norm': 0.642920970916748, 'learning_rate': 7.153910793122135e-06, 'epoch': 1.27}
42%|████▏ | 4869/11526 [50:45<1:08:12, 1.63it/s] 42%|████▏ | 4870/11526 [50:45<1:08:10, 1.63it/s] {'loss': 0.1828, 'grad_norm': 0.542248010635376, 'learning_rate': 7.15254409406668e-06, 'epoch': 1.27}
42%|████▏ | 4870/11526 [50:46<1:08:10, 1.63it/s] 42%|████▏ | 4871/11526 [50:46<1:08:08, 1.63it/s] {'loss': 0.1455, 'grad_norm': 0.407118022441864, 'learning_rate': 7.1511771975676345e-06, 'epoch': 1.27}
42%|████▏ | 4871/11526 [50:46<1:08:08, 1.63it/s] 42%|████▏ | 4872/11526 [50:47<1:08:15, 1.62it/s] {'loss': 0.2221, 'grad_norm': 0.5555047392845154, 'learning_rate': 7.149810103750378e-06, 'epoch': 1.27}
42%|████▏ | 4872/11526 [50:47<1:08:15, 1.62it/s] 42%|████▏ | 4873/11526 [50:47<1:08:13, 1.63it/s] {'loss': 0.1753, 'grad_norm': 0.41257819533348083, 'learning_rate': 7.148442812740304e-06, 'epoch': 1.27}
42%|████▏ | 4873/11526 [50:47<1:08:13, 1.63it/s] 42%|████▏ | 4874/11526 [50:48<1:08:14, 1.62it/s] {'loss': 0.1831, 'grad_norm': 0.49307748675346375, 'learning_rate': 7.147075324662833e-06, 'epoch': 1.27}
42%|████▏ | 4874/11526 [50:48<1:08:14, 1.62it/s] 42%|████▏ | 4875/11526 [50:49<1:08:11, 1.63it/s] {'loss': 0.2124, 'grad_norm': 0.6930848956108093, 'learning_rate': 7.145707639643396e-06, 'epoch': 1.27}
42%|████▏ | 4875/11526 [50:49<1:08:11, 1.63it/s] 42%|████▏ | 4876/11526 [50:49<1:08:08, 1.63it/s] {'loss': 0.2399, 'grad_norm': 0.626750111579895, 'learning_rate': 7.144339757807446e-06, 'epoch': 1.27}
42%|████▏ | 4876/11526 [50:49<1:08:08, 1.63it/s] 42%|████▏ | 4877/11526 [50:50<1:08:05, 1.63it/s] {'loss': 0.2084, 'grad_norm': 0.549399197101593, 'learning_rate': 7.142971679280453e-06, 'epoch': 1.27}
42%|████▏ | 4877/11526 [50:50<1:08:05, 1.63it/s] 42%|████▏ | 4878/11526 [50:50<1:08:02, 1.63it/s] {'loss': 0.2199, 'grad_norm': 0.7013736367225647, 'learning_rate': 7.141603404187904e-06, 'epoch': 1.27}
42%|████▏ | 4878/11526 [50:51<1:08:02, 1.63it/s] 42%|████▏ | 4879/11526 [50:51<1:08:06, 1.63it/s] {'loss': 0.1641, 'grad_norm': 0.4735085368156433, 'learning_rate': 7.140234932655307e-06, 'epoch': 1.27}
42%|████▏ | 4879/11526 [50:51<1:08:06, 1.63it/s] 42%|████▏ | 4880/11526 [50:52<1:11:48, 1.54it/s] {'loss': 0.2199, 'grad_norm': 0.5619046092033386, 'learning_rate': 7.138866264808182e-06, 'epoch': 1.27}
42%|████▏ | 4880/11526 [50:52<1:11:48, 1.54it/s] 42%|████▏ | 4881/11526 [50:52<1:10:39, 1.57it/s] {'loss': 0.1861, 'grad_norm': 0.5366255640983582, 'learning_rate': 7.137497400772075e-06, 'epoch': 1.27}
42%|████▏ | 4881/11526 [50:52<1:10:39, 1.57it/s] 42%|████▏ | 4882/11526 [50:53<1:09:50, 1.59it/s] {'loss': 0.214, 'grad_norm': 0.5293717980384827, 'learning_rate': 7.136128340672543e-06, 'epoch': 1.27}
42%|████▏ | 4882/11526 [50:53<1:09:50, 1.59it/s] 42%|████▏ | 4883/11526 [50:54<1:09:17, 1.60it/s] {'loss': 0.1853, 'grad_norm': 0.5197919607162476, 'learning_rate': 7.134759084635169e-06, 'epoch': 1.27}
42%|████▏ | 4883/11526 [50:54<1:09:17, 1.60it/s] 42%|████▏ | 4884/11526 [50:54<1:08:54, 1.61it/s] {'loss': 0.1974, 'grad_norm': 0.49481019377708435, 'learning_rate': 7.133389632785543e-06, 'epoch': 1.27}
42%|████▏ | 4884/11526 [50:54<1:08:54, 1.61it/s] 42%|████▏ | 4885/11526 [50:55<1:08:37, 1.61it/s] {'loss': 0.2008, 'grad_norm': 0.5596092343330383, 'learning_rate': 7.132019985249281e-06, 'epoch': 1.27}
42%|████▏ | 4885/11526 [50:55<1:08:37, 1.61it/s] 42%|████▏ | 4886/11526 [50:55<1:08:25, 1.62it/s] {'loss': 0.2275, 'grad_norm': 0.6104880571365356, 'learning_rate': 7.130650142152017e-06, 'epoch': 1.27}
42%|████▏ | 4886/11526 [50:56<1:08:25, 1.62it/s] 42%|████▏ | 4887/11526 [50:56<1:08:16, 1.62it/s] {'loss': 0.2108, 'grad_norm': 0.5361106395721436, 'learning_rate': 7.129280103619398e-06, 'epoch': 1.27}
42%|████▏ | 4887/11526 [50:56<1:08:16, 1.62it/s] 42%|████▏ | 4888/11526 [50:57<1:08:07, 1.62it/s] {'loss': 0.1728, 'grad_norm': 0.48396801948547363, 'learning_rate': 7.127909869777094e-06, 'epoch': 1.27}
42%|████▏ | 4888/11526 [50:57<1:08:07, 1.62it/s] 42%|████▏ | 4889/11526 [50:57<1:08:03, 1.63it/s] {'loss': 0.2033, 'grad_norm': 0.53569495677948, 'learning_rate': 7.126539440750787e-06, 'epoch': 1.27}
42%|████▏ | 4889/11526 [50:57<1:08:03, 1.63it/s] 42%|████▏ | 4890/11526 [50:58<1:08:01, 1.63it/s] {'loss': 0.1966, 'grad_norm': 0.5522298216819763, 'learning_rate': 7.125168816666185e-06, 'epoch': 1.27}
42%|████▏ | 4890/11526 [50:58<1:08:01, 1.63it/s] 42%|████▏ | 4891/11526 [50:58<1:07:57, 1.63it/s] {'loss': 0.2144, 'grad_norm': 0.48719534277915955, 'learning_rate': 7.1237979976490065e-06, 'epoch': 1.27}
42%|████▏ | 4891/11526 [50:59<1:07:57, 1.63it/s] 42%|████▏ | 4892/11526 [50:59<1:07:58, 1.63it/s] {'loss': 0.1631, 'grad_norm': 0.4260352849960327, 'learning_rate': 7.122426983824992e-06, 'epoch': 1.27}
42%|████▏ | 4892/11526 [50:59<1:07:58, 1.63it/s] 42%|████▏ | 4893/11526 [51:00<1:07:56, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.4347062408924103, 'learning_rate': 7.121055775319899e-06, 'epoch': 1.27}
42%|████▏ | 4893/11526 [51:00<1:07:56, 1.63it/s] 42%|████▏ | 4894/11526 [51:00<1:07:53, 1.63it/s] {'loss': 0.222, 'grad_norm': 0.5854347944259644, 'learning_rate': 7.119684372259501e-06, 'epoch': 1.27}
42%|████▏ | 4894/11526 [51:00<1:07:53, 1.63it/s] 42%|████▏ | 4895/11526 [51:01<1:07:52, 1.63it/s] {'loss': 0.2128, 'grad_norm': 0.547728955745697, 'learning_rate': 7.118312774769594e-06, 'epoch': 1.27}
42%|████▏ | 4895/11526 [51:01<1:07:52, 1.63it/s] 42%|████▏ | 4896/11526 [51:02<1:07:51, 1.63it/s] {'loss': 0.205, 'grad_norm': 0.5097270011901855, 'learning_rate': 7.116940982975984e-06, 'epoch': 1.27}
42%|████▏ | 4896/11526 [51:02<1:07:51, 1.63it/s] 42%|████▏ | 4897/11526 [51:02<1:07:52, 1.63it/s] {'loss': 0.2471, 'grad_norm': 0.5530400276184082, 'learning_rate': 7.115568997004503e-06, 'epoch': 1.27}
42%|████▏ | 4897/11526 [51:02<1:07:52, 1.63it/s] 42%|████▏ | 4898/11526 [51:03<1:07:53, 1.63it/s] {'loss': 0.1876, 'grad_norm': 0.5020399689674377, 'learning_rate': 7.114196816980996e-06, 'epoch': 1.27}
42%|████▏ | 4898/11526 [51:03<1:07:53, 1.63it/s] 43%|████▎ | 4899/11526 [51:03<1:07:56, 1.63it/s] {'loss': 0.5173, 'grad_norm': 0.5411838889122009, 'learning_rate': 7.112824443031329e-06, 'epoch': 1.28}
43%|████▎ | 4899/11526 [51:04<1:07:56, 1.63it/s] 43%|████▎ | 4900/11526 [51:04<1:07:55, 1.63it/s] {'loss': 0.1518, 'grad_norm': 0.48811277747154236, 'learning_rate': 7.111451875281381e-06, 'epoch': 1.28}
43%|████▎ | 4900/11526 [51:04<1:07:55, 1.63it/s] 43%|████▎ | 4901/11526 [51:05<1:07:54, 1.63it/s] {'loss': 0.1924, 'grad_norm': 0.5130168795585632, 'learning_rate': 7.110079113857053e-06, 'epoch': 1.28}
43%|████▎ | 4901/11526 [51:05<1:07:54, 1.63it/s] 43%|████▎ | 4902/11526 [51:05<1:07:57, 1.62it/s] {'loss': 0.229, 'grad_norm': 0.5982621908187866, 'learning_rate': 7.108706158884265e-06, 'epoch': 1.28}
43%|████▎ | 4902/11526 [51:05<1:07:57, 1.62it/s] 43%|████▎ | 4903/11526 [51:06<1:07:53, 1.63it/s] {'loss': 0.1808, 'grad_norm': 0.5263066291809082, 'learning_rate': 7.1073330104889496e-06, 'epoch': 1.28}
43%|████▎ | 4903/11526 [51:06<1:07:53, 1.63it/s] 43%|████▎ | 4904/11526 [51:06<1:07:50, 1.63it/s] {'loss': 0.2178, 'grad_norm': 0.5641317963600159, 'learning_rate': 7.105959668797059e-06, 'epoch': 1.28}
43%|████▎ | 4904/11526 [51:07<1:07:50, 1.63it/s] 43%|████▎ | 4905/11526 [51:07<1:07:52, 1.63it/s] {'loss': 0.3274, 'grad_norm': 0.6437582969665527, 'learning_rate': 7.1045861339345655e-06, 'epoch': 1.28}
43%|████▎ | 4905/11526 [51:07<1:07:52, 1.63it/s] 43%|████▎ | 4906/11526 [51:08<1:07:50, 1.63it/s] {'loss': 0.1952, 'grad_norm': 0.5378247499465942, 'learning_rate': 7.103212406027459e-06, 'epoch': 1.28}
43%|████▎ | 4906/11526 [51:08<1:07:50, 1.63it/s] 43%|████▎ | 4907/11526 [51:08<1:07:54, 1.62it/s] {'loss': 0.193, 'grad_norm': 0.549226701259613, 'learning_rate': 7.101838485201742e-06, 'epoch': 1.28}
43%|████▎ | 4907/11526 [51:08<1:07:54, 1.62it/s] 43%|████▎ | 4908/11526 [51:09<1:07:49, 1.63it/s] {'loss': 0.1963, 'grad_norm': 0.5489727854728699, 'learning_rate': 7.10046437158344e-06, 'epoch': 1.28}
43%|████▎ | 4908/11526 [51:09<1:07:49, 1.63it/s] 43%|████▎ | 4909/11526 [51:10<1:07:47, 1.63it/s] {'loss': 0.1674, 'grad_norm': 0.4178398549556732, 'learning_rate': 7.099090065298595e-06, 'epoch': 1.28}
43%|████▎ | 4909/11526 [51:10<1:07:47, 1.63it/s] 43%|████▎ | 4910/11526 [51:10<1:07:52, 1.62it/s] {'loss': 0.2077, 'grad_norm': 0.5728944540023804, 'learning_rate': 7.097715566473267e-06, 'epoch': 1.28}
43%|████▎ | 4910/11526 [51:10<1:07:52, 1.62it/s] 43%|████▎ | 4911/11526 [51:11<1:07:49, 1.63it/s] {'loss': 0.1738, 'grad_norm': 0.46471673250198364, 'learning_rate': 7.096340875233531e-06, 'epoch': 1.28}
43%|████▎ | 4911/11526 [51:11<1:07:49, 1.63it/s] 43%|████▎ | 4912/11526 [51:11<1:07:50, 1.62it/s] {'loss': 0.2413, 'grad_norm': 0.6845933198928833, 'learning_rate': 7.094965991705482e-06, 'epoch': 1.28}
43%|████▎ | 4912/11526 [51:12<1:07:50, 1.62it/s] 43%|████▎ | 4913/11526 [51:12<1:07:46, 1.63it/s] {'loss': 0.246, 'grad_norm': 0.5906627178192139, 'learning_rate': 7.093590916015232e-06, 'epoch': 1.28}
43%|████▎ | 4913/11526 [51:12<1:07:46, 1.63it/s] 43%|████▎ | 4914/11526 [51:13<1:07:42, 1.63it/s] {'loss': 0.2609, 'grad_norm': 0.693182110786438, 'learning_rate': 7.0922156482889116e-06, 'epoch': 1.28}
43%|████▎ | 4914/11526 [51:13<1:07:42, 1.63it/s] 43%|████▎ | 4915/11526 [51:13<1:07:48, 1.62it/s] {'loss': 0.2728, 'grad_norm': 0.5662575960159302, 'learning_rate': 7.090840188652668e-06, 'epoch': 1.28}
43%|████▎ | 4915/11526 [51:13<1:07:48, 1.62it/s] 43%|████▎ | 4916/11526 [51:14<1:07:45, 1.63it/s] {'loss': 0.192, 'grad_norm': 0.45936450362205505, 'learning_rate': 7.089464537232664e-06, 'epoch': 1.28}
43%|████▎ | 4916/11526 [51:14<1:07:45, 1.63it/s] 43%|████▎ | 4917/11526 [51:14<1:07:44, 1.63it/s] {'loss': 0.2131, 'grad_norm': 0.5442309379577637, 'learning_rate': 7.088088694155087e-06, 'epoch': 1.28}
43%|████▎ | 4917/11526 [51:15<1:07:44, 1.63it/s] 43%|████▎ | 4918/11526 [51:15<1:07:41, 1.63it/s] {'loss': 0.267, 'grad_norm': 0.6021322011947632, 'learning_rate': 7.0867126595461335e-06, 'epoch': 1.28}
43%|████▎ | 4918/11526 [51:15<1:07:41, 1.63it/s] 43%|████▎ | 4919/11526 [51:16<1:07:39, 1.63it/s] {'loss': 0.1939, 'grad_norm': 0.5344322323799133, 'learning_rate': 7.085336433532021e-06, 'epoch': 1.28}
43%|████▎ | 4919/11526 [51:16<1:07:39, 1.63it/s] 43%|████▎ | 4920/11526 [51:16<1:07:42, 1.63it/s] {'loss': 0.1784, 'grad_norm': 0.5127834677696228, 'learning_rate': 7.083960016238985e-06, 'epoch': 1.28}
43%|████▎ | 4920/11526 [51:16<1:07:42, 1.63it/s] 43%|████▎ | 4921/11526 [51:17<1:07:39, 1.63it/s] {'loss': 0.2787, 'grad_norm': 0.6370212435722351, 'learning_rate': 7.082583407793282e-06, 'epoch': 1.28}
43%|████▎ | 4921/11526 [51:17<1:07:39, 1.63it/s] 43%|████▎ | 4922/11526 [51:18<1:07:37, 1.63it/s] {'loss': 0.1922, 'grad_norm': 0.4719948172569275, 'learning_rate': 7.081206608321178e-06, 'epoch': 1.28}
43%|████▎ | 4922/11526 [51:18<1:07:37, 1.63it/s] 43%|████▎ | 4923/11526 [51:18<1:07:33, 1.63it/s] {'loss': 0.2434, 'grad_norm': 0.5458018779754639, 'learning_rate': 7.07982961794896e-06, 'epoch': 1.28}
43%|████▎ | 4923/11526 [51:18<1:07:33, 1.63it/s] 43%|████▎ | 4924/11526 [51:19<1:07:34, 1.63it/s] {'loss': 0.1816, 'grad_norm': 0.47565093636512756, 'learning_rate': 7.078452436802938e-06, 'epoch': 1.28}
43%|████▎ | 4924/11526 [51:19<1:07:34, 1.63it/s] 43%|████▎ | 4925/11526 [51:19<1:07:35, 1.63it/s] {'loss': 0.205, 'grad_norm': 0.5369477272033691, 'learning_rate': 7.0770750650094335e-06, 'epoch': 1.28}
43%|████▎ | 4925/11526 [51:20<1:07:35, 1.63it/s] 43%|████▎ | 4926/11526 [51:20<1:07:33, 1.63it/s] {'loss': 0.1954, 'grad_norm': 0.5668866634368896, 'learning_rate': 7.075697502694785e-06, 'epoch': 1.28}
43%|████▎ | 4926/11526 [51:20<1:07:33, 1.63it/s] 43%|████▎ | 4927/11526 [51:21<1:07:34, 1.63it/s] {'loss': 0.2294, 'grad_norm': 0.5393825173377991, 'learning_rate': 7.07431974998535e-06, 'epoch': 1.28}
43%|████▎ | 4927/11526 [51:21<1:07:34, 1.63it/s] 43%|████▎ | 4928/11526 [51:21<1:07:33, 1.63it/s] {'loss': 0.2225, 'grad_norm': 0.6078373789787292, 'learning_rate': 7.072941807007507e-06, 'epoch': 1.28}
43%|████▎ | 4928/11526 [51:21<1:07:33, 1.63it/s] 43%|████▎ | 4929/11526 [51:22<1:07:34, 1.63it/s] {'loss': 0.203, 'grad_norm': 0.5399278402328491, 'learning_rate': 7.071563673887645e-06, 'epoch': 1.28}
43%|████▎ | 4929/11526 [51:22<1:07:34, 1.63it/s] 43%|████▎ | 4930/11526 [51:22<1:07:35, 1.63it/s] {'loss': 0.2118, 'grad_norm': 0.511040985584259, 'learning_rate': 7.070185350752178e-06, 'epoch': 1.28}
43%|████▎ | 4930/11526 [51:23<1:07:35, 1.63it/s] 43%|████▎ | 4931/11526 [51:23<1:07:31, 1.63it/s] {'loss': 0.2062, 'grad_norm': 0.5335224270820618, 'learning_rate': 7.06880683772753e-06, 'epoch': 1.28}
43%|████▎ | 4931/11526 [51:23<1:07:31, 1.63it/s] 43%|████▎ | 4932/11526 [51:24<1:07:33, 1.63it/s] {'loss': 0.2105, 'grad_norm': 0.579979419708252, 'learning_rate': 7.06742813494015e-06, 'epoch': 1.28}
43%|████▎ | 4932/11526 [51:24<1:07:33, 1.63it/s] 43%|████▎ | 4933/11526 [51:24<1:11:12, 1.54it/s] {'loss': 0.21, 'grad_norm': 0.6228797435760498, 'learning_rate': 7.066049242516497e-06, 'epoch': 1.28}
43%|████▎ | 4933/11526 [51:25<1:11:12, 1.54it/s] 43%|████▎ | 4934/11526 [51:25<1:10:03, 1.57it/s] {'loss': 0.1639, 'grad_norm': 0.45942604541778564, 'learning_rate': 7.064670160583052e-06, 'epoch': 1.28}
43%|████▎ | 4934/11526 [51:25<1:10:03, 1.57it/s] 43%|████▎ | 4935/11526 [51:26<1:09:15, 1.59it/s] {'loss': 0.1996, 'grad_norm': 0.45368075370788574, 'learning_rate': 7.063290889266312e-06, 'epoch': 1.28}
43%|████▎ | 4935/11526 [51:26<1:09:15, 1.59it/s] 43%|████▎ | 4936/11526 [51:26<1:08:40, 1.60it/s] {'loss': 0.1883, 'grad_norm': 0.5254310965538025, 'learning_rate': 7.061911428692793e-06, 'epoch': 1.28}
43%|████▎ | 4936/11526 [51:26<1:08:40, 1.60it/s] 43%|████▎ | 4937/11526 [51:27<1:08:16, 1.61it/s] {'loss': 0.1957, 'grad_norm': 0.4721980392932892, 'learning_rate': 7.060531778989026e-06, 'epoch': 1.29}
43%|████▎ | 4937/11526 [51:27<1:08:16, 1.61it/s] 43%|████▎ | 4938/11526 [51:27<1:07:59, 1.61it/s] {'loss': 0.2117, 'grad_norm': 0.545563280582428, 'learning_rate': 7.05915194028156e-06, 'epoch': 1.29}
43%|████▎ | 4938/11526 [51:28<1:07:59, 1.61it/s] 43%|████▎ | 4939/11526 [51:28<1:11:24, 1.54it/s] {'loss': 0.1996, 'grad_norm': 0.5357691049575806, 'learning_rate': 7.057771912696961e-06, 'epoch': 1.29}
43%|████▎ | 4939/11526 [51:28<1:11:24, 1.54it/s] 43%|████▎ | 4940/11526 [51:29<1:10:15, 1.56it/s] {'loss': 0.2834, 'grad_norm': 0.6063620448112488, 'learning_rate': 7.0563916963618155e-06, 'epoch': 1.29}
43%|████▎ | 4940/11526 [51:29<1:10:15, 1.56it/s] 43%|████▎ | 4941/11526 [51:29<1:09:24, 1.58it/s] {'loss': 0.1898, 'grad_norm': 0.5017141103744507, 'learning_rate': 7.0550112914027224e-06, 'epoch': 1.29}
43%|████▎ | 4941/11526 [51:30<1:09:24, 1.58it/s] 43%|████▎ | 4942/11526 [51:30<1:08:48, 1.59it/s] {'loss': 0.2727, 'grad_norm': 0.7299424409866333, 'learning_rate': 7.053630697946301e-06, 'epoch': 1.29}
43%|████▎ | 4942/11526 [51:30<1:08:48, 1.59it/s] 43%|████▎ | 4943/11526 [51:31<1:08:24, 1.60it/s] {'loss': 0.1997, 'grad_norm': 0.5033614039421082, 'learning_rate': 7.052249916119187e-06, 'epoch': 1.29}
43%|████▎ | 4943/11526 [51:31<1:08:24, 1.60it/s] 43%|████▎ | 4944/11526 [51:31<1:08:08, 1.61it/s] {'loss': 0.2053, 'grad_norm': 0.5844262838363647, 'learning_rate': 7.050868946048035e-06, 'epoch': 1.29}
43%|████▎ | 4944/11526 [51:31<1:08:08, 1.61it/s] 43%|████▎ | 4945/11526 [51:32<1:07:55, 1.61it/s] {'loss': 0.1671, 'grad_norm': 0.5191283226013184, 'learning_rate': 7.049487787859514e-06, 'epoch': 1.29}
43%|████▎ | 4945/11526 [51:32<1:07:55, 1.61it/s] 43%|████▎ | 4946/11526 [51:33<1:07:45, 1.62it/s] {'loss': 0.2484, 'grad_norm': 0.5851466059684753, 'learning_rate': 7.048106441680312e-06, 'epoch': 1.29}
43%|████▎ | 4946/11526 [51:33<1:07:45, 1.62it/s] 43%|████▎ | 4947/11526 [51:33<1:07:38, 1.62it/s] {'loss': 0.2057, 'grad_norm': 0.5525196194648743, 'learning_rate': 7.046724907637133e-06, 'epoch': 1.29}
43%|████▎ | 4947/11526 [51:33<1:07:38, 1.62it/s] 43%|████▎ | 4948/11526 [51:34<1:07:32, 1.62it/s] {'loss': 0.2032, 'grad_norm': 0.49752697348594666, 'learning_rate': 7.045343185856701e-06, 'epoch': 1.29}
43%|████▎ | 4948/11526 [51:34<1:07:32, 1.62it/s] 43%|████▎ | 4949/11526 [51:34<1:07:28, 1.62it/s] {'loss': 0.2138, 'grad_norm': 0.589121401309967, 'learning_rate': 7.043961276465754e-06, 'epoch': 1.29}
43%|████▎ | 4949/11526 [51:34<1:07:28, 1.62it/s] 43%|████▎ | 4950/11526 [51:35<1:07:24, 1.63it/s] {'loss': 0.1854, 'grad_norm': 0.507024347782135, 'learning_rate': 7.042579179591048e-06, 'epoch': 1.29}
43%|████▎ | 4950/11526 [51:35<1:07:24, 1.63it/s] 43%|████▎ | 4951/11526 [51:36<1:07:27, 1.62it/s] {'loss': 0.1974, 'grad_norm': 0.5324098467826843, 'learning_rate': 7.04119689535936e-06, 'epoch': 1.29}
43%|████▎ | 4951/11526 [51:36<1:07:27, 1.62it/s] 43%|████▎ | 4952/11526 [51:36<1:07:24, 1.63it/s] {'loss': 0.2007, 'grad_norm': 0.5509524941444397, 'learning_rate': 7.039814423897477e-06, 'epoch': 1.29}
43%|████▎ | 4952/11526 [51:36<1:07:24, 1.63it/s] 43%|████▎ | 4953/11526 [51:37<1:07:20, 1.63it/s] {'loss': 0.1808, 'grad_norm': 0.5031845569610596, 'learning_rate': 7.03843176533221e-06, 'epoch': 1.29}
43%|████▎ | 4953/11526 [51:37<1:07:20, 1.63it/s] 43%|████▎ | 4954/11526 [51:37<1:07:18, 1.63it/s] {'loss': 0.2194, 'grad_norm': 0.6588479280471802, 'learning_rate': 7.037048919790383e-06, 'epoch': 1.29}
43%|████▎ | 4954/11526 [51:38<1:07:18, 1.63it/s] 43%|████▎ | 4955/11526 [51:38<1:07:16, 1.63it/s] {'loss': 0.2278, 'grad_norm': 0.6110967397689819, 'learning_rate': 7.035665887398839e-06, 'epoch': 1.29}
43%|████▎ | 4955/11526 [51:38<1:07:16, 1.63it/s] 43%|████▎ | 4956/11526 [51:39<1:07:23, 1.62it/s] {'loss': 0.2052, 'grad_norm': 0.4988802671432495, 'learning_rate': 7.034282668284438e-06, 'epoch': 1.29}
43%|████▎ | 4956/11526 [51:39<1:07:23, 1.62it/s] 43%|████▎ | 4957/11526 [51:39<1:07:21, 1.63it/s] {'loss': 0.1902, 'grad_norm': 0.4889766275882721, 'learning_rate': 7.032899262574055e-06, 'epoch': 1.29}
43%|████▎ | 4957/11526 [51:39<1:07:21, 1.63it/s] 43%|████▎ | 4958/11526 [51:40<1:07:18, 1.63it/s] {'loss': 0.2961, 'grad_norm': 0.6931955218315125, 'learning_rate': 7.031515670394584e-06, 'epoch': 1.29}
43%|████▎ | 4958/11526 [51:40<1:07:18, 1.63it/s] 43%|████▎ | 4959/11526 [51:41<1:07:14, 1.63it/s] {'loss': 0.1717, 'grad_norm': 0.5198308825492859, 'learning_rate': 7.03013189187294e-06, 'epoch': 1.29}
43%|████▎ | 4959/11526 [51:41<1:07:14, 1.63it/s] 43%|████▎ | 4960/11526 [51:41<1:07:12, 1.63it/s] {'loss': 0.1748, 'grad_norm': 0.4900934100151062, 'learning_rate': 7.0287479271360445e-06, 'epoch': 1.29}
43%|████▎ | 4960/11526 [51:41<1:07:12, 1.63it/s] 43%|████▎ | 4961/11526 [51:42<1:07:15, 1.63it/s] {'loss': 0.1967, 'grad_norm': 0.5565969347953796, 'learning_rate': 7.027363776310848e-06, 'epoch': 1.29}
43%|████▎ | 4961/11526 [51:42<1:07:15, 1.63it/s] 43%|████▎ | 4962/11526 [51:42<1:07:12, 1.63it/s] {'loss': 0.189, 'grad_norm': 0.4952892065048218, 'learning_rate': 7.025979439524311e-06, 'epoch': 1.29}
43%|████▎ | 4962/11526 [51:42<1:07:12, 1.63it/s] 43%|████▎ | 4963/11526 [51:43<1:07:11, 1.63it/s] {'loss': 0.2438, 'grad_norm': 0.6533941626548767, 'learning_rate': 7.024594916903411e-06, 'epoch': 1.29}
43%|████▎ | 4963/11526 [51:43<1:07:11, 1.63it/s] 43%|████▎ | 4964/11526 [51:44<1:07:08, 1.63it/s] {'loss': 0.1563, 'grad_norm': 0.45159515738487244, 'learning_rate': 7.023210208575148e-06, 'epoch': 1.29}
43%|████▎ | 4964/11526 [51:44<1:07:08, 1.63it/s] 43%|████▎ | 4965/11526 [51:44<1:07:08, 1.63it/s] {'loss': 0.1611, 'grad_norm': 0.4294429421424866, 'learning_rate': 7.02182531466653e-06, 'epoch': 1.29}
43%|████▎ | 4965/11526 [51:44<1:07:08, 1.63it/s] 43%|████▎ | 4966/11526 [51:45<1:07:09, 1.63it/s] {'loss': 0.1911, 'grad_norm': 0.5327800512313843, 'learning_rate': 7.020440235304593e-06, 'epoch': 1.29}
43%|████▎ | 4966/11526 [51:45<1:07:09, 1.63it/s] 43%|████▎ | 4967/11526 [51:45<1:07:08, 1.63it/s] {'loss': 0.1989, 'grad_norm': 0.5742182731628418, 'learning_rate': 7.01905497061638e-06, 'epoch': 1.29}
43%|████▎ | 4967/11526 [51:46<1:07:08, 1.63it/s] 43%|████▎ | 4968/11526 [51:46<1:07:06, 1.63it/s] {'loss': 0.2219, 'grad_norm': 0.5647261738777161, 'learning_rate': 7.017669520728958e-06, 'epoch': 1.29}
43%|████▎ | 4968/11526 [51:46<1:07:06, 1.63it/s] 43%|████▎ | 4969/11526 [51:47<1:07:09, 1.63it/s] {'loss': 0.1546, 'grad_norm': 0.48111599683761597, 'learning_rate': 7.016283885769406e-06, 'epoch': 1.29}
43%|████▎ | 4969/11526 [51:47<1:07:09, 1.63it/s] 43%|████▎ | 4970/11526 [51:47<1:07:08, 1.63it/s] {'loss': 0.1995, 'grad_norm': 0.48285698890686035, 'learning_rate': 7.014898065864826e-06, 'epoch': 1.29}
43%|████▎ | 4970/11526 [51:47<1:07:08, 1.63it/s] 43%|████▎ | 4971/11526 [51:48<1:07:13, 1.63it/s] {'loss': 0.2122, 'grad_norm': 0.5910437703132629, 'learning_rate': 7.0135120611423315e-06, 'epoch': 1.29}
43%|████▎ | 4971/11526 [51:48<1:07:13, 1.63it/s] 43%|████▎ | 4972/11526 [51:48<1:07:11, 1.63it/s] {'loss': 0.1945, 'grad_norm': 0.5822944045066833, 'learning_rate': 7.012125871729053e-06, 'epoch': 1.29}
43%|████▎ | 4972/11526 [51:49<1:07:11, 1.63it/s] 43%|████▎ | 4973/11526 [51:49<1:07:08, 1.63it/s] {'loss': 0.248, 'grad_norm': 0.6126558780670166, 'learning_rate': 7.010739497752141e-06, 'epoch': 1.29}
43%|████▎ | 4973/11526 [51:49<1:07:08, 1.63it/s] 43%|████▎ | 4974/11526 [51:50<1:07:05, 1.63it/s] {'loss': 0.2577, 'grad_norm': 0.6010451912879944, 'learning_rate': 7.009352939338761e-06, 'epoch': 1.29}
43%|████▎ | 4974/11526 [51:50<1:07:05, 1.63it/s] 43%|████▎ | 4975/11526 [51:50<1:07:06, 1.63it/s] {'loss': 0.2014, 'grad_norm': 0.5201930999755859, 'learning_rate': 7.007966196616098e-06, 'epoch': 1.29}
43%|████▎ | 4975/11526 [51:50<1:07:06, 1.63it/s] 43%|████▎ | 4976/11526 [51:51<1:07:11, 1.62it/s] {'loss': 0.1943, 'grad_norm': 0.48569610714912415, 'learning_rate': 7.00657926971135e-06, 'epoch': 1.3}
43%|████▎ | 4976/11526 [51:51<1:07:11, 1.62it/s] 43%|████▎ | 4977/11526 [51:52<1:07:10, 1.62it/s] {'loss': 0.2277, 'grad_norm': 0.5917664766311646, 'learning_rate': 7.005192158751735e-06, 'epoch': 1.3}
43%|████▎ | 4977/11526 [51:52<1:07:10, 1.62it/s] 43%|████▎ | 4978/11526 [51:52<1:07:11, 1.62it/s] {'loss': 0.2173, 'grad_norm': 0.5952563285827637, 'learning_rate': 7.003804863864485e-06, 'epoch': 1.3}
43%|████▎ | 4978/11526 [51:52<1:07:11, 1.62it/s] 43%|████▎ | 4979/11526 [51:53<1:07:06, 1.63it/s] {'loss': 0.2374, 'grad_norm': 0.5467720627784729, 'learning_rate': 7.002417385176853e-06, 'epoch': 1.3}
43%|████▎ | 4979/11526 [51:53<1:07:06, 1.63it/s] 43%|████▎ | 4980/11526 [51:53<1:07:04, 1.63it/s] {'loss': 0.2186, 'grad_norm': 0.5404059290885925, 'learning_rate': 7.001029722816104e-06, 'epoch': 1.3}
43%|████▎ | 4980/11526 [51:54<1:07:04, 1.63it/s] 43%|████▎ | 4981/11526 [51:54<1:07:10, 1.62it/s] {'loss': 0.1767, 'grad_norm': 0.47234731912612915, 'learning_rate': 6.999641876909523e-06, 'epoch': 1.3}
43%|████▎ | 4981/11526 [51:54<1:07:10, 1.62it/s] 43%|████▎ | 4982/11526 [51:55<1:07:05, 1.63it/s] {'loss': 0.1494, 'grad_norm': 0.4549556374549866, 'learning_rate': 6.998253847584413e-06, 'epoch': 1.3}
43%|████▎ | 4982/11526 [51:55<1:07:05, 1.63it/s] 43%|████▎ | 4983/11526 [51:55<1:07:03, 1.63it/s] {'loss': 0.2575, 'grad_norm': 0.5769269466400146, 'learning_rate': 6.99686563496809e-06, 'epoch': 1.3}
43%|████▎ | 4983/11526 [51:55<1:07:03, 1.63it/s] 43%|████▎ | 4984/11526 [51:56<1:07:03, 1.63it/s] {'loss': 0.2704, 'grad_norm': 0.6500507593154907, 'learning_rate': 6.995477239187889e-06, 'epoch': 1.3}
43%|████▎ | 4984/11526 [51:56<1:07:03, 1.63it/s] 43%|████▎ | 4985/11526 [51:56<1:07:03, 1.63it/s] {'loss': 0.2277, 'grad_norm': 0.5805332064628601, 'learning_rate': 6.994088660371162e-06, 'epoch': 1.3}
43%|████▎ | 4985/11526 [51:57<1:07:03, 1.63it/s] 43%|████▎ | 4986/11526 [51:57<1:07:03, 1.63it/s] {'loss': 0.1993, 'grad_norm': 0.5219542980194092, 'learning_rate': 6.9926998986452776e-06, 'epoch': 1.3}
43%|████▎ | 4986/11526 [51:57<1:07:03, 1.63it/s] 43%|████▎ | 4987/11526 [51:58<1:07:02, 1.63it/s] {'loss': 0.2378, 'grad_norm': 0.546785295009613, 'learning_rate': 6.99131095413762e-06, 'epoch': 1.3}
43%|████▎ | 4987/11526 [51:58<1:07:02, 1.63it/s] 43%|████▎ | 4988/11526 [51:58<1:07:00, 1.63it/s] {'loss': 0.2016, 'grad_norm': 0.5735865831375122, 'learning_rate': 6.989921826975592e-06, 'epoch': 1.3}
43%|████▎ | 4988/11526 [51:58<1:07:00, 1.63it/s] 43%|████▎ | 4989/11526 [51:59<1:07:00, 1.63it/s] {'loss': 0.2147, 'grad_norm': 0.5720065832138062, 'learning_rate': 6.98853251728661e-06, 'epoch': 1.3}
43%|████▎ | 4989/11526 [51:59<1:07:00, 1.63it/s] 43%|████▎ | 4990/11526 [52:00<1:06:59, 1.63it/s] {'loss': 0.2102, 'grad_norm': 0.6037245392799377, 'learning_rate': 6.987143025198114e-06, 'epoch': 1.3}
43%|████▎ | 4990/11526 [52:00<1:06:59, 1.63it/s] 43%|████▎ | 4991/11526 [52:00<1:06:56, 1.63it/s] {'loss': 0.1582, 'grad_norm': 0.674562931060791, 'learning_rate': 6.985753350837552e-06, 'epoch': 1.3}
43%|████▎ | 4991/11526 [52:00<1:06:56, 1.63it/s] 43%|████▎ | 4992/11526 [52:01<1:06:56, 1.63it/s] {'loss': 0.2132, 'grad_norm': 0.5426851511001587, 'learning_rate': 6.984363494332394e-06, 'epoch': 1.3}
43%|████▎ | 4992/11526 [52:01<1:06:56, 1.63it/s] 43%|████▎ | 4993/11526 [52:01<1:06:55, 1.63it/s] {'loss': 0.1705, 'grad_norm': 0.5516709089279175, 'learning_rate': 6.982973455810127e-06, 'epoch': 1.3}
43%|████▎ | 4993/11526 [52:02<1:06:55, 1.63it/s] 43%|████▎ | 4994/11526 [52:02<1:06:53, 1.63it/s] {'loss': 0.263, 'grad_norm': 0.5691989660263062, 'learning_rate': 6.981583235398251e-06, 'epoch': 1.3}
43%|████▎ | 4994/11526 [52:02<1:06:53, 1.63it/s] 43%|████▎ | 4995/11526 [52:03<1:06:54, 1.63it/s] {'loss': 0.1986, 'grad_norm': 0.5245411396026611, 'learning_rate': 6.980192833224285e-06, 'epoch': 1.3}
43%|████▎ | 4995/11526 [52:03<1:06:54, 1.63it/s] 43%|████▎ | 4996/11526 [52:03<1:06:54, 1.63it/s] {'loss': 0.1852, 'grad_norm': 0.5076910257339478, 'learning_rate': 6.978802249415766e-06, 'epoch': 1.3}
43%|████▎ | 4996/11526 [52:03<1:06:54, 1.63it/s] 43%|████▎ | 4997/11526 [52:04<1:06:51, 1.63it/s] {'loss': 0.1801, 'grad_norm': 0.5004545450210571, 'learning_rate': 6.977411484100247e-06, 'epoch': 1.3}
43%|████▎ | 4997/11526 [52:04<1:06:51, 1.63it/s] 43%|████▎ | 4998/11526 [52:04<1:06:51, 1.63it/s] {'loss': 0.2241, 'grad_norm': 0.531580924987793, 'learning_rate': 6.976020537405294e-06, 'epoch': 1.3}
43%|████▎ | 4998/11526 [52:05<1:06:51, 1.63it/s] 43%|████▎ | 4999/11526 [52:05<1:06:49, 1.63it/s] {'loss': 0.1893, 'grad_norm': 0.5315682888031006, 'learning_rate': 6.974629409458495e-06, 'epoch': 1.3}
43%|████▎ | 4999/11526 [52:05<1:06:49, 1.63it/s] 43%|████▎ | 5000/11526 [52:06<1:07:07, 1.62it/s] {'loss': 0.3015, 'grad_norm': 0.7603233456611633, 'learning_rate': 6.9732381003874525e-06, 'epoch': 1.3}
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5875862836837769, 'eval_runtime': 1.9541, 'eval_samples_per_second': 102.349, 'eval_steps_per_second': 6.653, 'epoch': 1.3}
 43%|████▎ | 5001/11526 [52:24<10:33:47, 5.83s/it] {'loss': 0.1988, 'grad_norm': 0.5458200573921204, 'learning_rate': 6.971846610319783e-06, 'epoch': 1.3}
43%|████▎ | 5001/11526 [52:24<10:33:47, 5.83s/it] 43%|████▎ | 5002/11526 [52:24<7:43:45, 4.27s/it] {'loss': 0.1864, 'grad_norm': 0.4932946264743805, 'learning_rate': 6.9704549393831246e-06, 'epoch': 1.3}
43%|████▎ | 5002/11526 [52:24<7:43:45, 4.27s/it] 43%|████▎ | 5003/11526 [52:25<5:44:39, 3.17s/it] {'loss': 0.1987, 'grad_norm': 0.5206835269927979, 'learning_rate': 6.969063087705125e-06, 'epoch': 1.3}
43%|████▎ | 5003/11526 [52:25<5:44:39, 3.17s/it] 43%|████▎ | 5004/11526 [52:26<4:21:13, 2.40s/it] {'loss': 0.2127, 'grad_norm': 0.5053127408027649, 'learning_rate': 6.967671055413457e-06, 'epoch': 1.3}
43%|████▎ | 5004/11526 [52:26<4:21:13, 2.40s/it] 43%|████▎ | 5005/11526 [52:26<3:22:50, 1.87s/it] {'loss': 0.2162, 'grad_norm': 0.5666100382804871, 'learning_rate': 6.966278842635804e-06, 'epoch': 1.3}
43%|████▎ | 5005/11526 [52:26<3:22:50, 1.87s/it] 43%|████▎ | 5006/11526 [52:27<2:42:15, 1.49s/it] {'loss': 0.208, 'grad_norm': 0.5269945859909058, 'learning_rate': 6.964886449499868e-06, 'epoch': 1.3}
43%|████▎ | 5006/11526 [52:27<2:42:15, 1.49s/it] 43%|████▎ | 5007/11526 [52:27<2:13:33, 1.23s/it] {'loss': 0.1618, 'grad_norm': 0.5122518539428711, 'learning_rate': 6.963493876133367e-06, 'epoch': 1.3}
43%|████▎ | 5007/11526 [52:28<2:13:33, 1.23s/it] 43%|████▎ | 5008/11526 [52:28<1:53:33, 1.05s/it] {'loss': 0.2028, 'grad_norm': 0.6248735785484314, 'learning_rate': 6.962101122664036e-06, 'epoch': 1.3}
43%|████▎ | 5008/11526 [52:28<1:53:33, 1.05s/it] 43%|████▎ | 5009/11526 [52:29<1:39:27, 1.09it/s] {'loss': 0.1679, 'grad_norm': 0.5103368759155273, 'learning_rate': 6.960708189219626e-06, 'epoch': 1.3}
43%|████▎ | 5009/11526 [52:29<1:39:27, 1.09it/s] 43%|████▎ | 5010/11526 [52:29<1:29:41, 1.21it/s] {'loss': 0.2303, 'grad_norm': 0.6688008904457092, 'learning_rate': 6.959315075927906e-06, 'epoch': 1.3}
43%|████▎ | 5010/11526 [52:29<1:29:41, 1.21it/s] 43%|████▎ | 5011/11526 [52:30<1:22:48, 1.31it/s] {'loss': 0.2022, 'grad_norm': 0.48710528016090393, 'learning_rate': 6.957921782916657e-06, 'epoch': 1.3}
43%|████▎ | 5011/11526 [52:30<1:22:48, 1.31it/s] 43%|████▎ | 5012/11526 [52:30<1:17:57, 1.39it/s] {'loss': 0.1643, 'grad_norm': 0.48928239941596985, 'learning_rate': 6.956528310313684e-06, 'epoch': 1.3}
43%|████▎ | 5012/11526 [52:31<1:17:57, 1.39it/s] 43%|████▎ | 5013/11526 [52:31<1:14:36, 1.45it/s] {'loss': 0.2129, 'grad_norm': 0.6068082451820374, 'learning_rate': 6.9551346582468015e-06, 'epoch': 1.3}
43%|████▎ | 5013/11526 [52:31<1:14:36, 1.45it/s] 44%|████▎ | 5014/11526 [52:32<1:12:12, 1.50it/s] {'loss': 0.2049, 'grad_norm': 0.5320274829864502, 'learning_rate': 6.9537408268438435e-06, 'epoch': 1.31}
44%|████▎ | 5014/11526 [52:32<1:12:12, 1.50it/s] 44%|████▎ | 5015/11526 [52:32<1:10:30, 1.54it/s] {'loss': 0.2269, 'grad_norm': 0.6600440740585327, 'learning_rate': 6.95234681623266e-06, 'epoch': 1.31}
44%|████▎ | 5015/11526 [52:32<1:10:30, 1.54it/s] 44%|████▎ | 5016/11526 [52:33<1:09:23, 1.56it/s] {'loss': 0.1747, 'grad_norm': 0.4750954508781433, 'learning_rate': 6.9509526265411184e-06, 'epoch': 1.31}
44%|████▎ | 5016/11526 [52:33<1:09:23, 1.56it/s] 44%|████▎ | 5017/11526 [52:34<1:08:32, 1.58it/s] {'loss': 0.2039, 'grad_norm': 0.5677465200424194, 'learning_rate': 6.949558257897102e-06, 'epoch': 1.31}
44%|████▎ | 5017/11526 [52:34<1:08:32, 1.58it/s] 44%|████▎ | 5018/11526 [52:34<1:07:56, 1.60it/s] {'loss': 0.2137, 'grad_norm': 0.5539917945861816, 'learning_rate': 6.94816371042851e-06, 'epoch': 1.31}
44%|████▎ | 5018/11526 [52:34<1:07:56, 1.60it/s] 44%|████▎ | 5019/11526 [52:35<1:07:32, 1.61it/s] {'loss': 0.2033, 'grad_norm': 0.5411314964294434, 'learning_rate': 6.946768984263257e-06, 'epoch': 1.31}
44%|████▎ | 5019/11526 [52:35<1:07:32, 1.61it/s] 44%|████▎ | 5020/11526 [52:35<1:07:16, 1.61it/s] {'loss': 0.1885, 'grad_norm': 0.5422846078872681, 'learning_rate': 6.945374079529277e-06, 'epoch': 1.31}
44%|████▎ | 5020/11526 [52:36<1:07:16, 1.61it/s] 44%|████▎ | 5021/11526 [52:36<1:07:04, 1.62it/s] {'loss': 0.1601, 'grad_norm': 0.4990230202674866, 'learning_rate': 6.943978996354517e-06, 'epoch': 1.31}
44%|████▎ | 5021/11526 [52:36<1:07:04, 1.62it/s] 44%|████▎ | 5022/11526 [52:37<1:06:56, 1.62it/s] {'loss': 0.1888, 'grad_norm': 0.4898965358734131, 'learning_rate': 6.942583734866943e-06, 'epoch': 1.31}
44%|████▎ | 5022/11526 [52:37<1:06:56, 1.62it/s] 44%|████▎ | 5023/11526 [52:37<1:06:51, 1.62it/s] {'loss': 0.1882, 'grad_norm': 0.49515092372894287, 'learning_rate': 6.941188295194536e-06, 'epoch': 1.31}
44%|████▎ | 5023/11526 [52:37<1:06:51, 1.62it/s] 44%|████▎ | 5024/11526 [52:38<1:06:45, 1.62it/s] {'loss': 0.1945, 'grad_norm': 0.5614699721336365, 'learning_rate': 6.9397926774652935e-06, 'epoch': 1.31}
44%|████▎ | 5024/11526 [52:38<1:06:45, 1.62it/s] 44%|████▎ | 5025/11526 [52:38<1:06:38, 1.63it/s] {'loss': 0.1825, 'grad_norm': 0.5369157195091248, 'learning_rate': 6.93839688180723e-06, 'epoch': 1.31}
44%|████▎ | 5025/11526 [52:39<1:06:38, 1.63it/s] 44%|████▎ | 5026/11526 [52:39<1:06:54, 1.62it/s] {'loss': 0.2212, 'grad_norm': 0.5240256786346436, 'learning_rate': 6.937000908348375e-06, 'epoch': 1.31}
44%|████▎ | 5026/11526 [52:39<1:06:54, 1.62it/s] 44%|████▎ | 5027/11526 [52:40<1:06:47, 1.62it/s] {'loss': 0.2052, 'grad_norm': 0.5140405893325806, 'learning_rate': 6.935604757216775e-06, 'epoch': 1.31}
44%|████▎ | 5027/11526 [52:40<1:06:47, 1.62it/s] 44%|████▎ | 5028/11526 [52:40<1:06:42, 1.62it/s] {'loss': 0.2445, 'grad_norm': 0.7606958746910095, 'learning_rate': 6.934208428540495e-06, 'epoch': 1.31}
44%|████▎ | 5028/11526 [52:40<1:06:42, 1.62it/s] 44%|████▎ | 5029/11526 [52:41<1:06:40, 1.62it/s] {'loss': 0.1883, 'grad_norm': 0.5099927186965942, 'learning_rate': 6.932811922447612e-06, 'epoch': 1.31}
44%|████▎ | 5029/11526 [52:41<1:06:40, 1.62it/s] 44%|████▎ | 5030/11526 [52:42<1:06:36, 1.63it/s] {'loss': 0.1885, 'grad_norm': 0.5171640515327454, 'learning_rate': 6.931415239066221e-06, 'epoch': 1.31}
44%|████▎ | 5030/11526 [52:42<1:06:36, 1.63it/s] 44%|████▎ | 5031/11526 [52:42<1:06:39, 1.62it/s] {'loss': 0.2198, 'grad_norm': 0.5913516283035278, 'learning_rate': 6.930018378524438e-06, 'epoch': 1.31}
44%|████▎ | 5031/11526 [52:42<1:06:39, 1.62it/s] 44%|████▎ | 5032/11526 [52:43<1:06:35, 1.63it/s] {'loss': 0.1987, 'grad_norm': 0.4949851930141449, 'learning_rate': 6.928621340950385e-06, 'epoch': 1.31}
44%|████▎ | 5032/11526 [52:43<1:06:35, 1.63it/s] 44%|████▎ | 5033/11526 [52:43<1:06:38, 1.62it/s] {'loss': 0.1815, 'grad_norm': 0.5807958841323853, 'learning_rate': 6.9272241264722115e-06, 'epoch': 1.31}
44%|████▎ | 5033/11526 [52:44<1:06:38, 1.62it/s] 44%|████▎ | 5034/11526 [52:44<1:06:35, 1.62it/s] {'loss': 0.2719, 'grad_norm': 0.6747538447380066, 'learning_rate': 6.9258267352180745e-06, 'epoch': 1.31}
44%|████▎ | 5034/11526 [52:44<1:06:35, 1.62it/s] 44%|████▎ | 5035/11526 [52:45<1:06:31, 1.63it/s] {'loss': 0.2056, 'grad_norm': 0.5559393763542175, 'learning_rate': 6.924429167316152e-06, 'epoch': 1.31}
44%|████▎ | 5035/11526 [52:45<1:06:31, 1.63it/s] 44%|████▎ | 5036/11526 [52:45<1:06:53, 1.62it/s] {'loss': 0.246, 'grad_norm': 0.624202311038971, 'learning_rate': 6.923031422894637e-06, 'epoch': 1.31}
44%|████▎ | 5036/11526 [52:45<1:06:53, 1.62it/s] 44%|████▎ | 5037/11526 [52:46<1:06:44, 1.62it/s] {'loss': 0.211, 'grad_norm': 0.5084487795829773, 'learning_rate': 6.9216335020817375e-06, 'epoch': 1.31}
44%|████▎ | 5037/11526 [52:46<1:06:44, 1.62it/s] 44%|████▎ | 5038/11526 [52:46<1:07:01, 1.61it/s] {'loss': 0.224, 'grad_norm': 0.5665268898010254, 'learning_rate': 6.92023540500568e-06, 'epoch': 1.31}
44%|████▎ | 5038/11526 [52:47<1:07:01, 1.61it/s] 44%|████▎ | 5039/11526 [52:47<1:06:47, 1.62it/s] {'loss': 0.2058, 'grad_norm': 0.5779350399971008, 'learning_rate': 6.918837131794706e-06, 'epoch': 1.31}
44%|████▎ | 5039/11526 [52:47<1:06:47, 1.62it/s] 44%|████▎ | 5040/11526 [52:48<1:06:40, 1.62it/s] {'loss': 0.1705, 'grad_norm': 0.49945464730262756, 'learning_rate': 6.917438682577072e-06, 'epoch': 1.31}
44%|████▎ | 5040/11526 [52:48<1:06:40, 1.62it/s] 44%|████▎ | 5041/11526 [52:48<1:06:41, 1.62it/s] {'loss': 0.2255, 'grad_norm': 0.5407106280326843, 'learning_rate': 6.9160400574810515e-06, 'epoch': 1.31}
44%|████▎ | 5041/11526 [52:48<1:06:41, 1.62it/s] 44%|████▎ | 5042/11526 [52:49<1:06:37, 1.62it/s] {'loss': 0.1849, 'grad_norm': 0.5351278185844421, 'learning_rate': 6.914641256634936e-06, 'epoch': 1.31}
44%|████▎ | 5042/11526 [52:49<1:06:37, 1.62it/s] 44%|████▍ | 5043/11526 [52:50<1:06:41, 1.62it/s] {'loss': 0.1648, 'grad_norm': 0.4714474380016327, 'learning_rate': 6.913242280167031e-06, 'epoch': 1.31}
44%|████▍ | 5043/11526 [52:50<1:06:41, 1.62it/s] 44%|████▍ | 5044/11526 [52:50<1:06:38, 1.62it/s] {'loss': 0.2301, 'grad_norm': 0.565022885799408, 'learning_rate': 6.911843128205657e-06, 'epoch': 1.31}
44%|████▍ | 5044/11526 [52:50<1:06:38, 1.62it/s] 44%|████▍ | 5045/11526 [52:51<1:06:33, 1.62it/s] {'loss': 0.1747, 'grad_norm': 0.487469345331192, 'learning_rate': 6.910443800879154e-06, 'epoch': 1.31}
44%|████▍ | 5045/11526 [52:51<1:06:33, 1.62it/s] 44%|████▍ | 5046/11526 [52:51<1:06:32, 1.62it/s] {'loss': 0.2027, 'grad_norm': 0.5439066886901855, 'learning_rate': 6.909044298315875e-06, 'epoch': 1.31}
44%|████▍ | 5046/11526 [52:52<1:06:32, 1.62it/s] 44%|████▍ | 5047/11526 [52:52<1:06:29, 1.62it/s] {'loss': 0.2003, 'grad_norm': 0.5239211916923523, 'learning_rate': 6.907644620644192e-06, 'epoch': 1.31}
44%|████▍ | 5047/11526 [52:52<1:06:29, 1.62it/s] 44%|████▍ | 5048/11526 [52:53<1:06:31, 1.62it/s] {'loss': 0.2084, 'grad_norm': 0.555427074432373, 'learning_rate': 6.90624476799249e-06, 'epoch': 1.31}
44%|████▍ | 5048/11526 [52:53<1:06:31, 1.62it/s] 44%|████▍ | 5049/11526 [52:53<1:06:29, 1.62it/s] {'loss': 0.1966, 'grad_norm': 0.5388742089271545, 'learning_rate': 6.904844740489171e-06, 'epoch': 1.31}
44%|████▍ | 5049/11526 [52:53<1:06:29, 1.62it/s] 44%|████▍ | 5050/11526 [52:54<1:06:25, 1.62it/s] {'loss': 0.2228, 'grad_norm': 0.6642706394195557, 'learning_rate': 6.903444538262656e-06, 'epoch': 1.31}
44%|████▍ | 5050/11526 [52:54<1:06:25, 1.62it/s] 44%|████▍ | 5051/11526 [52:54<1:06:25, 1.62it/s] {'loss': 0.1719, 'grad_norm': 0.4494763910770416, 'learning_rate': 6.902044161441377e-06, 'epoch': 1.31}
44%|████▍ | 5051/11526 [52:55<1:06:25, 1.62it/s] 44%|████▍ | 5052/11526 [52:55<1:06:23, 1.63it/s] {'loss': 0.1772, 'grad_norm': 0.4975728988647461, 'learning_rate': 6.900643610153786e-06, 'epoch': 1.31}
44%|████▍ | 5052/11526 [52:55<1:06:23, 1.63it/s] 44%|████▍ | 5053/11526 [52:56<1:06:25, 1.62it/s] {'loss': 0.2159, 'grad_norm': 0.5177332162857056, 'learning_rate': 6.899242884528346e-06, 'epoch': 1.32}
44%|████▍ | 5053/11526 [52:56<1:06:25, 1.62it/s] 44%|████▍ | 5054/11526 [52:56<1:06:23, 1.62it/s] {'loss': 0.1976, 'grad_norm': 0.5318855047225952, 'learning_rate': 6.897841984693545e-06, 'epoch': 1.32}
44%|████▍ | 5054/11526 [52:56<1:06:23, 1.62it/s] 44%|████▍ | 5055/11526 [52:57<1:06:19, 1.63it/s] {'loss': 0.1867, 'grad_norm': 0.5706989169120789, 'learning_rate': 6.896440910777879e-06, 'epoch': 1.32}
44%|████▍ | 5055/11526 [52:57<1:06:19, 1.63it/s] 44%|████▍ | 5056/11526 [52:58<1:06:21, 1.63it/s] {'loss': 0.2072, 'grad_norm': 0.5314372777938843, 'learning_rate': 6.89503966290986e-06, 'epoch': 1.32}
44%|████▍ | 5056/11526 [52:58<1:06:21, 1.63it/s] 44%|████▍ | 5057/11526 [52:58<1:06:18, 1.63it/s] {'loss': 0.3922, 'grad_norm': 0.5968328714370728, 'learning_rate': 6.893638241218023e-06, 'epoch': 1.32}
44%|████▍ | 5057/11526 [52:58<1:06:18, 1.63it/s] 44%|████▍ | 5058/11526 [52:59<1:06:21, 1.62it/s] {'loss': 0.1945, 'grad_norm': 0.5605568289756775, 'learning_rate': 6.892236645830912e-06, 'epoch': 1.32}
44%|████▍ | 5058/11526 [52:59<1:06:21, 1.62it/s] 44%|████▍ | 5059/11526 [52:59<1:06:18, 1.63it/s] {'loss': 0.2547, 'grad_norm': 0.5996962785720825, 'learning_rate': 6.890834876877089e-06, 'epoch': 1.32}
44%|████▍ | 5059/11526 [53:00<1:06:18, 1.63it/s] 44%|████▍ | 5060/11526 [53:00<1:06:20, 1.62it/s] {'loss': 0.1857, 'grad_norm': 0.5851413607597351, 'learning_rate': 6.889432934485132e-06, 'epoch': 1.32}
44%|████▍ | 5060/11526 [53:00<1:06:20, 1.62it/s] 44%|████▍ | 5061/11526 [53:01<1:06:18, 1.62it/s] {'loss': 0.232, 'grad_norm': 0.5437815189361572, 'learning_rate': 6.8880308187836365e-06, 'epoch': 1.32}
44%|████▍ | 5061/11526 [53:01<1:06:18, 1.62it/s] 44%|████▍ | 5062/11526 [53:01<1:06:15, 1.63it/s] {'loss': 0.1585, 'grad_norm': 0.414559543132782, 'learning_rate': 6.8866285299012125e-06, 'epoch': 1.32}
44%|████▍ | 5062/11526 [53:01<1:06:15, 1.63it/s] 44%|████▍ | 5063/11526 [53:02<1:06:30, 1.62it/s] {'loss': 0.21, 'grad_norm': 0.5738019943237305, 'learning_rate': 6.885226067966484e-06, 'epoch': 1.32}
44%|████▍ | 5063/11526 [53:02<1:06:30, 1.62it/s] 44%|████▍ | 5064/11526 [53:02<1:06:24, 1.62it/s] {'loss': 0.1689, 'grad_norm': 0.5337746143341064, 'learning_rate': 6.883823433108095e-06, 'epoch': 1.32}
44%|████▍ | 5064/11526 [53:03<1:06:24, 1.62it/s] 44%|████▍ | 5065/11526 [53:03<1:06:19, 1.62it/s] {'loss': 0.1783, 'grad_norm': 0.4756961464881897, 'learning_rate': 6.882420625454702e-06, 'epoch': 1.32}
44%|████▍ | 5065/11526 [53:03<1:06:19, 1.62it/s] 44%|████▍ | 5066/11526 [53:04<1:06:19, 1.62it/s] {'loss': 0.1856, 'grad_norm': 0.6456456184387207, 'learning_rate': 6.881017645134978e-06, 'epoch': 1.32}
44%|████▍ | 5066/11526 [53:04<1:06:19, 1.62it/s] 44%|████▍ | 5067/11526 [53:04<1:06:14, 1.63it/s] {'loss': 0.232, 'grad_norm': 0.5822697877883911, 'learning_rate': 6.879614492277614e-06, 'epoch': 1.32}
44%|████▍ | 5067/11526 [53:04<1:06:14, 1.63it/s] 44%|████▍ | 5068/11526 [53:05<1:06:15, 1.62it/s] {'loss': 0.2252, 'grad_norm': 0.5790320038795471, 'learning_rate': 6.878211167011314e-06, 'epoch': 1.32}
44%|████▍ | 5068/11526 [53:05<1:06:15, 1.62it/s] 44%|████▍ | 5069/11526 [53:06<1:06:11, 1.63it/s] {'loss': 0.1977, 'grad_norm': 0.5084571838378906, 'learning_rate': 6.876807669464799e-06, 'epoch': 1.32}
44%|████▍ | 5069/11526 [53:06<1:06:11, 1.63it/s] 44%|████▍ | 5070/11526 [53:06<1:06:09, 1.63it/s] {'loss': 0.1803, 'grad_norm': 0.529483437538147, 'learning_rate': 6.8754039997668054e-06, 'epoch': 1.32}
44%|████▍ | 5070/11526 [53:06<1:06:09, 1.63it/s] 44%|████▍ | 5071/11526 [53:07<1:06:11, 1.63it/s] {'loss': 0.1851, 'grad_norm': 0.45964744687080383, 'learning_rate': 6.874000158046087e-06, 'epoch': 1.32}
44%|████▍ | 5071/11526 [53:07<1:06:11, 1.63it/s] 44%|████▍ | 5072/11526 [53:07<1:06:07, 1.63it/s] {'loss': 0.1798, 'grad_norm': 0.5149293541908264, 'learning_rate': 6.872596144431411e-06, 'epoch': 1.32}
44%|████▍ | 5072/11526 [53:08<1:06:07, 1.63it/s] 44%|████▍ | 5073/11526 [53:08<1:06:14, 1.62it/s] {'loss': 0.1702, 'grad_norm': 0.5060721039772034, 'learning_rate': 6.871191959051563e-06, 'epoch': 1.32}
44%|████▍ | 5073/11526 [53:08<1:06:14, 1.62it/s] 44%|████▍ | 5074/11526 [53:09<1:06:11, 1.62it/s] {'loss': 0.1668, 'grad_norm': 0.46782186627388, 'learning_rate': 6.869787602035341e-06, 'epoch': 1.32}
44%|████▍ | 5074/11526 [53:09<1:06:11, 1.62it/s] 44%|████▍ | 5075/11526 [53:09<1:06:08, 1.63it/s] {'loss': 0.2449, 'grad_norm': 0.5272614359855652, 'learning_rate': 6.868383073511562e-06, 'epoch': 1.32}
44%|████▍ | 5075/11526 [53:09<1:06:08, 1.63it/s] 44%|████▍ | 5076/11526 [53:10<1:06:12, 1.62it/s] {'loss': 0.2212, 'grad_norm': 0.581879734992981, 'learning_rate': 6.8669783736090555e-06, 'epoch': 1.32}
44%|████▍ | 5076/11526 [53:10<1:06:12, 1.62it/s] 44%|████▍ | 5077/11526 [53:10<1:06:07, 1.63it/s] {'loss': 0.1761, 'grad_norm': 0.5507370829582214, 'learning_rate': 6.8655735024566725e-06, 'epoch': 1.32}
44%|████▍ | 5077/11526 [53:11<1:06:07, 1.63it/s] 44%|████▍ | 5078/11526 [53:11<1:06:07, 1.63it/s] {'loss': 0.1803, 'grad_norm': 0.5203486680984497, 'learning_rate': 6.8641684601832715e-06, 'epoch': 1.32}
44%|████▍ | 5078/11526 [53:11<1:06:07, 1.63it/s] 44%|████▍ | 5079/11526 [53:12<1:06:04, 1.63it/s] {'loss': 0.2657, 'grad_norm': 0.798416256904602, 'learning_rate': 6.862763246917732e-06, 'epoch': 1.32}
44%|████▍ | 5079/11526 [53:12<1:06:04, 1.63it/s] 44%|████▍ | 5080/11526 [53:12<1:06:02, 1.63it/s] {'loss': 0.2004, 'grad_norm': 0.5519739389419556, 'learning_rate': 6.861357862788951e-06, 'epoch': 1.32}
44%|████▍ | 5080/11526 [53:12<1:06:02, 1.63it/s] 44%|████▍ | 5081/11526 [53:13<1:06:21, 1.62it/s] {'loss': 0.2213, 'grad_norm': 0.5720822811126709, 'learning_rate': 6.859952307925834e-06, 'epoch': 1.32}
44%|████▍ | 5081/11526 [53:13<1:06:21, 1.62it/s] 44%|████▍ | 5082/11526 [53:14<1:06:14, 1.62it/s] {'loss': 0.2248, 'grad_norm': 0.5608741044998169, 'learning_rate': 6.858546582457311e-06, 'epoch': 1.32}
44%|████▍ | 5082/11526 [53:14<1:06:14, 1.62it/s] 44%|████▍ | 5083/11526 [53:14<1:06:12, 1.62it/s] {'loss': 0.1919, 'grad_norm': 0.545201301574707, 'learning_rate': 6.857140686512317e-06, 'epoch': 1.32}
44%|████▍ | 5083/11526 [53:14<1:06:12, 1.62it/s] 44%|████▍ | 5084/11526 [53:15<1:06:06, 1.62it/s] {'loss': 0.2656, 'grad_norm': 0.6345013380050659, 'learning_rate': 6.855734620219815e-06, 'epoch': 1.32}
44%|████▍ | 5084/11526 [53:15<1:06:06, 1.62it/s] 44%|████▍ | 5085/11526 [53:15<1:06:05, 1.62it/s] {'loss': 0.1687, 'grad_norm': 0.49871957302093506, 'learning_rate': 6.854328383708775e-06, 'epoch': 1.32}
44%|████▍ | 5085/11526 [53:16<1:06:05, 1.62it/s] 44%|████▍ | 5086/11526 [53:16<1:06:06, 1.62it/s] {'loss': 0.1876, 'grad_norm': 0.5855574011802673, 'learning_rate': 6.8529219771081835e-06, 'epoch': 1.32}
44%|████▍ | 5086/11526 [53:16<1:06:06, 1.62it/s] 44%|████▍ | 5087/11526 [53:17<1:06:02, 1.62it/s] {'loss': 0.202, 'grad_norm': 0.5207007527351379, 'learning_rate': 6.851515400547046e-06, 'epoch': 1.32}
44%|████▍ | 5087/11526 [53:17<1:06:02, 1.62it/s] 44%|████▍ | 5088/11526 [53:17<1:06:04, 1.62it/s] {'loss': 0.2025, 'grad_norm': 0.4924187958240509, 'learning_rate': 6.850108654154384e-06, 'epoch': 1.32}
44%|████▍ | 5088/11526 [53:17<1:06:04, 1.62it/s] 44%|████▍ | 5089/11526 [53:18<1:06:01, 1.62it/s] {'loss': 0.1896, 'grad_norm': 0.568649172782898, 'learning_rate': 6.8487017380592266e-06, 'epoch': 1.32}
44%|████▍ | 5089/11526 [53:18<1:06:01, 1.62it/s] 44%|████▍ | 5090/11526 [53:18<1:05:58, 1.63it/s] {'loss': 0.1724, 'grad_norm': 0.48202410340309143, 'learning_rate': 6.847294652390628e-06, 'epoch': 1.32}
44%|████▍ | 5090/11526 [53:19<1:05:58, 1.63it/s] 44%|████▍ | 5091/11526 [53:19<1:06:02, 1.62it/s] {'loss': 0.2478, 'grad_norm': 0.6158656477928162, 'learning_rate': 6.845887397277653e-06, 'epoch': 1.33}
44%|████▍ | 5091/11526 [53:19<1:06:02, 1.62it/s] 44%|████▍ | 5092/11526 [53:20<1:05:57, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.4865584671497345, 'learning_rate': 6.8444799728493835e-06, 'epoch': 1.33}
44%|████▍ | 5092/11526 [53:20<1:05:57, 1.63it/s] 44%|████▍ | 5093/11526 [53:20<1:06:01, 1.62it/s] {'loss': 0.2597, 'grad_norm': 0.523404598236084, 'learning_rate': 6.8430723792349174e-06, 'epoch': 1.33}
44%|████▍ | 5093/11526 [53:20<1:06:01, 1.62it/s] 44%|████▍ | 5094/11526 [53:21<1:05:57, 1.63it/s] {'loss': 0.2024, 'grad_norm': 0.59141606092453, 'learning_rate': 6.841664616563363e-06, 'epoch': 1.33}
44%|████▍ | 5094/11526 [53:21<1:05:57, 1.63it/s] 44%|████▍ | 5095/11526 [53:22<1:05:54, 1.63it/s] {'loss': 0.1621, 'grad_norm': 0.4528999626636505, 'learning_rate': 6.8402566849638544e-06, 'epoch': 1.33}
44%|████▍ | 5095/11526 [53:22<1:05:54, 1.63it/s] 44%|████▍ | 5096/11526 [53:22<1:05:57, 1.62it/s] {'loss': 0.3153, 'grad_norm': 0.5905930399894714, 'learning_rate': 6.83884858456553e-06, 'epoch': 1.33}
44%|████▍ | 5096/11526 [53:22<1:05:57, 1.62it/s] 44%|████▍ | 5097/11526 [53:23<1:05:53, 1.63it/s] {'loss': 0.2222, 'grad_norm': 0.608475387096405, 'learning_rate': 6.837440315497552e-06, 'epoch': 1.33}
44%|████▍ | 5097/11526 [53:23<1:05:53, 1.63it/s] 44%|████▍ | 5098/11526 [53:23<1:05:52, 1.63it/s] {'loss': 0.2131, 'grad_norm': 0.716231107711792, 'learning_rate': 6.836031877889092e-06, 'epoch': 1.33}
44%|████▍ | 5098/11526 [53:24<1:05:52, 1.63it/s] 44%|████▍ | 5099/11526 [53:24<1:05:49, 1.63it/s] {'loss': 0.2055, 'grad_norm': 0.5947791337966919, 'learning_rate': 6.834623271869344e-06, 'epoch': 1.33}
44%|████▍ | 5099/11526 [53:24<1:05:49, 1.63it/s] 44%|████▍ | 5100/11526 [53:25<1:05:48, 1.63it/s] {'loss': 0.2349, 'grad_norm': 0.6274242997169495, 'learning_rate': 6.833214497567511e-06, 'epoch': 1.33}
44%|████▍ | 5100/11526 [53:25<1:05:48, 1.63it/s] 44%|████▍ | 5101/11526 [53:25<1:05:52, 1.63it/s] {'loss': 0.2271, 'grad_norm': 0.541924238204956, 'learning_rate': 6.8318055551128116e-06, 'epoch': 1.33}
44%|████▍ | 5101/11526 [53:25<1:05:52, 1.63it/s] 44%|████▍ | 5102/11526 [53:26<1:05:50, 1.63it/s] {'loss': 0.1987, 'grad_norm': 0.5358016490936279, 'learning_rate': 6.830396444634484e-06, 'epoch': 1.33}
44%|████▍ | 5102/11526 [53:26<1:05:50, 1.63it/s] 44%|████▍ | 5103/11526 [53:27<1:06:00, 1.62it/s] {'loss': 0.1743, 'grad_norm': 0.49587470293045044, 'learning_rate': 6.828987166261781e-06, 'epoch': 1.33}
44%|████▍ | 5103/11526 [53:27<1:06:00, 1.62it/s] 44%|████▍ | 5104/11526 [53:27<1:06:01, 1.62it/s] {'loss': 0.2434, 'grad_norm': 0.6600837111473083, 'learning_rate': 6.827577720123969e-06, 'epoch': 1.33}
44%|████▍ | 5104/11526 [53:27<1:06:01, 1.62it/s] 44%|████▍ | 5105/11526 [53:28<1:05:54, 1.62it/s] {'loss': 0.2039, 'grad_norm': 0.5449018478393555, 'learning_rate': 6.8261681063503295e-06, 'epoch': 1.33}
44%|████▍ | 5105/11526 [53:28<1:05:54, 1.62it/s] 44%|████▍ | 5106/11526 [53:28<1:05:53, 1.62it/s] {'loss': 0.1795, 'grad_norm': 0.503143846988678, 'learning_rate': 6.824758325070161e-06, 'epoch': 1.33}
44%|████▍ | 5106/11526 [53:28<1:05:53, 1.62it/s] 44%|████▍ | 5107/11526 [53:29<1:05:49, 1.63it/s] {'loss': 0.1777, 'grad_norm': 0.45052051544189453, 'learning_rate': 6.823348376412776e-06, 'epoch': 1.33}
44%|████▍ | 5107/11526 [53:29<1:05:49, 1.63it/s] 44%|████▍ | 5108/11526 [53:30<1:05:50, 1.62it/s] {'loss': 0.2024, 'grad_norm': 0.5002526044845581, 'learning_rate': 6.821938260507505e-06, 'epoch': 1.33}
44%|████▍ | 5108/11526 [53:30<1:05:50, 1.62it/s] 44%|████▍ | 5109/11526 [53:30<1:05:50, 1.62it/s] {'loss': 0.147, 'grad_norm': 0.48031342029571533, 'learning_rate': 6.820527977483688e-06, 'epoch': 1.33}
44%|████▍ | 5109/11526 [53:30<1:05:50, 1.62it/s] 44%|████▍ | 5110/11526 [53:31<1:05:47, 1.63it/s] {'loss': 0.1889, 'grad_norm': 0.4788575768470764, 'learning_rate': 6.819117527470688e-06, 'epoch': 1.33}
44%|████▍ | 5110/11526 [53:31<1:05:47, 1.63it/s] 44%|████▍ | 5111/11526 [53:31<1:05:48, 1.62it/s] {'loss': 0.1721, 'grad_norm': 0.475101113319397, 'learning_rate': 6.817706910597879e-06, 'epoch': 1.33}
44%|████▍ | 5111/11526 [53:32<1:05:48, 1.62it/s] 44%|████▍ | 5112/11526 [53:32<1:05:41, 1.63it/s] {'loss': 0.2014, 'grad_norm': 0.5602203011512756, 'learning_rate': 6.816296126994648e-06, 'epoch': 1.33}
44%|████▍ | 5112/11526 [53:32<1:05:41, 1.63it/s] 44%|████▍ | 5113/11526 [53:33<1:05:41, 1.63it/s] {'loss': 0.2103, 'grad_norm': 0.5382367968559265, 'learning_rate': 6.814885176790402e-06, 'epoch': 1.33}
44%|████▍ | 5113/11526 [53:33<1:05:41, 1.63it/s] 44%|████▍ | 5114/11526 [53:33<1:05:40, 1.63it/s] {'loss': 0.1392, 'grad_norm': 0.41776368021965027, 'learning_rate': 6.8134740601145625e-06, 'epoch': 1.33}
44%|████▍ | 5114/11526 [53:33<1:05:40, 1.63it/s] 44%|████▍ | 5115/11526 [53:34<1:05:37, 1.63it/s] {'loss': 0.197, 'grad_norm': 0.5429757833480835, 'learning_rate': 6.812062777096563e-06, 'epoch': 1.33}
44%|████▍ | 5115/11526 [53:34<1:05:37, 1.63it/s] 44%|████▍ | 5116/11526 [53:34<1:05:39, 1.63it/s] {'loss': 0.2183, 'grad_norm': 0.5856050848960876, 'learning_rate': 6.810651327865856e-06, 'epoch': 1.33}
44%|████▍ | 5116/11526 [53:35<1:05:39, 1.63it/s] 44%|████▍ | 5117/11526 [53:35<1:05:37, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.47011247277259827, 'learning_rate': 6.809239712551906e-06, 'epoch': 1.33}
44%|████▍ | 5117/11526 [53:35<1:05:37, 1.63it/s] 44%|████▍ | 5118/11526 [53:36<1:05:43, 1.62it/s] {'loss': 0.2845, 'grad_norm': 0.7439758777618408, 'learning_rate': 6.8078279312841966e-06, 'epoch': 1.33}
44%|████▍ | 5118/11526 [53:36<1:05:43, 1.62it/s] 44%|████▍ | 5119/11526 [53:36<1:05:39, 1.63it/s] {'loss': 0.184, 'grad_norm': 0.5798218250274658, 'learning_rate': 6.806415984192222e-06, 'epoch': 1.33}
44%|████▍ | 5119/11526 [53:36<1:05:39, 1.63it/s] 44%|████▍ | 5120/11526 [53:37<1:05:38, 1.63it/s] {'loss': 0.2431, 'grad_norm': 0.594265341758728, 'learning_rate': 6.805003871405497e-06, 'epoch': 1.33}
44%|████▍ | 5120/11526 [53:37<1:05:38, 1.63it/s] 44%|████▍ | 5121/11526 [53:38<1:05:58, 1.62it/s] {'loss': 0.2163, 'grad_norm': 0.588961660861969, 'learning_rate': 6.803591593053544e-06, 'epoch': 1.33}
44%|████▍ | 5121/11526 [53:38<1:05:58, 1.62it/s] 44%|████▍ | 5122/11526 [53:38<1:05:48, 1.62it/s] {'loss': 0.2121, 'grad_norm': 0.5828805565834045, 'learning_rate': 6.802179149265912e-06, 'epoch': 1.33}
44%|████▍ | 5122/11526 [53:38<1:05:48, 1.62it/s] 44%|████▍ | 5123/11526 [53:39<1:05:46, 1.62it/s] {'loss': 0.3133, 'grad_norm': 0.6370438933372498, 'learning_rate': 6.800766540172152e-06, 'epoch': 1.33}
44%|████▍ | 5123/11526 [53:39<1:05:46, 1.62it/s] 44%|████▍ | 5124/11526 [53:39<1:05:41, 1.62it/s] {'loss': 0.2523, 'grad_norm': 0.7519903779029846, 'learning_rate': 6.79935376590184e-06, 'epoch': 1.33}
44%|████▍ | 5124/11526 [53:40<1:05:41, 1.62it/s] 44%|████▍ | 5125/11526 [53:40<1:05:39, 1.62it/s] {'loss': 0.1779, 'grad_norm': 0.4819071292877197, 'learning_rate': 6.7979408265845606e-06, 'epoch': 1.33}
44%|████▍ | 5125/11526 [53:40<1:05:39, 1.62it/s] 44%|████▍ | 5126/11526 [53:41<1:05:40, 1.62it/s] {'loss': 0.2295, 'grad_norm': 0.5492919087409973, 'learning_rate': 6.796527722349922e-06, 'epoch': 1.33}
44%|████▍ | 5126/11526 [53:41<1:05:40, 1.62it/s] 44%|████▍ | 5127/11526 [53:41<1:05:37, 1.63it/s] {'loss': 0.1926, 'grad_norm': 0.5010651350021362, 'learning_rate': 6.795114453327536e-06, 'epoch': 1.33}
44%|████▍ | 5127/11526 [53:41<1:05:37, 1.63it/s] 44%|████▍ | 5128/11526 [53:42<1:05:42, 1.62it/s] {'loss': 0.2282, 'grad_norm': 0.599063515663147, 'learning_rate': 6.7937010196470386e-06, 'epoch': 1.33}
44%|████▍ | 5128/11526 [53:42<1:05:42, 1.62it/s] 44%|████▍ | 5129/11526 [53:43<1:05:39, 1.62it/s] {'loss': 0.1858, 'grad_norm': 0.5036265254020691, 'learning_rate': 6.79228742143808e-06, 'epoch': 1.33}
44%|████▍ | 5129/11526 [53:43<1:05:39, 1.62it/s] 45%|████▍ | 5130/11526 [53:43<1:05:38, 1.62it/s] {'loss': 0.1969, 'grad_norm': 0.5861877202987671, 'learning_rate': 6.790873658830321e-06, 'epoch': 1.34}
45%|████▍ | 5130/11526 [53:43<1:05:38, 1.62it/s] 45%|████▍ | 5131/11526 [53:44<1:05:42, 1.62it/s] {'loss': 0.191, 'grad_norm': 0.5282135009765625, 'learning_rate': 6.78945973195344e-06, 'epoch': 1.34}
45%|████▍ | 5131/11526 [53:44<1:05:42, 1.62it/s] 45%|████▍ | 5132/11526 [53:44<1:05:36, 1.62it/s] {'loss': 0.1607, 'grad_norm': 0.4630003571510315, 'learning_rate': 6.788045640937129e-06, 'epoch': 1.34}
45%|████▍ | 5132/11526 [53:44<1:05:36, 1.62it/s] 45%|████▍ | 5133/11526 [53:45<1:05:35, 1.62it/s] {'loss': 0.262, 'grad_norm': 0.6745107769966125, 'learning_rate': 6.786631385911101e-06, 'epoch': 1.34}
45%|████▍ | 5133/11526 [53:45<1:05:35, 1.62it/s] 45%|████▍ | 5134/11526 [53:46<1:05:33, 1.63it/s] {'loss': 0.1698, 'grad_norm': 0.4370689392089844, 'learning_rate': 6.785216967005075e-06, 'epoch': 1.34}
45%|████▍ | 5134/11526 [53:46<1:05:33, 1.63it/s] 45%|████▍ | 5135/11526 [53:46<1:05:29, 1.63it/s] {'loss': 0.2393, 'grad_norm': 0.5447256565093994, 'learning_rate': 6.783802384348792e-06, 'epoch': 1.34}
45%|████▍ | 5135/11526 [53:46<1:05:29, 1.63it/s] 45%|████▍ | 5136/11526 [53:47<1:05:28, 1.63it/s] {'loss': 0.1959, 'grad_norm': 0.4886736273765564, 'learning_rate': 6.7823876380720045e-06, 'epoch': 1.34}
45%|████▍ | 5136/11526 [53:47<1:05:28, 1.63it/s] 45%|████▍ | 5137/11526 [53:47<1:05:27, 1.63it/s] {'loss': 0.1906, 'grad_norm': 0.5377547740936279, 'learning_rate': 6.780972728304482e-06, 'epoch': 1.34}
45%|████▍ | 5137/11526 [53:48<1:05:27, 1.63it/s] 45%|████▍ | 5138/11526 [53:48<1:05:26, 1.63it/s] {'loss': 0.2009, 'grad_norm': 0.5511308908462524, 'learning_rate': 6.779557655176008e-06, 'epoch': 1.34}
45%|████▍ | 5138/11526 [53:48<1:05:26, 1.63it/s] 45%|████▍ | 5139/11526 [53:49<1:05:21, 1.63it/s] {'loss': 0.1682, 'grad_norm': 0.4712659418582916, 'learning_rate': 6.77814241881638e-06, 'epoch': 1.34}
45%|████▍ | 5139/11526 [53:49<1:05:21, 1.63it/s] 45%|████▍ | 5140/11526 [53:49<1:05:21, 1.63it/s] {'loss': 0.2204, 'grad_norm': 0.6037439703941345, 'learning_rate': 6.776727019355411e-06, 'epoch': 1.34}
45%|████▍ | 5140/11526 [53:49<1:05:21, 1.63it/s] 45%|████▍ | 5141/11526 [53:50<1:05:25, 1.63it/s] {'loss': 0.2009, 'grad_norm': 0.5239288210868835, 'learning_rate': 6.7753114569229316e-06, 'epoch': 1.34}
45%|████▍ | 5141/11526 [53:50<1:05:25, 1.63it/s] 45%|████▍ | 5142/11526 [53:50<1:05:21, 1.63it/s] {'loss': 0.1845, 'grad_norm': 0.5051903128623962, 'learning_rate': 6.773895731648785e-06, 'epoch': 1.34}
45%|████▍ | 5142/11526 [53:51<1:05:21, 1.63it/s] 45%|████▍ | 5143/11526 [53:51<1:05:23, 1.63it/s] {'loss': 0.2072, 'grad_norm': 0.5093215107917786, 'learning_rate': 6.7724798436628285e-06, 'epoch': 1.34}
45%|████▍ | 5143/11526 [53:51<1:05:23, 1.63it/s] 45%|████▍ | 5144/11526 [53:52<1:05:21, 1.63it/s] {'loss': 0.1515, 'grad_norm': 0.4294275939464569, 'learning_rate': 6.771063793094935e-06, 'epoch': 1.34}
45%|████▍ | 5144/11526 [53:52<1:05:21, 1.63it/s] 45%|████▍ | 5145/11526 [53:52<1:05:18, 1.63it/s] {'loss': 0.2277, 'grad_norm': 0.6272952556610107, 'learning_rate': 6.7696475800749935e-06, 'epoch': 1.34}
45%|████▍ | 5145/11526 [53:52<1:05:18, 1.63it/s] 45%|████▍ | 5146/11526 [53:53<1:05:18, 1.63it/s] {'loss': 0.2243, 'grad_norm': 0.5278520584106445, 'learning_rate': 6.768231204732908e-06, 'epoch': 1.34}
45%|████▍ | 5146/11526 [53:53<1:05:18, 1.63it/s] 45%|████▍ | 5147/11526 [53:54<1:05:16, 1.63it/s] {'loss': 0.2116, 'grad_norm': 0.5827870965003967, 'learning_rate': 6.766814667198595e-06, 'epoch': 1.34}
45%|████▍ | 5147/11526 [53:54<1:05:16, 1.63it/s] 45%|████▍ | 5148/11526 [53:54<1:05:17, 1.63it/s] {'loss': 0.1927, 'grad_norm': 0.506040632724762, 'learning_rate': 6.7653979676019876e-06, 'epoch': 1.34}
45%|████▍ | 5148/11526 [53:54<1:05:17, 1.63it/s] 45%|████▍ | 5149/11526 [53:55<1:05:15, 1.63it/s] {'loss': 0.2335, 'grad_norm': 0.6144710779190063, 'learning_rate': 6.763981106073035e-06, 'epoch': 1.34}
45%|████▍ | 5149/11526 [53:55<1:05:15, 1.63it/s] 45%|████▍ | 5150/11526 [53:55<1:05:16, 1.63it/s] {'loss': 0.2526, 'grad_norm': 0.6609393358230591, 'learning_rate': 6.762564082741699e-06, 'epoch': 1.34}
45%|████▍ | 5150/11526 [53:56<1:05:16, 1.63it/s] 45%|████▍ | 5151/11526 [53:56<1:05:17, 1.63it/s] {'loss': 0.1974, 'grad_norm': 0.5039781332015991, 'learning_rate': 6.761146897737954e-06, 'epoch': 1.34}
45%|████▍ | 5151/11526 [53:56<1:05:17, 1.63it/s] 45%|████▍ | 5152/11526 [53:57<1:05:15, 1.63it/s] {'loss': 0.2504, 'grad_norm': 0.5873697400093079, 'learning_rate': 6.759729551191797e-06, 'epoch': 1.34}
45%|████▍ | 5152/11526 [53:57<1:05:15, 1.63it/s] 45%|████▍ | 5153/11526 [53:57<1:05:17, 1.63it/s] {'loss': 0.2617, 'grad_norm': 0.6486479043960571, 'learning_rate': 6.758312043233233e-06, 'epoch': 1.34}
45%|████▍ | 5153/11526 [53:57<1:05:17, 1.63it/s] 45%|████▍ | 5154/11526 [53:58<1:05:15, 1.63it/s] {'loss': 0.22, 'grad_norm': 0.5146434307098389, 'learning_rate': 6.756894373992284e-06, 'epoch': 1.34}
45%|████▍ | 5154/11526 [53:58<1:05:15, 1.63it/s] 45%|████▍ | 5155/11526 [53:58<1:05:14, 1.63it/s] {'loss': 0.2118, 'grad_norm': 0.6354392170906067, 'learning_rate': 6.755476543598986e-06, 'epoch': 1.34}
45%|████▍ | 5155/11526 [53:59<1:05:14, 1.63it/s] 45%|████▍ | 5156/11526 [53:59<1:05:15, 1.63it/s] {'loss': 0.2073, 'grad_norm': 0.5558047294616699, 'learning_rate': 6.754058552183391e-06, 'epoch': 1.34}
45%|████▍ | 5156/11526 [53:59<1:05:15, 1.63it/s] 45%|████▍ | 5157/11526 [54:00<1:05:17, 1.63it/s] {'loss': 0.2845, 'grad_norm': 0.6513525247573853, 'learning_rate': 6.752640399875567e-06, 'epoch': 1.34}
45%|████▍ | 5157/11526 [54:00<1:05:17, 1.63it/s] 45%|████▍ | 5158/11526 [54:00<1:05:18, 1.62it/s] {'loss': 0.2327, 'grad_norm': 0.5934886932373047, 'learning_rate': 6.751222086805593e-06, 'epoch': 1.34}
45%|████▍ | 5158/11526 [54:00<1:05:18, 1.62it/s] 45%|████▍ | 5159/11526 [54:01<1:05:17, 1.63it/s] {'loss': 0.1991, 'grad_norm': 0.5025591254234314, 'learning_rate': 6.749803613103565e-06, 'epoch': 1.34}
45%|████▍ | 5159/11526 [54:01<1:05:17, 1.63it/s] 45%|████▍ | 5160/11526 [54:02<1:05:14, 1.63it/s] {'loss': 0.2151, 'grad_norm': 0.5686267614364624, 'learning_rate': 6.748384978899594e-06, 'epoch': 1.34}
45%|████▍ | 5160/11526 [54:02<1:05:14, 1.63it/s] 45%|████▍ | 5161/11526 [54:02<1:05:17, 1.62it/s] {'loss': 0.162, 'grad_norm': 0.5364720821380615, 'learning_rate': 6.746966184323805e-06, 'epoch': 1.34}
45%|████▍ | 5161/11526 [54:02<1:05:17, 1.62it/s] 45%|████▍ | 5162/11526 [54:03<1:05:14, 1.63it/s] {'loss': 0.1877, 'grad_norm': 0.4585244953632355, 'learning_rate': 6.745547229506335e-06, 'epoch': 1.34}
45%|████▍ | 5162/11526 [54:03<1:05:14, 1.63it/s] 45%|████▍ | 5163/11526 [54:03<1:05:15, 1.63it/s] {'loss': 0.2112, 'grad_norm': 0.5617289543151855, 'learning_rate': 6.744128114577344e-06, 'epoch': 1.34}
45%|████▍ | 5163/11526 [54:04<1:05:15, 1.63it/s] 45%|████▍ | 5164/11526 [54:04<1:05:11, 1.63it/s] {'loss': 0.183, 'grad_norm': 0.4628758132457733, 'learning_rate': 6.742708839666998e-06, 'epoch': 1.34}
45%|████▍ | 5164/11526 [54:04<1:05:11, 1.63it/s] 45%|████▍ | 5165/11526 [54:05<1:05:10, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.47144460678100586, 'learning_rate': 6.741289404905482e-06, 'epoch': 1.34}
45%|████▍ | 5165/11526 [54:05<1:05:10, 1.63it/s] 45%|████▍ | 5166/11526 [54:05<1:05:10, 1.63it/s] {'loss': 0.1485, 'grad_norm': 0.3835991621017456, 'learning_rate': 6.739869810422993e-06, 'epoch': 1.34}
45%|████▍ | 5166/11526 [54:05<1:05:10, 1.63it/s] 45%|████▍ | 5167/11526 [54:06<1:05:08, 1.63it/s] {'loss': 0.1758, 'grad_norm': 0.48675018548965454, 'learning_rate': 6.7384500563497476e-06, 'epoch': 1.34}
45%|████▍ | 5167/11526 [54:06<1:05:08, 1.63it/s] 45%|████▍ | 5168/11526 [54:06<1:05:10, 1.63it/s] {'loss': 0.213, 'grad_norm': 0.6088152527809143, 'learning_rate': 6.73703014281597e-06, 'epoch': 1.35}
45%|████▍ | 5168/11526 [54:07<1:05:10, 1.63it/s] 45%|████▍ | 5169/11526 [54:07<1:05:06, 1.63it/s] {'loss': 0.1772, 'grad_norm': 0.4584110677242279, 'learning_rate': 6.735610069951905e-06, 'epoch': 1.35}
45%|████▍ | 5169/11526 [54:07<1:05:06, 1.63it/s] 45%|████▍ | 5170/11526 [54:08<1:05:03, 1.63it/s] {'loss': 0.2518, 'grad_norm': 0.5584178566932678, 'learning_rate': 6.7341898378878075e-06, 'epoch': 1.35}
45%|████▍ | 5170/11526 [54:08<1:05:03, 1.63it/s] 45%|████▍ | 5171/11526 [54:08<1:05:02, 1.63it/s] {'loss': 0.1995, 'grad_norm': 0.5046659708023071, 'learning_rate': 6.732769446753954e-06, 'epoch': 1.35}
45%|████▍ | 5171/11526 [54:08<1:05:02, 1.63it/s] 45%|████▍ | 5172/11526 [54:09<1:05:03, 1.63it/s] {'loss': 0.1744, 'grad_norm': 0.49143511056900024, 'learning_rate': 6.731348896680626e-06, 'epoch': 1.35}
45%|████▍ | 5172/11526 [54:09<1:05:03, 1.63it/s] 45%|████▍ | 5173/11526 [54:10<1:05:11, 1.62it/s] {'loss': 0.2542, 'grad_norm': 0.5984535813331604, 'learning_rate': 6.729928187798127e-06, 'epoch': 1.35}
45%|████▍ | 5173/11526 [54:10<1:05:11, 1.62it/s] 45%|████▍ | 5174/11526 [54:10<1:05:07, 1.63it/s] {'loss': 0.1732, 'grad_norm': 0.5113614797592163, 'learning_rate': 6.72850732023677e-06, 'epoch': 1.35}
45%|████▍ | 5174/11526 [54:10<1:05:07, 1.63it/s] 45%|████▍ | 5175/11526 [54:11<1:05:04, 1.63it/s] {'loss': 0.1536, 'grad_norm': 0.4367998242378235, 'learning_rate': 6.727086294126889e-06, 'epoch': 1.35}
45%|████▍ | 5175/11526 [54:11<1:05:04, 1.63it/s] 45%|████▍ | 5176/11526 [54:11<1:05:05, 1.63it/s] {'loss': 0.1992, 'grad_norm': 0.5603272318840027, 'learning_rate': 6.725665109598825e-06, 'epoch': 1.35}
45%|████▍ | 5176/11526 [54:12<1:05:05, 1.63it/s] 45%|████▍ | 5177/11526 [54:12<1:05:01, 1.63it/s] {'loss': 0.1813, 'grad_norm': 0.43976807594299316, 'learning_rate': 6.724243766782939e-06, 'epoch': 1.35}
45%|████▍ | 5177/11526 [54:12<1:05:01, 1.63it/s] 45%|████▍ | 5178/11526 [54:13<1:05:09, 1.62it/s] {'loss': 0.1885, 'grad_norm': 0.4914434850215912, 'learning_rate': 6.722822265809605e-06, 'epoch': 1.35}
45%|████▍ | 5178/11526 [54:13<1:05:09, 1.62it/s] 45%|████▍ | 5179/11526 [54:13<1:05:04, 1.63it/s] {'loss': 0.2342, 'grad_norm': 0.5620859265327454, 'learning_rate': 6.721400606809208e-06, 'epoch': 1.35}
45%|████▍ | 5179/11526 [54:13<1:05:04, 1.63it/s] 45%|████▍ | 5180/11526 [54:14<1:05:00, 1.63it/s] {'loss': 0.2183, 'grad_norm': 0.6320622563362122, 'learning_rate': 6.719978789912156e-06, 'epoch': 1.35}
45%|████▍ | 5180/11526 [54:14<1:05:00, 1.63it/s] 45%|████▍ | 5181/11526 [54:14<1:05:02, 1.63it/s] {'loss': 0.2394, 'grad_norm': 0.56959068775177, 'learning_rate': 6.718556815248861e-06, 'epoch': 1.35}
45%|████▍ | 5181/11526 [54:15<1:05:02, 1.63it/s] 45%|████▍ | 5182/11526 [54:15<1:04:58, 1.63it/s] {'loss': 0.2015, 'grad_norm': 0.538051187992096, 'learning_rate': 6.7171346829497576e-06, 'epoch': 1.35}
45%|████▍ | 5182/11526 [54:15<1:04:58, 1.63it/s] 45%|████▍ | 5183/11526 [54:16<1:05:02, 1.63it/s] {'loss': 0.1644, 'grad_norm': 0.47320324182510376, 'learning_rate': 6.71571239314529e-06, 'epoch': 1.35}
45%|████▍ | 5183/11526 [54:16<1:05:02, 1.63it/s] 45%|████▍ | 5184/11526 [54:16<1:05:03, 1.62it/s] {'loss': 0.1943, 'grad_norm': 0.5044337511062622, 'learning_rate': 6.714289945965921e-06, 'epoch': 1.35}
45%|████▍ | 5184/11526 [54:16<1:05:03, 1.62it/s] 45%|████▍ | 5185/11526 [54:17<1:05:01, 1.63it/s] {'loss': 0.234, 'grad_norm': 0.6596003174781799, 'learning_rate': 6.7128673415421245e-06, 'epoch': 1.35}
45%|████▍ | 5185/11526 [54:17<1:05:01, 1.63it/s] 45%|████▍ | 5186/11526 [54:18<1:05:00, 1.63it/s] {'loss': 0.2056, 'grad_norm': 0.5406523942947388, 'learning_rate': 6.711444580004387e-06, 'epoch': 1.35}
45%|████▍ | 5186/11526 [54:18<1:05:00, 1.63it/s] 45%|████▌ | 5187/11526 [54:18<1:04:59, 1.63it/s] {'loss': 0.2102, 'grad_norm': 0.4760223925113678, 'learning_rate': 6.710021661483218e-06, 'epoch': 1.35}
45%|████▌ | 5187/11526 [54:18<1:04:59, 1.63it/s] 45%|████▌ | 5188/11526 [54:19<1:05:00, 1.62it/s] {'loss': 0.2123, 'grad_norm': 0.488310843706131, 'learning_rate': 6.708598586109131e-06, 'epoch': 1.35}
45%|████▌ | 5188/11526 [54:19<1:05:00, 1.62it/s] 45%|████▌ | 5189/11526 [54:19<1:04:55, 1.63it/s] {'loss': 0.1834, 'grad_norm': 0.5571420788764954, 'learning_rate': 6.707175354012659e-06, 'epoch': 1.35}
45%|████▌ | 5189/11526 [54:20<1:04:55, 1.63it/s] 45%|████▌ | 5190/11526 [54:20<1:04:55, 1.63it/s] {'loss': 0.2279, 'grad_norm': 0.5772980451583862, 'learning_rate': 6.705751965324352e-06, 'epoch': 1.35}
45%|████▌ | 5190/11526 [54:20<1:04:55, 1.63it/s] 45%|████▌ | 5191/11526 [54:21<1:04:54, 1.63it/s] {'loss': 0.215, 'grad_norm': 0.5262182354927063, 'learning_rate': 6.70432842017477e-06, 'epoch': 1.35}
45%|████▌ | 5191/11526 [54:21<1:04:54, 1.63it/s] 45%|████▌ | 5192/11526 [54:21<1:04:52, 1.63it/s] {'loss': 0.2498, 'grad_norm': 0.630003809928894, 'learning_rate': 6.702904718694486e-06, 'epoch': 1.35}
45%|████▌ | 5192/11526 [54:21<1:04:52, 1.63it/s] 45%|████▌ | 5193/11526 [54:22<1:04:55, 1.63it/s] {'loss': 0.175, 'grad_norm': 0.5123977065086365, 'learning_rate': 6.701480861014094e-06, 'epoch': 1.35}
45%|████▌ | 5193/11526 [54:22<1:04:55, 1.63it/s] 45%|████▌ | 5194/11526 [54:22<1:04:53, 1.63it/s] {'loss': 0.2248, 'grad_norm': 0.6058310866355896, 'learning_rate': 6.700056847264193e-06, 'epoch': 1.35}
45%|████▌ | 5194/11526 [54:23<1:04:53, 1.63it/s] 45%|████▌ | 5195/11526 [54:23<1:04:50, 1.63it/s] {'loss': 0.1543, 'grad_norm': 0.47426673769950867, 'learning_rate': 6.698632677575409e-06, 'epoch': 1.35}
45%|████▌ | 5195/11526 [54:23<1:04:50, 1.63it/s] 45%|████▌ | 5196/11526 [54:24<1:04:55, 1.62it/s] {'loss': 0.2045, 'grad_norm': 0.4804079234600067, 'learning_rate': 6.697208352078369e-06, 'epoch': 1.35}
45%|████▌ | 5196/11526 [54:24<1:04:55, 1.62it/s] 45%|████▌ | 5197/11526 [54:24<1:04:54, 1.63it/s] {'loss': 0.1924, 'grad_norm': 0.46781376004219055, 'learning_rate': 6.695783870903723e-06, 'epoch': 1.35}
45%|████▌ | 5197/11526 [54:24<1:04:54, 1.63it/s] 45%|████▌ | 5198/11526 [54:25<1:04:58, 1.62it/s] {'loss': 0.214, 'grad_norm': 0.5916358828544617, 'learning_rate': 6.6943592341821315e-06, 'epoch': 1.35}
45%|████▌ | 5198/11526 [54:25<1:04:58, 1.62it/s] 45%|████▌ | 5199/11526 [54:26<1:04:53, 1.63it/s] {'loss': 0.1758, 'grad_norm': 0.4886757731437683, 'learning_rate': 6.692934442044271e-06, 'epoch': 1.35}
45%|████▌ | 5199/11526 [54:26<1:04:53, 1.63it/s] 45%|████▌ | 5200/11526 [54:26<1:04:49, 1.63it/s] {'loss': 0.1696, 'grad_norm': 0.5426635146141052, 'learning_rate': 6.69150949462083e-06, 'epoch': 1.35}
45%|████▌ | 5200/11526 [54:26<1:04:49, 1.63it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5883965492248535, 'eval_runtime': 1.9553, 'eval_samples_per_second': 102.286, 'eval_steps_per_second': 6.649, 'epoch': 1.35}
45%|████▌ | 5200/11526 [54:28<1:04:49, 1.63it/s]
 45%|████▌ | 5201/11526 [54:29<2:06:46, 1.20s/it] {'loss': 0.1959, 'grad_norm': 0.5962038040161133, 'learning_rate': 6.690084392042514e-06, 'epoch': 1.35}
45%|████▌ | 5201/11526 [54:29<2:06:46, 1.20s/it] 45%|████▌ | 5202/11526 [54:29<1:48:08, 1.03s/it] {'loss': 0.1758, 'grad_norm': 0.4972538352012634, 'learning_rate': 6.688659134440043e-06, 'epoch': 1.35}
45%|████▌ | 5202/11526 [54:29<1:48:08, 1.03s/it] 45%|████▌ | 5203/11526 [54:30<1:35:29, 1.10it/s] {'loss': 0.2216, 'grad_norm': 0.5259408354759216, 'learning_rate': 6.687233721944146e-06, 'epoch': 1.35}
45%|████▌ | 5203/11526 [54:30<1:35:29, 1.10it/s] 45%|████▌ | 5204/11526 [54:31<1:26:13, 1.22it/s] {'loss': 0.1855, 'grad_norm': 0.5191421508789062, 'learning_rate': 6.685808154685573e-06, 'epoch': 1.35}
45%|████▌ | 5204/11526 [54:31<1:26:13, 1.22it/s] 45%|████▌ | 5205/11526 [54:31<1:19:46, 1.32it/s] {'loss': 0.2268, 'grad_norm': 0.5311896204948425, 'learning_rate': 6.684382432795083e-06, 'epoch': 1.35}
45%|████▌ | 5205/11526 [54:31<1:19:46, 1.32it/s] 45%|████▌ | 5206/11526 [54:32<1:15:15, 1.40it/s] {'loss': 0.2205, 'grad_norm': 0.4910019040107727, 'learning_rate': 6.682956556403455e-06, 'epoch': 1.36}
45%|████▌ | 5206/11526 [54:32<1:15:15, 1.40it/s] 45%|████▌ | 5207/11526 [54:32<1:12:03, 1.46it/s] {'loss': 0.2281, 'grad_norm': 0.5718873739242554, 'learning_rate': 6.681530525641475e-06, 'epoch': 1.36}
45%|████▌ | 5207/11526 [54:33<1:12:03, 1.46it/s] 45%|████▌ | 5208/11526 [54:33<1:09:55, 1.51it/s] {'loss': 0.3379, 'grad_norm': 0.5933794379234314, 'learning_rate': 6.680104340639947e-06, 'epoch': 1.36}
45%|████▌ | 5208/11526 [54:33<1:09:55, 1.51it/s] 45%|████▌ | 5209/11526 [54:34<1:08:20, 1.54it/s] {'loss': 0.2347, 'grad_norm': 0.5740140676498413, 'learning_rate': 6.678678001529691e-06, 'epoch': 1.36}
45%|████▌ | 5209/11526 [54:34<1:08:20, 1.54it/s] 45%|████▌ | 5210/11526 [54:34<1:07:14, 1.57it/s] {'loss': 0.1793, 'grad_norm': 0.4558897316455841, 'learning_rate': 6.677251508441536e-06, 'epoch': 1.36}
45%|████▌ | 5210/11526 [54:34<1:07:14, 1.57it/s] 45%|████▌ | 5211/11526 [54:35<1:06:31, 1.58it/s] {'loss': 0.1851, 'grad_norm': 0.4382406771183014, 'learning_rate': 6.67582486150633e-06, 'epoch': 1.36}
45%|████▌ | 5211/11526 [54:35<1:06:31, 1.58it/s] 45%|████▌ | 5212/11526 [54:36<1:05:57, 1.60it/s] {'loss': 0.2024, 'grad_norm': 0.5426880717277527, 'learning_rate': 6.674398060854931e-06, 'epoch': 1.36}
45%|████▌ | 5212/11526 [54:36<1:05:57, 1.60it/s] 45%|████▌ | 5213/11526 [54:36<1:05:37, 1.60it/s] {'loss': 0.1377, 'grad_norm': 0.41356202960014343, 'learning_rate': 6.672971106618217e-06, 'epoch': 1.36}
45%|████▌ | 5213/11526 [54:36<1:05:37, 1.60it/s] 45%|████▌ | 5214/11526 [54:37<1:05:20, 1.61it/s] {'loss': 0.1939, 'grad_norm': 0.5569949150085449, 'learning_rate': 6.6715439989270735e-06, 'epoch': 1.36}
45%|████▌ | 5214/11526 [54:37<1:05:20, 1.61it/s] 45%|████▌ | 5215/11526 [54:37<1:05:06, 1.62it/s] {'loss': 0.2011, 'grad_norm': 0.5365814566612244, 'learning_rate': 6.670116737912404e-06, 'epoch': 1.36}
45%|████▌ | 5215/11526 [54:37<1:05:06, 1.62it/s] 45%|████▌ | 5216/11526 [54:38<1:05:00, 1.62it/s] {'loss': 0.19, 'grad_norm': 0.4755382835865021, 'learning_rate': 6.668689323705124e-06, 'epoch': 1.36}
45%|████▌ | 5216/11526 [54:38<1:05:00, 1.62it/s] 45%|████▌ | 5217/11526 [54:39<1:04:52, 1.62it/s] {'loss': 0.2559, 'grad_norm': 0.6449708342552185, 'learning_rate': 6.6672617564361655e-06, 'epoch': 1.36}
45%|████▌ | 5217/11526 [54:39<1:04:52, 1.62it/s] 45%|████▌ | 5218/11526 [54:39<1:04:48, 1.62it/s] {'loss': 0.1579, 'grad_norm': 0.4726240634918213, 'learning_rate': 6.665834036236474e-06, 'epoch': 1.36}
45%|████▌ | 5218/11526 [54:39<1:04:48, 1.62it/s] 45%|████▌ | 5219/11526 [54:40<1:04:42, 1.62it/s] {'loss': 0.1808, 'grad_norm': 0.501530647277832, 'learning_rate': 6.664406163237004e-06, 'epoch': 1.36}
45%|████▌ | 5219/11526 [54:40<1:04:42, 1.62it/s] 45%|████▌ | 5220/11526 [54:40<1:04:37, 1.63it/s] {'loss': 0.2118, 'grad_norm': 0.5043483972549438, 'learning_rate': 6.662978137568731e-06, 'epoch': 1.36}
45%|████▌ | 5220/11526 [54:41<1:04:37, 1.63it/s] 45%|████▌ | 5221/11526 [54:41<1:04:37, 1.63it/s] {'loss': 0.1766, 'grad_norm': 0.43984872102737427, 'learning_rate': 6.661549959362641e-06, 'epoch': 1.36}
45%|████▌ | 5221/11526 [54:41<1:04:37, 1.63it/s] 45%|████▌ | 5222/11526 [54:42<1:04:35, 1.63it/s] {'loss': 0.175, 'grad_norm': 0.48794087767601013, 'learning_rate': 6.660121628749736e-06, 'epoch': 1.36}
45%|████▌ | 5222/11526 [54:42<1:04:35, 1.63it/s] 45%|████▌ | 5223/11526 [54:42<1:04:36, 1.63it/s] {'loss': 0.224, 'grad_norm': 0.5917767882347107, 'learning_rate': 6.65869314586103e-06, 'epoch': 1.36}
45%|████▌ | 5223/11526 [54:42<1:04:36, 1.63it/s] 45%|████▌ | 5224/11526 [54:43<1:04:35, 1.63it/s] {'loss': 0.2544, 'grad_norm': 0.6363807320594788, 'learning_rate': 6.6572645108275495e-06, 'epoch': 1.36}
45%|████▌ | 5224/11526 [54:43<1:04:35, 1.63it/s] 45%|████▌ | 5225/11526 [54:43<1:04:32, 1.63it/s] {'loss': 0.1954, 'grad_norm': 0.5501052141189575, 'learning_rate': 6.655835723780338e-06, 'epoch': 1.36}
45%|████▌ | 5225/11526 [54:44<1:04:32, 1.63it/s] 45%|████▌ | 5226/11526 [54:44<1:04:32, 1.63it/s] {'loss': 0.2371, 'grad_norm': 0.6365251541137695, 'learning_rate': 6.654406784850454e-06, 'epoch': 1.36}
45%|████▌ | 5226/11526 [54:44<1:04:32, 1.63it/s] 45%|████▌ | 5227/11526 [54:45<1:04:34, 1.63it/s] {'loss': 0.2535, 'grad_norm': 0.6242861747741699, 'learning_rate': 6.652977694168967e-06, 'epoch': 1.36}
45%|████▌ | 5227/11526 [54:45<1:04:34, 1.63it/s] 45%|████▌ | 5228/11526 [54:45<1:04:51, 1.62it/s] {'loss': 0.2372, 'grad_norm': 0.5790507793426514, 'learning_rate': 6.65154845186696e-06, 'epoch': 1.36}
45%|████▌ | 5228/11526 [54:45<1:04:51, 1.62it/s] 45%|████▌ | 5229/11526 [54:46<1:04:42, 1.62it/s] {'loss': 0.1838, 'grad_norm': 0.496682733297348, 'learning_rate': 6.650119058075533e-06, 'epoch': 1.36}
45%|████▌ | 5229/11526 [54:46<1:04:42, 1.62it/s] 45%|████▌ | 5230/11526 [54:47<1:04:36, 1.62it/s] {'loss': 0.2443, 'grad_norm': 0.6051841378211975, 'learning_rate': 6.648689512925796e-06, 'epoch': 1.36}
45%|████▌ | 5230/11526 [54:47<1:04:36, 1.62it/s] 45%|████▌ | 5231/11526 [54:47<1:04:50, 1.62it/s] {'loss': 0.2329, 'grad_norm': 0.6079108119010925, 'learning_rate': 6.647259816548876e-06, 'epoch': 1.36}
45%|████▌ | 5231/11526 [54:47<1:04:50, 1.62it/s] 45%|████▌ | 5232/11526 [54:48<1:04:41, 1.62it/s] {'loss': 0.2233, 'grad_norm': 0.6147127151489258, 'learning_rate': 6.645829969075914e-06, 'epoch': 1.36}
45%|████▌ | 5232/11526 [54:48<1:04:41, 1.62it/s] 45%|████▌ | 5233/11526 [54:48<1:04:40, 1.62it/s] {'loss': 0.208, 'grad_norm': 0.5142098665237427, 'learning_rate': 6.644399970638062e-06, 'epoch': 1.36}
45%|████▌ | 5233/11526 [54:49<1:04:40, 1.62it/s] 45%|████▌ | 5234/11526 [54:49<1:04:33, 1.62it/s] {'loss': 0.2118, 'grad_norm': 0.6070691347122192, 'learning_rate': 6.642969821366489e-06, 'epoch': 1.36}
45%|████▌ | 5234/11526 [54:49<1:04:33, 1.62it/s] 45%|████▌ | 5235/11526 [54:50<1:04:28, 1.63it/s] {'loss': 0.1731, 'grad_norm': 0.42859193682670593, 'learning_rate': 6.6415395213923765e-06, 'epoch': 1.36}
45%|████▌ | 5235/11526 [54:50<1:04:28, 1.63it/s] 45%|████▌ | 5236/11526 [54:50<1:04:32, 1.62it/s] {'loss': 0.1647, 'grad_norm': 0.43767255544662476, 'learning_rate': 6.64010907084692e-06, 'epoch': 1.36}
45%|████▌ | 5236/11526 [54:50<1:04:32, 1.62it/s] 45%|████▌ | 5237/11526 [54:51<1:04:27, 1.63it/s] {'loss': 0.2114, 'grad_norm': 0.5313439965248108, 'learning_rate': 6.638678469861325e-06, 'epoch': 1.36}
45%|████▌ | 5237/11526 [54:51<1:04:27, 1.63it/s] 45%|████▌ | 5238/11526 [54:52<1:04:24, 1.63it/s] {'loss': 0.194, 'grad_norm': 0.5311846733093262, 'learning_rate': 6.637247718566819e-06, 'epoch': 1.36}
45%|████▌ | 5238/11526 [54:52<1:04:24, 1.63it/s] 45%|████▌ | 5239/11526 [54:52<1:04:23, 1.63it/s] {'loss': 0.1465, 'grad_norm': 0.4210588037967682, 'learning_rate': 6.635816817094634e-06, 'epoch': 1.36}
45%|████▌ | 5239/11526 [54:52<1:04:23, 1.63it/s] 45%|████▌ | 5240/11526 [54:53<1:04:21, 1.63it/s] {'loss': 0.1725, 'grad_norm': 0.4377063512802124, 'learning_rate': 6.634385765576027e-06, 'epoch': 1.36}
45%|████▌ | 5240/11526 [54:53<1:04:21, 1.63it/s] 45%|████▌ | 5241/11526 [54:53<1:04:27, 1.62it/s] {'loss': 0.204, 'grad_norm': 0.46161434054374695, 'learning_rate': 6.6329545641422556e-06, 'epoch': 1.36}
45%|████▌ | 5241/11526 [54:53<1:04:27, 1.62it/s] 45%|████▌ | 5242/11526 [54:54<1:04:25, 1.63it/s] {'loss': 0.194, 'grad_norm': 0.5045464634895325, 'learning_rate': 6.631523212924601e-06, 'epoch': 1.36}
45%|████▌ | 5242/11526 [54:54<1:04:25, 1.63it/s] 45%|████▌ | 5243/11526 [54:55<1:04:24, 1.63it/s] {'loss': 0.2358, 'grad_norm': 0.5074769854545593, 'learning_rate': 6.630091712054354e-06, 'epoch': 1.36}
45%|████▌ | 5243/11526 [54:55<1:04:24, 1.63it/s] 45%|████▌ | 5244/11526 [54:55<1:04:22, 1.63it/s] {'loss': 0.2589, 'grad_norm': 0.7202687859535217, 'learning_rate': 6.628660061662822e-06, 'epoch': 1.36}
45%|████▌ | 5244/11526 [54:55<1:04:22, 1.63it/s] 46%|████▌ | 5245/11526 [54:56<1:04:20, 1.63it/s] {'loss': 0.1826, 'grad_norm': 0.49629905819892883, 'learning_rate': 6.6272282618813214e-06, 'epoch': 1.37}
46%|████▌ | 5245/11526 [54:56<1:04:20, 1.63it/s] 46%|████▌ | 5246/11526 [54:56<1:04:18, 1.63it/s] {'loss': 0.1643, 'grad_norm': 0.47870439291000366, 'learning_rate': 6.625796312841186e-06, 'epoch': 1.37}
46%|████▌ | 5246/11526 [54:57<1:04:18, 1.63it/s] 46%|████▌ | 5247/11526 [54:57<1:04:16, 1.63it/s] {'loss': 0.1958, 'grad_norm': 0.5329071879386902, 'learning_rate': 6.624364214673763e-06, 'epoch': 1.37}
46%|████▌ | 5247/11526 [54:57<1:04:16, 1.63it/s] 46%|████▌ | 5248/11526 [54:58<1:04:16, 1.63it/s] {'loss': 0.265, 'grad_norm': 0.6836670637130737, 'learning_rate': 6.622931967510411e-06, 'epoch': 1.37}
46%|████▌ | 5248/11526 [54:58<1:04:16, 1.63it/s] 46%|████▌ | 5249/11526 [54:58<1:04:14, 1.63it/s] {'loss': 0.2331, 'grad_norm': 0.5623175501823425, 'learning_rate': 6.621499571482506e-06, 'epoch': 1.37}
46%|████▌ | 5249/11526 [54:58<1:04:14, 1.63it/s] 46%|████▌ | 5250/11526 [54:59<1:04:13, 1.63it/s] {'loss': 0.2308, 'grad_norm': 0.5150943398475647, 'learning_rate': 6.620067026721433e-06, 'epoch': 1.37}
46%|████▌ | 5250/11526 [54:59<1:04:13, 1.63it/s] 46%|████▌ | 5251/11526 [54:59<1:04:13, 1.63it/s] {'loss': 0.2079, 'grad_norm': 0.531845211982727, 'learning_rate': 6.618634333358596e-06, 'epoch': 1.37}
46%|████▌ | 5251/11526 [55:00<1:04:13, 1.63it/s] 46%|████▌ | 5252/11526 [55:00<1:04:16, 1.63it/s] {'loss': 0.226, 'grad_norm': 0.5436042547225952, 'learning_rate': 6.6172014915254064e-06, 'epoch': 1.37}
46%|████▌ | 5252/11526 [55:00<1:04:16, 1.63it/s] 46%|████▌ | 5253/11526 [55:01<1:04:18, 1.63it/s] {'loss': 0.216, 'grad_norm': 0.598547101020813, 'learning_rate': 6.615768501353297e-06, 'epoch': 1.37}
46%|████▌ | 5253/11526 [55:01<1:04:18, 1.63it/s] 46%|████▌ | 5254/11526 [55:01<1:04:18, 1.63it/s] {'loss': 0.1778, 'grad_norm': 0.5080418586730957, 'learning_rate': 6.614335362973704e-06, 'epoch': 1.37}
46%|████▌ | 5254/11526 [55:01<1:04:18, 1.63it/s] 46%|████▌ | 5255/11526 [55:02<1:04:15, 1.63it/s] {'loss': 0.1575, 'grad_norm': 0.5026316046714783, 'learning_rate': 6.612902076518089e-06, 'epoch': 1.37}
46%|████▌ | 5255/11526 [55:02<1:04:15, 1.63it/s] 46%|████▌ | 5256/11526 [55:03<1:04:15, 1.63it/s] {'loss': 0.2134, 'grad_norm': 0.6622915863990784, 'learning_rate': 6.611468642117918e-06, 'epoch': 1.37}
46%|████▌ | 5256/11526 [55:03<1:04:15, 1.63it/s] 46%|████▌ | 5257/11526 [55:03<1:04:13, 1.63it/s] {'loss': 0.2204, 'grad_norm': 0.6206730604171753, 'learning_rate': 6.610035059904674e-06, 'epoch': 1.37}
46%|████▌ | 5257/11526 [55:03<1:04:13, 1.63it/s] 46%|████▌ | 5258/11526 [55:04<1:04:30, 1.62it/s] {'loss': 0.2208, 'grad_norm': 0.566363513469696, 'learning_rate': 6.608601330009853e-06, 'epoch': 1.37}
46%|████▌ | 5258/11526 [55:04<1:04:30, 1.62it/s] 46%|████▌ | 5259/11526 [55:04<1:04:21, 1.62it/s] {'loss': 0.2075, 'grad_norm': 0.547450065612793, 'learning_rate': 6.607167452564965e-06, 'epoch': 1.37}
46%|████▌ | 5259/11526 [55:05<1:04:21, 1.62it/s] 46%|████▌ | 5260/11526 [55:05<1:04:16, 1.62it/s] {'loss': 0.2375, 'grad_norm': 0.6139306426048279, 'learning_rate': 6.605733427701536e-06, 'epoch': 1.37}
46%|████▌ | 5260/11526 [55:05<1:04:16, 1.62it/s] 46%|████▌ | 5261/11526 [55:06<1:04:17, 1.62it/s] {'loss': 0.2188, 'grad_norm': 0.5575916767120361, 'learning_rate': 6.604299255551099e-06, 'epoch': 1.37}
46%|████▌ | 5261/11526 [55:06<1:04:17, 1.62it/s] 46%|████▌ | 5262/11526 [55:06<1:04:15, 1.62it/s] {'loss': 0.227, 'grad_norm': 0.5426295399665833, 'learning_rate': 6.6028649362452065e-06, 'epoch': 1.37}
46%|████▌ | 5262/11526 [55:06<1:04:15, 1.62it/s] 46%|████▌ | 5263/11526 [55:07<1:04:13, 1.63it/s] {'loss': 0.1845, 'grad_norm': 0.5158207416534424, 'learning_rate': 6.601430469915422e-06, 'epoch': 1.37}
46%|████▌ | 5263/11526 [55:07<1:04:13, 1.63it/s] 46%|████▌ | 5264/11526 [55:07<1:04:14, 1.62it/s] {'loss': 0.2414, 'grad_norm': 0.6620558500289917, 'learning_rate': 6.599995856693324e-06, 'epoch': 1.37}
46%|████▌ | 5264/11526 [55:08<1:04:14, 1.62it/s] 46%|████▌ | 5265/11526 [55:08<1:04:11, 1.63it/s] {'loss': 0.207, 'grad_norm': 0.5517960786819458, 'learning_rate': 6.5985610967105015e-06, 'epoch': 1.37}
46%|████▌ | 5265/11526 [55:08<1:04:11, 1.63it/s] 46%|████▌ | 5266/11526 [55:09<1:04:12, 1.63it/s] {'loss': 0.2395, 'grad_norm': 0.6195587515830994, 'learning_rate': 6.597126190098561e-06, 'epoch': 1.37}
46%|████▌ | 5266/11526 [55:09<1:04:12, 1.63it/s] 46%|████▌ | 5267/11526 [55:09<1:04:08, 1.63it/s] {'loss': 0.2426, 'grad_norm': 0.5490666627883911, 'learning_rate': 6.595691136989119e-06, 'epoch': 1.37}
46%|████▌ | 5267/11526 [55:09<1:04:08, 1.63it/s] 46%|████▌ | 5268/11526 [55:10<1:04:13, 1.62it/s] {'loss': 0.1825, 'grad_norm': 0.5181488990783691, 'learning_rate': 6.5942559375138045e-06, 'epoch': 1.37}
46%|████▌ | 5268/11526 [55:10<1:04:13, 1.62it/s] 46%|████▌ | 5269/11526 [55:11<1:04:08, 1.63it/s] {'loss': 0.2521, 'grad_norm': 0.6573134660720825, 'learning_rate': 6.592820591804267e-06, 'epoch': 1.37}
46%|████▌ | 5269/11526 [55:11<1:04:08, 1.63it/s] 46%|████▌ | 5270/11526 [55:11<1:04:06, 1.63it/s] {'loss': 0.2326, 'grad_norm': 0.6039023995399475, 'learning_rate': 6.5913850999921606e-06, 'epoch': 1.37}
46%|████▌ | 5270/11526 [55:11<1:04:06, 1.63it/s] 46%|████▌ | 5271/11526 [55:12<1:04:07, 1.63it/s] {'loss': 0.1682, 'grad_norm': 0.4777181148529053, 'learning_rate': 6.589949462209159e-06, 'epoch': 1.37}
46%|████▌ | 5271/11526 [55:12<1:04:07, 1.63it/s] 46%|████▌ | 5272/11526 [55:12<1:04:04, 1.63it/s] {'loss': 0.1744, 'grad_norm': 0.47484543919563293, 'learning_rate': 6.588513678586945e-06, 'epoch': 1.37}
46%|████▌ | 5272/11526 [55:13<1:04:04, 1.63it/s] 46%|████▌ | 5273/11526 [55:13<1:04:05, 1.63it/s] {'loss': 0.1881, 'grad_norm': 0.5435315370559692, 'learning_rate': 6.587077749257219e-06, 'epoch': 1.37}
46%|████▌ | 5273/11526 [55:13<1:04:05, 1.63it/s] 46%|████▌ | 5274/11526 [55:14<1:04:03, 1.63it/s] {'loss': 0.2676, 'grad_norm': 0.6519477367401123, 'learning_rate': 6.585641674351692e-06, 'epoch': 1.37}
46%|████▌ | 5274/11526 [55:14<1:04:03, 1.63it/s] 46%|████▌ | 5275/11526 [55:14<1:04:00, 1.63it/s] {'loss': 0.2132, 'grad_norm': 0.5052947402000427, 'learning_rate': 6.584205454002088e-06, 'epoch': 1.37}
46%|████▌ | 5275/11526 [55:14<1:04:00, 1.63it/s] 46%|████▌ | 5276/11526 [55:15<1:04:03, 1.63it/s] {'loss': 0.168, 'grad_norm': 0.47356274724006653, 'learning_rate': 6.582769088340148e-06, 'epoch': 1.37}
46%|████▌ | 5276/11526 [55:15<1:04:03, 1.63it/s] 46%|████▌ | 5277/11526 [55:15<1:04:01, 1.63it/s] {'loss': 0.2455, 'grad_norm': 0.590079128742218, 'learning_rate': 6.581332577497619e-06, 'epoch': 1.37}
46%|████▌ | 5277/11526 [55:16<1:04:01, 1.63it/s] 46%|████▌ | 5278/11526 [55:16<1:04:05, 1.62it/s] {'loss': 0.1724, 'grad_norm': 0.44687336683273315, 'learning_rate': 6.579895921606271e-06, 'epoch': 1.37}
46%|████▌ | 5278/11526 [55:16<1:04:05, 1.62it/s] 46%|████▌ | 5279/11526 [55:17<1:04:01, 1.63it/s] {'loss': 0.1934, 'grad_norm': 0.4782697856426239, 'learning_rate': 6.578459120797878e-06, 'epoch': 1.37}
46%|████▌ | 5279/11526 [55:17<1:04:01, 1.63it/s] 46%|████▌ | 5280/11526 [55:17<1:04:03, 1.62it/s] {'loss': 0.2026, 'grad_norm': 0.49386367201805115, 'learning_rate': 6.577022175204233e-06, 'epoch': 1.37}
46%|████▌ | 5280/11526 [55:17<1:04:03, 1.62it/s] 46%|████▌ | 5281/11526 [55:18<1:03:59, 1.63it/s] {'loss': 0.1771, 'grad_norm': 0.5550651550292969, 'learning_rate': 6.575585084957142e-06, 'epoch': 1.37}
46%|████▌ | 5281/11526 [55:18<1:03:59, 1.63it/s] 46%|████▌ | 5282/11526 [55:19<1:03:57, 1.63it/s] {'loss': 0.2366, 'grad_norm': 0.6658692359924316, 'learning_rate': 6.574147850188423e-06, 'epoch': 1.37}
46%|████▌ | 5282/11526 [55:19<1:03:57, 1.63it/s] 46%|████▌ | 5283/11526 [55:19<1:04:00, 1.63it/s] {'loss': 0.1894, 'grad_norm': 0.5233442187309265, 'learning_rate': 6.572710471029906e-06, 'epoch': 1.38}
46%|████▌ | 5283/11526 [55:19<1:04:00, 1.63it/s] 46%|████▌ | 5284/11526 [55:20<1:03:57, 1.63it/s] {'loss': 0.1719, 'grad_norm': 0.47316089272499084, 'learning_rate': 6.571272947613436e-06, 'epoch': 1.38}
46%|████▌ | 5284/11526 [55:20<1:03:57, 1.63it/s] 46%|████▌ | 5285/11526 [55:20<1:03:56, 1.63it/s] {'loss': 0.1607, 'grad_norm': 0.44414815306663513, 'learning_rate': 6.569835280070872e-06, 'epoch': 1.38}
46%|████▌ | 5285/11526 [55:21<1:03:56, 1.63it/s] 46%|████▌ | 5286/11526 [55:21<1:03:59, 1.63it/s] {'loss': 0.18, 'grad_norm': 0.5183089375495911, 'learning_rate': 6.568397468534083e-06, 'epoch': 1.38}
46%|████▌ | 5286/11526 [55:21<1:03:59, 1.63it/s] 46%|████▌ | 5287/11526 [55:22<1:03:56, 1.63it/s] {'loss': 0.24, 'grad_norm': 0.5766354203224182, 'learning_rate': 6.566959513134956e-06, 'epoch': 1.38}
46%|████▌ | 5287/11526 [55:22<1:03:56, 1.63it/s] 46%|████▌ | 5288/11526 [55:22<1:03:59, 1.62it/s] {'loss': 0.2055, 'grad_norm': 0.6122263669967651, 'learning_rate': 6.565521414005385e-06, 'epoch': 1.38}
46%|████▌ | 5288/11526 [55:22<1:03:59, 1.62it/s] 46%|████▌ | 5289/11526 [55:23<1:03:56, 1.63it/s] {'loss': 0.2257, 'grad_norm': 0.5647820830345154, 'learning_rate': 6.5640831712772834e-06, 'epoch': 1.38}
46%|████▌ | 5289/11526 [55:23<1:03:56, 1.63it/s] 46%|████▌ | 5290/11526 [55:23<1:03:53, 1.63it/s] {'loss': 0.1669, 'grad_norm': 0.40284115076065063, 'learning_rate': 6.562644785082574e-06, 'epoch': 1.38}
46%|████▌ | 5290/11526 [55:24<1:03:53, 1.63it/s] 46%|████▌ | 5291/11526 [55:24<1:03:56, 1.63it/s] {'loss': 0.2975, 'grad_norm': 0.7149738669395447, 'learning_rate': 6.561206255553194e-06, 'epoch': 1.38}
46%|████▌ | 5291/11526 [55:24<1:03:56, 1.63it/s] 46%|████▌ | 5292/11526 [55:25<1:03:53, 1.63it/s] {'loss': 0.2286, 'grad_norm': 0.6450015902519226, 'learning_rate': 6.5597675828210915e-06, 'epoch': 1.38}
46%|████▌ | 5292/11526 [55:25<1:03:53, 1.63it/s] 46%|████▌ | 5293/11526 [55:25<1:03:56, 1.62it/s] {'loss': 0.2009, 'grad_norm': 0.5248563289642334, 'learning_rate': 6.5583287670182335e-06, 'epoch': 1.38}
46%|████▌ | 5293/11526 [55:25<1:03:56, 1.62it/s] 46%|████▌ | 5294/11526 [55:26<1:03:53, 1.63it/s] {'loss': 0.1751, 'grad_norm': 0.4669663906097412, 'learning_rate': 6.5568898082765945e-06, 'epoch': 1.38}
46%|████▌ | 5294/11526 [55:26<1:03:53, 1.63it/s] 46%|████▌ | 5295/11526 [55:27<1:03:50, 1.63it/s] {'loss': 0.1581, 'grad_norm': 0.442773699760437, 'learning_rate': 6.555450706728164e-06, 'epoch': 1.38}
46%|████▌ | 5295/11526 [55:27<1:03:50, 1.63it/s] 46%|████▌ | 5296/11526 [55:27<1:03:52, 1.63it/s] {'loss': 0.2342, 'grad_norm': 0.5503316521644592, 'learning_rate': 6.554011462504943e-06, 'epoch': 1.38}
46%|████▌ | 5296/11526 [55:27<1:03:52, 1.63it/s] 46%|████▌ | 5297/11526 [55:28<1:03:55, 1.62it/s] {'loss': 0.1556, 'grad_norm': 0.45460623502731323, 'learning_rate': 6.55257207573895e-06, 'epoch': 1.38}
46%|████▌ | 5297/11526 [55:28<1:03:55, 1.62it/s] 46%|████▌ | 5298/11526 [55:28<1:03:55, 1.62it/s] {'loss': 0.2886, 'grad_norm': 0.6106911301612854, 'learning_rate': 6.551132546562213e-06, 'epoch': 1.38}
46%|████▌ | 5298/11526 [55:29<1:03:55, 1.62it/s] 46%|████▌ | 5299/11526 [55:29<1:03:52, 1.62it/s] {'loss': 0.1673, 'grad_norm': 0.5399235486984253, 'learning_rate': 6.549692875106771e-06, 'epoch': 1.38}
46%|████▌ | 5299/11526 [55:29<1:03:52, 1.62it/s] 46%|████▌ | 5300/11526 [55:30<1:03:51, 1.62it/s] {'loss': 0.2401, 'grad_norm': 0.6361052989959717, 'learning_rate': 6.548253061504684e-06, 'epoch': 1.38}
46%|████▌ | 5300/11526 [55:30<1:03:51, 1.62it/s] 46%|████▌ | 5301/11526 [55:30<1:03:54, 1.62it/s] {'loss': 0.1936, 'grad_norm': 0.5278553366661072, 'learning_rate': 6.546813105888014e-06, 'epoch': 1.38}
46%|████▌ | 5301/11526 [55:30<1:03:54, 1.62it/s] 46%|████▌ | 5302/11526 [55:31<1:03:55, 1.62it/s] {'loss': 0.259, 'grad_norm': 0.6483287811279297, 'learning_rate': 6.545373008388848e-06, 'epoch': 1.38}
46%|████▌ | 5302/11526 [55:31<1:03:55, 1.62it/s] 46%|████▌ | 5303/11526 [55:31<1:03:55, 1.62it/s] {'loss': 0.2417, 'grad_norm': 0.604720413684845, 'learning_rate': 6.543932769139274e-06, 'epoch': 1.38}
46%|████▌ | 5303/11526 [55:32<1:03:55, 1.62it/s] 46%|████▌ | 5304/11526 [55:32<1:03:57, 1.62it/s] {'loss': 0.1861, 'grad_norm': 0.5147123336791992, 'learning_rate': 6.542492388271404e-06, 'epoch': 1.38}
46%|████▌ | 5304/11526 [55:32<1:03:57, 1.62it/s] 46%|████▌ | 5305/11526 [55:33<1:03:52, 1.62it/s] {'loss': 0.2413, 'grad_norm': 0.6291390657424927, 'learning_rate': 6.541051865917355e-06, 'epoch': 1.38}
46%|████▌ | 5305/11526 [55:33<1:03:52, 1.62it/s] 46%|████▌ | 5306/11526 [55:33<1:03:52, 1.62it/s] {'loss': 0.2853, 'grad_norm': 0.5859425067901611, 'learning_rate': 6.539611202209261e-06, 'epoch': 1.38}
46%|████▌ | 5306/11526 [55:33<1:03:52, 1.62it/s] 46%|████▌ | 5307/11526 [55:34<1:03:48, 1.62it/s] {'loss': 0.2017, 'grad_norm': 0.5508902072906494, 'learning_rate': 6.538170397279266e-06, 'epoch': 1.38}
46%|████▌ | 5307/11526 [55:34<1:03:48, 1.62it/s] 46%|████▌ | 5308/11526 [55:35<1:03:50, 1.62it/s] {'loss': 0.2078, 'grad_norm': 0.6070965528488159, 'learning_rate': 6.5367294512595316e-06, 'epoch': 1.38}
46%|████▌ | 5308/11526 [55:35<1:03:50, 1.62it/s] 46%|████▌ | 5309/11526 [55:35<1:03:46, 1.62it/s] {'loss': 0.1559, 'grad_norm': 0.4885345995426178, 'learning_rate': 6.5352883642822285e-06, 'epoch': 1.38}
46%|████▌ | 5309/11526 [55:35<1:03:46, 1.62it/s] 46%|████▌ | 5310/11526 [55:36<1:03:44, 1.63it/s] {'loss': 0.2663, 'grad_norm': 0.7131391763687134, 'learning_rate': 6.533847136479541e-06, 'epoch': 1.38}
46%|████▌ | 5310/11526 [55:36<1:03:44, 1.63it/s] 46%|████▌ | 5311/11526 [55:36<1:03:45, 1.62it/s] {'loss': 0.168, 'grad_norm': 0.49559855461120605, 'learning_rate': 6.532405767983665e-06, 'epoch': 1.38}
46%|████▌ | 5311/11526 [55:37<1:03:45, 1.62it/s] 46%|████▌ | 5312/11526 [55:37<1:03:42, 1.63it/s] {'loss': 0.2082, 'grad_norm': 0.5241042375564575, 'learning_rate': 6.530964258926815e-06, 'epoch': 1.38}
46%|████▌ | 5312/11526 [55:37<1:03:42, 1.63it/s] 46%|████▌ | 5313/11526 [55:38<1:03:47, 1.62it/s] {'loss': 0.2155, 'grad_norm': 0.5902872085571289, 'learning_rate': 6.529522609441212e-06, 'epoch': 1.38}
46%|████▌ | 5313/11526 [55:38<1:03:47, 1.62it/s] 46%|████▌ | 5314/11526 [55:38<1:03:40, 1.63it/s] {'loss': 0.16, 'grad_norm': 0.44096800684928894, 'learning_rate': 6.528080819659093e-06, 'epoch': 1.38}
46%|████▌ | 5314/11526 [55:38<1:03:40, 1.63it/s] 46%|████▌ | 5315/11526 [55:39<1:03:38, 1.63it/s] {'loss': 0.2241, 'grad_norm': 0.7306726574897766, 'learning_rate': 6.5266388897127065e-06, 'epoch': 1.38}
46%|████▌ | 5315/11526 [55:39<1:03:38, 1.63it/s] 46%|████▌ | 5316/11526 [55:39<1:03:58, 1.62it/s] {'loss': 0.2084, 'grad_norm': 0.5488961935043335, 'learning_rate': 6.525196819734314e-06, 'epoch': 1.38}
46%|████▌ | 5316/11526 [55:40<1:03:58, 1.62it/s] 46%|████▌ | 5317/11526 [55:40<1:03:50, 1.62it/s] {'loss': 0.2064, 'grad_norm': 0.5305510759353638, 'learning_rate': 6.523754609856192e-06, 'epoch': 1.38}
46%|████▌ | 5317/11526 [55:40<1:03:50, 1.62it/s] 46%|████▌ | 5318/11526 [55:41<1:03:45, 1.62it/s] {'loss': 0.2075, 'grad_norm': 0.5337734222412109, 'learning_rate': 6.522312260210627e-06, 'epoch': 1.38}
46%|████▌ | 5318/11526 [55:41<1:03:45, 1.62it/s] 46%|████▌ | 5319/11526 [55:41<1:03:42, 1.62it/s] {'loss': 0.2535, 'grad_norm': 0.6386659741401672, 'learning_rate': 6.520869770929919e-06, 'epoch': 1.38}
46%|████▌ | 5319/11526 [55:41<1:03:42, 1.62it/s] 46%|████▌ | 5320/11526 [55:42<1:03:38, 1.63it/s] {'loss': 0.188, 'grad_norm': 0.4654123783111572, 'learning_rate': 6.519427142146385e-06, 'epoch': 1.38}
46%|████▌ | 5320/11526 [55:42<1:03:38, 1.63it/s] 46%|████▌ | 5321/11526 [55:43<1:03:40, 1.62it/s] {'loss': 0.2283, 'grad_norm': 0.6063688397407532, 'learning_rate': 6.517984373992347e-06, 'epoch': 1.38}
46%|████▌ | 5321/11526 [55:43<1:03:40, 1.62it/s] 46%|████▌ | 5322/11526 [55:43<1:03:36, 1.63it/s] {'loss': 0.1513, 'grad_norm': 0.44931238889694214, 'learning_rate': 6.516541466600144e-06, 'epoch': 1.39}
46%|████▌ | 5322/11526 [55:43<1:03:36, 1.63it/s] 46%|████▌ | 5323/11526 [55:44<1:03:37, 1.62it/s] {'loss': 0.17, 'grad_norm': 0.44958075881004333, 'learning_rate': 6.51509842010213e-06, 'epoch': 1.39}
46%|████▌ | 5323/11526 [55:44<1:03:37, 1.62it/s] 46%|████▌ | 5324/11526 [55:44<1:03:36, 1.63it/s] {'loss': 0.2367, 'grad_norm': 0.5630375146865845, 'learning_rate': 6.513655234630669e-06, 'epoch': 1.39}
46%|████▌ | 5324/11526 [55:45<1:03:36, 1.63it/s] 46%|████▌ | 5325/11526 [55:45<1:03:33, 1.63it/s] {'loss': 0.2189, 'grad_norm': 0.5953933596611023, 'learning_rate': 6.5122119103181366e-06, 'epoch': 1.39}
46%|████▌ | 5325/11526 [55:45<1:03:33, 1.63it/s] 46%|████▌ | 5326/11526 [55:46<1:03:30, 1.63it/s] {'loss': 0.1653, 'grad_norm': 0.40579962730407715, 'learning_rate': 6.510768447296922e-06, 'epoch': 1.39}
46%|████▌ | 5326/11526 [55:46<1:03:30, 1.63it/s] 46%|████▌ | 5327/11526 [55:46<1:03:31, 1.63it/s] {'loss': 0.1847, 'grad_norm': 0.5575274229049683, 'learning_rate': 6.509324845699433e-06, 'epoch': 1.39}
46%|████▌ | 5327/11526 [55:46<1:03:31, 1.63it/s] 46%|████▌ | 5328/11526 [55:47<1:03:34, 1.62it/s] {'loss': 0.1475, 'grad_norm': 0.42242297530174255, 'learning_rate': 6.507881105658079e-06, 'epoch': 1.39}
46%|████▌ | 5328/11526 [55:47<1:03:34, 1.62it/s] 46%|████▌ | 5329/11526 [55:47<1:03:31, 1.63it/s] {'loss': 0.1787, 'grad_norm': 0.48555758595466614, 'learning_rate': 6.5064372273052916e-06, 'epoch': 1.39}
46%|████▌ | 5329/11526 [55:48<1:03:31, 1.63it/s] 46%|████▌ | 5330/11526 [55:48<1:03:29, 1.63it/s] {'loss': 0.2409, 'grad_norm': 0.7080441117286682, 'learning_rate': 6.504993210773509e-06, 'epoch': 1.39}
46%|████▌ | 5330/11526 [55:48<1:03:29, 1.63it/s] 46%|████▋ | 5331/11526 [55:49<1:03:30, 1.63it/s] {'loss': 0.2328, 'grad_norm': 0.6096552610397339, 'learning_rate': 6.503549056195188e-06, 'epoch': 1.39}
46%|████▋ | 5331/11526 [55:49<1:03:30, 1.63it/s] 46%|████▋ | 5332/11526 [55:49<1:03:27, 1.63it/s] {'loss': 0.2116, 'grad_norm': 0.5752704739570618, 'learning_rate': 6.502104763702792e-06, 'epoch': 1.39}
46%|████▋ | 5332/11526 [55:49<1:03:27, 1.63it/s] 46%|████▋ | 5333/11526 [55:50<1:03:33, 1.62it/s] {'loss': 0.1705, 'grad_norm': 0.45453545451164246, 'learning_rate': 6.5006603334288e-06, 'epoch': 1.39}
46%|████▋ | 5333/11526 [55:50<1:03:33, 1.62it/s] 46%|████▋ | 5334/11526 [55:51<1:03:29, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.5281893610954285, 'learning_rate': 6.499215765505704e-06, 'epoch': 1.39}
46%|████▋ | 5334/11526 [55:51<1:03:29, 1.63it/s] 46%|████▋ | 5335/11526 [55:51<1:03:25, 1.63it/s] {'loss': 0.1866, 'grad_norm': 0.48125970363616943, 'learning_rate': 6.497771060066008e-06, 'epoch': 1.39}
46%|████▋ | 5335/11526 [55:51<1:03:25, 1.63it/s] 46%|████▋ | 5336/11526 [55:52<1:03:27, 1.63it/s] {'loss': 0.1959, 'grad_norm': 0.472684383392334, 'learning_rate': 6.496326217242229e-06, 'epoch': 1.39}
46%|████▋ | 5336/11526 [55:52<1:03:27, 1.63it/s] 46%|████▋ | 5337/11526 [55:52<1:03:26, 1.63it/s] {'loss': 0.1921, 'grad_norm': 0.4799150824546814, 'learning_rate': 6.494881237166894e-06, 'epoch': 1.39}
46%|████▋ | 5337/11526 [55:53<1:03:26, 1.63it/s] 46%|████▋ | 5338/11526 [55:53<1:03:30, 1.62it/s] {'loss': 0.1775, 'grad_norm': 0.5243301391601562, 'learning_rate': 6.493436119972548e-06, 'epoch': 1.39}
46%|████▋ | 5338/11526 [55:53<1:03:30, 1.62it/s] 46%|████▋ | 5339/11526 [55:54<1:03:27, 1.62it/s] {'loss': 0.1992, 'grad_norm': 0.5473558902740479, 'learning_rate': 6.491990865791743e-06, 'epoch': 1.39}
46%|████▋ | 5339/11526 [55:54<1:03:27, 1.62it/s] 46%|████▋ | 5340/11526 [55:54<1:03:28, 1.62it/s] {'loss': 0.1934, 'grad_norm': 0.6203209161758423, 'learning_rate': 6.490545474757047e-06, 'epoch': 1.39}
46%|████▋ | 5340/11526 [55:54<1:03:28, 1.62it/s] 46%|████▋ | 5341/11526 [55:55<1:03:27, 1.62it/s] {'loss': 0.2052, 'grad_norm': 0.5574944615364075, 'learning_rate': 6.489099947001039e-06, 'epoch': 1.39}
46%|████▋ | 5341/11526 [55:55<1:03:27, 1.62it/s] 46%|████▋ | 5342/11526 [55:55<1:03:24, 1.63it/s] {'loss': 0.1982, 'grad_norm': 0.47590821981430054, 'learning_rate': 6.4876542826563105e-06, 'epoch': 1.39}
46%|████▋ | 5342/11526 [55:56<1:03:24, 1.63it/s] 46%|████▋ | 5343/11526 [55:56<1:03:29, 1.62it/s] {'loss': 0.2526, 'grad_norm': 0.6766353249549866, 'learning_rate': 6.486208481855467e-06, 'epoch': 1.39}
46%|████▋ | 5343/11526 [55:56<1:03:29, 1.62it/s] 46%|████▋ | 5344/11526 [55:57<1:03:24, 1.62it/s] {'loss': 0.2049, 'grad_norm': 0.5075778961181641, 'learning_rate': 6.4847625447311265e-06, 'epoch': 1.39}
46%|████▋ | 5344/11526 [55:57<1:03:24, 1.62it/s] 46%|████▋ | 5345/11526 [55:57<1:03:20, 1.63it/s] {'loss': 0.2209, 'grad_norm': 0.5406424403190613, 'learning_rate': 6.483316471415917e-06, 'epoch': 1.39}
46%|████▋ | 5345/11526 [55:57<1:03:20, 1.63it/s] 46%|████▋ | 5346/11526 [55:58<1:03:21, 1.63it/s] {'loss': 0.2192, 'grad_norm': 0.5858036279678345, 'learning_rate': 6.481870262042481e-06, 'epoch': 1.39}
46%|████▋ | 5346/11526 [55:58<1:03:21, 1.63it/s] 46%|████▋ | 5347/11526 [55:59<1:03:19, 1.63it/s] {'loss': 0.1801, 'grad_norm': 0.510450541973114, 'learning_rate': 6.480423916743474e-06, 'epoch': 1.39}
46%|████▋ | 5347/11526 [55:59<1:03:19, 1.63it/s] 46%|████▋ | 5348/11526 [55:59<1:03:21, 1.63it/s] {'loss': 0.2093, 'grad_norm': 0.6230859756469727, 'learning_rate': 6.478977435651561e-06, 'epoch': 1.39}
46%|████▋ | 5348/11526 [55:59<1:03:21, 1.63it/s] 46%|████▋ | 5349/11526 [56:00<1:03:21, 1.63it/s] {'loss': 0.1897, 'grad_norm': 0.506883442401886, 'learning_rate': 6.477530818899422e-06, 'epoch': 1.39}
46%|████▋ | 5349/11526 [56:00<1:03:21, 1.63it/s] 46%|████▋ | 5350/11526 [56:00<1:03:16, 1.63it/s] {'loss': 0.2393, 'grad_norm': 0.6247755885124207, 'learning_rate': 6.47608406661975e-06, 'epoch': 1.39}
46%|████▋ | 5350/11526 [56:01<1:03:16, 1.63it/s] 46%|████▋ | 5351/11526 [56:01<1:03:14, 1.63it/s] {'loss': 0.2633, 'grad_norm': 0.6277180910110474, 'learning_rate': 6.474637178945249e-06, 'epoch': 1.39}
46%|████▋ | 5351/11526 [56:01<1:03:14, 1.63it/s] 46%|████▋ | 5352/11526 [56:02<1:03:13, 1.63it/s] {'loss': 0.1898, 'grad_norm': 0.49444183707237244, 'learning_rate': 6.473190156008635e-06, 'epoch': 1.39}
46%|████▋ | 5352/11526 [56:02<1:03:13, 1.63it/s] 46%|████▋ | 5353/11526 [56:02<1:03:18, 1.63it/s] {'loss': 0.221, 'grad_norm': 0.5433229804039001, 'learning_rate': 6.471742997942639e-06, 'epoch': 1.39}
46%|████▋ | 5353/11526 [56:02<1:03:18, 1.63it/s] 46%|████▋ | 5354/11526 [56:03<1:03:15, 1.63it/s] {'loss': 0.1755, 'grad_norm': 0.4722549319267273, 'learning_rate': 6.4702957048800005e-06, 'epoch': 1.39}
46%|████▋ | 5354/11526 [56:03<1:03:15, 1.63it/s] 46%|████▋ | 5355/11526 [56:03<1:03:14, 1.63it/s] {'loss': 0.1939, 'grad_norm': 0.5596209764480591, 'learning_rate': 6.4688482769534745e-06, 'epoch': 1.39}
46%|████▋ | 5355/11526 [56:04<1:03:14, 1.63it/s] 46%|████▋ | 5356/11526 [56:04<1:03:12, 1.63it/s] {'loss': 0.2274, 'grad_norm': 0.5630086064338684, 'learning_rate': 6.467400714295827e-06, 'epoch': 1.39}
46%|████▋ | 5356/11526 [56:04<1:03:12, 1.63it/s] 46%|████▋ | 5357/11526 [56:05<1:03:09, 1.63it/s] {'loss': 0.2434, 'grad_norm': 0.6099865436553955, 'learning_rate': 6.4659530170398365e-06, 'epoch': 1.39}
46%|████▋ | 5357/11526 [56:05<1:03:09, 1.63it/s] 46%|████▋ | 5358/11526 [56:05<1:03:14, 1.63it/s] {'loss': 0.2556, 'grad_norm': 0.6372237801551819, 'learning_rate': 6.464505185318296e-06, 'epoch': 1.39}
46%|████▋ | 5358/11526 [56:05<1:03:14, 1.63it/s] 46%|████▋ | 5359/11526 [56:06<1:03:10, 1.63it/s] {'loss': 0.2354, 'grad_norm': 0.6321574449539185, 'learning_rate': 6.463057219264005e-06, 'epoch': 1.39}
46%|████▋ | 5359/11526 [56:06<1:03:10, 1.63it/s] 47%|████▋ | 5360/11526 [56:07<1:03:09, 1.63it/s] {'loss': 0.2458, 'grad_norm': 0.6006023287773132, 'learning_rate': 6.461609119009783e-06, 'epoch': 1.4}
47%|████▋ | 5360/11526 [56:07<1:03:09, 1.63it/s] 47%|████▋ | 5361/11526 [56:07<1:03:08, 1.63it/s] {'loss': 0.1516, 'grad_norm': 0.4641490578651428, 'learning_rate': 6.460160884688455e-06, 'epoch': 1.4}
47%|████▋ | 5361/11526 [56:07<1:03:08, 1.63it/s] 47%|████▋ | 5362/11526 [56:08<1:03:08, 1.63it/s] {'loss': 0.2074, 'grad_norm': 0.5456270575523376, 'learning_rate': 6.458712516432865e-06, 'epoch': 1.4}
47%|████▋ | 5362/11526 [56:08<1:03:08, 1.63it/s] 47%|████▋ | 5363/11526 [56:08<1:03:15, 1.62it/s] {'loss': 0.2196, 'grad_norm': 0.5989719033241272, 'learning_rate': 6.457264014375862e-06, 'epoch': 1.4}
47%|████▋ | 5363/11526 [56:09<1:03:15, 1.62it/s] 47%|████▋ | 5364/11526 [56:09<1:03:11, 1.63it/s] {'loss': 0.2316, 'grad_norm': 0.5131886601448059, 'learning_rate': 6.455815378650309e-06, 'epoch': 1.4}
47%|████▋ | 5364/11526 [56:09<1:03:11, 1.63it/s] 47%|████▋ | 5365/11526 [56:10<1:03:09, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.48584845662117004, 'learning_rate': 6.454366609389091e-06, 'epoch': 1.4}
47%|████▋ | 5365/11526 [56:10<1:03:09, 1.63it/s] 47%|████▋ | 5366/11526 [56:10<1:03:07, 1.63it/s] {'loss': 0.1897, 'grad_norm': 0.534580409526825, 'learning_rate': 6.45291770672509e-06, 'epoch': 1.4}
47%|████▋ | 5366/11526 [56:10<1:03:07, 1.63it/s] 47%|████▋ | 5367/11526 [56:11<1:03:09, 1.63it/s] {'loss': 0.2145, 'grad_norm': 0.6161578893661499, 'learning_rate': 6.451468670791211e-06, 'epoch': 1.4}
47%|████▋ | 5367/11526 [56:11<1:03:09, 1.63it/s] 47%|████▋ | 5368/11526 [56:11<1:03:12, 1.62it/s] {'loss': 0.1864, 'grad_norm': 0.4823785424232483, 'learning_rate': 6.450019501720366e-06, 'epoch': 1.4}
47%|████▋ | 5368/11526 [56:12<1:03:12, 1.62it/s] 47%|████▋ | 5369/11526 [56:12<1:03:09, 1.62it/s] {'loss': 0.1876, 'grad_norm': 0.5358800888061523, 'learning_rate': 6.448570199645484e-06, 'epoch': 1.4}
47%|████▋ | 5369/11526 [56:12<1:03:09, 1.62it/s] 47%|████▋ | 5370/11526 [56:13<1:03:07, 1.63it/s] {'loss': 0.1531, 'grad_norm': 0.4400717318058014, 'learning_rate': 6.4471207646995e-06, 'epoch': 1.4}
47%|████▋ | 5370/11526 [56:13<1:03:07, 1.63it/s] 47%|████▋ | 5371/11526 [56:13<1:03:10, 1.62it/s] {'loss': 0.234, 'grad_norm': 0.5754537582397461, 'learning_rate': 6.4456711970153664e-06, 'epoch': 1.4}
47%|████▋ | 5371/11526 [56:13<1:03:10, 1.62it/s] 47%|████▋ | 5372/11526 [56:14<1:03:06, 1.63it/s] {'loss': 0.1879, 'grad_norm': 0.5437793135643005, 'learning_rate': 6.444221496726043e-06, 'epoch': 1.4}
47%|████▋ | 5372/11526 [56:14<1:03:06, 1.63it/s] 47%|████▋ | 5373/11526 [56:15<1:03:09, 1.62it/s] {'loss': 0.2434, 'grad_norm': 0.5851292610168457, 'learning_rate': 6.4427716639645086e-06, 'epoch': 1.4}
47%|████▋ | 5373/11526 [56:15<1:03:09, 1.62it/s] 47%|████▋ | 5374/11526 [56:15<1:03:06, 1.62it/s] {'loss': 0.17, 'grad_norm': 0.50721275806427, 'learning_rate': 6.441321698863749e-06, 'epoch': 1.4}
47%|████▋ | 5374/11526 [56:15<1:03:06, 1.62it/s] 47%|████▋ | 5375/11526 [56:16<1:03:04, 1.63it/s] {'loss': 0.1759, 'grad_norm': 0.4210149645805359, 'learning_rate': 6.43987160155676e-06, 'epoch': 1.4}
47%|████▋ | 5375/11526 [56:16<1:03:04, 1.63it/s] 47%|████▋ | 5376/11526 [56:16<1:03:02, 1.63it/s] {'loss': 0.2049, 'grad_norm': 0.5190760493278503, 'learning_rate': 6.4384213721765565e-06, 'epoch': 1.4}
47%|████▋ | 5376/11526 [56:17<1:03:02, 1.63it/s] 47%|████▋ | 5377/11526 [56:17<1:02:59, 1.63it/s] {'loss': 0.2474, 'grad_norm': 0.6285850405693054, 'learning_rate': 6.43697101085616e-06, 'epoch': 1.4}
47%|████▋ | 5377/11526 [56:17<1:02:59, 1.63it/s] 47%|████▋ | 5378/11526 [56:18<1:02:55, 1.63it/s] {'loss': 0.2034, 'grad_norm': 0.578493058681488, 'learning_rate': 6.435520517728607e-06, 'epoch': 1.4}
47%|████▋ | 5378/11526 [56:18<1:02:55, 1.63it/s] 47%|████▋ | 5379/11526 [56:18<1:02:56, 1.63it/s] {'loss': 0.1541, 'grad_norm': 0.48444321751594543, 'learning_rate': 6.434069892926943e-06, 'epoch': 1.4}
47%|████▋ | 5379/11526 [56:18<1:02:56, 1.63it/s] 47%|████▋ | 5380/11526 [56:19<1:02:54, 1.63it/s] {'loss': 0.1935, 'grad_norm': 0.5388681888580322, 'learning_rate': 6.432619136584231e-06, 'epoch': 1.4}
47%|████▋ | 5380/11526 [56:19<1:02:54, 1.63it/s] 47%|████▋ | 5381/11526 [56:19<1:02:58, 1.63it/s] {'loss': 0.1527, 'grad_norm': 0.4521864652633667, 'learning_rate': 6.431168248833539e-06, 'epoch': 1.4}
47%|████▋ | 5381/11526 [56:20<1:02:58, 1.63it/s] 47%|████▋ | 5382/11526 [56:20<1:02:55, 1.63it/s] {'loss': 0.2224, 'grad_norm': 0.5390557646751404, 'learning_rate': 6.429717229807953e-06, 'epoch': 1.4}
47%|████▋ | 5382/11526 [56:20<1:02:55, 1.63it/s] 47%|████▋ | 5383/11526 [56:21<1:02:58, 1.63it/s] {'loss': 0.1649, 'grad_norm': 0.44728603959083557, 'learning_rate': 6.428266079640567e-06, 'epoch': 1.4}
47%|████▋ | 5383/11526 [56:21<1:02:58, 1.63it/s] 47%|████▋ | 5384/11526 [56:21<1:02:57, 1.63it/s] {'loss': 0.2225, 'grad_norm': 0.5685698390007019, 'learning_rate': 6.4268147984644906e-06, 'epoch': 1.4}
47%|████▋ | 5384/11526 [56:21<1:02:57, 1.63it/s] 47%|████▋ | 5385/11526 [56:22<1:02:56, 1.63it/s] {'loss': 0.2036, 'grad_norm': 0.5530596375465393, 'learning_rate': 6.4253633864128425e-06, 'epoch': 1.4}
47%|████▋ | 5385/11526 [56:22<1:02:56, 1.63it/s] 47%|████▋ | 5386/11526 [56:23<1:02:57, 1.63it/s] {'loss': 0.2219, 'grad_norm': 0.5347642302513123, 'learning_rate': 6.4239118436187545e-06, 'epoch': 1.4}
47%|████▋ | 5386/11526 [56:23<1:02:57, 1.63it/s] 47%|████▋ | 5387/11526 [56:23<1:02:53, 1.63it/s] {'loss': 0.2076, 'grad_norm': 0.5282899141311646, 'learning_rate': 6.42246017021537e-06, 'epoch': 1.4}
47%|████▋ | 5387/11526 [56:23<1:02:53, 1.63it/s] 47%|████▋ | 5388/11526 [56:24<1:02:56, 1.63it/s] {'loss': 0.1603, 'grad_norm': 0.44762924313545227, 'learning_rate': 6.421008366335844e-06, 'epoch': 1.4}
47%|████▋ | 5388/11526 [56:24<1:02:56, 1.63it/s] 47%|████▋ | 5389/11526 [56:24<1:02:53, 1.63it/s] {'loss': 0.1759, 'grad_norm': 0.49559298157691956, 'learning_rate': 6.419556432113349e-06, 'epoch': 1.4}
47%|████▋ | 5389/11526 [56:25<1:02:53, 1.63it/s] 47%|████▋ | 5390/11526 [56:25<1:02:51, 1.63it/s] {'loss': 0.2195, 'grad_norm': 0.6051715016365051, 'learning_rate': 6.418104367681058e-06, 'epoch': 1.4}
47%|████▋ | 5390/11526 [56:25<1:02:51, 1.63it/s] 47%|████▋ | 5391/11526 [56:26<1:02:49, 1.63it/s] {'loss': 0.2165, 'grad_norm': 0.5749416351318359, 'learning_rate': 6.416652173172166e-06, 'epoch': 1.4}
47%|████▋ | 5391/11526 [56:26<1:02:49, 1.63it/s] 47%|████▋ | 5392/11526 [56:26<1:02:47, 1.63it/s] {'loss': 0.146, 'grad_norm': 0.48849475383758545, 'learning_rate': 6.4151998487198775e-06, 'epoch': 1.4}
47%|████▋ | 5392/11526 [56:26<1:02:47, 1.63it/s] 47%|████▋ | 5393/11526 [56:27<1:02:52, 1.63it/s] {'loss': 0.1703, 'grad_norm': 0.47770246863365173, 'learning_rate': 6.413747394457407e-06, 'epoch': 1.4}
47%|████▋ | 5393/11526 [56:27<1:02:52, 1.63it/s] 47%|████▋ | 5394/11526 [56:27<1:02:48, 1.63it/s] {'loss': 0.2024, 'grad_norm': 0.5689165592193604, 'learning_rate': 6.41229481051798e-06, 'epoch': 1.4}
47%|████▋ | 5394/11526 [56:28<1:02:48, 1.63it/s] 47%|████▋ | 5395/11526 [56:28<1:02:48, 1.63it/s] {'loss': 0.2286, 'grad_norm': 0.5927143096923828, 'learning_rate': 6.410842097034839e-06, 'epoch': 1.4}
47%|████▋ | 5395/11526 [56:28<1:02:48, 1.63it/s] 47%|████▋ | 5396/11526 [56:29<1:02:45, 1.63it/s] {'loss': 0.2103, 'grad_norm': 0.6006422638893127, 'learning_rate': 6.409389254141233e-06, 'epoch': 1.4}
47%|████▋ | 5396/11526 [56:29<1:02:45, 1.63it/s] 47%|████▋ | 5397/11526 [56:29<1:02:43, 1.63it/s] {'loss': 0.164, 'grad_norm': 0.6913142800331116, 'learning_rate': 6.407936281970425e-06, 'epoch': 1.4}
47%|████▋ | 5397/11526 [56:29<1:02:43, 1.63it/s] 47%|████▋ | 5398/11526 [56:30<1:02:47, 1.63it/s] {'loss': 0.2355, 'grad_norm': 0.6457868218421936, 'learning_rate': 6.406483180655691e-06, 'epoch': 1.4}
47%|████▋ | 5398/11526 [56:30<1:02:47, 1.63it/s] 47%|████▋ | 5399/11526 [56:31<1:02:43, 1.63it/s] {'loss': 0.1947, 'grad_norm': 0.4563107490539551, 'learning_rate': 6.405029950330315e-06, 'epoch': 1.41}
47%|████▋ | 5399/11526 [56:31<1:02:43, 1.63it/s] 47%|████▋ | 5400/11526 [56:31<1:02:42, 1.63it/s] {'loss': 0.1653, 'grad_norm': 0.451543390750885, 'learning_rate': 6.403576591127601e-06, 'epoch': 1.41}
47%|████▋ | 5400/11526 [56:31<1:02:42, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 12.41it/s]
31%|███ | 4/13 [00:00<00:01, 8.23it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.69it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.35it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.13it/s]
62%|██████▏ | 8/13 [00:01<00:00, 6.98it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.88it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.81it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.76it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.73it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5812064409255981, 'eval_runtime': 1.9661, 'eval_samples_per_second': 101.723, 'eval_steps_per_second': 6.612, 'epoch': 1.41}
47%|████▋ | 5400/11526 [56:33<1:02:42, 1.63it/s]
 47%|████▋ | 5401/11526 [56:34<2:03:04, 1.21s/it] {'loss': 0.1775, 'grad_norm': 0.5351911187171936, 'learning_rate': 6.402123103180854e-06, 'epoch': 1.41}
47%|████▋ | 5401/11526 [56:34<2:03:04, 1.21s/it] 47%|████▋ | 5402/11526 [56:34<1:44:54, 1.03s/it] {'loss': 0.2321, 'grad_norm': 0.6013830900192261, 'learning_rate': 6.4006694866233985e-06, 'epoch': 1.41}
47%|████▋ | 5402/11526 [56:34<1:44:54, 1.03s/it] 47%|████▋ | 5403/11526 [56:35<1:32:20, 1.11it/s] {'loss': 0.2162, 'grad_norm': 0.5683925151824951, 'learning_rate': 6.399215741588569e-06, 'epoch': 1.41}
47%|████▋ | 5403/11526 [56:35<1:32:20, 1.11it/s] 47%|████▋ | 5404/11526 [56:36<1:23:24, 1.22it/s] {'loss': 0.2098, 'grad_norm': 0.5688738226890564, 'learning_rate': 6.3977618682097086e-06, 'epoch': 1.41}
47%|████▋ | 5404/11526 [56:36<1:23:24, 1.22it/s] 47%|████▋ | 5405/11526 [56:36<1:17:11, 1.32it/s] {'loss': 0.1908, 'grad_norm': 0.5932753682136536, 'learning_rate': 6.3963078666201785e-06, 'epoch': 1.41}
47%|████▋ | 5405/11526 [56:36<1:17:11, 1.32it/s] 47%|████▋ | 5406/11526 [56:37<1:12:46, 1.40it/s] {'loss': 0.1994, 'grad_norm': 0.5439413785934448, 'learning_rate': 6.394853736953345e-06, 'epoch': 1.41}
47%|████▋ | 5406/11526 [56:37<1:12:46, 1.40it/s] 47%|████▋ | 5407/11526 [56:37<1:09:44, 1.46it/s] {'loss': 0.1938, 'grad_norm': 0.4872756600379944, 'learning_rate': 6.393399479342593e-06, 'epoch': 1.41}
47%|████▋ | 5407/11526 [56:38<1:09:44, 1.46it/s] 47%|████▋ | 5408/11526 [56:38<1:07:35, 1.51it/s] {'loss': 0.1931, 'grad_norm': 0.5636178255081177, 'learning_rate': 6.391945093921309e-06, 'epoch': 1.41}
47%|████▋ | 5408/11526 [56:38<1:07:35, 1.51it/s] 47%|████▋ | 5409/11526 [56:39<1:06:07, 1.54it/s] {'loss': 0.1998, 'grad_norm': 0.5432283878326416, 'learning_rate': 6.390490580822902e-06, 'epoch': 1.41}
47%|████▋ | 5409/11526 [56:39<1:06:07, 1.54it/s] 47%|████▋ | 5410/11526 [56:39<1:05:03, 1.57it/s] {'loss': 0.2143, 'grad_norm': 0.5816391110420227, 'learning_rate': 6.389035940180789e-06, 'epoch': 1.41}
47%|████▋ | 5410/11526 [56:39<1:05:03, 1.57it/s] 47%|████▋ | 5411/11526 [56:40<1:04:17, 1.59it/s] {'loss': 0.2113, 'grad_norm': 0.5831997990608215, 'learning_rate': 6.387581172128395e-06, 'epoch': 1.41}
47%|████▋ | 5411/11526 [56:40<1:04:17, 1.59it/s] 47%|████▋ | 5412/11526 [56:40<1:03:47, 1.60it/s] {'loss': 0.2311, 'grad_norm': 0.6206128597259521, 'learning_rate': 6.386126276799162e-06, 'epoch': 1.41}
47%|████▋ | 5412/11526 [56:41<1:03:47, 1.60it/s] 47%|████▋ | 5413/11526 [56:41<1:03:31, 1.60it/s] {'loss': 0.2371, 'grad_norm': 0.5914543867111206, 'learning_rate': 6.384671254326539e-06, 'epoch': 1.41}
47%|████▋ | 5413/11526 [56:41<1:03:31, 1.60it/s] 47%|████▋ | 5414/11526 [56:42<1:03:13, 1.61it/s] {'loss': 0.2096, 'grad_norm': 0.5121464133262634, 'learning_rate': 6.383216104843991e-06, 'epoch': 1.41}
47%|████▋ | 5414/11526 [56:42<1:03:13, 1.61it/s] 47%|████▋ | 5415/11526 [56:42<1:02:59, 1.62it/s] {'loss': 0.18, 'grad_norm': 0.5324555039405823, 'learning_rate': 6.381760828484991e-06, 'epoch': 1.41}
47%|████▋ | 5415/11526 [56:42<1:02:59, 1.62it/s] 47%|████▋ | 5416/11526 [56:43<1:02:51, 1.62it/s] {'loss': 0.2318, 'grad_norm': 0.6566379070281982, 'learning_rate': 6.380305425383027e-06, 'epoch': 1.41}
47%|████▋ | 5416/11526 [56:43<1:02:51, 1.62it/s] 47%|████▋ | 5417/11526 [56:44<1:02:44, 1.62it/s] {'loss': 0.1998, 'grad_norm': 0.49504247307777405, 'learning_rate': 6.378849895671594e-06, 'epoch': 1.41}
47%|████▋ | 5417/11526 [56:44<1:02:44, 1.62it/s] 47%|████▋ | 5418/11526 [56:44<1:02:39, 1.62it/s] {'loss': 0.2632, 'grad_norm': 0.6032097339630127, 'learning_rate': 6.377394239484205e-06, 'epoch': 1.41}
47%|████▋ | 5418/11526 [56:44<1:02:39, 1.62it/s] 47%|████▋ | 5419/11526 [56:45<1:02:37, 1.63it/s] {'loss': 0.2521, 'grad_norm': 0.6368476152420044, 'learning_rate': 6.375938456954378e-06, 'epoch': 1.41}
47%|████▋ | 5419/11526 [56:45<1:02:37, 1.63it/s] 47%|████▋ | 5420/11526 [56:45<1:02:34, 1.63it/s] {'loss': 0.1751, 'grad_norm': 0.5311503410339355, 'learning_rate': 6.374482548215648e-06, 'epoch': 1.41}
47%|████▋ | 5420/11526 [56:46<1:02:34, 1.63it/s] 47%|████▋ | 5421/11526 [56:46<1:02:45, 1.62it/s] {'loss': 0.1542, 'grad_norm': 0.47439491748809814, 'learning_rate': 6.373026513401557e-06, 'epoch': 1.41}
47%|████▋ | 5421/11526 [56:46<1:02:45, 1.62it/s] 47%|████▋ | 5422/11526 [56:47<1:02:40, 1.62it/s] {'loss': 0.1635, 'grad_norm': 0.4520779252052307, 'learning_rate': 6.371570352645663e-06, 'epoch': 1.41}
47%|████▋ | 5422/11526 [56:47<1:02:40, 1.62it/s] 47%|████▋ | 5423/11526 [56:47<1:02:43, 1.62it/s] {'loss': 0.1855, 'grad_norm': 0.5457897782325745, 'learning_rate': 6.370114066081532e-06, 'epoch': 1.41}
47%|████▋ | 5423/11526 [56:47<1:02:43, 1.62it/s] 47%|████▋ | 5424/11526 [56:48<1:02:39, 1.62it/s] {'loss': 0.185, 'grad_norm': 0.5326676368713379, 'learning_rate': 6.368657653842744e-06, 'epoch': 1.41}
47%|████▋ | 5424/11526 [56:48<1:02:39, 1.62it/s] 47%|████▋ | 5425/11526 [56:48<1:02:37, 1.62it/s] {'loss': 0.2425, 'grad_norm': 0.6412667632102966, 'learning_rate': 6.367201116062886e-06, 'epoch': 1.41}
47%|████▋ | 5425/11526 [56:49<1:02:37, 1.62it/s] 47%|████▋ | 5426/11526 [56:49<1:02:35, 1.62it/s] {'loss': 0.192, 'grad_norm': 0.6161933541297913, 'learning_rate': 6.365744452875565e-06, 'epoch': 1.41}
47%|████▋ | 5426/11526 [56:49<1:02:35, 1.62it/s] 47%|████▋ | 5427/11526 [56:50<1:02:31, 1.63it/s] {'loss': 0.1743, 'grad_norm': 0.45582717657089233, 'learning_rate': 6.364287664414393e-06, 'epoch': 1.41}
47%|████▋ | 5427/11526 [56:50<1:02:31, 1.63it/s] 47%|████▋ | 5428/11526 [56:50<1:02:33, 1.62it/s] {'loss': 0.2667, 'grad_norm': 0.732326865196228, 'learning_rate': 6.362830750812992e-06, 'epoch': 1.41}
47%|████▋ | 5428/11526 [56:50<1:02:33, 1.62it/s] 47%|████▋ | 5429/11526 [56:51<1:02:29, 1.63it/s] {'loss': 0.2749, 'grad_norm': 0.6724960207939148, 'learning_rate': 6.361373712205e-06, 'epoch': 1.41}
47%|████▋ | 5429/11526 [56:51<1:02:29, 1.63it/s] 47%|████▋ | 5430/11526 [56:52<1:02:25, 1.63it/s] {'loss': 0.2519, 'grad_norm': 0.6366010904312134, 'learning_rate': 6.359916548724065e-06, 'epoch': 1.41}
47%|████▋ | 5430/11526 [56:52<1:02:25, 1.63it/s] 47%|████▋ | 5431/11526 [56:52<1:02:27, 1.63it/s] {'loss': 0.2525, 'grad_norm': 0.6206581592559814, 'learning_rate': 6.3584592605038484e-06, 'epoch': 1.41}
47%|████▋ | 5431/11526 [56:52<1:02:27, 1.63it/s] 47%|████▋ | 5432/11526 [56:53<1:02:24, 1.63it/s] {'loss': 0.244, 'grad_norm': 0.633124828338623, 'learning_rate': 6.3570018476780184e-06, 'epoch': 1.41}
47%|████▋ | 5432/11526 [56:53<1:02:24, 1.63it/s] 47%|████▋ | 5433/11526 [56:53<1:02:22, 1.63it/s] {'loss': 0.1984, 'grad_norm': 0.5658910870552063, 'learning_rate': 6.355544310380257e-06, 'epoch': 1.41}
47%|████▋ | 5433/11526 [56:54<1:02:22, 1.63it/s] 47%|████▋ | 5434/11526 [56:54<1:02:24, 1.63it/s] {'loss': 0.2236, 'grad_norm': 0.5593621134757996, 'learning_rate': 6.35408664874426e-06, 'epoch': 1.41}
47%|████▋ | 5434/11526 [56:54<1:02:24, 1.63it/s] 47%|████▋ | 5435/11526 [56:55<1:02:21, 1.63it/s] {'loss': 0.287, 'grad_norm': 0.5229024291038513, 'learning_rate': 6.3526288629037315e-06, 'epoch': 1.41}
47%|████▋ | 5435/11526 [56:55<1:02:21, 1.63it/s] 47%|████▋ | 5436/11526 [56:55<1:02:42, 1.62it/s] {'loss': 0.1931, 'grad_norm': 0.5018503665924072, 'learning_rate': 6.351170952992386e-06, 'epoch': 1.41}
47%|████▋ | 5436/11526 [56:55<1:02:42, 1.62it/s] 47%|████▋ | 5437/11526 [56:56<1:02:35, 1.62it/s] {'loss': 0.2061, 'grad_norm': 0.5839714407920837, 'learning_rate': 6.349712919143954e-06, 'epoch': 1.42}
47%|████▋ | 5437/11526 [56:56<1:02:35, 1.62it/s] 47%|████▋ | 5438/11526 [56:57<1:02:35, 1.62it/s] {'loss': 0.2271, 'grad_norm': 0.5911531448364258, 'learning_rate': 6.348254761492172e-06, 'epoch': 1.42}
47%|████▋ | 5438/11526 [56:57<1:02:35, 1.62it/s] 47%|████▋ | 5439/11526 [56:57<1:02:30, 1.62it/s] {'loss': 0.2496, 'grad_norm': 0.6008501052856445, 'learning_rate': 6.346796480170794e-06, 'epoch': 1.42}
47%|████▋ | 5439/11526 [56:57<1:02:30, 1.62it/s] 47%|████▋ | 5440/11526 [56:58<1:02:27, 1.62it/s] {'loss': 0.182, 'grad_norm': 0.43095552921295166, 'learning_rate': 6.345338075313579e-06, 'epoch': 1.42}
47%|████▋ | 5440/11526 [56:58<1:02:27, 1.62it/s] 47%|████▋ | 5441/11526 [56:58<1:02:28, 1.62it/s] {'loss': 0.155, 'grad_norm': 0.4104529917240143, 'learning_rate': 6.343879547054299e-06, 'epoch': 1.42}
47%|████▋ | 5441/11526 [56:58<1:02:28, 1.62it/s] 47%|████▋ | 5442/11526 [56:59<1:02:23, 1.63it/s] {'loss': 0.2362, 'grad_norm': 0.6388903856277466, 'learning_rate': 6.342420895526744e-06, 'epoch': 1.42}
47%|████▋ | 5442/11526 [56:59<1:02:23, 1.63it/s] 47%|████▋ | 5443/11526 [57:00<1:02:29, 1.62it/s] {'loss': 0.2709, 'grad_norm': 0.641925573348999, 'learning_rate': 6.340962120864704e-06, 'epoch': 1.42}
47%|████▋ | 5443/11526 [57:00<1:02:29, 1.62it/s] 47%|████▋ | 5444/11526 [57:00<1:02:22, 1.62it/s] {'loss': 0.1999, 'grad_norm': 0.5039745569229126, 'learning_rate': 6.339503223201991e-06, 'epoch': 1.42}
47%|████▋ | 5444/11526 [57:00<1:02:22, 1.62it/s] 47%|████▋ | 5445/11526 [57:01<1:02:19, 1.63it/s] {'loss': 0.2176, 'grad_norm': 0.5152844190597534, 'learning_rate': 6.338044202672419e-06, 'epoch': 1.42}
47%|████▋ | 5445/11526 [57:01<1:02:19, 1.63it/s] 47%|████▋ | 5446/11526 [57:01<1:02:21, 1.62it/s] {'loss': 0.2175, 'grad_norm': 0.5550540089607239, 'learning_rate': 6.33658505940982e-06, 'epoch': 1.42}
47%|████▋ | 5446/11526 [57:02<1:02:21, 1.62it/s] 47%|████▋ | 5447/11526 [57:02<1:02:16, 1.63it/s] {'loss': 0.1956, 'grad_norm': 0.5142914652824402, 'learning_rate': 6.335125793548034e-06, 'epoch': 1.42}
47%|████▋ | 5447/11526 [57:02<1:02:16, 1.63it/s] 47%|████▋ | 5448/11526 [57:03<1:02:12, 1.63it/s] {'loss': 0.2481, 'grad_norm': 0.5505771040916443, 'learning_rate': 6.333666405220913e-06, 'epoch': 1.42}
47%|████▋ | 5448/11526 [57:03<1:02:12, 1.63it/s] 47%|████▋ | 5449/11526 [57:03<1:02:13, 1.63it/s] {'loss': 0.2068, 'grad_norm': 0.5607795119285583, 'learning_rate': 6.332206894562324e-06, 'epoch': 1.42}
47%|████▋ | 5449/11526 [57:03<1:02:13, 1.63it/s] 47%|████▋ | 5450/11526 [57:04<1:02:10, 1.63it/s] {'loss': 0.2563, 'grad_norm': 0.588802695274353, 'learning_rate': 6.330747261706136e-06, 'epoch': 1.42}
47%|████▋ | 5450/11526 [57:04<1:02:10, 1.63it/s] 47%|████▋ | 5451/11526 [57:04<1:02:14, 1.63it/s] {'loss': 0.193, 'grad_norm': 0.5601218938827515, 'learning_rate': 6.329287506786238e-06, 'epoch': 1.42}
47%|████▋ | 5451/11526 [57:05<1:02:14, 1.63it/s] 47%|████▋ | 5452/11526 [57:05<1:02:11, 1.63it/s] {'loss': 0.2374, 'grad_norm': 0.6482681632041931, 'learning_rate': 6.327827629936528e-06, 'epoch': 1.42}
47%|████▋ | 5452/11526 [57:05<1:02:11, 1.63it/s] 47%|████▋ | 5453/11526 [57:06<1:02:11, 1.63it/s] {'loss': 0.2161, 'grad_norm': 0.4610179364681244, 'learning_rate': 6.326367631290912e-06, 'epoch': 1.42}
47%|████▋ | 5453/11526 [57:06<1:02:11, 1.63it/s] 47%|████▋ | 5454/11526 [57:06<1:02:10, 1.63it/s] {'loss': 0.1608, 'grad_norm': 0.4602057933807373, 'learning_rate': 6.32490751098331e-06, 'epoch': 1.42}
47%|████▋ | 5454/11526 [57:06<1:02:10, 1.63it/s] 47%|████▋ | 5455/11526 [57:07<1:02:08, 1.63it/s] {'loss': 0.1818, 'grad_norm': 0.4602288007736206, 'learning_rate': 6.323447269147651e-06, 'epoch': 1.42}
47%|████▋ | 5455/11526 [57:07<1:02:08, 1.63it/s] 47%|████▋ | 5456/11526 [57:08<1:02:12, 1.63it/s] {'loss': 0.1832, 'grad_norm': 0.5355375409126282, 'learning_rate': 6.3219869059178805e-06, 'epoch': 1.42}
47%|████▋ | 5456/11526 [57:08<1:02:12, 1.63it/s] 47%|████▋ | 5457/11526 [57:08<1:02:08, 1.63it/s] {'loss': 0.2024, 'grad_norm': 0.5410546660423279, 'learning_rate': 6.320526421427948e-06, 'epoch': 1.42}
47%|████▋ | 5457/11526 [57:08<1:02:08, 1.63it/s] 47%|████▋ | 5458/11526 [57:09<1:02:06, 1.63it/s] {'loss': 0.1708, 'grad_norm': 0.6381192803382874, 'learning_rate': 6.3190658158118205e-06, 'epoch': 1.42}
47%|████▋ | 5458/11526 [57:09<1:02:06, 1.63it/s] 47%|████▋ | 5459/11526 [57:09<1:02:07, 1.63it/s] {'loss': 0.1922, 'grad_norm': 0.5295021533966064, 'learning_rate': 6.317605089203469e-06, 'epoch': 1.42}
47%|████▋ | 5459/11526 [57:10<1:02:07, 1.63it/s] 47%|████▋ | 5460/11526 [57:10<1:02:06, 1.63it/s] {'loss': 0.2204, 'grad_norm': 0.6059871315956116, 'learning_rate': 6.316144241736883e-06, 'epoch': 1.42}
47%|████▋ | 5460/11526 [57:10<1:02:06, 1.63it/s] 47%|████▋ | 5461/11526 [57:11<1:02:05, 1.63it/s] {'loss': 0.2539, 'grad_norm': 0.5553544163703918, 'learning_rate': 6.314683273546058e-06, 'epoch': 1.42}
47%|████▋ | 5461/11526 [57:11<1:02:05, 1.63it/s] 47%|████▋ | 5462/11526 [57:11<1:02:04, 1.63it/s] {'loss': 0.1758, 'grad_norm': 0.5294254422187805, 'learning_rate': 6.3132221847650046e-06, 'epoch': 1.42}
47%|████▋ | 5462/11526 [57:11<1:02:04, 1.63it/s] 47%|████▋ | 5463/11526 [57:12<1:02:10, 1.63it/s] {'loss': 0.2082, 'grad_norm': 0.5620841979980469, 'learning_rate': 6.311760975527739e-06, 'epoch': 1.42}
47%|████▋ | 5463/11526 [57:12<1:02:10, 1.63it/s] 47%|████▋ | 5464/11526 [57:12<1:02:07, 1.63it/s] {'loss': 0.2013, 'grad_norm': 0.5151888728141785, 'learning_rate': 6.310299645968292e-06, 'epoch': 1.42}
47%|████▋ | 5464/11526 [57:13<1:02:07, 1.63it/s] 47%|████▋ | 5465/11526 [57:13<1:02:05, 1.63it/s] {'loss': 0.1918, 'grad_norm': 0.5132137537002563, 'learning_rate': 6.308838196220709e-06, 'epoch': 1.42}
47%|████▋ | 5465/11526 [57:13<1:02:05, 1.63it/s] 47%|████▋ | 5466/11526 [57:14<1:02:04, 1.63it/s] {'loss': 0.2475, 'grad_norm': 0.6067014336585999, 'learning_rate': 6.307376626419037e-06, 'epoch': 1.42}
47%|████▋ | 5466/11526 [57:14<1:02:04, 1.63it/s] 47%|████▋ | 5467/11526 [57:14<1:02:05, 1.63it/s] {'loss': 0.2453, 'grad_norm': 0.6966031193733215, 'learning_rate': 6.305914936697344e-06, 'epoch': 1.42}
47%|████▋ | 5467/11526 [57:14<1:02:05, 1.63it/s] 47%|████▋ | 5468/11526 [57:15<1:02:15, 1.62it/s] {'loss': 0.1749, 'grad_norm': 0.5284714698791504, 'learning_rate': 6.304453127189702e-06, 'epoch': 1.42}
47%|████▋ | 5468/11526 [57:15<1:02:15, 1.62it/s] 47%|████▋ | 5469/11526 [57:16<1:02:11, 1.62it/s] {'loss': 0.2477, 'grad_norm': 0.7179372310638428, 'learning_rate': 6.302991198030197e-06, 'epoch': 1.42}
47%|████▋ | 5469/11526 [57:16<1:02:11, 1.62it/s] 47%|████▋ | 5470/11526 [57:16<1:02:08, 1.62it/s] {'loss': 0.2023, 'grad_norm': 0.5977983474731445, 'learning_rate': 6.301529149352925e-06, 'epoch': 1.42}
47%|████▋ | 5470/11526 [57:16<1:02:08, 1.62it/s] 47%|████▋ | 5471/11526 [57:17<1:02:06, 1.62it/s] {'loss': 0.2259, 'grad_norm': 0.5772517323493958, 'learning_rate': 6.300066981291995e-06, 'epoch': 1.42}
47%|████▋ | 5471/11526 [57:17<1:02:06, 1.62it/s] 47%|████▋ | 5472/11526 [57:17<1:02:04, 1.63it/s] {'loss': 0.1965, 'grad_norm': 0.5278123021125793, 'learning_rate': 6.298604693981525e-06, 'epoch': 1.42}
47%|████▋ | 5472/11526 [57:18<1:02:04, 1.63it/s] 47%|████▋ | 5473/11526 [57:18<1:02:08, 1.62it/s] {'loss': 0.1847, 'grad_norm': 0.50849848985672, 'learning_rate': 6.297142287555642e-06, 'epoch': 1.42}
47%|████▋ | 5473/11526 [57:18<1:02:08, 1.62it/s] 47%|████▋ | 5474/11526 [57:19<1:02:03, 1.63it/s] {'loss': 0.5838, 'grad_norm': 0.7030114531517029, 'learning_rate': 6.295679762148489e-06, 'epoch': 1.42}
47%|████▋ | 5474/11526 [57:19<1:02:03, 1.63it/s] 48%|████▊ | 5475/11526 [57:19<1:02:00, 1.63it/s] {'loss': 0.2154, 'grad_norm': 0.5565318465232849, 'learning_rate': 6.294217117894214e-06, 'epoch': 1.43}
48%|████▊ | 5475/11526 [57:19<1:02:00, 1.63it/s] 48%|████▊ | 5476/11526 [57:20<1:02:18, 1.62it/s] {'loss': 0.2207, 'grad_norm': 0.5650877952575684, 'learning_rate': 6.292754354926986e-06, 'epoch': 1.43}
48%|████▊ | 5476/11526 [57:20<1:02:18, 1.62it/s] 48%|████▊ | 5477/11526 [57:20<1:02:11, 1.62it/s] {'loss': 0.1793, 'grad_norm': 0.515079140663147, 'learning_rate': 6.291291473380969e-06, 'epoch': 1.43}
48%|████▊ | 5477/11526 [57:21<1:02:11, 1.62it/s] 48%|████▊ | 5478/11526 [57:21<1:02:08, 1.62it/s] {'loss': 0.2297, 'grad_norm': 0.5688931941986084, 'learning_rate': 6.2898284733903515e-06, 'epoch': 1.43}
48%|████▊ | 5478/11526 [57:21<1:02:08, 1.62it/s] 48%|████▊ | 5479/11526 [57:22<1:02:04, 1.62it/s] {'loss': 0.2236, 'grad_norm': 0.6025370955467224, 'learning_rate': 6.288365355089328e-06, 'epoch': 1.43}
48%|████▊ | 5479/11526 [57:22<1:02:04, 1.62it/s] 48%|████▊ | 5480/11526 [57:22<1:01:59, 1.63it/s] {'loss': 0.1987, 'grad_norm': 0.5433154702186584, 'learning_rate': 6.2869021186121034e-06, 'epoch': 1.43}
48%|████▊ | 5480/11526 [57:22<1:01:59, 1.63it/s] 48%|████▊ | 5481/11526 [57:23<1:02:01, 1.62it/s] {'loss': 0.1825, 'grad_norm': 0.5174264311790466, 'learning_rate': 6.2854387640928945e-06, 'epoch': 1.43}
48%|████▊ | 5481/11526 [57:23<1:02:01, 1.62it/s] 48%|████▊ | 5482/11526 [57:24<1:01:57, 1.63it/s] {'loss': 0.2563, 'grad_norm': 0.5904784202575684, 'learning_rate': 6.283975291665929e-06, 'epoch': 1.43}
48%|████▊ | 5482/11526 [57:24<1:01:57, 1.63it/s] 48%|████▊ | 5483/11526 [57:24<1:02:02, 1.62it/s] {'loss': 0.283, 'grad_norm': 0.6338187456130981, 'learning_rate': 6.282511701465445e-06, 'epoch': 1.43}
48%|████▊ | 5483/11526 [57:24<1:02:02, 1.62it/s] 48%|████▊ | 5484/11526 [57:25<1:01:56, 1.63it/s] {'loss': 0.2531, 'grad_norm': 0.6199983954429626, 'learning_rate': 6.2810479936256884e-06, 'epoch': 1.43}
48%|████▊ | 5484/11526 [57:25<1:01:56, 1.63it/s] 48%|████▊ | 5485/11526 [57:25<1:01:55, 1.63it/s] {'loss': 0.2406, 'grad_norm': 0.6437427401542664, 'learning_rate': 6.279584168280921e-06, 'epoch': 1.43}
48%|████▊ | 5485/11526 [57:26<1:01:55, 1.63it/s] 48%|████▊ | 5486/11526 [57:26<1:01:55, 1.63it/s] {'loss': 0.2218, 'grad_norm': 0.49216970801353455, 'learning_rate': 6.278120225565414e-06, 'epoch': 1.43}
48%|████▊ | 5486/11526 [57:26<1:01:55, 1.63it/s] 48%|████▊ | 5487/11526 [57:27<1:01:52, 1.63it/s] {'loss': 0.2042, 'grad_norm': 0.5444942712783813, 'learning_rate': 6.276656165613448e-06, 'epoch': 1.43}
48%|████▊ | 5487/11526 [57:27<1:01:52, 1.63it/s] 48%|████▊ | 5488/11526 [57:27<1:01:54, 1.63it/s] {'loss': 0.1935, 'grad_norm': 0.5782101154327393, 'learning_rate': 6.275191988559315e-06, 'epoch': 1.43}
48%|████▊ | 5488/11526 [57:27<1:01:54, 1.63it/s] 48%|████▊ | 5489/11526 [57:28<1:01:52, 1.63it/s] {'loss': 0.1925, 'grad_norm': 0.5194937586784363, 'learning_rate': 6.273727694537316e-06, 'epoch': 1.43}
48%|████▊ | 5489/11526 [57:28<1:01:52, 1.63it/s] 48%|████▊ | 5490/11526 [57:28<1:01:50, 1.63it/s] {'loss': 0.214, 'grad_norm': 0.5004193782806396, 'learning_rate': 6.272263283681766e-06, 'epoch': 1.43}
48%|████▊ | 5490/11526 [57:29<1:01:50, 1.63it/s] 48%|████▊ | 5491/11526 [57:29<1:01:54, 1.62it/s] {'loss': 0.267, 'grad_norm': 0.7351613640785217, 'learning_rate': 6.27079875612699e-06, 'epoch': 1.43}
48%|████▊ | 5491/11526 [57:29<1:01:54, 1.62it/s] 48%|████▊ | 5492/11526 [57:30<1:01:51, 1.63it/s] {'loss': 0.1863, 'grad_norm': 0.4938403069972992, 'learning_rate': 6.269334112007321e-06, 'epoch': 1.43}
48%|████▊ | 5492/11526 [57:30<1:01:51, 1.63it/s] 48%|████▊ | 5493/11526 [57:30<1:01:54, 1.62it/s] {'loss': 0.1659, 'grad_norm': 0.479333758354187, 'learning_rate': 6.267869351457103e-06, 'epoch': 1.43}
48%|████▊ | 5493/11526 [57:30<1:01:54, 1.62it/s] 48%|████▊ | 5494/11526 [57:31<1:01:50, 1.63it/s] {'loss': 0.2407, 'grad_norm': 0.5815158486366272, 'learning_rate': 6.266404474610697e-06, 'epoch': 1.43}
48%|████▊ | 5494/11526 [57:31<1:01:50, 1.63it/s] 48%|████▊ | 5495/11526 [57:32<1:01:47, 1.63it/s] {'loss': 0.1892, 'grad_norm': 0.5098471641540527, 'learning_rate': 6.264939481602465e-06, 'epoch': 1.43}
48%|████▊ | 5495/11526 [57:32<1:01:47, 1.63it/s] 48%|████▊ | 5496/11526 [57:32<1:01:49, 1.63it/s] {'loss': 0.1613, 'grad_norm': 0.45141592621803284, 'learning_rate': 6.263474372566789e-06, 'epoch': 1.43}
48%|████▊ | 5496/11526 [57:32<1:01:49, 1.63it/s] 48%|████▊ | 5497/11526 [57:33<1:01:48, 1.63it/s] {'loss': 0.1655, 'grad_norm': 0.4700244665145874, 'learning_rate': 6.262009147638051e-06, 'epoch': 1.43}
48%|████▊ | 5497/11526 [57:33<1:01:48, 1.63it/s] 48%|████▊ | 5498/11526 [57:33<1:01:47, 1.63it/s] {'loss': 0.2962, 'grad_norm': 0.6213946342468262, 'learning_rate': 6.260543806950657e-06, 'epoch': 1.43}
48%|████▊ | 5498/11526 [57:34<1:01:47, 1.63it/s] 48%|████▊ | 5499/11526 [57:34<1:01:45, 1.63it/s] {'loss': 0.1696, 'grad_norm': 0.5012149214744568, 'learning_rate': 6.25907835063901e-06, 'epoch': 1.43}
48%|████▊ | 5499/11526 [57:34<1:01:45, 1.63it/s] 48%|████▊ | 5500/11526 [57:35<1:01:43, 1.63it/s] {'loss': 0.2014, 'grad_norm': 0.5147197842597961, 'learning_rate': 6.257612778837535e-06, 'epoch': 1.43}
48%|████▊ | 5500/11526 [57:35<1:01:43, 1.63it/s] 48%|████▊ | 5501/11526 [57:35<1:01:47, 1.63it/s] {'loss': 0.2428, 'grad_norm': 0.6080776453018188, 'learning_rate': 6.2561470916806595e-06, 'epoch': 1.43}
48%|████▊ | 5501/11526 [57:35<1:01:47, 1.63it/s] 48%|████▊ | 5502/11526 [57:36<1:01:45, 1.63it/s] {'loss': 0.2264, 'grad_norm': 0.4853057563304901, 'learning_rate': 6.254681289302825e-06, 'epoch': 1.43}
48%|████▊ | 5502/11526 [57:36<1:01:45, 1.63it/s] 48%|████▊ | 5503/11526 [57:36<1:01:47, 1.62it/s] {'loss': 0.2344, 'grad_norm': 0.6347095966339111, 'learning_rate': 6.253215371838486e-06, 'epoch': 1.43}
48%|████▊ | 5503/11526 [57:37<1:01:47, 1.62it/s] 48%|████▊ | 5504/11526 [57:37<1:01:44, 1.63it/s] {'loss': 0.178, 'grad_norm': 0.4800346791744232, 'learning_rate': 6.2517493394220995e-06, 'epoch': 1.43}
48%|████▊ | 5504/11526 [57:37<1:01:44, 1.63it/s] 48%|████▊ | 5505/11526 [57:38<1:01:41, 1.63it/s] {'loss': 0.1618, 'grad_norm': 0.46097883582115173, 'learning_rate': 6.250283192188142e-06, 'epoch': 1.43}
48%|████▊ | 5505/11526 [57:38<1:01:41, 1.63it/s] 48%|████▊ | 5506/11526 [57:38<1:01:40, 1.63it/s] {'loss': 0.256, 'grad_norm': 0.5603499412536621, 'learning_rate': 6.2488169302710976e-06, 'epoch': 1.43}
48%|████▊ | 5506/11526 [57:38<1:01:40, 1.63it/s] 48%|████▊ | 5507/11526 [57:39<1:01:39, 1.63it/s] {'loss': 0.2603, 'grad_norm': 0.6133333444595337, 'learning_rate': 6.24735055380546e-06, 'epoch': 1.43}
48%|████▊ | 5507/11526 [57:39<1:01:39, 1.63it/s] 48%|████▊ | 5508/11526 [57:40<1:01:40, 1.63it/s] {'loss': 0.211, 'grad_norm': 0.5717518329620361, 'learning_rate': 6.245884062925729e-06, 'epoch': 1.43}
48%|████▊ | 5508/11526 [57:40<1:01:40, 1.63it/s] 48%|████▊ | 5509/11526 [57:40<1:01:37, 1.63it/s] {'loss': 0.1992, 'grad_norm': 0.5010180473327637, 'learning_rate': 6.244417457766426e-06, 'epoch': 1.43}
48%|████▊ | 5509/11526 [57:40<1:01:37, 1.63it/s] 48%|████▊ | 5510/11526 [57:41<1:01:42, 1.63it/s] {'loss': 0.1789, 'grad_norm': 0.4590829908847809, 'learning_rate': 6.242950738462072e-06, 'epoch': 1.43}
48%|████▊ | 5510/11526 [57:41<1:01:42, 1.63it/s] 48%|████▊ | 5511/11526 [57:41<1:01:38, 1.63it/s] {'loss': 0.1928, 'grad_norm': 0.520413339138031, 'learning_rate': 6.241483905147206e-06, 'epoch': 1.43}
48%|████▊ | 5511/11526 [57:42<1:01:38, 1.63it/s] 48%|████▊ | 5512/11526 [57:42<1:01:38, 1.63it/s] {'loss': 0.2261, 'grad_norm': 0.5686134099960327, 'learning_rate': 6.24001695795637e-06, 'epoch': 1.43}
48%|████▊ | 5512/11526 [57:42<1:01:38, 1.63it/s] 48%|████▊ | 5513/11526 [57:43<1:01:40, 1.63it/s] {'loss': 0.2615, 'grad_norm': 0.7097981572151184, 'learning_rate': 6.238549897024123e-06, 'epoch': 1.43}
48%|████▊ | 5513/11526 [57:43<1:01:40, 1.63it/s] 48%|████▊ | 5514/11526 [57:43<1:01:36, 1.63it/s] {'loss': 0.189, 'grad_norm': 0.5228271484375, 'learning_rate': 6.237082722485035e-06, 'epoch': 1.44}
48%|████▊ | 5514/11526 [57:43<1:01:36, 1.63it/s] 48%|████▊ | 5515/11526 [57:44<1:01:32, 1.63it/s] {'loss': 0.1942, 'grad_norm': 0.518569827079773, 'learning_rate': 6.2356154344736785e-06, 'epoch': 1.44}
48%|████▊ | 5515/11526 [57:44<1:01:32, 1.63it/s] 48%|████▊ | 5516/11526 [57:44<1:01:33, 1.63it/s] {'loss': 0.1882, 'grad_norm': 0.5022196173667908, 'learning_rate': 6.234148033124645e-06, 'epoch': 1.44}
48%|████▊ | 5516/11526 [57:45<1:01:33, 1.63it/s] 48%|████▊ | 5517/11526 [57:45<1:01:32, 1.63it/s] {'loss': 0.1895, 'grad_norm': 0.5075571537017822, 'learning_rate': 6.232680518572531e-06, 'epoch': 1.44}
48%|████▊ | 5517/11526 [57:45<1:01:32, 1.63it/s] 48%|████▊ | 5518/11526 [57:46<1:01:32, 1.63it/s] {'loss': 0.1912, 'grad_norm': 0.5015223622322083, 'learning_rate': 6.231212890951948e-06, 'epoch': 1.44}
48%|████▊ | 5518/11526 [57:46<1:01:32, 1.63it/s] 48%|████▊ | 5519/11526 [57:46<1:01:30, 1.63it/s] {'loss': 0.1999, 'grad_norm': 0.5677736401557922, 'learning_rate': 6.229745150397513e-06, 'epoch': 1.44}
48%|████▊ | 5519/11526 [57:46<1:01:30, 1.63it/s] 48%|████▊ | 5520/11526 [57:47<1:01:28, 1.63it/s] {'loss': 0.1982, 'grad_norm': 0.5613862872123718, 'learning_rate': 6.2282772970438546e-06, 'epoch': 1.44}
48%|████▊ | 5520/11526 [57:47<1:01:28, 1.63it/s] 48%|████▊ | 5521/11526 [57:48<1:01:29, 1.63it/s] {'loss': 0.1656, 'grad_norm': 0.44915685057640076, 'learning_rate': 6.226809331025615e-06, 'epoch': 1.44}
48%|████▊ | 5521/11526 [57:48<1:01:29, 1.63it/s] 48%|████▊ | 5522/11526 [57:48<1:01:31, 1.63it/s] {'loss': 0.2407, 'grad_norm': 0.48379141092300415, 'learning_rate': 6.2253412524774425e-06, 'epoch': 1.44}
48%|████▊ | 5522/11526 [57:48<1:01:31, 1.63it/s] 48%|████▊ | 5523/11526 [57:49<1:01:28, 1.63it/s] {'loss': 0.1588, 'grad_norm': 0.48270782828330994, 'learning_rate': 6.223873061533998e-06, 'epoch': 1.44}
48%|████▊ | 5523/11526 [57:49<1:01:28, 1.63it/s] 48%|████▊ | 5524/11526 [57:49<1:01:25, 1.63it/s] {'loss': 0.1893, 'grad_norm': 0.5357421636581421, 'learning_rate': 6.222404758329953e-06, 'epoch': 1.44}
48%|████▊ | 5524/11526 [57:50<1:01:25, 1.63it/s] 48%|████▊ | 5525/11526 [57:50<1:01:24, 1.63it/s] {'loss': 0.1545, 'grad_norm': 0.40353772044181824, 'learning_rate': 6.22093634299999e-06, 'epoch': 1.44}
48%|████▊ | 5525/11526 [57:50<1:01:24, 1.63it/s] 48%|████▊ | 5526/11526 [57:51<1:01:28, 1.63it/s] {'loss': 0.1633, 'grad_norm': 0.44498535990715027, 'learning_rate': 6.219467815678797e-06, 'epoch': 1.44}
48%|████▊ | 5526/11526 [57:51<1:01:28, 1.63it/s] 48%|████▊ | 5527/11526 [57:51<1:01:26, 1.63it/s] {'loss': 0.2259, 'grad_norm': 0.5929351449012756, 'learning_rate': 6.217999176501078e-06, 'epoch': 1.44}
48%|████▊ | 5527/11526 [57:51<1:01:26, 1.63it/s] 48%|████▊ | 5528/11526 [57:52<1:01:33, 1.62it/s] {'loss': 0.2149, 'grad_norm': 0.5556310415267944, 'learning_rate': 6.216530425601544e-06, 'epoch': 1.44}
48%|████▊ | 5528/11526 [57:52<1:01:33, 1.62it/s] 48%|████▊ | 5529/11526 [57:52<1:01:29, 1.63it/s] {'loss': 0.2499, 'grad_norm': 0.6321167349815369, 'learning_rate': 6.215061563114919e-06, 'epoch': 1.44}
48%|████▊ | 5529/11526 [57:53<1:01:29, 1.63it/s] 48%|████▊ | 5530/11526 [57:53<1:01:25, 1.63it/s] {'loss': 0.286, 'grad_norm': 0.55785071849823, 'learning_rate': 6.213592589175934e-06, 'epoch': 1.44}
48%|████▊ | 5530/11526 [57:53<1:01:25, 1.63it/s] 48%|████▊ | 5531/11526 [57:54<1:01:27, 1.63it/s] {'loss': 0.2173, 'grad_norm': 0.5835590362548828, 'learning_rate': 6.212123503919329e-06, 'epoch': 1.44}
48%|████▊ | 5531/11526 [57:54<1:01:27, 1.63it/s] 48%|████▊ | 5532/11526 [57:54<1:01:24, 1.63it/s] {'loss': 0.2428, 'grad_norm': 0.5414198040962219, 'learning_rate': 6.210654307479862e-06, 'epoch': 1.44}
48%|████▊ | 5532/11526 [57:54<1:01:24, 1.63it/s] 48%|████▊ | 5533/11526 [57:55<1:01:21, 1.63it/s] {'loss': 0.2429, 'grad_norm': 0.5938297510147095, 'learning_rate': 6.209184999992292e-06, 'epoch': 1.44}
48%|████▊ | 5533/11526 [57:55<1:01:21, 1.63it/s] 48%|████▊ | 5534/11526 [57:56<1:01:23, 1.63it/s] {'loss': 0.2583, 'grad_norm': 0.5968653559684753, 'learning_rate': 6.207715581591396e-06, 'epoch': 1.44}
48%|████▊ | 5534/11526 [57:56<1:01:23, 1.63it/s] 48%|████▊ | 5535/11526 [57:56<1:01:20, 1.63it/s] {'loss': 0.2515, 'grad_norm': 0.6581779718399048, 'learning_rate': 6.2062460524119525e-06, 'epoch': 1.44}
48%|████▊ | 5535/11526 [57:56<1:01:20, 1.63it/s] 48%|████▊ | 5536/11526 [57:57<1:01:21, 1.63it/s] {'loss': 0.2123, 'grad_norm': 0.5238813757896423, 'learning_rate': 6.20477641258876e-06, 'epoch': 1.44}
48%|████▊ | 5536/11526 [57:57<1:01:21, 1.63it/s] 48%|████▊ | 5537/11526 [57:57<1:01:19, 1.63it/s] {'loss': 0.1634, 'grad_norm': 0.4939369261264801, 'learning_rate': 6.20330666225662e-06, 'epoch': 1.44}
48%|████▊ | 5537/11526 [57:58<1:01:19, 1.63it/s] 48%|████▊ | 5538/11526 [57:58<1:01:17, 1.63it/s] {'loss': 0.2275, 'grad_norm': 0.5769575834274292, 'learning_rate': 6.201836801550346e-06, 'epoch': 1.44}
48%|████▊ | 5538/11526 [57:58<1:01:17, 1.63it/s] 48%|████▊ | 5539/11526 [57:59<1:01:16, 1.63it/s] {'loss': 0.209, 'grad_norm': 0.546970009803772, 'learning_rate': 6.2003668306047605e-06, 'epoch': 1.44}
48%|████▊ | 5539/11526 [57:59<1:01:16, 1.63it/s] 48%|████▊ | 5540/11526 [57:59<1:01:16, 1.63it/s] {'loss': 0.2762, 'grad_norm': 0.6165581941604614, 'learning_rate': 6.1988967495547016e-06, 'epoch': 1.44}
48%|████▊ | 5540/11526 [57:59<1:01:16, 1.63it/s] 48%|████▊ | 5541/11526 [58:00<1:01:18, 1.63it/s] {'loss': 0.1873, 'grad_norm': 0.5205325484275818, 'learning_rate': 6.1974265585350116e-06, 'epoch': 1.44}
48%|████▊ | 5541/11526 [58:00<1:01:18, 1.63it/s] 48%|████▊ | 5542/11526 [58:00<1:01:18, 1.63it/s] {'loss': 0.2115, 'grad_norm': 0.5766313076019287, 'learning_rate': 6.195956257680542e-06, 'epoch': 1.44}
48%|████▊ | 5542/11526 [58:01<1:01:18, 1.63it/s] 48%|████▊ | 5543/11526 [58:01<1:01:16, 1.63it/s] {'loss': 0.2615, 'grad_norm': 0.6262105703353882, 'learning_rate': 6.194485847126163e-06, 'epoch': 1.44}
48%|████▊ | 5543/11526 [58:01<1:01:16, 1.63it/s] 48%|████▊ | 5544/11526 [58:02<1:01:15, 1.63it/s] {'loss': 0.2296, 'grad_norm': 0.6147996187210083, 'learning_rate': 6.193015327006744e-06, 'epoch': 1.44}
48%|████▊ | 5544/11526 [58:02<1:01:15, 1.63it/s] 48%|████▊ | 5545/11526 [58:02<1:01:13, 1.63it/s] {'loss': 0.1999, 'grad_norm': 0.52244633436203, 'learning_rate': 6.191544697457172e-06, 'epoch': 1.44}
48%|████▊ | 5545/11526 [58:02<1:01:13, 1.63it/s] 48%|████▊ | 5546/11526 [58:03<1:01:34, 1.62it/s] {'loss': 0.1757, 'grad_norm': 0.5210894346237183, 'learning_rate': 6.19007395861234e-06, 'epoch': 1.44}
48%|████▊ | 5546/11526 [58:03<1:01:34, 1.62it/s] 48%|████▊ | 5547/11526 [58:04<1:01:25, 1.62it/s] {'loss': 0.2366, 'grad_norm': 0.6360064744949341, 'learning_rate': 6.188603110607153e-06, 'epoch': 1.44}
48%|████▊ | 5547/11526 [58:04<1:01:25, 1.62it/s] 48%|████▊ | 5548/11526 [58:04<1:01:22, 1.62it/s] {'loss': 0.2305, 'grad_norm': 0.5477888584136963, 'learning_rate': 6.187132153576526e-06, 'epoch': 1.44}
48%|████▊ | 5548/11526 [58:04<1:01:22, 1.62it/s] 48%|████▊ | 5549/11526 [58:05<1:01:19, 1.62it/s] {'loss': 0.2329, 'grad_norm': 0.5747239589691162, 'learning_rate': 6.185661087655385e-06, 'epoch': 1.44}
48%|████▊ | 5549/11526 [58:05<1:01:19, 1.62it/s] 48%|████▊ | 5550/11526 [58:05<1:01:13, 1.63it/s] {'loss': 0.1873, 'grad_norm': 0.5572449564933777, 'learning_rate': 6.184189912978661e-06, 'epoch': 1.44}
48%|████▊ | 5550/11526 [58:06<1:01:13, 1.63it/s] 48%|████▊ | 5551/11526 [58:06<1:01:15, 1.63it/s] {'loss': 0.1976, 'grad_norm': 0.5552921295166016, 'learning_rate': 6.1827186296813015e-06, 'epoch': 1.44}
48%|████▊ | 5551/11526 [58:06<1:01:15, 1.63it/s] 48%|████▊ | 5552/11526 [58:07<1:01:13, 1.63it/s] {'loss': 0.2332, 'grad_norm': 0.5575600862503052, 'learning_rate': 6.18124723789826e-06, 'epoch': 1.45}
48%|████▊ | 5552/11526 [58:07<1:01:13, 1.63it/s] 48%|████▊ | 5553/11526 [58:07<1:01:12, 1.63it/s] {'loss': 0.1581, 'grad_norm': 0.4427918493747711, 'learning_rate': 6.1797757377645e-06, 'epoch': 1.45}
48%|████▊ | 5553/11526 [58:07<1:01:12, 1.63it/s] 48%|████▊ | 5554/11526 [58:08<1:01:10, 1.63it/s] {'loss': 0.2138, 'grad_norm': 0.556627631187439, 'learning_rate': 6.178304129414997e-06, 'epoch': 1.45}
48%|████▊ | 5554/11526 [58:08<1:01:10, 1.63it/s] 48%|████▊ | 5555/11526 [58:08<1:01:10, 1.63it/s] {'loss': 0.2331, 'grad_norm': 0.6119393110275269, 'learning_rate': 6.176832412984734e-06, 'epoch': 1.45}
48%|████▊ | 5555/11526 [58:09<1:01:10, 1.63it/s] 48%|████▊ | 5556/11526 [58:09<1:01:09, 1.63it/s] {'loss': 0.2331, 'grad_norm': 0.5864142179489136, 'learning_rate': 6.1753605886087075e-06, 'epoch': 1.45}
48%|████▊ | 5556/11526 [58:09<1:01:09, 1.63it/s] 48%|████▊ | 5557/11526 [58:10<1:01:06, 1.63it/s] {'loss': 0.2324, 'grad_norm': 0.5436546206474304, 'learning_rate': 6.1738886564219205e-06, 'epoch': 1.45}
48%|████▊ | 5557/11526 [58:10<1:01:06, 1.63it/s] 48%|████▊ | 5558/11526 [58:10<1:01:04, 1.63it/s] {'loss': 0.2518, 'grad_norm': 0.6154535412788391, 'learning_rate': 6.172416616559385e-06, 'epoch': 1.45}
48%|████▊ | 5558/11526 [58:10<1:01:04, 1.63it/s] 48%|████▊ | 5559/11526 [58:11<1:01:02, 1.63it/s] {'loss': 0.2114, 'grad_norm': 0.49676308035850525, 'learning_rate': 6.170944469156128e-06, 'epoch': 1.45}
48%|████▊ | 5559/11526 [58:11<1:01:02, 1.63it/s] 48%|████▊ | 5560/11526 [58:12<1:01:00, 1.63it/s] {'loss': 0.2109, 'grad_norm': 0.5410504937171936, 'learning_rate': 6.169472214347183e-06, 'epoch': 1.45}
48%|████▊ | 5560/11526 [58:12<1:01:00, 1.63it/s] 48%|████▊ | 5561/11526 [58:12<1:01:23, 1.62it/s] {'loss': 0.1859, 'grad_norm': 0.5769607424736023, 'learning_rate': 6.167999852267591e-06, 'epoch': 1.45}
48%|████▊ | 5561/11526 [58:12<1:01:23, 1.62it/s] 48%|████▊ | 5562/11526 [58:13<1:01:18, 1.62it/s] {'loss': 0.1793, 'grad_norm': 0.4764896333217621, 'learning_rate': 6.166527383052406e-06, 'epoch': 1.45}
48%|████▊ | 5562/11526 [58:13<1:01:18, 1.62it/s] 48%|████▊ | 5563/11526 [58:13<1:01:19, 1.62it/s] {'loss': 0.1646, 'grad_norm': 0.4887414276599884, 'learning_rate': 6.165054806836694e-06, 'epoch': 1.45}
48%|████▊ | 5563/11526 [58:14<1:01:19, 1.62it/s] 48%|████▊ | 5564/11526 [58:14<1:01:12, 1.62it/s] {'loss': 0.2761, 'grad_norm': 0.6232955455780029, 'learning_rate': 6.163582123755526e-06, 'epoch': 1.45}
48%|████▊ | 5564/11526 [58:14<1:01:12, 1.62it/s] 48%|████▊ | 5565/11526 [58:15<1:01:06, 1.63it/s] {'loss': 0.1785, 'grad_norm': 0.46392226219177246, 'learning_rate': 6.162109333943984e-06, 'epoch': 1.45}
48%|████▊ | 5565/11526 [58:15<1:01:06, 1.63it/s] 48%|████▊ | 5566/11526 [58:15<1:01:05, 1.63it/s] {'loss': 0.1952, 'grad_norm': 0.48442280292510986, 'learning_rate': 6.160636437537162e-06, 'epoch': 1.45}
48%|████▊ | 5566/11526 [58:15<1:01:05, 1.63it/s] 48%|████▊ | 5567/11526 [58:16<1:01:06, 1.63it/s] {'loss': 0.2207, 'grad_norm': 0.5648987293243408, 'learning_rate': 6.1591634346701635e-06, 'epoch': 1.45}
48%|████▊ | 5567/11526 [58:16<1:01:06, 1.63it/s] 48%|████▊ | 5568/11526 [58:17<1:02:56, 1.58it/s] {'loss': 0.1926, 'grad_norm': 0.5024765729904175, 'learning_rate': 6.157690325478098e-06, 'epoch': 1.45}
48%|████▊ | 5568/11526 [58:17<1:02:56, 1.58it/s] 48%|████▊ | 5569/11526 [58:17<1:02:20, 1.59it/s] {'loss': 0.1782, 'grad_norm': 0.5192736387252808, 'learning_rate': 6.156217110096089e-06, 'epoch': 1.45}
48%|████▊ | 5569/11526 [58:17<1:02:20, 1.59it/s] 48%|████▊ | 5570/11526 [58:18<1:01:55, 1.60it/s] {'loss': 0.1817, 'grad_norm': 0.5371754169464111, 'learning_rate': 6.15474378865927e-06, 'epoch': 1.45}
48%|████▊ | 5570/11526 [58:18<1:01:55, 1.60it/s] 48%|████▊ | 5571/11526 [58:18<1:01:38, 1.61it/s] {'loss': 0.1887, 'grad_norm': 0.47153815627098083, 'learning_rate': 6.153270361302777e-06, 'epoch': 1.45}
48%|████▊ | 5571/11526 [58:18<1:01:38, 1.61it/s] 48%|████▊ | 5572/11526 [58:19<1:01:25, 1.62it/s] {'loss': 0.1856, 'grad_norm': 0.49985238909721375, 'learning_rate': 6.151796828161766e-06, 'epoch': 1.45}
48%|████▊ | 5572/11526 [58:19<1:01:25, 1.62it/s] 48%|████▊ | 5573/11526 [58:20<1:01:22, 1.62it/s] {'loss': 0.2805, 'grad_norm': 0.6271909475326538, 'learning_rate': 6.150323189371398e-06, 'epoch': 1.45}
48%|████▊ | 5573/11526 [58:20<1:01:22, 1.62it/s] 48%|████▊ | 5574/11526 [58:20<1:01:13, 1.62it/s] {'loss': 0.2112, 'grad_norm': 0.5514768362045288, 'learning_rate': 6.148849445066841e-06, 'epoch': 1.45}
48%|████▊ | 5574/11526 [58:20<1:01:13, 1.62it/s] 48%|████▊ | 5575/11526 [58:21<1:01:06, 1.62it/s] {'loss': 0.2313, 'grad_norm': 0.5871658325195312, 'learning_rate': 6.147375595383276e-06, 'epoch': 1.45}
48%|████▊ | 5575/11526 [58:21<1:01:06, 1.62it/s] 48%|████▊ | 5576/11526 [58:21<1:01:05, 1.62it/s] {'loss': 0.2178, 'grad_norm': 0.5697159171104431, 'learning_rate': 6.145901640455893e-06, 'epoch': 1.45}
48%|████▊ | 5576/11526 [58:22<1:01:05, 1.62it/s] 48%|████▊ | 5577/11526 [58:22<1:01:01, 1.62it/s] {'loss': 0.159, 'grad_norm': 0.4278634488582611, 'learning_rate': 6.144427580419892e-06, 'epoch': 1.45}
48%|████▊ | 5577/11526 [58:22<1:01:01, 1.62it/s] 48%|████▊ | 5578/11526 [58:23<1:01:11, 1.62it/s] {'loss': 0.2083, 'grad_norm': 0.5094971656799316, 'learning_rate': 6.142953415410483e-06, 'epoch': 1.45}
48%|████▊ | 5578/11526 [58:23<1:01:11, 1.62it/s] 48%|████▊ | 5579/11526 [58:23<1:01:05, 1.62it/s] {'loss': 0.2146, 'grad_norm': 0.5999428033828735, 'learning_rate': 6.141479145562883e-06, 'epoch': 1.45}
48%|████▊ | 5579/11526 [58:23<1:01:05, 1.62it/s] 48%|████▊ | 5580/11526 [58:24<1:00:59, 1.62it/s] {'loss': 0.1831, 'grad_norm': 0.5078852772712708, 'learning_rate': 6.1400047710123185e-06, 'epoch': 1.45}
48%|████▊ | 5580/11526 [58:24<1:00:59, 1.62it/s] 48%|████▊ | 5581/11526 [58:25<1:00:59, 1.62it/s] {'loss': 0.1997, 'grad_norm': 0.5599985122680664, 'learning_rate': 6.138530291894033e-06, 'epoch': 1.45}
48%|████▊ | 5581/11526 [58:25<1:00:59, 1.62it/s] 48%|████▊ | 5582/11526 [58:25<1:00:56, 1.63it/s] {'loss': 0.2169, 'grad_norm': 0.5796253681182861, 'learning_rate': 6.137055708343269e-06, 'epoch': 1.45}
48%|████▊ | 5582/11526 [58:25<1:00:56, 1.63it/s] 48%|████▊ | 5583/11526 [58:26<1:00:54, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.5713209509849548, 'learning_rate': 6.1355810204952885e-06, 'epoch': 1.45}
48%|████▊ | 5583/11526 [58:26<1:00:54, 1.63it/s] 48%|████▊ | 5584/11526 [58:26<1:00:52, 1.63it/s] {'loss': 0.2177, 'grad_norm': 0.6671246886253357, 'learning_rate': 6.134106228485353e-06, 'epoch': 1.45}
48%|████▊ | 5584/11526 [58:26<1:00:52, 1.63it/s] 48%|████▊ | 5585/11526 [58:27<1:00:55, 1.63it/s] {'loss': 0.159, 'grad_norm': 0.43478503823280334, 'learning_rate': 6.132631332448743e-06, 'epoch': 1.45}
48%|████▊ | 5585/11526 [58:27<1:00:55, 1.63it/s] 48%|████▊ | 5586/11526 [58:28<1:00:53, 1.63it/s] {'loss': 0.1931, 'grad_norm': 0.5732662677764893, 'learning_rate': 6.131156332520741e-06, 'epoch': 1.45}
48%|████▊ | 5586/11526 [58:28<1:00:53, 1.63it/s] 48%|████▊ | 5587/11526 [58:28<1:00:48, 1.63it/s] {'loss': 0.1801, 'grad_norm': 0.4674510955810547, 'learning_rate': 6.129681228836647e-06, 'epoch': 1.45}
48%|████▊ | 5587/11526 [58:28<1:00:48, 1.63it/s] 48%|████▊ | 5588/11526 [58:29<1:00:51, 1.63it/s] {'loss': 0.1791, 'grad_norm': 0.5093639492988586, 'learning_rate': 6.128206021531759e-06, 'epoch': 1.45}
48%|████▊ | 5588/11526 [58:29<1:00:51, 1.63it/s] 48%|████▊ | 5589/11526 [58:29<1:00:48, 1.63it/s] {'loss': 0.2058, 'grad_norm': 0.7123666405677795, 'learning_rate': 6.126730710741399e-06, 'epoch': 1.45}
48%|████▊ | 5589/11526 [58:30<1:00:48, 1.63it/s] 48%|████▊ | 5590/11526 [58:30<1:04:04, 1.54it/s] {'loss': 0.2872, 'grad_norm': 0.6527453064918518, 'learning_rate': 6.125255296600887e-06, 'epoch': 1.45}
48%|████▊ | 5590/11526 [58:30<1:04:04, 1.54it/s] 49%|████▊ | 5591/11526 [58:31<1:03:08, 1.57it/s] {'loss': 0.1841, 'grad_norm': 0.5157011151313782, 'learning_rate': 6.123779779245557e-06, 'epoch': 1.46}
49%|████▊ | 5591/11526 [58:31<1:03:08, 1.57it/s] 49%|████▊ | 5592/11526 [58:31<1:02:23, 1.59it/s] {'loss': 0.2679, 'grad_norm': 0.5876116156578064, 'learning_rate': 6.12230415881075e-06, 'epoch': 1.46}
49%|████▊ | 5592/11526 [58:32<1:02:23, 1.59it/s] 49%|████▊ | 5593/11526 [58:32<1:01:49, 1.60it/s] {'loss': 0.185, 'grad_norm': 0.5365090370178223, 'learning_rate': 6.120828435431821e-06, 'epoch': 1.46}
49%|████▊ | 5593/11526 [58:32<1:01:49, 1.60it/s] 49%|████▊ | 5594/11526 [58:33<1:01:29, 1.61it/s] {'loss': 0.1645, 'grad_norm': 0.4666370451450348, 'learning_rate': 6.11935260924413e-06, 'epoch': 1.46}
49%|████▊ | 5594/11526 [58:33<1:01:29, 1.61it/s] 49%|████▊ | 5595/11526 [58:33<1:01:16, 1.61it/s] {'loss': 0.186, 'grad_norm': 0.5398767590522766, 'learning_rate': 6.117876680383048e-06, 'epoch': 1.46}
49%|████▊ | 5595/11526 [58:33<1:01:16, 1.61it/s] 49%|████▊ | 5596/11526 [58:34<1:04:25, 1.53it/s] {'loss': 0.2334, 'grad_norm': 0.5744159817695618, 'learning_rate': 6.116400648983958e-06, 'epoch': 1.46}
49%|████▊ | 5596/11526 [58:34<1:04:25, 1.53it/s] 49%|████▊ | 5597/11526 [58:35<1:03:20, 1.56it/s] {'loss': 0.1637, 'grad_norm': 0.521706223487854, 'learning_rate': 6.114924515182248e-06, 'epoch': 1.46}
49%|████▊ | 5597/11526 [58:35<1:03:20, 1.56it/s] 49%|████▊ | 5598/11526 [58:35<1:02:32, 1.58it/s] {'loss': 0.209, 'grad_norm': 0.5661709308624268, 'learning_rate': 6.113448279113318e-06, 'epoch': 1.46}
49%|████▊ | 5598/11526 [58:35<1:02:32, 1.58it/s] 49%|████▊ | 5599/11526 [58:36<1:02:01, 1.59it/s] {'loss': 0.2666, 'grad_norm': 0.6038132905960083, 'learning_rate': 6.111971940912576e-06, 'epoch': 1.46}
49%|████▊ | 5599/11526 [58:36<1:02:01, 1.59it/s] 49%|████▊ | 5600/11526 [58:36<1:01:36, 1.60it/s] {'loss': 0.1585, 'grad_norm': 0.4714517593383789, 'learning_rate': 6.110495500715441e-06, 'epoch': 1.46}
49%|████▊ | 5600/11526 [58:37<1:01:36, 1.60it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.34it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.73it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5775817036628723, 'eval_runtime': 1.9569, 'eval_samples_per_second': 102.204, 'eval_steps_per_second': 6.643, 'epoch': 1.46}
49%|████▊ | 5600/11526 [58:39<1:01:36, 1.60it/s]
 49%|████▊ | 5601/11526 [58:39<1:59:30, 1.21s/it] {'loss': 0.2112, 'grad_norm': 0.5039400458335876, 'learning_rate': 6.10901895865734e-06, 'epoch': 1.46}
49%|████▊ | 5601/11526 [58:39<1:59:30, 1.21s/it] 49%|████▊ | 5602/11526 [58:40<1:41:49, 1.03s/it] {'loss': 0.2137, 'grad_norm': 0.6225388646125793, 'learning_rate': 6.10754231487371e-06, 'epoch': 1.46}
49%|████▊ | 5602/11526 [58:40<1:41:49, 1.03s/it] 49%|████▊ | 5603/11526 [58:40<1:29:25, 1.10it/s] {'loss': 0.1839, 'grad_norm': 0.4894367456436157, 'learning_rate': 6.106065569499997e-06, 'epoch': 1.46}
49%|████▊ | 5603/11526 [58:40<1:29:25, 1.10it/s] 49%|████▊ | 5604/11526 [58:41<1:20:45, 1.22it/s] {'loss': 0.2069, 'grad_norm': 0.5694513320922852, 'learning_rate': 6.104588722671657e-06, 'epoch': 1.46}
49%|████▊ | 5604/11526 [58:41<1:20:45, 1.22it/s] 49%|████▊ | 5605/11526 [58:41<1:14:42, 1.32it/s] {'loss': 0.2114, 'grad_norm': 0.5655454993247986, 'learning_rate': 6.1031117745241565e-06, 'epoch': 1.46}
49%|████▊ | 5605/11526 [58:42<1:14:42, 1.32it/s] 49%|████▊ | 5606/11526 [58:42<1:10:26, 1.40it/s] {'loss': 0.2509, 'grad_norm': 0.7150793671607971, 'learning_rate': 6.101634725192965e-06, 'epoch': 1.46}
49%|████▊ | 5606/11526 [58:42<1:10:26, 1.40it/s] 49%|████▊ | 5607/11526 [58:43<1:07:30, 1.46it/s] {'loss': 0.1715, 'grad_norm': 0.4702790081501007, 'learning_rate': 6.1001575748135685e-06, 'epoch': 1.46}
49%|████▊ | 5607/11526 [58:43<1:07:30, 1.46it/s] 49%|████▊ | 5608/11526 [58:43<1:05:26, 1.51it/s] {'loss': 0.2191, 'grad_norm': 0.5467002987861633, 'learning_rate': 6.098680323521461e-06, 'epoch': 1.46}
49%|████▊ | 5608/11526 [58:43<1:05:26, 1.51it/s] 49%|████▊ | 5609/11526 [58:44<1:04:03, 1.54it/s] {'loss': 0.2037, 'grad_norm': 0.5931721925735474, 'learning_rate': 6.097202971452143e-06, 'epoch': 1.46}
49%|████▊ | 5609/11526 [58:44<1:04:03, 1.54it/s] 49%|████▊ | 5610/11526 [58:45<1:02:59, 1.57it/s] {'loss': 0.2461, 'grad_norm': 0.7811353206634521, 'learning_rate': 6.0957255187411246e-06, 'epoch': 1.46}
49%|████▊ | 5610/11526 [58:45<1:02:59, 1.57it/s] 49%|████▊ | 5611/11526 [58:45<1:02:15, 1.58it/s] {'loss': 0.1615, 'grad_norm': 0.38858988881111145, 'learning_rate': 6.094247965523927e-06, 'epoch': 1.46}
49%|████▊ | 5611/11526 [58:45<1:02:15, 1.58it/s] 49%|████▊ | 5612/11526 [58:46<1:01:43, 1.60it/s] {'loss': 0.2427, 'grad_norm': 0.5762541890144348, 'learning_rate': 6.0927703119360814e-06, 'epoch': 1.46}
49%|████▊ | 5612/11526 [58:46<1:01:43, 1.60it/s] 49%|████▊ | 5613/11526 [58:46<1:01:25, 1.60it/s] {'loss': 0.1719, 'grad_norm': 0.48983651399612427, 'learning_rate': 6.0912925581131246e-06, 'epoch': 1.46}
49%|████▊ | 5613/11526 [58:46<1:01:25, 1.60it/s] 49%|████▊ | 5614/11526 [58:47<1:01:13, 1.61it/s] {'loss': 0.2143, 'grad_norm': 0.5667641162872314, 'learning_rate': 6.089814704190604e-06, 'epoch': 1.46}
49%|████▊ | 5614/11526 [58:47<1:01:13, 1.61it/s] 49%|████▊ | 5615/11526 [58:48<1:01:02, 1.61it/s] {'loss': 0.218, 'grad_norm': 0.6988852620124817, 'learning_rate': 6.088336750304078e-06, 'epoch': 1.46}
49%|████▊ | 5615/11526 [58:48<1:01:02, 1.61it/s] 49%|████▊ | 5616/11526 [58:48<1:00:59, 1.62it/s] {'loss': 0.2233, 'grad_norm': 0.6069226861000061, 'learning_rate': 6.086858696589115e-06, 'epoch': 1.46}
49%|████▊ | 5616/11526 [58:48<1:00:59, 1.62it/s] 49%|████▊ | 5617/11526 [58:49<1:00:48, 1.62it/s] {'loss': 0.1701, 'grad_norm': 0.4122316837310791, 'learning_rate': 6.0853805431812875e-06, 'epoch': 1.46}
49%|████▊ | 5617/11526 [58:49<1:00:48, 1.62it/s] 49%|████▊ | 5618/11526 [58:49<1:00:41, 1.62it/s] {'loss': 0.1729, 'grad_norm': 0.5006240010261536, 'learning_rate': 6.08390229021618e-06, 'epoch': 1.46}
49%|████▊ | 5618/11526 [58:50<1:00:41, 1.62it/s] 49%|████▉ | 5619/11526 [58:50<1:00:36, 1.62it/s] {'loss': 0.19, 'grad_norm': 0.6016986966133118, 'learning_rate': 6.082423937829388e-06, 'epoch': 1.46}
49%|████▉ | 5619/11526 [58:50<1:00:36, 1.62it/s] 49%|████▉ | 5620/11526 [58:51<1:00:32, 1.63it/s] {'loss': 0.2288, 'grad_norm': 0.628054141998291, 'learning_rate': 6.0809454861565145e-06, 'epoch': 1.46}
49%|████▉ | 5620/11526 [58:51<1:00:32, 1.63it/s] 49%|████▉ | 5621/11526 [58:51<1:00:33, 1.62it/s] {'loss': 0.2438, 'grad_norm': 0.5871626138687134, 'learning_rate': 6.07946693533317e-06, 'epoch': 1.46}
49%|████▉ | 5621/11526 [58:51<1:00:33, 1.62it/s] 49%|████▉ | 5622/11526 [58:52<1:00:32, 1.63it/s] {'loss': 0.2047, 'grad_norm': 0.6312248706817627, 'learning_rate': 6.0779882854949745e-06, 'epoch': 1.46}
49%|████▉ | 5622/11526 [58:52<1:00:32, 1.63it/s] 49%|████▉ | 5623/11526 [58:53<1:00:27, 1.63it/s] {'loss': 0.1568, 'grad_norm': 0.44203105568885803, 'learning_rate': 6.076509536777563e-06, 'epoch': 1.46}
49%|████▉ | 5623/11526 [58:53<1:00:27, 1.63it/s] 49%|████▉ | 5624/11526 [58:53<1:00:26, 1.63it/s] {'loss': 0.2069, 'grad_norm': 0.5758419036865234, 'learning_rate': 6.07503068931657e-06, 'epoch': 1.46}
49%|████▉ | 5624/11526 [58:53<1:00:26, 1.63it/s] 49%|████▉ | 5625/11526 [58:54<1:00:27, 1.63it/s] {'loss': 0.1682, 'grad_norm': 0.4579544961452484, 'learning_rate': 6.073551743247645e-06, 'epoch': 1.46}
49%|████▉ | 5625/11526 [58:54<1:00:27, 1.63it/s] 49%|████▉ | 5626/11526 [58:54<1:00:30, 1.63it/s] {'loss': 0.1471, 'grad_norm': 0.4236728549003601, 'learning_rate': 6.072072698706446e-06, 'epoch': 1.46}
49%|████▉ | 5626/11526 [58:54<1:00:30, 1.63it/s] 49%|████▉ | 5627/11526 [58:55<1:00:26, 1.63it/s] {'loss': 0.189, 'grad_norm': 0.48835885524749756, 'learning_rate': 6.070593555828641e-06, 'epoch': 1.46}
49%|████▉ | 5627/11526 [58:55<1:00:26, 1.63it/s] 49%|████▉ | 5628/11526 [58:56<1:00:23, 1.63it/s] {'loss': 0.1757, 'grad_norm': 0.4389919340610504, 'learning_rate': 6.069114314749902e-06, 'epoch': 1.46}
49%|████▉ | 5628/11526 [58:56<1:00:23, 1.63it/s] 49%|████▉ | 5629/11526 [58:56<1:00:26, 1.63it/s] {'loss': 0.1345, 'grad_norm': 0.39506950974464417, 'learning_rate': 6.067634975605916e-06, 'epoch': 1.47}
49%|████▉ | 5629/11526 [58:56<1:00:26, 1.63it/s] 49%|████▉ | 5630/11526 [58:57<1:00:24, 1.63it/s] {'loss': 0.1503, 'grad_norm': 0.419766366481781, 'learning_rate': 6.0661555385323745e-06, 'epoch': 1.47}
49%|████▉ | 5630/11526 [58:57<1:00:24, 1.63it/s] 49%|████▉ | 5631/11526 [58:57<1:00:22, 1.63it/s] {'loss': 0.2277, 'grad_norm': 0.7811883091926575, 'learning_rate': 6.064676003664981e-06, 'epoch': 1.47}
49%|████▉ | 5631/11526 [58:58<1:00:22, 1.63it/s] 49%|████▉ | 5632/11526 [58:58<1:00:19, 1.63it/s] {'loss': 0.2226, 'grad_norm': 0.5197371244430542, 'learning_rate': 6.063196371139448e-06, 'epoch': 1.47}
49%|████▉ | 5632/11526 [58:58<1:00:19, 1.63it/s] 49%|████▉ | 5633/11526 [58:59<1:00:18, 1.63it/s] {'loss': 0.1844, 'grad_norm': 0.48972856998443604, 'learning_rate': 6.061716641091493e-06, 'epoch': 1.47}
49%|████▉ | 5633/11526 [58:59<1:00:18, 1.63it/s] 49%|████▉ | 5634/11526 [58:59<1:00:19, 1.63it/s] {'loss': 0.1499, 'grad_norm': 0.45029523968696594, 'learning_rate': 6.0602368136568466e-06, 'epoch': 1.47}
49%|████▉ | 5634/11526 [58:59<1:00:19, 1.63it/s] 49%|████▉ | 5635/11526 [59:00<1:00:20, 1.63it/s] {'loss': 0.2096, 'grad_norm': 0.504889190196991, 'learning_rate': 6.058756888971248e-06, 'epoch': 1.47}
49%|████▉ | 5635/11526 [59:00<1:00:20, 1.63it/s] 49%|████▉ | 5636/11526 [59:01<1:00:23, 1.63it/s] {'loss': 0.2658, 'grad_norm': 0.6705020666122437, 'learning_rate': 6.057276867170444e-06, 'epoch': 1.47}
49%|████▉ | 5636/11526 [59:01<1:00:23, 1.63it/s] 49%|████▉ | 5637/11526 [59:01<1:00:23, 1.63it/s] {'loss': 0.2677, 'grad_norm': 0.622747004032135, 'learning_rate': 6.055796748390189e-06, 'epoch': 1.47}
49%|████▉ | 5637/11526 [59:01<1:00:23, 1.63it/s] 49%|████▉ | 5638/11526 [59:02<1:00:20, 1.63it/s] {'loss': 0.1851, 'grad_norm': 0.5610184073448181, 'learning_rate': 6.054316532766249e-06, 'epoch': 1.47}
49%|████▉ | 5638/11526 [59:02<1:00:20, 1.63it/s] 49%|████▉ | 5639/11526 [59:02<1:00:22, 1.62it/s] {'loss': 0.2199, 'grad_norm': 0.5849447846412659, 'learning_rate': 6.0528362204343996e-06, 'epoch': 1.47}
49%|████▉ | 5639/11526 [59:02<1:00:22, 1.62it/s] 49%|████▉ | 5640/11526 [59:03<1:00:19, 1.63it/s] {'loss': 0.2013, 'grad_norm': 0.5197739005088806, 'learning_rate': 6.05135581153042e-06, 'epoch': 1.47}
49%|████▉ | 5640/11526 [59:03<1:00:19, 1.63it/s] 49%|████▉ | 5641/11526 [59:04<1:00:20, 1.63it/s] {'loss': 0.2136, 'grad_norm': 0.5593876242637634, 'learning_rate': 6.0498753061901024e-06, 'epoch': 1.47}
49%|████▉ | 5641/11526 [59:04<1:00:20, 1.63it/s] 49%|████▉ | 5642/11526 [59:04<1:00:18, 1.63it/s] {'loss': 0.2262, 'grad_norm': 0.6114769577980042, 'learning_rate': 6.04839470454925e-06, 'epoch': 1.47}
49%|████▉ | 5642/11526 [59:04<1:00:18, 1.63it/s] 49%|████▉ | 5643/11526 [59:05<1:00:20, 1.62it/s] {'loss': 0.2325, 'grad_norm': 0.6204046607017517, 'learning_rate': 6.046914006743669e-06, 'epoch': 1.47}
49%|████▉ | 5643/11526 [59:05<1:00:20, 1.62it/s] 49%|████▉ | 5644/11526 [59:05<1:00:20, 1.62it/s] {'loss': 0.2293, 'grad_norm': 0.5621123909950256, 'learning_rate': 6.04543321290918e-06, 'epoch': 1.47}
49%|████▉ | 5644/11526 [59:06<1:00:20, 1.62it/s] 49%|████▉ | 5645/11526 [59:06<1:00:16, 1.63it/s] {'loss': 0.1751, 'grad_norm': 0.4687138795852661, 'learning_rate': 6.0439523231816064e-06, 'epoch': 1.47}
49%|████▉ | 5645/11526 [59:06<1:00:16, 1.63it/s] 49%|████▉ | 5646/11526 [59:07<1:00:34, 1.62it/s] {'loss': 0.2579, 'grad_norm': 0.6127133369445801, 'learning_rate': 6.042471337696787e-06, 'epoch': 1.47}
49%|████▉ | 5646/11526 [59:07<1:00:34, 1.62it/s] 49%|████▉ | 5647/11526 [59:07<1:00:27, 1.62it/s] {'loss': 0.2374, 'grad_norm': 0.5370569825172424, 'learning_rate': 6.040990256590566e-06, 'epoch': 1.47}
49%|████▉ | 5647/11526 [59:07<1:00:27, 1.62it/s] 49%|████▉ | 5648/11526 [59:08<1:00:20, 1.62it/s] {'loss': 0.229, 'grad_norm': 0.5801236629486084, 'learning_rate': 6.039509079998796e-06, 'epoch': 1.47}
49%|████▉ | 5648/11526 [59:08<1:00:20, 1.62it/s] 49%|████▉ | 5649/11526 [59:09<1:00:23, 1.62it/s] {'loss': 0.3144, 'grad_norm': 0.7416834831237793, 'learning_rate': 6.038027808057336e-06, 'epoch': 1.47}
49%|████▉ | 5649/11526 [59:09<1:00:23, 1.62it/s] 49%|████▉ | 5650/11526 [59:09<1:00:18, 1.62it/s] {'loss': 0.2026, 'grad_norm': 0.6050172448158264, 'learning_rate': 6.036546440902061e-06, 'epoch': 1.47}
49%|████▉ | 5650/11526 [59:09<1:00:18, 1.62it/s] 49%|████▉ | 5651/11526 [59:10<1:00:18, 1.62it/s] {'loss': 0.2331, 'grad_norm': 0.5766851902008057, 'learning_rate': 6.035064978668848e-06, 'epoch': 1.47}
49%|████▉ | 5651/11526 [59:10<1:00:18, 1.62it/s] 49%|████▉ | 5652/11526 [59:10<1:00:13, 1.63it/s] {'loss': 0.2052, 'grad_norm': 0.8029273152351379, 'learning_rate': 6.033583421493587e-06, 'epoch': 1.47}
49%|████▉ | 5652/11526 [59:10<1:00:13, 1.63it/s] 49%|████▉ | 5653/11526 [59:11<1:00:12, 1.63it/s] {'loss': 0.202, 'grad_norm': 0.5255299806594849, 'learning_rate': 6.032101769512173e-06, 'epoch': 1.47}
49%|████▉ | 5653/11526 [59:11<1:00:12, 1.63it/s] 49%|████▉ | 5654/11526 [59:12<1:00:15, 1.62it/s] {'loss': 0.2009, 'grad_norm': 0.511721134185791, 'learning_rate': 6.030620022860513e-06, 'epoch': 1.47}
49%|████▉ | 5654/11526 [59:12<1:00:15, 1.62it/s] 49%|████▉ | 5655/11526 [59:12<1:00:11, 1.63it/s] {'loss': 0.2332, 'grad_norm': 0.5948547720909119, 'learning_rate': 6.02913818167452e-06, 'epoch': 1.47}
49%|████▉ | 5655/11526 [59:12<1:00:11, 1.63it/s] 49%|████▉ | 5656/11526 [59:13<1:00:12, 1.62it/s] {'loss': 0.1981, 'grad_norm': 0.5125578045845032, 'learning_rate': 6.0276562460901165e-06, 'epoch': 1.47}
49%|████▉ | 5656/11526 [59:13<1:00:12, 1.62it/s] 49%|████▉ | 5657/11526 [59:13<1:00:10, 1.63it/s] {'loss': 0.2326, 'grad_norm': 0.6365430355072021, 'learning_rate': 6.026174216243235e-06, 'epoch': 1.47}
49%|████▉ | 5657/11526 [59:14<1:00:10, 1.63it/s] 49%|████▉ | 5658/11526 [59:14<1:00:11, 1.62it/s] {'loss': 0.2007, 'grad_norm': 0.5775301456451416, 'learning_rate': 6.024692092269818e-06, 'epoch': 1.47}
49%|████▉ | 5658/11526 [59:14<1:00:11, 1.62it/s] 49%|████▉ | 5659/11526 [59:15<1:00:11, 1.62it/s] {'loss': 0.1456, 'grad_norm': 0.4555552899837494, 'learning_rate': 6.023209874305811e-06, 'epoch': 1.47}
49%|████▉ | 5659/11526 [59:15<1:00:11, 1.62it/s] 49%|████▉ | 5660/11526 [59:15<1:00:07, 1.63it/s] {'loss': 0.1757, 'grad_norm': 0.47749319672584534, 'learning_rate': 6.021727562487171e-06, 'epoch': 1.47}
49%|████▉ | 5660/11526 [59:15<1:00:07, 1.63it/s] 49%|████▉ | 5661/11526 [59:16<1:00:10, 1.62it/s] {'loss': 0.1423, 'grad_norm': 0.4216955602169037, 'learning_rate': 6.020245156949867e-06, 'epoch': 1.47}
49%|████▉ | 5661/11526 [59:16<1:00:10, 1.62it/s] 49%|████▉ | 5662/11526 [59:17<1:00:16, 1.62it/s] {'loss': 0.1776, 'grad_norm': 0.46916186809539795, 'learning_rate': 6.018762657829872e-06, 'epoch': 1.47}
49%|████▉ | 5662/11526 [59:17<1:00:16, 1.62it/s] 49%|████▉ | 5663/11526 [59:17<1:00:07, 1.63it/s] {'loss': 0.2755, 'grad_norm': 0.6780477166175842, 'learning_rate': 6.0172800652631706e-06, 'epoch': 1.47}
49%|████▉ | 5663/11526 [59:17<1:00:07, 1.63it/s] 49%|████▉ | 5664/11526 [59:18<1:00:09, 1.62it/s] {'loss': 0.1875, 'grad_norm': 0.5223588943481445, 'learning_rate': 6.015797379385751e-06, 'epoch': 1.47}
49%|████▉ | 5664/11526 [59:18<1:00:09, 1.62it/s] 49%|████▉ | 5665/11526 [59:18<1:00:05, 1.63it/s] {'loss': 0.1633, 'grad_norm': 0.4837125539779663, 'learning_rate': 6.014314600333618e-06, 'epoch': 1.47}
49%|████▉ | 5665/11526 [59:18<1:00:05, 1.63it/s] 49%|████▉ | 5666/11526 [59:19<1:00:08, 1.62it/s] {'loss': 0.1798, 'grad_norm': 0.5093609690666199, 'learning_rate': 6.012831728242778e-06, 'epoch': 1.47}
49%|████▉ | 5666/11526 [59:19<1:00:08, 1.62it/s] 49%|████▉ | 5667/11526 [59:20<1:00:04, 1.63it/s] {'loss': 0.2148, 'grad_norm': 0.49581676721572876, 'learning_rate': 6.01134876324925e-06, 'epoch': 1.48}
49%|████▉ | 5667/11526 [59:20<1:00:04, 1.63it/s] 49%|████▉ | 5668/11526 [59:20<1:00:02, 1.63it/s] {'loss': 0.1841, 'grad_norm': 0.6127605438232422, 'learning_rate': 6.009865705489058e-06, 'epoch': 1.48}
49%|████▉ | 5668/11526 [59:20<1:00:02, 1.63it/s] 49%|████▉ | 5669/11526 [59:21<1:00:04, 1.62it/s] {'loss': 0.232, 'grad_norm': 0.6612945795059204, 'learning_rate': 6.0083825550982375e-06, 'epoch': 1.48}
49%|████▉ | 5669/11526 [59:21<1:00:04, 1.62it/s] 49%|████▉ | 5670/11526 [59:21<1:00:02, 1.63it/s] {'loss': 0.1529, 'grad_norm': 0.4622403085231781, 'learning_rate': 6.006899312212832e-06, 'epoch': 1.48}
49%|████▉ | 5670/11526 [59:22<1:00:02, 1.63it/s] 49%|████▉ | 5671/11526 [59:22<1:00:03, 1.62it/s] {'loss': 0.1823, 'grad_norm': 0.45539867877960205, 'learning_rate': 6.005415976968893e-06, 'epoch': 1.48}
49%|████▉ | 5671/11526 [59:22<1:00:03, 1.62it/s] 49%|████▉ | 5672/11526 [59:23<1:00:02, 1.63it/s] {'loss': 0.2144, 'grad_norm': 0.5887107253074646, 'learning_rate': 6.003932549502479e-06, 'epoch': 1.48}
49%|████▉ | 5672/11526 [59:23<1:00:02, 1.63it/s] 49%|████▉ | 5673/11526 [59:23<59:59, 1.63it/s] {'loss': 0.162, 'grad_norm': 0.492247611284256, 'learning_rate': 6.00244902994966e-06, 'epoch': 1.48}
49%|████▉ | 5673/11526 [59:23<59:59, 1.63it/s] 49%|████▉ | 5674/11526 [59:24<59:58, 1.63it/s] {'loss': 0.1613, 'grad_norm': 0.5223187804222107, 'learning_rate': 6.000965418446513e-06, 'epoch': 1.48}
49%|████▉ | 5674/11526 [59:24<59:58, 1.63it/s] 49%|████▉ | 5675/11526 [59:25<59:57, 1.63it/s] {'loss': 0.2294, 'grad_norm': 0.6061701774597168, 'learning_rate': 5.9994817151291215e-06, 'epoch': 1.48}
49%|████▉ | 5675/11526 [59:25<59:57, 1.63it/s] 49%|████▉ | 5676/11526 [59:25<1:00:01, 1.62it/s] {'loss': 0.257, 'grad_norm': 0.609840989112854, 'learning_rate': 5.9979979201335806e-06, 'epoch': 1.48}
49%|████▉ | 5676/11526 [59:25<1:00:01, 1.62it/s] 49%|████▉ | 5677/11526 [59:26<1:00:01, 1.62it/s] {'loss': 0.1885, 'grad_norm': 0.555388331413269, 'learning_rate': 5.9965140335959915e-06, 'epoch': 1.48}
49%|████▉ | 5677/11526 [59:26<1:00:01, 1.62it/s] 49%|████▉ | 5678/11526 [59:26<59:59, 1.62it/s] {'loss': 0.2117, 'grad_norm': 0.570460855960846, 'learning_rate': 5.995030055652467e-06, 'epoch': 1.48}
49%|████▉ | 5678/11526 [59:26<59:59, 1.62it/s] 49%|████▉ | 5679/11526 [59:27<1:00:02, 1.62it/s] {'loss': 0.1824, 'grad_norm': 0.5062739849090576, 'learning_rate': 5.993545986439122e-06, 'epoch': 1.48}
49%|████▉ | 5679/11526 [59:27<1:00:02, 1.62it/s] 49%|████▉ | 5680/11526 [59:28<1:00:01, 1.62it/s] {'loss': 0.2102, 'grad_norm': 0.4958834946155548, 'learning_rate': 5.992061826092087e-06, 'epoch': 1.48}
49%|████▉ | 5680/11526 [59:28<1:00:01, 1.62it/s] 49%|████▉ | 5681/11526 [59:28<1:00:01, 1.62it/s] {'loss': 0.2113, 'grad_norm': 0.532136082649231, 'learning_rate': 5.990577574747498e-06, 'epoch': 1.48}
49%|████▉ | 5681/11526 [59:28<1:00:01, 1.62it/s] 49%|████▉ | 5682/11526 [59:29<59:56, 1.63it/s] {'loss': 0.1942, 'grad_norm': 0.5008308291435242, 'learning_rate': 5.989093232541496e-06, 'epoch': 1.48}
49%|████▉ | 5682/11526 [59:29<59:56, 1.63it/s] 49%|████▉ | 5683/11526 [59:29<59:52, 1.63it/s] {'loss': 0.2257, 'grad_norm': 0.7039124965667725, 'learning_rate': 5.987608799610236e-06, 'epoch': 1.48}
49%|████▉ | 5683/11526 [59:30<59:52, 1.63it/s] 49%|████▉ | 5684/11526 [59:30<59:53, 1.63it/s] {'loss': 0.1985, 'grad_norm': 0.5631621479988098, 'learning_rate': 5.9861242760898775e-06, 'epoch': 1.48}
49%|████▉ | 5684/11526 [59:30<59:53, 1.63it/s] 49%|████▉ | 5685/11526 [59:31<59:50, 1.63it/s] {'loss': 0.1765, 'grad_norm': 0.6027649641036987, 'learning_rate': 5.984639662116589e-06, 'epoch': 1.48}
49%|████▉ | 5685/11526 [59:31<59:50, 1.63it/s] 49%|████▉ | 5686/11526 [59:31<59:51, 1.63it/s] {'loss': 0.1989, 'grad_norm': 0.47663241624832153, 'learning_rate': 5.983154957826549e-06, 'epoch': 1.48}
49%|████▉ | 5686/11526 [59:31<59:51, 1.63it/s] 49%|████▉ | 5687/11526 [59:32<59:50, 1.63it/s] {'loss': 0.193, 'grad_norm': 0.4539510905742645, 'learning_rate': 5.981670163355941e-06, 'epoch': 1.48}
49%|████▉ | 5687/11526 [59:32<59:50, 1.63it/s] 49%|████▉ | 5688/11526 [59:33<59:48, 1.63it/s] {'loss': 0.2196, 'grad_norm': 0.5826871395111084, 'learning_rate': 5.980185278840963e-06, 'epoch': 1.48}
49%|████▉ | 5688/11526 [59:33<59:48, 1.63it/s] 49%|████▉ | 5689/11526 [59:33<59:52, 1.62it/s] {'loss': 0.1967, 'grad_norm': 0.6039407253265381, 'learning_rate': 5.978700304417812e-06, 'epoch': 1.48}
49%|████▉ | 5689/11526 [59:33<59:52, 1.62it/s] 49%|████▉ | 5690/11526 [59:34<59:48, 1.63it/s] {'loss': 0.1615, 'grad_norm': 0.4522641897201538, 'learning_rate': 5.9772152402227006e-06, 'epoch': 1.48}
49%|████▉ | 5690/11526 [59:34<59:48, 1.63it/s] 49%|████▉ | 5691/11526 [59:34<59:49, 1.63it/s] {'loss': 0.1724, 'grad_norm': 0.46480217576026917, 'learning_rate': 5.975730086391848e-06, 'epoch': 1.48}
49%|████▉ | 5691/11526 [59:34<59:49, 1.63it/s] 49%|████▉ | 5692/11526 [59:35<59:46, 1.63it/s] {'loss': 0.223, 'grad_norm': 0.5977430939674377, 'learning_rate': 5.97424484306148e-06, 'epoch': 1.48}
49%|████▉ | 5692/11526 [59:35<59:46, 1.63it/s] 49%|████▉ | 5693/11526 [59:36<59:45, 1.63it/s] {'loss': 0.3284, 'grad_norm': 0.7024449706077576, 'learning_rate': 5.972759510367831e-06, 'epoch': 1.48}
49%|████▉ | 5693/11526 [59:36<59:45, 1.63it/s] 49%|████▉ | 5694/11526 [59:36<59:50, 1.62it/s] {'loss': 0.1997, 'grad_norm': 0.5551047325134277, 'learning_rate': 5.971274088447145e-06, 'epoch': 1.48}
49%|████▉ | 5694/11526 [59:36<59:50, 1.62it/s] 49%|████▉ | 5695/11526 [59:37<59:46, 1.63it/s] {'loss': 0.2048, 'grad_norm': 0.5122146010398865, 'learning_rate': 5.9697885774356735e-06, 'epoch': 1.48}
49%|████▉ | 5695/11526 [59:37<59:46, 1.63it/s] 49%|████▉ | 5696/11526 [59:37<59:45, 1.63it/s] {'loss': 0.1922, 'grad_norm': 0.5244777798652649, 'learning_rate': 5.9683029774696774e-06, 'epoch': 1.48}
49%|████▉ | 5696/11526 [59:38<59:45, 1.63it/s] 49%|████▉ | 5697/11526 [59:38<59:43, 1.63it/s] {'loss': 0.1665, 'grad_norm': 0.4991079866886139, 'learning_rate': 5.966817288685423e-06, 'epoch': 1.48}
49%|████▉ | 5697/11526 [59:38<59:43, 1.63it/s] 49%|████▉ | 5698/11526 [59:39<59:41, 1.63it/s] {'loss': 0.1742, 'grad_norm': 0.46945151686668396, 'learning_rate': 5.965331511219182e-06, 'epoch': 1.48}
49%|████▉ | 5698/11526 [59:39<59:41, 1.63it/s] 49%|████▉ | 5699/11526 [59:39<59:39, 1.63it/s] {'loss': 0.2386, 'grad_norm': 0.6155315637588501, 'learning_rate': 5.963845645207246e-06, 'epoch': 1.48}
49%|████▉ | 5699/11526 [59:39<59:39, 1.63it/s] 49%|████▉ | 5700/11526 [59:40<59:39, 1.63it/s] {'loss': 0.1947, 'grad_norm': 0.5190098285675049, 'learning_rate': 5.962359690785902e-06, 'epoch': 1.48}
49%|████▉ | 5700/11526 [59:40<59:39, 1.63it/s] 49%|████▉ | 5701/11526 [59:40<59:41, 1.63it/s] {'loss': 0.313, 'grad_norm': 0.6676728129386902, 'learning_rate': 5.960873648091452e-06, 'epoch': 1.48}
49%|████▉ | 5701/11526 [59:41<59:41, 1.63it/s] 49%|████▉ | 5702/11526 [59:41<59:38, 1.63it/s] {'loss': 0.2429, 'grad_norm': 0.5827082991600037, 'learning_rate': 5.959387517260201e-06, 'epoch': 1.48}
49%|████▉ | 5702/11526 [59:41<59:38, 1.63it/s] 49%|████▉ | 5703/11526 [59:42<59:37, 1.63it/s] {'loss': 0.2478, 'grad_norm': 0.5722534656524658, 'learning_rate': 5.957901298428472e-06, 'epoch': 1.48}
49%|████▉ | 5703/11526 [59:42<59:37, 1.63it/s] 49%|████▉ | 5704/11526 [59:42<59:38, 1.63it/s] {'loss': 0.1614, 'grad_norm': 0.4421345889568329, 'learning_rate': 5.9564149917325845e-06, 'epoch': 1.48}
49%|████▉ | 5704/11526 [59:42<59:38, 1.63it/s] 49%|████▉ | 5705/11526 [59:43<59:38, 1.63it/s] {'loss': 0.1922, 'grad_norm': 0.4900866150856018, 'learning_rate': 5.954928597308872e-06, 'epoch': 1.48}
49%|████▉ | 5705/11526 [59:43<59:38, 1.63it/s] 50%|████▉ | 5706/11526 [59:44<59:43, 1.62it/s] {'loss': 0.2878, 'grad_norm': 0.7467003464698792, 'learning_rate': 5.953442115293674e-06, 'epoch': 1.49}
50%|████▉ | 5706/11526 [59:44<59:43, 1.62it/s] 50%|████▉ | 5707/11526 [59:44<59:40, 1.63it/s] {'loss': 0.1661, 'grad_norm': 0.530680775642395, 'learning_rate': 5.951955545823342e-06, 'epoch': 1.49}
50%|████▉ | 5707/11526 [59:44<59:40, 1.63it/s] 50%|████▉ | 5708/11526 [59:45<59:41, 1.62it/s] {'loss': 0.2428, 'grad_norm': 0.6139402985572815, 'learning_rate': 5.950468889034232e-06, 'epoch': 1.49}
50%|████▉ | 5708/11526 [59:45<59:41, 1.62it/s] 50%|████▉ | 5709/11526 [59:45<59:39, 1.63it/s] {'loss': 0.17, 'grad_norm': 0.48442333936691284, 'learning_rate': 5.948982145062705e-06, 'epoch': 1.49}
50%|████▉ | 5709/11526 [59:46<59:39, 1.63it/s] 50%|████▉ | 5710/11526 [59:46<59:36, 1.63it/s] {'loss': 0.1869, 'grad_norm': 0.5141445398330688, 'learning_rate': 5.9474953140451375e-06, 'epoch': 1.49}
50%|████▉ | 5710/11526 [59:46<59:36, 1.63it/s] 50%|████▉ | 5711/11526 [59:47<59:38, 1.63it/s] {'loss': 0.2316, 'grad_norm': 0.5655697584152222, 'learning_rate': 5.9460083961179086e-06, 'epoch': 1.49}
50%|████▉ | 5711/11526 [59:47<59:38, 1.63it/s] 50%|████▉ | 5712/11526 [59:47<59:35, 1.63it/s] {'loss': 0.2544, 'grad_norm': 0.5419268608093262, 'learning_rate': 5.9445213914174075e-06, 'epoch': 1.49}
50%|████▉ | 5712/11526 [59:47<59:35, 1.63it/s] 50%|████▉ | 5713/11526 [59:48<59:33, 1.63it/s] {'loss': 0.1985, 'grad_norm': 0.5332032442092896, 'learning_rate': 5.943034300080029e-06, 'epoch': 1.49}
50%|████▉ | 5713/11526 [59:48<59:33, 1.63it/s] 50%|████▉ | 5714/11526 [59:48<59:33, 1.63it/s] {'loss': 0.2661, 'grad_norm': 0.6313892602920532, 'learning_rate': 5.94154712224218e-06, 'epoch': 1.49}
50%|████▉ | 5714/11526 [59:49<59:33, 1.63it/s] 50%|████▉ | 5715/11526 [59:49<59:32, 1.63it/s] {'loss': 0.1853, 'grad_norm': 0.49832865595817566, 'learning_rate': 5.9400598580402704e-06, 'epoch': 1.49}
50%|████▉ | 5715/11526 [59:49<59:32, 1.63it/s] 50%|████▉ | 5716/11526 [59:50<59:36, 1.62it/s] {'loss': 0.1854, 'grad_norm': 0.5252243876457214, 'learning_rate': 5.938572507610724e-06, 'epoch': 1.49}
50%|████▉ | 5716/11526 [59:50<59:36, 1.62it/s] 50%|████▉ | 5717/11526 [59:50<59:32, 1.63it/s] {'loss': 0.2768, 'grad_norm': 0.717777669429779, 'learning_rate': 5.937085071089965e-06, 'epoch': 1.49}
50%|████▉ | 5717/11526 [59:50<59:32, 1.63it/s] 50%|████▉ | 5718/11526 [59:51<59:29, 1.63it/s] {'loss': 0.1458, 'grad_norm': 0.4288213551044464, 'learning_rate': 5.935597548614432e-06, 'epoch': 1.49}
50%|████▉ | 5718/11526 [59:51<59:29, 1.63it/s] 50%|████▉ | 5719/11526 [59:52<59:32, 1.63it/s] {'loss': 0.2356, 'grad_norm': 0.5893942713737488, 'learning_rate': 5.9341099403205695e-06, 'epoch': 1.49}
50%|████▉ | 5719/11526 [59:52<59:32, 1.63it/s] 50%|████▉ | 5720/11526 [59:52<59:32, 1.63it/s] {'loss': 0.2121, 'grad_norm': 0.5176088213920593, 'learning_rate': 5.932622246344828e-06, 'epoch': 1.49}
50%|████▉ | 5720/11526 [59:52<59:32, 1.63it/s] 50%|████▉ | 5721/11526 [59:53<59:38, 1.62it/s] {'loss': 0.2156, 'grad_norm': 0.535812497138977, 'learning_rate': 5.931134466823667e-06, 'epoch': 1.49}
50%|████▉ | 5721/11526 [59:53<59:38, 1.62it/s] 50%|████▉ | 5722/11526 [59:53<59:34, 1.62it/s] {'loss': 0.1958, 'grad_norm': 0.5119482278823853, 'learning_rate': 5.929646601893555e-06, 'epoch': 1.49}
50%|████▉ | 5722/11526 [59:54<59:34, 1.62it/s] 50%|████▉ | 5723/11526 [59:54<59:29, 1.63it/s] {'loss': 0.2108, 'grad_norm': 0.5357882380485535, 'learning_rate': 5.9281586516909674e-06, 'epoch': 1.49}
50%|████▉ | 5723/11526 [59:54<59:29, 1.63it/s] 50%|████▉ | 5724/11526 [59:55<59:31, 1.62it/s] {'loss': 0.2374, 'grad_norm': 0.5885134339332581, 'learning_rate': 5.9266706163523865e-06, 'epoch': 1.49}
50%|████▉ | 5724/11526 [59:55<59:31, 1.62it/s] 50%|████▉ | 5725/11526 [59:55<59:30, 1.62it/s] {'loss': 0.2026, 'grad_norm': 0.48996907472610474, 'learning_rate': 5.925182496014306e-06, 'epoch': 1.49}
50%|████▉ | 5725/11526 [59:55<59:30, 1.62it/s] 50%|████▉ | 5726/11526 [59:56<59:33, 1.62it/s] {'loss': 0.1868, 'grad_norm': 0.5043264031410217, 'learning_rate': 5.923694290813221e-06, 'epoch': 1.49}
50%|████▉ | 5726/11526 [59:56<59:33, 1.62it/s] 50%|████▉ | 5727/11526 [59:56<59:28, 1.62it/s] {'loss': 0.1853, 'grad_norm': 0.5008369088172913, 'learning_rate': 5.922206000885641e-06, 'epoch': 1.49}
50%|████▉ | 5727/11526 [59:57<59:28, 1.62it/s] 50%|████▉ | 5728/11526 [59:57<59:24, 1.63it/s] {'loss': 0.239, 'grad_norm': 0.6395801901817322, 'learning_rate': 5.920717626368079e-06, 'epoch': 1.49}
50%|████▉ | 5728/11526 [59:57<59:24, 1.63it/s] 50%|████▉ | 5729/11526 [59:58<59:26, 1.63it/s] {'loss': 0.1849, 'grad_norm': 0.520687460899353, 'learning_rate': 5.919229167397058e-06, 'epoch': 1.49}
50%|████▉ | 5729/11526 [59:58<59:26, 1.63it/s] 50%|████▉ | 5730/11526 [59:58<59:25, 1.63it/s] {'loss': 0.2904, 'grad_norm': 0.5490100383758545, 'learning_rate': 5.917740624109107e-06, 'epoch': 1.49}
50%|████▉ | 5730/11526 [59:58<59:25, 1.63it/s] 50%|████▉ | 5731/11526 [59:59<59:26, 1.62it/s] {'loss': 0.2391, 'grad_norm': 0.6140601634979248, 'learning_rate': 5.916251996640765e-06, 'epoch': 1.49}
50%|████▉ | 5731/11526 [59:59<59:26, 1.62it/s] 50%|████▉ | 5732/11526 [1:00:00<59:24, 1.63it/s] {'loss': 0.1872, 'grad_norm': 0.5619593262672424, 'learning_rate': 5.9147632851285764e-06, 'epoch': 1.49}
50%|████▉ | 5732/11526 [1:00:00<59:24, 1.63it/s] 50%|████▉ | 5733/11526 [1:00:00<59:24, 1.63it/s] {'loss': 0.2105, 'grad_norm': 0.5136166214942932, 'learning_rate': 5.913274489709094e-06, 'epoch': 1.49}
50%|████▉ | 5733/11526 [1:00:00<59:24, 1.63it/s] 50%|████▉ | 5734/11526 [1:00:01<59:21, 1.63it/s] {'loss': 0.2276, 'grad_norm': 0.5098747611045837, 'learning_rate': 5.91178561051888e-06, 'epoch': 1.49}
50%|████▉ | 5734/11526 [1:00:01<59:21, 1.63it/s] 50%|████▉ | 5735/11526 [1:00:01<59:19, 1.63it/s] {'loss': 0.2351, 'grad_norm': 0.6806982755661011, 'learning_rate': 5.910296647694501e-06, 'epoch': 1.49}
50%|████▉ | 5735/11526 [1:00:02<59:19, 1.63it/s] 50%|████▉ | 5736/11526 [1:00:02<59:20, 1.63it/s] {'loss': 0.3254, 'grad_norm': 0.6985424160957336, 'learning_rate': 5.908807601372534e-06, 'epoch': 1.49}
50%|████▉ | 5736/11526 [1:00:02<59:20, 1.63it/s] 50%|████▉ | 5737/11526 [1:00:03<59:20, 1.63it/s] {'loss': 0.2673, 'grad_norm': 0.697185754776001, 'learning_rate': 5.907318471689565e-06, 'epoch': 1.49}
50%|████▉ | 5737/11526 [1:00:03<59:20, 1.63it/s] 50%|████▉ | 5738/11526 [1:00:03<59:17, 1.63it/s] {'loss': 0.2045, 'grad_norm': 0.5234305262565613, 'learning_rate': 5.905829258782181e-06, 'epoch': 1.49}
50%|████▉ | 5738/11526 [1:00:03<59:17, 1.63it/s] 50%|████▉ | 5739/11526 [1:00:04<59:16, 1.63it/s] {'loss': 0.1656, 'grad_norm': 0.5651493072509766, 'learning_rate': 5.9043399627869845e-06, 'epoch': 1.49}
50%|████▉ | 5739/11526 [1:00:04<59:16, 1.63it/s] 50%|████▉ | 5740/11526 [1:00:04<59:14, 1.63it/s] {'loss': 0.2343, 'grad_norm': 0.6397081017494202, 'learning_rate': 5.90285058384058e-06, 'epoch': 1.49}
50%|████▉ | 5740/11526 [1:00:05<59:14, 1.63it/s] 50%|████▉ | 5741/11526 [1:00:05<59:28, 1.62it/s] {'loss': 0.24, 'grad_norm': 0.6090707778930664, 'learning_rate': 5.901361122079584e-06, 'epoch': 1.49}
50%|████▉ | 5741/11526 [1:00:05<59:28, 1.62it/s] 50%|████▉ | 5742/11526 [1:00:06<59:23, 1.62it/s] {'loss': 0.2115, 'grad_norm': 0.5835724472999573, 'learning_rate': 5.899871577640617e-06, 'epoch': 1.49}
50%|████▉ | 5742/11526 [1:00:06<59:23, 1.62it/s] 50%|████▉ | 5743/11526 [1:00:06<59:18, 1.62it/s] {'loss': 0.2352, 'grad_norm': 0.620805561542511, 'learning_rate': 5.8983819506603106e-06, 'epoch': 1.49}
50%|████▉ | 5743/11526 [1:00:06<59:18, 1.62it/s] 50%|████▉ | 5744/11526 [1:00:07<59:19, 1.62it/s] {'loss': 0.2391, 'grad_norm': 0.6481167674064636, 'learning_rate': 5.896892241275296e-06, 'epoch': 1.5}
50%|████▉ | 5744/11526 [1:00:07<59:19, 1.62it/s] 50%|████▉ | 5745/11526 [1:00:08<59:16, 1.63it/s] {'loss': 0.1929, 'grad_norm': 0.4687800407409668, 'learning_rate': 5.895402449622226e-06, 'epoch': 1.5}
50%|████▉ | 5745/11526 [1:00:08<59:16, 1.63it/s] 50%|████▉ | 5746/11526 [1:00:08<59:21, 1.62it/s] {'loss': 0.1943, 'grad_norm': 0.52703458070755, 'learning_rate': 5.893912575837748e-06, 'epoch': 1.5}
50%|████▉ | 5746/11526 [1:00:08<59:21, 1.62it/s] 50%|████▉ | 5747/11526 [1:00:09<59:17, 1.62it/s] {'loss': 0.2262, 'grad_norm': 0.5991954803466797, 'learning_rate': 5.892422620058521e-06, 'epoch': 1.5}
50%|████▉ | 5747/11526 [1:00:09<59:17, 1.62it/s] 50%|████▉ | 5748/11526 [1:00:09<59:13, 1.63it/s] {'loss': 0.1789, 'grad_norm': 0.48423153162002563, 'learning_rate': 5.890932582421214e-06, 'epoch': 1.5}
50%|████▉ | 5748/11526 [1:00:10<59:13, 1.63it/s] 50%|████▉ | 5749/11526 [1:00:10<59:11, 1.63it/s] {'loss': 0.1941, 'grad_norm': 0.52970951795578, 'learning_rate': 5.8894424630625e-06, 'epoch': 1.5}
50%|████▉ | 5749/11526 [1:00:10<59:11, 1.63it/s] 50%|████▉ | 5750/11526 [1:00:11<59:10, 1.63it/s] {'loss': 0.1529, 'grad_norm': 0.447519987821579, 'learning_rate': 5.887952262119066e-06, 'epoch': 1.5}
50%|████▉ | 5750/11526 [1:00:11<59:10, 1.63it/s] 50%|████▉ | 5751/11526 [1:00:11<59:25, 1.62it/s] {'loss': 0.1414, 'grad_norm': 0.45108315348625183, 'learning_rate': 5.886461979727594e-06, 'epoch': 1.5}
50%|████▉ | 5751/11526 [1:00:11<59:25, 1.62it/s] 50%|████▉ | 5752/11526 [1:00:12<59:20, 1.62it/s] {'loss': 0.1705, 'grad_norm': 0.4798590838909149, 'learning_rate': 5.884971616024788e-06, 'epoch': 1.5}
50%|████▉ | 5752/11526 [1:00:12<59:20, 1.62it/s] 50%|████▉ | 5753/11526 [1:00:12<59:16, 1.62it/s] {'loss': 0.2618, 'grad_norm': 0.6137256622314453, 'learning_rate': 5.883481171147351e-06, 'epoch': 1.5}
50%|████▉ | 5753/11526 [1:00:13<59:16, 1.62it/s] 50%|████▉ | 5754/11526 [1:00:13<59:16, 1.62it/s] {'loss': 0.189, 'grad_norm': 0.48408666253089905, 'learning_rate': 5.8819906452319935e-06, 'epoch': 1.5}
50%|████▉ | 5754/11526 [1:00:13<59:16, 1.62it/s] 50%|████▉ | 5755/11526 [1:00:14<59:13, 1.62it/s] {'loss': 0.2356, 'grad_norm': 0.650449275970459, 'learning_rate': 5.8805000384154334e-06, 'epoch': 1.5}
50%|████▉ | 5755/11526 [1:00:14<59:13, 1.62it/s] 50%|████▉ | 5756/11526 [1:00:14<59:26, 1.62it/s] {'loss': 0.2566, 'grad_norm': 0.7162827253341675, 'learning_rate': 5.879009350834402e-06, 'epoch': 1.5}
50%|████▉ | 5756/11526 [1:00:14<59:26, 1.62it/s] 50%|████▉ | 5757/11526 [1:00:15<59:19, 1.62it/s] {'loss': 0.1572, 'grad_norm': 0.5281780958175659, 'learning_rate': 5.877518582625632e-06, 'epoch': 1.5}
50%|████▉ | 5757/11526 [1:00:15<59:19, 1.62it/s] 50%|████▉ | 5758/11526 [1:00:16<59:13, 1.62it/s] {'loss': 0.2226, 'grad_norm': 0.5210511684417725, 'learning_rate': 5.8760277339258645e-06, 'epoch': 1.5}
50%|████▉ | 5758/11526 [1:00:16<59:13, 1.62it/s] 50%|████▉ | 5759/11526 [1:00:16<59:14, 1.62it/s] {'loss': 0.1776, 'grad_norm': 0.526663601398468, 'learning_rate': 5.874536804871848e-06, 'epoch': 1.5}
50%|████▉ | 5759/11526 [1:00:16<59:14, 1.62it/s] 50%|████▉ | 5760/11526 [1:00:17<59:08, 1.62it/s] {'loss': 0.2445, 'grad_norm': 0.7166354060173035, 'learning_rate': 5.873045795600339e-06, 'epoch': 1.5}
50%|████▉ | 5760/11526 [1:00:17<59:08, 1.62it/s] 50%|████▉ | 5761/11526 [1:00:17<59:09, 1.62it/s] {'loss': 0.2059, 'grad_norm': 0.5387047529220581, 'learning_rate': 5.871554706248105e-06, 'epoch': 1.5}
50%|████▉ | 5761/11526 [1:00:18<59:09, 1.62it/s] 50%|████▉ | 5762/11526 [1:00:18<59:08, 1.62it/s] {'loss': 0.1892, 'grad_norm': 0.5318969488143921, 'learning_rate': 5.870063536951913e-06, 'epoch': 1.5}
50%|████▉ | 5762/11526 [1:00:18<59:08, 1.62it/s] 50%|█████ | 5763/11526 [1:00:19<59:04, 1.63it/s] {'loss': 0.2224, 'grad_norm': 0.5628551244735718, 'learning_rate': 5.8685722878485415e-06, 'epoch': 1.5}
50%|█████ | 5763/11526 [1:00:19<59:04, 1.63it/s] 50%|█████ | 5764/11526 [1:00:19<59:08, 1.62it/s] {'loss': 0.2612, 'grad_norm': 0.5597865581512451, 'learning_rate': 5.867080959074779e-06, 'epoch': 1.5}
50%|█████ | 5764/11526 [1:00:19<59:08, 1.62it/s] 50%|█████ | 5765/11526 [1:00:20<59:08, 1.62it/s] {'loss': 0.2151, 'grad_norm': 0.6233736276626587, 'learning_rate': 5.865589550767418e-06, 'epoch': 1.5}
50%|█████ | 5765/11526 [1:00:20<59:08, 1.62it/s] 50%|█████ | 5766/11526 [1:00:21<59:08, 1.62it/s] {'loss': 0.2088, 'grad_norm': 0.5174716114997864, 'learning_rate': 5.864098063063258e-06, 'epoch': 1.5}
50%|█████ | 5766/11526 [1:00:21<59:08, 1.62it/s] 50%|█████ | 5767/11526 [1:00:21<59:06, 1.62it/s] {'loss': 0.1502, 'grad_norm': 0.41443362832069397, 'learning_rate': 5.8626064960991065e-06, 'epoch': 1.5}
50%|█████ | 5767/11526 [1:00:21<59:06, 1.62it/s] 50%|█████ | 5768/11526 [1:00:22<59:03, 1.63it/s] {'loss': 0.1529, 'grad_norm': 0.4598965346813202, 'learning_rate': 5.861114850011781e-06, 'epoch': 1.5}
50%|█████ | 5768/11526 [1:00:22<59:03, 1.63it/s] 50%|█████ | 5769/11526 [1:00:22<59:03, 1.62it/s] {'loss': 0.1754, 'grad_norm': 0.5283898711204529, 'learning_rate': 5.859623124938101e-06, 'epoch': 1.5}
50%|█████ | 5769/11526 [1:00:22<59:03, 1.62it/s] 50%|█████ | 5770/11526 [1:00:23<59:01, 1.63it/s] {'loss': 0.197, 'grad_norm': 0.5482416152954102, 'learning_rate': 5.858131321014896e-06, 'epoch': 1.5}
50%|█████ | 5770/11526 [1:00:23<59:01, 1.63it/s] 50%|█████ | 5771/11526 [1:00:24<59:02, 1.62it/s] {'loss': 0.2219, 'grad_norm': 0.5968651175498962, 'learning_rate': 5.856639438379004e-06, 'epoch': 1.5}
50%|█████ | 5771/11526 [1:00:24<59:02, 1.62it/s] 50%|█████ | 5772/11526 [1:00:24<59:01, 1.62it/s] {'loss': 0.1783, 'grad_norm': 0.5309575200080872, 'learning_rate': 5.855147477167269e-06, 'epoch': 1.5}
50%|█████ | 5772/11526 [1:00:24<59:01, 1.62it/s] 50%|█████ | 5773/11526 [1:00:25<58:57, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.43377113342285156, 'learning_rate': 5.853655437516542e-06, 'epoch': 1.5}
50%|█████ | 5773/11526 [1:00:25<58:57, 1.63it/s] 50%|█████ | 5774/11526 [1:00:25<58:58, 1.63it/s] {'loss': 0.1595, 'grad_norm': 0.4908115863800049, 'learning_rate': 5.852163319563681e-06, 'epoch': 1.5}
50%|█████ | 5774/11526 [1:00:26<58:58, 1.63it/s] 50%|█████ | 5775/11526 [1:00:26<58:54, 1.63it/s] {'loss': 0.2019, 'grad_norm': 0.548478364944458, 'learning_rate': 5.8506711234455505e-06, 'epoch': 1.5}
50%|█████ | 5775/11526 [1:00:26<58:54, 1.63it/s] 50%|█████ | 5776/11526 [1:00:27<59:01, 1.62it/s] {'loss': 0.1756, 'grad_norm': 0.4828166961669922, 'learning_rate': 5.849178849299027e-06, 'epoch': 1.5}
50%|█████ | 5776/11526 [1:00:27<59:01, 1.62it/s] 50%|█████ | 5777/11526 [1:00:27<58:55, 1.63it/s] {'loss': 0.2582, 'grad_norm': 0.578121542930603, 'learning_rate': 5.847686497260987e-06, 'epoch': 1.5}
50%|█████ | 5777/11526 [1:00:27<58:55, 1.63it/s] 50%|█████ | 5778/11526 [1:00:28<58:53, 1.63it/s] {'loss': 0.2139, 'grad_norm': 0.5786406397819519, 'learning_rate': 5.846194067468316e-06, 'epoch': 1.5}
50%|█████ | 5778/11526 [1:00:28<58:53, 1.63it/s] 50%|█████ | 5779/11526 [1:00:29<58:51, 1.63it/s] {'loss': 0.2122, 'grad_norm': 0.5432443022727966, 'learning_rate': 5.844701560057912e-06, 'epoch': 1.5}
50%|█████ | 5779/11526 [1:00:29<58:51, 1.63it/s] 50%|█████ | 5780/11526 [1:00:29<58:50, 1.63it/s] {'loss': 0.1529, 'grad_norm': 0.49442631006240845, 'learning_rate': 5.843208975166675e-06, 'epoch': 1.5}
50%|█████ | 5780/11526 [1:00:29<58:50, 1.63it/s] 50%|█████ | 5781/11526 [1:00:30<59:12, 1.62it/s] {'loss': 0.2246, 'grad_norm': 0.5962690114974976, 'learning_rate': 5.841716312931514e-06, 'epoch': 1.5}
50%|█████ | 5781/11526 [1:00:30<59:12, 1.62it/s] 50%|█████ | 5782/11526 [1:00:30<59:04, 1.62it/s] {'loss': 0.234, 'grad_norm': 0.59196537733078, 'learning_rate': 5.8402235734893405e-06, 'epoch': 1.5}
50%|█████ | 5782/11526 [1:00:30<59:04, 1.62it/s] 50%|█████ | 5783/11526 [1:00:31<59:00, 1.62it/s] {'loss': 0.2568, 'grad_norm': 0.6357885003089905, 'learning_rate': 5.838730756977084e-06, 'epoch': 1.51}
50%|█████ | 5783/11526 [1:00:31<59:00, 1.62it/s] 50%|█████ | 5784/11526 [1:00:32<58:59, 1.62it/s] {'loss': 0.1582, 'grad_norm': 0.477067232131958, 'learning_rate': 5.837237863531669e-06, 'epoch': 1.51}
50%|█████ | 5784/11526 [1:00:32<58:59, 1.62it/s] 50%|█████ | 5785/11526 [1:00:32<58:53, 1.62it/s] {'loss': 0.1962, 'grad_norm': 0.5585259795188904, 'learning_rate': 5.835744893290032e-06, 'epoch': 1.51}
50%|█████ | 5785/11526 [1:00:32<58:53, 1.62it/s] 50%|█████ | 5786/11526 [1:00:33<58:58, 1.62it/s] {'loss': 0.1479, 'grad_norm': 0.4276792109012604, 'learning_rate': 5.8342518463891195e-06, 'epoch': 1.51}
50%|█████ | 5786/11526 [1:00:33<58:58, 1.62it/s] 50%|█████ | 5787/11526 [1:00:33<58:52, 1.62it/s] {'loss': 0.2313, 'grad_norm': 0.5723785758018494, 'learning_rate': 5.832758722965881e-06, 'epoch': 1.51}
50%|█████ | 5787/11526 [1:00:34<58:52, 1.62it/s] 50%|█████ | 5788/11526 [1:00:34<58:50, 1.63it/s] {'loss': 0.2442, 'grad_norm': 0.6146573424339294, 'learning_rate': 5.8312655231572745e-06, 'epoch': 1.51}
50%|█████ | 5788/11526 [1:00:34<58:50, 1.63it/s] 50%|█████ | 5789/11526 [1:00:35<58:53, 1.62it/s] {'loss': 0.1923, 'grad_norm': 0.5020846128463745, 'learning_rate': 5.829772247100263e-06, 'epoch': 1.51}
50%|█████ | 5789/11526 [1:00:35<58:53, 1.62it/s] 50%|█████ | 5790/11526 [1:00:35<58:49, 1.63it/s] {'loss': 0.2209, 'grad_norm': 0.6026579737663269, 'learning_rate': 5.828278894931821e-06, 'epoch': 1.51}
50%|█████ | 5790/11526 [1:00:35<58:49, 1.63it/s] 50%|█████ | 5791/11526 [1:00:36<59:03, 1.62it/s] {'loss': 0.2462, 'grad_norm': 0.6041945815086365, 'learning_rate': 5.826785466788926e-06, 'epoch': 1.51}
50%|█████ | 5791/11526 [1:00:36<59:03, 1.62it/s] 50%|█████ | 5792/11526 [1:00:37<58:57, 1.62it/s] {'loss': 0.1987, 'grad_norm': 0.5217170119285583, 'learning_rate': 5.8252919628085635e-06, 'epoch': 1.51}
50%|█████ | 5792/11526 [1:00:37<58:57, 1.62it/s] 50%|█████ | 5793/11526 [1:00:37<58:51, 1.62it/s] {'loss': 0.223, 'grad_norm': 0.6657920479774475, 'learning_rate': 5.823798383127726e-06, 'epoch': 1.51}
50%|█████ | 5793/11526 [1:00:37<58:51, 1.62it/s] 50%|█████ | 5794/11526 [1:00:38<58:49, 1.62it/s] {'loss': 0.2101, 'grad_norm': 0.6102240085601807, 'learning_rate': 5.822304727883415e-06, 'epoch': 1.51}
50%|█████ | 5794/11526 [1:00:38<58:49, 1.62it/s] 50%|█████ | 5795/11526 [1:00:38<58:49, 1.62it/s] {'loss': 0.2291, 'grad_norm': 0.829170286655426, 'learning_rate': 5.820810997212635e-06, 'epoch': 1.51}
50%|█████ | 5795/11526 [1:00:38<58:49, 1.62it/s] 50%|█████ | 5796/11526 [1:00:39<58:49, 1.62it/s] {'loss': 0.2041, 'grad_norm': 0.5523187518119812, 'learning_rate': 5.8193171912523994e-06, 'epoch': 1.51}
50%|█████ | 5796/11526 [1:00:39<58:49, 1.62it/s] 50%|█████ | 5797/11526 [1:00:40<58:44, 1.63it/s] {'loss': 0.3083, 'grad_norm': 0.7152985334396362, 'learning_rate': 5.81782331013973e-06, 'epoch': 1.51}
50%|█████ | 5797/11526 [1:00:40<58:44, 1.63it/s] 50%|█████ | 5798/11526 [1:00:40<58:41, 1.63it/s] {'loss': 0.1652, 'grad_norm': 0.4676186442375183, 'learning_rate': 5.816329354011653e-06, 'epoch': 1.51}
50%|█████ | 5798/11526 [1:00:40<58:41, 1.63it/s] 50%|█████ | 5799/11526 [1:00:41<58:41, 1.63it/s] {'loss': 0.1957, 'grad_norm': 0.5688178539276123, 'learning_rate': 5.8148353230052035e-06, 'epoch': 1.51}
50%|█████ | 5799/11526 [1:00:41<58:41, 1.63it/s] 50%|█████ | 5800/11526 [1:00:41<58:39, 1.63it/s] {'loss': 0.3084, 'grad_norm': 0.7368128299713135, 'learning_rate': 5.813341217257421e-06, 'epoch': 1.51}
50%|█████ | 5800/11526 [1:00:42<58:39, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.33it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5716719627380371, 'eval_runtime': 1.9554, 'eval_samples_per_second': 102.282, 'eval_steps_per_second': 6.648, 'epoch': 1.51}
50%|█████ | 5800/11526 [1:00:44<58:39, 1.63it/s] 50%|█████ | 5801/11526 [1:00:44<1:54:45, 1.20s/it] {'loss': 0.2253, 'grad_norm': 0.5948531031608582, 'learning_rate': 5.8118470369053545e-06, 'epoch': 1.51}
50%|█████ | 5801/11526 [1:00:44<1:54:45, 1.20s/it] 50%|█████ | 5802/11526 [1:00:45<1:37:55, 1.03s/it] {'loss': 0.1749, 'grad_norm': 0.4685109257698059, 'learning_rate': 5.810352782086058e-06, 'epoch': 1.51}
50%|█████ | 5802/11526 [1:00:45<1:37:55, 1.03s/it] 50%|█████ | 5803/11526 [1:00:45<1:26:07, 1.11it/s] {'loss': 0.3182, 'grad_norm': 0.5632795691490173, 'learning_rate': 5.808858452936596e-06, 'epoch': 1.51}
50%|█████ | 5803/11526 [1:00:45<1:26:07, 1.11it/s] 50%|█████ | 5804/11526 [1:00:46<1:17:52, 1.22it/s] {'loss': 0.1732, 'grad_norm': 0.5439707636833191, 'learning_rate': 5.807364049594032e-06, 'epoch': 1.51}
50%|█████ | 5804/11526 [1:00:46<1:17:52, 1.22it/s] 50%|█████ | 5805/11526 [1:00:46<1:12:04, 1.32it/s] {'loss': 0.2115, 'grad_norm': 0.5947838425636292, 'learning_rate': 5.805869572195445e-06, 'epoch': 1.51}
50%|█████ | 5805/11526 [1:00:47<1:12:04, 1.32it/s] 50%|█████ | 5806/11526 [1:00:47<1:07:58, 1.40it/s] {'loss': 0.2347, 'grad_norm': 0.621699869632721, 'learning_rate': 5.804375020877916e-06, 'epoch': 1.51}
50%|█████ | 5806/11526 [1:00:47<1:07:58, 1.40it/s] 50%|█████ | 5807/11526 [1:00:48<1:05:10, 1.46it/s] {'loss': 0.1527, 'grad_norm': 0.4706576466560364, 'learning_rate': 5.802880395778532e-06, 'epoch': 1.51}
50%|█████ | 5807/11526 [1:00:48<1:05:10, 1.46it/s] 50%|█████ | 5808/11526 [1:00:48<1:03:10, 1.51it/s] {'loss': 0.2522, 'grad_norm': 0.6528451442718506, 'learning_rate': 5.8013856970343905e-06, 'epoch': 1.51}
50%|█████ | 5808/11526 [1:00:48<1:03:10, 1.51it/s] 50%|█████ | 5809/11526 [1:00:49<1:01:46, 1.54it/s] {'loss': 0.3024, 'grad_norm': 0.6196422576904297, 'learning_rate': 5.799890924782593e-06, 'epoch': 1.51}
50%|█████ | 5809/11526 [1:00:49<1:01:46, 1.54it/s] 50%|█████ | 5810/11526 [1:00:50<1:00:47, 1.57it/s] {'loss': 0.1883, 'grad_norm': 0.4994112551212311, 'learning_rate': 5.798396079160249e-06, 'epoch': 1.51}
50%|█████ | 5810/11526 [1:00:50<1:00:47, 1.57it/s] 50%|█████ | 5811/11526 [1:00:50<1:00:12, 1.58it/s] {'loss': 0.175, 'grad_norm': 0.5142545104026794, 'learning_rate': 5.796901160304475e-06, 'epoch': 1.51}
50%|█████ | 5811/11526 [1:00:50<1:00:12, 1.58it/s] 50%|█████ | 5812/11526 [1:00:51<59:39, 1.60it/s] {'loss': 0.2119, 'grad_norm': 0.5750592947006226, 'learning_rate': 5.79540616835239e-06, 'epoch': 1.51}
50%|█████ | 5812/11526 [1:00:51<59:39, 1.60it/s] 50%|█████ | 5813/11526 [1:00:51<59:17, 1.61it/s] {'loss': 0.1857, 'grad_norm': 0.5057964324951172, 'learning_rate': 5.793911103441127e-06, 'epoch': 1.51}
50%|█████ | 5813/11526 [1:00:52<59:17, 1.61it/s] 50%|█████ | 5814/11526 [1:00:52<59:02, 1.61it/s] {'loss': 0.2222, 'grad_norm': 0.5740044713020325, 'learning_rate': 5.7924159657078206e-06, 'epoch': 1.51}
50%|█████ | 5814/11526 [1:00:52<59:02, 1.61it/s] 50%|█████ | 5815/11526 [1:00:53<58:50, 1.62it/s] {'loss': 0.2099, 'grad_norm': 0.518663763999939, 'learning_rate': 5.7909207552896115e-06, 'epoch': 1.51}
50%|█████ | 5815/11526 [1:00:53<58:50, 1.62it/s] 50%|█████ | 5816/11526 [1:00:53<58:52, 1.62it/s] {'loss': 0.2276, 'grad_norm': 0.5619065165519714, 'learning_rate': 5.789425472323652e-06, 'epoch': 1.51}
50%|█████ | 5816/11526 [1:00:53<58:52, 1.62it/s] 50%|█████ | 5817/11526 [1:00:54<58:44, 1.62it/s] {'loss': 0.154, 'grad_norm': 0.41206973791122437, 'learning_rate': 5.787930116947098e-06, 'epoch': 1.51}
50%|█████ | 5817/11526 [1:00:54<58:44, 1.62it/s] 50%|█████ | 5818/11526 [1:00:54<58:37, 1.62it/s] {'loss': 0.2816, 'grad_norm': 0.6079213619232178, 'learning_rate': 5.786434689297108e-06, 'epoch': 1.51}
50%|█████ | 5818/11526 [1:00:55<58:37, 1.62it/s] 50%|█████ | 5819/11526 [1:00:55<58:34, 1.62it/s] {'loss': 0.1872, 'grad_norm': 0.5481922030448914, 'learning_rate': 5.784939189510854e-06, 'epoch': 1.51}
50%|█████ | 5819/11526 [1:00:55<58:34, 1.62it/s] 50%|█████ | 5820/11526 [1:00:56<58:30, 1.63it/s] {'loss': 0.2204, 'grad_norm': 0.4669358730316162, 'learning_rate': 5.783443617725513e-06, 'epoch': 1.51}
50%|█████ | 5820/11526 [1:00:56<58:30, 1.63it/s] 51%|█████ | 5821/11526 [1:00:56<58:31, 1.62it/s] {'loss': 0.1381, 'grad_norm': 0.385219931602478, 'learning_rate': 5.781947974078264e-06, 'epoch': 1.52}
51%|█████ | 5821/11526 [1:00:56<58:31, 1.62it/s] 51%|█████ | 5822/11526 [1:00:57<58:29, 1.63it/s] {'loss': 0.2115, 'grad_norm': 0.6107745170593262, 'learning_rate': 5.7804522587062995e-06, 'epoch': 1.52}
51%|█████ | 5822/11526 [1:00:57<58:29, 1.63it/s] 51%|█████ | 5823/11526 [1:00:58<58:26, 1.63it/s] {'loss': 0.2354, 'grad_norm': 0.6709606647491455, 'learning_rate': 5.778956471746811e-06, 'epoch': 1.52}
51%|█████ | 5823/11526 [1:00:58<58:26, 1.63it/s] 51%|█████ | 5824/11526 [1:00:58<58:26, 1.63it/s] {'loss': 0.194, 'grad_norm': 0.5989331603050232, 'learning_rate': 5.777460613337003e-06, 'epoch': 1.52}
51%|█████ | 5824/11526 [1:00:58<58:26, 1.63it/s] 51%|█████ | 5825/11526 [1:00:59<58:23, 1.63it/s] {'loss': 0.199, 'grad_norm': 0.5659587383270264, 'learning_rate': 5.775964683614087e-06, 'epoch': 1.52}
51%|█████ | 5825/11526 [1:00:59<58:23, 1.63it/s] 51%|█████ | 5826/11526 [1:00:59<58:23, 1.63it/s] {'loss': 0.1943, 'grad_norm': 0.4863525629043579, 'learning_rate': 5.774468682715274e-06, 'epoch': 1.52}
51%|█████ | 5826/11526 [1:01:00<58:23, 1.63it/s] 51%|█████ | 5827/11526 [1:01:00<58:22, 1.63it/s] {'loss': 0.1868, 'grad_norm': 0.5790393352508545, 'learning_rate': 5.7729726107777855e-06, 'epoch': 1.52}
51%|█████ | 5827/11526 [1:01:00<58:22, 1.63it/s] 51%|█████ | 5828/11526 [1:01:01<58:20, 1.63it/s] {'loss': 0.2583, 'grad_norm': 0.6737915873527527, 'learning_rate': 5.771476467938851e-06, 'epoch': 1.52}
51%|█████ | 5828/11526 [1:01:01<58:20, 1.63it/s] 51%|█████ | 5829/11526 [1:01:01<58:18, 1.63it/s] {'loss': 0.1741, 'grad_norm': 0.5146365761756897, 'learning_rate': 5.769980254335707e-06, 'epoch': 1.52}
51%|█████ | 5829/11526 [1:01:01<58:18, 1.63it/s] 51%|█████ | 5830/11526 [1:01:02<58:15, 1.63it/s] {'loss': 0.2832, 'grad_norm': 0.6452949643135071, 'learning_rate': 5.768483970105592e-06, 'epoch': 1.52}
51%|█████ | 5830/11526 [1:01:02<58:15, 1.63it/s] 51%|█████ | 5831/11526 [1:01:02<58:20, 1.63it/s] {'loss': 0.1816, 'grad_norm': 0.5669056177139282, 'learning_rate': 5.766987615385754e-06, 'epoch': 1.52}
51%|█████ | 5831/11526 [1:01:03<58:20, 1.63it/s] 51%|█████ | 5832/11526 [1:01:03<58:18, 1.63it/s] {'loss': 0.3095, 'grad_norm': 0.7054117918014526, 'learning_rate': 5.765491190313449e-06, 'epoch': 1.52}
51%|█████ | 5832/11526 [1:01:03<58:18, 1.63it/s] 51%|█████ | 5833/11526 [1:01:04<58:15, 1.63it/s] {'loss': 0.2313, 'grad_norm': 0.44321516156196594, 'learning_rate': 5.763994695025936e-06, 'epoch': 1.52}
51%|█████ | 5833/11526 [1:01:04<58:15, 1.63it/s] 51%|█████ | 5834/11526 [1:01:04<58:14, 1.63it/s] {'loss': 0.1812, 'grad_norm': 0.5619460940361023, 'learning_rate': 5.762498129660484e-06, 'epoch': 1.52}
51%|█████ | 5834/11526 [1:01:04<58:14, 1.63it/s] 51%|█████ | 5835/11526 [1:01:05<58:13, 1.63it/s] {'loss': 0.1676, 'grad_norm': 0.4804130494594574, 'learning_rate': 5.761001494354363e-06, 'epoch': 1.52}
51%|█████ | 5835/11526 [1:01:05<58:13, 1.63it/s] 51%|█████ | 5836/11526 [1:01:06<58:14, 1.63it/s] {'loss': 0.1895, 'grad_norm': 0.5088072419166565, 'learning_rate': 5.759504789244856e-06, 'epoch': 1.52}
51%|█████ | 5836/11526 [1:01:06<58:14, 1.63it/s] 51%|█████ | 5837/11526 [1:01:06<58:13, 1.63it/s] {'loss': 0.1535, 'grad_norm': 0.5189276337623596, 'learning_rate': 5.75800801446925e-06, 'epoch': 1.52}
51%|█████ | 5837/11526 [1:01:06<58:13, 1.63it/s] 51%|█████ | 5838/11526 [1:01:07<58:13, 1.63it/s] {'loss': 0.2214, 'grad_norm': 0.6006885766983032, 'learning_rate': 5.756511170164834e-06, 'epoch': 1.52}
51%|█████ | 5838/11526 [1:01:07<58:13, 1.63it/s] 51%|█████ | 5839/11526 [1:01:07<58:11, 1.63it/s] {'loss': 0.2069, 'grad_norm': 0.5522888898849487, 'learning_rate': 5.755014256468909e-06, 'epoch': 1.52}
51%|█████ | 5839/11526 [1:01:07<58:11, 1.63it/s] 51%|█████ | 5840/11526 [1:01:08<58:10, 1.63it/s] {'loss': 0.1913, 'grad_norm': 0.5197306275367737, 'learning_rate': 5.753517273518782e-06, 'epoch': 1.52}
51%|█████ | 5840/11526 [1:01:08<58:10, 1.63it/s] 51%|█████ | 5841/11526 [1:01:09<58:09, 1.63it/s] {'loss': 0.1697, 'grad_norm': 0.46650269627571106, 'learning_rate': 5.752020221451763e-06, 'epoch': 1.52}
51%|█████ | 5841/11526 [1:01:09<58:09, 1.63it/s] 51%|█████ | 5842/11526 [1:01:09<58:07, 1.63it/s] {'loss': 0.1953, 'grad_norm': 0.5337250232696533, 'learning_rate': 5.750523100405169e-06, 'epoch': 1.52}
51%|█████ | 5842/11526 [1:01:09<58:07, 1.63it/s] 51%|█████ | 5843/11526 [1:01:10<58:08, 1.63it/s] {'loss': 0.216, 'grad_norm': 0.5958317518234253, 'learning_rate': 5.7490259105163256e-06, 'epoch': 1.52}
51%|█████ | 5843/11526 [1:01:10<58:08, 1.63it/s] 51%|█████ | 5844/11526 [1:01:10<58:08, 1.63it/s] {'loss': 0.1794, 'grad_norm': 0.4728444814682007, 'learning_rate': 5.747528651922567e-06, 'epoch': 1.52}
51%|█████ | 5844/11526 [1:01:11<58:08, 1.63it/s] 51%|█████ | 5845/11526 [1:01:11<58:06, 1.63it/s] {'loss': 0.2065, 'grad_norm': 0.5852007269859314, 'learning_rate': 5.746031324761225e-06, 'epoch': 1.52}
51%|█████ | 5845/11526 [1:01:11<58:06, 1.63it/s] 51%|█████ | 5846/11526 [1:01:12<58:05, 1.63it/s] {'loss': 0.1582, 'grad_norm': 0.4111465811729431, 'learning_rate': 5.744533929169646e-06, 'epoch': 1.52}
51%|█████ | 5846/11526 [1:01:12<58:05, 1.63it/s] 51%|█████ | 5847/11526 [1:01:12<58:07, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.420051246881485, 'learning_rate': 5.743036465285177e-06, 'epoch': 1.52}
51%|█████ | 5847/11526 [1:01:12<58:07, 1.63it/s] 51%|█████ | 5848/11526 [1:01:13<58:07, 1.63it/s] {'loss': 0.181, 'grad_norm': 0.5133239030838013, 'learning_rate': 5.741538933245178e-06, 'epoch': 1.52}
51%|█████ | 5848/11526 [1:01:13<58:07, 1.63it/s] 51%|█████ | 5849/11526 [1:01:14<58:07, 1.63it/s] {'loss': 0.1747, 'grad_norm': 0.55014967918396, 'learning_rate': 5.740041333187007e-06, 'epoch': 1.52}
51%|█████ | 5849/11526 [1:01:14<58:07, 1.63it/s] 51%|█████ | 5850/11526 [1:01:14<58:07, 1.63it/s] {'loss': 0.2115, 'grad_norm': 0.5517136454582214, 'learning_rate': 5.7385436652480355e-06, 'epoch': 1.52}
51%|█████ | 5850/11526 [1:01:14<58:07, 1.63it/s] 51%|█████ | 5851/11526 [1:01:15<58:05, 1.63it/s] {'loss': 0.1646, 'grad_norm': 0.4548230469226837, 'learning_rate': 5.737045929565635e-06, 'epoch': 1.52}
51%|█████ | 5851/11526 [1:01:15<58:05, 1.63it/s] 51%|█████ | 5852/11526 [1:01:15<58:03, 1.63it/s] {'loss': 0.2425, 'grad_norm': 0.6258777379989624, 'learning_rate': 5.735548126277191e-06, 'epoch': 1.52}
51%|█████ | 5852/11526 [1:01:15<58:03, 1.63it/s] 51%|█████ | 5853/11526 [1:01:16<58:03, 1.63it/s] {'loss': 0.2287, 'grad_norm': 0.6034979224205017, 'learning_rate': 5.734050255520086e-06, 'epoch': 1.52}
51%|█████ | 5853/11526 [1:01:16<58:03, 1.63it/s] 51%|█████ | 5854/11526 [1:01:17<58:03, 1.63it/s] {'loss': 0.1717, 'grad_norm': 0.47714507579803467, 'learning_rate': 5.7325523174317154e-06, 'epoch': 1.52}
51%|█████ | 5854/11526 [1:01:17<58:03, 1.63it/s] 51%|█████ | 5855/11526 [1:01:17<58:01, 1.63it/s] {'loss': 0.2038, 'grad_norm': 0.5240675806999207, 'learning_rate': 5.73105431214948e-06, 'epoch': 1.52}
51%|█████ | 5855/11526 [1:01:17<58:01, 1.63it/s] 51%|█████ | 5856/11526 [1:01:18<58:00, 1.63it/s] {'loss': 0.2253, 'grad_norm': 0.6412543654441833, 'learning_rate': 5.72955623981078e-06, 'epoch': 1.52}
51%|█████ | 5856/11526 [1:01:18<58:00, 1.63it/s] 51%|█████ | 5857/11526 [1:01:18<58:00, 1.63it/s] {'loss': 0.1475, 'grad_norm': 0.43193545937538147, 'learning_rate': 5.728058100553033e-06, 'epoch': 1.52}
51%|█████ | 5857/11526 [1:01:19<58:00, 1.63it/s] 51%|█████ | 5858/11526 [1:01:19<57:57, 1.63it/s] {'loss': 0.1947, 'grad_norm': 0.5354369878768921, 'learning_rate': 5.726559894513653e-06, 'epoch': 1.52}
51%|█████ | 5858/11526 [1:01:19<57:57, 1.63it/s] 51%|█████ | 5859/11526 [1:01:20<57:58, 1.63it/s] {'loss': 0.2096, 'grad_norm': 0.5914942622184753, 'learning_rate': 5.725061621830068e-06, 'epoch': 1.52}
51%|█████ | 5859/11526 [1:01:20<57:58, 1.63it/s] 51%|█████ | 5860/11526 [1:01:20<57:57, 1.63it/s] {'loss': 0.2176, 'grad_norm': 0.6186822056770325, 'learning_rate': 5.723563282639704e-06, 'epoch': 1.53}
51%|█████ | 5860/11526 [1:01:20<57:57, 1.63it/s] 51%|█████ | 5861/11526 [1:01:21<57:56, 1.63it/s] {'loss': 0.1646, 'grad_norm': 0.4832823574542999, 'learning_rate': 5.722064877079999e-06, 'epoch': 1.53}
51%|█████ | 5861/11526 [1:01:21<57:56, 1.63it/s] 51%|█████ | 5862/11526 [1:01:21<57:57, 1.63it/s] {'loss': 0.2035, 'grad_norm': 0.5856859683990479, 'learning_rate': 5.720566405288394e-06, 'epoch': 1.53}
51%|█████ | 5862/11526 [1:01:22<57:57, 1.63it/s] 51%|█████ | 5863/11526 [1:01:22<57:56, 1.63it/s] {'loss': 0.2268, 'grad_norm': 0.5784217119216919, 'learning_rate': 5.71906786740234e-06, 'epoch': 1.53}
51%|█████ | 5863/11526 [1:01:22<57:56, 1.63it/s] 51%|█████ | 5864/11526 [1:01:23<57:55, 1.63it/s] {'loss': 0.2106, 'grad_norm': 0.5545591711997986, 'learning_rate': 5.717569263559291e-06, 'epoch': 1.53}
51%|█████ | 5864/11526 [1:01:23<57:55, 1.63it/s] 51%|█████ | 5865/11526 [1:01:23<57:55, 1.63it/s] {'loss': 0.1607, 'grad_norm': 0.4394989013671875, 'learning_rate': 5.7160705938967035e-06, 'epoch': 1.53}
51%|█████ | 5865/11526 [1:01:23<57:55, 1.63it/s] 51%|█████ | 5866/11526 [1:01:24<57:56, 1.63it/s] {'loss': 0.1745, 'grad_norm': 0.4951590299606323, 'learning_rate': 5.714571858552049e-06, 'epoch': 1.53}
51%|█████ | 5866/11526 [1:01:24<57:56, 1.63it/s] 51%|█████ | 5867/11526 [1:01:25<57:56, 1.63it/s] {'loss': 0.2061, 'grad_norm': 0.5409990549087524, 'learning_rate': 5.713073057662797e-06, 'epoch': 1.53}
51%|█████ | 5867/11526 [1:01:25<57:56, 1.63it/s] 51%|█████ | 5868/11526 [1:01:25<57:54, 1.63it/s] {'loss': 0.1985, 'grad_norm': 0.5150898098945618, 'learning_rate': 5.711574191366427e-06, 'epoch': 1.53}
51%|█████ | 5868/11526 [1:01:25<57:54, 1.63it/s] 51%|█████ | 5869/11526 [1:01:26<57:53, 1.63it/s] {'loss': 0.2894, 'grad_norm': 0.5953391790390015, 'learning_rate': 5.710075259800423e-06, 'epoch': 1.53}
51%|█████ | 5869/11526 [1:01:26<57:53, 1.63it/s] 51%|█████ | 5870/11526 [1:01:26<57:50, 1.63it/s] {'loss': 0.1821, 'grad_norm': 0.5182964205741882, 'learning_rate': 5.7085762631022765e-06, 'epoch': 1.53}
51%|█████ | 5870/11526 [1:01:27<57:50, 1.63it/s] 51%|█████ | 5871/11526 [1:01:27<57:52, 1.63it/s] {'loss': 0.1981, 'grad_norm': 0.5326521396636963, 'learning_rate': 5.707077201409483e-06, 'epoch': 1.53}
51%|█████ | 5871/11526 [1:01:27<57:52, 1.63it/s] 51%|█████ | 5872/11526 [1:01:28<57:51, 1.63it/s] {'loss': 0.1969, 'grad_norm': 0.5915059447288513, 'learning_rate': 5.705578074859547e-06, 'epoch': 1.53}
51%|█████ | 5872/11526 [1:01:28<57:51, 1.63it/s] 51%|█████ | 5873/11526 [1:01:28<57:51, 1.63it/s] {'loss': 0.1795, 'grad_norm': 0.4818885922431946, 'learning_rate': 5.704078883589973e-06, 'epoch': 1.53}
51%|█████ | 5873/11526 [1:01:28<57:51, 1.63it/s] 51%|█████ | 5874/11526 [1:01:29<57:53, 1.63it/s] {'loss': 0.2064, 'grad_norm': 0.5574833154678345, 'learning_rate': 5.702579627738279e-06, 'epoch': 1.53}
51%|█████ | 5874/11526 [1:01:29<57:53, 1.63it/s] 51%|█████ | 5875/11526 [1:01:29<57:53, 1.63it/s] {'loss': 0.2003, 'grad_norm': 0.5089884400367737, 'learning_rate': 5.701080307441985e-06, 'epoch': 1.53}
51%|█████ | 5875/11526 [1:01:30<57:53, 1.63it/s] 51%|█████ | 5876/11526 [1:01:30<57:51, 1.63it/s] {'loss': 0.2361, 'grad_norm': 0.636287271976471, 'learning_rate': 5.699580922838616e-06, 'epoch': 1.53}
51%|█████ | 5876/11526 [1:01:30<57:51, 1.63it/s] 51%|█████ | 5877/11526 [1:01:31<57:52, 1.63it/s] {'loss': 0.2812, 'grad_norm': 0.6765362024307251, 'learning_rate': 5.698081474065703e-06, 'epoch': 1.53}
51%|█████ | 5877/11526 [1:01:31<57:52, 1.63it/s] 51%|█████ | 5878/11526 [1:01:31<57:50, 1.63it/s] {'loss': 0.1663, 'grad_norm': 0.5021886825561523, 'learning_rate': 5.696581961260785e-06, 'epoch': 1.53}
51%|█████ | 5878/11526 [1:01:31<57:50, 1.63it/s] 51%|█████ | 5879/11526 [1:01:32<57:49, 1.63it/s] {'loss': 0.293, 'grad_norm': 0.7061349153518677, 'learning_rate': 5.695082384561408e-06, 'epoch': 1.53}
51%|█████ | 5879/11526 [1:01:32<57:49, 1.63it/s] 51%|█████ | 5880/11526 [1:01:33<57:48, 1.63it/s] {'loss': 0.1998, 'grad_norm': 0.5465329885482788, 'learning_rate': 5.6935827441051186e-06, 'epoch': 1.53}
51%|█████ | 5880/11526 [1:01:33<57:48, 1.63it/s] 51%|█████ | 5881/11526 [1:01:33<57:47, 1.63it/s] {'loss': 0.1316, 'grad_norm': 0.38691627979278564, 'learning_rate': 5.692083040029474e-06, 'epoch': 1.53}
51%|█████ | 5881/11526 [1:01:33<57:47, 1.63it/s] 51%|█████ | 5882/11526 [1:01:34<57:44, 1.63it/s] {'loss': 0.1785, 'grad_norm': 0.44996893405914307, 'learning_rate': 5.690583272472034e-06, 'epoch': 1.53}
51%|█████ | 5882/11526 [1:01:34<57:44, 1.63it/s] 51%|█████ | 5883/11526 [1:01:34<57:45, 1.63it/s] {'loss': 0.2369, 'grad_norm': 0.6080963015556335, 'learning_rate': 5.689083441570369e-06, 'epoch': 1.53}
51%|█████ | 5883/11526 [1:01:35<57:45, 1.63it/s] 51%|█████ | 5884/11526 [1:01:35<57:42, 1.63it/s] {'loss': 0.252, 'grad_norm': 0.608792781829834, 'learning_rate': 5.687583547462049e-06, 'epoch': 1.53}
51%|█████ | 5884/11526 [1:01:35<57:42, 1.63it/s] 51%|█████ | 5885/11526 [1:01:36<57:43, 1.63it/s] {'loss': 0.2562, 'grad_norm': 0.6407113075256348, 'learning_rate': 5.6860835902846535e-06, 'epoch': 1.53}
51%|█████ | 5885/11526 [1:01:36<57:43, 1.63it/s] 51%|█████ | 5886/11526 [1:01:36<57:46, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.47334617376327515, 'learning_rate': 5.684583570175769e-06, 'epoch': 1.53}
51%|█████ | 5886/11526 [1:01:36<57:46, 1.63it/s] 51%|█████ | 5887/11526 [1:01:37<57:45, 1.63it/s] {'loss': 0.1593, 'grad_norm': 0.5349007844924927, 'learning_rate': 5.683083487272982e-06, 'epoch': 1.53}
51%|█████ | 5887/11526 [1:01:37<57:45, 1.63it/s] 51%|█████ | 5888/11526 [1:01:37<57:45, 1.63it/s] {'loss': 0.1862, 'grad_norm': 0.5136051774024963, 'learning_rate': 5.6815833417138925e-06, 'epoch': 1.53}
51%|█████ | 5888/11526 [1:01:38<57:45, 1.63it/s] 51%|█████ | 5889/11526 [1:01:38<57:45, 1.63it/s] {'loss': 0.1726, 'grad_norm': 0.4795417785644531, 'learning_rate': 5.6800831336360995e-06, 'epoch': 1.53}
51%|█████ | 5889/11526 [1:01:38<57:45, 1.63it/s] 51%|█████ | 5890/11526 [1:01:39<57:46, 1.63it/s] {'loss': 0.1696, 'grad_norm': 0.4839659631252289, 'learning_rate': 5.678582863177214e-06, 'epoch': 1.53}
51%|█████ | 5890/11526 [1:01:39<57:46, 1.63it/s] 51%|█████ | 5891/11526 [1:01:39<57:47, 1.63it/s] {'loss': 0.1565, 'grad_norm': 0.43835026025772095, 'learning_rate': 5.677082530474845e-06, 'epoch': 1.53}
51%|█████ | 5891/11526 [1:01:39<57:47, 1.63it/s] 51%|█████ | 5892/11526 [1:01:40<57:46, 1.63it/s] {'loss': 0.1347, 'grad_norm': 0.4207773506641388, 'learning_rate': 5.675582135666615e-06, 'epoch': 1.53}
51%|█████ | 5892/11526 [1:01:40<57:46, 1.63it/s] 51%|█████ | 5893/11526 [1:01:41<57:46, 1.63it/s] {'loss': 0.1873, 'grad_norm': 0.5419616103172302, 'learning_rate': 5.674081678890149e-06, 'epoch': 1.53}
51%|█████ | 5893/11526 [1:01:41<57:46, 1.63it/s] 51%|█████ | 5894/11526 [1:01:41<57:45, 1.63it/s] {'loss': 0.2838, 'grad_norm': 0.7576248645782471, 'learning_rate': 5.672581160283075e-06, 'epoch': 1.53}
51%|█████ | 5894/11526 [1:01:41<57:45, 1.63it/s] 51%|█████ | 5895/11526 [1:01:42<57:43, 1.63it/s] {'loss': 0.1448, 'grad_norm': 0.4306703209877014, 'learning_rate': 5.671080579983029e-06, 'epoch': 1.53}
51%|█████ | 5895/11526 [1:01:42<57:43, 1.63it/s] 51%|█████ | 5896/11526 [1:01:42<57:44, 1.63it/s] {'loss': 0.1849, 'grad_norm': 0.49339866638183594, 'learning_rate': 5.669579938127655e-06, 'epoch': 1.53}
51%|█████ | 5896/11526 [1:01:43<57:44, 1.63it/s] 51%|█████ | 5897/11526 [1:01:43<57:41, 1.63it/s] {'loss': 0.2426, 'grad_norm': 0.6176979541778564, 'learning_rate': 5.6680792348546e-06, 'epoch': 1.53}
51%|█████ | 5897/11526 [1:01:43<57:41, 1.63it/s] 51%|█████ | 5898/11526 [1:01:44<57:40, 1.63it/s] {'loss': 0.1706, 'grad_norm': 0.4505901038646698, 'learning_rate': 5.666578470301515e-06, 'epoch': 1.54}
51%|█████ | 5898/11526 [1:01:44<57:40, 1.63it/s] 51%|█████ | 5899/11526 [1:01:44<57:42, 1.63it/s] {'loss': 0.2392, 'grad_norm': 0.6779398918151855, 'learning_rate': 5.6650776446060605e-06, 'epoch': 1.54}
51%|█████ | 5899/11526 [1:01:44<57:42, 1.63it/s] 51%|█████ | 5900/11526 [1:01:45<57:40, 1.63it/s] {'loss': 0.2206, 'grad_norm': 0.5915946364402771, 'learning_rate': 5.6635767579059e-06, 'epoch': 1.54}
51%|█████ | 5900/11526 [1:01:45<57:40, 1.63it/s] 51%|█████ | 5901/11526 [1:01:45<57:41, 1.63it/s] {'loss': 0.2286, 'grad_norm': 0.6434274911880493, 'learning_rate': 5.6620758103387044e-06, 'epoch': 1.54}
51%|█████ | 5901/11526 [1:01:46<57:41, 1.63it/s] 51%|█████ | 5902/11526 [1:01:46<57:39, 1.63it/s] {'loss': 0.1555, 'grad_norm': 0.43136677145957947, 'learning_rate': 5.660574802042148e-06, 'epoch': 1.54}
51%|█████ | 5902/11526 [1:01:46<57:39, 1.63it/s] 51%|█████ | 5903/11526 [1:01:47<57:37, 1.63it/s] {'loss': 0.2359, 'grad_norm': 0.587043821811676, 'learning_rate': 5.659073733153911e-06, 'epoch': 1.54}
51%|█████ | 5903/11526 [1:01:47<57:37, 1.63it/s] 51%|█████ | 5904/11526 [1:01:47<57:41, 1.62it/s] {'loss': 0.1952, 'grad_norm': 0.5395867824554443, 'learning_rate': 5.657572603811684e-06, 'epoch': 1.54}
51%|█████ | 5904/11526 [1:01:47<57:41, 1.62it/s] 51%|█████ | 5905/11526 [1:01:48<57:38, 1.63it/s] {'loss': 0.2193, 'grad_norm': 0.6125359535217285, 'learning_rate': 5.656071414153154e-06, 'epoch': 1.54}
51%|█████ | 5905/11526 [1:01:48<57:38, 1.63it/s] 51%|█████ | 5906/11526 [1:01:49<57:41, 1.62it/s] {'loss': 0.1914, 'grad_norm': 0.4898240864276886, 'learning_rate': 5.6545701643160236e-06, 'epoch': 1.54}
51%|█████ | 5906/11526 [1:01:49<57:41, 1.62it/s] 51%|█████ | 5907/11526 [1:01:49<57:39, 1.62it/s] {'loss': 0.199, 'grad_norm': 0.5772337317466736, 'learning_rate': 5.65306885443799e-06, 'epoch': 1.54}
51%|█████ | 5907/11526 [1:01:49<57:39, 1.62it/s] 51%|█████▏ | 5908/11526 [1:01:50<57:35, 1.63it/s] {'loss': 0.1518, 'grad_norm': 0.4329463839530945, 'learning_rate': 5.651567484656768e-06, 'epoch': 1.54}
51%|█████▏ | 5908/11526 [1:01:50<57:35, 1.63it/s] 51%|█████▏ | 5909/11526 [1:01:50<57:38, 1.62it/s] {'loss': 0.2738, 'grad_norm': 0.7075490951538086, 'learning_rate': 5.650066055110067e-06, 'epoch': 1.54}
51%|█████▏ | 5909/11526 [1:01:51<57:38, 1.62it/s] 51%|█████▏ | 5910/11526 [1:01:51<57:34, 1.63it/s] {'loss': 0.2553, 'grad_norm': 0.6834741830825806, 'learning_rate': 5.64856456593561e-06, 'epoch': 1.54}
51%|█████▏ | 5910/11526 [1:01:51<57:34, 1.63it/s] 51%|█████▏ | 5911/11526 [1:01:52<57:39, 1.62it/s] {'loss': 0.182, 'grad_norm': 0.5164248943328857, 'learning_rate': 5.64706301727112e-06, 'epoch': 1.54}
51%|█████▏ | 5911/11526 [1:01:52<57:39, 1.62it/s] 51%|█████▏ | 5912/11526 [1:01:52<57:33, 1.63it/s] {'loss': 0.1725, 'grad_norm': 0.4507155120372772, 'learning_rate': 5.64556140925433e-06, 'epoch': 1.54}
51%|█████▏ | 5912/11526 [1:01:52<57:33, 1.63it/s] 51%|█████▏ | 5913/11526 [1:01:53<57:31, 1.63it/s] {'loss': 0.2044, 'grad_norm': 0.5135353803634644, 'learning_rate': 5.644059742022974e-06, 'epoch': 1.54}
51%|█████▏ | 5913/11526 [1:01:53<57:31, 1.63it/s] 51%|█████▏ | 5914/11526 [1:01:53<57:29, 1.63it/s] {'loss': 0.1849, 'grad_norm': 0.5582365989685059, 'learning_rate': 5.642558015714792e-06, 'epoch': 1.54}
51%|█████▏ | 5914/11526 [1:01:54<57:29, 1.63it/s] 51%|█████▏ | 5915/11526 [1:01:54<57:27, 1.63it/s] {'loss': 0.2066, 'grad_norm': 0.5260403156280518, 'learning_rate': 5.641056230467534e-06, 'epoch': 1.54}
51%|█████▏ | 5915/11526 [1:01:54<57:27, 1.63it/s] 51%|█████▏ | 5916/11526 [1:01:55<57:29, 1.63it/s] {'loss': 0.2796, 'grad_norm': 0.7405680418014526, 'learning_rate': 5.639554386418951e-06, 'epoch': 1.54}
51%|█████▏ | 5916/11526 [1:01:55<57:29, 1.63it/s] 51%|█████▏ | 5917/11526 [1:01:55<57:29, 1.63it/s] {'loss': 0.22, 'grad_norm': 0.5457751750946045, 'learning_rate': 5.6380524837068015e-06, 'epoch': 1.54}
51%|█████▏ | 5917/11526 [1:01:55<57:29, 1.63it/s] 51%|█████▏ | 5918/11526 [1:01:56<57:25, 1.63it/s] {'loss': 0.1972, 'grad_norm': 0.5118393301963806, 'learning_rate': 5.6365505224688465e-06, 'epoch': 1.54}
51%|█████▏ | 5918/11526 [1:01:56<57:25, 1.63it/s] 51%|█████▏ | 5919/11526 [1:01:57<57:22, 1.63it/s] {'loss': 0.2234, 'grad_norm': 0.6096487641334534, 'learning_rate': 5.635048502842857e-06, 'epoch': 1.54}
51%|█████▏ | 5919/11526 [1:01:57<57:22, 1.63it/s] 51%|█████▏ | 5920/11526 [1:01:57<57:22, 1.63it/s] {'loss': 0.1873, 'grad_norm': 0.4920502007007599, 'learning_rate': 5.633546424966604e-06, 'epoch': 1.54}
51%|█████▏ | 5920/11526 [1:01:57<57:22, 1.63it/s] 51%|█████▏ | 5921/11526 [1:01:58<57:26, 1.63it/s] {'loss': 0.2483, 'grad_norm': 0.6637029647827148, 'learning_rate': 5.63204428897787e-06, 'epoch': 1.54}
51%|█████▏ | 5921/11526 [1:01:58<57:26, 1.63it/s] 51%|█████▏ | 5922/11526 [1:01:58<57:23, 1.63it/s] {'loss': 0.1725, 'grad_norm': 0.4779587388038635, 'learning_rate': 5.6305420950144365e-06, 'epoch': 1.54}
51%|█████▏ | 5922/11526 [1:01:58<57:23, 1.63it/s] 51%|█████▏ | 5923/11526 [1:01:59<57:21, 1.63it/s] {'loss': 0.2688, 'grad_norm': 0.6124845743179321, 'learning_rate': 5.629039843214095e-06, 'epoch': 1.54}
51%|█████▏ | 5923/11526 [1:01:59<57:21, 1.63it/s] 51%|█████▏ | 5924/11526 [1:02:00<57:23, 1.63it/s] {'loss': 0.1995, 'grad_norm': 0.5156206488609314, 'learning_rate': 5.62753753371464e-06, 'epoch': 1.54}
51%|█████▏ | 5924/11526 [1:02:00<57:23, 1.63it/s] 51%|█████▏ | 5925/11526 [1:02:00<57:21, 1.63it/s] {'loss': 0.2728, 'grad_norm': 0.6642764210700989, 'learning_rate': 5.626035166653871e-06, 'epoch': 1.54}
51%|█████▏ | 5925/11526 [1:02:00<57:21, 1.63it/s] 51%|█████▏ | 5926/11526 [1:02:01<57:28, 1.62it/s] {'loss': 0.2139, 'grad_norm': 0.5510098934173584, 'learning_rate': 5.624532742169595e-06, 'epoch': 1.54}
51%|█████▏ | 5926/11526 [1:02:01<57:28, 1.62it/s] 51%|█████▏ | 5927/11526 [1:02:01<57:25, 1.62it/s] {'loss': 0.2019, 'grad_norm': 0.5607900023460388, 'learning_rate': 5.623030260399622e-06, 'epoch': 1.54}
51%|█████▏ | 5927/11526 [1:02:02<57:25, 1.62it/s] 51%|█████▏ | 5928/11526 [1:02:02<57:23, 1.63it/s] {'loss': 0.1595, 'grad_norm': 0.47163712978363037, 'learning_rate': 5.62152772148177e-06, 'epoch': 1.54}
51%|█████▏ | 5928/11526 [1:02:02<57:23, 1.63it/s] 51%|█████▏ | 5929/11526 [1:02:03<57:24, 1.63it/s] {'loss': 0.1373, 'grad_norm': 0.42095625400543213, 'learning_rate': 5.620025125553859e-06, 'epoch': 1.54}
51%|█████▏ | 5929/11526 [1:02:03<57:24, 1.63it/s] 51%|█████▏ | 5930/11526 [1:02:03<57:20, 1.63it/s] {'loss': 0.1733, 'grad_norm': 0.43776461482048035, 'learning_rate': 5.6185224727537135e-06, 'epoch': 1.54}
51%|█████▏ | 5930/11526 [1:02:03<57:20, 1.63it/s] 51%|█████▏ | 5931/11526 [1:02:04<57:25, 1.62it/s] {'loss': 0.2113, 'grad_norm': 0.6077358722686768, 'learning_rate': 5.617019763219169e-06, 'epoch': 1.54}
51%|█████▏ | 5931/11526 [1:02:04<57:25, 1.62it/s] 51%|█████▏ | 5932/11526 [1:02:05<57:22, 1.62it/s] {'loss': 0.1551, 'grad_norm': 0.43965062499046326, 'learning_rate': 5.615516997088062e-06, 'epoch': 1.54}
51%|█████▏ | 5932/11526 [1:02:05<57:22, 1.62it/s] 51%|█████▏ | 5933/11526 [1:02:05<57:20, 1.63it/s] {'loss': 0.1783, 'grad_norm': 0.49057838320732117, 'learning_rate': 5.614014174498232e-06, 'epoch': 1.54}
51%|█████▏ | 5933/11526 [1:02:05<57:20, 1.63it/s] 51%|█████▏ | 5934/11526 [1:02:06<57:18, 1.63it/s] {'loss': 0.1687, 'grad_norm': 0.4526456296443939, 'learning_rate': 5.612511295587529e-06, 'epoch': 1.54}
51%|█████▏ | 5934/11526 [1:02:06<57:18, 1.63it/s] 51%|█████▏ | 5935/11526 [1:02:06<57:17, 1.63it/s] {'loss': 0.1868, 'grad_norm': 0.47739580273628235, 'learning_rate': 5.611008360493806e-06, 'epoch': 1.54}
51%|█████▏ | 5935/11526 [1:02:06<57:17, 1.63it/s] 52%|█████▏ | 5936/11526 [1:02:07<57:17, 1.63it/s] {'loss': 0.1794, 'grad_norm': 0.519832968711853, 'learning_rate': 5.609505369354918e-06, 'epoch': 1.55}
52%|█████▏ | 5936/11526 [1:02:07<57:17, 1.63it/s] 52%|█████▏ | 5937/11526 [1:02:08<57:15, 1.63it/s] {'loss': 0.1985, 'grad_norm': 0.5193149447441101, 'learning_rate': 5.608002322308731e-06, 'epoch': 1.55}
52%|█████▏ | 5937/11526 [1:02:08<57:15, 1.63it/s] 52%|█████▏ | 5938/11526 [1:02:08<57:14, 1.63it/s] {'loss': 0.2957, 'grad_norm': 0.6647741794586182, 'learning_rate': 5.6064992194931115e-06, 'epoch': 1.55}
52%|█████▏ | 5938/11526 [1:02:08<57:14, 1.63it/s] 52%|█████▏ | 5939/11526 [1:02:09<57:16, 1.63it/s] {'loss': 0.2095, 'grad_norm': 0.5703453421592712, 'learning_rate': 5.604996061045932e-06, 'epoch': 1.55}
52%|█████▏ | 5939/11526 [1:02:09<57:16, 1.63it/s] 52%|█████▏ | 5940/11526 [1:02:09<57:14, 1.63it/s] {'loss': 0.1963, 'grad_norm': 0.5079355239868164, 'learning_rate': 5.603492847105072e-06, 'epoch': 1.55}
52%|█████▏ | 5940/11526 [1:02:10<57:14, 1.63it/s] 52%|█████▏ | 5941/11526 [1:02:10<57:15, 1.63it/s] {'loss': 0.2236, 'grad_norm': 0.5906720161437988, 'learning_rate': 5.601989577808413e-06, 'epoch': 1.55}
52%|█████▏ | 5941/11526 [1:02:10<57:15, 1.63it/s] 52%|█████▏ | 5942/11526 [1:02:11<57:13, 1.63it/s] {'loss': 0.2725, 'grad_norm': 0.72877037525177, 'learning_rate': 5.600486253293847e-06, 'epoch': 1.55}
52%|█████▏ | 5942/11526 [1:02:11<57:13, 1.63it/s] 52%|█████▏ | 5943/11526 [1:02:11<57:11, 1.63it/s] {'loss': 0.2582, 'grad_norm': 0.5764105319976807, 'learning_rate': 5.598982873699264e-06, 'epoch': 1.55}
52%|█████▏ | 5943/11526 [1:02:11<57:11, 1.63it/s] 52%|█████▏ | 5944/11526 [1:02:12<57:10, 1.63it/s] {'loss': 0.1942, 'grad_norm': 0.5436527729034424, 'learning_rate': 5.597479439162564e-06, 'epoch': 1.55}
52%|█████▏ | 5944/11526 [1:02:12<57:10, 1.63it/s] 52%|█████▏ | 5945/11526 [1:02:13<57:07, 1.63it/s] {'loss': 0.1891, 'grad_norm': 0.5204329490661621, 'learning_rate': 5.595975949821649e-06, 'epoch': 1.55}
52%|█████▏ | 5945/11526 [1:02:13<57:07, 1.63it/s] 52%|█████▏ | 5946/11526 [1:02:13<57:06, 1.63it/s] {'loss': 0.2334, 'grad_norm': 0.5980271697044373, 'learning_rate': 5.594472405814432e-06, 'epoch': 1.55}
52%|█████▏ | 5946/11526 [1:02:13<57:06, 1.63it/s] 52%|█████▏ | 5947/11526 [1:02:14<57:07, 1.63it/s] {'loss': 0.203, 'grad_norm': 0.5693639516830444, 'learning_rate': 5.59296880727882e-06, 'epoch': 1.55}
52%|█████▏ | 5947/11526 [1:02:14<57:07, 1.63it/s] 52%|█████▏ | 5948/11526 [1:02:14<57:05, 1.63it/s] {'loss': 0.1577, 'grad_norm': 0.43945786356925964, 'learning_rate': 5.591465154352738e-06, 'epoch': 1.55}
52%|█████▏ | 5948/11526 [1:02:14<57:05, 1.63it/s] 52%|█████▏ | 5949/11526 [1:02:15<57:08, 1.63it/s] {'loss': 0.1988, 'grad_norm': 0.5038434863090515, 'learning_rate': 5.589961447174104e-06, 'epoch': 1.55}
52%|█████▏ | 5949/11526 [1:02:15<57:08, 1.63it/s] 52%|█████▏ | 5950/11526 [1:02:16<57:06, 1.63it/s] {'loss': 0.2158, 'grad_norm': 0.6482704877853394, 'learning_rate': 5.588457685880851e-06, 'epoch': 1.55}
52%|█████▏ | 5950/11526 [1:02:16<57:06, 1.63it/s] 52%|█████▏ | 5951/11526 [1:02:16<57:03, 1.63it/s] {'loss': 0.2291, 'grad_norm': 0.5789526104927063, 'learning_rate': 5.586953870610911e-06, 'epoch': 1.55}
52%|█████▏ | 5951/11526 [1:02:16<57:03, 1.63it/s] 52%|█████▏ | 5952/11526 [1:02:17<57:01, 1.63it/s] {'loss': 0.2131, 'grad_norm': 0.5579162240028381, 'learning_rate': 5.585450001502223e-06, 'epoch': 1.55}
52%|█████▏ | 5952/11526 [1:02:17<57:01, 1.63it/s] 52%|█████▏ | 5953/11526 [1:02:17<57:00, 1.63it/s] {'loss': 0.1757, 'grad_norm': 0.5209652781486511, 'learning_rate': 5.5839460786927295e-06, 'epoch': 1.55}
52%|█████▏ | 5953/11526 [1:02:18<57:00, 1.63it/s] 52%|█████▏ | 5954/11526 [1:02:18<56:59, 1.63it/s] {'loss': 0.2173, 'grad_norm': 0.5890516638755798, 'learning_rate': 5.582442102320378e-06, 'epoch': 1.55}
52%|█████▏ | 5954/11526 [1:02:18<56:59, 1.63it/s] 52%|█████▏ | 5955/11526 [1:02:19<56:59, 1.63it/s] {'loss': 0.2096, 'grad_norm': 0.6049984693527222, 'learning_rate': 5.5809380725231236e-06, 'epoch': 1.55}
52%|█████▏ | 5955/11526 [1:02:19<56:59, 1.63it/s] 52%|█████▏ | 5956/11526 [1:02:19<57:01, 1.63it/s] {'loss': 0.2191, 'grad_norm': 0.5862400531768799, 'learning_rate': 5.579433989438923e-06, 'epoch': 1.55}
52%|█████▏ | 5956/11526 [1:02:19<57:01, 1.63it/s] 52%|█████▏ | 5957/11526 [1:02:20<56:59, 1.63it/s] {'loss': 0.214, 'grad_norm': 0.599924623966217, 'learning_rate': 5.57792985320574e-06, 'epoch': 1.55}
52%|█████▏ | 5957/11526 [1:02:20<56:59, 1.63it/s] 52%|█████▏ | 5958/11526 [1:02:20<56:57, 1.63it/s] {'loss': 0.1713, 'grad_norm': 0.49292629957199097, 'learning_rate': 5.5764256639615434e-06, 'epoch': 1.55}
52%|█████▏ | 5958/11526 [1:02:21<56:57, 1.63it/s] 52%|█████▏ | 5959/11526 [1:02:21<57:00, 1.63it/s] {'loss': 0.2228, 'grad_norm': 0.5258201360702515, 'learning_rate': 5.574921421844306e-06, 'epoch': 1.55}
52%|█████▏ | 5959/11526 [1:02:21<57:00, 1.63it/s] 52%|█████▏ | 5960/11526 [1:02:22<56:57, 1.63it/s] {'loss': 0.1895, 'grad_norm': 0.5120235085487366, 'learning_rate': 5.573417126992004e-06, 'epoch': 1.55}
52%|█████▏ | 5960/11526 [1:02:22<56:57, 1.63it/s] 52%|█████▏ | 5961/11526 [1:02:22<56:56, 1.63it/s] {'loss': 0.14, 'grad_norm': 0.46791714429855347, 'learning_rate': 5.5719127795426185e-06, 'epoch': 1.55}
52%|█████▏ | 5961/11526 [1:02:22<56:56, 1.63it/s] 52%|█████▏ | 5962/11526 [1:02:23<56:54, 1.63it/s] {'loss': 0.1768, 'grad_norm': 0.5632709860801697, 'learning_rate': 5.5704083796341415e-06, 'epoch': 1.55}
52%|█████▏ | 5962/11526 [1:02:23<56:54, 1.63it/s] 52%|█████▏ | 5963/11526 [1:02:24<56:52, 1.63it/s] {'loss': 0.2365, 'grad_norm': 0.5223758220672607, 'learning_rate': 5.568903927404561e-06, 'epoch': 1.55}
52%|█████▏ | 5963/11526 [1:02:24<56:52, 1.63it/s] 52%|█████▏ | 5964/11526 [1:02:24<56:51, 1.63it/s] {'loss': 0.2163, 'grad_norm': 0.5990880131721497, 'learning_rate': 5.567399422991876e-06, 'epoch': 1.55}
52%|█████▏ | 5964/11526 [1:02:24<56:51, 1.63it/s] 52%|█████▏ | 5965/11526 [1:02:25<56:51, 1.63it/s] {'loss': 0.2279, 'grad_norm': 0.5915990471839905, 'learning_rate': 5.565894866534086e-06, 'epoch': 1.55}
52%|█████▏ | 5965/11526 [1:02:25<56:51, 1.63it/s] 52%|█████▏ | 5966/11526 [1:02:25<56:50, 1.63it/s] {'loss': 0.1827, 'grad_norm': 0.5845610499382019, 'learning_rate': 5.564390258169201e-06, 'epoch': 1.55}
52%|█████▏ | 5966/11526 [1:02:26<56:50, 1.63it/s] 52%|█████▏ | 5967/11526 [1:02:26<56:49, 1.63it/s] {'loss': 0.2326, 'grad_norm': 0.6381760835647583, 'learning_rate': 5.56288559803523e-06, 'epoch': 1.55}
52%|█████▏ | 5967/11526 [1:02:26<56:49, 1.63it/s] 52%|█████▏ | 5968/11526 [1:02:27<56:52, 1.63it/s] {'loss': 0.1754, 'grad_norm': 0.47031599283218384, 'learning_rate': 5.5613808862701876e-06, 'epoch': 1.55}
52%|█████▏ | 5968/11526 [1:02:27<56:52, 1.63it/s] 52%|█████▏ | 5969/11526 [1:02:27<56:51, 1.63it/s] {'loss': 0.1681, 'grad_norm': 0.4794989824295044, 'learning_rate': 5.559876123012098e-06, 'epoch': 1.55}
52%|█████▏ | 5969/11526 [1:02:27<56:51, 1.63it/s] 52%|█████▏ | 5970/11526 [1:02:28<56:53, 1.63it/s] {'loss': 0.2461, 'grad_norm': 0.5376240015029907, 'learning_rate': 5.558371308398984e-06, 'epoch': 1.55}
52%|█████▏ | 5970/11526 [1:02:28<56:53, 1.63it/s] 52%|█████▏ | 5971/11526 [1:02:28<56:51, 1.63it/s] {'loss': 0.2005, 'grad_norm': 0.5760928988456726, 'learning_rate': 5.556866442568878e-06, 'epoch': 1.55}
52%|█████▏ | 5971/11526 [1:02:29<56:51, 1.63it/s] 52%|█████▏ | 5972/11526 [1:02:29<56:48, 1.63it/s] {'loss': 0.1693, 'grad_norm': 0.5910878777503967, 'learning_rate': 5.555361525659812e-06, 'epoch': 1.55}
52%|█████▏ | 5972/11526 [1:02:29<56:48, 1.63it/s] 52%|█████▏ | 5973/11526 [1:02:30<56:47, 1.63it/s] {'loss': 0.205, 'grad_norm': 0.6379966735839844, 'learning_rate': 5.553856557809827e-06, 'epoch': 1.55}
52%|█████▏ | 5973/11526 [1:02:30<56:47, 1.63it/s] 52%|█████▏ | 5974/11526 [1:02:30<56:49, 1.63it/s] {'loss': 0.211, 'grad_norm': 0.554924726486206, 'learning_rate': 5.552351539156967e-06, 'epoch': 1.55}
52%|█████▏ | 5974/11526 [1:02:30<56:49, 1.63it/s] 52%|█████▏ | 5975/11526 [1:02:31<56:56, 1.62it/s] {'loss': 0.2961, 'grad_norm': 0.7816564440727234, 'learning_rate': 5.550846469839282e-06, 'epoch': 1.56}
52%|█████▏ | 5975/11526 [1:02:31<56:56, 1.62it/s] 52%|█████▏ | 5976/11526 [1:02:32<56:55, 1.63it/s] {'loss': 0.1485, 'grad_norm': 0.4384227395057678, 'learning_rate': 5.549341349994824e-06, 'epoch': 1.56}
52%|█████▏ | 5976/11526 [1:02:32<56:55, 1.63it/s] 52%|█████▏ | 5977/11526 [1:02:32<56:54, 1.63it/s] {'loss': 0.1948, 'grad_norm': 0.5292698740959167, 'learning_rate': 5.547836179761652e-06, 'epoch': 1.56}
52%|█████▏ | 5977/11526 [1:02:32<56:54, 1.63it/s] 52%|█████▏ | 5978/11526 [1:02:33<56:52, 1.63it/s] {'loss': 0.1819, 'grad_norm': 0.5112382769584656, 'learning_rate': 5.546330959277828e-06, 'epoch': 1.56}
52%|█████▏ | 5978/11526 [1:02:33<56:52, 1.63it/s] 52%|█████▏ | 5979/11526 [1:02:33<56:53, 1.63it/s] {'loss': 0.1673, 'grad_norm': 0.5156438946723938, 'learning_rate': 5.544825688681421e-06, 'epoch': 1.56}
52%|█████▏ | 5979/11526 [1:02:34<56:53, 1.63it/s] 52%|█████▏ | 5980/11526 [1:02:34<56:50, 1.63it/s] {'loss': 0.212, 'grad_norm': 0.535594642162323, 'learning_rate': 5.543320368110501e-06, 'epoch': 1.56}
52%|█████▏ | 5980/11526 [1:02:34<56:50, 1.63it/s] 52%|█████▏ | 5981/11526 [1:02:35<56:53, 1.62it/s] {'loss': 0.1861, 'grad_norm': 0.4999571144580841, 'learning_rate': 5.541814997703145e-06, 'epoch': 1.56}
52%|█████▏ | 5981/11526 [1:02:35<56:53, 1.62it/s] 52%|█████▏ | 5982/11526 [1:02:35<56:49, 1.63it/s] {'loss': 0.172, 'grad_norm': 0.4962005615234375, 'learning_rate': 5.5403095775974365e-06, 'epoch': 1.56}
52%|█████▏ | 5982/11526 [1:02:35<56:49, 1.63it/s] 52%|█████▏ | 5983/11526 [1:02:36<56:46, 1.63it/s] {'loss': 0.2608, 'grad_norm': 0.6558687090873718, 'learning_rate': 5.538804107931457e-06, 'epoch': 1.56}
52%|█████▏ | 5983/11526 [1:02:36<56:46, 1.63it/s] 52%|█████▏ | 5984/11526 [1:02:36<56:47, 1.63it/s] {'loss': 0.2467, 'grad_norm': 0.5850653648376465, 'learning_rate': 5.537298588843302e-06, 'epoch': 1.56}
52%|█████▏ | 5984/11526 [1:02:37<56:47, 1.63it/s] 52%|█████▏ | 5985/11526 [1:02:37<56:45, 1.63it/s] {'loss': 0.2551, 'grad_norm': 0.66200190782547, 'learning_rate': 5.5357930204710605e-06, 'epoch': 1.56}
52%|█████▏ | 5985/11526 [1:02:37<56:45, 1.63it/s] 52%|█████▏ | 5986/11526 [1:02:38<56:48, 1.63it/s] {'loss': 0.1607, 'grad_norm': 0.45837509632110596, 'learning_rate': 5.534287402952837e-06, 'epoch': 1.56}
52%|█████▏ | 5986/11526 [1:02:38<56:48, 1.63it/s] 52%|█████▏ | 5987/11526 [1:02:38<56:47, 1.63it/s] {'loss': 0.171, 'grad_norm': 0.4231340289115906, 'learning_rate': 5.53278173642673e-06, 'epoch': 1.56}
52%|█████▏ | 5987/11526 [1:02:38<56:47, 1.63it/s] 52%|█████▏ | 5988/11526 [1:02:39<56:44, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.503780722618103, 'learning_rate': 5.531276021030852e-06, 'epoch': 1.56}
52%|█████▏ | 5988/11526 [1:02:39<56:44, 1.63it/s] 52%|█████▏ | 5989/11526 [1:02:40<56:44, 1.63it/s] {'loss': 0.2688, 'grad_norm': 0.572690486907959, 'learning_rate': 5.529770256903315e-06, 'epoch': 1.56}
52%|█████▏ | 5989/11526 [1:02:40<56:44, 1.63it/s] 52%|█████▏ | 5990/11526 [1:02:40<56:44, 1.63it/s] {'loss': 0.2102, 'grad_norm': 0.5469558835029602, 'learning_rate': 5.528264444182236e-06, 'epoch': 1.56}
52%|█████▏ | 5990/11526 [1:02:40<56:44, 1.63it/s] 52%|█████▏ | 5991/11526 [1:02:41<56:44, 1.63it/s] {'loss': 0.2005, 'grad_norm': 0.5593962669372559, 'learning_rate': 5.526758583005736e-06, 'epoch': 1.56}
52%|█████▏ | 5991/11526 [1:02:41<56:44, 1.63it/s] 52%|█████▏ | 5992/11526 [1:02:41<56:43, 1.63it/s] {'loss': 0.1839, 'grad_norm': 0.4699085056781769, 'learning_rate': 5.525252673511942e-06, 'epoch': 1.56}
52%|█████▏ | 5992/11526 [1:02:42<56:43, 1.63it/s] 52%|█████▏ | 5993/11526 [1:02:42<56:41, 1.63it/s] {'loss': 0.153, 'grad_norm': 0.4892418682575226, 'learning_rate': 5.523746715838985e-06, 'epoch': 1.56}
52%|█████▏ | 5993/11526 [1:02:42<56:41, 1.63it/s] 52%|█████▏ | 5994/11526 [1:02:43<56:57, 1.62it/s] {'loss': 0.1966, 'grad_norm': 0.5063052773475647, 'learning_rate': 5.5222407101249966e-06, 'epoch': 1.56}
52%|█████▏ | 5994/11526 [1:02:43<56:57, 1.62it/s] 52%|█████▏ | 5995/11526 [1:02:43<56:52, 1.62it/s] {'loss': 0.2276, 'grad_norm': 0.5365422964096069, 'learning_rate': 5.520734656508121e-06, 'epoch': 1.56}
52%|█████▏ | 5995/11526 [1:02:43<56:52, 1.62it/s] 52%|█████▏ | 5996/11526 [1:02:44<56:50, 1.62it/s] {'loss': 0.1726, 'grad_norm': 0.505679726600647, 'learning_rate': 5.519228555126499e-06, 'epoch': 1.56}
52%|█████▏ | 5996/11526 [1:02:44<56:50, 1.62it/s] 52%|█████▏ | 5997/11526 [1:02:44<56:44, 1.62it/s] {'loss': 0.1591, 'grad_norm': 0.4817415773868561, 'learning_rate': 5.517722406118281e-06, 'epoch': 1.56}
52%|█████▏ | 5997/11526 [1:02:45<56:44, 1.62it/s] 52%|█████▏ | 5998/11526 [1:02:45<56:46, 1.62it/s] {'loss': 0.241, 'grad_norm': 0.5340906977653503, 'learning_rate': 5.516216209621615e-06, 'epoch': 1.56}
52%|█████▏ | 5998/11526 [1:02:45<56:46, 1.62it/s] 52%|█████▏ | 5999/11526 [1:02:46<56:40, 1.63it/s] {'loss': 0.1816, 'grad_norm': 0.49737420678138733, 'learning_rate': 5.514709965774664e-06, 'epoch': 1.56}
52%|█████▏ | 5999/11526 [1:02:46<56:40, 1.63it/s] 52%|█████▏ | 6000/11526 [1:02:46<56:38, 1.63it/s] {'loss': 0.1757, 'grad_norm': 0.4628431797027588, 'learning_rate': 5.513203674715586e-06, 'epoch': 1.56}
52%|█████▏ | 6000/11526 [1:02:46<56:38, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.35it/s]
31%|███ | 4/13 [00:00<00:01, 8.40it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.00it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5627127885818481, 'eval_runtime': 1.9561, 'eval_samples_per_second': 102.243, 'eval_steps_per_second': 6.646, 'epoch': 1.56}
52%|█████▏ | 6000/11526 [1:02:48<56:38, 1.63it/s]
 52%|█████▏ | 6001/11526 [1:02:49<1:50:52, 1.20s/it] {'loss': 0.1625, 'grad_norm': 0.4362161457538605, 'learning_rate': 5.511697336582547e-06, 'epoch': 1.56}
52%|█████▏ | 6001/11526 [1:02:49<1:50:52, 1.20s/it] 52%|█████▏ | 6002/11526 [1:02:50<1:34:35, 1.03s/it] {'loss': 0.2073, 'grad_norm': 0.5243520140647888, 'learning_rate': 5.510190951513716e-06, 'epoch': 1.56}
52%|█████▏ | 6002/11526 [1:02:50<1:34:35, 1.03s/it] 52%|█████▏ | 6003/11526 [1:02:50<1:23:09, 1.11it/s] {'loss': 0.1841, 'grad_norm': 0.46866872906684875, 'learning_rate': 5.5086845196472684e-06, 'epoch': 1.56}
52%|█████▏ | 6003/11526 [1:02:50<1:23:09, 1.11it/s] 52%|█████▏ | 6004/11526 [1:02:51<1:15:15, 1.22it/s] {'loss': 0.2057, 'grad_norm': 0.5539171099662781, 'learning_rate': 5.507178041121383e-06, 'epoch': 1.56}
52%|█████▏ | 6004/11526 [1:02:51<1:15:15, 1.22it/s] 52%|█████▏ | 6005/11526 [1:02:51<1:09:37, 1.32it/s] {'loss': 0.2153, 'grad_norm': 0.6330326199531555, 'learning_rate': 5.505671516074242e-06, 'epoch': 1.56}
52%|█████▏ | 6005/11526 [1:02:51<1:09:37, 1.32it/s] 52%|█████▏ | 6006/11526 [1:02:52<1:05:38, 1.40it/s] {'loss': 0.2174, 'grad_norm': 0.4957272708415985, 'learning_rate': 5.50416494464403e-06, 'epoch': 1.56}
52%|█████▏ | 6006/11526 [1:02:52<1:05:38, 1.40it/s] 52%|█████▏ | 6007/11526 [1:02:53<1:02:54, 1.46it/s] {'loss': 0.1837, 'grad_norm': 0.5456900000572205, 'learning_rate': 5.502658326968943e-06, 'epoch': 1.56}
52%|█████▏ | 6007/11526 [1:02:53<1:02:54, 1.46it/s] 52%|█████▏ | 6008/11526 [1:02:53<1:00:58, 1.51it/s] {'loss': 0.2482, 'grad_norm': 0.6182568073272705, 'learning_rate': 5.501151663187174e-06, 'epoch': 1.56}
52%|█████▏ | 6008/11526 [1:02:53<1:00:58, 1.51it/s] 52%|█████▏ | 6009/11526 [1:02:54<59:41, 1.54it/s] {'loss': 0.2009, 'grad_norm': 0.5643497705459595, 'learning_rate': 5.499644953436922e-06, 'epoch': 1.56}
52%|█████▏ | 6009/11526 [1:02:54<59:41, 1.54it/s] 52%|█████▏ | 6010/11526 [1:02:54<58:43, 1.57it/s] {'loss': 0.2371, 'grad_norm': 0.6202389001846313, 'learning_rate': 5.498138197856393e-06, 'epoch': 1.56}
52%|█████▏ | 6010/11526 [1:02:55<58:43, 1.57it/s] 52%|█████▏ | 6011/11526 [1:02:55<58:07, 1.58it/s] {'loss': 0.1937, 'grad_norm': 0.48073652386665344, 'learning_rate': 5.496631396583794e-06, 'epoch': 1.56}
52%|█████▏ | 6011/11526 [1:02:55<58:07, 1.58it/s] 52%|█████▏ | 6012/11526 [1:02:56<57:33, 1.60it/s] {'loss': 0.2248, 'grad_norm': 0.574011504650116, 'learning_rate': 5.4951245497573365e-06, 'epoch': 1.56}
52%|█████▏ | 6012/11526 [1:02:56<57:33, 1.60it/s] 52%|█████▏ | 6013/11526 [1:02:56<57:15, 1.60it/s] {'loss': 0.2368, 'grad_norm': 0.6373533010482788, 'learning_rate': 5.49361765751524e-06, 'epoch': 1.57}
52%|█████▏ | 6013/11526 [1:02:56<57:15, 1.60it/s] 52%|█████▏ | 6014/11526 [1:02:57<57:02, 1.61it/s] {'loss': 0.2028, 'grad_norm': 0.5148013234138489, 'learning_rate': 5.492110719995721e-06, 'epoch': 1.57}
52%|█████▏ | 6014/11526 [1:02:57<57:02, 1.61it/s] 52%|█████▏ | 6015/11526 [1:02:58<56:51, 1.62it/s] {'loss': 0.237, 'grad_norm': 0.6387227773666382, 'learning_rate': 5.490603737337011e-06, 'epoch': 1.57}
52%|█████▏ | 6015/11526 [1:02:58<56:51, 1.62it/s] 52%|█████▏ | 6016/11526 [1:02:58<56:50, 1.62it/s] {'loss': 0.2363, 'grad_norm': 0.52806156873703, 'learning_rate': 5.489096709677331e-06, 'epoch': 1.57}
52%|█████▏ | 6016/11526 [1:02:58<56:50, 1.62it/s] 52%|█████▏ | 6017/11526 [1:02:59<56:44, 1.62it/s] {'loss': 0.1703, 'grad_norm': 0.47676610946655273, 'learning_rate': 5.487589637154918e-06, 'epoch': 1.57}
52%|█████▏ | 6017/11526 [1:02:59<56:44, 1.62it/s] 52%|█████▏ | 6018/11526 [1:02:59<56:39, 1.62it/s] {'loss': 0.1692, 'grad_norm': 0.5161877274513245, 'learning_rate': 5.4860825199080105e-06, 'epoch': 1.57}
52%|█████▏ | 6018/11526 [1:02:59<56:39, 1.62it/s] 52%|█████▏ | 6019/11526 [1:03:00<56:37, 1.62it/s] {'loss': 0.2474, 'grad_norm': 0.646659791469574, 'learning_rate': 5.484575358074849e-06, 'epoch': 1.57}
52%|█████▏ | 6019/11526 [1:03:00<56:37, 1.62it/s] 52%|█████▏ | 6020/11526 [1:03:01<56:34, 1.62it/s] {'loss': 0.2144, 'grad_norm': 0.5186123847961426, 'learning_rate': 5.483068151793678e-06, 'epoch': 1.57}
52%|█████▏ | 6020/11526 [1:03:01<56:34, 1.62it/s] 52%|█████▏ | 6021/11526 [1:03:01<56:33, 1.62it/s] {'loss': 0.1629, 'grad_norm': 0.5046865940093994, 'learning_rate': 5.481560901202746e-06, 'epoch': 1.57}
52%|█████▏ | 6021/11526 [1:03:01<56:33, 1.62it/s] 52%|█████▏ | 6022/11526 [1:03:02<56:30, 1.62it/s] {'loss': 0.1667, 'grad_norm': 0.4758676588535309, 'learning_rate': 5.480053606440311e-06, 'epoch': 1.57}
52%|█████▏ | 6022/11526 [1:03:02<56:30, 1.62it/s] 52%|█████▏ | 6023/11526 [1:03:02<56:26, 1.62it/s] {'loss': 0.2172, 'grad_norm': 0.5784633159637451, 'learning_rate': 5.4785462676446255e-06, 'epoch': 1.57}
52%|█████▏ | 6023/11526 [1:03:03<56:26, 1.62it/s] 52%|█████▏ | 6024/11526 [1:03:03<56:29, 1.62it/s] {'loss': 0.2293, 'grad_norm': 0.5956320762634277, 'learning_rate': 5.477038884953955e-06, 'epoch': 1.57}
52%|█████▏ | 6024/11526 [1:03:03<56:29, 1.62it/s] 52%|█████▏ | 6025/11526 [1:03:04<56:27, 1.62it/s] {'loss': 0.1892, 'grad_norm': 0.5353350639343262, 'learning_rate': 5.475531458506562e-06, 'epoch': 1.57}
52%|█████▏ | 6025/11526 [1:03:04<56:27, 1.62it/s] 52%|█████▏ | 6026/11526 [1:03:04<56:29, 1.62it/s] {'loss': 0.1795, 'grad_norm': 0.5023013353347778, 'learning_rate': 5.4740239884407195e-06, 'epoch': 1.57}
52%|█████▏ | 6026/11526 [1:03:04<56:29, 1.62it/s] 52%|█████▏ | 6027/11526 [1:03:05<56:24, 1.62it/s] {'loss': 0.201, 'grad_norm': 0.5700668692588806, 'learning_rate': 5.4725164748947005e-06, 'epoch': 1.57}
52%|█████▏ | 6027/11526 [1:03:05<56:24, 1.62it/s] 52%|█████▏ | 6028/11526 [1:03:06<56:20, 1.63it/s] {'loss': 0.3198, 'grad_norm': 0.7587442994117737, 'learning_rate': 5.471008918006779e-06, 'epoch': 1.57}
52%|█████▏ | 6028/11526 [1:03:06<56:20, 1.63it/s] 52%|█████▏ | 6029/11526 [1:03:06<56:23, 1.62it/s] {'loss': 0.1836, 'grad_norm': 0.51140296459198, 'learning_rate': 5.469501317915242e-06, 'epoch': 1.57}
52%|█████▏ | 6029/11526 [1:03:06<56:23, 1.62it/s] 52%|█████▏ | 6030/11526 [1:03:07<56:20, 1.63it/s] {'loss': 0.2168, 'grad_norm': 0.5339359641075134, 'learning_rate': 5.467993674758373e-06, 'epoch': 1.57}
52%|█████▏ | 6030/11526 [1:03:07<56:20, 1.63it/s] 52%|█████▏ | 6031/11526 [1:03:07<56:23, 1.62it/s] {'loss': 0.1769, 'grad_norm': 0.5238810181617737, 'learning_rate': 5.4664859886744615e-06, 'epoch': 1.57}
52%|█████▏ | 6031/11526 [1:03:07<56:23, 1.62it/s] 52%|█████▏ | 6032/11526 [1:03:08<56:21, 1.62it/s] {'loss': 0.2308, 'grad_norm': 0.5634828805923462, 'learning_rate': 5.464978259801797e-06, 'epoch': 1.57}
52%|█████▏ | 6032/11526 [1:03:08<56:21, 1.62it/s] 52%|█████▏ | 6033/11526 [1:03:09<56:18, 1.63it/s] {'loss': 0.2812, 'grad_norm': 0.6262320280075073, 'learning_rate': 5.463470488278686e-06, 'epoch': 1.57}
52%|█████▏ | 6033/11526 [1:03:09<56:18, 1.63it/s] 52%|█████▏ | 6034/11526 [1:03:09<56:15, 1.63it/s] {'loss': 0.1446, 'grad_norm': 0.38116955757141113, 'learning_rate': 5.4619626742434215e-06, 'epoch': 1.57}
52%|█████▏ | 6034/11526 [1:03:09<56:15, 1.63it/s] 52%|█████▏ | 6035/11526 [1:03:10<56:13, 1.63it/s] {'loss': 0.1864, 'grad_norm': 0.7856459617614746, 'learning_rate': 5.460454817834314e-06, 'epoch': 1.57}
52%|█████▏ | 6035/11526 [1:03:10<56:13, 1.63it/s] 52%|█████▏ | 6036/11526 [1:03:10<56:14, 1.63it/s] {'loss': 0.1869, 'grad_norm': 0.507939338684082, 'learning_rate': 5.4589469191896685e-06, 'epoch': 1.57}
52%|█████▏ | 6036/11526 [1:03:11<56:14, 1.63it/s] 52%|█████▏ | 6037/11526 [1:03:11<56:13, 1.63it/s] {'loss': 0.2057, 'grad_norm': 0.5424343943595886, 'learning_rate': 5.457438978447802e-06, 'epoch': 1.57}
52%|█████▏ | 6037/11526 [1:03:11<56:13, 1.63it/s] 52%|█████▏ | 6038/11526 [1:03:12<56:13, 1.63it/s] {'loss': 0.281, 'grad_norm': 0.626420259475708, 'learning_rate': 5.455930995747029e-06, 'epoch': 1.57}
52%|█████▏ | 6038/11526 [1:03:12<56:13, 1.63it/s] 52%|█████▏ | 6039/11526 [1:03:12<56:10, 1.63it/s] {'loss': 0.1879, 'grad_norm': 0.5068703889846802, 'learning_rate': 5.454422971225673e-06, 'epoch': 1.57}
52%|█████▏ | 6039/11526 [1:03:12<56:10, 1.63it/s] 52%|█████▏ | 6040/11526 [1:03:13<56:08, 1.63it/s] {'loss': 0.1997, 'grad_norm': 0.5678920745849609, 'learning_rate': 5.4529149050220534e-06, 'epoch': 1.57}
52%|█████▏ | 6040/11526 [1:03:13<56:08, 1.63it/s] 52%|█████▏ | 6041/11526 [1:03:13<56:13, 1.63it/s] {'loss': 0.1794, 'grad_norm': 0.5444497466087341, 'learning_rate': 5.451406797274503e-06, 'epoch': 1.57}
52%|█████▏ | 6041/11526 [1:03:14<56:13, 1.63it/s] 52%|█████▏ | 6042/11526 [1:03:14<56:10, 1.63it/s] {'loss': 0.1885, 'grad_norm': 0.5634726285934448, 'learning_rate': 5.449898648121355e-06, 'epoch': 1.57}
52%|█████▏ | 6042/11526 [1:03:14<56:10, 1.63it/s] 52%|█████▏ | 6043/11526 [1:03:15<56:07, 1.63it/s] {'loss': 0.1804, 'grad_norm': 0.4628235995769501, 'learning_rate': 5.448390457700939e-06, 'epoch': 1.57}
52%|█████▏ | 6043/11526 [1:03:15<56:07, 1.63it/s] 52%|█████▏ | 6044/11526 [1:03:15<56:09, 1.63it/s] {'loss': 0.2334, 'grad_norm': 0.519091010093689, 'learning_rate': 5.446882226151602e-06, 'epoch': 1.57}
52%|█████▏ | 6044/11526 [1:03:15<56:09, 1.63it/s] 52%|█████▏ | 6045/11526 [1:03:16<56:07, 1.63it/s] {'loss': 0.2337, 'grad_norm': 0.5430852174758911, 'learning_rate': 5.4453739536116826e-06, 'epoch': 1.57}
52%|█████▏ | 6045/11526 [1:03:16<56:07, 1.63it/s] 52%|█████▏ | 6046/11526 [1:03:17<56:10, 1.63it/s] {'loss': 0.1731, 'grad_norm': 0.48884445428848267, 'learning_rate': 5.443865640219532e-06, 'epoch': 1.57}
52%|█████▏ | 6046/11526 [1:03:17<56:10, 1.63it/s] 52%|█████▏ | 6047/11526 [1:03:17<56:08, 1.63it/s] {'loss': 0.1925, 'grad_norm': 0.5398766398429871, 'learning_rate': 5.442357286113497e-06, 'epoch': 1.57}
52%|█████▏ | 6047/11526 [1:03:17<56:08, 1.63it/s] 52%|█████▏ | 6048/11526 [1:03:18<56:08, 1.63it/s] {'loss': 0.1658, 'grad_norm': 0.4699999690055847, 'learning_rate': 5.440848891431937e-06, 'epoch': 1.57}
52%|█████▏ | 6048/11526 [1:03:18<56:08, 1.63it/s] 52%|█████▏ | 6049/11526 [1:03:18<56:04, 1.63it/s] {'loss': 0.2601, 'grad_norm': 0.5426648259162903, 'learning_rate': 5.439340456313208e-06, 'epoch': 1.57}
52%|█████▏ | 6049/11526 [1:03:19<56:04, 1.63it/s] 52%|█████▏ | 6050/11526 [1:03:19<56:03, 1.63it/s] {'loss': 0.1331, 'grad_norm': 0.43711617588996887, 'learning_rate': 5.437831980895672e-06, 'epoch': 1.57}
52%|█████▏ | 6050/11526 [1:03:19<56:03, 1.63it/s] 52%|█████▏ | 6051/11526 [1:03:20<56:09, 1.63it/s] {'loss': 0.2336, 'grad_norm': 0.4484269618988037, 'learning_rate': 5.436323465317696e-06, 'epoch': 1.57}
52%|█████▏ | 6051/11526 [1:03:20<56:09, 1.63it/s] 53%|█████▎ | 6052/11526 [1:03:20<56:06, 1.63it/s] {'loss': 0.1791, 'grad_norm': 0.4412325322628021, 'learning_rate': 5.4348149097176485e-06, 'epoch': 1.58}
53%|█████▎ | 6052/11526 [1:03:20<56:06, 1.63it/s] 53%|█████▎ | 6053/11526 [1:03:21<56:05, 1.63it/s] {'loss': 0.1926, 'grad_norm': 0.5125789642333984, 'learning_rate': 5.433306314233905e-06, 'epoch': 1.58}
53%|█████▎ | 6053/11526 [1:03:21<56:05, 1.63it/s] 53%|█████▎ | 6054/11526 [1:03:21<56:06, 1.63it/s] {'loss': 0.2399, 'grad_norm': 0.5955922603607178, 'learning_rate': 5.43179767900484e-06, 'epoch': 1.58}
53%|█████▎ | 6054/11526 [1:03:22<56:06, 1.63it/s] 53%|█████▎ | 6055/11526 [1:03:22<56:04, 1.63it/s] {'loss': 0.2309, 'grad_norm': 0.6154299974441528, 'learning_rate': 5.430289004168834e-06, 'epoch': 1.58}
53%|█████▎ | 6055/11526 [1:03:22<56:04, 1.63it/s] 53%|█████▎ | 6056/11526 [1:03:23<56:05, 1.63it/s] {'loss': 0.1731, 'grad_norm': 0.4726334810256958, 'learning_rate': 5.428780289864273e-06, 'epoch': 1.58}
53%|█████▎ | 6056/11526 [1:03:23<56:05, 1.63it/s] 53%|█████▎ | 6057/11526 [1:03:23<56:03, 1.63it/s] {'loss': 0.2363, 'grad_norm': 0.5814317464828491, 'learning_rate': 5.427271536229545e-06, 'epoch': 1.58}
53%|█████▎ | 6057/11526 [1:03:23<56:03, 1.63it/s] 53%|█████▎ | 6058/11526 [1:03:24<56:00, 1.63it/s] {'loss': 0.2325, 'grad_norm': 0.5811302661895752, 'learning_rate': 5.42576274340304e-06, 'epoch': 1.58}
53%|█████▎ | 6058/11526 [1:03:24<56:00, 1.63it/s] 53%|█████▎ | 6059/11526 [1:03:25<55:59, 1.63it/s] {'loss': 0.1854, 'grad_norm': 0.5102909207344055, 'learning_rate': 5.424253911523153e-06, 'epoch': 1.58}
53%|█████▎ | 6059/11526 [1:03:25<55:59, 1.63it/s] 53%|█████▎ | 6060/11526 [1:03:25<56:00, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.4766002297401428, 'learning_rate': 5.422745040728285e-06, 'epoch': 1.58}
53%|█████▎ | 6060/11526 [1:03:25<56:00, 1.63it/s] 53%|█████▎ | 6061/11526 [1:03:26<56:04, 1.62it/s] {'loss': 0.1643, 'grad_norm': 0.48339399695396423, 'learning_rate': 5.4212361311568355e-06, 'epoch': 1.58}
53%|█████▎ | 6061/11526 [1:03:26<56:04, 1.62it/s] 53%|█████▎ | 6062/11526 [1:03:26<56:02, 1.62it/s] {'loss': 0.235, 'grad_norm': 0.6215381622314453, 'learning_rate': 5.419727182947211e-06, 'epoch': 1.58}
53%|█████▎ | 6062/11526 [1:03:27<56:02, 1.62it/s] 53%|█████▎ | 6063/11526 [1:03:27<56:00, 1.63it/s] {'loss': 0.1945, 'grad_norm': 0.5357658267021179, 'learning_rate': 5.418218196237821e-06, 'epoch': 1.58}
53%|█████▎ | 6063/11526 [1:03:27<56:00, 1.63it/s] 53%|█████▎ | 6064/11526 [1:03:28<56:01, 1.62it/s] {'loss': 0.1828, 'grad_norm': 0.48560431599617004, 'learning_rate': 5.4167091711670794e-06, 'epoch': 1.58}
53%|█████▎ | 6064/11526 [1:03:28<56:01, 1.62it/s] 53%|█████▎ | 6065/11526 [1:03:28<55:57, 1.63it/s] {'loss': 0.1781, 'grad_norm': 0.4724199175834656, 'learning_rate': 5.415200107873402e-06, 'epoch': 1.58}
53%|█████▎ | 6065/11526 [1:03:28<55:57, 1.63it/s] 53%|█████▎ | 6066/11526 [1:03:29<56:01, 1.62it/s] {'loss': 0.1945, 'grad_norm': 0.58516925573349, 'learning_rate': 5.413691006495206e-06, 'epoch': 1.58}
53%|█████▎ | 6066/11526 [1:03:29<56:01, 1.62it/s] 53%|█████▎ | 6067/11526 [1:03:29<55:58, 1.63it/s] {'loss': 0.2862, 'grad_norm': 0.6826393008232117, 'learning_rate': 5.412181867170919e-06, 'epoch': 1.58}
53%|█████▎ | 6067/11526 [1:03:30<55:58, 1.63it/s] 53%|█████▎ | 6068/11526 [1:03:30<55:54, 1.63it/s] {'loss': 0.2072, 'grad_norm': 0.4881439208984375, 'learning_rate': 5.4106726900389664e-06, 'epoch': 1.58}
53%|█████▎ | 6068/11526 [1:03:30<55:54, 1.63it/s] 53%|█████▎ | 6069/11526 [1:03:31<55:58, 1.62it/s] {'loss': 0.2834, 'grad_norm': 0.7219222784042358, 'learning_rate': 5.409163475237776e-06, 'epoch': 1.58}
53%|█████▎ | 6069/11526 [1:03:31<55:58, 1.62it/s] 53%|█████▎ | 6070/11526 [1:03:31<55:56, 1.63it/s] {'loss': 0.2552, 'grad_norm': 0.604708731174469, 'learning_rate': 5.407654222905785e-06, 'epoch': 1.58}
53%|█████▎ | 6070/11526 [1:03:31<55:56, 1.63it/s] 53%|█████▎ | 6071/11526 [1:03:32<56:01, 1.62it/s] {'loss': 0.1751, 'grad_norm': 0.5349549651145935, 'learning_rate': 5.406144933181429e-06, 'epoch': 1.58}
53%|█████▎ | 6071/11526 [1:03:32<56:01, 1.62it/s] 53%|█████▎ | 6072/11526 [1:03:33<55:57, 1.62it/s] {'loss': 0.1565, 'grad_norm': 0.4775816798210144, 'learning_rate': 5.40463560620315e-06, 'epoch': 1.58}
53%|█████▎ | 6072/11526 [1:03:33<55:57, 1.62it/s] 53%|█████▎ | 6073/11526 [1:03:33<55:54, 1.63it/s] {'loss': 0.1837, 'grad_norm': 0.511648416519165, 'learning_rate': 5.40312624210939e-06, 'epoch': 1.58}
53%|█████▎ | 6073/11526 [1:03:33<55:54, 1.63it/s] 53%|█████▎ | 6074/11526 [1:03:34<55:57, 1.62it/s] {'loss': 0.2249, 'grad_norm': 0.512361466884613, 'learning_rate': 5.401616841038596e-06, 'epoch': 1.58}
53%|█████▎ | 6074/11526 [1:03:34<55:57, 1.62it/s] 53%|█████▎ | 6075/11526 [1:03:34<55:54, 1.62it/s] {'loss': 0.1904, 'grad_norm': 0.5198235511779785, 'learning_rate': 5.4001074031292225e-06, 'epoch': 1.58}
53%|█████▎ | 6075/11526 [1:03:35<55:54, 1.62it/s] 53%|█████▎ | 6076/11526 [1:03:35<56:01, 1.62it/s] {'loss': 0.1871, 'grad_norm': 0.5468631982803345, 'learning_rate': 5.3985979285197195e-06, 'epoch': 1.58}
53%|█████▎ | 6076/11526 [1:03:35<56:01, 1.62it/s] 53%|█████▎ | 6077/11526 [1:03:36<55:58, 1.62it/s] {'loss': 0.1849, 'grad_norm': 0.4778958857059479, 'learning_rate': 5.397088417348548e-06, 'epoch': 1.58}
53%|█████▎ | 6077/11526 [1:03:36<55:58, 1.62it/s] 53%|█████▎ | 6078/11526 [1:03:36<55:56, 1.62it/s] {'loss': 0.2249, 'grad_norm': 0.628387987613678, 'learning_rate': 5.395578869754167e-06, 'epoch': 1.58}
53%|█████▎ | 6078/11526 [1:03:36<55:56, 1.62it/s] 53%|█████▎ | 6079/11526 [1:03:37<56:07, 1.62it/s] {'loss': 0.1844, 'grad_norm': 0.48935195803642273, 'learning_rate': 5.394069285875041e-06, 'epoch': 1.58}
53%|█████▎ | 6079/11526 [1:03:37<56:07, 1.62it/s] 53%|█████▎ | 6080/11526 [1:03:38<56:01, 1.62it/s] {'loss': 0.2075, 'grad_norm': 0.6498350501060486, 'learning_rate': 5.39255966584964e-06, 'epoch': 1.58}
53%|█████▎ | 6080/11526 [1:03:38<56:01, 1.62it/s] 53%|█████▎ | 6081/11526 [1:03:38<55:58, 1.62it/s] {'loss': 0.1522, 'grad_norm': 0.42192143201828003, 'learning_rate': 5.39105000981643e-06, 'epoch': 1.58}
53%|█████▎ | 6081/11526 [1:03:38<55:58, 1.62it/s] 53%|█████▎ | 6082/11526 [1:03:39<55:52, 1.62it/s] {'loss': 0.2069, 'grad_norm': 0.5763571262359619, 'learning_rate': 5.3895403179138895e-06, 'epoch': 1.58}
53%|█████▎ | 6082/11526 [1:03:39<55:52, 1.62it/s] 53%|█████▎ | 6083/11526 [1:03:39<55:49, 1.62it/s] {'loss': 0.2013, 'grad_norm': 0.5388515591621399, 'learning_rate': 5.388030590280493e-06, 'epoch': 1.58}
53%|█████▎ | 6083/11526 [1:03:39<55:49, 1.62it/s] 53%|█████▎ | 6084/11526 [1:03:40<55:51, 1.62it/s] {'loss': 0.1699, 'grad_norm': 0.4459388852119446, 'learning_rate': 5.386520827054725e-06, 'epoch': 1.58}
53%|█████▎ | 6084/11526 [1:03:40<55:51, 1.62it/s] 53%|█████▎ | 6085/11526 [1:03:41<55:46, 1.63it/s] {'loss': 0.2264, 'grad_norm': 0.5842627882957458, 'learning_rate': 5.385011028375067e-06, 'epoch': 1.58}
53%|█████▎ | 6085/11526 [1:03:41<55:46, 1.63it/s] 53%|█████▎ | 6086/11526 [1:03:41<55:47, 1.62it/s] {'loss': 0.1901, 'grad_norm': 0.5585426092147827, 'learning_rate': 5.383501194380006e-06, 'epoch': 1.58}
53%|█████▎ | 6086/11526 [1:03:41<55:47, 1.62it/s] 53%|█████▎ | 6087/11526 [1:03:42<55:45, 1.63it/s] {'loss': 0.2429, 'grad_norm': 0.600700318813324, 'learning_rate': 5.381991325208035e-06, 'epoch': 1.58}
53%|█████▎ | 6087/11526 [1:03:42<55:45, 1.63it/s] 53%|█████▎ | 6088/11526 [1:03:42<55:42, 1.63it/s] {'loss': 0.275, 'grad_norm': 0.6222581267356873, 'learning_rate': 5.380481420997645e-06, 'epoch': 1.58}
53%|█████▎ | 6088/11526 [1:03:43<55:42, 1.63it/s] 53%|█████▎ | 6089/11526 [1:03:43<55:44, 1.63it/s] {'loss': 0.1989, 'grad_norm': 0.544299304485321, 'learning_rate': 5.3789714818873355e-06, 'epoch': 1.58}
53%|█████▎ | 6089/11526 [1:03:43<55:44, 1.63it/s] 53%|█████▎ | 6090/11526 [1:03:44<55:42, 1.63it/s] {'loss': 0.1682, 'grad_norm': 0.4783487915992737, 'learning_rate': 5.3774615080156035e-06, 'epoch': 1.59}
53%|█████▎ | 6090/11526 [1:03:44<55:42, 1.63it/s] 53%|█████▎ | 6091/11526 [1:03:44<55:41, 1.63it/s] {'loss': 0.2807, 'grad_norm': 0.5712781548500061, 'learning_rate': 5.375951499520956e-06, 'epoch': 1.59}
53%|█████▎ | 6091/11526 [1:03:44<55:41, 1.63it/s] 53%|█████▎ | 6092/11526 [1:03:45<55:41, 1.63it/s] {'loss': 0.1816, 'grad_norm': 0.4793972969055176, 'learning_rate': 5.374441456541897e-06, 'epoch': 1.59}
53%|█████▎ | 6092/11526 [1:03:45<55:41, 1.63it/s] 53%|█████▎ | 6093/11526 [1:03:45<55:41, 1.63it/s] {'loss': 0.2254, 'grad_norm': 0.5739443302154541, 'learning_rate': 5.372931379216937e-06, 'epoch': 1.59}
53%|█████▎ | 6093/11526 [1:03:46<55:41, 1.63it/s] 53%|█████▎ | 6094/11526 [1:03:46<55:55, 1.62it/s] {'loss': 0.257, 'grad_norm': 0.6381434202194214, 'learning_rate': 5.371421267684589e-06, 'epoch': 1.59}
53%|█████▎ | 6094/11526 [1:03:46<55:55, 1.62it/s] 53%|█████▎ | 6095/11526 [1:03:47<55:48, 1.62it/s] {'loss': 0.2161, 'grad_norm': 0.6071916222572327, 'learning_rate': 5.3699111220833696e-06, 'epoch': 1.59}
53%|█████▎ | 6095/11526 [1:03:47<55:48, 1.62it/s] 53%|█████▎ | 6096/11526 [1:03:47<55:48, 1.62it/s] {'loss': 0.2147, 'grad_norm': 0.6050119400024414, 'learning_rate': 5.368400942551797e-06, 'epoch': 1.59}
53%|█████▎ | 6096/11526 [1:03:47<55:48, 1.62it/s] 53%|█████▎ | 6097/11526 [1:03:48<55:43, 1.62it/s] {'loss': 0.1676, 'grad_norm': 0.5097556114196777, 'learning_rate': 5.366890729228395e-06, 'epoch': 1.59}
53%|█████▎ | 6097/11526 [1:03:48<55:43, 1.62it/s] 53%|█████▎ | 6098/11526 [1:03:49<55:40, 1.62it/s] {'loss': 0.1976, 'grad_norm': 0.5615293979644775, 'learning_rate': 5.365380482251685e-06, 'epoch': 1.59}
53%|█████▎ | 6098/11526 [1:03:49<55:40, 1.62it/s] 53%|█████▎ | 6099/11526 [1:03:49<55:41, 1.62it/s] {'loss': 0.1656, 'grad_norm': 0.4605031907558441, 'learning_rate': 5.3638702017602004e-06, 'epoch': 1.59}
53%|█████▎ | 6099/11526 [1:03:49<55:41, 1.62it/s] 53%|█████▎ | 6100/11526 [1:03:50<55:37, 1.63it/s] {'loss': 0.2366, 'grad_norm': 0.665497362613678, 'learning_rate': 5.362359887892471e-06, 'epoch': 1.59}
53%|█████▎ | 6100/11526 [1:03:50<55:37, 1.63it/s] 53%|█████▎ | 6101/11526 [1:03:50<55:36, 1.63it/s] {'loss': 0.1918, 'grad_norm': 0.4898964464664459, 'learning_rate': 5.36084954078703e-06, 'epoch': 1.59}
53%|█████▎ | 6101/11526 [1:03:51<55:36, 1.63it/s] 53%|█████▎ | 6102/11526 [1:03:51<55:35, 1.63it/s] {'loss': 0.1835, 'grad_norm': 0.4935327470302582, 'learning_rate': 5.359339160582417e-06, 'epoch': 1.59}
53%|█████▎ | 6102/11526 [1:03:51<55:35, 1.63it/s] 53%|█████▎ | 6103/11526 [1:03:52<55:37, 1.62it/s] {'loss': 0.176, 'grad_norm': 0.4900496304035187, 'learning_rate': 5.35782874741717e-06, 'epoch': 1.59}
53%|█████▎ | 6103/11526 [1:03:52<55:37, 1.62it/s] 53%|█████▎ | 6104/11526 [1:03:52<55:38, 1.62it/s] {'loss': 0.1842, 'grad_norm': 0.45543140172958374, 'learning_rate': 5.3563183014298346e-06, 'epoch': 1.59}
53%|█████▎ | 6104/11526 [1:03:52<55:38, 1.62it/s] 53%|█████▎ | 6105/11526 [1:03:53<55:36, 1.62it/s] {'loss': 0.1568, 'grad_norm': 0.5730137825012207, 'learning_rate': 5.354807822758957e-06, 'epoch': 1.59}
53%|█████▎ | 6105/11526 [1:03:53<55:36, 1.62it/s] 53%|█████▎ | 6106/11526 [1:03:54<55:37, 1.62it/s] {'loss': 0.1533, 'grad_norm': 0.48920124769210815, 'learning_rate': 5.353297311543089e-06, 'epoch': 1.59}
53%|█████▎ | 6106/11526 [1:03:54<55:37, 1.62it/s] 53%|█████▎ | 6107/11526 [1:03:54<55:35, 1.62it/s] {'loss': 0.1853, 'grad_norm': 0.5353995561599731, 'learning_rate': 5.351786767920779e-06, 'epoch': 1.59}
53%|█████▎ | 6107/11526 [1:03:54<55:35, 1.62it/s] 53%|█████▎ | 6108/11526 [1:03:55<55:32, 1.63it/s] {'loss': 0.1997, 'grad_norm': 0.6196042895317078, 'learning_rate': 5.350276192030587e-06, 'epoch': 1.59}
53%|█████▎ | 6108/11526 [1:03:55<55:32, 1.63it/s] 53%|█████▎ | 6109/11526 [1:03:55<55:32, 1.63it/s] {'loss': 0.1884, 'grad_norm': 0.4675016403198242, 'learning_rate': 5.348765584011068e-06, 'epoch': 1.59}
53%|█████▎ | 6109/11526 [1:03:55<55:32, 1.63it/s] 53%|█████▎ | 6110/11526 [1:03:56<55:30, 1.63it/s] {'loss': 0.2326, 'grad_norm': 0.6121213436126709, 'learning_rate': 5.347254944000787e-06, 'epoch': 1.59}
53%|█████▎ | 6110/11526 [1:03:56<55:30, 1.63it/s] 53%|█████▎ | 6111/11526 [1:03:57<55:33, 1.62it/s] {'loss': 0.2467, 'grad_norm': 0.5053110718727112, 'learning_rate': 5.345744272138307e-06, 'epoch': 1.59}
53%|█████▎ | 6111/11526 [1:03:57<55:33, 1.62it/s] 53%|█████▎ | 6112/11526 [1:03:57<55:31, 1.63it/s] {'loss': 0.3054, 'grad_norm': 0.8427198529243469, 'learning_rate': 5.344233568562193e-06, 'epoch': 1.59}
53%|█████▎ | 6112/11526 [1:03:57<55:31, 1.63it/s] 53%|█████▎ | 6113/11526 [1:03:58<55:28, 1.63it/s] {'loss': 0.1928, 'grad_norm': 0.5351673364639282, 'learning_rate': 5.342722833411018e-06, 'epoch': 1.59}
53%|█████▎ | 6113/11526 [1:03:58<55:28, 1.63it/s] 53%|█████▎ | 6114/11526 [1:03:58<55:29, 1.63it/s] {'loss': 0.1655, 'grad_norm': 0.4275144338607788, 'learning_rate': 5.341212066823356e-06, 'epoch': 1.59}
53%|█████▎ | 6114/11526 [1:03:59<55:29, 1.63it/s] 53%|█████▎ | 6115/11526 [1:03:59<55:27, 1.63it/s] {'loss': 0.1844, 'grad_norm': 0.46066731214523315, 'learning_rate': 5.339701268937781e-06, 'epoch': 1.59}
53%|█████▎ | 6115/11526 [1:03:59<55:27, 1.63it/s] 53%|█████▎ | 6116/11526 [1:04:00<55:36, 1.62it/s] {'loss': 0.2059, 'grad_norm': 0.5869001150131226, 'learning_rate': 5.3381904398928715e-06, 'epoch': 1.59}
53%|█████▎ | 6116/11526 [1:04:00<55:36, 1.62it/s] 53%|█████▎ | 6117/11526 [1:04:00<55:31, 1.62it/s] {'loss': 0.1681, 'grad_norm': 0.5067296624183655, 'learning_rate': 5.336679579827213e-06, 'epoch': 1.59}
53%|█████▎ | 6117/11526 [1:04:00<55:31, 1.62it/s] 53%|█████▎ | 6118/11526 [1:04:01<55:27, 1.63it/s] {'loss': 0.1733, 'grad_norm': 0.5149413347244263, 'learning_rate': 5.335168688879386e-06, 'epoch': 1.59}
53%|█████▎ | 6118/11526 [1:04:01<55:27, 1.63it/s] 53%|█████▎ | 6119/11526 [1:04:01<55:24, 1.63it/s] {'loss': 0.1962, 'grad_norm': 0.5513718128204346, 'learning_rate': 5.333657767187982e-06, 'epoch': 1.59}
53%|█████▎ | 6119/11526 [1:04:02<55:24, 1.63it/s] 53%|█████▎ | 6120/11526 [1:04:02<55:22, 1.63it/s] {'loss': 0.2097, 'grad_norm': 0.6196323037147522, 'learning_rate': 5.332146814891588e-06, 'epoch': 1.59}
53%|█████▎ | 6120/11526 [1:04:02<55:22, 1.63it/s] 53%|█████▎ | 6121/11526 [1:04:03<55:23, 1.63it/s] {'loss': 0.1864, 'grad_norm': 0.5202468633651733, 'learning_rate': 5.330635832128796e-06, 'epoch': 1.59}
53%|█████▎ | 6121/11526 [1:04:03<55:23, 1.63it/s] 53%|█████▎ | 6122/11526 [1:04:03<55:21, 1.63it/s] {'loss': 0.2416, 'grad_norm': 0.5524104237556458, 'learning_rate': 5.329124819038209e-06, 'epoch': 1.59}
53%|█████▎ | 6122/11526 [1:04:03<55:21, 1.63it/s] 53%|█████▎ | 6123/11526 [1:04:04<55:18, 1.63it/s] {'loss': 0.187, 'grad_norm': 0.5117430686950684, 'learning_rate': 5.3276137757584165e-06, 'epoch': 1.59}
53%|█████▎ | 6123/11526 [1:04:04<55:18, 1.63it/s] 53%|█████▎ | 6124/11526 [1:04:05<55:21, 1.63it/s] {'loss': 0.2521, 'grad_norm': 0.6945081353187561, 'learning_rate': 5.3261027024280265e-06, 'epoch': 1.59}
53%|█████▎ | 6124/11526 [1:04:05<55:21, 1.63it/s] 53%|█████▎ | 6125/11526 [1:04:05<55:19, 1.63it/s] {'loss': 0.1839, 'grad_norm': 0.48899221420288086, 'learning_rate': 5.32459159918564e-06, 'epoch': 1.59}
53%|█████▎ | 6125/11526 [1:04:05<55:19, 1.63it/s] 53%|█████▎ | 6126/11526 [1:04:06<55:21, 1.63it/s] {'loss': 0.1717, 'grad_norm': 0.5096535682678223, 'learning_rate': 5.323080466169867e-06, 'epoch': 1.59}
53%|█████▎ | 6126/11526 [1:04:06<55:21, 1.63it/s] 53%|█████▎ | 6127/11526 [1:04:06<55:18, 1.63it/s] {'loss': 0.2072, 'grad_norm': 0.5164393782615662, 'learning_rate': 5.321569303519315e-06, 'epoch': 1.59}
53%|█████▎ | 6127/11526 [1:04:07<55:18, 1.63it/s] 53%|█████▎ | 6128/11526 [1:04:07<55:17, 1.63it/s] {'loss': 0.1915, 'grad_norm': 0.506702721118927, 'learning_rate': 5.320058111372596e-06, 'epoch': 1.6}
53%|█████▎ | 6128/11526 [1:04:07<55:17, 1.63it/s] 53%|█████▎ | 6129/11526 [1:04:08<55:15, 1.63it/s] {'loss': 0.1555, 'grad_norm': 0.4353031516075134, 'learning_rate': 5.318546889868327e-06, 'epoch': 1.6}
53%|█████▎ | 6129/11526 [1:04:08<55:15, 1.63it/s] 53%|█████▎ | 6130/11526 [1:04:08<55:13, 1.63it/s] {'loss': 0.1656, 'grad_norm': 0.5611482858657837, 'learning_rate': 5.3170356391451225e-06, 'epoch': 1.6}
53%|█████▎ | 6130/11526 [1:04:08<55:13, 1.63it/s] 53%|█████▎ | 6131/11526 [1:04:09<55:19, 1.63it/s] {'loss': 0.1759, 'grad_norm': 0.4100019037723541, 'learning_rate': 5.315524359341608e-06, 'epoch': 1.6}
53%|█████▎ | 6131/11526 [1:04:09<55:19, 1.63it/s] 53%|█████▎ | 6132/11526 [1:04:09<55:17, 1.63it/s] {'loss': 0.1481, 'grad_norm': 0.44049757719039917, 'learning_rate': 5.314013050596402e-06, 'epoch': 1.6}
53%|█████▎ | 6132/11526 [1:04:10<55:17, 1.63it/s] 53%|█████▎ | 6133/11526 [1:04:10<55:15, 1.63it/s] {'loss': 0.1977, 'grad_norm': 0.5415844917297363, 'learning_rate': 5.312501713048134e-06, 'epoch': 1.6}
53%|█████▎ | 6133/11526 [1:04:10<55:15, 1.63it/s] 53%|█████▎ | 6134/11526 [1:04:11<55:13, 1.63it/s] {'loss': 0.1891, 'grad_norm': 0.5130942463874817, 'learning_rate': 5.310990346835429e-06, 'epoch': 1.6}
53%|█████▎ | 6134/11526 [1:04:11<55:13, 1.63it/s] 53%|█████▎ | 6135/11526 [1:04:11<55:12, 1.63it/s] {'loss': 0.2092, 'grad_norm': 0.6282849907875061, 'learning_rate': 5.309478952096921e-06, 'epoch': 1.6}
53%|█████▎ | 6135/11526 [1:04:11<55:12, 1.63it/s] 53%|█████▎ | 6136/11526 [1:04:12<55:17, 1.62it/s] {'loss': 0.2045, 'grad_norm': 0.514743983745575, 'learning_rate': 5.3079675289712425e-06, 'epoch': 1.6}
53%|█████▎ | 6136/11526 [1:04:12<55:17, 1.62it/s] 53%|█████▎ | 6137/11526 [1:04:13<55:14, 1.63it/s] {'loss': 0.2172, 'grad_norm': 0.5420757532119751, 'learning_rate': 5.306456077597031e-06, 'epoch': 1.6}
53%|█████▎ | 6137/11526 [1:04:13<55:14, 1.63it/s] 53%|█████▎ | 6138/11526 [1:04:13<55:12, 1.63it/s] {'loss': 0.1997, 'grad_norm': 0.5587872862815857, 'learning_rate': 5.304944598112923e-06, 'epoch': 1.6}
53%|█████▎ | 6138/11526 [1:04:13<55:12, 1.63it/s] 53%|█████▎ | 6139/11526 [1:04:14<55:14, 1.63it/s] {'loss': 0.2036, 'grad_norm': 0.5504013299942017, 'learning_rate': 5.303433090657561e-06, 'epoch': 1.6}
53%|█████▎ | 6139/11526 [1:04:14<55:14, 1.63it/s] 53%|█████▎ | 6140/11526 [1:04:14<55:12, 1.63it/s] {'loss': 0.189, 'grad_norm': 0.5267800688743591, 'learning_rate': 5.301921555369591e-06, 'epoch': 1.6}
53%|█████▎ | 6140/11526 [1:04:15<55:12, 1.63it/s] 53%|█████▎ | 6141/11526 [1:04:15<55:14, 1.62it/s] {'loss': 0.2228, 'grad_norm': 0.6128795742988586, 'learning_rate': 5.300409992387658e-06, 'epoch': 1.6}
53%|█████▎ | 6141/11526 [1:04:15<55:14, 1.62it/s] 53%|█████▎ | 6142/11526 [1:04:16<55:11, 1.63it/s] {'loss': 0.2034, 'grad_norm': 0.5815759301185608, 'learning_rate': 5.29889840185041e-06, 'epoch': 1.6}
53%|█████▎ | 6142/11526 [1:04:16<55:11, 1.63it/s] 53%|█████▎ | 6143/11526 [1:04:16<55:07, 1.63it/s] {'loss': 0.2051, 'grad_norm': 0.5431367754936218, 'learning_rate': 5.297386783896501e-06, 'epoch': 1.6}
53%|█████▎ | 6143/11526 [1:04:16<55:07, 1.63it/s] 53%|█████▎ | 6144/11526 [1:04:17<55:08, 1.63it/s] {'loss': 0.272, 'grad_norm': 0.6413703560829163, 'learning_rate': 5.2958751386645835e-06, 'epoch': 1.6}
53%|█████▎ | 6144/11526 [1:04:17<55:08, 1.63it/s] 53%|█████▎ | 6145/11526 [1:04:17<55:08, 1.63it/s] {'loss': 0.2265, 'grad_norm': 0.5723989009857178, 'learning_rate': 5.2943634662933156e-06, 'epoch': 1.6}
53%|█████▎ | 6145/11526 [1:04:18<55:08, 1.63it/s] 53%|█████▎ | 6146/11526 [1:04:18<55:11, 1.62it/s] {'loss': 0.1578, 'grad_norm': 0.4600696265697479, 'learning_rate': 5.2928517669213545e-06, 'epoch': 1.6}
53%|█████▎ | 6146/11526 [1:04:18<55:11, 1.62it/s] 53%|█████▎ | 6147/11526 [1:04:19<55:09, 1.63it/s] {'loss': 0.1786, 'grad_norm': 0.48907825350761414, 'learning_rate': 5.291340040687364e-06, 'epoch': 1.6}
53%|█████▎ | 6147/11526 [1:04:19<55:09, 1.63it/s] 53%|█████▎ | 6148/11526 [1:04:19<55:07, 1.63it/s] {'loss': 0.2375, 'grad_norm': 0.57337486743927, 'learning_rate': 5.289828287730006e-06, 'epoch': 1.6}
53%|█████▎ | 6148/11526 [1:04:19<55:07, 1.63it/s] 53%|█████▎ | 6149/11526 [1:04:20<55:09, 1.62it/s] {'loss': 0.1729, 'grad_norm': 0.48889416456222534, 'learning_rate': 5.2883165081879515e-06, 'epoch': 1.6}
53%|█████▎ | 6149/11526 [1:04:20<55:09, 1.62it/s] 53%|█████▎ | 6150/11526 [1:04:21<55:07, 1.63it/s] {'loss': 0.1856, 'grad_norm': 0.5067185163497925, 'learning_rate': 5.286804702199863e-06, 'epoch': 1.6}
53%|█████▎ | 6150/11526 [1:04:21<55:07, 1.63it/s] 53%|█████▎ | 6151/11526 [1:04:21<55:08, 1.62it/s] {'loss': 0.2667, 'grad_norm': 0.7357701063156128, 'learning_rate': 5.285292869904417e-06, 'epoch': 1.6}
53%|█████▎ | 6151/11526 [1:04:21<55:08, 1.62it/s] 53%|█████▎ | 6152/11526 [1:04:22<55:05, 1.63it/s] {'loss': 0.1848, 'grad_norm': 0.5012803673744202, 'learning_rate': 5.283781011440285e-06, 'epoch': 1.6}
53%|█████▎ | 6152/11526 [1:04:22<55:05, 1.63it/s] 53%|█████▎ | 6153/11526 [1:04:22<55:04, 1.63it/s] {'loss': 0.1936, 'grad_norm': 0.5244895219802856, 'learning_rate': 5.282269126946145e-06, 'epoch': 1.6}
53%|█████▎ | 6153/11526 [1:04:23<55:04, 1.63it/s] 53%|█████▎ | 6154/11526 [1:04:23<55:03, 1.63it/s] {'loss': 0.1747, 'grad_norm': 0.5955398082733154, 'learning_rate': 5.280757216560673e-06, 'epoch': 1.6}
53%|█████▎ | 6154/11526 [1:04:23<55:03, 1.63it/s] 53%|█████▎ | 6155/11526 [1:04:24<55:00, 1.63it/s] {'loss': 0.19, 'grad_norm': 0.4931245446205139, 'learning_rate': 5.2792452804225535e-06, 'epoch': 1.6}
53%|█████▎ | 6155/11526 [1:04:24<55:00, 1.63it/s] 53%|█████▎ | 6156/11526 [1:04:24<54:59, 1.63it/s] {'loss': 0.2872, 'grad_norm': 0.6373463869094849, 'learning_rate': 5.2777333186704675e-06, 'epoch': 1.6}
53%|█████▎ | 6156/11526 [1:04:24<54:59, 1.63it/s] 53%|█████▎ | 6157/11526 [1:04:25<54:57, 1.63it/s] {'loss': 0.1478, 'grad_norm': 0.4390556514263153, 'learning_rate': 5.276221331443101e-06, 'epoch': 1.6}
53%|█████▎ | 6157/11526 [1:04:25<54:57, 1.63it/s] 53%|█████▎ | 6158/11526 [1:04:25<54:58, 1.63it/s] {'loss': 0.1723, 'grad_norm': 0.4860963821411133, 'learning_rate': 5.274709318879142e-06, 'epoch': 1.6}
53%|█████▎ | 6158/11526 [1:04:26<54:58, 1.63it/s] 53%|█████▎ | 6159/11526 [1:04:26<54:59, 1.63it/s] {'loss': 0.1519, 'grad_norm': 0.4785040020942688, 'learning_rate': 5.273197281117281e-06, 'epoch': 1.6}
53%|█████▎ | 6159/11526 [1:04:26<54:59, 1.63it/s] 53%|█████▎ | 6160/11526 [1:04:27<54:56, 1.63it/s] {'loss': 0.2055, 'grad_norm': 0.5050296783447266, 'learning_rate': 5.271685218296213e-06, 'epoch': 1.6}
53%|█████▎ | 6160/11526 [1:04:27<54:56, 1.63it/s] 53%|█████▎ | 6161/11526 [1:04:27<55:00, 1.63it/s] {'loss': 0.2224, 'grad_norm': 0.6189011335372925, 'learning_rate': 5.270173130554629e-06, 'epoch': 1.6}
53%|█████▎ | 6161/11526 [1:04:27<55:00, 1.63it/s] 53%|█████▎ | 6162/11526 [1:04:28<54:57, 1.63it/s] {'loss': 0.2117, 'grad_norm': 0.6137133240699768, 'learning_rate': 5.268661018031229e-06, 'epoch': 1.6}
53%|█████▎ | 6162/11526 [1:04:28<54:57, 1.63it/s] 53%|█████▎ | 6163/11526 [1:04:29<54:54, 1.63it/s] {'loss': 0.2274, 'grad_norm': 0.639336884021759, 'learning_rate': 5.267148880864711e-06, 'epoch': 1.6}
53%|█████▎ | 6163/11526 [1:04:29<54:54, 1.63it/s] 53%|█████▎ | 6164/11526 [1:04:29<54:54, 1.63it/s] {'loss': 0.1694, 'grad_norm': 0.5013244152069092, 'learning_rate': 5.265636719193778e-06, 'epoch': 1.6}
53%|█████▎ | 6164/11526 [1:04:29<54:54, 1.63it/s] 53%|█████▎ | 6165/11526 [1:04:30<54:53, 1.63it/s] {'loss': 0.1506, 'grad_norm': 0.4482991695404053, 'learning_rate': 5.264124533157135e-06, 'epoch': 1.6}
53%|█████▎ | 6165/11526 [1:04:30<54:53, 1.63it/s] 53%|█████▎ | 6166/11526 [1:04:30<54:55, 1.63it/s] {'loss': 0.3016, 'grad_norm': 0.6799830198287964, 'learning_rate': 5.262612322893486e-06, 'epoch': 1.6}
53%|█████▎ | 6166/11526 [1:04:31<54:55, 1.63it/s] 54%|█████▎ | 6167/11526 [1:04:31<54:53, 1.63it/s] {'loss': 0.2389, 'grad_norm': 0.6024277806282043, 'learning_rate': 5.2611000885415385e-06, 'epoch': 1.61}
54%|█████▎ | 6167/11526 [1:04:31<54:53, 1.63it/s] 54%|█████▎ | 6168/11526 [1:04:32<54:52, 1.63it/s] {'loss': 0.2751, 'grad_norm': 0.6232792735099792, 'learning_rate': 5.259587830240008e-06, 'epoch': 1.61}
54%|█████▎ | 6168/11526 [1:04:32<54:52, 1.63it/s] 54%|█████▎ | 6169/11526 [1:04:32<54:52, 1.63it/s] {'loss': 0.2199, 'grad_norm': 0.4971659481525421, 'learning_rate': 5.258075548127605e-06, 'epoch': 1.61}
54%|█████▎ | 6169/11526 [1:04:32<54:52, 1.63it/s] 54%|█████▎ | 6170/11526 [1:04:33<54:51, 1.63it/s] {'loss': 0.1468, 'grad_norm': 0.42635589838027954, 'learning_rate': 5.2565632423430425e-06, 'epoch': 1.61}
54%|█████▎ | 6170/11526 [1:04:33<54:51, 1.63it/s] 54%|█████▎ | 6171/11526 [1:04:33<54:51, 1.63it/s] {'loss': 0.2332, 'grad_norm': 0.5888078808784485, 'learning_rate': 5.255050913025041e-06, 'epoch': 1.61}
54%|█████▎ | 6171/11526 [1:04:34<54:51, 1.63it/s] 54%|█████▎ | 6172/11526 [1:04:34<54:50, 1.63it/s] {'loss': 0.2192, 'grad_norm': 0.5499675273895264, 'learning_rate': 5.253538560312318e-06, 'epoch': 1.61}
54%|█████▎ | 6172/11526 [1:04:34<54:50, 1.63it/s] 54%|█████▎ | 6173/11526 [1:04:35<54:49, 1.63it/s] {'loss': 0.1947, 'grad_norm': 0.5611737370491028, 'learning_rate': 5.252026184343595e-06, 'epoch': 1.61}
54%|█████▎ | 6173/11526 [1:04:35<54:49, 1.63it/s] 54%|█████▎ | 6174/11526 [1:04:35<54:49, 1.63it/s] {'loss': 0.2084, 'grad_norm': 0.5497708320617676, 'learning_rate': 5.2505137852575974e-06, 'epoch': 1.61}
54%|█████▎ | 6174/11526 [1:04:35<54:49, 1.63it/s] 54%|█████▎ | 6175/11526 [1:04:36<54:48, 1.63it/s] {'loss': 0.2024, 'grad_norm': 0.5908557772636414, 'learning_rate': 5.249001363193049e-06, 'epoch': 1.61}
54%|█████▎ | 6175/11526 [1:04:36<54:48, 1.63it/s] 54%|█████▎ | 6176/11526 [1:04:37<54:46, 1.63it/s] {'loss': 0.2189, 'grad_norm': 0.5322055220603943, 'learning_rate': 5.24748891828868e-06, 'epoch': 1.61}
54%|█████▎ | 6176/11526 [1:04:37<54:46, 1.63it/s] 54%|█████▎ | 6177/11526 [1:04:37<54:46, 1.63it/s] {'loss': 0.1877, 'grad_norm': 0.4852859377861023, 'learning_rate': 5.245976450683219e-06, 'epoch': 1.61}
54%|█████▎ | 6177/11526 [1:04:37<54:46, 1.63it/s] 54%|█████▎ | 6178/11526 [1:04:38<54:44, 1.63it/s] {'loss': 0.2266, 'grad_norm': 0.7178403735160828, 'learning_rate': 5.244463960515398e-06, 'epoch': 1.61}
54%|█████▎ | 6178/11526 [1:04:38<54:44, 1.63it/s] 54%|█████▎ | 6179/11526 [1:04:38<54:43, 1.63it/s] {'loss': 0.1728, 'grad_norm': 0.4847533702850342, 'learning_rate': 5.242951447923952e-06, 'epoch': 1.61}
54%|█████▎ | 6179/11526 [1:04:39<54:43, 1.63it/s] 54%|█████▎ | 6180/11526 [1:04:39<54:42, 1.63it/s] {'loss': 0.1794, 'grad_norm': 0.46311038732528687, 'learning_rate': 5.241438913047616e-06, 'epoch': 1.61}
54%|█████▎ | 6180/11526 [1:04:39<54:42, 1.63it/s] 54%|█████▎ | 6181/11526 [1:04:40<54:48, 1.63it/s] {'loss': 0.2571, 'grad_norm': 0.6977236270904541, 'learning_rate': 5.2399263560251305e-06, 'epoch': 1.61}
54%|█████▎ | 6181/11526 [1:04:40<54:48, 1.63it/s] 54%|█████▎ | 6182/11526 [1:04:40<54:45, 1.63it/s] {'loss': 0.1933, 'grad_norm': 0.4898710250854492, 'learning_rate': 5.238413776995234e-06, 'epoch': 1.61}
54%|█████▎ | 6182/11526 [1:04:40<54:45, 1.63it/s] 54%|█████▎ | 6183/11526 [1:04:41<54:44, 1.63it/s] {'loss': 0.2439, 'grad_norm': 0.5600656867027283, 'learning_rate': 5.236901176096668e-06, 'epoch': 1.61}
54%|█████▎ | 6183/11526 [1:04:41<54:44, 1.63it/s] 54%|█████▎ | 6184/11526 [1:04:41<54:42, 1.63it/s] {'loss': 0.2172, 'grad_norm': 0.6090371012687683, 'learning_rate': 5.235388553468179e-06, 'epoch': 1.61}
54%|█████▎ | 6184/11526 [1:04:42<54:42, 1.63it/s] 54%|█████▎ | 6185/11526 [1:04:42<54:41, 1.63it/s] {'loss': 0.2554, 'grad_norm': 0.5866860151290894, 'learning_rate': 5.233875909248513e-06, 'epoch': 1.61}
54%|█████▎ | 6185/11526 [1:04:42<54:41, 1.63it/s] 54%|█████▎ | 6186/11526 [1:04:43<54:43, 1.63it/s] {'loss': 0.2, 'grad_norm': 0.5335956811904907, 'learning_rate': 5.232363243576419e-06, 'epoch': 1.61}
54%|█████▎ | 6186/11526 [1:04:43<54:43, 1.63it/s] 54%|█████▎ | 6187/11526 [1:04:43<54:41, 1.63it/s] {'loss': 0.1746, 'grad_norm': 0.5363844633102417, 'learning_rate': 5.2308505565906445e-06, 'epoch': 1.61}
54%|█████▎ | 6187/11526 [1:04:43<54:41, 1.63it/s] 54%|█████▎ | 6188/11526 [1:04:44<54:38, 1.63it/s] {'loss': 0.2105, 'grad_norm': 0.5478374361991882, 'learning_rate': 5.229337848429944e-06, 'epoch': 1.61}
54%|█████▎ | 6188/11526 [1:04:44<54:38, 1.63it/s] 54%|█████▎ | 6189/11526 [1:04:45<54:39, 1.63it/s] {'loss': 0.1417, 'grad_norm': 0.5001686811447144, 'learning_rate': 5.227825119233072e-06, 'epoch': 1.61}
54%|█████▎ | 6189/11526 [1:04:45<54:39, 1.63it/s] 54%|█████▎ | 6190/11526 [1:04:45<54:39, 1.63it/s] {'loss': 0.2051, 'grad_norm': 0.5315244793891907, 'learning_rate': 5.226312369138782e-06, 'epoch': 1.61}
54%|█████▎ | 6190/11526 [1:04:45<54:39, 1.63it/s] 54%|█████▎ | 6191/11526 [1:04:46<54:39, 1.63it/s] {'loss': 0.187, 'grad_norm': 0.545683741569519, 'learning_rate': 5.2247995982858355e-06, 'epoch': 1.61}
54%|█████▎ | 6191/11526 [1:04:46<54:39, 1.63it/s] 54%|█████▎ | 6192/11526 [1:04:46<54:39, 1.63it/s] {'loss': 0.2225, 'grad_norm': 0.5263987183570862, 'learning_rate': 5.223286806812989e-06, 'epoch': 1.61}
54%|█████▎ | 6192/11526 [1:04:47<54:39, 1.63it/s] 54%|█████▎ | 6193/11526 [1:04:47<54:37, 1.63it/s] {'loss': 0.2722, 'grad_norm': 0.5456870794296265, 'learning_rate': 5.221773994859008e-06, 'epoch': 1.61}
54%|█████▎ | 6193/11526 [1:04:47<54:37, 1.63it/s] 54%|█████▎ | 6194/11526 [1:04:48<54:37, 1.63it/s] {'loss': 0.2585, 'grad_norm': 0.6286144852638245, 'learning_rate': 5.220261162562653e-06, 'epoch': 1.61}
54%|█████▎ | 6194/11526 [1:04:48<54:37, 1.63it/s] 54%|█████▎ | 6195/11526 [1:04:48<54:36, 1.63it/s] {'loss': 0.2468, 'grad_norm': 0.6765317916870117, 'learning_rate': 5.218748310062692e-06, 'epoch': 1.61}
54%|█████▎ | 6195/11526 [1:04:48<54:36, 1.63it/s] 54%|█████▍ | 6196/11526 [1:04:49<54:39, 1.63it/s] {'loss': 0.2161, 'grad_norm': 0.5300513505935669, 'learning_rate': 5.2172354374978905e-06, 'epoch': 1.61}
54%|█████▍ | 6196/11526 [1:04:49<54:39, 1.63it/s] 54%|█████▍ | 6197/11526 [1:04:49<54:36, 1.63it/s] {'loss': 0.2271, 'grad_norm': 0.672033965587616, 'learning_rate': 5.215722545007019e-06, 'epoch': 1.61}
54%|█████▍ | 6197/11526 [1:04:50<54:36, 1.63it/s] 54%|█████▍ | 6198/11526 [1:04:50<54:34, 1.63it/s] {'loss': 0.1318, 'grad_norm': 0.37686389684677124, 'learning_rate': 5.214209632728849e-06, 'epoch': 1.61}
54%|█████▍ | 6198/11526 [1:04:50<54:34, 1.63it/s] 54%|█████▍ | 6199/11526 [1:04:51<54:33, 1.63it/s] {'loss': 0.188, 'grad_norm': 0.5622206926345825, 'learning_rate': 5.2126967008021525e-06, 'epoch': 1.61}
54%|█████▍ | 6199/11526 [1:04:51<54:33, 1.63it/s] 54%|█████▍ | 6200/11526 [1:04:51<54:31, 1.63it/s] {'loss': 0.2239, 'grad_norm': 0.5569396018981934, 'learning_rate': 5.2111837493657035e-06, 'epoch': 1.61}
54%|█████▍ | 6200/11526 [1:04:51<54:31, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.34it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5508691668510437, 'eval_runtime': 1.9552, 'eval_samples_per_second': 102.289, 'eval_steps_per_second': 6.649, 'epoch': 1.61}
54%|█████▍ | 6200/11526 [1:04:53<54:31, 1.63it/s]
 54%|█████▍ | 6201/11526 [1:04:54<1:46:46, 1.20s/it] {'loss': 0.1675, 'grad_norm': 0.502144455909729, 'learning_rate': 5.209670778558279e-06, 'epoch': 1.61}
54%|█████▍ | 6201/11526 [1:04:54<1:46:46, 1.20s/it] 54%|█████▍ | 6202/11526 [1:04:54<1:31:04, 1.03s/it] {'loss': 0.1881, 'grad_norm': 0.505460798740387, 'learning_rate': 5.20815778851866e-06, 'epoch': 1.61}
54%|█████▍ | 6202/11526 [1:04:55<1:31:04, 1.03s/it] 54%|█████▍ | 6203/11526 [1:04:55<1:20:03, 1.11it/s] {'loss': 0.2088, 'grad_norm': 0.5857293009757996, 'learning_rate': 5.206644779385622e-06, 'epoch': 1.61}
54%|█████▍ | 6203/11526 [1:04:55<1:20:03, 1.11it/s] 54%|█████▍ | 6204/11526 [1:04:56<1:12:22, 1.23it/s] {'loss': 0.2365, 'grad_norm': 0.7453268766403198, 'learning_rate': 5.205131751297949e-06, 'epoch': 1.61}
54%|█████▍ | 6204/11526 [1:04:56<1:12:22, 1.23it/s] 54%|█████▍ | 6205/11526 [1:04:56<1:07:02, 1.32it/s] {'loss': 0.1771, 'grad_norm': 0.5236985087394714, 'learning_rate': 5.2036187043944265e-06, 'epoch': 1.62}
54%|█████▍ | 6205/11526 [1:04:56<1:07:02, 1.32it/s] 54%|█████▍ | 6206/11526 [1:04:57<1:03:13, 1.40it/s] {'loss': 0.1972, 'grad_norm': 0.6098158359527588, 'learning_rate': 5.202105638813837e-06, 'epoch': 1.62}
54%|█████▍ | 6206/11526 [1:04:57<1:03:13, 1.40it/s] 54%|█████▍ | 6207/11526 [1:04:58<1:00:35, 1.46it/s] {'loss': 0.2502, 'grad_norm': 0.6026996374130249, 'learning_rate': 5.2005925546949665e-06, 'epoch': 1.62}
54%|█████▍ | 6207/11526 [1:04:58<1:00:35, 1.46it/s] 54%|█████▍ | 6208/11526 [1:04:58<58:45, 1.51it/s] {'loss': 0.171, 'grad_norm': 0.5228632092475891, 'learning_rate': 5.199079452176608e-06, 'epoch': 1.62}
54%|█████▍ | 6208/11526 [1:04:58<58:45, 1.51it/s] 54%|█████▍ | 6209/11526 [1:04:59<57:26, 1.54it/s] {'loss': 0.1612, 'grad_norm': 0.4655211567878723, 'learning_rate': 5.19756633139755e-06, 'epoch': 1.62}
54%|█████▍ | 6209/11526 [1:04:59<57:26, 1.54it/s] 54%|█████▍ | 6210/11526 [1:04:59<56:31, 1.57it/s] {'loss': 0.1091, 'grad_norm': 0.345492959022522, 'learning_rate': 5.196053192496583e-06, 'epoch': 1.62}
54%|█████▍ | 6210/11526 [1:05:00<56:31, 1.57it/s] 54%|█████▍ | 6211/11526 [1:05:00<55:52, 1.59it/s] {'loss': 0.2713, 'grad_norm': 0.6197485327720642, 'learning_rate': 5.194540035612501e-06, 'epoch': 1.62}
54%|█████▍ | 6211/11526 [1:05:00<55:52, 1.59it/s] 54%|█████▍ | 6212/11526 [1:05:01<55:25, 1.60it/s] {'loss': 0.1772, 'grad_norm': 0.6451025009155273, 'learning_rate': 5.1930268608841e-06, 'epoch': 1.62}
54%|█████▍ | 6212/11526 [1:05:01<55:25, 1.60it/s] 54%|█████▍ | 6213/11526 [1:05:01<55:05, 1.61it/s] {'loss': 0.1862, 'grad_norm': 0.5392288565635681, 'learning_rate': 5.191513668450178e-06, 'epoch': 1.62}
54%|█████▍ | 6213/11526 [1:05:01<55:05, 1.61it/s] 54%|█████▍ | 6214/11526 [1:05:02<54:50, 1.61it/s] {'loss': 0.1638, 'grad_norm': 0.44621115922927856, 'learning_rate': 5.190000458449532e-06, 'epoch': 1.62}
54%|█████▍ | 6214/11526 [1:05:02<54:50, 1.61it/s] 54%|█████▍ | 6215/11526 [1:05:02<54:43, 1.62it/s] {'loss': 0.1537, 'grad_norm': 0.46669310331344604, 'learning_rate': 5.188487231020962e-06, 'epoch': 1.62}
54%|█████▍ | 6215/11526 [1:05:03<54:43, 1.62it/s] 54%|█████▍ | 6216/11526 [1:05:03<54:38, 1.62it/s] {'loss': 0.1868, 'grad_norm': 0.6665382385253906, 'learning_rate': 5.186973986303272e-06, 'epoch': 1.62}
54%|█████▍ | 6216/11526 [1:05:03<54:38, 1.62it/s] 54%|█████▍ | 6217/11526 [1:05:04<54:36, 1.62it/s] {'loss': 0.2072, 'grad_norm': 0.6206576824188232, 'learning_rate': 5.185460724435265e-06, 'epoch': 1.62}
54%|█████▍ | 6217/11526 [1:05:04<54:36, 1.62it/s] 54%|█████▍ | 6218/11526 [1:05:04<54:30, 1.62it/s] {'loss': 0.2101, 'grad_norm': 0.6117464900016785, 'learning_rate': 5.183947445555744e-06, 'epoch': 1.62}
54%|█████▍ | 6218/11526 [1:05:04<54:30, 1.62it/s] 54%|█████▍ | 6219/11526 [1:05:05<54:30, 1.62it/s] {'loss': 0.1716, 'grad_norm': 0.49727290868759155, 'learning_rate': 5.182434149803516e-06, 'epoch': 1.62}
54%|█████▍ | 6219/11526 [1:05:05<54:30, 1.62it/s] 54%|█████▍ | 6220/11526 [1:05:06<54:26, 1.62it/s] {'loss': 0.2576, 'grad_norm': 0.6367467641830444, 'learning_rate': 5.180920837317392e-06, 'epoch': 1.62}
54%|█████▍ | 6220/11526 [1:05:06<54:26, 1.62it/s] 54%|█████▍ | 6221/11526 [1:05:06<54:23, 1.63it/s] {'loss': 0.2318, 'grad_norm': 0.5595157146453857, 'learning_rate': 5.179407508236176e-06, 'epoch': 1.62}
54%|█████▍ | 6221/11526 [1:05:06<54:23, 1.63it/s] 54%|█████▍ | 6222/11526 [1:05:07<54:20, 1.63it/s] {'loss': 0.1729, 'grad_norm': 0.4851064085960388, 'learning_rate': 5.177894162698684e-06, 'epoch': 1.62}
54%|█████▍ | 6222/11526 [1:05:07<54:20, 1.63it/s] 54%|█████▍ | 6223/11526 [1:05:07<54:18, 1.63it/s] {'loss': 0.1975, 'grad_norm': 0.5192489624023438, 'learning_rate': 5.176380800843728e-06, 'epoch': 1.62}
54%|█████▍ | 6223/11526 [1:05:08<54:18, 1.63it/s] 54%|█████▍ | 6224/11526 [1:05:08<54:18, 1.63it/s] {'loss': 0.2274, 'grad_norm': 0.5940082669258118, 'learning_rate': 5.174867422810122e-06, 'epoch': 1.62}
54%|█████▍ | 6224/11526 [1:05:08<54:18, 1.63it/s] 54%|█████▍ | 6225/11526 [1:05:09<54:17, 1.63it/s] {'loss': 0.1834, 'grad_norm': 0.48784777522087097, 'learning_rate': 5.17335402873668e-06, 'epoch': 1.62}
54%|█████▍ | 6225/11526 [1:05:09<54:17, 1.63it/s] 54%|█████▍ | 6226/11526 [1:05:09<54:16, 1.63it/s] {'loss': 0.1826, 'grad_norm': 0.5149915218353271, 'learning_rate': 5.1718406187622195e-06, 'epoch': 1.62}
54%|█████▍ | 6226/11526 [1:05:09<54:16, 1.63it/s] 54%|█████▍ | 6227/11526 [1:05:10<54:15, 1.63it/s] {'loss': 0.1787, 'grad_norm': 0.5101255774497986, 'learning_rate': 5.170327193025562e-06, 'epoch': 1.62}
54%|█████▍ | 6227/11526 [1:05:10<54:15, 1.63it/s] 54%|█████▍ | 6228/11526 [1:05:10<54:14, 1.63it/s] {'loss': 0.2183, 'grad_norm': 0.6053203344345093, 'learning_rate': 5.168813751665522e-06, 'epoch': 1.62}
54%|█████▍ | 6228/11526 [1:05:11<54:14, 1.63it/s] 54%|█████▍ | 6229/11526 [1:05:11<54:18, 1.63it/s] {'loss': 0.1841, 'grad_norm': 0.5980213284492493, 'learning_rate': 5.1673002948209275e-06, 'epoch': 1.62}
54%|█████▍ | 6229/11526 [1:05:11<54:18, 1.63it/s] 54%|█████▍ | 6230/11526 [1:05:12<54:25, 1.62it/s] {'loss': 0.2292, 'grad_norm': 0.5211341977119446, 'learning_rate': 5.165786822630595e-06, 'epoch': 1.62}
54%|█████▍ | 6230/11526 [1:05:12<54:25, 1.62it/s] 54%|█████▍ | 6231/11526 [1:05:12<54:23, 1.62it/s] {'loss': 0.1824, 'grad_norm': 0.5232373476028442, 'learning_rate': 5.164273335233354e-06, 'epoch': 1.62}
54%|█████▍ | 6231/11526 [1:05:12<54:23, 1.62it/s] 54%|█████▍ | 6232/11526 [1:05:13<54:21, 1.62it/s] {'loss': 0.1866, 'grad_norm': 0.5352303385734558, 'learning_rate': 5.1627598327680276e-06, 'epoch': 1.62}
54%|█████▍ | 6232/11526 [1:05:13<54:21, 1.62it/s] 54%|█████▍ | 6233/11526 [1:05:14<54:16, 1.63it/s] {'loss': 0.1958, 'grad_norm': 0.5668577551841736, 'learning_rate': 5.161246315373443e-06, 'epoch': 1.62}
54%|█████▍ | 6233/11526 [1:05:14<54:16, 1.63it/s] 54%|█████▍ | 6234/11526 [1:05:14<54:18, 1.62it/s] {'loss': 0.1942, 'grad_norm': 0.5836471915245056, 'learning_rate': 5.159732783188427e-06, 'epoch': 1.62}
54%|█████▍ | 6234/11526 [1:05:14<54:18, 1.62it/s] 54%|█████▍ | 6235/11526 [1:05:15<54:15, 1.63it/s] {'loss': 0.2146, 'grad_norm': 0.58753901720047, 'learning_rate': 5.158219236351815e-06, 'epoch': 1.62}
54%|█████▍ | 6235/11526 [1:05:15<54:15, 1.63it/s] 54%|█████▍ | 6236/11526 [1:05:15<54:17, 1.62it/s] {'loss': 0.2165, 'grad_norm': 0.5793477296829224, 'learning_rate': 5.1567056750024315e-06, 'epoch': 1.62}
54%|█████▍ | 6236/11526 [1:05:16<54:17, 1.62it/s] 54%|█████▍ | 6237/11526 [1:05:16<54:14, 1.63it/s] {'loss': 0.1916, 'grad_norm': 0.5245561599731445, 'learning_rate': 5.155192099279113e-06, 'epoch': 1.62}
54%|█████▍ | 6237/11526 [1:05:16<54:14, 1.63it/s] 54%|█████▍ | 6238/11526 [1:05:17<54:11, 1.63it/s] {'loss': 0.2198, 'grad_norm': 0.5432213544845581, 'learning_rate': 5.153678509320692e-06, 'epoch': 1.62}
54%|█████▍ | 6238/11526 [1:05:17<54:11, 1.63it/s] 54%|█████▍ | 6239/11526 [1:05:17<54:12, 1.63it/s] {'loss': 0.2326, 'grad_norm': 0.6325258612632751, 'learning_rate': 5.1521649052660035e-06, 'epoch': 1.62}
54%|█████▍ | 6239/11526 [1:05:17<54:12, 1.63it/s] 54%|█████▍ | 6240/11526 [1:05:18<54:10, 1.63it/s] {'loss': 0.1877, 'grad_norm': 0.527229368686676, 'learning_rate': 5.150651287253886e-06, 'epoch': 1.62}
54%|█████▍ | 6240/11526 [1:05:18<54:10, 1.63it/s] 54%|█████▍ | 6241/11526 [1:05:18<54:26, 1.62it/s] {'loss': 0.2013, 'grad_norm': 0.5928999781608582, 'learning_rate': 5.149137655423172e-06, 'epoch': 1.62}
54%|█████▍ | 6241/11526 [1:05:19<54:26, 1.62it/s] 54%|█████▍ | 6242/11526 [1:05:19<54:20, 1.62it/s] {'loss': 0.1712, 'grad_norm': 0.5234718918800354, 'learning_rate': 5.147624009912706e-06, 'epoch': 1.62}
54%|█████▍ | 6242/11526 [1:05:19<54:20, 1.62it/s] 54%|█████▍ | 6243/11526 [1:05:20<54:15, 1.62it/s] {'loss': 0.1946, 'grad_norm': 0.5481929183006287, 'learning_rate': 5.146110350861325e-06, 'epoch': 1.62}
54%|█████▍ | 6243/11526 [1:05:20<54:15, 1.62it/s] 54%|█████▍ | 6244/11526 [1:05:20<54:18, 1.62it/s] {'loss': 0.195, 'grad_norm': 0.5410261750221252, 'learning_rate': 5.1445966784078725e-06, 'epoch': 1.63}
54%|█████▍ | 6244/11526 [1:05:20<54:18, 1.62it/s] 54%|█████▍ | 6245/11526 [1:05:21<54:17, 1.62it/s] {'loss': 0.1977, 'grad_norm': 0.5463564395904541, 'learning_rate': 5.1430829926911874e-06, 'epoch': 1.63}
54%|█████▍ | 6245/11526 [1:05:21<54:17, 1.62it/s] 54%|█████▍ | 6246/11526 [1:05:22<54:13, 1.62it/s] {'loss': 0.1764, 'grad_norm': 0.5200499296188354, 'learning_rate': 5.141569293850118e-06, 'epoch': 1.63}
54%|█████▍ | 6246/11526 [1:05:22<54:13, 1.62it/s] 54%|█████▍ | 6247/11526 [1:05:22<54:09, 1.62it/s] {'loss': 0.281, 'grad_norm': 0.6586129069328308, 'learning_rate': 5.140055582023508e-06, 'epoch': 1.63}
54%|█████▍ | 6247/11526 [1:05:22<54:09, 1.62it/s] 54%|█████▍ | 6248/11526 [1:05:23<54:08, 1.62it/s] {'loss': 0.1716, 'grad_norm': 0.48439356684684753, 'learning_rate': 5.138541857350202e-06, 'epoch': 1.63}
54%|█████▍ | 6248/11526 [1:05:23<54:08, 1.62it/s] 54%|█████▍ | 6249/11526 [1:05:23<54:10, 1.62it/s] {'loss': 0.1723, 'grad_norm': 0.5248698592185974, 'learning_rate': 5.137028119969048e-06, 'epoch': 1.63}
54%|█████▍ | 6249/11526 [1:05:24<54:10, 1.62it/s] 54%|█████▍ | 6250/11526 [1:05:24<54:06, 1.63it/s] {'loss': 0.2454, 'grad_norm': 0.6123601198196411, 'learning_rate': 5.135514370018896e-06, 'epoch': 1.63}
54%|█████▍ | 6250/11526 [1:05:24<54:06, 1.63it/s] 54%|█████▍ | 6251/11526 [1:05:25<54:08, 1.62it/s] {'loss': 0.238, 'grad_norm': 0.5734075903892517, 'learning_rate': 5.1340006076385964e-06, 'epoch': 1.63}
54%|█████▍ | 6251/11526 [1:05:25<54:08, 1.62it/s] 54%|█████▍ | 6252/11526 [1:05:25<54:04, 1.63it/s] {'loss': 0.1814, 'grad_norm': 0.5011069178581238, 'learning_rate': 5.132486832966997e-06, 'epoch': 1.63}
54%|█████▍ | 6252/11526 [1:05:25<54:04, 1.63it/s] 54%|█████▍ | 6253/11526 [1:05:26<54:02, 1.63it/s] {'loss': 0.2401, 'grad_norm': 0.5456930994987488, 'learning_rate': 5.130973046142951e-06, 'epoch': 1.63}
54%|█████▍ | 6253/11526 [1:05:26<54:02, 1.63it/s] 54%|█████▍ | 6254/11526 [1:05:26<54:03, 1.63it/s] {'loss': 0.2117, 'grad_norm': 0.5508461594581604, 'learning_rate': 5.129459247305312e-06, 'epoch': 1.63}
54%|█████▍ | 6254/11526 [1:05:27<54:03, 1.63it/s] 54%|█████▍ | 6255/11526 [1:05:27<54:00, 1.63it/s] {'loss': 0.1959, 'grad_norm': 0.5371347665786743, 'learning_rate': 5.127945436592936e-06, 'epoch': 1.63}
54%|█████▍ | 6255/11526 [1:05:27<54:00, 1.63it/s] 54%|█████▍ | 6256/11526 [1:05:28<53:58, 1.63it/s] {'loss': 0.2261, 'grad_norm': 0.5836100578308105, 'learning_rate': 5.126431614144674e-06, 'epoch': 1.63}
54%|█████▍ | 6256/11526 [1:05:28<53:58, 1.63it/s] 54%|█████▍ | 6257/11526 [1:05:28<53:58, 1.63it/s] {'loss': 0.2364, 'grad_norm': 0.6192004680633545, 'learning_rate': 5.124917780099386e-06, 'epoch': 1.63}
54%|█████▍ | 6257/11526 [1:05:28<53:58, 1.63it/s] 54%|█████▍ | 6258/11526 [1:05:29<54:02, 1.62it/s] {'loss': 0.212, 'grad_norm': 0.5648044347763062, 'learning_rate': 5.123403934595931e-06, 'epoch': 1.63}
54%|█████▍ | 6258/11526 [1:05:29<54:02, 1.62it/s] 54%|█████▍ | 6259/11526 [1:05:30<54:00, 1.63it/s] {'loss': 0.3183, 'grad_norm': 0.7229530215263367, 'learning_rate': 5.121890077773162e-06, 'epoch': 1.63}
54%|█████▍ | 6259/11526 [1:05:30<54:00, 1.63it/s] 54%|█████▍ | 6260/11526 [1:05:30<53:58, 1.63it/s] {'loss': 0.2374, 'grad_norm': 0.5839351415634155, 'learning_rate': 5.120376209769942e-06, 'epoch': 1.63}
54%|█████▍ | 6260/11526 [1:05:30<53:58, 1.63it/s] 54%|█████▍ | 6261/11526 [1:05:31<53:58, 1.63it/s] {'loss': 0.2227, 'grad_norm': 0.6220995187759399, 'learning_rate': 5.118862330725132e-06, 'epoch': 1.63}
54%|█████▍ | 6261/11526 [1:05:31<53:58, 1.63it/s] 54%|█████▍ | 6262/11526 [1:05:31<53:55, 1.63it/s] {'loss': 0.3423, 'grad_norm': 0.5008871555328369, 'learning_rate': 5.117348440777592e-06, 'epoch': 1.63}
54%|█████▍ | 6262/11526 [1:05:32<53:55, 1.63it/s] 54%|█████▍ | 6263/11526 [1:05:32<53:53, 1.63it/s] {'loss': 0.2182, 'grad_norm': 0.556041419506073, 'learning_rate': 5.115834540066186e-06, 'epoch': 1.63}
54%|█████▍ | 6263/11526 [1:05:32<53:53, 1.63it/s] 54%|█████▍ | 6264/11526 [1:05:33<53:51, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.5095818042755127, 'learning_rate': 5.114320628729777e-06, 'epoch': 1.63}
54%|█████▍ | 6264/11526 [1:05:33<53:51, 1.63it/s] 54%|█████▍ | 6265/11526 [1:05:33<53:49, 1.63it/s] {'loss': 0.1713, 'grad_norm': 0.44777047634124756, 'learning_rate': 5.112806706907227e-06, 'epoch': 1.63}
54%|█████▍ | 6265/11526 [1:05:33<53:49, 1.63it/s] 54%|█████▍ | 6266/11526 [1:05:34<55:29, 1.58it/s] {'loss': 0.2127, 'grad_norm': 0.5876436829566956, 'learning_rate': 5.111292774737407e-06, 'epoch': 1.63}
54%|█████▍ | 6266/11526 [1:05:34<55:29, 1.58it/s] 54%|█████▍ | 6267/11526 [1:05:35<54:58, 1.59it/s] {'loss': 0.1906, 'grad_norm': 0.6066347360610962, 'learning_rate': 5.109778832359179e-06, 'epoch': 1.63}
54%|█████▍ | 6267/11526 [1:05:35<54:58, 1.59it/s] 54%|█████▍ | 6268/11526 [1:05:35<54:36, 1.60it/s] {'loss': 0.2183, 'grad_norm': 0.5113004446029663, 'learning_rate': 5.108264879911412e-06, 'epoch': 1.63}
54%|█████▍ | 6268/11526 [1:05:35<54:36, 1.60it/s] 54%|█████▍ | 6269/11526 [1:05:36<54:26, 1.61it/s] {'loss': 0.2415, 'grad_norm': 0.6065959930419922, 'learning_rate': 5.106750917532976e-06, 'epoch': 1.63}
54%|█████▍ | 6269/11526 [1:05:36<54:26, 1.61it/s] 54%|█████▍ | 6270/11526 [1:05:36<54:15, 1.61it/s] {'loss': 0.1714, 'grad_norm': 0.48529645800590515, 'learning_rate': 5.105236945362736e-06, 'epoch': 1.63}
54%|█████▍ | 6270/11526 [1:05:36<54:15, 1.61it/s] 54%|█████▍ | 6271/11526 [1:05:37<54:10, 1.62it/s] {'loss': 0.2001, 'grad_norm': 0.5371055603027344, 'learning_rate': 5.103722963539565e-06, 'epoch': 1.63}
54%|█████▍ | 6271/11526 [1:05:37<54:10, 1.62it/s] 54%|█████▍ | 6272/11526 [1:05:38<54:03, 1.62it/s] {'loss': 0.2114, 'grad_norm': 0.5747687816619873, 'learning_rate': 5.102208972202335e-06, 'epoch': 1.63}
54%|█████▍ | 6272/11526 [1:05:38<54:03, 1.62it/s] 54%|█████▍ | 6273/11526 [1:05:38<53:59, 1.62it/s] {'loss': 0.1811, 'grad_norm': 0.4913647770881653, 'learning_rate': 5.100694971489916e-06, 'epoch': 1.63}
54%|█████▍ | 6273/11526 [1:05:38<53:59, 1.62it/s] 54%|█████▍ | 6274/11526 [1:05:39<53:57, 1.62it/s] {'loss': 0.2209, 'grad_norm': 0.6025490760803223, 'learning_rate': 5.099180961541181e-06, 'epoch': 1.63}
54%|█████▍ | 6274/11526 [1:05:39<53:57, 1.62it/s] 54%|█████▍ | 6275/11526 [1:05:39<53:56, 1.62it/s] {'loss': 0.1871, 'grad_norm': 0.5115110874176025, 'learning_rate': 5.097666942495004e-06, 'epoch': 1.63}
54%|█████▍ | 6275/11526 [1:05:40<53:56, 1.62it/s] 54%|█████▍ | 6276/11526 [1:05:40<53:54, 1.62it/s] {'loss': 0.1997, 'grad_norm': 0.5480155348777771, 'learning_rate': 5.09615291449026e-06, 'epoch': 1.63}
54%|█████▍ | 6276/11526 [1:05:40<53:54, 1.62it/s] 54%|█████▍ | 6277/11526 [1:05:41<53:49, 1.63it/s] {'loss': 0.209, 'grad_norm': 0.5180677175521851, 'learning_rate': 5.094638877665822e-06, 'epoch': 1.63}
54%|█████▍ | 6277/11526 [1:05:41<53:49, 1.63it/s] 54%|█████▍ | 6278/11526 [1:05:41<53:47, 1.63it/s] {'loss': 0.2929, 'grad_norm': 0.5338461399078369, 'learning_rate': 5.093124832160569e-06, 'epoch': 1.63}
54%|█████▍ | 6278/11526 [1:05:41<53:47, 1.63it/s] 54%|█████▍ | 6279/11526 [1:05:42<53:52, 1.62it/s] {'loss': 0.2362, 'grad_norm': 0.6190704703330994, 'learning_rate': 5.091610778113375e-06, 'epoch': 1.63}
54%|█████▍ | 6279/11526 [1:05:42<53:52, 1.62it/s] 54%|█████▍ | 6280/11526 [1:05:43<53:47, 1.63it/s] {'loss': 0.1764, 'grad_norm': 0.5245222449302673, 'learning_rate': 5.090096715663121e-06, 'epoch': 1.63}
54%|█████▍ | 6280/11526 [1:05:43<53:47, 1.63it/s] 54%|█████▍ | 6281/11526 [1:05:43<55:14, 1.58it/s] {'loss': 0.1675, 'grad_norm': 0.5094539523124695, 'learning_rate': 5.088582644948683e-06, 'epoch': 1.63}
54%|█████▍ | 6281/11526 [1:05:43<55:14, 1.58it/s] 55%|█████▍ | 6282/11526 [1:05:44<54:58, 1.59it/s] {'loss': 0.2049, 'grad_norm': 0.5185012221336365, 'learning_rate': 5.087068566108942e-06, 'epoch': 1.64}
55%|█████▍ | 6282/11526 [1:05:44<54:58, 1.59it/s] 55%|█████▍ | 6283/11526 [1:05:44<54:35, 1.60it/s] {'loss': 0.1901, 'grad_norm': 0.5770642757415771, 'learning_rate': 5.085554479282772e-06, 'epoch': 1.64}
55%|█████▍ | 6283/11526 [1:05:45<54:35, 1.60it/s] 55%|█████▍ | 6284/11526 [1:05:45<54:24, 1.61it/s] {'loss': 0.204, 'grad_norm': 0.5599385499954224, 'learning_rate': 5.084040384609063e-06, 'epoch': 1.64}
55%|█████▍ | 6284/11526 [1:05:45<54:24, 1.61it/s] 55%|█████▍ | 6285/11526 [1:05:46<54:12, 1.61it/s] {'loss': 0.1866, 'grad_norm': 0.4930693507194519, 'learning_rate': 5.082526282226691e-06, 'epoch': 1.64}
55%|█████▍ | 6285/11526 [1:05:46<54:12, 1.61it/s] 55%|█████▍ | 6286/11526 [1:05:46<54:05, 1.61it/s] {'loss': 0.2189, 'grad_norm': 0.5741771459579468, 'learning_rate': 5.0810121722745385e-06, 'epoch': 1.64}
55%|█████▍ | 6286/11526 [1:05:46<54:05, 1.61it/s] 55%|█████▍ | 6287/11526 [1:05:47<55:30, 1.57it/s] {'loss': 0.2078, 'grad_norm': 0.5476234555244446, 'learning_rate': 5.079498054891489e-06, 'epoch': 1.64}
55%|█████▍ | 6287/11526 [1:05:47<55:30, 1.57it/s] 55%|█████▍ | 6288/11526 [1:05:48<55:11, 1.58it/s] {'loss': 0.2345, 'grad_norm': 0.6616838574409485, 'learning_rate': 5.077983930216424e-06, 'epoch': 1.64}
55%|█████▍ | 6288/11526 [1:05:48<55:11, 1.58it/s] 55%|█████▍ | 6289/11526 [1:05:48<54:50, 1.59it/s] {'loss': 0.1792, 'grad_norm': 0.469887375831604, 'learning_rate': 5.07646979838823e-06, 'epoch': 1.64}
55%|█████▍ | 6289/11526 [1:05:48<54:50, 1.59it/s] 55%|█████▍ | 6290/11526 [1:05:49<54:29, 1.60it/s] {'loss': 0.1893, 'grad_norm': 0.5481907725334167, 'learning_rate': 5.0749556595457915e-06, 'epoch': 1.64}
55%|█████▍ | 6290/11526 [1:05:49<54:29, 1.60it/s] 55%|█████▍ | 6291/11526 [1:05:49<54:15, 1.61it/s] {'loss': 0.2217, 'grad_norm': 0.558672308921814, 'learning_rate': 5.073441513827994e-06, 'epoch': 1.64}
55%|█████▍ | 6291/11526 [1:05:50<54:15, 1.61it/s] 55%|█████▍ | 6292/11526 [1:05:50<54:02, 1.61it/s] {'loss': 0.1858, 'grad_norm': 0.5148620009422302, 'learning_rate': 5.07192736137372e-06, 'epoch': 1.64}
55%|█████▍ | 6292/11526 [1:05:50<54:02, 1.61it/s] 55%|█████▍ | 6293/11526 [1:05:51<53:52, 1.62it/s] {'loss': 0.1678, 'grad_norm': 0.5140743851661682, 'learning_rate': 5.070413202321863e-06, 'epoch': 1.64}
55%|█████▍ | 6293/11526 [1:05:51<53:52, 1.62it/s] 55%|█████▍ | 6294/11526 [1:05:51<53:52, 1.62it/s] {'loss': 0.1913, 'grad_norm': 0.519162654876709, 'learning_rate': 5.0688990368113044e-06, 'epoch': 1.64}
55%|█████▍ | 6294/11526 [1:05:51<53:52, 1.62it/s] 55%|█████▍ | 6295/11526 [1:05:52<53:46, 1.62it/s] {'loss': 0.2279, 'grad_norm': 0.5414532423019409, 'learning_rate': 5.0673848649809346e-06, 'epoch': 1.64}
55%|█████▍ | 6295/11526 [1:05:52<53:46, 1.62it/s] 55%|█████▍ | 6296/11526 [1:05:52<53:42, 1.62it/s] {'loss': 0.2146, 'grad_norm': 0.5843806862831116, 'learning_rate': 5.065870686969642e-06, 'epoch': 1.64}
55%|█████▍ | 6296/11526 [1:05:53<53:42, 1.62it/s] 55%|█████▍ | 6297/11526 [1:05:53<53:39, 1.62it/s] {'loss': 0.2403, 'grad_norm': 0.5487253665924072, 'learning_rate': 5.0643565029163144e-06, 'epoch': 1.64}
55%|█████▍ | 6297/11526 [1:05:53<53:39, 1.62it/s] 55%|█████▍ | 6298/11526 [1:05:54<53:36, 1.63it/s] {'loss': 0.2281, 'grad_norm': 0.6448429226875305, 'learning_rate': 5.062842312959843e-06, 'epoch': 1.64}
55%|█████▍ | 6298/11526 [1:05:54<53:36, 1.63it/s] 55%|█████▍ | 6299/11526 [1:05:54<53:49, 1.62it/s] {'loss': 0.223, 'grad_norm': 0.5842196941375732, 'learning_rate': 5.061328117239115e-06, 'epoch': 1.64}
55%|█████▍ | 6299/11526 [1:05:54<53:49, 1.62it/s] 55%|█████▍ | 6300/11526 [1:05:55<53:41, 1.62it/s] {'loss': 0.2067, 'grad_norm': 0.5196123123168945, 'learning_rate': 5.059813915893026e-06, 'epoch': 1.64}
55%|█████▍ | 6300/11526 [1:05:55<53:41, 1.62it/s] 55%|█████▍ | 6301/11526 [1:05:56<53:39, 1.62it/s] {'loss': 0.208, 'grad_norm': 0.5200225710868835, 'learning_rate': 5.058299709060462e-06, 'epoch': 1.64}
55%|█████▍ | 6301/11526 [1:05:56<53:39, 1.62it/s] 55%|█████▍ | 6302/11526 [1:05:56<53:35, 1.62it/s] {'loss': 0.229, 'grad_norm': 0.5446870923042297, 'learning_rate': 5.0567854968803175e-06, 'epoch': 1.64}
55%|█████▍ | 6302/11526 [1:05:56<53:35, 1.62it/s] 55%|█████▍ | 6303/11526 [1:05:57<53:34, 1.62it/s] {'loss': 0.1801, 'grad_norm': 0.5164833664894104, 'learning_rate': 5.055271279491485e-06, 'epoch': 1.64}
55%|█████▍ | 6303/11526 [1:05:57<53:34, 1.62it/s] 55%|█████▍ | 6304/11526 [1:05:57<53:34, 1.62it/s] {'loss': 0.2065, 'grad_norm': 0.511244535446167, 'learning_rate': 5.053757057032857e-06, 'epoch': 1.64}
55%|█████▍ | 6304/11526 [1:05:58<53:34, 1.62it/s] 55%|█████▍ | 6305/11526 [1:05:58<53:33, 1.62it/s] {'loss': 0.2137, 'grad_norm': 0.5571997165679932, 'learning_rate': 5.052242829643323e-06, 'epoch': 1.64}
55%|█████▍ | 6305/11526 [1:05:58<53:33, 1.62it/s] 55%|█████▍ | 6306/11526 [1:05:59<53:29, 1.63it/s] {'loss': 0.169, 'grad_norm': 0.481588214635849, 'learning_rate': 5.050728597461781e-06, 'epoch': 1.64}
55%|█████▍ | 6306/11526 [1:05:59<53:29, 1.63it/s] 55%|█████▍ | 6307/11526 [1:05:59<53:31, 1.63it/s] {'loss': 0.2597, 'grad_norm': 0.6025944948196411, 'learning_rate': 5.049214360627125e-06, 'epoch': 1.64}
55%|█████▍ | 6307/11526 [1:05:59<53:31, 1.63it/s] 55%|█████▍ | 6308/11526 [1:06:00<53:28, 1.63it/s] {'loss': 0.1903, 'grad_norm': 0.572257399559021, 'learning_rate': 5.047700119278246e-06, 'epoch': 1.64}
55%|█████▍ | 6308/11526 [1:06:00<53:28, 1.63it/s] 55%|█████▍ | 6309/11526 [1:06:01<53:30, 1.63it/s] {'loss': 0.2381, 'grad_norm': 0.6513067483901978, 'learning_rate': 5.04618587355404e-06, 'epoch': 1.64}
55%|█████▍ | 6309/11526 [1:06:01<53:30, 1.63it/s] 55%|█████▍ | 6310/11526 [1:06:01<53:28, 1.63it/s] {'loss': 0.1963, 'grad_norm': 0.5725415349006653, 'learning_rate': 5.0446716235934044e-06, 'epoch': 1.64}
55%|█████▍ | 6310/11526 [1:06:01<53:28, 1.63it/s] 55%|█████▍ | 6311/11526 [1:06:02<53:25, 1.63it/s] {'loss': 0.2059, 'grad_norm': 0.5487856864929199, 'learning_rate': 5.0431573695352334e-06, 'epoch': 1.64}
55%|█████▍ | 6311/11526 [1:06:02<53:25, 1.63it/s] 55%|█████▍ | 6312/11526 [1:06:02<53:22, 1.63it/s] {'loss': 0.2346, 'grad_norm': 0.6661316156387329, 'learning_rate': 5.041643111518423e-06, 'epoch': 1.64}
55%|█████▍ | 6312/11526 [1:06:02<53:22, 1.63it/s] 55%|█████▍ | 6313/11526 [1:06:03<53:22, 1.63it/s] {'loss': 0.2097, 'grad_norm': 0.5818718075752258, 'learning_rate': 5.040128849681868e-06, 'epoch': 1.64}
55%|█████▍ | 6313/11526 [1:06:03<53:22, 1.63it/s] 55%|█████▍ | 6314/11526 [1:06:04<53:27, 1.62it/s] {'loss': 0.2412, 'grad_norm': 0.6007742285728455, 'learning_rate': 5.0386145841644674e-06, 'epoch': 1.64}
55%|█████▍ | 6314/11526 [1:06:04<53:27, 1.62it/s] 55%|█████▍ | 6315/11526 [1:06:04<53:26, 1.63it/s] {'loss': 0.165, 'grad_norm': 0.48182645440101624, 'learning_rate': 5.037100315105118e-06, 'epoch': 1.64}
55%|█████▍ | 6315/11526 [1:06:04<53:26, 1.63it/s] 55%|█████▍ | 6316/11526 [1:06:05<53:24, 1.63it/s] {'loss': 0.1814, 'grad_norm': 0.5545507669448853, 'learning_rate': 5.035586042642716e-06, 'epoch': 1.64}
55%|█████▍ | 6316/11526 [1:06:05<53:24, 1.63it/s] 55%|█████▍ | 6317/11526 [1:06:05<53:23, 1.63it/s] {'loss': 0.1585, 'grad_norm': 0.41430020332336426, 'learning_rate': 5.0340717669161585e-06, 'epoch': 1.64}
55%|█████▍ | 6317/11526 [1:06:06<53:23, 1.63it/s] 55%|█████▍ | 6318/11526 [1:06:06<53:24, 1.63it/s] {'loss': 0.2025, 'grad_norm': 0.5424428582191467, 'learning_rate': 5.032557488064347e-06, 'epoch': 1.64}
55%|█████▍ | 6318/11526 [1:06:06<53:24, 1.63it/s] 55%|█████▍ | 6319/11526 [1:06:07<53:26, 1.62it/s] {'loss': 0.2134, 'grad_norm': 0.5921123027801514, 'learning_rate': 5.0310432062261764e-06, 'epoch': 1.64}
55%|█████▍ | 6319/11526 [1:06:07<53:26, 1.62it/s] 55%|█████▍ | 6320/11526 [1:06:07<53:22, 1.63it/s] {'loss': 0.1986, 'grad_norm': 0.6639673709869385, 'learning_rate': 5.029528921540546e-06, 'epoch': 1.64}
55%|█████▍ | 6320/11526 [1:06:07<53:22, 1.63it/s] 55%|█████▍ | 6321/11526 [1:06:08<53:20, 1.63it/s] {'loss': 0.2376, 'grad_norm': 0.5040886998176575, 'learning_rate': 5.028014634146354e-06, 'epoch': 1.65}
55%|█████▍ | 6321/11526 [1:06:08<53:20, 1.63it/s] 55%|█████▍ | 6322/11526 [1:06:08<53:19, 1.63it/s] {'loss': 0.2341, 'grad_norm': 0.6650235056877136, 'learning_rate': 5.026500344182502e-06, 'epoch': 1.65}
55%|█████▍ | 6322/11526 [1:06:09<53:19, 1.63it/s] 55%|█████▍ | 6323/11526 [1:06:09<53:16, 1.63it/s] {'loss': 0.1596, 'grad_norm': 0.44329559803009033, 'learning_rate': 5.024986051787888e-06, 'epoch': 1.65}
55%|█████▍ | 6323/11526 [1:06:09<53:16, 1.63it/s] 55%|█████▍ | 6324/11526 [1:06:10<53:17, 1.63it/s] {'loss': 0.2984, 'grad_norm': 0.67155921459198, 'learning_rate': 5.02347175710141e-06, 'epoch': 1.65}
55%|█████▍ | 6324/11526 [1:06:10<53:17, 1.63it/s] 55%|█████▍ | 6325/11526 [1:06:10<53:18, 1.63it/s] {'loss': 0.1842, 'grad_norm': 0.502638041973114, 'learning_rate': 5.021957460261969e-06, 'epoch': 1.65}
55%|█████▍ | 6325/11526 [1:06:10<53:18, 1.63it/s] 55%|█████▍ | 6326/11526 [1:06:11<53:15, 1.63it/s] {'loss': 0.1683, 'grad_norm': 0.5077468752861023, 'learning_rate': 5.020443161408466e-06, 'epoch': 1.65}
55%|█████▍ | 6326/11526 [1:06:11<53:15, 1.63it/s] 55%|█████▍ | 6327/11526 [1:06:12<53:13, 1.63it/s] {'loss': 0.1846, 'grad_norm': 0.5465292930603027, 'learning_rate': 5.0189288606797994e-06, 'epoch': 1.65}
55%|█████▍ | 6327/11526 [1:06:12<53:13, 1.63it/s] 55%|█████▍ | 6328/11526 [1:06:12<53:12, 1.63it/s] {'loss': 0.242, 'grad_norm': 0.6264804601669312, 'learning_rate': 5.0174145582148695e-06, 'epoch': 1.65}
55%|█████▍ | 6328/11526 [1:06:12<53:12, 1.63it/s] 55%|█████▍ | 6329/11526 [1:06:13<53:26, 1.62it/s] {'loss': 0.223, 'grad_norm': 0.6066399216651917, 'learning_rate': 5.015900254152576e-06, 'epoch': 1.65}
55%|█████▍ | 6329/11526 [1:06:13<53:26, 1.62it/s] 55%|█████▍ | 6330/11526 [1:06:13<53:20, 1.62it/s] {'loss': 0.1773, 'grad_norm': 0.4812931418418884, 'learning_rate': 5.014385948631822e-06, 'epoch': 1.65}
55%|█████▍ | 6330/11526 [1:06:14<53:20, 1.62it/s] 55%|█████▍ | 6331/11526 [1:06:14<53:17, 1.62it/s] {'loss': 0.182, 'grad_norm': 0.5938217043876648, 'learning_rate': 5.012871641791508e-06, 'epoch': 1.65}
55%|█████▍ | 6331/11526 [1:06:14<53:17, 1.62it/s] 55%|█████▍ | 6332/11526 [1:06:15<53:12, 1.63it/s] {'loss': 0.1608, 'grad_norm': 0.4770878255367279, 'learning_rate': 5.0113573337705324e-06, 'epoch': 1.65}
55%|█████▍ | 6332/11526 [1:06:15<53:12, 1.63it/s] 55%|█████▍ | 6333/11526 [1:06:15<53:10, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.4706372618675232, 'learning_rate': 5.009843024707798e-06, 'epoch': 1.65}
55%|█████▍ | 6333/11526 [1:06:15<53:10, 1.63it/s] 55%|█████▍ | 6334/11526 [1:06:16<53:14, 1.63it/s] {'loss': 0.1459, 'grad_norm': 0.41573312878608704, 'learning_rate': 5.008328714742205e-06, 'epoch': 1.65}
55%|█████▍ | 6334/11526 [1:06:16<53:14, 1.63it/s] 55%|█████▍ | 6335/11526 [1:06:16<53:12, 1.63it/s] {'loss': 0.2198, 'grad_norm': 0.5324915051460266, 'learning_rate': 5.006814404012656e-06, 'epoch': 1.65}
55%|█████▍ | 6335/11526 [1:06:17<53:12, 1.63it/s] 55%|█████▍ | 6336/11526 [1:06:17<53:15, 1.62it/s] {'loss': 0.1891, 'grad_norm': 0.4940371513366699, 'learning_rate': 5.005300092658049e-06, 'epoch': 1.65}
55%|█████▍ | 6336/11526 [1:06:17<53:15, 1.62it/s] 55%|█████▍ | 6337/11526 [1:06:18<53:12, 1.63it/s] {'loss': 0.2401, 'grad_norm': 0.5290247201919556, 'learning_rate': 5.0037857808172885e-06, 'epoch': 1.65}
55%|█████▍ | 6337/11526 [1:06:18<53:12, 1.63it/s] 55%|█████▍ | 6338/11526 [1:06:18<53:10, 1.63it/s] {'loss': 0.2371, 'grad_norm': 0.6221719980239868, 'learning_rate': 5.002271468629275e-06, 'epoch': 1.65}
55%|█████▍ | 6338/11526 [1:06:18<53:10, 1.63it/s] 55%|█████▍ | 6339/11526 [1:06:19<53:13, 1.62it/s] {'loss': 0.2046, 'grad_norm': 0.49481943249702454, 'learning_rate': 5.000757156232909e-06, 'epoch': 1.65}
55%|█████▍ | 6339/11526 [1:06:19<53:13, 1.62it/s] 55%|█████▌ | 6340/11526 [1:06:20<53:10, 1.63it/s] {'loss': 0.2097, 'grad_norm': 0.5346150398254395, 'learning_rate': 4.9992428437670925e-06, 'epoch': 1.65}
55%|█████▌ | 6340/11526 [1:06:20<53:10, 1.63it/s] 55%|█████▌ | 6341/11526 [1:06:20<53:08, 1.63it/s] {'loss': 0.2211, 'grad_norm': 0.6657461524009705, 'learning_rate': 4.997728531370727e-06, 'epoch': 1.65}
55%|█████▌ | 6341/11526 [1:06:20<53:08, 1.63it/s] 55%|█████▌ | 6342/11526 [1:06:21<53:04, 1.63it/s] {'loss': 0.1589, 'grad_norm': 0.44784271717071533, 'learning_rate': 4.996214219182713e-06, 'epoch': 1.65}
55%|█████▌ | 6342/11526 [1:06:21<53:04, 1.63it/s] 55%|█████▌ | 6343/11526 [1:06:21<53:03, 1.63it/s] {'loss': 0.1891, 'grad_norm': 0.46065554022789, 'learning_rate': 4.994699907341952e-06, 'epoch': 1.65}
55%|█████▌ | 6343/11526 [1:06:22<53:03, 1.63it/s] 55%|█████▌ | 6344/11526 [1:06:22<53:07, 1.63it/s] {'loss': 0.2129, 'grad_norm': 0.5151588916778564, 'learning_rate': 4.993185595987347e-06, 'epoch': 1.65}
55%|█████▌ | 6344/11526 [1:06:22<53:07, 1.63it/s] 55%|█████▌ | 6345/11526 [1:06:23<53:05, 1.63it/s] {'loss': 0.2085, 'grad_norm': 0.46378394961357117, 'learning_rate': 4.991671285257796e-06, 'epoch': 1.65}
55%|█████▌ | 6345/11526 [1:06:23<53:05, 1.63it/s] 55%|█████▌ | 6346/11526 [1:06:23<53:03, 1.63it/s] {'loss': 0.2126, 'grad_norm': 0.5092033743858337, 'learning_rate': 4.990156975292204e-06, 'epoch': 1.65}
55%|█████▌ | 6346/11526 [1:06:23<53:03, 1.63it/s] 55%|█████▌ | 6347/11526 [1:06:24<53:01, 1.63it/s] {'loss': 0.2138, 'grad_norm': 0.589289665222168, 'learning_rate': 4.988642666229469e-06, 'epoch': 1.65}
55%|█████▌ | 6347/11526 [1:06:24<53:01, 1.63it/s] 55%|█████▌ | 6348/11526 [1:06:24<52:59, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.5307474136352539, 'learning_rate': 4.9871283582084944e-06, 'epoch': 1.65}
55%|█████▌ | 6348/11526 [1:06:25<52:59, 1.63it/s] 55%|█████▌ | 6349/11526 [1:06:25<53:05, 1.63it/s] {'loss': 0.2045, 'grad_norm': 0.5816412568092346, 'learning_rate': 4.9856140513681785e-06, 'epoch': 1.65}
55%|█████▌ | 6349/11526 [1:06:25<53:05, 1.63it/s] 55%|█████▌ | 6350/11526 [1:06:26<53:03, 1.63it/s] {'loss': 0.1698, 'grad_norm': 0.4780545234680176, 'learning_rate': 4.984099745847425e-06, 'epoch': 1.65}
55%|█████▌ | 6350/11526 [1:06:26<53:03, 1.63it/s] 55%|█████▌ | 6351/11526 [1:06:26<53:00, 1.63it/s] {'loss': 0.162, 'grad_norm': 0.4657112956047058, 'learning_rate': 4.982585441785133e-06, 'epoch': 1.65}
55%|█████▌ | 6351/11526 [1:06:26<53:00, 1.63it/s] 55%|█████▌ | 6352/11526 [1:06:27<52:59, 1.63it/s] {'loss': 0.157, 'grad_norm': 0.43530040979385376, 'learning_rate': 4.981071139320203e-06, 'epoch': 1.65}
55%|█████▌ | 6352/11526 [1:06:27<52:59, 1.63it/s] 55%|█████▌ | 6353/11526 [1:06:28<52:58, 1.63it/s] {'loss': 0.2042, 'grad_norm': 0.5907611846923828, 'learning_rate': 4.979556838591535e-06, 'epoch': 1.65}
55%|█████▌ | 6353/11526 [1:06:28<52:58, 1.63it/s] 55%|█████▌ | 6354/11526 [1:06:28<53:02, 1.63it/s] {'loss': 0.2077, 'grad_norm': 0.5771214365959167, 'learning_rate': 4.9780425397380315e-06, 'epoch': 1.65}
55%|█████▌ | 6354/11526 [1:06:28<53:02, 1.63it/s] 55%|█████▌ | 6355/11526 [1:06:29<53:00, 1.63it/s] {'loss': 0.1946, 'grad_norm': 0.5205808877944946, 'learning_rate': 4.9765282428985904e-06, 'epoch': 1.65}
55%|█████▌ | 6355/11526 [1:06:29<53:00, 1.63it/s] 55%|█████▌ | 6356/11526 [1:06:29<53:01, 1.63it/s] {'loss': 0.1304, 'grad_norm': 0.38693809509277344, 'learning_rate': 4.975013948212114e-06, 'epoch': 1.65}
55%|█████▌ | 6356/11526 [1:06:30<53:01, 1.63it/s] 55%|█████▌ | 6357/11526 [1:06:30<53:00, 1.63it/s] {'loss': 0.2361, 'grad_norm': 0.6248453259468079, 'learning_rate': 4.9734996558174995e-06, 'epoch': 1.65}
55%|█████▌ | 6357/11526 [1:06:30<53:00, 1.63it/s] 55%|█████▌ | 6358/11526 [1:06:31<52:59, 1.63it/s] {'loss': 0.2044, 'grad_norm': 0.5620766282081604, 'learning_rate': 4.971985365853646e-06, 'epoch': 1.65}
55%|█████▌ | 6358/11526 [1:06:31<52:59, 1.63it/s] 55%|█████▌ | 6359/11526 [1:06:31<53:00, 1.62it/s] {'loss': 0.2339, 'grad_norm': 0.6169837713241577, 'learning_rate': 4.970471078459455e-06, 'epoch': 1.66}
55%|█████▌ | 6359/11526 [1:06:31<53:00, 1.62it/s] 55%|█████▌ | 6360/11526 [1:06:32<52:58, 1.63it/s] {'loss': 0.2614, 'grad_norm': 0.553383469581604, 'learning_rate': 4.968956793773825e-06, 'epoch': 1.66}
55%|█████▌ | 6360/11526 [1:06:32<52:58, 1.63it/s] 55%|█████▌ | 6361/11526 [1:06:32<52:57, 1.63it/s] {'loss': 0.2392, 'grad_norm': 0.6213073134422302, 'learning_rate': 4.967442511935655e-06, 'epoch': 1.66}
55%|█████▌ | 6361/11526 [1:06:33<52:57, 1.63it/s] 55%|█████▌ | 6362/11526 [1:06:33<52:54, 1.63it/s] {'loss': 0.1928, 'grad_norm': 0.5168159604072571, 'learning_rate': 4.9659282330838414e-06, 'epoch': 1.66}
55%|█████▌ | 6362/11526 [1:06:33<52:54, 1.63it/s] 55%|█████▌ | 6363/11526 [1:06:34<52:52, 1.63it/s] {'loss': 0.197, 'grad_norm': 0.5498520135879517, 'learning_rate': 4.964413957357284e-06, 'epoch': 1.66}
55%|█████▌ | 6363/11526 [1:06:34<52:52, 1.63it/s] 55%|█████▌ | 6364/11526 [1:06:34<52:56, 1.62it/s] {'loss': 0.2005, 'grad_norm': 0.5783654451370239, 'learning_rate': 4.962899684894883e-06, 'epoch': 1.66}
55%|█████▌ | 6364/11526 [1:06:34<52:56, 1.62it/s] 55%|█████▌ | 6365/11526 [1:06:35<52:55, 1.63it/s] {'loss': 0.2153, 'grad_norm': 0.5225991010665894, 'learning_rate': 4.961385415835534e-06, 'epoch': 1.66}
55%|█████▌ | 6365/11526 [1:06:35<52:55, 1.63it/s] 55%|█████▌ | 6366/11526 [1:06:36<52:52, 1.63it/s] {'loss': 0.1548, 'grad_norm': 0.4713810384273529, 'learning_rate': 4.959871150318133e-06, 'epoch': 1.66}
55%|█████▌ | 6366/11526 [1:06:36<52:52, 1.63it/s] 55%|█████▌ | 6367/11526 [1:06:36<52:50, 1.63it/s] {'loss': 0.2107, 'grad_norm': 0.5198962688446045, 'learning_rate': 4.958356888481578e-06, 'epoch': 1.66}
55%|█████▌ | 6367/11526 [1:06:36<52:50, 1.63it/s] 55%|█████▌ | 6368/11526 [1:06:37<52:48, 1.63it/s] {'loss': 0.2483, 'grad_norm': 0.6214694976806641, 'learning_rate': 4.956842630464767e-06, 'epoch': 1.66}
55%|█████▌ | 6368/11526 [1:06:37<52:48, 1.63it/s] 55%|█████▌ | 6369/11526 [1:06:37<53:04, 1.62it/s] {'loss': 0.1826, 'grad_norm': 0.47983700037002563, 'learning_rate': 4.955328376406596e-06, 'epoch': 1.66}
55%|█████▌ | 6369/11526 [1:06:38<53:04, 1.62it/s] 55%|█████▌ | 6370/11526 [1:06:38<52:59, 1.62it/s] {'loss': 0.2354, 'grad_norm': 0.6124559044837952, 'learning_rate': 4.953814126445961e-06, 'epoch': 1.66}
55%|█████▌ | 6370/11526 [1:06:38<52:59, 1.62it/s] 55%|█████▌ | 6371/11526 [1:06:39<52:55, 1.62it/s] {'loss': 0.1967, 'grad_norm': 0.5277947187423706, 'learning_rate': 4.952299880721755e-06, 'epoch': 1.66}
55%|█████▌ | 6371/11526 [1:06:39<52:55, 1.62it/s] 55%|█████▌ | 6372/11526 [1:06:39<52:52, 1.62it/s] {'loss': 0.2269, 'grad_norm': 0.6065030097961426, 'learning_rate': 4.950785639372877e-06, 'epoch': 1.66}
55%|█████▌ | 6372/11526 [1:06:39<52:52, 1.62it/s] 55%|█████▌ | 6373/11526 [1:06:40<52:49, 1.63it/s] {'loss': 0.1769, 'grad_norm': 0.5553733706474304, 'learning_rate': 4.9492714025382196e-06, 'epoch': 1.66}
55%|█████▌ | 6373/11526 [1:06:40<52:49, 1.63it/s] 55%|█████▌ | 6374/11526 [1:06:40<52:52, 1.62it/s] {'loss': 0.2036, 'grad_norm': 0.5480161309242249, 'learning_rate': 4.947757170356678e-06, 'epoch': 1.66}
55%|█████▌ | 6374/11526 [1:06:41<52:52, 1.62it/s] 55%|█████▌ | 6375/11526 [1:06:41<52:48, 1.63it/s] {'loss': 0.1953, 'grad_norm': 0.6156177520751953, 'learning_rate': 4.946242942967146e-06, 'epoch': 1.66}
55%|█████▌ | 6375/11526 [1:06:41<52:48, 1.63it/s] 55%|█████▌ | 6376/11526 [1:06:42<52:46, 1.63it/s] {'loss': 0.1482, 'grad_norm': 0.41420668363571167, 'learning_rate': 4.944728720508516e-06, 'epoch': 1.66}
55%|█████▌ | 6376/11526 [1:06:42<52:46, 1.63it/s] 55%|█████▌ | 6377/11526 [1:06:42<52:46, 1.63it/s] {'loss': 0.1685, 'grad_norm': 0.46800878643989563, 'learning_rate': 4.943214503119683e-06, 'epoch': 1.66}
55%|█████▌ | 6377/11526 [1:06:42<52:46, 1.63it/s] 55%|█████▌ | 6378/11526 [1:06:43<52:43, 1.63it/s] {'loss': 0.2253, 'grad_norm': 0.6899321675300598, 'learning_rate': 4.941700290939539e-06, 'epoch': 1.66}
55%|█████▌ | 6378/11526 [1:06:43<52:43, 1.63it/s] 55%|█████▌ | 6379/11526 [1:06:44<52:47, 1.63it/s] {'loss': 0.1907, 'grad_norm': 0.5162034630775452, 'learning_rate': 4.9401860841069765e-06, 'epoch': 1.66}
55%|█████▌ | 6379/11526 [1:06:44<52:47, 1.63it/s] 55%|█████▌ | 6380/11526 [1:06:44<52:45, 1.63it/s] {'loss': 0.1451, 'grad_norm': 0.4727327525615692, 'learning_rate': 4.938671882760885e-06, 'epoch': 1.66}
55%|█████▌ | 6380/11526 [1:06:44<52:45, 1.63it/s] 55%|█████▌ | 6381/11526 [1:06:45<52:49, 1.62it/s] {'loss': 0.1855, 'grad_norm': 0.4604489505290985, 'learning_rate': 4.93715768704016e-06, 'epoch': 1.66}
55%|█████▌ | 6381/11526 [1:06:45<52:49, 1.62it/s] 55%|█████▌ | 6382/11526 [1:06:45<52:47, 1.62it/s] {'loss': 0.2288, 'grad_norm': 0.5544418096542358, 'learning_rate': 4.935643497083687e-06, 'epoch': 1.66}
55%|█████▌ | 6382/11526 [1:06:46<52:47, 1.62it/s] 55%|█████▌ | 6383/11526 [1:06:46<52:44, 1.63it/s] {'loss': 0.2085, 'grad_norm': 0.6539827585220337, 'learning_rate': 4.93412931303036e-06, 'epoch': 1.66}
55%|█████▌ | 6383/11526 [1:06:46<52:44, 1.63it/s] 55%|█████▌ | 6384/11526 [1:06:47<52:45, 1.62it/s] {'loss': 0.1561, 'grad_norm': 0.44260987639427185, 'learning_rate': 4.932615135019068e-06, 'epoch': 1.66}
55%|█████▌ | 6384/11526 [1:06:47<52:45, 1.62it/s] 55%|█████▌ | 6385/11526 [1:06:47<52:40, 1.63it/s] {'loss': 0.2082, 'grad_norm': 0.5560705065727234, 'learning_rate': 4.931100963188697e-06, 'epoch': 1.66}
55%|█████▌ | 6385/11526 [1:06:47<52:40, 1.63it/s] 55%|█████▌ | 6386/11526 [1:06:48<52:39, 1.63it/s] {'loss': 0.2041, 'grad_norm': 0.5460354089736938, 'learning_rate': 4.9295867976781384e-06, 'epoch': 1.66}
55%|█████▌ | 6386/11526 [1:06:48<52:39, 1.63it/s] 55%|█████▌ | 6387/11526 [1:06:48<52:37, 1.63it/s] {'loss': 0.1768, 'grad_norm': 0.5500664710998535, 'learning_rate': 4.9280726386262804e-06, 'epoch': 1.66}
55%|█████▌ | 6387/11526 [1:06:49<52:37, 1.63it/s] 55%|█████▌ | 6388/11526 [1:06:49<52:37, 1.63it/s] {'loss': 0.204, 'grad_norm': 0.5234349370002747, 'learning_rate': 4.9265584861720095e-06, 'epoch': 1.66}
55%|█████▌ | 6388/11526 [1:06:49<52:37, 1.63it/s] 55%|█████▌ | 6389/11526 [1:06:50<52:39, 1.63it/s] {'loss': 0.2265, 'grad_norm': 0.5679160952568054, 'learning_rate': 4.92504434045421e-06, 'epoch': 1.66}
55%|█████▌ | 6389/11526 [1:06:50<52:39, 1.63it/s] 55%|█████▌ | 6390/11526 [1:06:50<52:36, 1.63it/s] {'loss': 0.2033, 'grad_norm': 0.4983033835887909, 'learning_rate': 4.9235302016117705e-06, 'epoch': 1.66}
55%|█████▌ | 6390/11526 [1:06:50<52:36, 1.63it/s] 55%|█████▌ | 6391/11526 [1:06:51<52:35, 1.63it/s] {'loss': 0.1687, 'grad_norm': 0.47862112522125244, 'learning_rate': 4.922016069783577e-06, 'epoch': 1.66}
55%|█████▌ | 6391/11526 [1:06:51<52:35, 1.63it/s] 55%|█████▌ | 6392/11526 [1:06:52<52:33, 1.63it/s] {'loss': 0.2139, 'grad_norm': 0.5985431671142578, 'learning_rate': 4.920501945108514e-06, 'epoch': 1.66}
55%|█████▌ | 6392/11526 [1:06:52<52:33, 1.63it/s] 55%|█████▌ | 6393/11526 [1:06:52<52:32, 1.63it/s] {'loss': 0.1786, 'grad_norm': 0.45654192566871643, 'learning_rate': 4.918987827725462e-06, 'epoch': 1.66}
55%|█████▌ | 6393/11526 [1:06:52<52:32, 1.63it/s] 55%|█████▌ | 6394/11526 [1:06:53<52:35, 1.63it/s] {'loss': 0.1629, 'grad_norm': 0.5014111399650574, 'learning_rate': 4.917473717773309e-06, 'epoch': 1.66}
55%|█████▌ | 6394/11526 [1:06:53<52:35, 1.63it/s] 55%|█████▌ | 6395/11526 [1:06:53<52:34, 1.63it/s] {'loss': 0.2161, 'grad_norm': 0.5544602870941162, 'learning_rate': 4.915959615390938e-06, 'epoch': 1.66}
55%|█████▌ | 6395/11526 [1:06:54<52:34, 1.63it/s] 55%|█████▌ | 6396/11526 [1:06:54<52:33, 1.63it/s] {'loss': 0.2056, 'grad_norm': 0.6310425400733948, 'learning_rate': 4.914445520717228e-06, 'epoch': 1.66}
55%|█████▌ | 6396/11526 [1:06:54<52:33, 1.63it/s] 56%|█████▌ | 6397/11526 [1:06:55<52:32, 1.63it/s] {'loss': 0.2116, 'grad_norm': 0.6137412190437317, 'learning_rate': 4.912931433891062e-06, 'epoch': 1.67}
56%|█████▌ | 6397/11526 [1:06:55<52:32, 1.63it/s] 56%|█████▌ | 6398/11526 [1:06:55<52:31, 1.63it/s] {'loss': 0.231, 'grad_norm': 0.6336262226104736, 'learning_rate': 4.911417355051318e-06, 'epoch': 1.67}
56%|█████▌ | 6398/11526 [1:06:55<52:31, 1.63it/s] 56%|█████▌ | 6399/11526 [1:06:56<52:31, 1.63it/s] {'loss': 0.2303, 'grad_norm': 0.5726328492164612, 'learning_rate': 4.90990328433688e-06, 'epoch': 1.67}
56%|█████▌ | 6399/11526 [1:06:56<52:31, 1.63it/s] 56%|█████▌ | 6400/11526 [1:06:56<52:32, 1.63it/s] {'loss': 0.1685, 'grad_norm': 0.5046522617340088, 'learning_rate': 4.908389221886627e-06, 'epoch': 1.67}
56%|█████▌ | 6400/11526 [1:06:57<52:32, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.33it/s]
31%|███ | 4/13 [00:00<00:01, 8.37it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.78it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.40it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.16it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.00it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.89it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.81it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.75it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.72it/s]
100%|██████████| 13/13 [00:01<00:00, 6.73it/s]
{'eval_loss': 0.5447467565536499, 'eval_runtime': 1.9593, 'eval_samples_per_second': 102.077, 'eval_steps_per_second': 6.635, 'epoch': 1.67}
56%|█████▌ | 6400/11526 [1:06:59<52:32, 1.63it/s]
 56%|█████▌ | 6401/11526 [1:06:59<1:42:51, 1.20s/it] {'loss': 0.1966, 'grad_norm': 0.5475740432739258, 'learning_rate': 4.906875167839433e-06, 'epoch': 1.67}
56%|█████▌ | 6401/11526 [1:06:59<1:42:51, 1.20s/it] 56%|█████▌ | 6402/11526 [1:07:00<1:27:42, 1.03s/it] {'loss': 0.2213, 'grad_norm': 0.6502124071121216, 'learning_rate': 4.905361122334178e-06, 'epoch': 1.67}
56%|█████▌ | 6402/11526 [1:07:00<1:27:42, 1.03s/it] 56%|█████▌ | 6403/11526 [1:07:00<1:17:05, 1.11it/s] {'loss': 0.2037, 'grad_norm': 0.5554869174957275, 'learning_rate': 4.903847085509742e-06, 'epoch': 1.67}
56%|█████▌ | 6403/11526 [1:07:00<1:17:05, 1.11it/s] 56%|█████▌ | 6404/11526 [1:07:01<1:09:45, 1.22it/s] {'loss': 0.2041, 'grad_norm': 0.5606160163879395, 'learning_rate': 4.902333057504998e-06, 'epoch': 1.67}
56%|█████▌ | 6404/11526 [1:07:01<1:09:45, 1.22it/s] 56%|█████▌ | 6405/11526 [1:07:02<1:04:32, 1.32it/s] {'loss': 0.2318, 'grad_norm': 0.6106163263320923, 'learning_rate': 4.90081903845882e-06, 'epoch': 1.67}
56%|█████▌ | 6405/11526 [1:07:02<1:04:32, 1.32it/s] 56%|█████▌ | 6406/11526 [1:07:02<1:00:52, 1.40it/s] {'loss': 0.2122, 'grad_norm': 0.6134449243545532, 'learning_rate': 4.8993050285100865e-06, 'epoch': 1.67}
56%|█████▌ | 6406/11526 [1:07:02<1:00:52, 1.40it/s] 56%|█████▌ | 6407/11526 [1:07:03<58:19, 1.46it/s] {'loss': 0.2433, 'grad_norm': 0.7297170162200928, 'learning_rate': 4.897791027797666e-06, 'epoch': 1.67}
56%|█████▌ | 6407/11526 [1:07:03<58:19, 1.46it/s] 56%|█████▌ | 6408/11526 [1:07:03<56:31, 1.51it/s] {'loss': 0.1875, 'grad_norm': 0.5354832410812378, 'learning_rate': 4.896277036460435e-06, 'epoch': 1.67}
56%|█████▌ | 6408/11526 [1:07:03<56:31, 1.51it/s] 56%|█████▌ | 6409/11526 [1:07:04<55:18, 1.54it/s] {'loss': 0.2246, 'grad_norm': 0.6165109872817993, 'learning_rate': 4.8947630546372645e-06, 'epoch': 1.67}
56%|█████▌ | 6409/11526 [1:07:04<55:18, 1.54it/s] 56%|█████▌ | 6410/11526 [1:07:05<54:26, 1.57it/s] {'loss': 0.1762, 'grad_norm': 0.5759462118148804, 'learning_rate': 4.893249082467027e-06, 'epoch': 1.67}
56%|█████▌ | 6410/11526 [1:07:05<54:26, 1.57it/s] 56%|█████▌ | 6411/11526 [1:07:05<53:47, 1.58it/s] {'loss': 0.2058, 'grad_norm': 0.5766630172729492, 'learning_rate': 4.891735120088587e-06, 'epoch': 1.67}
56%|█████▌ | 6411/11526 [1:07:05<53:47, 1.58it/s] 56%|█████▌ | 6412/11526 [1:07:06<53:21, 1.60it/s] {'loss': 0.1413, 'grad_norm': 0.4954400062561035, 'learning_rate': 4.8902211676408215e-06, 'epoch': 1.67}
56%|█████▌ | 6412/11526 [1:07:06<53:21, 1.60it/s] 56%|█████▌ | 6413/11526 [1:07:06<53:01, 1.61it/s] {'loss': 0.2022, 'grad_norm': 0.5994308590888977, 'learning_rate': 4.8887072252625935e-06, 'epoch': 1.67}
56%|█████▌ | 6413/11526 [1:07:07<53:01, 1.61it/s] 56%|█████▌ | 6414/11526 [1:07:07<52:50, 1.61it/s] {'loss': 0.1736, 'grad_norm': 0.5455106496810913, 'learning_rate': 4.887193293092774e-06, 'epoch': 1.67}
56%|█████▌ | 6414/11526 [1:07:07<52:50, 1.61it/s] 56%|█████▌ | 6415/11526 [1:07:08<52:40, 1.62it/s] {'loss': 0.2037, 'grad_norm': 0.5735116004943848, 'learning_rate': 4.885679371270226e-06, 'epoch': 1.67}
56%|█████▌ | 6415/11526 [1:07:08<52:40, 1.62it/s] 56%|█████▌ | 6416/11526 [1:07:08<52:33, 1.62it/s] {'loss': 0.2328, 'grad_norm': 0.6193531155586243, 'learning_rate': 4.884165459933815e-06, 'epoch': 1.67}
56%|█████▌ | 6416/11526 [1:07:08<52:33, 1.62it/s] 56%|█████▌ | 6417/11526 [1:07:09<52:28, 1.62it/s] {'loss': 0.254, 'grad_norm': 0.649890661239624, 'learning_rate': 4.882651559222409e-06, 'epoch': 1.67}
56%|█████▌ | 6417/11526 [1:07:09<52:28, 1.62it/s] 56%|█████▌ | 6418/11526 [1:07:09<52:24, 1.62it/s] {'loss': 0.1883, 'grad_norm': 0.5377092361450195, 'learning_rate': 4.8811376692748706e-06, 'epoch': 1.67}
56%|█████▌ | 6418/11526 [1:07:10<52:24, 1.62it/s] 56%|█████▌ | 6419/11526 [1:07:10<52:23, 1.62it/s] {'loss': 0.3196, 'grad_norm': 0.7452520728111267, 'learning_rate': 4.879623790230059e-06, 'epoch': 1.67}
56%|█████▌ | 6419/11526 [1:07:10<52:23, 1.62it/s] 56%|█████▌ | 6420/11526 [1:07:11<52:20, 1.63it/s] {'loss': 0.1619, 'grad_norm': 0.4700539708137512, 'learning_rate': 4.878109922226838e-06, 'epoch': 1.67}
56%|█████▌ | 6420/11526 [1:07:11<52:20, 1.63it/s] 56%|█████▌ | 6421/11526 [1:07:11<52:16, 1.63it/s] {'loss': 0.1792, 'grad_norm': 0.5082269310951233, 'learning_rate': 4.876596065404071e-06, 'epoch': 1.67}
56%|█████▌ | 6421/11526 [1:07:11<52:16, 1.63it/s] 56%|█████▌ | 6422/11526 [1:07:12<52:17, 1.63it/s] {'loss': 0.1951, 'grad_norm': 0.5212538242340088, 'learning_rate': 4.875082219900615e-06, 'epoch': 1.67}
56%|█████▌ | 6422/11526 [1:07:12<52:17, 1.63it/s] 56%|█████▌ | 6423/11526 [1:07:13<52:15, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.5451244115829468, 'learning_rate': 4.8735683858553265e-06, 'epoch': 1.67}
56%|█████▌ | 6423/11526 [1:07:13<52:15, 1.63it/s] 56%|█████▌ | 6424/11526 [1:07:13<52:16, 1.63it/s] {'loss': 0.1766, 'grad_norm': 0.4839470386505127, 'learning_rate': 4.872054563407067e-06, 'epoch': 1.67}
56%|█████▌ | 6424/11526 [1:07:13<52:16, 1.63it/s] 56%|█████▌ | 6425/11526 [1:07:14<52:13, 1.63it/s] {'loss': 0.2671, 'grad_norm': 0.6218132376670837, 'learning_rate': 4.870540752694689e-06, 'epoch': 1.67}
56%|█████▌ | 6425/11526 [1:07:14<52:13, 1.63it/s] 56%|█████▌ | 6426/11526 [1:07:14<52:11, 1.63it/s] {'loss': 0.1552, 'grad_norm': 0.47411367297172546, 'learning_rate': 4.869026953857051e-06, 'epoch': 1.67}
56%|█████▌ | 6426/11526 [1:07:15<52:11, 1.63it/s] 56%|█████▌ | 6427/11526 [1:07:15<52:09, 1.63it/s] {'loss': 0.1718, 'grad_norm': 0.501902163028717, 'learning_rate': 4.867513167033005e-06, 'epoch': 1.67}
56%|█████▌ | 6427/11526 [1:07:15<52:09, 1.63it/s] 56%|█████▌ | 6428/11526 [1:07:16<52:08, 1.63it/s] {'loss': 0.2334, 'grad_norm': 0.6275154948234558, 'learning_rate': 4.865999392361407e-06, 'epoch': 1.67}
56%|█████▌ | 6428/11526 [1:07:16<52:08, 1.63it/s] 56%|█████▌ | 6429/11526 [1:07:16<52:12, 1.63it/s] {'loss': 0.1983, 'grad_norm': 0.5221641063690186, 'learning_rate': 4.864485629981105e-06, 'epoch': 1.67}
56%|█████▌ | 6429/11526 [1:07:16<52:12, 1.63it/s] 56%|█████▌ | 6430/11526 [1:07:17<52:10, 1.63it/s] {'loss': 0.1859, 'grad_norm': 0.5574922561645508, 'learning_rate': 4.862971880030953e-06, 'epoch': 1.67}
56%|█████▌ | 6430/11526 [1:07:17<52:10, 1.63it/s] 56%|█████▌ | 6431/11526 [1:07:17<52:10, 1.63it/s] {'loss': 0.2119, 'grad_norm': 0.6529492139816284, 'learning_rate': 4.8614581426498e-06, 'epoch': 1.67}
56%|█████▌ | 6431/11526 [1:07:18<52:10, 1.63it/s] 56%|█████▌ | 6432/11526 [1:07:18<52:09, 1.63it/s] {'loss': 0.2118, 'grad_norm': 0.6185007691383362, 'learning_rate': 4.859944417976495e-06, 'epoch': 1.67}
56%|█████▌ | 6432/11526 [1:07:18<52:09, 1.63it/s] 56%|█████▌ | 6433/11526 [1:07:19<52:07, 1.63it/s] {'loss': 0.1763, 'grad_norm': 0.5697612762451172, 'learning_rate': 4.858430706149884e-06, 'epoch': 1.67}
56%|█████▌ | 6433/11526 [1:07:19<52:07, 1.63it/s] 56%|█████▌ | 6434/11526 [1:07:19<52:12, 1.63it/s] {'loss': 0.163, 'grad_norm': 0.504367470741272, 'learning_rate': 4.856917007308813e-06, 'epoch': 1.67}
56%|█████▌ | 6434/11526 [1:07:19<52:12, 1.63it/s] 56%|█████▌ | 6435/11526 [1:07:20<52:10, 1.63it/s] {'loss': 0.1584, 'grad_norm': 0.4745934009552002, 'learning_rate': 4.855403321592129e-06, 'epoch': 1.67}
56%|█████▌ | 6435/11526 [1:07:20<52:10, 1.63it/s] 56%|█████▌ | 6436/11526 [1:07:21<52:08, 1.63it/s] {'loss': 0.2078, 'grad_norm': 0.53026282787323, 'learning_rate': 4.8538896491386765e-06, 'epoch': 1.68}
56%|█████▌ | 6436/11526 [1:07:21<52:08, 1.63it/s] 56%|█████▌ | 6437/11526 [1:07:21<52:07, 1.63it/s] {'loss': 0.1663, 'grad_norm': 0.5057183504104614, 'learning_rate': 4.852375990087296e-06, 'epoch': 1.68}
56%|█████▌ | 6437/11526 [1:07:21<52:07, 1.63it/s] 56%|█████▌ | 6438/11526 [1:07:22<52:05, 1.63it/s] {'loss': 0.2451, 'grad_norm': 0.6527573466300964, 'learning_rate': 4.850862344576828e-06, 'epoch': 1.68}
56%|█████▌ | 6438/11526 [1:07:22<52:05, 1.63it/s] 56%|█████▌ | 6439/11526 [1:07:22<52:06, 1.63it/s] {'loss': 0.2148, 'grad_norm': 0.5500391721725464, 'learning_rate': 4.849348712746116e-06, 'epoch': 1.68}
56%|█████▌ | 6439/11526 [1:07:23<52:06, 1.63it/s] 56%|█████▌ | 6440/11526 [1:07:23<52:04, 1.63it/s] {'loss': 0.2342, 'grad_norm': 0.5802512764930725, 'learning_rate': 4.847835094733997e-06, 'epoch': 1.68}
56%|█████▌ | 6440/11526 [1:07:23<52:04, 1.63it/s] 56%|█████▌ | 6441/11526 [1:07:24<52:06, 1.63it/s] {'loss': 0.2095, 'grad_norm': 0.5713983774185181, 'learning_rate': 4.84632149067931e-06, 'epoch': 1.68}
56%|█████▌ | 6441/11526 [1:07:24<52:06, 1.63it/s] 56%|█████▌ | 6442/11526 [1:07:24<52:06, 1.63it/s] {'loss': 0.2389, 'grad_norm': 0.5943389534950256, 'learning_rate': 4.844807900720888e-06, 'epoch': 1.68}
56%|█████▌ | 6442/11526 [1:07:24<52:06, 1.63it/s] 56%|█████▌ | 6443/11526 [1:07:25<52:05, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.47895678877830505, 'learning_rate': 4.8432943249975685e-06, 'epoch': 1.68}
56%|█████▌ | 6443/11526 [1:07:25<52:05, 1.63it/s] 56%|█████▌ | 6444/11526 [1:07:25<52:08, 1.62it/s] {'loss': 0.1936, 'grad_norm': 0.4836857318878174, 'learning_rate': 4.8417807636481875e-06, 'epoch': 1.68}
56%|█████▌ | 6444/11526 [1:07:26<52:08, 1.62it/s] 56%|█████▌ | 6445/11526 [1:07:26<52:04, 1.63it/s] {'loss': 0.1683, 'grad_norm': 0.4824797213077545, 'learning_rate': 4.840267216811574e-06, 'epoch': 1.68}
56%|█████▌ | 6445/11526 [1:07:26<52:04, 1.63it/s] 56%|█████▌ | 6446/11526 [1:07:27<52:02, 1.63it/s] {'loss': 0.2175, 'grad_norm': 0.5892367362976074, 'learning_rate': 4.83875368462656e-06, 'epoch': 1.68}
56%|█████▌ | 6446/11526 [1:07:27<52:02, 1.63it/s] 56%|█████▌ | 6447/11526 [1:07:27<52:01, 1.63it/s] {'loss': 0.2168, 'grad_norm': 0.5788378119468689, 'learning_rate': 4.837240167231973e-06, 'epoch': 1.68}
56%|█████▌ | 6447/11526 [1:07:27<52:01, 1.63it/s] 56%|█████▌ | 6448/11526 [1:07:28<52:00, 1.63it/s] {'loss': 0.1714, 'grad_norm': 0.48216867446899414, 'learning_rate': 4.835726664766647e-06, 'epoch': 1.68}
56%|█████▌ | 6448/11526 [1:07:28<52:00, 1.63it/s] 56%|█████▌ | 6449/11526 [1:07:29<52:13, 1.62it/s] {'loss': 0.1998, 'grad_norm': 0.6292220950126648, 'learning_rate': 4.834213177369406e-06, 'epoch': 1.68}
56%|█████▌ | 6449/11526 [1:07:29<52:13, 1.62it/s] 56%|█████▌ | 6450/11526 [1:07:29<52:08, 1.62it/s] {'loss': 0.186, 'grad_norm': 0.48737359046936035, 'learning_rate': 4.832699705179075e-06, 'epoch': 1.68}
56%|█████▌ | 6450/11526 [1:07:29<52:08, 1.62it/s] 56%|█████▌ | 6451/11526 [1:07:30<52:06, 1.62it/s] {'loss': 0.1769, 'grad_norm': 0.49202340841293335, 'learning_rate': 4.831186248334478e-06, 'epoch': 1.68}
56%|█████▌ | 6451/11526 [1:07:30<52:06, 1.62it/s] 56%|█████▌ | 6452/11526 [1:07:30<52:02, 1.63it/s] {'loss': 0.2127, 'grad_norm': 0.6190791130065918, 'learning_rate': 4.829672806974441e-06, 'epoch': 1.68}
56%|█████▌ | 6452/11526 [1:07:31<52:02, 1.63it/s] 56%|█████▌ | 6453/11526 [1:07:31<51:59, 1.63it/s] {'loss': 0.2115, 'grad_norm': 0.5332125425338745, 'learning_rate': 4.828159381237782e-06, 'epoch': 1.68}
56%|█████▌ | 6453/11526 [1:07:31<51:59, 1.63it/s] 56%|█████▌ | 6454/11526 [1:07:32<52:10, 1.62it/s] {'loss': 0.1867, 'grad_norm': 0.5084027051925659, 'learning_rate': 4.826645971263322e-06, 'epoch': 1.68}
56%|█████▌ | 6454/11526 [1:07:32<52:10, 1.62it/s] 56%|█████▌ | 6455/11526 [1:07:32<52:04, 1.62it/s] {'loss': 0.2255, 'grad_norm': 0.5790855884552002, 'learning_rate': 4.825132577189881e-06, 'epoch': 1.68}
56%|█████▌ | 6455/11526 [1:07:32<52:04, 1.62it/s] 56%|█████▌ | 6456/11526 [1:07:33<52:00, 1.62it/s] {'loss': 0.159, 'grad_norm': 0.48437678813934326, 'learning_rate': 4.823619199156272e-06, 'epoch': 1.68}
56%|█████▌ | 6456/11526 [1:07:33<52:00, 1.62it/s] 56%|█████▌ | 6457/11526 [1:07:33<51:58, 1.63it/s] {'loss': 0.2258, 'grad_norm': 0.5537893176078796, 'learning_rate': 4.822105837301317e-06, 'epoch': 1.68}
56%|█████▌ | 6457/11526 [1:07:34<51:58, 1.63it/s] 56%|█████▌ | 6458/11526 [1:07:34<51:55, 1.63it/s] {'loss': 0.1786, 'grad_norm': 0.5087190866470337, 'learning_rate': 4.820592491763825e-06, 'epoch': 1.68}
56%|█████▌ | 6458/11526 [1:07:34<51:55, 1.63it/s] 56%|█████▌ | 6459/11526 [1:07:35<51:57, 1.63it/s] {'loss': 0.2337, 'grad_norm': 0.6444538831710815, 'learning_rate': 4.819079162682612e-06, 'epoch': 1.68}
56%|█████▌ | 6459/11526 [1:07:35<51:57, 1.63it/s] 56%|█████▌ | 6460/11526 [1:07:35<51:55, 1.63it/s] {'loss': 0.1851, 'grad_norm': 0.5108135342597961, 'learning_rate': 4.817565850196484e-06, 'epoch': 1.68}
56%|█████▌ | 6460/11526 [1:07:35<51:55, 1.63it/s] 56%|█████▌ | 6461/11526 [1:07:36<51:52, 1.63it/s] {'loss': 0.1803, 'grad_norm': 0.5546877980232239, 'learning_rate': 4.816052554444259e-06, 'epoch': 1.68}
56%|█████▌ | 6461/11526 [1:07:36<51:52, 1.63it/s] 56%|█████▌ | 6462/11526 [1:07:37<51:53, 1.63it/s] {'loss': 0.2615, 'grad_norm': 0.6243659257888794, 'learning_rate': 4.814539275564737e-06, 'epoch': 1.68}
56%|█████▌ | 6462/11526 [1:07:37<51:53, 1.63it/s] 56%|█████▌ | 6463/11526 [1:07:37<51:51, 1.63it/s] {'loss': 0.1961, 'grad_norm': 0.5089245438575745, 'learning_rate': 4.813026013696729e-06, 'epoch': 1.68}
56%|█████▌ | 6463/11526 [1:07:37<51:51, 1.63it/s] 56%|█████▌ | 6464/11526 [1:07:38<51:53, 1.63it/s] {'loss': 0.1749, 'grad_norm': 0.48316293954849243, 'learning_rate': 4.811512768979041e-06, 'epoch': 1.68}
56%|█████▌ | 6464/11526 [1:07:38<51:53, 1.63it/s] 56%|█████▌ | 6465/11526 [1:07:38<51:51, 1.63it/s] {'loss': 0.183, 'grad_norm': 0.4953364431858063, 'learning_rate': 4.809999541550469e-06, 'epoch': 1.68}
56%|█████▌ | 6465/11526 [1:07:39<51:51, 1.63it/s] 56%|█████▌ | 6466/11526 [1:07:39<51:50, 1.63it/s] {'loss': 0.1927, 'grad_norm': 0.5129945278167725, 'learning_rate': 4.808486331549824e-06, 'epoch': 1.68}
56%|█████▌ | 6466/11526 [1:07:39<51:50, 1.63it/s] 56%|█████▌ | 6467/11526 [1:07:40<51:48, 1.63it/s] {'loss': 0.2565, 'grad_norm': 0.7531858682632446, 'learning_rate': 4.806973139115902e-06, 'epoch': 1.68}
56%|█████▌ | 6467/11526 [1:07:40<51:48, 1.63it/s] 56%|█████▌ | 6468/11526 [1:07:40<51:46, 1.63it/s] {'loss': 0.2269, 'grad_norm': 0.5982547402381897, 'learning_rate': 4.805459964387502e-06, 'epoch': 1.68}
56%|█████▌ | 6468/11526 [1:07:40<51:46, 1.63it/s] 56%|█████▌ | 6469/11526 [1:07:41<51:49, 1.63it/s] {'loss': 0.1803, 'grad_norm': 0.47391602396965027, 'learning_rate': 4.803946807503419e-06, 'epoch': 1.68}
56%|█████▌ | 6469/11526 [1:07:41<51:49, 1.63it/s] 56%|█████▌ | 6470/11526 [1:07:41<51:47, 1.63it/s] {'loss': 0.1913, 'grad_norm': 0.5429909229278564, 'learning_rate': 4.802433668602451e-06, 'epoch': 1.68}
56%|█████▌ | 6470/11526 [1:07:42<51:47, 1.63it/s] 56%|█████▌ | 6471/11526 [1:07:42<51:46, 1.63it/s] {'loss': 0.1742, 'grad_norm': 0.5136940479278564, 'learning_rate': 4.800920547823392e-06, 'epoch': 1.68}
56%|█████▌ | 6471/11526 [1:07:42<51:46, 1.63it/s] 56%|█████▌ | 6472/11526 [1:07:43<51:45, 1.63it/s] {'loss': 0.2775, 'grad_norm': 0.7903183102607727, 'learning_rate': 4.799407445305034e-06, 'epoch': 1.68}
56%|█████▌ | 6472/11526 [1:07:43<51:45, 1.63it/s] 56%|█████▌ | 6473/11526 [1:07:43<51:43, 1.63it/s] {'loss': 0.1921, 'grad_norm': 0.47358614206314087, 'learning_rate': 4.797894361186166e-06, 'epoch': 1.68}
56%|█████▌ | 6473/11526 [1:07:43<51:43, 1.63it/s] 56%|█████▌ | 6474/11526 [1:07:44<51:46, 1.63it/s] {'loss': 0.2461, 'grad_norm': 0.6135320663452148, 'learning_rate': 4.796381295605574e-06, 'epoch': 1.69}
56%|█████▌ | 6474/11526 [1:07:44<51:46, 1.63it/s] 56%|█████▌ | 6475/11526 [1:07:45<51:43, 1.63it/s] {'loss': 0.2182, 'grad_norm': 0.6208218336105347, 'learning_rate': 4.794868248702052e-06, 'epoch': 1.69}
56%|█████▌ | 6475/11526 [1:07:45<51:43, 1.63it/s] 56%|█████▌ | 6476/11526 [1:07:45<51:40, 1.63it/s] {'loss': 0.1413, 'grad_norm': 0.3888228237628937, 'learning_rate': 4.793355220614379e-06, 'epoch': 1.69}
56%|█████▌ | 6476/11526 [1:07:45<51:40, 1.63it/s] 56%|█████▌ | 6477/11526 [1:07:46<51:41, 1.63it/s] {'loss': 0.181, 'grad_norm': 0.4578274190425873, 'learning_rate': 4.791842211481343e-06, 'epoch': 1.69}
56%|█████▌ | 6477/11526 [1:07:46<51:41, 1.63it/s] 56%|█████▌ | 6478/11526 [1:07:46<51:41, 1.63it/s] {'loss': 0.2393, 'grad_norm': 0.615490198135376, 'learning_rate': 4.790329221441721e-06, 'epoch': 1.69}
56%|█████▌ | 6478/11526 [1:07:47<51:41, 1.63it/s] 56%|█████▌ | 6479/11526 [1:07:47<51:43, 1.63it/s] {'loss': 0.2087, 'grad_norm': 0.49627184867858887, 'learning_rate': 4.788816250634298e-06, 'epoch': 1.69}
56%|█████▌ | 6479/11526 [1:07:47<51:43, 1.63it/s] 56%|█████▌ | 6480/11526 [1:07:48<51:41, 1.63it/s] {'loss': 0.2856, 'grad_norm': 0.7581702470779419, 'learning_rate': 4.787303299197849e-06, 'epoch': 1.69}
56%|█████▌ | 6480/11526 [1:07:48<51:41, 1.63it/s] 56%|█████▌ | 6481/11526 [1:07:48<51:40, 1.63it/s] {'loss': 0.1916, 'grad_norm': 0.5478861331939697, 'learning_rate': 4.785790367271153e-06, 'epoch': 1.69}
56%|█████▌ | 6481/11526 [1:07:48<51:40, 1.63it/s] 56%|█████▌ | 6482/11526 [1:07:49<51:39, 1.63it/s] {'loss': 0.2282, 'grad_norm': 0.6495162844657898, 'learning_rate': 4.784277454992983e-06, 'epoch': 1.69}
56%|█████▌ | 6482/11526 [1:07:49<51:39, 1.63it/s] 56%|█████▌ | 6483/11526 [1:07:49<51:39, 1.63it/s] {'loss': 0.2585, 'grad_norm': 0.7145475149154663, 'learning_rate': 4.78276456250211e-06, 'epoch': 1.69}
56%|█████▌ | 6483/11526 [1:07:50<51:39, 1.63it/s] 56%|█████▋ | 6484/11526 [1:07:50<51:42, 1.63it/s] {'loss': 0.1884, 'grad_norm': 0.484144926071167, 'learning_rate': 4.781251689937309e-06, 'epoch': 1.69}
56%|█████▋ | 6484/11526 [1:07:50<51:42, 1.63it/s] 56%|█████▋ | 6485/11526 [1:07:51<51:42, 1.62it/s] {'loss': 0.2089, 'grad_norm': 0.5953325033187866, 'learning_rate': 4.779738837437348e-06, 'epoch': 1.69}
56%|█████▋ | 6485/11526 [1:07:51<51:42, 1.62it/s] 56%|█████▋ | 6486/11526 [1:07:51<51:40, 1.63it/s] {'loss': 0.203, 'grad_norm': 0.5758793950080872, 'learning_rate': 4.778226005140994e-06, 'epoch': 1.69}
56%|█████▋ | 6486/11526 [1:07:51<51:40, 1.63it/s] 56%|█████▋ | 6487/11526 [1:07:52<51:39, 1.63it/s] {'loss': 0.1871, 'grad_norm': 0.5182952880859375, 'learning_rate': 4.776713193187012e-06, 'epoch': 1.69}
56%|█████▋ | 6487/11526 [1:07:52<51:39, 1.63it/s] 56%|█████▋ | 6488/11526 [1:07:53<51:37, 1.63it/s] {'loss': 0.2078, 'grad_norm': 0.5810613036155701, 'learning_rate': 4.775200401714165e-06, 'epoch': 1.69}
56%|█████▋ | 6488/11526 [1:07:53<51:37, 1.63it/s] 56%|█████▋ | 6489/11526 [1:07:53<51:37, 1.63it/s] {'loss': 0.2277, 'grad_norm': 0.6485875248908997, 'learning_rate': 4.773687630861219e-06, 'epoch': 1.69}
56%|█████▋ | 6489/11526 [1:07:53<51:37, 1.63it/s] 56%|█████▋ | 6490/11526 [1:07:54<51:35, 1.63it/s] {'loss': 0.1621, 'grad_norm': 0.47306209802627563, 'learning_rate': 4.772174880766931e-06, 'epoch': 1.69}
56%|█████▋ | 6490/11526 [1:07:54<51:35, 1.63it/s] 56%|█████▋ | 6491/11526 [1:07:54<51:33, 1.63it/s] {'loss': 0.1957, 'grad_norm': 0.49224376678466797, 'learning_rate': 4.770662151570057e-06, 'epoch': 1.69}
56%|█████▋ | 6491/11526 [1:07:54<51:33, 1.63it/s] 56%|█████▋ | 6492/11526 [1:07:55<51:30, 1.63it/s] {'loss': 0.1767, 'grad_norm': 0.4996158182621002, 'learning_rate': 4.7691494434093555e-06, 'epoch': 1.69}
56%|█████▋ | 6492/11526 [1:07:55<51:30, 1.63it/s] 56%|█████▋ | 6493/11526 [1:07:56<51:30, 1.63it/s] {'loss': 0.152, 'grad_norm': 0.45001423358917236, 'learning_rate': 4.767636756423582e-06, 'epoch': 1.69}
56%|█████▋ | 6493/11526 [1:07:56<51:30, 1.63it/s] 56%|█████▋ | 6494/11526 [1:07:56<51:31, 1.63it/s] {'loss': 0.2361, 'grad_norm': 0.6413518190383911, 'learning_rate': 4.766124090751488e-06, 'epoch': 1.69}
56%|█████▋ | 6494/11526 [1:07:56<51:31, 1.63it/s] 56%|█████▋ | 6495/11526 [1:07:57<51:28, 1.63it/s] {'loss': 0.1852, 'grad_norm': 0.482014000415802, 'learning_rate': 4.764611446531822e-06, 'epoch': 1.69}
56%|█████▋ | 6495/11526 [1:07:57<51:28, 1.63it/s] 56%|█████▋ | 6496/11526 [1:07:57<51:28, 1.63it/s] {'loss': 0.1919, 'grad_norm': 0.5039687156677246, 'learning_rate': 4.763098823903332e-06, 'epoch': 1.69}
56%|█████▋ | 6496/11526 [1:07:58<51:28, 1.63it/s] 56%|█████▋ | 6497/11526 [1:07:58<51:26, 1.63it/s] {'loss': 0.1547, 'grad_norm': 0.43258604407310486, 'learning_rate': 4.761586223004768e-06, 'epoch': 1.69}
56%|█████▋ | 6497/11526 [1:07:58<51:26, 1.63it/s] 56%|█████▋ | 6498/11526 [1:07:59<51:25, 1.63it/s] {'loss': 0.2216, 'grad_norm': 0.6574431657791138, 'learning_rate': 4.760073643974872e-06, 'epoch': 1.69}
56%|█████▋ | 6498/11526 [1:07:59<51:25, 1.63it/s] 56%|█████▋ | 6499/11526 [1:07:59<51:28, 1.63it/s] {'loss': 0.3061, 'grad_norm': 0.6102513670921326, 'learning_rate': 4.758561086952385e-06, 'epoch': 1.69}
56%|█████▋ | 6499/11526 [1:07:59<51:28, 1.63it/s] 56%|█████▋ | 6500/11526 [1:08:00<51:28, 1.63it/s] {'loss': 0.219, 'grad_norm': 0.5540569424629211, 'learning_rate': 4.757048552076049e-06, 'epoch': 1.69}
56%|█████▋ | 6500/11526 [1:08:00<51:28, 1.63it/s] 56%|█████▋ | 6501/11526 [1:08:01<51:27, 1.63it/s] {'loss': 0.2613, 'grad_norm': 0.6693131327629089, 'learning_rate': 4.755536039484603e-06, 'epoch': 1.69}
56%|█████▋ | 6501/11526 [1:08:01<51:27, 1.63it/s] 56%|█████▋ | 6502/11526 [1:08:01<51:25, 1.63it/s] {'loss': 0.1805, 'grad_norm': 0.524438738822937, 'learning_rate': 4.754023549316783e-06, 'epoch': 1.69}
56%|█████▋ | 6502/11526 [1:08:01<51:25, 1.63it/s] 56%|█████▋ | 6503/11526 [1:08:02<51:25, 1.63it/s] {'loss': 0.2523, 'grad_norm': 0.6485269665718079, 'learning_rate': 4.752511081711321e-06, 'epoch': 1.69}
56%|█████▋ | 6503/11526 [1:08:02<51:25, 1.63it/s] 56%|█████▋ | 6504/11526 [1:08:02<51:26, 1.63it/s] {'loss': 0.1948, 'grad_norm': 0.5512745380401611, 'learning_rate': 4.750998636806953e-06, 'epoch': 1.69}
56%|█████▋ | 6504/11526 [1:08:02<51:26, 1.63it/s] 56%|█████▋ | 6505/11526 [1:08:03<51:24, 1.63it/s] {'loss': 0.1806, 'grad_norm': 0.492348849773407, 'learning_rate': 4.749486214742403e-06, 'epoch': 1.69}
56%|█████▋ | 6505/11526 [1:08:03<51:24, 1.63it/s] 56%|█████▋ | 6506/11526 [1:08:04<51:23, 1.63it/s] {'loss': 0.1825, 'grad_norm': 0.44888636469841003, 'learning_rate': 4.747973815656406e-06, 'epoch': 1.69}
56%|█████▋ | 6506/11526 [1:08:04<51:23, 1.63it/s] 56%|█████▋ | 6507/11526 [1:08:04<51:22, 1.63it/s] {'loss': 0.2115, 'grad_norm': 0.553712010383606, 'learning_rate': 4.746461439687684e-06, 'epoch': 1.69}
56%|█████▋ | 6507/11526 [1:08:04<51:22, 1.63it/s] 56%|█████▋ | 6508/11526 [1:08:05<51:22, 1.63it/s] {'loss': 0.2069, 'grad_norm': 0.505462646484375, 'learning_rate': 4.744949086974962e-06, 'epoch': 1.69}
56%|█████▋ | 6508/11526 [1:08:05<51:22, 1.63it/s] 56%|█████▋ | 6509/11526 [1:08:05<51:23, 1.63it/s] {'loss': 0.1799, 'grad_norm': 0.5272694230079651, 'learning_rate': 4.7434367576569575e-06, 'epoch': 1.69}
56%|█████▋ | 6509/11526 [1:08:06<51:23, 1.63it/s] 56%|█████▋ | 6510/11526 [1:08:06<51:21, 1.63it/s] {'loss': 0.1939, 'grad_norm': 0.5979765057563782, 'learning_rate': 4.7419244518723975e-06, 'epoch': 1.69}
56%|█████▋ | 6510/11526 [1:08:06<51:21, 1.63it/s] 56%|█████▋ | 6511/11526 [1:08:07<51:19, 1.63it/s] {'loss': 0.1864, 'grad_norm': 0.4878505766391754, 'learning_rate': 4.740412169759993e-06, 'epoch': 1.69}
56%|█████▋ | 6511/11526 [1:08:07<51:19, 1.63it/s] 56%|█████▋ | 6512/11526 [1:08:07<51:19, 1.63it/s] {'loss': 0.1781, 'grad_norm': 0.5020973682403564, 'learning_rate': 4.738899911458462e-06, 'epoch': 1.69}
56%|█████▋ | 6512/11526 [1:08:07<51:19, 1.63it/s] 57%|█████▋ | 6513/11526 [1:08:08<51:18, 1.63it/s] {'loss': 0.2032, 'grad_norm': 0.5300800800323486, 'learning_rate': 4.737387677106518e-06, 'epoch': 1.7}
57%|█████▋ | 6513/11526 [1:08:08<51:18, 1.63it/s] 57%|█████▋ | 6514/11526 [1:08:09<51:30, 1.62it/s] {'loss': 0.1703, 'grad_norm': 0.7040746212005615, 'learning_rate': 4.735875466842867e-06, 'epoch': 1.7}
57%|█████▋ | 6514/11526 [1:08:09<51:30, 1.62it/s] 57%|█████▋ | 6515/11526 [1:08:09<51:27, 1.62it/s] {'loss': 0.2616, 'grad_norm': 0.6312313675880432, 'learning_rate': 4.734363280806223e-06, 'epoch': 1.7}
57%|█████▋ | 6515/11526 [1:08:09<51:27, 1.62it/s] 57%|█████▋ | 6516/11526 [1:08:10<51:22, 1.63it/s] {'loss': 0.2111, 'grad_norm': 0.5571659207344055, 'learning_rate': 4.73285111913529e-06, 'epoch': 1.7}
57%|█████▋ | 6516/11526 [1:08:10<51:22, 1.63it/s] 57%|█████▋ | 6517/11526 [1:08:10<51:21, 1.63it/s] {'loss': 0.1456, 'grad_norm': 0.47360333800315857, 'learning_rate': 4.731338981968774e-06, 'epoch': 1.7}
57%|█████▋ | 6517/11526 [1:08:10<51:21, 1.63it/s] 57%|█████▋ | 6518/11526 [1:08:11<51:19, 1.63it/s] {'loss': 0.2158, 'grad_norm': 0.547666609287262, 'learning_rate': 4.729826869445372e-06, 'epoch': 1.7}
57%|█████▋ | 6518/11526 [1:08:11<51:19, 1.63it/s] 57%|█████▋ | 6519/11526 [1:08:12<51:21, 1.62it/s] {'loss': 0.1982, 'grad_norm': 0.628093421459198, 'learning_rate': 4.728314781703788e-06, 'epoch': 1.7}
57%|█████▋ | 6519/11526 [1:08:12<51:21, 1.62it/s] 57%|█████▋ | 6520/11526 [1:08:12<51:17, 1.63it/s] {'loss': 0.1973, 'grad_norm': 1.2986061573028564, 'learning_rate': 4.726802718882719e-06, 'epoch': 1.7}
57%|█████▋ | 6520/11526 [1:08:12<51:17, 1.63it/s] 57%|█████▋ | 6521/11526 [1:08:13<51:16, 1.63it/s] {'loss': 0.2837, 'grad_norm': 0.677569568157196, 'learning_rate': 4.72529068112086e-06, 'epoch': 1.7}
57%|█████▋ | 6521/11526 [1:08:13<51:16, 1.63it/s] 57%|█████▋ | 6522/11526 [1:08:13<51:14, 1.63it/s] {'loss': 0.2116, 'grad_norm': 0.5718197226524353, 'learning_rate': 4.723778668556901e-06, 'epoch': 1.7}
57%|█████▋ | 6522/11526 [1:08:14<51:14, 1.63it/s] 57%|█████▋ | 6523/11526 [1:08:14<51:12, 1.63it/s] {'loss': 0.1535, 'grad_norm': 0.44313162565231323, 'learning_rate': 4.722266681329533e-06, 'epoch': 1.7}
57%|█████▋ | 6523/11526 [1:08:14<51:12, 1.63it/s] 57%|█████▋ | 6524/11526 [1:08:15<51:13, 1.63it/s] {'loss': 0.1526, 'grad_norm': 0.424195259809494, 'learning_rate': 4.720754719577448e-06, 'epoch': 1.7}
57%|█████▋ | 6524/11526 [1:08:15<51:13, 1.63it/s] 57%|█████▋ | 6525/11526 [1:08:15<51:12, 1.63it/s] {'loss': 0.202, 'grad_norm': 0.5398566722869873, 'learning_rate': 4.719242783439328e-06, 'epoch': 1.7}
57%|█████▋ | 6525/11526 [1:08:15<51:12, 1.63it/s] 57%|█████▋ | 6526/11526 [1:08:16<51:11, 1.63it/s] {'loss': 0.2128, 'grad_norm': 0.6982064247131348, 'learning_rate': 4.717730873053857e-06, 'epoch': 1.7}
57%|█████▋ | 6526/11526 [1:08:16<51:11, 1.63it/s] 57%|█████▋ | 6527/11526 [1:08:16<51:11, 1.63it/s] {'loss': 0.1809, 'grad_norm': 0.5647715330123901, 'learning_rate': 4.716218988559715e-06, 'epoch': 1.7}
57%|█████▋ | 6527/11526 [1:08:17<51:11, 1.63it/s] 57%|█████▋ | 6528/11526 [1:08:17<51:11, 1.63it/s] {'loss': 0.1781, 'grad_norm': 0.5043041706085205, 'learning_rate': 4.7147071300955845e-06, 'epoch': 1.7}
57%|█████▋ | 6528/11526 [1:08:17<51:11, 1.63it/s] 57%|█████▋ | 6529/11526 [1:08:18<51:14, 1.63it/s] {'loss': 0.2178, 'grad_norm': 0.6250106692314148, 'learning_rate': 4.713195297800139e-06, 'epoch': 1.7}
57%|█████▋ | 6529/11526 [1:08:18<51:14, 1.63it/s] 57%|█████▋ | 6530/11526 [1:08:18<51:12, 1.63it/s] {'loss': 0.1744, 'grad_norm': 0.49664372205734253, 'learning_rate': 4.711683491812051e-06, 'epoch': 1.7}
57%|█████▋ | 6530/11526 [1:08:18<51:12, 1.63it/s] 57%|█████▋ | 6531/11526 [1:08:19<51:10, 1.63it/s] {'loss': 0.2136, 'grad_norm': 0.59415602684021, 'learning_rate': 4.710171712269995e-06, 'epoch': 1.7}
57%|█████▋ | 6531/11526 [1:08:19<51:10, 1.63it/s] 57%|█████▋ | 6532/11526 [1:08:20<51:11, 1.63it/s] {'loss': 0.2087, 'grad_norm': 0.5096461772918701, 'learning_rate': 4.708659959312637e-06, 'epoch': 1.7}
57%|█████▋ | 6532/11526 [1:08:20<51:11, 1.63it/s] 57%|█████▋ | 6533/11526 [1:08:20<51:10, 1.63it/s] {'loss': 0.1922, 'grad_norm': 0.5777961015701294, 'learning_rate': 4.707148233078647e-06, 'epoch': 1.7}
57%|█████▋ | 6533/11526 [1:08:20<51:10, 1.63it/s] 57%|█████▋ | 6534/11526 [1:08:21<51:23, 1.62it/s] {'loss': 0.1616, 'grad_norm': 0.4589294195175171, 'learning_rate': 4.705636533706686e-06, 'epoch': 1.7}
57%|█████▋ | 6534/11526 [1:08:21<51:23, 1.62it/s] 57%|█████▋ | 6535/11526 [1:08:21<51:17, 1.62it/s] {'loss': 0.1974, 'grad_norm': 0.5412936806678772, 'learning_rate': 4.704124861335418e-06, 'epoch': 1.7}
57%|█████▋ | 6535/11526 [1:08:22<51:17, 1.62it/s] 57%|█████▋ | 6536/11526 [1:08:22<51:12, 1.62it/s] {'loss': 0.2184, 'grad_norm': 0.6246803402900696, 'learning_rate': 4.7026132161035e-06, 'epoch': 1.7}
57%|█████▋ | 6536/11526 [1:08:22<51:12, 1.62it/s] 57%|█████▋ | 6537/11526 [1:08:23<51:09, 1.63it/s] {'loss': 0.1523, 'grad_norm': 0.47967708110809326, 'learning_rate': 4.701101598149591e-06, 'epoch': 1.7}
57%|█████▋ | 6537/11526 [1:08:23<51:09, 1.63it/s] 57%|█████▋ | 6538/11526 [1:08:23<51:06, 1.63it/s] {'loss': 0.2131, 'grad_norm': 0.5122146606445312, 'learning_rate': 4.6995900076123436e-06, 'epoch': 1.7}
57%|█████▋ | 6538/11526 [1:08:23<51:06, 1.63it/s] 57%|█████▋ | 6539/11526 [1:08:24<51:18, 1.62it/s] {'loss': 0.1894, 'grad_norm': 0.5267069339752197, 'learning_rate': 4.69807844463041e-06, 'epoch': 1.7}
57%|█████▋ | 6539/11526 [1:08:24<51:18, 1.62it/s] 57%|█████▋ | 6540/11526 [1:08:24<51:13, 1.62it/s] {'loss': 0.2475, 'grad_norm': 0.6401429176330566, 'learning_rate': 4.696566909342439e-06, 'epoch': 1.7}
57%|█████▋ | 6540/11526 [1:08:25<51:13, 1.62it/s] 57%|█████▋ | 6541/11526 [1:08:25<51:10, 1.62it/s] {'loss': 0.203, 'grad_norm': 0.49709251523017883, 'learning_rate': 4.695055401887078e-06, 'epoch': 1.7}
57%|█████▋ | 6541/11526 [1:08:25<51:10, 1.62it/s] 57%|█████▋ | 6542/11526 [1:08:26<51:08, 1.62it/s] {'loss': 0.1604, 'grad_norm': 0.48340874910354614, 'learning_rate': 4.693543922402971e-06, 'epoch': 1.7}
57%|█████▋ | 6542/11526 [1:08:26<51:08, 1.62it/s] 57%|█████▋ | 6543/11526 [1:08:26<51:05, 1.63it/s] {'loss': 0.1577, 'grad_norm': 0.4753386974334717, 'learning_rate': 4.692032471028759e-06, 'epoch': 1.7}
57%|█████▋ | 6543/11526 [1:08:26<51:05, 1.63it/s] 57%|█████▋ | 6544/11526 [1:08:27<51:07, 1.62it/s] {'loss': 0.2086, 'grad_norm': 0.5017691850662231, 'learning_rate': 4.69052104790308e-06, 'epoch': 1.7}
57%|█████▋ | 6544/11526 [1:08:27<51:07, 1.62it/s] 57%|█████▋ | 6545/11526 [1:08:28<51:06, 1.62it/s] {'loss': 0.2136, 'grad_norm': 0.5690110921859741, 'learning_rate': 4.689009653164571e-06, 'epoch': 1.7}
57%|█████▋ | 6545/11526 [1:08:28<51:06, 1.62it/s] 57%|█████▋ | 6546/11526 [1:08:28<51:02, 1.63it/s] {'loss': 0.2032, 'grad_norm': 0.5804017782211304, 'learning_rate': 4.687498286951867e-06, 'epoch': 1.7}
57%|█████▋ | 6546/11526 [1:08:28<51:02, 1.63it/s] 57%|█████▋ | 6547/11526 [1:08:29<50:59, 1.63it/s] {'loss': 0.259, 'grad_norm': 0.5692179799079895, 'learning_rate': 4.6859869494036e-06, 'epoch': 1.7}
57%|█████▋ | 6547/11526 [1:08:29<50:59, 1.63it/s] 57%|█████▋ | 6548/11526 [1:08:29<50:59, 1.63it/s] {'loss': 0.2581, 'grad_norm': 0.6067357659339905, 'learning_rate': 4.684475640658394e-06, 'epoch': 1.7}
57%|█████▋ | 6548/11526 [1:08:30<50:59, 1.63it/s] 57%|█████▋ | 6549/11526 [1:08:30<51:01, 1.63it/s] {'loss': 0.2028, 'grad_norm': 0.5902701616287231, 'learning_rate': 4.682964360854878e-06, 'epoch': 1.7}
57%|█████▋ | 6549/11526 [1:08:30<51:01, 1.63it/s] 57%|█████▋ | 6550/11526 [1:08:31<50:58, 1.63it/s] {'loss': 0.2243, 'grad_norm': 0.5613006353378296, 'learning_rate': 4.681453110131675e-06, 'epoch': 1.7}
57%|█████▋ | 6550/11526 [1:08:31<50:58, 1.63it/s] 57%|█████▋ | 6551/11526 [1:08:31<50:57, 1.63it/s] {'loss': 0.2102, 'grad_norm': 0.6052939295768738, 'learning_rate': 4.679941888627406e-06, 'epoch': 1.71}
57%|█████▋ | 6551/11526 [1:08:31<50:57, 1.63it/s] 57%|█████▋ | 6552/11526 [1:08:32<50:55, 1.63it/s] {'loss': 0.1421, 'grad_norm': 0.4489807188510895, 'learning_rate': 4.678430696480687e-06, 'epoch': 1.71}
57%|█████▋ | 6552/11526 [1:08:32<50:55, 1.63it/s] 57%|█████▋ | 6553/11526 [1:08:32<50:52, 1.63it/s] {'loss': 0.324, 'grad_norm': 0.6578356623649597, 'learning_rate': 4.676919533830135e-06, 'epoch': 1.71}
57%|█████▋ | 6553/11526 [1:08:33<50:52, 1.63it/s] 57%|█████▋ | 6554/11526 [1:08:33<50:57, 1.63it/s] {'loss': 0.1873, 'grad_norm': 0.587263286113739, 'learning_rate': 4.67540840081436e-06, 'epoch': 1.71}
57%|█████▋ | 6554/11526 [1:08:33<50:57, 1.63it/s] 57%|█████▋ | 6555/11526 [1:08:34<50:54, 1.63it/s] {'loss': 0.1939, 'grad_norm': 0.5559097528457642, 'learning_rate': 4.673897297571975e-06, 'epoch': 1.71}
57%|█████▋ | 6555/11526 [1:08:34<50:54, 1.63it/s] 57%|█████▋ | 6556/11526 [1:08:34<50:50, 1.63it/s] {'loss': 0.1628, 'grad_norm': 0.48127201199531555, 'learning_rate': 4.672386224241584e-06, 'epoch': 1.71}
57%|█████▋ | 6556/11526 [1:08:34<50:50, 1.63it/s] 57%|█████▋ | 6557/11526 [1:08:35<50:51, 1.63it/s] {'loss': 0.1886, 'grad_norm': 0.5660018920898438, 'learning_rate': 4.670875180961794e-06, 'epoch': 1.71}
57%|█████▋ | 6557/11526 [1:08:35<50:51, 1.63it/s] 57%|█████▋ | 6558/11526 [1:08:36<50:50, 1.63it/s] {'loss': 0.2999, 'grad_norm': 0.6729249954223633, 'learning_rate': 4.669364167871203e-06, 'epoch': 1.71}
57%|█████▋ | 6558/11526 [1:08:36<50:50, 1.63it/s] 57%|█████▋ | 6559/11526 [1:08:36<50:53, 1.63it/s] {'loss': 0.197, 'grad_norm': 0.5192663669586182, 'learning_rate': 4.667853185108414e-06, 'epoch': 1.71}
57%|█████▋ | 6559/11526 [1:08:36<50:53, 1.63it/s] 57%|█████▋ | 6560/11526 [1:08:37<50:51, 1.63it/s] {'loss': 0.1985, 'grad_norm': 0.5286502242088318, 'learning_rate': 4.6663422328120196e-06, 'epoch': 1.71}
57%|█████▋ | 6560/11526 [1:08:37<50:51, 1.63it/s] 57%|█████▋ | 6561/11526 [1:08:37<50:52, 1.63it/s] {'loss': 0.2507, 'grad_norm': 0.5697177052497864, 'learning_rate': 4.664831311120615e-06, 'epoch': 1.71}
57%|█████▋ | 6561/11526 [1:08:38<50:52, 1.63it/s] 57%|█████▋ | 6562/11526 [1:08:38<50:48, 1.63it/s] {'loss': 0.3097, 'grad_norm': 0.7155141830444336, 'learning_rate': 4.6633204201727895e-06, 'epoch': 1.71}
57%|█████▋ | 6562/11526 [1:08:38<50:48, 1.63it/s] 57%|█████▋ | 6563/11526 [1:08:39<50:49, 1.63it/s] {'loss': 0.1736, 'grad_norm': 0.4714343249797821, 'learning_rate': 4.6618095601071284e-06, 'epoch': 1.71}
57%|█████▋ | 6563/11526 [1:08:39<50:49, 1.63it/s] 57%|█████▋ | 6564/11526 [1:08:39<50:52, 1.63it/s] {'loss': 0.2602, 'grad_norm': 0.645111083984375, 'learning_rate': 4.66029873106222e-06, 'epoch': 1.71}
57%|█████▋ | 6564/11526 [1:08:39<50:52, 1.63it/s] 57%|█████▋ | 6565/11526 [1:08:40<50:49, 1.63it/s] {'loss': 0.2384, 'grad_norm': 0.6117613315582275, 'learning_rate': 4.6587879331766465e-06, 'epoch': 1.71}
57%|█████▋ | 6565/11526 [1:08:40<50:49, 1.63it/s] 57%|█████▋ | 6566/11526 [1:08:40<50:46, 1.63it/s] {'loss': 0.2014, 'grad_norm': 0.518587589263916, 'learning_rate': 4.657277166588984e-06, 'epoch': 1.71}
57%|█████▋ | 6566/11526 [1:08:41<50:46, 1.63it/s] 57%|█████▋ | 6567/11526 [1:08:41<50:45, 1.63it/s] {'loss': 0.1839, 'grad_norm': 0.5456492304801941, 'learning_rate': 4.655766431437808e-06, 'epoch': 1.71}
57%|█████▋ | 6567/11526 [1:08:41<50:45, 1.63it/s] 57%|█████▋ | 6568/11526 [1:08:42<50:41, 1.63it/s] {'loss': 0.2247, 'grad_norm': 0.5736673474311829, 'learning_rate': 4.654255727861695e-06, 'epoch': 1.71}
57%|█████▋ | 6568/11526 [1:08:42<50:41, 1.63it/s] 57%|█████▋ | 6569/11526 [1:08:42<50:46, 1.63it/s] {'loss': 0.2216, 'grad_norm': 0.5315992832183838, 'learning_rate': 4.652745055999214e-06, 'epoch': 1.71}
57%|█████▋ | 6569/11526 [1:08:42<50:46, 1.63it/s] 57%|█████▋ | 6570/11526 [1:08:43<50:48, 1.63it/s] {'loss': 0.2335, 'grad_norm': 0.5466523170471191, 'learning_rate': 4.6512344159889335e-06, 'epoch': 1.71}
57%|█████▋ | 6570/11526 [1:08:43<50:48, 1.63it/s] 57%|█████▋ | 6571/11526 [1:08:44<50:45, 1.63it/s] {'loss': 0.1818, 'grad_norm': 0.49178943037986755, 'learning_rate': 4.649723807969415e-06, 'epoch': 1.71}
57%|█████▋ | 6571/11526 [1:08:44<50:45, 1.63it/s] 57%|█████▋ | 6572/11526 [1:08:44<50:44, 1.63it/s] {'loss': 0.1748, 'grad_norm': 0.46597805619239807, 'learning_rate': 4.648213232079221e-06, 'epoch': 1.71}
57%|█████▋ | 6572/11526 [1:08:44<50:44, 1.63it/s] 57%|█████▋ | 6573/11526 [1:08:45<50:47, 1.63it/s] {'loss': 0.2351, 'grad_norm': 0.5701265931129456, 'learning_rate': 4.646702688456913e-06, 'epoch': 1.71}
57%|█████▋ | 6573/11526 [1:08:45<50:47, 1.63it/s] 57%|█████▋ | 6574/11526 [1:08:45<50:47, 1.63it/s] {'loss': 0.1584, 'grad_norm': 0.5322268009185791, 'learning_rate': 4.645192177241044e-06, 'epoch': 1.71}
57%|█████▋ | 6574/11526 [1:08:46<50:47, 1.63it/s] 57%|█████▋ | 6575/11526 [1:08:46<50:43, 1.63it/s] {'loss': 0.2252, 'grad_norm': 0.6088309288024902, 'learning_rate': 4.643681698570167e-06, 'epoch': 1.71}
57%|█████▋ | 6575/11526 [1:08:46<50:43, 1.63it/s] 57%|█████▋ | 6576/11526 [1:08:47<50:41, 1.63it/s] {'loss': 0.2157, 'grad_norm': 0.6016860604286194, 'learning_rate': 4.64217125258283e-06, 'epoch': 1.71}
57%|█████▋ | 6576/11526 [1:08:47<50:41, 1.63it/s] 57%|█████▋ | 6577/11526 [1:08:47<50:39, 1.63it/s] {'loss': 0.1805, 'grad_norm': 0.4768587052822113, 'learning_rate': 4.640660839417585e-06, 'epoch': 1.71}
57%|█████▋ | 6577/11526 [1:08:47<50:39, 1.63it/s] 57%|█████▋ | 6578/11526 [1:08:48<50:38, 1.63it/s] {'loss': 0.1988, 'grad_norm': 0.5991699695587158, 'learning_rate': 4.639150459212972e-06, 'epoch': 1.71}
57%|█████▋ | 6578/11526 [1:08:48<50:38, 1.63it/s] 57%|█████▋ | 6579/11526 [1:08:48<50:41, 1.63it/s] {'loss': 0.1676, 'grad_norm': 0.48064345121383667, 'learning_rate': 4.637640112107531e-06, 'epoch': 1.71}
57%|█████▋ | 6579/11526 [1:08:49<50:41, 1.63it/s] 57%|█████▋ | 6580/11526 [1:08:49<50:40, 1.63it/s] {'loss': 0.1641, 'grad_norm': 0.44334423542022705, 'learning_rate': 4.636129798239801e-06, 'epoch': 1.71}
57%|█████▋ | 6580/11526 [1:08:49<50:40, 1.63it/s] 57%|█████▋ | 6581/11526 [1:08:50<50:38, 1.63it/s] {'loss': 0.1772, 'grad_norm': 0.49401167035102844, 'learning_rate': 4.634619517748315e-06, 'epoch': 1.71}
57%|█████▋ | 6581/11526 [1:08:50<50:38, 1.63it/s] 57%|█████▋ | 6582/11526 [1:08:50<50:37, 1.63it/s] {'loss': 0.2142, 'grad_norm': 0.6386318802833557, 'learning_rate': 4.633109270771608e-06, 'epoch': 1.71}
57%|█████▋ | 6582/11526 [1:08:50<50:37, 1.63it/s] 57%|█████▋ | 6583/11526 [1:08:51<50:35, 1.63it/s] {'loss': 0.2292, 'grad_norm': 0.6037657856941223, 'learning_rate': 4.631599057448204e-06, 'epoch': 1.71}
57%|█████▋ | 6583/11526 [1:08:51<50:35, 1.63it/s] 57%|█████▋ | 6584/11526 [1:08:52<50:38, 1.63it/s] {'loss': 0.2431, 'grad_norm': 0.6126949787139893, 'learning_rate': 4.630088877916633e-06, 'epoch': 1.71}
57%|█████▋ | 6584/11526 [1:08:52<50:38, 1.63it/s] 57%|█████▋ | 6585/11526 [1:08:52<50:36, 1.63it/s] {'loss': 0.2384, 'grad_norm': 0.5016792416572571, 'learning_rate': 4.628578732315412e-06, 'epoch': 1.71}
57%|█████▋ | 6585/11526 [1:08:52<50:36, 1.63it/s] 57%|█████▋ | 6586/11526 [1:08:53<50:37, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.5211735367774963, 'learning_rate': 4.627068620783064e-06, 'epoch': 1.71}
57%|█████▋ | 6586/11526 [1:08:53<50:37, 1.63it/s] 57%|█████▋ | 6587/11526 [1:08:53<50:36, 1.63it/s] {'loss': 0.2306, 'grad_norm': 0.5495870113372803, 'learning_rate': 4.6255585434581045e-06, 'epoch': 1.71}
57%|█████▋ | 6587/11526 [1:08:54<50:36, 1.63it/s] 57%|█████▋ | 6588/11526 [1:08:54<50:34, 1.63it/s] {'loss': 0.2367, 'grad_norm': 0.5857179760932922, 'learning_rate': 4.6240485004790465e-06, 'epoch': 1.71}
57%|█████▋ | 6588/11526 [1:08:54<50:34, 1.63it/s] 57%|█████▋ | 6589/11526 [1:08:55<50:36, 1.63it/s] {'loss': 0.197, 'grad_norm': 0.5462257862091064, 'learning_rate': 4.6225384919843965e-06, 'epoch': 1.71}
57%|█████▋ | 6589/11526 [1:08:55<50:36, 1.63it/s] 57%|█████▋ | 6590/11526 [1:08:55<50:34, 1.63it/s] {'loss': 0.1755, 'grad_norm': 0.4971558153629303, 'learning_rate': 4.621028518112667e-06, 'epoch': 1.72}
57%|█████▋ | 6590/11526 [1:08:55<50:34, 1.63it/s] 57%|█████▋ | 6591/11526 [1:08:56<50:33, 1.63it/s] {'loss': 0.2252, 'grad_norm': 0.5561574697494507, 'learning_rate': 4.619518579002356e-06, 'epoch': 1.72}
57%|█████▋ | 6591/11526 [1:08:56<50:33, 1.63it/s] 57%|█████▋ | 6592/11526 [1:08:56<50:31, 1.63it/s] {'loss': 0.2541, 'grad_norm': 0.5818633437156677, 'learning_rate': 4.618008674791967e-06, 'epoch': 1.72}
57%|█████▋ | 6592/11526 [1:08:57<50:31, 1.63it/s] 57%|█████▋ | 6593/11526 [1:08:57<50:32, 1.63it/s] {'loss': 0.1618, 'grad_norm': 0.5210739374160767, 'learning_rate': 4.616498805619996e-06, 'epoch': 1.72}
57%|█████▋ | 6593/11526 [1:08:57<50:32, 1.63it/s] 57%|█████▋ | 6594/11526 [1:08:58<50:36, 1.62it/s] {'loss': 0.2074, 'grad_norm': 0.5313827991485596, 'learning_rate': 4.614988971624934e-06, 'epoch': 1.72}
57%|█████▋ | 6594/11526 [1:08:58<50:36, 1.62it/s] 57%|█████▋ | 6595/11526 [1:08:58<50:34, 1.63it/s] {'loss': 0.1596, 'grad_norm': 0.4540694057941437, 'learning_rate': 4.6134791729452754e-06, 'epoch': 1.72}
57%|█████▋ | 6595/11526 [1:08:58<50:34, 1.63it/s] 57%|█████▋ | 6596/11526 [1:08:59<50:30, 1.63it/s] {'loss': 0.171, 'grad_norm': 0.49428215622901917, 'learning_rate': 4.611969409719507e-06, 'epoch': 1.72}
57%|█████▋ | 6596/11526 [1:08:59<50:30, 1.63it/s] 57%|█████▋ | 6597/11526 [1:09:00<50:29, 1.63it/s] {'loss': 0.1596, 'grad_norm': 0.4852031171321869, 'learning_rate': 4.610459682086114e-06, 'epoch': 1.72}
57%|█████▋ | 6597/11526 [1:09:00<50:29, 1.63it/s] 57%|█████▋ | 6598/11526 [1:09:00<50:29, 1.63it/s] {'loss': 0.2061, 'grad_norm': 0.6041330099105835, 'learning_rate': 4.608949990183571e-06, 'epoch': 1.72}
57%|█████▋ | 6598/11526 [1:09:00<50:29, 1.63it/s] 57%|█████▋ | 6599/11526 [1:09:01<50:33, 1.62it/s] {'loss': 0.183, 'grad_norm': 0.5250415802001953, 'learning_rate': 4.607440334150362e-06, 'epoch': 1.72}
57%|█████▋ | 6599/11526 [1:09:01<50:33, 1.62it/s] 57%|█████▋ | 6600/11526 [1:09:01<50:31, 1.62it/s] {'loss': 0.2233, 'grad_norm': 0.5912268757820129, 'learning_rate': 4.60593071412496e-06, 'epoch': 1.72}
57%|█████▋ | 6600/11526 [1:09:02<50:31, 1.62it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.33it/s]
31%|███ | 4/13 [00:00<00:01, 8.38it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.91it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.78it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5328559279441833, 'eval_runtime': 1.9551, 'eval_samples_per_second': 102.294, 'eval_steps_per_second': 6.649, 'epoch': 1.72}
57%|█████▋ | 6600/11526 [1:09:03<50:31, 1.62it/s]
 57%|█████▋ | 6601/11526 [1:09:04<1:38:44, 1.20s/it] {'loss': 0.1928, 'grad_norm': 0.5241773128509521, 'learning_rate': 4.604421130245835e-06, 'epoch': 1.72}
57%|█████▋ | 6601/11526 [1:09:04<1:38:44, 1.20s/it] 57%|█████▋ | 6602/11526 [1:09:05<1:24:13, 1.03s/it] {'loss': 0.2357, 'grad_norm': 0.6527788639068604, 'learning_rate': 4.602911582651453e-06, 'epoch': 1.72}
57%|█████▋ | 6602/11526 [1:09:05<1:24:13, 1.03s/it] 57%|█████▋ | 6603/11526 [1:09:05<1:14:04, 1.11it/s] {'loss': 0.2048, 'grad_norm': 0.5277063250541687, 'learning_rate': 4.6014020714802805e-06, 'epoch': 1.72}
57%|█████▋ | 6603/11526 [1:09:05<1:14:04, 1.11it/s] 57%|█████▋ | 6604/11526 [1:09:06<1:07:02, 1.22it/s] {'loss': 0.2119, 'grad_norm': 0.598041832447052, 'learning_rate': 4.599892596870779e-06, 'epoch': 1.72}
57%|█████▋ | 6604/11526 [1:09:06<1:07:02, 1.22it/s] 57%|█████▋ | 6605/11526 [1:09:06<1:02:00, 1.32it/s] {'loss': 0.2015, 'grad_norm': 0.5951598882675171, 'learning_rate': 4.598383158961405e-06, 'epoch': 1.72}
57%|█████▋ | 6605/11526 [1:09:07<1:02:00, 1.32it/s] 57%|█████▋ | 6606/11526 [1:09:07<58:31, 1.40it/s] {'loss': 0.1962, 'grad_norm': 0.5608953237533569, 'learning_rate': 4.596873757890612e-06, 'epoch': 1.72}
57%|█████▋ | 6606/11526 [1:09:07<58:31, 1.40it/s] 57%|█████▋ | 6607/11526 [1:09:08<56:03, 1.46it/s] {'loss': 0.2521, 'grad_norm': 0.657673180103302, 'learning_rate': 4.5953643937968505e-06, 'epoch': 1.72}
57%|█████▋ | 6607/11526 [1:09:08<56:03, 1.46it/s] 57%|█████▋ | 6608/11526 [1:09:08<54:18, 1.51it/s] {'loss': 0.2192, 'grad_norm': 0.5924502015113831, 'learning_rate': 4.593855066818571e-06, 'epoch': 1.72}
57%|█████▋ | 6608/11526 [1:09:08<54:18, 1.51it/s] 57%|█████▋ | 6609/11526 [1:09:09<53:12, 1.54it/s] {'loss': 0.2537, 'grad_norm': 0.5556563138961792, 'learning_rate': 4.592345777094216e-06, 'epoch': 1.72}
57%|█████▋ | 6609/11526 [1:09:09<53:12, 1.54it/s] 57%|█████▋ | 6610/11526 [1:09:09<52:21, 1.56it/s] {'loss': 0.2217, 'grad_norm': 0.54770827293396, 'learning_rate': 4.590836524762225e-06, 'epoch': 1.72}
57%|█████▋ | 6610/11526 [1:09:10<52:21, 1.56it/s] 57%|█████▋ | 6611/11526 [1:09:10<51:44, 1.58it/s] {'loss': 0.1613, 'grad_norm': 0.4966078996658325, 'learning_rate': 4.589327309961036e-06, 'epoch': 1.72}
57%|█████▋ | 6611/11526 [1:09:10<51:44, 1.58it/s] 57%|█████▋ | 6612/11526 [1:09:11<51:16, 1.60it/s] {'loss': 0.1646, 'grad_norm': 0.4899371266365051, 'learning_rate': 4.587818132829081e-06, 'epoch': 1.72}
57%|█████▋ | 6612/11526 [1:09:11<51:16, 1.60it/s] 57%|█████▋ | 6613/11526 [1:09:11<50:58, 1.61it/s] {'loss': 0.1568, 'grad_norm': 0.4710817337036133, 'learning_rate': 4.586308993504795e-06, 'epoch': 1.72}
57%|█████▋ | 6613/11526 [1:09:11<50:58, 1.61it/s] 57%|█████▋ | 6614/11526 [1:09:12<50:50, 1.61it/s] {'loss': 0.1961, 'grad_norm': 0.5657757520675659, 'learning_rate': 4.5847998921266e-06, 'epoch': 1.72}
57%|█████▋ | 6614/11526 [1:09:12<50:50, 1.61it/s] 57%|█████▋ | 6615/11526 [1:09:13<50:39, 1.62it/s] {'loss': 0.2403, 'grad_norm': 0.6358776688575745, 'learning_rate': 4.583290828832922e-06, 'epoch': 1.72}
57%|█████▋ | 6615/11526 [1:09:13<50:39, 1.62it/s] 57%|█████▋ | 6616/11526 [1:09:13<50:32, 1.62it/s] {'loss': 0.2425, 'grad_norm': 0.6135761141777039, 'learning_rate': 4.581781803762179e-06, 'epoch': 1.72}
57%|█████▋ | 6616/11526 [1:09:13<50:32, 1.62it/s] 57%|█████▋ | 6617/11526 [1:09:14<50:26, 1.62it/s] {'loss': 0.1771, 'grad_norm': 0.5147128701210022, 'learning_rate': 4.5802728170527905e-06, 'epoch': 1.72}
57%|█████▋ | 6617/11526 [1:09:14<50:26, 1.62it/s] 57%|█████▋ | 6618/11526 [1:09:14<50:22, 1.62it/s] {'loss': 0.2293, 'grad_norm': 0.569749653339386, 'learning_rate': 4.578763868843166e-06, 'epoch': 1.72}
57%|█████▋ | 6618/11526 [1:09:15<50:22, 1.62it/s] 57%|█████▋ | 6619/11526 [1:09:15<50:22, 1.62it/s] {'loss': 0.156, 'grad_norm': 0.5524927973747253, 'learning_rate': 4.577254959271717e-06, 'epoch': 1.72}
57%|█████▋ | 6619/11526 [1:09:15<50:22, 1.62it/s] 57%|█████▋ | 6620/11526 [1:09:16<50:18, 1.63it/s] {'loss': 0.2136, 'grad_norm': 0.6284612417221069, 'learning_rate': 4.575746088476849e-06, 'epoch': 1.72}
57%|█████▋ | 6620/11526 [1:09:16<50:18, 1.63it/s] 57%|█████▋ | 6621/11526 [1:09:16<50:15, 1.63it/s] {'loss': 0.2425, 'grad_norm': 0.48248711228370667, 'learning_rate': 4.5742372565969615e-06, 'epoch': 1.72}
57%|█████▋ | 6621/11526 [1:09:16<50:15, 1.63it/s] 57%|█████▋ | 6622/11526 [1:09:17<50:14, 1.63it/s] {'loss': 0.2074, 'grad_norm': 0.5631560683250427, 'learning_rate': 4.572728463770456e-06, 'epoch': 1.72}
57%|█████▋ | 6622/11526 [1:09:17<50:14, 1.63it/s] 57%|█████▋ | 6623/11526 [1:09:17<50:12, 1.63it/s] {'loss': 0.1785, 'grad_norm': 0.48967263102531433, 'learning_rate': 4.571219710135729e-06, 'epoch': 1.72}
57%|█████▋ | 6623/11526 [1:09:18<50:12, 1.63it/s] 57%|█████▋ | 6624/11526 [1:09:18<50:14, 1.63it/s] {'loss': 0.2082, 'grad_norm': 0.615674614906311, 'learning_rate': 4.569710995831168e-06, 'epoch': 1.72}
57%|█████▋ | 6624/11526 [1:09:18<50:14, 1.63it/s] 57%|█████▋ | 6625/11526 [1:09:19<50:12, 1.63it/s] {'loss': 0.2398, 'grad_norm': 0.5906621813774109, 'learning_rate': 4.568202320995162e-06, 'epoch': 1.72}
57%|█████▋ | 6625/11526 [1:09:19<50:12, 1.63it/s] 57%|█████▋ | 6626/11526 [1:09:19<50:10, 1.63it/s] {'loss': 0.1845, 'grad_norm': 0.5016725659370422, 'learning_rate': 4.566693685766096e-06, 'epoch': 1.72}
57%|█████▋ | 6626/11526 [1:09:19<50:10, 1.63it/s] 57%|█████▋ | 6627/11526 [1:09:20<50:09, 1.63it/s] {'loss': 0.1926, 'grad_norm': 0.5249282121658325, 'learning_rate': 4.565185090282353e-06, 'epoch': 1.72}
57%|█████▋ | 6627/11526 [1:09:20<50:09, 1.63it/s] 58%|█████▊ | 6628/11526 [1:09:21<50:08, 1.63it/s] {'loss': 0.2061, 'grad_norm': 0.6213529109954834, 'learning_rate': 4.563676534682305e-06, 'epoch': 1.73}
58%|█████▊ | 6628/11526 [1:09:21<50:08, 1.63it/s] 58%|█████▊ | 6629/11526 [1:09:21<50:10, 1.63it/s] {'loss': 0.2179, 'grad_norm': 0.5179181694984436, 'learning_rate': 4.562168019104328e-06, 'epoch': 1.73}
58%|█████▊ | 6629/11526 [1:09:21<50:10, 1.63it/s] 58%|█████▊ | 6630/11526 [1:09:22<50:08, 1.63it/s] {'loss': 0.1937, 'grad_norm': 0.5192571878433228, 'learning_rate': 4.560659543686793e-06, 'epoch': 1.73}
58%|█████▊ | 6630/11526 [1:09:22<50:08, 1.63it/s] 58%|█████▊ | 6631/11526 [1:09:22<50:07, 1.63it/s] {'loss': 0.1593, 'grad_norm': 0.4536171853542328, 'learning_rate': 4.559151108568065e-06, 'epoch': 1.73}
58%|█████▊ | 6631/11526 [1:09:23<50:07, 1.63it/s] 58%|█████▊ | 6632/11526 [1:09:23<50:06, 1.63it/s] {'loss': 0.1592, 'grad_norm': 0.46550044417381287, 'learning_rate': 4.5576427138865034e-06, 'epoch': 1.73}
58%|█████▊ | 6632/11526 [1:09:23<50:06, 1.63it/s] 58%|█████▊ | 6633/11526 [1:09:24<50:05, 1.63it/s] {'loss': 0.2243, 'grad_norm': 0.879120945930481, 'learning_rate': 4.556134359780471e-06, 'epoch': 1.73}
58%|█████▊ | 6633/11526 [1:09:24<50:05, 1.63it/s] 58%|█████▊ | 6634/11526 [1:09:24<50:06, 1.63it/s] {'loss': 0.2252, 'grad_norm': 0.5841490626335144, 'learning_rate': 4.554626046388317e-06, 'epoch': 1.73}
58%|█████▊ | 6634/11526 [1:09:24<50:06, 1.63it/s] 58%|█████▊ | 6635/11526 [1:09:25<50:05, 1.63it/s] {'loss': 0.1464, 'grad_norm': 0.42080020904541016, 'learning_rate': 4.5531177738484e-06, 'epoch': 1.73}
58%|█████▊ | 6635/11526 [1:09:25<50:05, 1.63it/s] 58%|█████▊ | 6636/11526 [1:09:25<50:04, 1.63it/s] {'loss': 0.1867, 'grad_norm': 0.5011909008026123, 'learning_rate': 4.551609542299062e-06, 'epoch': 1.73}
58%|█████▊ | 6636/11526 [1:09:26<50:04, 1.63it/s] 58%|█████▊ | 6637/11526 [1:09:26<50:02, 1.63it/s] {'loss': 0.2301, 'grad_norm': 0.5321975946426392, 'learning_rate': 4.550101351878649e-06, 'epoch': 1.73}
58%|█████▊ | 6637/11526 [1:09:26<50:02, 1.63it/s] 58%|█████▊ | 6638/11526 [1:09:27<50:01, 1.63it/s] {'loss': 0.2318, 'grad_norm': 0.6062177419662476, 'learning_rate': 4.548593202725497e-06, 'epoch': 1.73}
58%|█████▊ | 6638/11526 [1:09:27<50:01, 1.63it/s] 58%|█████▊ | 6639/11526 [1:09:27<50:06, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.49882206320762634, 'learning_rate': 4.547085094977947e-06, 'epoch': 1.73}
58%|█████▊ | 6639/11526 [1:09:27<50:06, 1.63it/s] 58%|█████▊ | 6640/11526 [1:09:28<50:07, 1.62it/s] {'loss': 0.2209, 'grad_norm': 0.633222758769989, 'learning_rate': 4.545577028774329e-06, 'epoch': 1.73}
58%|█████▊ | 6640/11526 [1:09:28<50:07, 1.62it/s] 58%|█████▊ | 6641/11526 [1:09:29<50:05, 1.63it/s] {'loss': 0.2129, 'grad_norm': 0.5929202437400818, 'learning_rate': 4.5440690042529715e-06, 'epoch': 1.73}
58%|█████▊ | 6641/11526 [1:09:29<50:05, 1.63it/s] 58%|█████▊ | 6642/11526 [1:09:29<50:03, 1.63it/s] {'loss': 0.1729, 'grad_norm': 0.5282464623451233, 'learning_rate': 4.5425610215522e-06, 'epoch': 1.73}
58%|█████▊ | 6642/11526 [1:09:29<50:03, 1.63it/s] 58%|█████▊ | 6643/11526 [1:09:30<50:00, 1.63it/s] {'loss': 0.1918, 'grad_norm': 0.5627526640892029, 'learning_rate': 4.5410530808103315e-06, 'epoch': 1.73}
58%|█████▊ | 6643/11526 [1:09:30<50:00, 1.63it/s] 58%|█████▊ | 6644/11526 [1:09:30<50:01, 1.63it/s] {'loss': 0.2194, 'grad_norm': 0.5278336405754089, 'learning_rate': 4.539545182165687e-06, 'epoch': 1.73}
58%|█████▊ | 6644/11526 [1:09:31<50:01, 1.63it/s] 58%|█████▊ | 6645/11526 [1:09:31<49:59, 1.63it/s] {'loss': 0.1775, 'grad_norm': 0.4823898375034332, 'learning_rate': 4.538037325756579e-06, 'epoch': 1.73}
58%|█████▊ | 6645/11526 [1:09:31<49:59, 1.63it/s] 58%|█████▊ | 6646/11526 [1:09:32<49:58, 1.63it/s] {'loss': 0.2721, 'grad_norm': 0.6995585560798645, 'learning_rate': 4.536529511721317e-06, 'epoch': 1.73}
58%|█████▊ | 6646/11526 [1:09:32<49:58, 1.63it/s] 58%|█████▊ | 6647/11526 [1:09:32<49:56, 1.63it/s] {'loss': 0.1955, 'grad_norm': 0.5335721373558044, 'learning_rate': 4.535021740198202e-06, 'epoch': 1.73}
58%|█████▊ | 6647/11526 [1:09:32<49:56, 1.63it/s] 58%|█████▊ | 6648/11526 [1:09:33<49:55, 1.63it/s] {'loss': 0.2128, 'grad_norm': 0.556685209274292, 'learning_rate': 4.53351401132554e-06, 'epoch': 1.73}
58%|█████▊ | 6648/11526 [1:09:33<49:55, 1.63it/s] 58%|█████▊ | 6649/11526 [1:09:33<49:59, 1.63it/s] {'loss': 0.1681, 'grad_norm': 0.46318772435188293, 'learning_rate': 4.532006325241629e-06, 'epoch': 1.73}
58%|█████▊ | 6649/11526 [1:09:34<49:59, 1.63it/s] 58%|█████▊ | 6650/11526 [1:09:34<49:57, 1.63it/s] {'loss': 0.2148, 'grad_norm': 0.5363909006118774, 'learning_rate': 4.530498682084759e-06, 'epoch': 1.73}
58%|█████▊ | 6650/11526 [1:09:34<49:57, 1.63it/s] 58%|█████▊ | 6651/11526 [1:09:35<49:56, 1.63it/s] {'loss': 0.1632, 'grad_norm': 0.45498883724212646, 'learning_rate': 4.528991081993222e-06, 'epoch': 1.73}
58%|█████▊ | 6651/11526 [1:09:35<49:56, 1.63it/s] 58%|█████▊ | 6652/11526 [1:09:35<49:54, 1.63it/s] {'loss': 0.1417, 'grad_norm': 0.43198785185813904, 'learning_rate': 4.5274835251053e-06, 'epoch': 1.73}
58%|█████▊ | 6652/11526 [1:09:35<49:54, 1.63it/s] 58%|█████▊ | 6653/11526 [1:09:36<49:53, 1.63it/s] {'loss': 0.2031, 'grad_norm': 0.5809819102287292, 'learning_rate': 4.525976011559281e-06, 'epoch': 1.73}
58%|█████▊ | 6653/11526 [1:09:36<49:53, 1.63it/s] 58%|█████▊ | 6654/11526 [1:09:37<49:55, 1.63it/s] {'loss': 0.1536, 'grad_norm': 0.48201072216033936, 'learning_rate': 4.524468541493439e-06, 'epoch': 1.73}
58%|█████▊ | 6654/11526 [1:09:37<49:55, 1.63it/s] 58%|█████▊ | 6655/11526 [1:09:37<49:55, 1.63it/s] {'loss': 0.172, 'grad_norm': 0.5060845017433167, 'learning_rate': 4.522961115046047e-06, 'epoch': 1.73}
58%|█████▊ | 6655/11526 [1:09:37<49:55, 1.63it/s] 58%|█████▊ | 6656/11526 [1:09:38<49:54, 1.63it/s] {'loss': 0.1654, 'grad_norm': 0.4710051119327545, 'learning_rate': 4.5214537323553745e-06, 'epoch': 1.73}
58%|█████▊ | 6656/11526 [1:09:38<49:54, 1.63it/s] 58%|█████▊ | 6657/11526 [1:09:38<49:53, 1.63it/s] {'loss': 0.1521, 'grad_norm': 0.4501633048057556, 'learning_rate': 4.51994639355969e-06, 'epoch': 1.73}
58%|█████▊ | 6657/11526 [1:09:39<49:53, 1.63it/s] 58%|█████▊ | 6658/11526 [1:09:39<49:51, 1.63it/s] {'loss': 0.1785, 'grad_norm': 0.515704870223999, 'learning_rate': 4.5184390987972545e-06, 'epoch': 1.73}
58%|█████▊ | 6658/11526 [1:09:39<49:51, 1.63it/s] 58%|█████▊ | 6659/11526 [1:09:40<49:54, 1.63it/s] {'loss': 0.1924, 'grad_norm': 0.5121482610702515, 'learning_rate': 4.516931848206324e-06, 'epoch': 1.73}
58%|█████▊ | 6659/11526 [1:09:40<49:54, 1.63it/s] 58%|█████▊ | 6660/11526 [1:09:40<49:52, 1.63it/s] {'loss': 0.2207, 'grad_norm': 0.6920545697212219, 'learning_rate': 4.515424641925154e-06, 'epoch': 1.73}
58%|█████▊ | 6660/11526 [1:09:40<49:52, 1.63it/s] 58%|█████▊ | 6661/11526 [1:09:41<49:49, 1.63it/s] {'loss': 0.179, 'grad_norm': 0.5699533820152283, 'learning_rate': 4.51391748009199e-06, 'epoch': 1.73}
58%|█████▊ | 6661/11526 [1:09:41<49:49, 1.63it/s] 58%|█████▊ | 6662/11526 [1:09:41<49:48, 1.63it/s] {'loss': 0.2146, 'grad_norm': 0.567766547203064, 'learning_rate': 4.512410362845083e-06, 'epoch': 1.73}
58%|█████▊ | 6662/11526 [1:09:42<49:48, 1.63it/s] 58%|█████▊ | 6663/11526 [1:09:42<49:48, 1.63it/s] {'loss': 0.1725, 'grad_norm': 0.48164117336273193, 'learning_rate': 4.510903290322671e-06, 'epoch': 1.73}
58%|█████▊ | 6663/11526 [1:09:42<49:48, 1.63it/s] 58%|█████▊ | 6664/11526 [1:09:43<49:49, 1.63it/s] {'loss': 0.2144, 'grad_norm': 0.5695652961730957, 'learning_rate': 4.509396262662993e-06, 'epoch': 1.73}
58%|█████▊ | 6664/11526 [1:09:43<49:49, 1.63it/s] 58%|█████▊ | 6665/11526 [1:09:43<49:47, 1.63it/s] {'loss': 0.1577, 'grad_norm': 0.43529608845710754, 'learning_rate': 4.507889280004278e-06, 'epoch': 1.73}
58%|█████▊ | 6665/11526 [1:09:43<49:47, 1.63it/s] 58%|█████▊ | 6666/11526 [1:09:44<49:47, 1.63it/s] {'loss': 0.2943, 'grad_norm': 0.6207113265991211, 'learning_rate': 4.506382342484762e-06, 'epoch': 1.74}
58%|█████▊ | 6666/11526 [1:09:44<49:47, 1.63it/s] 58%|█████▊ | 6667/11526 [1:09:45<49:45, 1.63it/s] {'loss': 0.2764, 'grad_norm': 0.637949526309967, 'learning_rate': 4.504875450242664e-06, 'epoch': 1.74}
58%|█████▊ | 6667/11526 [1:09:45<49:45, 1.63it/s] 58%|█████▊ | 6668/11526 [1:09:45<49:43, 1.63it/s] {'loss': 0.2637, 'grad_norm': 0.644665002822876, 'learning_rate': 4.503368603416208e-06, 'epoch': 1.74}
58%|█████▊ | 6668/11526 [1:09:45<49:43, 1.63it/s] 58%|█████▊ | 6669/11526 [1:09:46<49:42, 1.63it/s] {'loss': 0.239, 'grad_norm': 0.6105846762657166, 'learning_rate': 4.50186180214361e-06, 'epoch': 1.74}
58%|█████▊ | 6669/11526 [1:09:46<49:42, 1.63it/s] 58%|█████▊ | 6670/11526 [1:09:46<49:43, 1.63it/s] {'loss': 0.1824, 'grad_norm': 0.5509973764419556, 'learning_rate': 4.5003550465630795e-06, 'epoch': 1.74}
58%|█████▊ | 6670/11526 [1:09:46<49:43, 1.63it/s] 58%|█████▊ | 6671/11526 [1:09:47<49:42, 1.63it/s] {'loss': 0.226, 'grad_norm': 0.6386541724205017, 'learning_rate': 4.498848336812828e-06, 'epoch': 1.74}
58%|█████▊ | 6671/11526 [1:09:47<49:42, 1.63it/s] 58%|█████▊ | 6672/11526 [1:09:48<49:41, 1.63it/s] {'loss': 0.2364, 'grad_norm': 0.6182551383972168, 'learning_rate': 4.497341673031059e-06, 'epoch': 1.74}
58%|█████▊ | 6672/11526 [1:09:48<49:41, 1.63it/s] 58%|█████▊ | 6673/11526 [1:09:48<49:41, 1.63it/s] {'loss': 0.183, 'grad_norm': 0.609246015548706, 'learning_rate': 4.495835055355972e-06, 'epoch': 1.74}
58%|█████▊ | 6673/11526 [1:09:48<49:41, 1.63it/s] 58%|█████▊ | 6674/11526 [1:09:49<49:39, 1.63it/s] {'loss': 0.2602, 'grad_norm': 0.6306396126747131, 'learning_rate': 4.494328483925761e-06, 'epoch': 1.74}
58%|█████▊ | 6674/11526 [1:09:49<49:39, 1.63it/s] 58%|█████▊ | 6675/11526 [1:09:49<49:45, 1.62it/s] {'loss': 0.3196, 'grad_norm': 0.6479842662811279, 'learning_rate': 4.492821958878619e-06, 'epoch': 1.74}
58%|█████▊ | 6675/11526 [1:09:50<49:45, 1.62it/s] 58%|█████▊ | 6676/11526 [1:09:50<49:43, 1.63it/s] {'loss': 0.1677, 'grad_norm': 0.4596911370754242, 'learning_rate': 4.491315480352733e-06, 'epoch': 1.74}
58%|█████▊ | 6676/11526 [1:09:50<49:43, 1.63it/s] 58%|█████▊ | 6677/11526 [1:09:51<49:40, 1.63it/s] {'loss': 0.215, 'grad_norm': 0.5576812028884888, 'learning_rate': 4.489809048486286e-06, 'epoch': 1.74}
58%|█████▊ | 6677/11526 [1:09:51<49:40, 1.63it/s] 58%|█████▊ | 6678/11526 [1:09:51<49:39, 1.63it/s] {'loss': 0.2366, 'grad_norm': 0.5636025071144104, 'learning_rate': 4.488302663417454e-06, 'epoch': 1.74}
58%|█████▊ | 6678/11526 [1:09:51<49:39, 1.63it/s] 58%|█████▊ | 6679/11526 [1:09:52<49:37, 1.63it/s] {'loss': 0.2079, 'grad_norm': 0.5723035931587219, 'learning_rate': 4.486796325284415e-06, 'epoch': 1.74}
58%|█████▊ | 6679/11526 [1:09:52<49:37, 1.63it/s] 58%|█████▊ | 6680/11526 [1:09:53<49:36, 1.63it/s] {'loss': 0.2057, 'grad_norm': 0.6016109585762024, 'learning_rate': 4.485290034225337e-06, 'epoch': 1.74}
58%|█████▊ | 6680/11526 [1:09:53<49:36, 1.63it/s] 58%|█████▊ | 6681/11526 [1:09:53<49:36, 1.63it/s] {'loss': 0.2655, 'grad_norm': 0.7221013903617859, 'learning_rate': 4.483783790378386e-06, 'epoch': 1.74}
58%|█████▊ | 6681/11526 [1:09:53<49:36, 1.63it/s] 58%|█████▊ | 6682/11526 [1:09:54<49:35, 1.63it/s] {'loss': 0.1622, 'grad_norm': 0.44371411204338074, 'learning_rate': 4.482277593881722e-06, 'epoch': 1.74}
58%|█████▊ | 6682/11526 [1:09:54<49:35, 1.63it/s] 58%|█████▊ | 6683/11526 [1:09:54<49:34, 1.63it/s] {'loss': 0.1808, 'grad_norm': 0.5105659365653992, 'learning_rate': 4.4807714448735016e-06, 'epoch': 1.74}
58%|█████▊ | 6683/11526 [1:09:54<49:34, 1.63it/s] 58%|█████▊ | 6684/11526 [1:09:55<49:37, 1.63it/s] {'loss': 0.1971, 'grad_norm': 0.5445254445075989, 'learning_rate': 4.47926534349188e-06, 'epoch': 1.74}
58%|█████▊ | 6684/11526 [1:09:55<49:37, 1.63it/s] 58%|█████▊ | 6685/11526 [1:09:56<49:36, 1.63it/s] {'loss': 0.1831, 'grad_norm': 0.5199702382087708, 'learning_rate': 4.477759289875005e-06, 'epoch': 1.74}
58%|█████▊ | 6685/11526 [1:09:56<49:36, 1.63it/s] 58%|█████▊ | 6686/11526 [1:09:56<49:34, 1.63it/s] {'loss': 0.2096, 'grad_norm': 0.6334291696548462, 'learning_rate': 4.476253284161018e-06, 'epoch': 1.74}
58%|█████▊ | 6686/11526 [1:09:56<49:34, 1.63it/s] 58%|█████▊ | 6687/11526 [1:09:57<49:33, 1.63it/s] {'loss': 0.2052, 'grad_norm': 0.5459723472595215, 'learning_rate': 4.474747326488058e-06, 'epoch': 1.74}
58%|█████▊ | 6687/11526 [1:09:57<49:33, 1.63it/s] 58%|█████▊ | 6688/11526 [1:09:57<49:31, 1.63it/s] {'loss': 0.1787, 'grad_norm': 0.5461519956588745, 'learning_rate': 4.473241416994265e-06, 'epoch': 1.74}
58%|█████▊ | 6688/11526 [1:09:58<49:31, 1.63it/s] 58%|█████▊ | 6689/11526 [1:09:58<49:33, 1.63it/s] {'loss': 0.2514, 'grad_norm': 0.6240156292915344, 'learning_rate': 4.471735555817765e-06, 'epoch': 1.74}
58%|█████▊ | 6689/11526 [1:09:58<49:33, 1.63it/s] 58%|█████▊ | 6690/11526 [1:09:59<49:36, 1.62it/s] {'loss': 0.1925, 'grad_norm': 0.5113894939422607, 'learning_rate': 4.470229743096686e-06, 'epoch': 1.74}
58%|█████▊ | 6690/11526 [1:09:59<49:36, 1.62it/s] 58%|█████▊ | 6691/11526 [1:09:59<49:34, 1.63it/s] {'loss': 0.1914, 'grad_norm': 0.5004516243934631, 'learning_rate': 4.468723978969149e-06, 'epoch': 1.74}
58%|█████▊ | 6691/11526 [1:09:59<49:34, 1.63it/s] 58%|█████▊ | 6692/11526 [1:10:00<49:32, 1.63it/s] {'loss': 0.1597, 'grad_norm': 0.45925620198249817, 'learning_rate': 4.467218263573271e-06, 'epoch': 1.74}
58%|█████▊ | 6692/11526 [1:10:00<49:32, 1.63it/s] 58%|█████▊ | 6693/11526 [1:10:00<49:30, 1.63it/s] {'loss': 0.2331, 'grad_norm': 0.5985701084136963, 'learning_rate': 4.465712597047165e-06, 'epoch': 1.74}
58%|█████▊ | 6693/11526 [1:10:01<49:30, 1.63it/s] 58%|█████▊ | 6694/11526 [1:10:01<49:32, 1.63it/s] {'loss': 0.1736, 'grad_norm': 0.4887809753417969, 'learning_rate': 4.46420697952894e-06, 'epoch': 1.74}
58%|█████▊ | 6694/11526 [1:10:01<49:32, 1.63it/s] 58%|█████▊ | 6695/11526 [1:10:02<49:29, 1.63it/s] {'loss': 0.205, 'grad_norm': 0.5660289525985718, 'learning_rate': 4.462701411156702e-06, 'epoch': 1.74}
58%|█████▊ | 6695/11526 [1:10:02<49:29, 1.63it/s] 58%|█████▊ | 6696/11526 [1:10:02<49:29, 1.63it/s] {'loss': 0.2356, 'grad_norm': 0.646838366985321, 'learning_rate': 4.461195892068542e-06, 'epoch': 1.74}
58%|█████▊ | 6696/11526 [1:10:02<49:29, 1.63it/s] 58%|█████▊ | 6697/11526 [1:10:03<49:29, 1.63it/s] {'loss': 0.1556, 'grad_norm': 0.46734392642974854, 'learning_rate': 4.459690422402564e-06, 'epoch': 1.74}
58%|█████▊ | 6697/11526 [1:10:03<49:29, 1.63it/s] 58%|█████▊ | 6698/11526 [1:10:04<49:28, 1.63it/s] {'loss': 0.1873, 'grad_norm': 0.5118646621704102, 'learning_rate': 4.458185002296856e-06, 'epoch': 1.74}
58%|█████▊ | 6698/11526 [1:10:04<49:28, 1.63it/s] 58%|█████▊ | 6699/11526 [1:10:04<49:31, 1.62it/s] {'loss': 0.2163, 'grad_norm': 0.5650988817214966, 'learning_rate': 4.456679631889501e-06, 'epoch': 1.74}
58%|█████▊ | 6699/11526 [1:10:04<49:31, 1.62it/s] 58%|█████▊ | 6700/11526 [1:10:05<49:30, 1.62it/s] {'loss': 0.1905, 'grad_norm': 0.5177134871482849, 'learning_rate': 4.455174311318581e-06, 'epoch': 1.74}
58%|█████▊ | 6700/11526 [1:10:05<49:30, 1.62it/s] 58%|█████▊ | 6701/11526 [1:10:05<49:27, 1.63it/s] {'loss': 0.2031, 'grad_norm': 0.6082696318626404, 'learning_rate': 4.4536690407221716e-06, 'epoch': 1.74}
58%|█████▊ | 6701/11526 [1:10:06<49:27, 1.63it/s] 58%|█████▊ | 6702/11526 [1:10:06<49:26, 1.63it/s] {'loss': 0.2027, 'grad_norm': 0.5169367790222168, 'learning_rate': 4.452163820238349e-06, 'epoch': 1.74}
58%|█████▊ | 6702/11526 [1:10:06<49:26, 1.63it/s] 58%|█████▊ | 6703/11526 [1:10:07<49:25, 1.63it/s] {'loss': 0.2688, 'grad_norm': 0.6149641871452332, 'learning_rate': 4.450658650005178e-06, 'epoch': 1.74}
58%|█████▊ | 6703/11526 [1:10:07<49:25, 1.63it/s] 58%|█████▊ | 6704/11526 [1:10:07<49:26, 1.63it/s] {'loss': 0.1624, 'grad_norm': 0.5088595747947693, 'learning_rate': 4.44915353016072e-06, 'epoch': 1.74}
58%|█████▊ | 6704/11526 [1:10:07<49:26, 1.63it/s] 58%|█████▊ | 6705/11526 [1:10:08<49:23, 1.63it/s] {'loss': 0.1676, 'grad_norm': 0.43424856662750244, 'learning_rate': 4.447648460843033e-06, 'epoch': 1.75}
58%|█████▊ | 6705/11526 [1:10:08<49:23, 1.63it/s] 58%|█████▊ | 6706/11526 [1:10:08<49:22, 1.63it/s] {'loss': 0.2459, 'grad_norm': 0.584332287311554, 'learning_rate': 4.446143442190174e-06, 'epoch': 1.75}
58%|█████▊ | 6706/11526 [1:10:09<49:22, 1.63it/s] 58%|█████▊ | 6707/11526 [1:10:09<49:20, 1.63it/s] {'loss': 0.2022, 'grad_norm': 0.5098947882652283, 'learning_rate': 4.44463847434019e-06, 'epoch': 1.75}
58%|█████▊ | 6707/11526 [1:10:09<49:20, 1.63it/s] 58%|█████▊ | 6708/11526 [1:10:10<49:18, 1.63it/s] {'loss': 0.2886, 'grad_norm': 0.7510136961936951, 'learning_rate': 4.443133557431124e-06, 'epoch': 1.75}
58%|█████▊ | 6708/11526 [1:10:10<49:18, 1.63it/s] 58%|█████▊ | 6709/11526 [1:10:10<49:19, 1.63it/s] {'loss': 0.1667, 'grad_norm': 0.5333155393600464, 'learning_rate': 4.441628691601017e-06, 'epoch': 1.75}
58%|█████▊ | 6709/11526 [1:10:10<49:19, 1.63it/s] 58%|█████▊ | 6710/11526 [1:10:11<49:16, 1.63it/s] {'loss': 0.212, 'grad_norm': 0.6041895747184753, 'learning_rate': 4.440123876987902e-06, 'epoch': 1.75}
58%|█████▊ | 6710/11526 [1:10:11<49:16, 1.63it/s] 58%|█████▊ | 6711/11526 [1:10:12<49:17, 1.63it/s] {'loss': 0.2179, 'grad_norm': 0.5617942810058594, 'learning_rate': 4.438619113729813e-06, 'epoch': 1.75}
58%|█████▊ | 6711/11526 [1:10:12<49:17, 1.63it/s] 58%|█████▊ | 6712/11526 [1:10:12<49:17, 1.63it/s] {'loss': 0.2, 'grad_norm': 0.6216724514961243, 'learning_rate': 4.437114401964772e-06, 'epoch': 1.75}
58%|█████▊ | 6712/11526 [1:10:12<49:17, 1.63it/s] 58%|█████▊ | 6713/11526 [1:10:13<49:18, 1.63it/s] {'loss': 0.1969, 'grad_norm': 0.5581679344177246, 'learning_rate': 4.4356097418308e-06, 'epoch': 1.75}
58%|█████▊ | 6713/11526 [1:10:13<49:18, 1.63it/s] 58%|█████▊ | 6714/11526 [1:10:13<49:21, 1.63it/s] {'loss': 0.2261, 'grad_norm': 0.6169017553329468, 'learning_rate': 4.434105133465913e-06, 'epoch': 1.75}
58%|█████▊ | 6714/11526 [1:10:14<49:21, 1.63it/s] 58%|█████▊ | 6715/11526 [1:10:14<49:18, 1.63it/s] {'loss': 0.1538, 'grad_norm': 0.48210960626602173, 'learning_rate': 4.432600577008126e-06, 'epoch': 1.75}
58%|█████▊ | 6715/11526 [1:10:14<49:18, 1.63it/s] 58%|█████▊ | 6716/11526 [1:10:15<49:16, 1.63it/s] {'loss': 0.2415, 'grad_norm': 0.5821541547775269, 'learning_rate': 4.43109607259544e-06, 'epoch': 1.75}
58%|█████▊ | 6716/11526 [1:10:15<49:16, 1.63it/s] 58%|█████▊ | 6717/11526 [1:10:15<49:13, 1.63it/s] {'loss': 0.215, 'grad_norm': 0.5205346941947937, 'learning_rate': 4.429591620365861e-06, 'epoch': 1.75}
58%|█████▊ | 6717/11526 [1:10:15<49:13, 1.63it/s] 58%|█████▊ | 6718/11526 [1:10:16<49:13, 1.63it/s] {'loss': 0.1985, 'grad_norm': 0.599281370639801, 'learning_rate': 4.428087220457384e-06, 'epoch': 1.75}
58%|█████▊ | 6718/11526 [1:10:16<49:13, 1.63it/s] 58%|█████▊ | 6719/11526 [1:10:16<49:14, 1.63it/s] {'loss': 0.2644, 'grad_norm': 0.5879153609275818, 'learning_rate': 4.426582873007999e-06, 'epoch': 1.75}
58%|█████▊ | 6719/11526 [1:10:17<49:14, 1.63it/s] 58%|█████▊ | 6720/11526 [1:10:17<49:13, 1.63it/s] {'loss': 0.246, 'grad_norm': 0.6392467617988586, 'learning_rate': 4.425078578155695e-06, 'epoch': 1.75}
58%|█████▊ | 6720/11526 [1:10:17<49:13, 1.63it/s] 58%|█████▊ | 6721/11526 [1:10:18<49:12, 1.63it/s] {'loss': 0.2335, 'grad_norm': 0.6853635907173157, 'learning_rate': 4.423574336038458e-06, 'epoch': 1.75}
58%|█████▊ | 6721/11526 [1:10:18<49:12, 1.63it/s] 58%|█████▊ | 6722/11526 [1:10:18<49:11, 1.63it/s] {'loss': 0.2441, 'grad_norm': 0.613638162612915, 'learning_rate': 4.422070146794261e-06, 'epoch': 1.75}
58%|█████▊ | 6722/11526 [1:10:18<49:11, 1.63it/s] 58%|█████▊ | 6723/11526 [1:10:19<49:10, 1.63it/s] {'loss': 0.1755, 'grad_norm': 0.471478134393692, 'learning_rate': 4.420566010561078e-06, 'epoch': 1.75}
58%|█████▊ | 6723/11526 [1:10:19<49:10, 1.63it/s] 58%|█████▊ | 6724/11526 [1:10:20<49:14, 1.63it/s] {'loss': 0.2861, 'grad_norm': 0.7162939310073853, 'learning_rate': 4.419061927476878e-06, 'epoch': 1.75}
58%|█████▊ | 6724/11526 [1:10:20<49:14, 1.63it/s] 58%|█████▊ | 6725/11526 [1:10:20<49:09, 1.63it/s] {'loss': 0.1959, 'grad_norm': 0.5053873658180237, 'learning_rate': 4.417557897679624e-06, 'epoch': 1.75}
58%|█████▊ | 6725/11526 [1:10:20<49:09, 1.63it/s] 58%|█████▊ | 6726/11526 [1:10:21<49:09, 1.63it/s] {'loss': 0.2353, 'grad_norm': 0.6371537446975708, 'learning_rate': 4.416053921307274e-06, 'epoch': 1.75}
58%|█████▊ | 6726/11526 [1:10:21<49:09, 1.63it/s] 58%|█████▊ | 6727/11526 [1:10:21<49:08, 1.63it/s] {'loss': 0.1805, 'grad_norm': 0.5064708590507507, 'learning_rate': 4.4145499984977785e-06, 'epoch': 1.75}
58%|█████▊ | 6727/11526 [1:10:22<49:08, 1.63it/s] 58%|█████▊ | 6728/11526 [1:10:22<49:07, 1.63it/s] {'loss': 0.1809, 'grad_norm': 0.5040870904922485, 'learning_rate': 4.4130461293890895e-06, 'epoch': 1.75}
58%|█████▊ | 6728/11526 [1:10:22<49:07, 1.63it/s] 58%|█████▊ | 6729/11526 [1:10:23<49:05, 1.63it/s] {'loss': 0.1941, 'grad_norm': 0.4430079460144043, 'learning_rate': 4.41154231411915e-06, 'epoch': 1.75}
58%|█████▊ | 6729/11526 [1:10:23<49:05, 1.63it/s] 58%|█████▊ | 6730/11526 [1:10:23<49:04, 1.63it/s] {'loss': 0.2465, 'grad_norm': 0.5842481255531311, 'learning_rate': 4.410038552825897e-06, 'epoch': 1.75}
58%|█████▊ | 6730/11526 [1:10:23<49:04, 1.63it/s] 58%|█████▊ | 6731/11526 [1:10:24<49:03, 1.63it/s] {'loss': 0.1658, 'grad_norm': 0.5150280594825745, 'learning_rate': 4.408534845647264e-06, 'epoch': 1.75}
58%|█████▊ | 6731/11526 [1:10:24<49:03, 1.63it/s] 58%|█████▊ | 6732/11526 [1:10:24<49:02, 1.63it/s] {'loss': 0.2058, 'grad_norm': 0.5012891888618469, 'learning_rate': 4.40703119272118e-06, 'epoch': 1.75}
58%|█████▊ | 6732/11526 [1:10:25<49:02, 1.63it/s] 58%|█████▊ | 6733/11526 [1:10:25<49:02, 1.63it/s] {'loss': 0.1767, 'grad_norm': 0.6022240519523621, 'learning_rate': 4.40552759418557e-06, 'epoch': 1.75}
58%|█████▊ | 6733/11526 [1:10:25<49:02, 1.63it/s] 58%|█████▊ | 6734/11526 [1:10:26<49:02, 1.63it/s] {'loss': 0.2649, 'grad_norm': 0.6412681937217712, 'learning_rate': 4.404024050178352e-06, 'epoch': 1.75}
58%|█████▊ | 6734/11526 [1:10:26<49:02, 1.63it/s] 58%|█████▊ | 6735/11526 [1:10:26<49:00, 1.63it/s] {'loss': 0.194, 'grad_norm': 0.5204740166664124, 'learning_rate': 4.402520560837438e-06, 'epoch': 1.75}
58%|█████▊ | 6735/11526 [1:10:26<49:00, 1.63it/s] 58%|█████▊ | 6736/11526 [1:10:27<49:00, 1.63it/s] {'loss': 0.1831, 'grad_norm': 0.5004721283912659, 'learning_rate': 4.401017126300736e-06, 'epoch': 1.75}
58%|█████▊ | 6736/11526 [1:10:27<49:00, 1.63it/s] 58%|█████▊ | 6737/11526 [1:10:28<48:59, 1.63it/s] {'loss': 0.1825, 'grad_norm': 0.4778066575527191, 'learning_rate': 4.399513746706154e-06, 'epoch': 1.75}
58%|█████▊ | 6737/11526 [1:10:28<48:59, 1.63it/s] 58%|█████▊ | 6738/11526 [1:10:28<49:00, 1.63it/s] {'loss': 0.2273, 'grad_norm': 0.665685772895813, 'learning_rate': 4.398010422191588e-06, 'epoch': 1.75}
58%|█████▊ | 6738/11526 [1:10:28<49:00, 1.63it/s] 58%|█████▊ | 6739/11526 [1:10:29<49:02, 1.63it/s] {'loss': 0.1909, 'grad_norm': 0.5418396592140198, 'learning_rate': 4.39650715289493e-06, 'epoch': 1.75}
58%|█████▊ | 6739/11526 [1:10:29<49:02, 1.63it/s] 58%|█████▊ | 6740/11526 [1:10:29<49:00, 1.63it/s] {'loss': 0.1979, 'grad_norm': 0.5671247839927673, 'learning_rate': 4.39500393895407e-06, 'epoch': 1.75}
58%|█████▊ | 6740/11526 [1:10:30<49:00, 1.63it/s] 58%|█████▊ | 6741/11526 [1:10:30<48:59, 1.63it/s] {'loss': 0.3057, 'grad_norm': 0.4975462853908539, 'learning_rate': 4.393500780506889e-06, 'epoch': 1.75}
58%|█████▊ | 6741/11526 [1:10:30<48:59, 1.63it/s] 58%|█████▊ | 6742/11526 [1:10:31<49:00, 1.63it/s] {'loss': 0.1939, 'grad_norm': 0.5573699474334717, 'learning_rate': 4.391997677691271e-06, 'epoch': 1.75}
58%|█████▊ | 6742/11526 [1:10:31<49:00, 1.63it/s] 59%|█████▊ | 6743/11526 [1:10:31<48:58, 1.63it/s] {'loss': 0.1916, 'grad_norm': 0.575289249420166, 'learning_rate': 4.3904946306450824e-06, 'epoch': 1.76}
59%|█████▊ | 6743/11526 [1:10:31<48:58, 1.63it/s] 59%|█████▊ | 6744/11526 [1:10:32<48:57, 1.63it/s] {'loss': 0.217, 'grad_norm': 0.55560702085495, 'learning_rate': 4.388991639506196e-06, 'epoch': 1.76}
59%|█████▊ | 6744/11526 [1:10:32<48:57, 1.63it/s] 59%|█████▊ | 6745/11526 [1:10:32<48:55, 1.63it/s] {'loss': 0.231, 'grad_norm': 0.5752715468406677, 'learning_rate': 4.387488704412471e-06, 'epoch': 1.76}
59%|█████▊ | 6745/11526 [1:10:33<48:55, 1.63it/s] 59%|█████▊ | 6746/11526 [1:10:33<48:56, 1.63it/s] {'loss': 0.1956, 'grad_norm': 0.5487837791442871, 'learning_rate': 4.385985825501769e-06, 'epoch': 1.76}
59%|█████▊ | 6746/11526 [1:10:33<48:56, 1.63it/s] 59%|█████▊ | 6747/11526 [1:10:34<48:55, 1.63it/s] {'loss': 0.1815, 'grad_norm': 0.49594080448150635, 'learning_rate': 4.3844830029119395e-06, 'epoch': 1.76}
59%|█████▊ | 6747/11526 [1:10:34<48:55, 1.63it/s] 59%|█████▊ | 6748/11526 [1:10:34<48:55, 1.63it/s] {'loss': 0.1726, 'grad_norm': 0.453713983297348, 'learning_rate': 4.382980236780832e-06, 'epoch': 1.76}
59%|█████▊ | 6748/11526 [1:10:34<48:55, 1.63it/s] 59%|█████▊ | 6749/11526 [1:10:35<48:56, 1.63it/s] {'loss': 0.1557, 'grad_norm': 0.4889322817325592, 'learning_rate': 4.381477527246288e-06, 'epoch': 1.76}
59%|█████▊ | 6749/11526 [1:10:35<48:56, 1.63it/s] 59%|█████▊ | 6750/11526 [1:10:36<48:56, 1.63it/s] {'loss': 0.1521, 'grad_norm': 0.43405458331108093, 'learning_rate': 4.379974874446144e-06, 'epoch': 1.76}
59%|█████▊ | 6750/11526 [1:10:36<48:56, 1.63it/s] 59%|█████▊ | 6751/11526 [1:10:36<48:52, 1.63it/s] {'loss': 0.2037, 'grad_norm': 0.6160924434661865, 'learning_rate': 4.378472278518231e-06, 'epoch': 1.76}
59%|█████▊ | 6751/11526 [1:10:36<48:52, 1.63it/s] 59%|█████▊ | 6752/11526 [1:10:37<48:52, 1.63it/s] {'loss': 0.1611, 'grad_norm': 0.49631303548812866, 'learning_rate': 4.376969739600379e-06, 'epoch': 1.76}
59%|█████▊ | 6752/11526 [1:10:37<48:52, 1.63it/s] 59%|█████▊ | 6753/11526 [1:10:37<48:51, 1.63it/s] {'loss': 0.1751, 'grad_norm': 0.4726431965827942, 'learning_rate': 4.3754672578304065e-06, 'epoch': 1.76}
59%|█████▊ | 6753/11526 [1:10:37<48:51, 1.63it/s] 59%|█████▊ | 6754/11526 [1:10:38<48:52, 1.63it/s] {'loss': 0.2144, 'grad_norm': 0.5996086597442627, 'learning_rate': 4.37396483334613e-06, 'epoch': 1.76}
59%|█████▊ | 6754/11526 [1:10:38<48:52, 1.63it/s] 59%|█████▊ | 6755/11526 [1:10:39<48:51, 1.63it/s] {'loss': 0.2564, 'grad_norm': 0.6627918481826782, 'learning_rate': 4.372462466285361e-06, 'epoch': 1.76}
59%|█████▊ | 6755/11526 [1:10:39<48:51, 1.63it/s] 59%|█████▊ | 6756/11526 [1:10:39<48:49, 1.63it/s] {'loss': 0.1631, 'grad_norm': 0.5317302942276001, 'learning_rate': 4.370960156785907e-06, 'epoch': 1.76}
59%|█████▊ | 6756/11526 [1:10:39<48:49, 1.63it/s] 59%|█████▊ | 6757/11526 [1:10:40<48:48, 1.63it/s] {'loss': 0.3722, 'grad_norm': 0.588901698589325, 'learning_rate': 4.369457904985564e-06, 'epoch': 1.76}
59%|█████▊ | 6757/11526 [1:10:40<48:48, 1.63it/s] 59%|█████▊ | 6758/11526 [1:10:40<48:49, 1.63it/s] {'loss': 0.2062, 'grad_norm': 0.5425044894218445, 'learning_rate': 4.3679557110221324e-06, 'epoch': 1.76}
59%|█████▊ | 6758/11526 [1:10:41<48:49, 1.63it/s] 59%|█████▊ | 6759/11526 [1:10:41<48:48, 1.63it/s] {'loss': 0.174, 'grad_norm': 0.5347126722335815, 'learning_rate': 4.366453575033396e-06, 'epoch': 1.76}
59%|█████▊ | 6759/11526 [1:10:41<48:48, 1.63it/s] 59%|█████▊ | 6760/11526 [1:10:42<48:47, 1.63it/s] {'loss': 0.2986, 'grad_norm': 0.6881677508354187, 'learning_rate': 4.364951497157145e-06, 'epoch': 1.76}
59%|█████▊ | 6760/11526 [1:10:42<48:47, 1.63it/s] 59%|█████▊ | 6761/11526 [1:10:42<48:47, 1.63it/s] {'loss': 0.1811, 'grad_norm': 0.5544742941856384, 'learning_rate': 4.363449477531154e-06, 'epoch': 1.76}
59%|█████▊ | 6761/11526 [1:10:42<48:47, 1.63it/s] 59%|█████▊ | 6762/11526 [1:10:43<48:47, 1.63it/s] {'loss': 0.2164, 'grad_norm': 0.5556869506835938, 'learning_rate': 4.361947516293201e-06, 'epoch': 1.76}
59%|█████▊ | 6762/11526 [1:10:43<48:47, 1.63it/s] 59%|█████▊ | 6763/11526 [1:10:44<48:45, 1.63it/s] {'loss': 0.1736, 'grad_norm': 0.546787440776825, 'learning_rate': 4.360445613581049e-06, 'epoch': 1.76}
59%|█████▊ | 6763/11526 [1:10:44<48:45, 1.63it/s] 59%|█████▊ | 6764/11526 [1:10:44<48:47, 1.63it/s] {'loss': 0.1817, 'grad_norm': 0.5383438467979431, 'learning_rate': 4.3589437695324675e-06, 'epoch': 1.76}
59%|█████▊ | 6764/11526 [1:10:44<48:47, 1.63it/s] 59%|█████▊ | 6765/11526 [1:10:45<48:46, 1.63it/s] {'loss': 0.1887, 'grad_norm': 0.5616699457168579, 'learning_rate': 4.3574419842852085e-06, 'epoch': 1.76}
59%|█████▊ | 6765/11526 [1:10:45<48:46, 1.63it/s] 59%|█████▊ | 6766/11526 [1:10:45<48:46, 1.63it/s] {'loss': 0.2011, 'grad_norm': 0.5736537575721741, 'learning_rate': 4.3559402579770295e-06, 'epoch': 1.76}
59%|█████▊ | 6766/11526 [1:10:45<48:46, 1.63it/s] 59%|█████▊ | 6767/11526 [1:10:46<48:44, 1.63it/s] {'loss': 0.1665, 'grad_norm': 0.5303363800048828, 'learning_rate': 4.354438590745673e-06, 'epoch': 1.76}
59%|█████▊ | 6767/11526 [1:10:46<48:44, 1.63it/s] 59%|█████▊ | 6768/11526 [1:10:47<48:42, 1.63it/s] {'loss': 0.2236, 'grad_norm': 0.587628960609436, 'learning_rate': 4.352936982728881e-06, 'epoch': 1.76}
59%|█████▊ | 6768/11526 [1:10:47<48:42, 1.63it/s] 59%|█████▊ | 6769/11526 [1:10:47<48:40, 1.63it/s] {'loss': 0.2335, 'grad_norm': 0.6010907292366028, 'learning_rate': 4.351435434064391e-06, 'epoch': 1.76}
59%|█████▊ | 6769/11526 [1:10:47<48:40, 1.63it/s] 59%|█████▊ | 6770/11526 [1:10:48<48:42, 1.63it/s] {'loss': 0.2333, 'grad_norm': 0.6694866418838501, 'learning_rate': 4.349933944889934e-06, 'epoch': 1.76}
59%|█████▊ | 6770/11526 [1:10:48<48:42, 1.63it/s] 59%|█████▊ | 6771/11526 [1:10:48<48:41, 1.63it/s] {'loss': 0.1782, 'grad_norm': 0.6699041128158569, 'learning_rate': 4.348432515343235e-06, 'epoch': 1.76}
59%|█████▊ | 6771/11526 [1:10:49<48:41, 1.63it/s] 59%|█████▉ | 6772/11526 [1:10:49<48:40, 1.63it/s] {'loss': 0.2062, 'grad_norm': 0.5663532614707947, 'learning_rate': 4.34693114556201e-06, 'epoch': 1.76}
59%|█████▉ | 6772/11526 [1:10:49<48:40, 1.63it/s] 59%|█████▉ | 6773/11526 [1:10:50<48:41, 1.63it/s] {'loss': 0.1926, 'grad_norm': 0.5745680928230286, 'learning_rate': 4.345429835683978e-06, 'epoch': 1.76}
59%|█████▉ | 6773/11526 [1:10:50<48:41, 1.63it/s] 59%|█████▉ | 6774/11526 [1:10:50<48:39, 1.63it/s] {'loss': 0.2356, 'grad_norm': 0.6435632109642029, 'learning_rate': 4.3439285858468465e-06, 'epoch': 1.76}
59%|█████▉ | 6774/11526 [1:10:50<48:39, 1.63it/s] 59%|█████▉ | 6775/11526 [1:10:51<48:37, 1.63it/s] {'loss': 0.1519, 'grad_norm': 0.44380006194114685, 'learning_rate': 4.342427396188318e-06, 'epoch': 1.76}
59%|█████▉ | 6775/11526 [1:10:51<48:37, 1.63it/s] 59%|█████▉ | 6776/11526 [1:10:51<48:38, 1.63it/s] {'loss': 0.2092, 'grad_norm': 0.5578196048736572, 'learning_rate': 4.340926266846088e-06, 'epoch': 1.76}
59%|█████▉ | 6776/11526 [1:10:52<48:38, 1.63it/s] 59%|█████▉ | 6777/11526 [1:10:52<48:36, 1.63it/s] {'loss': 0.1545, 'grad_norm': 0.4298318028450012, 'learning_rate': 4.339425197957853e-06, 'epoch': 1.76}
59%|█████▉ | 6777/11526 [1:10:52<48:36, 1.63it/s] 59%|█████▉ | 6778/11526 [1:10:53<48:36, 1.63it/s] {'loss': 0.1859, 'grad_norm': 0.5124281048774719, 'learning_rate': 4.337924189661296e-06, 'epoch': 1.76}
59%|█████▉ | 6778/11526 [1:10:53<48:36, 1.63it/s] 59%|█████▉ | 6779/11526 [1:10:53<48:37, 1.63it/s] {'loss': 0.1673, 'grad_norm': 0.4729132354259491, 'learning_rate': 4.336423242094102e-06, 'epoch': 1.76}
59%|█████▉ | 6779/11526 [1:10:53<48:37, 1.63it/s] 59%|█████▉ | 6780/11526 [1:10:54<48:35, 1.63it/s] {'loss': 0.2226, 'grad_norm': 0.6029179096221924, 'learning_rate': 4.334922355393941e-06, 'epoch': 1.76}
59%|█████▉ | 6780/11526 [1:10:54<48:35, 1.63it/s] 59%|█████▉ | 6781/11526 [1:10:55<48:33, 1.63it/s] {'loss': 0.1616, 'grad_norm': 0.50376296043396, 'learning_rate': 4.333421529698486e-06, 'epoch': 1.76}
59%|█████▉ | 6781/11526 [1:10:55<48:33, 1.63it/s] 59%|█████▉ | 6782/11526 [1:10:55<48:34, 1.63it/s] {'loss': 0.2157, 'grad_norm': 0.5822186470031738, 'learning_rate': 4.3319207651454015e-06, 'epoch': 1.77}
59%|█████▉ | 6782/11526 [1:10:55<48:34, 1.63it/s] 59%|█████▉ | 6783/11526 [1:10:56<48:34, 1.63it/s] {'loss': 0.2208, 'grad_norm': 0.6617374420166016, 'learning_rate': 4.330420061872347e-06, 'epoch': 1.77}
59%|█████▉ | 6783/11526 [1:10:56<48:34, 1.63it/s] 59%|█████▉ | 6784/11526 [1:10:56<48:34, 1.63it/s] {'loss': 0.1994, 'grad_norm': 0.5488311648368835, 'learning_rate': 4.328919420016972e-06, 'epoch': 1.77}
59%|█████▉ | 6784/11526 [1:10:57<48:34, 1.63it/s] 59%|█████▉ | 6785/11526 [1:10:57<48:34, 1.63it/s] {'loss': 0.194, 'grad_norm': 0.500963568687439, 'learning_rate': 4.327418839716926e-06, 'epoch': 1.77}
59%|█████▉ | 6785/11526 [1:10:57<48:34, 1.63it/s] 59%|█████▉ | 6786/11526 [1:10:58<48:33, 1.63it/s] {'loss': 0.1879, 'grad_norm': 0.5473921298980713, 'learning_rate': 4.325918321109852e-06, 'epoch': 1.77}
59%|█████▉ | 6786/11526 [1:10:58<48:33, 1.63it/s] 59%|█████▉ | 6787/11526 [1:10:58<48:31, 1.63it/s] {'loss': 0.2251, 'grad_norm': 0.6027835011482239, 'learning_rate': 4.3244178643333854e-06, 'epoch': 1.77}
59%|█████▉ | 6787/11526 [1:10:58<48:31, 1.63it/s] 59%|█████▉ | 6788/11526 [1:10:59<48:32, 1.63it/s] {'loss': 0.1894, 'grad_norm': 0.4986591041088104, 'learning_rate': 4.322917469525156e-06, 'epoch': 1.77}
59%|█████▉ | 6788/11526 [1:10:59<48:32, 1.63it/s] 59%|█████▉ | 6789/11526 [1:10:59<48:34, 1.63it/s] {'loss': 0.2188, 'grad_norm': 0.4920826852321625, 'learning_rate': 4.3214171368227885e-06, 'epoch': 1.77}
59%|█████▉ | 6789/11526 [1:11:00<48:34, 1.63it/s] 59%|█████▉ | 6790/11526 [1:11:00<48:33, 1.63it/s] {'loss': 0.1843, 'grad_norm': 0.4954042136669159, 'learning_rate': 4.3199168663639005e-06, 'epoch': 1.77}
59%|█████▉ | 6790/11526 [1:11:00<48:33, 1.63it/s] 59%|█████▉ | 6791/11526 [1:11:01<48:31, 1.63it/s] {'loss': 0.1511, 'grad_norm': 0.46218249201774597, 'learning_rate': 4.318416658286109e-06, 'epoch': 1.77}
59%|█████▉ | 6791/11526 [1:11:01<48:31, 1.63it/s] 59%|█████▉ | 6792/11526 [1:11:01<48:30, 1.63it/s] {'loss': 0.1875, 'grad_norm': 0.5330107808113098, 'learning_rate': 4.316916512727019e-06, 'epoch': 1.77}
59%|█████▉ | 6792/11526 [1:11:01<48:30, 1.63it/s] 59%|█████▉ | 6793/11526 [1:11:02<48:25, 1.63it/s] {'loss': 0.1479, 'grad_norm': 0.43471506237983704, 'learning_rate': 4.315416429824234e-06, 'epoch': 1.77}
59%|█████▉ | 6793/11526 [1:11:02<48:25, 1.63it/s] 59%|█████▉ | 6794/11526 [1:11:03<48:30, 1.63it/s] {'loss': 0.1611, 'grad_norm': 0.51816326379776, 'learning_rate': 4.3139164097153465e-06, 'epoch': 1.77}
59%|█████▉ | 6794/11526 [1:11:03<48:30, 1.63it/s] 59%|█████▉ | 6795/11526 [1:11:03<48:29, 1.63it/s] {'loss': 0.192, 'grad_norm': 0.43735888600349426, 'learning_rate': 4.312416452537953e-06, 'epoch': 1.77}
59%|█████▉ | 6795/11526 [1:11:03<48:29, 1.63it/s] 59%|█████▉ | 6796/11526 [1:11:04<48:28, 1.63it/s] {'loss': 0.2064, 'grad_norm': 0.5693363547325134, 'learning_rate': 4.310916558429632e-06, 'epoch': 1.77}
59%|█████▉ | 6796/11526 [1:11:04<48:28, 1.63it/s] 59%|█████▉ | 6797/11526 [1:11:04<48:26, 1.63it/s] {'loss': 0.2123, 'grad_norm': 0.6608054041862488, 'learning_rate': 4.309416727527967e-06, 'epoch': 1.77}
59%|█████▉ | 6797/11526 [1:11:05<48:26, 1.63it/s] 59%|█████▉ | 6798/11526 [1:11:05<48:26, 1.63it/s] {'loss': 0.2039, 'grad_norm': 0.5219505429267883, 'learning_rate': 4.307916959970529e-06, 'epoch': 1.77}
59%|█████▉ | 6798/11526 [1:11:05<48:26, 1.63it/s] 59%|█████▉ | 6799/11526 [1:11:06<48:27, 1.63it/s] {'loss': 0.1701, 'grad_norm': 0.46165892481803894, 'learning_rate': 4.306417255894882e-06, 'epoch': 1.77}
59%|█████▉ | 6799/11526 [1:11:06<48:27, 1.63it/s] 59%|█████▉ | 6800/11526 [1:11:06<48:28, 1.62it/s] {'loss': 0.1621, 'grad_norm': 0.5163775086402893, 'learning_rate': 4.304917615438594e-06, 'epoch': 1.77}
59%|█████▉ | 6800/11526 [1:11:06<48:28, 1.62it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5349627137184143, 'eval_runtime': 1.9554, 'eval_samples_per_second': 102.283, 'eval_steps_per_second': 6.648, 'epoch': 1.77}
 59%|█████▉ | 6801/11526 [1:11:09<1:34:44, 1.20s/it] {'loss': 0.2229, 'grad_norm': 0.6245871186256409, 'learning_rate': 4.303418038739216e-06, 'epoch': 1.77}
59%|█████▉ | 6801/11526 [1:11:09<1:34:44, 1.20s/it] 59%|█████▉ | 6802/11526 [1:11:09<1:20:47, 1.03s/it] {'loss': 0.1915, 'grad_norm': 0.535652756690979, 'learning_rate': 4.3019185259343e-06, 'epoch': 1.77}
59%|█████▉ | 6802/11526 [1:11:10<1:20:47, 1.03s/it] 59%|█████▉ | 6803/11526 [1:11:10<1:11:03, 1.11it/s] {'loss': 0.1773, 'grad_norm': 0.5322232246398926, 'learning_rate': 4.300419077161386e-06, 'epoch': 1.77}
59%|█████▉ | 6803/11526 [1:11:10<1:11:03, 1.11it/s] 59%|█████▉ | 6804/11526 [1:11:11<1:04:17, 1.22it/s] {'loss': 0.136, 'grad_norm': 0.4608101546764374, 'learning_rate': 4.298919692558016e-06, 'epoch': 1.77}
59%|█████▉ | 6804/11526 [1:11:11<1:04:17, 1.22it/s] 59%|█████▉ | 6805/11526 [1:11:11<59:28, 1.32it/s] {'loss': 0.2516, 'grad_norm': 0.7883619666099548, 'learning_rate': 4.297420372261722e-06, 'epoch': 1.77}
59%|█████▉ | 6805/11526 [1:11:11<59:28, 1.32it/s] 59%|█████▉ | 6806/11526 [1:11:12<56:07, 1.40it/s] {'loss': 0.1613, 'grad_norm': 0.46345871686935425, 'learning_rate': 4.295921116410029e-06, 'epoch': 1.77}
59%|█████▉ | 6806/11526 [1:11:12<56:07, 1.40it/s] 59%|█████▉ | 6807/11526 [1:11:13<53:46, 1.46it/s] {'loss': 0.2262, 'grad_norm': 0.5942211747169495, 'learning_rate': 4.2944219251404556e-06, 'epoch': 1.77}
59%|█████▉ | 6807/11526 [1:11:13<53:46, 1.46it/s] 59%|█████▉ | 6808/11526 [1:11:13<52:07, 1.51it/s] {'loss': 0.1619, 'grad_norm': 0.46002063155174255, 'learning_rate': 4.292922798590517e-06, 'epoch': 1.77}
59%|█████▉ | 6808/11526 [1:11:13<52:07, 1.51it/s] 59%|█████▉ | 6809/11526 [1:11:14<51:03, 1.54it/s] {'loss': 0.1875, 'grad_norm': 0.5290367007255554, 'learning_rate': 4.291423736897725e-06, 'epoch': 1.77}
59%|█████▉ | 6809/11526 [1:11:14<51:03, 1.54it/s] 59%|█████▉ | 6810/11526 [1:11:14<50:14, 1.56it/s] {'loss': 0.1532, 'grad_norm': 0.46106842160224915, 'learning_rate': 4.289924740199579e-06, 'epoch': 1.77}
59%|█████▉ | 6810/11526 [1:11:14<50:14, 1.56it/s] 59%|█████▉ | 6811/11526 [1:11:15<49:39, 1.58it/s] {'loss': 0.1487, 'grad_norm': 0.4334805905818939, 'learning_rate': 4.2884258086335755e-06, 'epoch': 1.77}
59%|█████▉ | 6811/11526 [1:11:15<49:39, 1.58it/s] 59%|█████▉ | 6812/11526 [1:11:16<49:15, 1.60it/s] {'loss': 0.1939, 'grad_norm': 0.5182964205741882, 'learning_rate': 4.286926942337204e-06, 'epoch': 1.77}
59%|█████▉ | 6812/11526 [1:11:16<49:15, 1.60it/s] 59%|█████▉ | 6813/11526 [1:11:16<48:55, 1.61it/s] {'loss': 0.1735, 'grad_norm': 0.49239665269851685, 'learning_rate': 4.2854281414479525e-06, 'epoch': 1.77}
59%|█████▉ | 6813/11526 [1:11:16<48:55, 1.61it/s] 59%|█████▉ | 6814/11526 [1:11:17<48:47, 1.61it/s] {'loss': 0.1492, 'grad_norm': 0.4704868793487549, 'learning_rate': 4.283929406103298e-06, 'epoch': 1.77}
59%|█████▉ | 6814/11526 [1:11:17<48:47, 1.61it/s] 59%|█████▉ | 6815/11526 [1:11:17<48:36, 1.62it/s] {'loss': 0.1309, 'grad_norm': 0.3948592245578766, 'learning_rate': 4.282430736440712e-06, 'epoch': 1.77}
59%|█████▉ | 6815/11526 [1:11:18<48:36, 1.62it/s] 59%|█████▉ | 6816/11526 [1:11:18<48:28, 1.62it/s] {'loss': 0.2357, 'grad_norm': 0.594764769077301, 'learning_rate': 4.280932132597661e-06, 'epoch': 1.77}
59%|█████▉ | 6816/11526 [1:11:18<48:28, 1.62it/s] 59%|█████▉ | 6817/11526 [1:11:19<48:22, 1.62it/s] {'loss': 0.2004, 'grad_norm': 0.5666645169258118, 'learning_rate': 4.2794335947116065e-06, 'epoch': 1.77}
59%|█████▉ | 6817/11526 [1:11:19<48:22, 1.62it/s] 59%|█████▉ | 6818/11526 [1:11:19<48:18, 1.62it/s] {'loss': 0.1651, 'grad_norm': 0.441059947013855, 'learning_rate': 4.277935122920003e-06, 'epoch': 1.77}
59%|█████▉ | 6818/11526 [1:11:19<48:18, 1.62it/s] 59%|█████▉ | 6819/11526 [1:11:20<48:19, 1.62it/s] {'loss': 0.2371, 'grad_norm': 0.6037691831588745, 'learning_rate': 4.276436717360297e-06, 'epoch': 1.77}
59%|█████▉ | 6819/11526 [1:11:20<48:19, 1.62it/s] 59%|█████▉ | 6820/11526 [1:11:21<48:15, 1.63it/s] {'loss': 0.2316, 'grad_norm': 0.6409966945648193, 'learning_rate': 4.274938378169935e-06, 'epoch': 1.78}
59%|█████▉ | 6820/11526 [1:11:21<48:15, 1.63it/s] 59%|█████▉ | 6821/11526 [1:11:21<48:13, 1.63it/s] {'loss': 0.1971, 'grad_norm': 0.49755144119262695, 'learning_rate': 4.2734401054863466e-06, 'epoch': 1.78}
59%|█████▉ | 6821/11526 [1:11:21<48:13, 1.63it/s] 59%|█████▉ | 6822/11526 [1:11:22<48:10, 1.63it/s] {'loss': 0.1854, 'grad_norm': 0.5963601469993591, 'learning_rate': 4.2719418994469685e-06, 'epoch': 1.78}
59%|█████▉ | 6822/11526 [1:11:22<48:10, 1.63it/s] 59%|█████▉ | 6823/11526 [1:11:22<48:10, 1.63it/s] {'loss': 0.2326, 'grad_norm': 0.6425283551216125, 'learning_rate': 4.2704437601892205e-06, 'epoch': 1.78}
59%|█████▉ | 6823/11526 [1:11:22<48:10, 1.63it/s] 59%|█████▉ | 6824/11526 [1:11:23<48:13, 1.62it/s] {'loss': 0.2145, 'grad_norm': 0.5967991352081299, 'learning_rate': 4.268945687850523e-06, 'epoch': 1.78}
59%|█████▉ | 6824/11526 [1:11:23<48:13, 1.62it/s] 59%|█████▉ | 6825/11526 [1:11:24<48:12, 1.62it/s] {'loss': 0.1586, 'grad_norm': 0.5165544748306274, 'learning_rate': 4.2674476825682845e-06, 'epoch': 1.78}
59%|█████▉ | 6825/11526 [1:11:24<48:12, 1.62it/s] 59%|█████▉ | 6826/11526 [1:11:24<48:12, 1.63it/s] {'loss': 0.2402, 'grad_norm': 0.6219651699066162, 'learning_rate': 4.265949744479915e-06, 'epoch': 1.78}
59%|█████▉ | 6826/11526 [1:11:24<48:12, 1.63it/s] 59%|█████▉ | 6827/11526 [1:11:25<48:10, 1.63it/s] {'loss': 0.1797, 'grad_norm': 0.523841142654419, 'learning_rate': 4.26445187372281e-06, 'epoch': 1.78}
59%|█████▉ | 6827/11526 [1:11:25<48:10, 1.63it/s] 59%|█████▉ | 6828/11526 [1:11:25<48:08, 1.63it/s] {'loss': 0.3121, 'grad_norm': 0.5813983082771301, 'learning_rate': 4.262954070434365e-06, 'epoch': 1.78}
59%|█████▉ | 6828/11526 [1:11:26<48:08, 1.63it/s] 59%|█████▉ | 6829/11526 [1:11:26<48:08, 1.63it/s] {'loss': 0.2055, 'grad_norm': 0.5354546904563904, 'learning_rate': 4.261456334751966e-06, 'epoch': 1.78}
59%|█████▉ | 6829/11526 [1:11:26<48:08, 1.63it/s] 59%|█████▉ | 6830/11526 [1:11:27<48:05, 1.63it/s] {'loss': 0.2238, 'grad_norm': 0.6134254336357117, 'learning_rate': 4.259958666812994e-06, 'epoch': 1.78}
59%|█████▉ | 6830/11526 [1:11:27<48:05, 1.63it/s] 59%|█████▉ | 6831/11526 [1:11:27<48:03, 1.63it/s] {'loss': 0.2191, 'grad_norm': 0.5821624994277954, 'learning_rate': 4.258461066754824e-06, 'epoch': 1.78}
59%|█████▉ | 6831/11526 [1:11:27<48:03, 1.63it/s] 59%|█████▉ | 6832/11526 [1:11:28<48:03, 1.63it/s] {'loss': 0.1974, 'grad_norm': 0.5012708306312561, 'learning_rate': 4.256963534714824e-06, 'epoch': 1.78}
59%|█████▉ | 6832/11526 [1:11:28<48:03, 1.63it/s] 59%|█████▉ | 6833/11526 [1:11:28<48:01, 1.63it/s] {'loss': 0.2438, 'grad_norm': 0.593525230884552, 'learning_rate': 4.255466070830357e-06, 'epoch': 1.78}
59%|█████▉ | 6833/11526 [1:11:29<48:01, 1.63it/s] 59%|█████▉ | 6834/11526 [1:11:29<48:05, 1.63it/s] {'loss': 0.2277, 'grad_norm': 0.5551138520240784, 'learning_rate': 4.253968675238776e-06, 'epoch': 1.78}
59%|█████▉ | 6834/11526 [1:11:29<48:05, 1.63it/s] 59%|█████▉ | 6835/11526 [1:11:30<48:03, 1.63it/s] {'loss': 0.2148, 'grad_norm': 0.5762263536453247, 'learning_rate': 4.252471348077434e-06, 'epoch': 1.78}
59%|█████▉ | 6835/11526 [1:11:30<48:03, 1.63it/s] 59%|█████▉ | 6836/11526 [1:11:30<48:01, 1.63it/s] {'loss': 0.2273, 'grad_norm': 0.6771634221076965, 'learning_rate': 4.250974089483675e-06, 'epoch': 1.78}
59%|█████▉ | 6836/11526 [1:11:30<48:01, 1.63it/s] 59%|█████▉ | 6837/11526 [1:11:31<48:00, 1.63it/s] {'loss': 0.2167, 'grad_norm': 0.592136025428772, 'learning_rate': 4.249476899594832e-06, 'epoch': 1.78}
59%|█████▉ | 6837/11526 [1:11:31<48:00, 1.63it/s] 59%|█████▉ | 6838/11526 [1:11:32<47:59, 1.63it/s] {'loss': 0.1906, 'grad_norm': 0.500200092792511, 'learning_rate': 4.24797977854824e-06, 'epoch': 1.78}
59%|█████▉ | 6838/11526 [1:11:32<47:59, 1.63it/s] 59%|█████▉ | 6839/11526 [1:11:32<48:03, 1.63it/s] {'loss': 0.2502, 'grad_norm': 0.6440930366516113, 'learning_rate': 4.246482726481219e-06, 'epoch': 1.78}
59%|█████▉ | 6839/11526 [1:11:32<48:03, 1.63it/s] 59%|█████▉ | 6840/11526 [1:11:33<48:01, 1.63it/s] {'loss': 0.2461, 'grad_norm': 0.6740423440933228, 'learning_rate': 4.244985743531092e-06, 'epoch': 1.78}
59%|█████▉ | 6840/11526 [1:11:33<48:01, 1.63it/s] 59%|█████▉ | 6841/11526 [1:11:33<47:58, 1.63it/s] {'loss': 0.2308, 'grad_norm': 0.54420405626297, 'learning_rate': 4.243488829835167e-06, 'epoch': 1.78}
59%|█████▉ | 6841/11526 [1:11:34<47:58, 1.63it/s] 59%|█████▉ | 6842/11526 [1:11:34<47:58, 1.63it/s] {'loss': 0.1604, 'grad_norm': 0.4493846893310547, 'learning_rate': 4.241991985530752e-06, 'epoch': 1.78}
59%|█████▉ | 6842/11526 [1:11:34<47:58, 1.63it/s] 59%|█████▉ | 6843/11526 [1:11:35<47:58, 1.63it/s] {'loss': 0.2025, 'grad_norm': 0.48490068316459656, 'learning_rate': 4.240495210755143e-06, 'epoch': 1.78}
59%|█████▉ | 6843/11526 [1:11:35<47:58, 1.63it/s] 59%|█████▉ | 6844/11526 [1:11:35<48:00, 1.63it/s] {'loss': 0.2066, 'grad_norm': 0.5776879787445068, 'learning_rate': 4.238998505645638e-06, 'epoch': 1.78}
59%|█████▉ | 6844/11526 [1:11:35<48:00, 1.63it/s] 59%|█████▉ | 6845/11526 [1:11:36<47:58, 1.63it/s] {'loss': 0.1685, 'grad_norm': 0.5288534164428711, 'learning_rate': 4.237501870339518e-06, 'epoch': 1.78}
59%|█████▉ | 6845/11526 [1:11:36<47:58, 1.63it/s] 59%|█████▉ | 6846/11526 [1:11:36<47:56, 1.63it/s] {'loss': 0.2163, 'grad_norm': 0.5693230628967285, 'learning_rate': 4.236005304974065e-06, 'epoch': 1.78}
59%|█████▉ | 6846/11526 [1:11:37<47:56, 1.63it/s] 59%|█████▉ | 6847/11526 [1:11:37<47:54, 1.63it/s] {'loss': 0.1474, 'grad_norm': 0.4940963685512543, 'learning_rate': 4.234508809686553e-06, 'epoch': 1.78}
59%|█████▉ | 6847/11526 [1:11:37<47:54, 1.63it/s] 59%|█████▉ | 6848/11526 [1:11:38<47:52, 1.63it/s] {'loss': 0.1625, 'grad_norm': 0.5108457803726196, 'learning_rate': 4.233012384614247e-06, 'epoch': 1.78}
59%|█████▉ | 6848/11526 [1:11:38<47:52, 1.63it/s] 59%|█████▉ | 6849/11526 [1:11:38<47:56, 1.63it/s] {'loss': 0.2179, 'grad_norm': 0.5531466603279114, 'learning_rate': 4.231516029894409e-06, 'epoch': 1.78}
59%|█████▉ | 6849/11526 [1:11:38<47:56, 1.63it/s] 59%|█████▉ | 6850/11526 [1:11:39<47:53, 1.63it/s] {'loss': 0.1484, 'grad_norm': 0.42857372760772705, 'learning_rate': 4.230019745664295e-06, 'epoch': 1.78}
59%|█████▉ | 6850/11526 [1:11:39<47:53, 1.63it/s] 59%|█████▉ | 6851/11526 [1:11:40<47:51, 1.63it/s] {'loss': 0.2522, 'grad_norm': 0.7211313843727112, 'learning_rate': 4.22852353206115e-06, 'epoch': 1.78}
59%|█████▉ | 6851/11526 [1:11:40<47:51, 1.63it/s] 59%|█████▉ | 6852/11526 [1:11:40<47:49, 1.63it/s] {'loss': 0.1989, 'grad_norm': 0.5162541270256042, 'learning_rate': 4.227027389222215e-06, 'epoch': 1.78}
59%|█████▉ | 6852/11526 [1:11:40<47:49, 1.63it/s] 59%|█████▉ | 6853/11526 [1:11:41<47:49, 1.63it/s] {'loss': 0.1981, 'grad_norm': 0.5316342115402222, 'learning_rate': 4.225531317284727e-06, 'epoch': 1.78}
59%|█████▉ | 6853/11526 [1:11:41<47:49, 1.63it/s] 59%|█████▉ | 6854/11526 [1:11:41<48:03, 1.62it/s] {'loss': 0.1658, 'grad_norm': 0.4932999908924103, 'learning_rate': 4.2240353163859145e-06, 'epoch': 1.78}
59%|█████▉ | 6854/11526 [1:11:42<48:03, 1.62it/s] 59%|█████▉ | 6855/11526 [1:11:42<47:59, 1.62it/s] {'loss': 0.1514, 'grad_norm': 0.45910751819610596, 'learning_rate': 4.222539386662997e-06, 'epoch': 1.78}
59%|█████▉ | 6855/11526 [1:11:42<47:59, 1.62it/s] 59%|█████▉ | 6856/11526 [1:11:43<47:55, 1.62it/s] {'loss': 0.2198, 'grad_norm': 0.5585564970970154, 'learning_rate': 4.22104352825319e-06, 'epoch': 1.78}
59%|█████▉ | 6856/11526 [1:11:43<47:55, 1.62it/s] 59%|█████▉ | 6857/11526 [1:11:43<47:52, 1.63it/s] {'loss': 0.194, 'grad_norm': 0.5409095883369446, 'learning_rate': 4.219547741293701e-06, 'epoch': 1.78}
59%|█████▉ | 6857/11526 [1:11:43<47:52, 1.63it/s] 60%|█████▉ | 6858/11526 [1:11:44<47:50, 1.63it/s] {'loss': 0.1846, 'grad_norm': 0.5315662026405334, 'learning_rate': 4.2180520259217364e-06, 'epoch': 1.79}
60%|█████▉ | 6858/11526 [1:11:44<47:50, 1.63it/s] 60%|█████▉ | 6859/11526 [1:11:44<47:54, 1.62it/s] {'loss': 0.2549, 'grad_norm': 0.7207983732223511, 'learning_rate': 4.2165563822744895e-06, 'epoch': 1.79}
60%|█████▉ | 6859/11526 [1:11:45<47:54, 1.62it/s] 60%|█████▉ | 6860/11526 [1:11:45<47:52, 1.62it/s] {'loss': 0.2052, 'grad_norm': 0.5138137936592102, 'learning_rate': 4.215060810489148e-06, 'epoch': 1.79}
60%|█████▉ | 6860/11526 [1:11:45<47:52, 1.62it/s] 60%|█████▉ | 6861/11526 [1:11:46<47:49, 1.63it/s] {'loss': 0.2455, 'grad_norm': 0.6043034791946411, 'learning_rate': 4.213565310702892e-06, 'epoch': 1.79}
60%|█████▉ | 6861/11526 [1:11:46<47:49, 1.63it/s] 60%|█████▉ | 6862/11526 [1:11:46<47:48, 1.63it/s] {'loss': 0.1959, 'grad_norm': 0.5521331429481506, 'learning_rate': 4.212069883052904e-06, 'epoch': 1.79}
60%|█████▉ | 6862/11526 [1:11:46<47:48, 1.63it/s] 60%|█████▉ | 6863/11526 [1:11:47<47:47, 1.63it/s] {'loss': 0.2272, 'grad_norm': 0.533997118473053, 'learning_rate': 4.210574527676349e-06, 'epoch': 1.79}
60%|█████▉ | 6863/11526 [1:11:47<47:47, 1.63it/s] 60%|█████▉ | 6864/11526 [1:11:48<47:52, 1.62it/s] {'loss': 0.1796, 'grad_norm': 0.5050402879714966, 'learning_rate': 4.209079244710389e-06, 'epoch': 1.79}
60%|█████▉ | 6864/11526 [1:11:48<47:52, 1.62it/s] 60%|█████▉ | 6865/11526 [1:11:48<47:50, 1.62it/s] {'loss': 0.1506, 'grad_norm': 0.4908497631549835, 'learning_rate': 4.207584034292182e-06, 'epoch': 1.79}
60%|█████▉ | 6865/11526 [1:11:48<47:50, 1.62it/s] 60%|█████▉ | 6866/11526 [1:11:49<47:49, 1.62it/s] {'loss': 0.2765, 'grad_norm': 0.6901393532752991, 'learning_rate': 4.206088896558874e-06, 'epoch': 1.79}
60%|█████▉ | 6866/11526 [1:11:49<47:49, 1.62it/s] 60%|█████▉ | 6867/11526 [1:11:49<47:49, 1.62it/s] {'loss': 0.159, 'grad_norm': 0.508581817150116, 'learning_rate': 4.2045938316476114e-06, 'epoch': 1.79}
60%|█████▉ | 6867/11526 [1:11:50<47:49, 1.62it/s] 60%|█████▉ | 6868/11526 [1:11:50<47:47, 1.62it/s] {'loss': 0.2359, 'grad_norm': 0.63795405626297, 'learning_rate': 4.203098839695528e-06, 'epoch': 1.79}
60%|█████▉ | 6868/11526 [1:11:50<47:47, 1.62it/s] 60%|█████▉ | 6869/11526 [1:11:51<47:47, 1.62it/s] {'loss': 0.1925, 'grad_norm': 0.5805729031562805, 'learning_rate': 4.201603920839753e-06, 'epoch': 1.79}
60%|█████▉ | 6869/11526 [1:11:51<47:47, 1.62it/s] 60%|█████▉ | 6870/11526 [1:11:51<47:45, 1.63it/s] {'loss': 0.208, 'grad_norm': 0.5691603422164917, 'learning_rate': 4.200109075217408e-06, 'epoch': 1.79}
60%|█████▉ | 6870/11526 [1:11:51<47:45, 1.63it/s] 60%|█████▉ | 6871/11526 [1:11:52<47:41, 1.63it/s] {'loss': 0.1873, 'grad_norm': 0.5018669366836548, 'learning_rate': 4.198614302965611e-06, 'epoch': 1.79}
60%|█████▉ | 6871/11526 [1:11:52<47:41, 1.63it/s] 60%|█████▉ | 6872/11526 [1:11:52<47:41, 1.63it/s] {'loss': 0.2239, 'grad_norm': 0.6074321866035461, 'learning_rate': 4.197119604221469e-06, 'epoch': 1.79}
60%|█████▉ | 6872/11526 [1:11:53<47:41, 1.63it/s] 60%|█████▉ | 6873/11526 [1:11:53<47:38, 1.63it/s] {'loss': 0.2038, 'grad_norm': 0.5493898987770081, 'learning_rate': 4.1956249791220864e-06, 'epoch': 1.79}
60%|█████▉ | 6873/11526 [1:11:53<47:38, 1.63it/s] 60%|█████▉ | 6874/11526 [1:11:54<47:41, 1.63it/s] {'loss': 0.2232, 'grad_norm': 0.5530081987380981, 'learning_rate': 4.194130427804556e-06, 'epoch': 1.79}
60%|█████▉ | 6874/11526 [1:11:54<47:41, 1.63it/s] 60%|█████▉ | 6875/11526 [1:11:54<47:38, 1.63it/s] {'loss': 0.2114, 'grad_norm': 0.5907770395278931, 'learning_rate': 4.192635950405969e-06, 'epoch': 1.79}
60%|█████▉ | 6875/11526 [1:11:54<47:38, 1.63it/s] 60%|█████▉ | 6876/11526 [1:11:55<47:36, 1.63it/s] {'loss': 0.1816, 'grad_norm': 0.6149120926856995, 'learning_rate': 4.191141547063405e-06, 'epoch': 1.79}
60%|█████▉ | 6876/11526 [1:11:55<47:36, 1.63it/s] 60%|█████▉ | 6877/11526 [1:11:56<47:34, 1.63it/s] {'loss': 0.2157, 'grad_norm': 0.5738511681556702, 'learning_rate': 4.189647217913943e-06, 'epoch': 1.79}
60%|█████▉ | 6877/11526 [1:11:56<47:34, 1.63it/s] 60%|█████▉ | 6878/11526 [1:11:56<47:35, 1.63it/s] {'loss': 0.2617, 'grad_norm': 0.6381465196609497, 'learning_rate': 4.188152963094648e-06, 'epoch': 1.79}
60%|█████▉ | 6878/11526 [1:11:56<47:35, 1.63it/s] 60%|█████▉ | 6879/11526 [1:11:57<47:37, 1.63it/s] {'loss': 0.2279, 'grad_norm': 0.573868989944458, 'learning_rate': 4.18665878274258e-06, 'epoch': 1.79}
60%|█████▉ | 6879/11526 [1:11:57<47:37, 1.63it/s] 60%|█████▉ | 6880/11526 [1:11:57<47:35, 1.63it/s] {'loss': 0.1851, 'grad_norm': 0.48696520924568176, 'learning_rate': 4.185164676994798e-06, 'epoch': 1.79}
60%|█████▉ | 6880/11526 [1:11:58<47:35, 1.63it/s] 60%|█████▉ | 6881/11526 [1:11:58<47:33, 1.63it/s] {'loss': 0.242, 'grad_norm': 0.65585857629776, 'learning_rate': 4.183670645988349e-06, 'epoch': 1.79}
60%|█████▉ | 6881/11526 [1:11:58<47:33, 1.63it/s] 60%|█████▉ | 6882/11526 [1:11:59<47:36, 1.63it/s] {'loss': 0.1816, 'grad_norm': 0.4522063136100769, 'learning_rate': 4.182176689860273e-06, 'epoch': 1.79}
60%|█████▉ | 6882/11526 [1:11:59<47:36, 1.63it/s] 60%|█████▉ | 6883/11526 [1:11:59<47:35, 1.63it/s] {'loss': 0.2414, 'grad_norm': 0.56806880235672, 'learning_rate': 4.180682808747602e-06, 'epoch': 1.79}
60%|█████▉ | 6883/11526 [1:11:59<47:35, 1.63it/s] 60%|█████▉ | 6884/11526 [1:12:00<47:43, 1.62it/s] {'loss': 0.2031, 'grad_norm': 0.5773836374282837, 'learning_rate': 4.179189002787366e-06, 'epoch': 1.79}
60%|█████▉ | 6884/11526 [1:12:00<47:43, 1.62it/s] 60%|█████▉ | 6885/11526 [1:12:00<47:39, 1.62it/s] {'loss': 0.2971, 'grad_norm': 0.7851483821868896, 'learning_rate': 4.177695272116587e-06, 'epoch': 1.79}
60%|█████▉ | 6885/11526 [1:12:01<47:39, 1.62it/s] 60%|█████▉ | 6886/11526 [1:12:01<47:36, 1.62it/s] {'loss': 0.1885, 'grad_norm': 0.5755347013473511, 'learning_rate': 4.176201616872276e-06, 'epoch': 1.79}
60%|█████▉ | 6886/11526 [1:12:01<47:36, 1.62it/s] 60%|█████▉ | 6887/11526 [1:12:02<47:32, 1.63it/s] {'loss': 0.1928, 'grad_norm': 0.4893949031829834, 'learning_rate': 4.174708037191438e-06, 'epoch': 1.79}
60%|█████▉ | 6887/11526 [1:12:02<47:32, 1.63it/s] 60%|█████▉ | 6888/11526 [1:12:02<47:29, 1.63it/s] {'loss': 0.2158, 'grad_norm': 0.5515700578689575, 'learning_rate': 4.173214533211075e-06, 'epoch': 1.79}
60%|█████▉ | 6888/11526 [1:12:02<47:29, 1.63it/s] 60%|█████▉ | 6889/11526 [1:12:03<47:31, 1.63it/s] {'loss': 0.2171, 'grad_norm': 0.5096089243888855, 'learning_rate': 4.17172110506818e-06, 'epoch': 1.79}
60%|█████▉ | 6889/11526 [1:12:03<47:31, 1.63it/s] 60%|█████▉ | 6890/11526 [1:12:04<47:30, 1.63it/s] {'loss': 0.2001, 'grad_norm': 0.5534856915473938, 'learning_rate': 4.170227752899739e-06, 'epoch': 1.79}
60%|█████▉ | 6890/11526 [1:12:04<47:30, 1.63it/s] 60%|█████▉ | 6891/11526 [1:12:04<47:30, 1.63it/s] {'loss': 0.2001, 'grad_norm': 0.5350280404090881, 'learning_rate': 4.168734476842728e-06, 'epoch': 1.79}
60%|█████▉ | 6891/11526 [1:12:04<47:30, 1.63it/s] 60%|█████▉ | 6892/11526 [1:12:05<47:28, 1.63it/s] {'loss': 0.1949, 'grad_norm': 0.5171188116073608, 'learning_rate': 4.1672412770341196e-06, 'epoch': 1.79}
60%|█████▉ | 6892/11526 [1:12:05<47:28, 1.63it/s] 60%|█████▉ | 6893/11526 [1:12:05<47:26, 1.63it/s] {'loss': 0.1785, 'grad_norm': 0.5749475955963135, 'learning_rate': 4.165748153610881e-06, 'epoch': 1.79}
60%|█████▉ | 6893/11526 [1:12:06<47:26, 1.63it/s] 60%|█████▉ | 6894/11526 [1:12:06<47:32, 1.62it/s] {'loss': 0.2283, 'grad_norm': 0.5936381816864014, 'learning_rate': 4.16425510670997e-06, 'epoch': 1.79}
60%|█████▉ | 6894/11526 [1:12:06<47:32, 1.62it/s] 60%|█████▉ | 6895/11526 [1:12:07<47:30, 1.62it/s] {'loss': 0.1666, 'grad_norm': 0.48360785841941833, 'learning_rate': 4.162762136468334e-06, 'epoch': 1.79}
60%|█████▉ | 6895/11526 [1:12:07<47:30, 1.62it/s] 60%|█████▉ | 6896/11526 [1:12:07<47:28, 1.63it/s] {'loss': 0.1888, 'grad_norm': 0.47046375274658203, 'learning_rate': 4.161269243022919e-06, 'epoch': 1.79}
60%|█████▉ | 6896/11526 [1:12:07<47:28, 1.63it/s] 60%|█████▉ | 6897/11526 [1:12:08<47:26, 1.63it/s] {'loss': 0.2572, 'grad_norm': 0.6981285810470581, 'learning_rate': 4.159776426510659e-06, 'epoch': 1.8}
60%|█████▉ | 6897/11526 [1:12:08<47:26, 1.63it/s] 60%|█████▉ | 6898/11526 [1:12:08<47:23, 1.63it/s] {'loss': 0.1851, 'grad_norm': 0.5073105692863464, 'learning_rate': 4.158283687068488e-06, 'epoch': 1.8}
60%|█████▉ | 6898/11526 [1:12:09<47:23, 1.63it/s] 60%|█████▉ | 6899/11526 [1:12:09<47:29, 1.62it/s] {'loss': 0.1858, 'grad_norm': 0.6021971106529236, 'learning_rate': 4.1567910248333265e-06, 'epoch': 1.8}
60%|█████▉ | 6899/11526 [1:12:09<47:29, 1.62it/s] 60%|█████▉ | 6900/11526 [1:12:10<47:28, 1.62it/s] {'loss': 0.2182, 'grad_norm': 0.5816395282745361, 'learning_rate': 4.15529843994209e-06, 'epoch': 1.8}
60%|█████▉ | 6900/11526 [1:12:10<47:28, 1.62it/s] 60%|█████▉ | 6901/11526 [1:12:10<47:24, 1.63it/s] {'loss': 0.218, 'grad_norm': 0.5295535922050476, 'learning_rate': 4.153805932531685e-06, 'epoch': 1.8}
60%|█████▉ | 6901/11526 [1:12:10<47:24, 1.63it/s] 60%|█████▉ | 6902/11526 [1:12:11<47:22, 1.63it/s] {'loss': 0.184, 'grad_norm': 0.5468753576278687, 'learning_rate': 4.152313502739015e-06, 'epoch': 1.8}
60%|█████▉ | 6902/11526 [1:12:11<47:22, 1.63it/s] 60%|█████▉ | 6903/11526 [1:12:12<47:21, 1.63it/s] {'loss': 0.23, 'grad_norm': 0.6562590003013611, 'learning_rate': 4.150821150700975e-06, 'epoch': 1.8}
60%|█████▉ | 6903/11526 [1:12:12<47:21, 1.63it/s] 60%|█████▉ | 6904/11526 [1:12:12<47:23, 1.63it/s] {'loss': 0.1594, 'grad_norm': 0.4986235499382019, 'learning_rate': 4.149328876554451e-06, 'epoch': 1.8}
60%|█████▉ | 6904/11526 [1:12:12<47:23, 1.63it/s] 60%|█████▉ | 6905/11526 [1:12:13<47:21, 1.63it/s] {'loss': 0.226, 'grad_norm': 0.6428924202919006, 'learning_rate': 4.1478366804363215e-06, 'epoch': 1.8}
60%|█████▉ | 6905/11526 [1:12:13<47:21, 1.63it/s] 60%|█████▉ | 6906/11526 [1:12:13<47:21, 1.63it/s] {'loss': 0.2158, 'grad_norm': 0.5354100465774536, 'learning_rate': 4.1463445624834585e-06, 'epoch': 1.8}
60%|█████▉ | 6906/11526 [1:12:14<47:21, 1.63it/s] 60%|█████▉ | 6907/11526 [1:12:14<47:21, 1.63it/s] {'loss': 0.1499, 'grad_norm': 0.43633824586868286, 'learning_rate': 4.144852522832732e-06, 'epoch': 1.8}
60%|█████▉ | 6907/11526 [1:12:14<47:21, 1.63it/s] 60%|█████▉ | 6908/11526 [1:12:15<47:20, 1.63it/s] {'loss': 0.1584, 'grad_norm': 0.6199550032615662, 'learning_rate': 4.1433605616209975e-06, 'epoch': 1.8}
60%|█████▉ | 6908/11526 [1:12:15<47:20, 1.63it/s] 60%|█████▉ | 6909/11526 [1:12:15<47:23, 1.62it/s] {'loss': 0.1864, 'grad_norm': 0.5408421158790588, 'learning_rate': 4.141868678985106e-06, 'epoch': 1.8}
60%|█████▉ | 6909/11526 [1:12:15<47:23, 1.62it/s] 60%|█████▉ | 6910/11526 [1:12:16<47:22, 1.62it/s] {'loss': 0.2561, 'grad_norm': 0.6513819694519043, 'learning_rate': 4.1403768750619e-06, 'epoch': 1.8}
60%|█████▉ | 6910/11526 [1:12:16<47:22, 1.62it/s] 60%|█████▉ | 6911/11526 [1:12:16<47:21, 1.62it/s] {'loss': 0.1928, 'grad_norm': 0.543856143951416, 'learning_rate': 4.1388851499882195e-06, 'epoch': 1.8}
60%|█████▉ | 6911/11526 [1:12:17<47:21, 1.62it/s] 60%|█████▉ | 6912/11526 [1:12:17<47:19, 1.62it/s] {'loss': 0.1997, 'grad_norm': 0.5465028882026672, 'learning_rate': 4.137393503900894e-06, 'epoch': 1.8}
60%|█████▉ | 6912/11526 [1:12:17<47:19, 1.62it/s] 60%|█████▉ | 6913/11526 [1:12:18<47:22, 1.62it/s] {'loss': 0.2239, 'grad_norm': 0.6328906416893005, 'learning_rate': 4.135901936936743e-06, 'epoch': 1.8}
60%|█████▉ | 6913/11526 [1:12:18<47:22, 1.62it/s] 60%|█████▉ | 6914/11526 [1:12:18<47:22, 1.62it/s] {'loss': 0.251, 'grad_norm': 0.6140619516372681, 'learning_rate': 4.134410449232584e-06, 'epoch': 1.8}
60%|█████▉ | 6914/11526 [1:12:18<47:22, 1.62it/s] 60%|█████▉ | 6915/11526 [1:12:19<47:18, 1.62it/s] {'loss': 0.1707, 'grad_norm': 0.4932888448238373, 'learning_rate': 4.13291904092522e-06, 'epoch': 1.8}
60%|█████▉ | 6915/11526 [1:12:19<47:18, 1.62it/s] 60%|██████ | 6916/11526 [1:12:20<47:16, 1.63it/s] {'loss': 0.1708, 'grad_norm': 0.5012977719306946, 'learning_rate': 4.131427712151459e-06, 'epoch': 1.8}
60%|██████ | 6916/11526 [1:12:20<47:16, 1.63it/s] 60%|██████ | 6917/11526 [1:12:20<47:15, 1.63it/s] {'loss': 0.1796, 'grad_norm': 0.5745136141777039, 'learning_rate': 4.1299364630480885e-06, 'epoch': 1.8}
60%|██████ | 6917/11526 [1:12:20<47:15, 1.63it/s] 60%|██████ | 6918/11526 [1:12:21<47:12, 1.63it/s] {'loss': 0.1636, 'grad_norm': 0.5152845978736877, 'learning_rate': 4.128445293751897e-06, 'epoch': 1.8}
60%|██████ | 6918/11526 [1:12:21<47:12, 1.63it/s] 60%|██████ | 6919/11526 [1:12:21<47:15, 1.62it/s] {'loss': 0.2339, 'grad_norm': 0.6124987602233887, 'learning_rate': 4.1269542043996606e-06, 'epoch': 1.8}
60%|██████ | 6919/11526 [1:12:22<47:15, 1.62it/s] 60%|██████ | 6920/11526 [1:12:22<47:13, 1.63it/s] {'loss': 0.1497, 'grad_norm': 0.4289529025554657, 'learning_rate': 4.125463195128153e-06, 'epoch': 1.8}
60%|██████ | 6920/11526 [1:12:22<47:13, 1.63it/s] 60%|██████ | 6921/11526 [1:12:23<47:11, 1.63it/s] {'loss': 0.2319, 'grad_norm': 0.5391265153884888, 'learning_rate': 4.123972266074137e-06, 'epoch': 1.8}
60%|██████ | 6921/11526 [1:12:23<47:11, 1.63it/s] 60%|██████ | 6922/11526 [1:12:23<47:09, 1.63it/s] {'loss': 0.2176, 'grad_norm': 0.5969319343566895, 'learning_rate': 4.12248141737437e-06, 'epoch': 1.8}
60%|██████ | 6922/11526 [1:12:23<47:09, 1.63it/s] 60%|██████ | 6923/11526 [1:12:24<47:08, 1.63it/s] {'loss': 0.2195, 'grad_norm': 0.500734806060791, 'learning_rate': 4.120990649165598e-06, 'epoch': 1.8}
60%|██████ | 6923/11526 [1:12:24<47:08, 1.63it/s] 60%|██████ | 6924/11526 [1:12:24<47:09, 1.63it/s] {'loss': 0.2338, 'grad_norm': 0.5484648942947388, 'learning_rate': 4.119499961584567e-06, 'epoch': 1.8}
60%|██████ | 6924/11526 [1:12:25<47:09, 1.63it/s] 60%|██████ | 6925/11526 [1:12:25<47:08, 1.63it/s] {'loss': 0.168, 'grad_norm': 0.49123767018318176, 'learning_rate': 4.118009354768008e-06, 'epoch': 1.8}
60%|██████ | 6925/11526 [1:12:25<47:08, 1.63it/s] 60%|██████ | 6926/11526 [1:12:26<47:09, 1.63it/s] {'loss': 0.2006, 'grad_norm': 0.5317858457565308, 'learning_rate': 4.116518828852652e-06, 'epoch': 1.8}
60%|██████ | 6926/11526 [1:12:26<47:09, 1.63it/s] 60%|██████ | 6927/11526 [1:12:26<47:08, 1.63it/s] {'loss': 0.1794, 'grad_norm': 0.4631026089191437, 'learning_rate': 4.115028383975213e-06, 'epoch': 1.8}
60%|██████ | 6927/11526 [1:12:26<47:08, 1.63it/s] 60%|██████ | 6928/11526 [1:12:27<47:05, 1.63it/s] {'loss': 0.2277, 'grad_norm': 0.5970296859741211, 'learning_rate': 4.113538020272407e-06, 'epoch': 1.8}
60%|██████ | 6928/11526 [1:12:27<47:05, 1.63it/s] 60%|██████ | 6929/11526 [1:12:28<47:08, 1.63it/s] {'loss': 0.1688, 'grad_norm': 0.45246732234954834, 'learning_rate': 4.112047737880936e-06, 'epoch': 1.8}
60%|██████ | 6929/11526 [1:12:28<47:08, 1.63it/s] 60%|██████ | 6930/11526 [1:12:28<47:05, 1.63it/s] {'loss': 0.2498, 'grad_norm': 0.6031157970428467, 'learning_rate': 4.1105575369375005e-06, 'epoch': 1.8}
60%|██████ | 6930/11526 [1:12:28<47:05, 1.63it/s] 60%|██████ | 6931/11526 [1:12:29<47:04, 1.63it/s] {'loss': 0.1844, 'grad_norm': 0.5066699385643005, 'learning_rate': 4.1090674175787895e-06, 'epoch': 1.8}
60%|██████ | 6931/11526 [1:12:29<47:04, 1.63it/s] 60%|██████ | 6932/11526 [1:12:29<47:02, 1.63it/s] {'loss': 0.1978, 'grad_norm': 0.5574156641960144, 'learning_rate': 4.107577379941481e-06, 'epoch': 1.8}
60%|██████ | 6932/11526 [1:12:30<47:02, 1.63it/s] 60%|██████ | 6933/11526 [1:12:30<47:01, 1.63it/s] {'loss': 0.2197, 'grad_norm': 0.5046302080154419, 'learning_rate': 4.106087424162254e-06, 'epoch': 1.8}
60%|██████ | 6933/11526 [1:12:30<47:01, 1.63it/s] 60%|██████ | 6934/11526 [1:12:31<47:01, 1.63it/s] {'loss': 0.2469, 'grad_norm': 0.6078893542289734, 'learning_rate': 4.104597550377776e-06, 'epoch': 1.8}
60%|██████ | 6934/11526 [1:12:31<47:01, 1.63it/s] 60%|██████ | 6935/11526 [1:12:31<46:59, 1.63it/s] {'loss': 0.1877, 'grad_norm': 0.5004312992095947, 'learning_rate': 4.103107758724705e-06, 'epoch': 1.81}
60%|██████ | 6935/11526 [1:12:31<46:59, 1.63it/s] 60%|██████ | 6936/11526 [1:12:32<46:58, 1.63it/s] {'loss': 0.2425, 'grad_norm': 0.716643214225769, 'learning_rate': 4.101618049339692e-06, 'epoch': 1.81}
60%|██████ | 6936/11526 [1:12:32<46:58, 1.63it/s] 60%|██████ | 6937/11526 [1:12:32<46:57, 1.63it/s] {'loss': 0.1763, 'grad_norm': 0.550512969493866, 'learning_rate': 4.100128422359383e-06, 'epoch': 1.81}
60%|██████ | 6937/11526 [1:12:33<46:57, 1.63it/s] 60%|██████ | 6938/11526 [1:12:33<46:57, 1.63it/s] {'loss': 0.2627, 'grad_norm': 0.5770377516746521, 'learning_rate': 4.098638877920417e-06, 'epoch': 1.81}
60%|██████ | 6938/11526 [1:12:33<46:57, 1.63it/s] 60%|██████ | 6939/11526 [1:12:34<47:00, 1.63it/s] {'loss': 0.1936, 'grad_norm': 0.5344544053077698, 'learning_rate': 4.0971494161594216e-06, 'epoch': 1.81}
60%|██████ | 6939/11526 [1:12:34<47:00, 1.63it/s] 60%|██████ | 6940/11526 [1:12:34<46:58, 1.63it/s] {'loss': 0.2675, 'grad_norm': 0.6310492157936096, 'learning_rate': 4.095660037213017e-06, 'epoch': 1.81}
60%|██████ | 6940/11526 [1:12:34<46:58, 1.63it/s] 60%|██████ | 6941/11526 [1:12:35<46:56, 1.63it/s] {'loss': 0.174, 'grad_norm': 0.4769931137561798, 'learning_rate': 4.09417074121782e-06, 'epoch': 1.81}
60%|██████ | 6941/11526 [1:12:35<46:56, 1.63it/s] 60%|██████ | 6942/11526 [1:12:36<46:54, 1.63it/s] {'loss': 0.2021, 'grad_norm': 0.6188851594924927, 'learning_rate': 4.092681528310437e-06, 'epoch': 1.81}
60%|██████ | 6942/11526 [1:12:36<46:54, 1.63it/s] 60%|██████ | 6943/11526 [1:12:36<46:52, 1.63it/s] {'loss': 0.2601, 'grad_norm': 0.6699022054672241, 'learning_rate': 4.091192398627467e-06, 'epoch': 1.81}
60%|██████ | 6943/11526 [1:12:36<46:52, 1.63it/s] 60%|██████ | 6944/11526 [1:12:37<46:54, 1.63it/s] {'loss': 0.1545, 'grad_norm': 0.48907896876335144, 'learning_rate': 4.0897033523055005e-06, 'epoch': 1.81}
60%|██████ | 6944/11526 [1:12:37<46:54, 1.63it/s] 60%|██████ | 6945/11526 [1:12:37<46:52, 1.63it/s] {'loss': 0.255, 'grad_norm': 0.666030764579773, 'learning_rate': 4.088214389481122e-06, 'epoch': 1.81}
60%|██████ | 6945/11526 [1:12:38<46:52, 1.63it/s] 60%|██████ | 6946/11526 [1:12:38<46:51, 1.63it/s] {'loss': 0.1549, 'grad_norm': 0.4620293378829956, 'learning_rate': 4.0867255102909065e-06, 'epoch': 1.81}
60%|██████ | 6946/11526 [1:12:38<46:51, 1.63it/s] 60%|██████ | 6947/11526 [1:12:39<46:50, 1.63it/s] {'loss': 0.216, 'grad_norm': 0.5260071754455566, 'learning_rate': 4.085236714871425e-06, 'epoch': 1.81}
60%|██████ | 6947/11526 [1:12:39<46:50, 1.63it/s] 60%|██████ | 6948/11526 [1:12:39<46:50, 1.63it/s] {'loss': 0.1948, 'grad_norm': 0.5423583984375, 'learning_rate': 4.083748003359236e-06, 'epoch': 1.81}
60%|██████ | 6948/11526 [1:12:39<46:50, 1.63it/s] 60%|██████ | 6949/11526 [1:12:40<46:53, 1.63it/s] {'loss': 0.2098, 'grad_norm': 0.5749220252037048, 'learning_rate': 4.082259375890894e-06, 'epoch': 1.81}
60%|██████ | 6949/11526 [1:12:40<46:53, 1.63it/s] 60%|██████ | 6950/11526 [1:12:40<46:52, 1.63it/s] {'loss': 0.1691, 'grad_norm': 0.4693126082420349, 'learning_rate': 4.080770832602943e-06, 'epoch': 1.81}
60%|██████ | 6950/11526 [1:12:41<46:52, 1.63it/s] 60%|██████ | 6951/11526 [1:12:41<46:50, 1.63it/s] {'loss': 0.1763, 'grad_norm': 0.5151182413101196, 'learning_rate': 4.079282373631922e-06, 'epoch': 1.81}
60%|██████ | 6951/11526 [1:12:41<46:50, 1.63it/s] 60%|██████ | 6952/11526 [1:12:42<46:48, 1.63it/s] {'loss': 0.2053, 'grad_norm': 0.5910793542861938, 'learning_rate': 4.07779399911436e-06, 'epoch': 1.81}
60%|██████ | 6952/11526 [1:12:42<46:48, 1.63it/s] 60%|██████ | 6953/11526 [1:12:42<46:56, 1.62it/s] {'loss': 0.238, 'grad_norm': 0.6088268756866455, 'learning_rate': 4.076305709186781e-06, 'epoch': 1.81}
60%|██████ | 6953/11526 [1:12:42<46:56, 1.62it/s] 60%|██████ | 6954/11526 [1:12:43<47:00, 1.62it/s] {'loss': 0.162, 'grad_norm': 0.4466678500175476, 'learning_rate': 4.074817503985697e-06, 'epoch': 1.81}
60%|██████ | 6954/11526 [1:12:43<47:00, 1.62it/s] 60%|██████ | 6955/11526 [1:12:44<46:57, 1.62it/s] {'loss': 0.2086, 'grad_norm': 0.591172993183136, 'learning_rate': 4.073329383647614e-06, 'epoch': 1.81}
60%|██████ | 6955/11526 [1:12:44<46:57, 1.62it/s] 60%|██████ | 6956/11526 [1:12:44<46:54, 1.62it/s] {'loss': 0.2508, 'grad_norm': 0.6761407852172852, 'learning_rate': 4.071841348309033e-06, 'epoch': 1.81}
60%|██████ | 6956/11526 [1:12:44<46:54, 1.62it/s] 60%|██████ | 6957/11526 [1:12:45<46:50, 1.63it/s] {'loss': 0.1995, 'grad_norm': 0.5703777074813843, 'learning_rate': 4.070353398106447e-06, 'epoch': 1.81}
60%|██████ | 6957/11526 [1:12:45<46:50, 1.63it/s] 60%|██████ | 6958/11526 [1:12:45<46:48, 1.63it/s] {'loss': 0.2864, 'grad_norm': 0.7682322263717651, 'learning_rate': 4.068865533176336e-06, 'epoch': 1.81}
60%|██████ | 6958/11526 [1:12:45<46:48, 1.63it/s] 60%|██████ | 6959/11526 [1:12:46<46:48, 1.63it/s] {'loss': 0.2296, 'grad_norm': 0.6818765997886658, 'learning_rate': 4.067377753655174e-06, 'epoch': 1.81}
60%|██████ | 6959/11526 [1:12:46<46:48, 1.63it/s] 60%|██████ | 6960/11526 [1:12:47<46:46, 1.63it/s] {'loss': 0.1771, 'grad_norm': 0.457408607006073, 'learning_rate': 4.065890059679431e-06, 'epoch': 1.81}
60%|██████ | 6960/11526 [1:12:47<46:46, 1.63it/s] 60%|██████ | 6961/11526 [1:12:47<46:45, 1.63it/s] {'loss': 0.2624, 'grad_norm': 0.5966795086860657, 'learning_rate': 4.064402451385569e-06, 'epoch': 1.81}
60%|██████ | 6961/11526 [1:12:47<46:45, 1.63it/s] 60%|██████ | 6962/11526 [1:12:48<46:44, 1.63it/s] {'loss': 0.1766, 'grad_norm': 0.48805665969848633, 'learning_rate': 4.062914928910036e-06, 'epoch': 1.81}
60%|██████ | 6962/11526 [1:12:48<46:44, 1.63it/s] 60%|██████ | 6963/11526 [1:12:48<46:42, 1.63it/s] {'loss': 0.1606, 'grad_norm': 0.4633457064628601, 'learning_rate': 4.0614274923892785e-06, 'epoch': 1.81}
60%|██████ | 6963/11526 [1:12:49<46:42, 1.63it/s] 60%|██████ | 6964/11526 [1:12:49<46:43, 1.63it/s] {'loss': 0.166, 'grad_norm': 0.5527840256690979, 'learning_rate': 4.0599401419597295e-06, 'epoch': 1.81}
60%|██████ | 6964/11526 [1:12:49<46:43, 1.63it/s] 60%|██████ | 6965/11526 [1:12:50<48:09, 1.58it/s] {'loss': 0.1848, 'grad_norm': 0.4816334545612335, 'learning_rate': 4.058452877757822e-06, 'epoch': 1.81}
60%|██████ | 6965/11526 [1:12:50<48:09, 1.58it/s] 60%|██████ | 6966/11526 [1:12:50<47:41, 1.59it/s] {'loss': 0.1759, 'grad_norm': 0.467300683259964, 'learning_rate': 4.056965699919972e-06, 'epoch': 1.81}
60%|██████ | 6966/11526 [1:12:50<47:41, 1.59it/s] 60%|██████ | 6967/11526 [1:12:51<47:28, 1.60it/s] {'loss': 0.2924, 'grad_norm': 0.7120322585105896, 'learning_rate': 4.055478608582595e-06, 'epoch': 1.81}
60%|██████ | 6967/11526 [1:12:51<47:28, 1.60it/s] 60%|██████ | 6968/11526 [1:12:52<47:16, 1.61it/s] {'loss': 0.1983, 'grad_norm': 0.5501811504364014, 'learning_rate': 4.053991603882092e-06, 'epoch': 1.81}
60%|██████ | 6968/11526 [1:12:52<47:16, 1.61it/s] 60%|██████ | 6969/11526 [1:12:52<47:07, 1.61it/s] {'loss': 0.2011, 'grad_norm': 0.6469659805297852, 'learning_rate': 4.052504685954864e-06, 'epoch': 1.81}
60%|██████ | 6969/11526 [1:12:52<47:07, 1.61it/s] 60%|██████ | 6970/11526 [1:12:53<46:56, 1.62it/s] {'loss': 0.2603, 'grad_norm': 0.6555082201957703, 'learning_rate': 4.051017854937296e-06, 'epoch': 1.81}
60%|██████ | 6970/11526 [1:12:53<46:56, 1.62it/s] 60%|██████ | 6971/11526 [1:12:53<46:49, 1.62it/s] {'loss': 0.1657, 'grad_norm': 0.4645508825778961, 'learning_rate': 4.049531110965771e-06, 'epoch': 1.81}
60%|██████ | 6971/11526 [1:12:54<46:49, 1.62it/s] 60%|██████ | 6972/11526 [1:12:54<46:44, 1.62it/s] {'loss': 0.1914, 'grad_norm': 0.5598008036613464, 'learning_rate': 4.048044454176658e-06, 'epoch': 1.81}
60%|██████ | 6972/11526 [1:12:54<46:44, 1.62it/s] 60%|██████ | 6973/11526 [1:12:55<46:41, 1.63it/s] {'loss': 0.3349, 'grad_norm': 0.76068514585495, 'learning_rate': 4.046557884706326e-06, 'epoch': 1.81}
60%|██████ | 6973/11526 [1:12:55<46:41, 1.63it/s] 61%|██████ | 6974/11526 [1:12:55<47:55, 1.58it/s] {'loss': 0.2305, 'grad_norm': 0.6609376072883606, 'learning_rate': 4.04507140269113e-06, 'epoch': 1.82}
61%|██████ | 6974/11526 [1:12:55<47:55, 1.58it/s] 61%|██████ | 6975/11526 [1:12:56<47:41, 1.59it/s] {'loss': 0.1915, 'grad_norm': 0.477303147315979, 'learning_rate': 4.043585008267418e-06, 'epoch': 1.82}
61%|██████ | 6975/11526 [1:12:56<47:41, 1.59it/s] 61%|██████ | 6976/11526 [1:12:57<47:20, 1.60it/s] {'loss': 0.1579, 'grad_norm': 0.46799570322036743, 'learning_rate': 4.0420987015715305e-06, 'epoch': 1.82}
61%|██████ | 6976/11526 [1:12:57<47:20, 1.60it/s] 61%|██████ | 6977/11526 [1:12:57<47:07, 1.61it/s] {'loss': 0.1511, 'grad_norm': 0.473154753446579, 'learning_rate': 4.040612482739799e-06, 'epoch': 1.82}
61%|██████ | 6977/11526 [1:12:57<47:07, 1.61it/s] 61%|██████ | 6978/11526 [1:12:58<46:55, 1.62it/s] {'loss': 0.2268, 'grad_norm': 0.5738425254821777, 'learning_rate': 4.03912635190855e-06, 'epoch': 1.82}
61%|██████ | 6978/11526 [1:12:58<46:55, 1.62it/s] 61%|██████ | 6979/11526 [1:12:58<46:52, 1.62it/s] {'loss': 0.3208, 'grad_norm': 0.6052866578102112, 'learning_rate': 4.0376403092141e-06, 'epoch': 1.82}
61%|██████ | 6979/11526 [1:12:59<46:52, 1.62it/s] 61%|██████ | 6980/11526 [1:12:59<47:59, 1.58it/s] {'loss': 0.1996, 'grad_norm': 0.5953381061553955, 'learning_rate': 4.036154354792757e-06, 'epoch': 1.82}
61%|██████ | 6980/11526 [1:12:59<47:59, 1.58it/s] 61%|██████ | 6981/11526 [1:13:00<47:43, 1.59it/s] {'loss': 0.2053, 'grad_norm': 0.5842302441596985, 'learning_rate': 4.034668488780818e-06, 'epoch': 1.82}
61%|██████ | 6981/11526 [1:13:00<47:43, 1.59it/s] 61%|██████ | 6982/11526 [1:13:00<47:21, 1.60it/s] {'loss': 0.2338, 'grad_norm': 0.48001915216445923, 'learning_rate': 4.033182711314579e-06, 'epoch': 1.82}
61%|██████ | 6982/11526 [1:13:00<47:21, 1.60it/s] 61%|██████ | 6983/11526 [1:13:01<47:06, 1.61it/s] {'loss': 0.1609, 'grad_norm': 0.5009353160858154, 'learning_rate': 4.031697022530324e-06, 'epoch': 1.82}
61%|██████ | 6983/11526 [1:13:01<47:06, 1.61it/s] 61%|██████ | 6984/11526 [1:13:02<46:58, 1.61it/s] {'loss': 0.2052, 'grad_norm': 0.46471571922302246, 'learning_rate': 4.030211422564327e-06, 'epoch': 1.82}
61%|██████ | 6984/11526 [1:13:02<46:58, 1.61it/s] 61%|██████ | 6985/11526 [1:13:02<46:47, 1.62it/s] {'loss': 0.1874, 'grad_norm': 0.5302966237068176, 'learning_rate': 4.028725911552856e-06, 'epoch': 1.82}
61%|██████ | 6985/11526 [1:13:02<46:47, 1.62it/s] 61%|██████ | 6986/11526 [1:13:03<46:47, 1.62it/s] {'loss': 0.1719, 'grad_norm': 0.5373603701591492, 'learning_rate': 4.0272404896321695e-06, 'epoch': 1.82}
61%|██████ | 6986/11526 [1:13:03<46:47, 1.62it/s] 61%|██████ | 6987/11526 [1:13:03<46:41, 1.62it/s] {'loss': 0.2735, 'grad_norm': 0.5950497984886169, 'learning_rate': 4.025755156938522e-06, 'epoch': 1.82}
61%|██████ | 6987/11526 [1:13:04<46:41, 1.62it/s] 61%|██████ | 6988/11526 [1:13:04<46:37, 1.62it/s] {'loss': 0.1912, 'grad_norm': 0.5478935837745667, 'learning_rate': 4.024269913608155e-06, 'epoch': 1.82}
61%|██████ | 6988/11526 [1:13:04<46:37, 1.62it/s] 61%|██████ | 6989/11526 [1:13:05<46:38, 1.62it/s] {'loss': 0.1766, 'grad_norm': 0.5016225576400757, 'learning_rate': 4.0227847597773e-06, 'epoch': 1.82}
61%|██████ | 6989/11526 [1:13:05<46:38, 1.62it/s] 61%|██████ | 6990/11526 [1:13:05<46:34, 1.62it/s] {'loss': 0.1801, 'grad_norm': 0.48291054368019104, 'learning_rate': 4.021299695582188e-06, 'epoch': 1.82}
61%|██████ | 6990/11526 [1:13:05<46:34, 1.62it/s] 61%|██████ | 6991/11526 [1:13:06<46:31, 1.62it/s] {'loss': 0.1944, 'grad_norm': 0.5971914529800415, 'learning_rate': 4.0198147211590385e-06, 'epoch': 1.82}
61%|██████ | 6991/11526 [1:13:06<46:31, 1.62it/s] 61%|██████ | 6992/11526 [1:13:06<46:28, 1.63it/s] {'loss': 0.2075, 'grad_norm': 0.5692657828330994, 'learning_rate': 4.01832983664406e-06, 'epoch': 1.82}
61%|██████ | 6992/11526 [1:13:07<46:28, 1.63it/s] 61%|██████ | 6993/11526 [1:13:07<46:27, 1.63it/s] {'loss': 0.2646, 'grad_norm': 0.6763654947280884, 'learning_rate': 4.016845042173452e-06, 'epoch': 1.82}
61%|██████ | 6993/11526 [1:13:07<46:27, 1.63it/s] 61%|██████ | 6994/11526 [1:13:08<46:29, 1.62it/s] {'loss': 0.2283, 'grad_norm': 0.5874255299568176, 'learning_rate': 4.015360337883412e-06, 'epoch': 1.82}
61%|██████ | 6994/11526 [1:13:08<46:29, 1.62it/s] 61%|██████ | 6995/11526 [1:13:08<46:26, 1.63it/s] {'loss': 0.2245, 'grad_norm': 0.5581841468811035, 'learning_rate': 4.013875723910123e-06, 'epoch': 1.82}
61%|██████ | 6995/11526 [1:13:08<46:26, 1.63it/s] 61%|██████ | 6996/11526 [1:13:09<46:25, 1.63it/s] {'loss': 0.2894, 'grad_norm': 0.6721134781837463, 'learning_rate': 4.012391200389765e-06, 'epoch': 1.82}
61%|██████ | 6996/11526 [1:13:09<46:25, 1.63it/s] 61%|██████ | 6997/11526 [1:13:10<46:22, 1.63it/s] {'loss': 0.227, 'grad_norm': 0.5786166191101074, 'learning_rate': 4.0109067674585045e-06, 'epoch': 1.82}
61%|██████ | 6997/11526 [1:13:10<46:22, 1.63it/s] 61%|██████ | 6998/11526 [1:13:10<46:22, 1.63it/s] {'loss': 0.2098, 'grad_norm': 0.5607265830039978, 'learning_rate': 4.009422425252504e-06, 'epoch': 1.82}
61%|██████ | 6998/11526 [1:13:10<46:22, 1.63it/s] 61%|██████ | 6999/11526 [1:13:11<46:26, 1.62it/s] {'loss': 0.2302, 'grad_norm': 0.6004804372787476, 'learning_rate': 4.007938173907912e-06, 'epoch': 1.82}
61%|██████ | 6999/11526 [1:13:11<46:26, 1.62it/s] 61%|██████ | 7000/11526 [1:13:11<46:24, 1.63it/s] {'loss': 0.1938, 'grad_norm': 0.5607790946960449, 'learning_rate': 4.0064540135608786e-06, 'epoch': 1.82}
61%|██████ | 7000/11526 [1:13:12<46:24, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.34it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.89it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5288287997245789, 'eval_runtime': 1.956, 'eval_samples_per_second': 102.251, 'eval_steps_per_second': 6.646, 'epoch': 1.82}
61%|██████ | 7000/11526 [1:13:13<46:24, 1.63it/s]
 61%|██████ | 7001/11526 [1:13:14<1:30:44, 1.20s/it] {'loss': 0.1558, 'grad_norm': 0.4407941699028015, 'learning_rate': 4.004969944347535e-06, 'epoch': 1.82}
61%|██████ | 7001/11526 [1:13:14<1:30:44, 1.20s/it] 61%|██████ | 7002/11526 [1:13:15<1:17:23, 1.03s/it] {'loss': 0.1632, 'grad_norm': 0.5018138885498047, 'learning_rate': 4.00348596640401e-06, 'epoch': 1.82}
61%|██████ | 7002/11526 [1:13:15<1:17:23, 1.03s/it] 61%|██████ | 7003/11526 [1:13:15<1:08:01, 1.11it/s] {'loss': 0.275, 'grad_norm': 0.6331875324249268, 'learning_rate': 4.002002079866422e-06, 'epoch': 1.82}
61%|██████ | 7003/11526 [1:13:15<1:08:01, 1.11it/s] 61%|██████ | 7004/11526 [1:13:16<1:01:34, 1.22it/s] {'loss': 0.2485, 'grad_norm': 0.6231549978256226, 'learning_rate': 4.00051828487088e-06, 'epoch': 1.82}
61%|██████ | 7004/11526 [1:13:16<1:01:34, 1.22it/s] 61%|██████ | 7005/11526 [1:13:16<56:59, 1.32it/s] {'loss': 0.1446, 'grad_norm': 0.43572646379470825, 'learning_rate': 3.999034581553489e-06, 'epoch': 1.82}
61%|██████ | 7005/11526 [1:13:17<56:59, 1.32it/s] 61%|██████ | 7006/11526 [1:13:17<53:44, 1.40it/s] {'loss': 0.1687, 'grad_norm': 0.49390268325805664, 'learning_rate': 3.997550970050342e-06, 'epoch': 1.82}
61%|██████ | 7006/11526 [1:13:17<53:44, 1.40it/s] 61%|██████ | 7007/11526 [1:13:18<51:31, 1.46it/s] {'loss': 0.174, 'grad_norm': 0.49763596057891846, 'learning_rate': 3.9960674504975235e-06, 'epoch': 1.82}
61%|██████ | 7007/11526 [1:13:18<51:31, 1.46it/s] 61%|██████ | 7008/11526 [1:13:18<49:56, 1.51it/s] {'loss': 0.1705, 'grad_norm': 0.46285003423690796, 'learning_rate': 3.994584023031108e-06, 'epoch': 1.82}
61%|██████ | 7008/11526 [1:13:18<49:56, 1.51it/s] 61%|██████ | 7009/11526 [1:13:19<48:55, 1.54it/s] {'loss': 0.1604, 'grad_norm': 0.4547848105430603, 'learning_rate': 3.9931006877871685e-06, 'epoch': 1.82}
61%|██████ | 7009/11526 [1:13:19<48:55, 1.54it/s] 61%|██████ | 7010/11526 [1:13:19<48:06, 1.56it/s] {'loss': 0.2542, 'grad_norm': 0.5026652216911316, 'learning_rate': 3.991617444901764e-06, 'epoch': 1.82}
61%|██████ | 7010/11526 [1:13:20<48:06, 1.56it/s] 61%|██████ | 7011/11526 [1:13:20<47:31, 1.58it/s] {'loss': 0.2078, 'grad_norm': 0.5448158979415894, 'learning_rate': 3.990134294510944e-06, 'epoch': 1.82}
61%|██████ | 7011/11526 [1:13:20<47:31, 1.58it/s] 61%|██████ | 7012/11526 [1:13:21<47:06, 1.60it/s] {'loss': 0.1533, 'grad_norm': 0.4532041549682617, 'learning_rate': 3.9886512367507526e-06, 'epoch': 1.83}
61%|██████ | 7012/11526 [1:13:21<47:06, 1.60it/s] 61%|██████ | 7013/11526 [1:13:21<46:50, 1.61it/s] {'loss': 0.1284, 'grad_norm': 0.41540929675102234, 'learning_rate': 3.987168271757222e-06, 'epoch': 1.83}
61%|██████ | 7013/11526 [1:13:21<46:50, 1.61it/s] 61%|██████ | 7014/11526 [1:13:22<46:50, 1.61it/s] {'loss': 0.1703, 'grad_norm': 0.523853063583374, 'learning_rate': 3.9856853996663836e-06, 'epoch': 1.83}
61%|██████ | 7014/11526 [1:13:22<46:50, 1.61it/s] 61%|██████ | 7015/11526 [1:13:23<46:38, 1.61it/s] {'loss': 0.2567, 'grad_norm': 0.6411023139953613, 'learning_rate': 3.984202620614251e-06, 'epoch': 1.83}
61%|██████ | 7015/11526 [1:13:23<46:38, 1.61it/s] 61%|██████ | 7016/11526 [1:13:23<46:29, 1.62it/s] {'loss': 0.2084, 'grad_norm': 0.5635441541671753, 'learning_rate': 3.982719934736832e-06, 'epoch': 1.83}
61%|██████ | 7016/11526 [1:13:23<46:29, 1.62it/s] 61%|██████ | 7017/11526 [1:13:24<46:23, 1.62it/s] {'loss': 0.185, 'grad_norm': 0.5161164999008179, 'learning_rate': 3.9812373421701285e-06, 'epoch': 1.83}
61%|██████ | 7017/11526 [1:13:24<46:23, 1.62it/s] 61%|██████ | 7018/11526 [1:13:24<46:17, 1.62it/s] {'loss': 0.25, 'grad_norm': 0.6721857786178589, 'learning_rate': 3.9797548430501335e-06, 'epoch': 1.83}
61%|██████ | 7018/11526 [1:13:25<46:17, 1.62it/s] 61%|██████ | 7019/11526 [1:13:25<46:18, 1.62it/s] {'loss': 0.1602, 'grad_norm': 0.5313553810119629, 'learning_rate': 3.97827243751283e-06, 'epoch': 1.83}
61%|██████ | 7019/11526 [1:13:25<46:18, 1.62it/s] 61%|██████ | 7020/11526 [1:13:26<46:13, 1.62it/s] {'loss': 0.1745, 'grad_norm': 0.5101827383041382, 'learning_rate': 3.976790125694191e-06, 'epoch': 1.83}
61%|██████ | 7020/11526 [1:13:26<46:13, 1.62it/s] 61%|██████ | 7021/11526 [1:13:26<46:11, 1.63it/s] {'loss': 0.2436, 'grad_norm': 0.5957359671592712, 'learning_rate': 3.975307907730183e-06, 'epoch': 1.83}
61%|██████ | 7021/11526 [1:13:26<46:11, 1.63it/s] 61%|██████ | 7022/11526 [1:13:27<46:09, 1.63it/s] {'loss': 0.2292, 'grad_norm': 0.5623039603233337, 'learning_rate': 3.973825783756765e-06, 'epoch': 1.83}
61%|██████ | 7022/11526 [1:13:27<46:09, 1.63it/s] 61%|██████ | 7023/11526 [1:13:27<46:09, 1.63it/s] {'loss': 0.1865, 'grad_norm': 0.5301286578178406, 'learning_rate': 3.972343753909884e-06, 'epoch': 1.83}
61%|██████ | 7023/11526 [1:13:28<46:09, 1.63it/s] 61%|██████ | 7024/11526 [1:13:28<46:08, 1.63it/s] {'loss': 0.1714, 'grad_norm': 0.538164496421814, 'learning_rate': 3.9708618183254815e-06, 'epoch': 1.83}
61%|██████ | 7024/11526 [1:13:28<46:08, 1.63it/s] 61%|██████ | 7025/11526 [1:13:29<46:07, 1.63it/s] {'loss': 0.1953, 'grad_norm': 0.5084078907966614, 'learning_rate': 3.9693799771394896e-06, 'epoch': 1.83}
61%|██████ | 7025/11526 [1:13:29<46:07, 1.63it/s] 61%|██████ | 7026/11526 [1:13:29<46:05, 1.63it/s] {'loss': 0.27, 'grad_norm': 0.6219936013221741, 'learning_rate': 3.967898230487828e-06, 'epoch': 1.83}
61%|██████ | 7026/11526 [1:13:29<46:05, 1.63it/s] 61%|██████ | 7027/11526 [1:13:30<46:04, 1.63it/s] {'loss': 0.118, 'grad_norm': 0.3847035765647888, 'learning_rate': 3.966416578506414e-06, 'epoch': 1.83}
61%|██████ | 7027/11526 [1:13:30<46:04, 1.63it/s] 61%|██████ | 7028/11526 [1:13:31<46:03, 1.63it/s] {'loss': 0.2163, 'grad_norm': 0.551507830619812, 'learning_rate': 3.9649350213311524e-06, 'epoch': 1.83}
61%|██████ | 7028/11526 [1:13:31<46:03, 1.63it/s] 61%|██████ | 7029/11526 [1:13:31<46:07, 1.62it/s] {'loss': 0.2146, 'grad_norm': 0.654341459274292, 'learning_rate': 3.963453559097941e-06, 'epoch': 1.83}
61%|██████ | 7029/11526 [1:13:31<46:07, 1.62it/s] 61%|██████ | 7030/11526 [1:13:32<46:05, 1.63it/s] {'loss': 0.1723, 'grad_norm': 0.5191405415534973, 'learning_rate': 3.961972191942664e-06, 'epoch': 1.83}
61%|██████ | 7030/11526 [1:13:32<46:05, 1.63it/s] 61%|██████ | 7031/11526 [1:13:32<46:03, 1.63it/s] {'loss': 0.1806, 'grad_norm': 0.51352459192276, 'learning_rate': 3.960490920001207e-06, 'epoch': 1.83}
61%|██████ | 7031/11526 [1:13:33<46:03, 1.63it/s] 61%|██████ | 7032/11526 [1:13:33<46:01, 1.63it/s] {'loss': 0.2581, 'grad_norm': 0.6780652403831482, 'learning_rate': 3.9590097434094346e-06, 'epoch': 1.83}
61%|██████ | 7032/11526 [1:13:33<46:01, 1.63it/s] 61%|██████ | 7033/11526 [1:13:34<46:00, 1.63it/s] {'loss': 0.1886, 'grad_norm': 0.5189133882522583, 'learning_rate': 3.957528662303214e-06, 'epoch': 1.83}
61%|██████ | 7033/11526 [1:13:34<46:00, 1.63it/s] 61%|██████ | 7034/11526 [1:13:34<46:00, 1.63it/s] {'loss': 0.1746, 'grad_norm': 0.4741310775279999, 'learning_rate': 3.956047676818394e-06, 'epoch': 1.83}
61%|██████ | 7034/11526 [1:13:34<46:00, 1.63it/s] 61%|██████ | 7035/11526 [1:13:35<46:01, 1.63it/s] {'loss': 0.2486, 'grad_norm': 0.676751434803009, 'learning_rate': 3.9545667870908215e-06, 'epoch': 1.83}
61%|██████ | 7035/11526 [1:13:35<46:01, 1.63it/s] 61%|██████ | 7036/11526 [1:13:35<46:01, 1.63it/s] {'loss': 0.1986, 'grad_norm': 0.543242871761322, 'learning_rate': 3.9530859932563315e-06, 'epoch': 1.83}
61%|██████ | 7036/11526 [1:13:36<46:01, 1.63it/s] 61%|██████ | 7037/11526 [1:13:36<46:00, 1.63it/s] {'loss': 0.257, 'grad_norm': 0.6180221438407898, 'learning_rate': 3.951605295450753e-06, 'epoch': 1.83}
61%|██████ | 7037/11526 [1:13:36<46:00, 1.63it/s] 61%|██████ | 7038/11526 [1:13:37<45:58, 1.63it/s] {'loss': 0.2646, 'grad_norm': 0.5592921376228333, 'learning_rate': 3.950124693809898e-06, 'epoch': 1.83}
61%|██████ | 7038/11526 [1:13:37<45:58, 1.63it/s] 61%|██████ | 7039/11526 [1:13:37<46:04, 1.62it/s] {'loss': 0.2266, 'grad_norm': 0.5427175164222717, 'learning_rate': 3.948644188469582e-06, 'epoch': 1.83}
61%|██████ | 7039/11526 [1:13:37<46:04, 1.62it/s] 61%|██████ | 7040/11526 [1:13:38<46:00, 1.63it/s] {'loss': 0.1836, 'grad_norm': 0.5187915563583374, 'learning_rate': 3.947163779565602e-06, 'epoch': 1.83}
61%|██████ | 7040/11526 [1:13:38<46:00, 1.63it/s] 61%|██████ | 7041/11526 [1:13:39<45:58, 1.63it/s] {'loss': 0.1679, 'grad_norm': 0.4920266568660736, 'learning_rate': 3.945683467233752e-06, 'epoch': 1.83}
61%|██████ | 7041/11526 [1:13:39<45:58, 1.63it/s] 61%|██████ | 7042/11526 [1:13:39<45:55, 1.63it/s] {'loss': 0.1737, 'grad_norm': 0.4743726849555969, 'learning_rate': 3.944203251609812e-06, 'epoch': 1.83}
61%|██████ | 7042/11526 [1:13:39<45:55, 1.63it/s] 61%|██████ | 7043/11526 [1:13:40<45:54, 1.63it/s] {'loss': 0.1949, 'grad_norm': 0.6127454042434692, 'learning_rate': 3.942723132829559e-06, 'epoch': 1.83}
61%|██████ | 7043/11526 [1:13:40<45:54, 1.63it/s] 61%|██████ | 7044/11526 [1:13:40<45:56, 1.63it/s] {'loss': 0.2493, 'grad_norm': 0.6341199278831482, 'learning_rate': 3.9412431110287515e-06, 'epoch': 1.83}
61%|██████ | 7044/11526 [1:13:41<45:56, 1.63it/s] 61%|██████ | 7045/11526 [1:13:41<45:54, 1.63it/s] {'loss': 0.2043, 'grad_norm': 0.5785811543464661, 'learning_rate': 3.939763186343154e-06, 'epoch': 1.83}
61%|██████ | 7045/11526 [1:13:41<45:54, 1.63it/s] 61%|██████ | 7046/11526 [1:13:42<45:53, 1.63it/s] {'loss': 0.2941, 'grad_norm': 0.7142670750617981, 'learning_rate': 3.938283358908508e-06, 'epoch': 1.83}
61%|██████ | 7046/11526 [1:13:42<45:53, 1.63it/s] 61%|██████ | 7047/11526 [1:13:42<45:52, 1.63it/s] {'loss': 0.1635, 'grad_norm': 0.4650724232196808, 'learning_rate': 3.936803628860554e-06, 'epoch': 1.83}
61%|██████ | 7047/11526 [1:13:42<45:52, 1.63it/s] 61%|██████ | 7048/11526 [1:13:43<45:50, 1.63it/s] {'loss': 0.1683, 'grad_norm': 0.45156675577163696, 'learning_rate': 3.935323996335019e-06, 'epoch': 1.83}
61%|██████ | 7048/11526 [1:13:43<45:50, 1.63it/s] 61%|██████ | 7049/11526 [1:13:43<45:53, 1.63it/s] {'loss': 0.1772, 'grad_norm': 0.5176602005958557, 'learning_rate': 3.933844461467626e-06, 'epoch': 1.83}
61%|██████ | 7049/11526 [1:13:44<45:53, 1.63it/s] 61%|██████ | 7050/11526 [1:13:44<45:51, 1.63it/s] {'loss': 0.1699, 'grad_norm': 0.5345945954322815, 'learning_rate': 3.9323650243940855e-06, 'epoch': 1.83}
61%|██████ | 7050/11526 [1:13:44<45:51, 1.63it/s] 61%|██████ | 7051/11526 [1:13:45<45:50, 1.63it/s] {'loss': 0.1671, 'grad_norm': 0.48807525634765625, 'learning_rate': 3.9308856852501e-06, 'epoch': 1.84}
61%|██████ | 7051/11526 [1:13:45<45:50, 1.63it/s] 61%|██████ | 7052/11526 [1:13:45<45:48, 1.63it/s] {'loss': 0.2448, 'grad_norm': 0.7442659735679626, 'learning_rate': 3.929406444171362e-06, 'epoch': 1.84}
61%|██████ | 7052/11526 [1:13:45<45:48, 1.63it/s] 61%|██████ | 7053/11526 [1:13:46<45:47, 1.63it/s] {'loss': 0.2445, 'grad_norm': 0.6014370918273926, 'learning_rate': 3.9279273012935545e-06, 'epoch': 1.84}
61%|██████ | 7053/11526 [1:13:46<45:47, 1.63it/s] 61%|██████ | 7054/11526 [1:13:47<46:00, 1.62it/s] {'loss': 0.1543, 'grad_norm': 0.3962904214859009, 'learning_rate': 3.926448256752356e-06, 'epoch': 1.84}
61%|██████ | 7054/11526 [1:13:47<46:00, 1.62it/s] 61%|██████ | 7055/11526 [1:13:47<45:54, 1.62it/s] {'loss': 0.1476, 'grad_norm': 0.4440643787384033, 'learning_rate': 3.924969310683432e-06, 'epoch': 1.84}
61%|██████ | 7055/11526 [1:13:47<45:54, 1.62it/s] 61%|██████ | 7056/11526 [1:13:48<45:50, 1.62it/s] {'loss': 0.2268, 'grad_norm': 0.5706800818443298, 'learning_rate': 3.923490463222441e-06, 'epoch': 1.84}
61%|██████ | 7056/11526 [1:13:48<45:50, 1.62it/s] 61%|██████ | 7057/11526 [1:13:48<45:47, 1.63it/s] {'loss': 0.1971, 'grad_norm': 0.5300887823104858, 'learning_rate': 3.9220117145050254e-06, 'epoch': 1.84}
61%|██████ | 7057/11526 [1:13:49<45:47, 1.63it/s] 61%|██████ | 7058/11526 [1:13:49<45:45, 1.63it/s] {'loss': 0.1907, 'grad_norm': 0.48228126764297485, 'learning_rate': 3.920533064666831e-06, 'epoch': 1.84}
61%|██████ | 7058/11526 [1:13:49<45:45, 1.63it/s] 61%|██████ | 7059/11526 [1:13:50<45:49, 1.62it/s] {'loss': 0.1781, 'grad_norm': 0.49574920535087585, 'learning_rate': 3.919054513843488e-06, 'epoch': 1.84}
61%|██████ | 7059/11526 [1:13:50<45:49, 1.62it/s] 61%|██████▏ | 7060/11526 [1:13:50<45:46, 1.63it/s] {'loss': 0.2158, 'grad_norm': 0.5226278305053711, 'learning_rate': 3.917576062170614e-06, 'epoch': 1.84}
61%|██████▏ | 7060/11526 [1:13:50<45:46, 1.63it/s] 61%|██████▏ | 7061/11526 [1:13:51<45:43, 1.63it/s] {'loss': 0.2403, 'grad_norm': 0.528294563293457, 'learning_rate': 3.916097709783821e-06, 'epoch': 1.84}
61%|██████▏ | 7061/11526 [1:13:51<45:43, 1.63it/s] 61%|██████▏ | 7062/11526 [1:13:51<45:43, 1.63it/s] {'loss': 0.204, 'grad_norm': 0.5573733448982239, 'learning_rate': 3.914619456818713e-06, 'epoch': 1.84}
61%|██████▏ | 7062/11526 [1:13:52<45:43, 1.63it/s] 61%|██████▏ | 7063/11526 [1:13:52<45:43, 1.63it/s] {'loss': 0.1836, 'grad_norm': 0.48823216557502747, 'learning_rate': 3.913141303410886e-06, 'epoch': 1.84}
61%|██████▏ | 7063/11526 [1:13:52<45:43, 1.63it/s] 61%|██████▏ | 7064/11526 [1:13:53<45:44, 1.63it/s] {'loss': 0.1954, 'grad_norm': 0.578425407409668, 'learning_rate': 3.9116632496959224e-06, 'epoch': 1.84}
61%|██████▏ | 7064/11526 [1:13:53<45:44, 1.63it/s] 61%|██████▏ | 7065/11526 [1:13:53<45:43, 1.63it/s] {'loss': 0.2234, 'grad_norm': 0.5148276686668396, 'learning_rate': 3.9101852958093976e-06, 'epoch': 1.84}
61%|██████▏ | 7065/11526 [1:13:53<45:43, 1.63it/s] 61%|██████▏ | 7066/11526 [1:13:54<45:42, 1.63it/s] {'loss': 0.1541, 'grad_norm': 0.46618178486824036, 'learning_rate': 3.908707441886876e-06, 'epoch': 1.84}
61%|██████▏ | 7066/11526 [1:13:54<45:42, 1.63it/s] 61%|██████▏ | 7067/11526 [1:13:55<45:40, 1.63it/s] {'loss': 0.1897, 'grad_norm': 0.5223297476768494, 'learning_rate': 3.90722968806392e-06, 'epoch': 1.84}
61%|██████▏ | 7067/11526 [1:13:55<45:40, 1.63it/s] 61%|██████▏ | 7068/11526 [1:13:55<45:38, 1.63it/s] {'loss': 0.19, 'grad_norm': 0.5741158127784729, 'learning_rate': 3.905752034476074e-06, 'epoch': 1.84}
61%|██████▏ | 7068/11526 [1:13:55<45:38, 1.63it/s] 61%|██████▏ | 7069/11526 [1:13:56<45:37, 1.63it/s] {'loss': 0.1873, 'grad_norm': 0.5608739256858826, 'learning_rate': 3.904274481258877e-06, 'epoch': 1.84}
61%|██████▏ | 7069/11526 [1:13:56<45:37, 1.63it/s] 61%|██████▏ | 7070/11526 [1:13:56<45:36, 1.63it/s] {'loss': 0.2352, 'grad_norm': 0.7011055946350098, 'learning_rate': 3.9027970285478575e-06, 'epoch': 1.84}
61%|██████▏ | 7070/11526 [1:13:57<45:36, 1.63it/s] 61%|██████▏ | 7071/11526 [1:13:57<45:36, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.5227944850921631, 'learning_rate': 3.90131967647854e-06, 'epoch': 1.84}
61%|██████▏ | 7071/11526 [1:13:57<45:36, 1.63it/s] 61%|██████▏ | 7072/11526 [1:13:58<45:35, 1.63it/s] {'loss': 0.2005, 'grad_norm': 0.5900956392288208, 'learning_rate': 3.899842425186432e-06, 'epoch': 1.84}
61%|██████▏ | 7072/11526 [1:13:58<45:35, 1.63it/s] 61%|██████▏ | 7073/11526 [1:13:58<45:35, 1.63it/s] {'loss': 0.1854, 'grad_norm': 0.485666424036026, 'learning_rate': 3.898365274807037e-06, 'epoch': 1.84}
61%|██████▏ | 7073/11526 [1:13:58<45:35, 1.63it/s] 61%|██████▏ | 7074/11526 [1:13:59<45:38, 1.63it/s] {'loss': 0.2016, 'grad_norm': 0.5179113745689392, 'learning_rate': 3.896888225475847e-06, 'epoch': 1.84}
61%|██████▏ | 7074/11526 [1:13:59<45:38, 1.63it/s] 61%|██████▏ | 7075/11526 [1:13:59<45:37, 1.63it/s] {'loss': 0.1666, 'grad_norm': 0.45178160071372986, 'learning_rate': 3.895411277328342e-06, 'epoch': 1.84}
61%|██████▏ | 7075/11526 [1:14:00<45:37, 1.63it/s] 61%|██████▏ | 7076/11526 [1:14:00<45:34, 1.63it/s] {'loss': 0.2711, 'grad_norm': 0.7023906707763672, 'learning_rate': 3.893934430500003e-06, 'epoch': 1.84}
61%|██████▏ | 7076/11526 [1:14:00<45:34, 1.63it/s] 61%|██████▏ | 7077/11526 [1:14:01<45:33, 1.63it/s] {'loss': 0.3256, 'grad_norm': 0.5214544534683228, 'learning_rate': 3.892457685126291e-06, 'epoch': 1.84}
61%|██████▏ | 7077/11526 [1:14:01<45:33, 1.63it/s] 61%|██████▏ | 7078/11526 [1:14:01<45:32, 1.63it/s] {'loss': 0.142, 'grad_norm': 0.45021215081214905, 'learning_rate': 3.890981041342662e-06, 'epoch': 1.84}
61%|██████▏ | 7078/11526 [1:14:01<45:32, 1.63it/s] 61%|██████▏ | 7079/11526 [1:14:02<45:46, 1.62it/s] {'loss': 0.2447, 'grad_norm': 0.5884124040603638, 'learning_rate': 3.88950449928456e-06, 'epoch': 1.84}
61%|██████▏ | 7079/11526 [1:14:02<45:46, 1.62it/s] 61%|██████▏ | 7080/11526 [1:14:03<45:42, 1.62it/s] {'loss': 0.1911, 'grad_norm': 0.5620632171630859, 'learning_rate': 3.888028059087426e-06, 'epoch': 1.84}
61%|██████▏ | 7080/11526 [1:14:03<45:42, 1.62it/s] 61%|██████▏ | 7081/11526 [1:14:03<45:39, 1.62it/s] {'loss': 0.2051, 'grad_norm': 0.5585446357727051, 'learning_rate': 3.886551720886684e-06, 'epoch': 1.84}
61%|██████▏ | 7081/11526 [1:14:03<45:39, 1.62it/s] 61%|██████▏ | 7082/11526 [1:14:04<45:36, 1.62it/s] {'loss': 0.1944, 'grad_norm': 0.5609580874443054, 'learning_rate': 3.885075484817754e-06, 'epoch': 1.84}
61%|██████▏ | 7082/11526 [1:14:04<45:36, 1.62it/s] 61%|██████▏ | 7083/11526 [1:14:04<45:33, 1.63it/s] {'loss': 0.1911, 'grad_norm': 0.5462238788604736, 'learning_rate': 3.883599351016045e-06, 'epoch': 1.84}
61%|██████▏ | 7083/11526 [1:14:05<45:33, 1.63it/s] 61%|██████▏ | 7084/11526 [1:14:05<45:34, 1.62it/s] {'loss': 0.2183, 'grad_norm': 0.5624341368675232, 'learning_rate': 3.882123319616953e-06, 'epoch': 1.84}
61%|██████▏ | 7084/11526 [1:14:05<45:34, 1.62it/s] 61%|██████▏ | 7085/11526 [1:14:06<45:31, 1.63it/s] {'loss': 0.2102, 'grad_norm': 0.5183133482933044, 'learning_rate': 3.880647390755872e-06, 'epoch': 1.84}
61%|██████▏ | 7085/11526 [1:14:06<45:31, 1.63it/s] 61%|██████▏ | 7086/11526 [1:14:06<45:28, 1.63it/s] {'loss': 0.1553, 'grad_norm': 0.456591933965683, 'learning_rate': 3.879171564568182e-06, 'epoch': 1.84}
61%|██████▏ | 7086/11526 [1:14:06<45:28, 1.63it/s] 61%|██████▏ | 7087/11526 [1:14:07<45:26, 1.63it/s] {'loss': 0.1495, 'grad_norm': 0.438292920589447, 'learning_rate': 3.877695841189253e-06, 'epoch': 1.84}
61%|██████▏ | 7087/11526 [1:14:07<45:26, 1.63it/s] 61%|██████▏ | 7088/11526 [1:14:07<45:26, 1.63it/s] {'loss': 0.2322, 'grad_norm': 0.5232959389686584, 'learning_rate': 3.876220220754445e-06, 'epoch': 1.84}
61%|██████▏ | 7088/11526 [1:14:08<45:26, 1.63it/s] 62%|██████▏ | 7089/11526 [1:14:08<45:27, 1.63it/s] {'loss': 0.1907, 'grad_norm': 0.4795706868171692, 'learning_rate': 3.874744703399114e-06, 'epoch': 1.85}
62%|██████▏ | 7089/11526 [1:14:08<45:27, 1.63it/s] 62%|██████▏ | 7090/11526 [1:14:09<45:26, 1.63it/s] {'loss': 0.2739, 'grad_norm': 0.6207345128059387, 'learning_rate': 3.873269289258602e-06, 'epoch': 1.85}
62%|██████▏ | 7090/11526 [1:14:09<45:26, 1.63it/s] 62%|██████▏ | 7091/11526 [1:14:09<45:24, 1.63it/s] {'loss': 0.2623, 'grad_norm': 0.6348301768302917, 'learning_rate': 3.871793978468241e-06, 'epoch': 1.85}
62%|██████▏ | 7091/11526 [1:14:09<45:24, 1.63it/s] 62%|██████▏ | 7092/11526 [1:14:10<45:23, 1.63it/s] {'loss': 0.2005, 'grad_norm': 0.5161216259002686, 'learning_rate': 3.870318771163356e-06, 'epoch': 1.85}
62%|██████▏ | 7092/11526 [1:14:10<45:23, 1.63it/s] 62%|██████▏ | 7093/11526 [1:14:11<45:24, 1.63it/s] {'loss': 0.1935, 'grad_norm': 0.5556342601776123, 'learning_rate': 3.868843667479259e-06, 'epoch': 1.85}
62%|██████▏ | 7093/11526 [1:14:11<45:24, 1.63it/s] 62%|██████▏ | 7094/11526 [1:14:11<45:27, 1.62it/s] {'loss': 0.2693, 'grad_norm': 0.5712441205978394, 'learning_rate': 3.867368667551259e-06, 'epoch': 1.85}
62%|██████▏ | 7094/11526 [1:14:11<45:27, 1.62it/s] 62%|██████▏ | 7095/11526 [1:14:12<45:24, 1.63it/s] {'loss': 0.1883, 'grad_norm': 0.5351272821426392, 'learning_rate': 3.86589377151465e-06, 'epoch': 1.85}
62%|██████▏ | 7095/11526 [1:14:12<45:24, 1.63it/s] 62%|██████▏ | 7096/11526 [1:14:12<45:22, 1.63it/s] {'loss': 0.1888, 'grad_norm': 0.4930594861507416, 'learning_rate': 3.864418979504714e-06, 'epoch': 1.85}
62%|██████▏ | 7096/11526 [1:14:13<45:22, 1.63it/s] 62%|██████▏ | 7097/11526 [1:14:13<45:22, 1.63it/s] {'loss': 0.1652, 'grad_norm': 0.4668085277080536, 'learning_rate': 3.862944291656731e-06, 'epoch': 1.85}
62%|██████▏ | 7097/11526 [1:14:13<45:22, 1.63it/s] 62%|██████▏ | 7098/11526 [1:14:14<45:20, 1.63it/s] {'loss': 0.2301, 'grad_norm': 0.5463760495185852, 'learning_rate': 3.861469708105969e-06, 'epoch': 1.85}
62%|██████▏ | 7098/11526 [1:14:14<45:20, 1.63it/s] 62%|██████▏ | 7099/11526 [1:14:14<45:22, 1.63it/s] {'loss': 0.2077, 'grad_norm': 0.5247781872749329, 'learning_rate': 3.859995228987683e-06, 'epoch': 1.85}
62%|██████▏ | 7099/11526 [1:14:14<45:22, 1.63it/s] 62%|██████▏ | 7100/11526 [1:14:15<45:20, 1.63it/s] {'loss': 0.2094, 'grad_norm': 0.5855410099029541, 'learning_rate': 3.85852085443712e-06, 'epoch': 1.85}
62%|██████▏ | 7100/11526 [1:14:15<45:20, 1.63it/s] 62%|██████▏ | 7101/11526 [1:14:15<45:18, 1.63it/s] {'loss': 0.2227, 'grad_norm': 0.6055750846862793, 'learning_rate': 3.85704658458952e-06, 'epoch': 1.85}
62%|██████▏ | 7101/11526 [1:14:16<45:18, 1.63it/s] 62%|██████▏ | 7102/11526 [1:14:16<45:17, 1.63it/s] {'loss': 0.1784, 'grad_norm': 0.5320996046066284, 'learning_rate': 3.855572419580108e-06, 'epoch': 1.85}
62%|██████▏ | 7102/11526 [1:14:16<45:17, 1.63it/s] 62%|██████▏ | 7103/11526 [1:14:17<45:16, 1.63it/s] {'loss': 0.1915, 'grad_norm': 0.5392299890518188, 'learning_rate': 3.8540983595441074e-06, 'epoch': 1.85}
62%|██████▏ | 7103/11526 [1:14:17<45:16, 1.63it/s] 62%|██████▏ | 7104/11526 [1:14:17<45:18, 1.63it/s] {'loss': 0.1509, 'grad_norm': 0.4232563376426697, 'learning_rate': 3.852624404616725e-06, 'epoch': 1.85}
62%|██████▏ | 7104/11526 [1:14:17<45:18, 1.63it/s] 62%|██████▏ | 7105/11526 [1:14:18<45:16, 1.63it/s] {'loss': 0.2255, 'grad_norm': 0.5556395053863525, 'learning_rate': 3.851150554933161e-06, 'epoch': 1.85}
62%|██████▏ | 7105/11526 [1:14:18<45:16, 1.63it/s] 62%|██████▏ | 7106/11526 [1:14:19<45:16, 1.63it/s] {'loss': 0.1925, 'grad_norm': 0.44683900475502014, 'learning_rate': 3.849676810628603e-06, 'epoch': 1.85}
62%|██████▏ | 7106/11526 [1:14:19<45:16, 1.63it/s] 62%|██████▏ | 7107/11526 [1:14:19<45:14, 1.63it/s] {'loss': 0.2064, 'grad_norm': 0.5399092435836792, 'learning_rate': 3.848203171838234e-06, 'epoch': 1.85}
62%|██████▏ | 7107/11526 [1:14:19<45:14, 1.63it/s] 62%|██████▏ | 7108/11526 [1:14:20<45:14, 1.63it/s] {'loss': 0.1916, 'grad_norm': 0.5078713297843933, 'learning_rate': 3.846729638697223e-06, 'epoch': 1.85}
62%|██████▏ | 7108/11526 [1:14:20<45:14, 1.63it/s] 62%|██████▏ | 7109/11526 [1:14:20<45:18, 1.62it/s] {'loss': 0.1886, 'grad_norm': 0.5061477422714233, 'learning_rate': 3.845256211340733e-06, 'epoch': 1.85}
62%|██████▏ | 7109/11526 [1:14:20<45:18, 1.62it/s] 62%|██████▏ | 7110/11526 [1:14:21<45:15, 1.63it/s] {'loss': 0.1599, 'grad_norm': 0.47149068117141724, 'learning_rate': 3.843782889903912e-06, 'epoch': 1.85}
62%|██████▏ | 7110/11526 [1:14:21<45:15, 1.63it/s] 62%|██████▏ | 7111/11526 [1:14:22<45:13, 1.63it/s] {'loss': 0.2001, 'grad_norm': 0.5732012391090393, 'learning_rate': 3.8423096745219025e-06, 'epoch': 1.85}
62%|██████▏ | 7111/11526 [1:14:22<45:13, 1.63it/s] 62%|██████▏ | 7112/11526 [1:14:22<45:12, 1.63it/s] {'loss': 0.1947, 'grad_norm': 0.48013994097709656, 'learning_rate': 3.840836565329838e-06, 'epoch': 1.85}
62%|██████▏ | 7112/11526 [1:14:22<45:12, 1.63it/s] 62%|██████▏ | 7113/11526 [1:14:23<45:11, 1.63it/s] {'loss': 0.1848, 'grad_norm': 0.53067547082901, 'learning_rate': 3.83936356246284e-06, 'epoch': 1.85}
62%|██████▏ | 7113/11526 [1:14:23<45:11, 1.63it/s] 62%|██████▏ | 7114/11526 [1:14:23<45:13, 1.63it/s] {'loss': 0.2029, 'grad_norm': 0.5348948836326599, 'learning_rate': 3.837890666056018e-06, 'epoch': 1.85}
62%|██████▏ | 7114/11526 [1:14:24<45:13, 1.63it/s] 62%|██████▏ | 7115/11526 [1:14:24<45:12, 1.63it/s] {'loss': 0.1894, 'grad_norm': 0.48168978095054626, 'learning_rate': 3.836417876244475e-06, 'epoch': 1.85}
62%|██████▏ | 7115/11526 [1:14:24<45:12, 1.63it/s] 62%|██████▏ | 7116/11526 [1:14:25<45:09, 1.63it/s] {'loss': 0.228, 'grad_norm': 0.6510171890258789, 'learning_rate': 3.8349451931633065e-06, 'epoch': 1.85}
62%|██████▏ | 7116/11526 [1:14:25<45:09, 1.63it/s] 62%|██████▏ | 7117/11526 [1:14:25<45:08, 1.63it/s] {'loss': 0.2143, 'grad_norm': 0.5389730334281921, 'learning_rate': 3.833472616947595e-06, 'epoch': 1.85}
62%|██████▏ | 7117/11526 [1:14:25<45:08, 1.63it/s] 62%|██████▏ | 7118/11526 [1:14:26<45:06, 1.63it/s] {'loss': 0.1706, 'grad_norm': 0.5108242034912109, 'learning_rate': 3.832000147732411e-06, 'epoch': 1.85}
62%|██████▏ | 7118/11526 [1:14:26<45:06, 1.63it/s] 62%|██████▏ | 7119/11526 [1:14:27<45:07, 1.63it/s] {'loss': 0.2373, 'grad_norm': 0.5825146436691284, 'learning_rate': 3.830527785652818e-06, 'epoch': 1.85}
62%|██████▏ | 7119/11526 [1:14:27<45:07, 1.63it/s] 62%|██████▏ | 7120/11526 [1:14:27<45:05, 1.63it/s] {'loss': 0.1997, 'grad_norm': 0.5082781910896301, 'learning_rate': 3.829055530843872e-06, 'epoch': 1.85}
62%|██████▏ | 7120/11526 [1:14:27<45:05, 1.63it/s] 62%|██████▏ | 7121/11526 [1:14:28<45:07, 1.63it/s] {'loss': 0.1572, 'grad_norm': 0.43133819103240967, 'learning_rate': 3.8275833834406155e-06, 'epoch': 1.85}
62%|██████▏ | 7121/11526 [1:14:28<45:07, 1.63it/s] 62%|██████▏ | 7122/11526 [1:14:28<45:08, 1.63it/s] {'loss': 0.2029, 'grad_norm': 0.5434048175811768, 'learning_rate': 3.826111343578081e-06, 'epoch': 1.85}
62%|██████▏ | 7122/11526 [1:14:28<45:08, 1.63it/s] 62%|██████▏ | 7123/11526 [1:14:29<45:08, 1.63it/s] {'loss': 0.2051, 'grad_norm': 0.5051414370536804, 'learning_rate': 3.824639411391294e-06, 'epoch': 1.85}
62%|██████▏ | 7123/11526 [1:14:29<45:08, 1.63it/s] 62%|██████▏ | 7124/11526 [1:14:30<45:10, 1.62it/s] {'loss': 0.2023, 'grad_norm': 0.5177277326583862, 'learning_rate': 3.8231675870152664e-06, 'epoch': 1.85}
62%|██████▏ | 7124/11526 [1:14:30<45:10, 1.62it/s] 62%|██████▏ | 7125/11526 [1:14:30<45:09, 1.62it/s] {'loss': 0.1899, 'grad_norm': 0.5099687576293945, 'learning_rate': 3.821695870585005e-06, 'epoch': 1.85}
62%|██████▏ | 7125/11526 [1:14:30<45:09, 1.62it/s] 62%|██████▏ | 7126/11526 [1:14:31<45:06, 1.63it/s] {'loss': 0.234, 'grad_norm': 0.5711730122566223, 'learning_rate': 3.820224262235501e-06, 'epoch': 1.85}
62%|██████▏ | 7126/11526 [1:14:31<45:06, 1.63it/s] 62%|██████▏ | 7127/11526 [1:14:31<45:04, 1.63it/s] {'loss': 0.1911, 'grad_norm': 0.525222897529602, 'learning_rate': 3.818752762101743e-06, 'epoch': 1.86}
62%|██████▏ | 7127/11526 [1:14:32<45:04, 1.63it/s] 62%|██████▏ | 7128/11526 [1:14:32<45:02, 1.63it/s] {'loss': 0.2228, 'grad_norm': 0.5530250072479248, 'learning_rate': 3.817281370318699e-06, 'epoch': 1.86}
62%|██████▏ | 7128/11526 [1:14:32<45:02, 1.63it/s] 62%|██████▏ | 7129/11526 [1:14:33<45:04, 1.63it/s] {'loss': 0.1721, 'grad_norm': 0.4917619824409485, 'learning_rate': 3.81581008702134e-06, 'epoch': 1.86}
62%|██████▏ | 7129/11526 [1:14:33<45:04, 1.63it/s] 62%|██████▏ | 7130/11526 [1:14:33<45:02, 1.63it/s] {'loss': 0.1753, 'grad_norm': 0.554056704044342, 'learning_rate': 3.8143389123446164e-06, 'epoch': 1.86}
62%|██████▏ | 7130/11526 [1:14:33<45:02, 1.63it/s] 62%|██████▏ | 7131/11526 [1:14:34<45:00, 1.63it/s] {'loss': 0.1602, 'grad_norm': 0.5185564756393433, 'learning_rate': 3.8128678464234745e-06, 'epoch': 1.86}
62%|██████▏ | 7131/11526 [1:14:34<45:00, 1.63it/s] 62%|██████▏ | 7132/11526 [1:14:35<45:00, 1.63it/s] {'loss': 0.1659, 'grad_norm': 0.486628919839859, 'learning_rate': 3.811396889392849e-06, 'epoch': 1.86}
62%|██████▏ | 7132/11526 [1:14:35<45:00, 1.63it/s] 62%|██████▏ | 7133/11526 [1:14:35<44:59, 1.63it/s] {'loss': 0.2614, 'grad_norm': 0.56028813123703, 'learning_rate': 3.8099260413876616e-06, 'epoch': 1.86}
62%|██████▏ | 7133/11526 [1:14:35<44:59, 1.63it/s] 62%|██████▏ | 7134/11526 [1:14:36<44:59, 1.63it/s] {'loss': 0.2, 'grad_norm': 0.5991607904434204, 'learning_rate': 3.8084553025428294e-06, 'epoch': 1.86}
62%|██████▏ | 7134/11526 [1:14:36<44:59, 1.63it/s] 62%|██████▏ | 7135/11526 [1:14:36<44:57, 1.63it/s] {'loss': 0.1902, 'grad_norm': 0.6047558188438416, 'learning_rate': 3.8069846729932575e-06, 'epoch': 1.86}
62%|██████▏ | 7135/11526 [1:14:36<44:57, 1.63it/s] 62%|██████▏ | 7136/11526 [1:14:37<44:56, 1.63it/s] {'loss': 0.2081, 'grad_norm': 0.5638161301612854, 'learning_rate': 3.8055141528738398e-06, 'epoch': 1.86}
62%|██████▏ | 7136/11526 [1:14:37<44:56, 1.63it/s] 62%|██████▏ | 7137/11526 [1:14:38<44:54, 1.63it/s] {'loss': 0.1604, 'grad_norm': 0.47809290885925293, 'learning_rate': 3.8040437423194576e-06, 'epoch': 1.86}
62%|██████▏ | 7137/11526 [1:14:38<44:54, 1.63it/s] 62%|██████▏ | 7138/11526 [1:14:38<44:54, 1.63it/s] {'loss': 0.1681, 'grad_norm': 0.49317771196365356, 'learning_rate': 3.8025734414649897e-06, 'epoch': 1.86}
62%|██████▏ | 7138/11526 [1:14:38<44:54, 1.63it/s] 62%|██████▏ | 7139/11526 [1:14:39<44:55, 1.63it/s] {'loss': 0.1993, 'grad_norm': 0.5558554530143738, 'learning_rate': 3.8011032504453e-06, 'epoch': 1.86}
62%|██████▏ | 7139/11526 [1:14:39<44:55, 1.63it/s] 62%|██████▏ | 7140/11526 [1:14:39<44:53, 1.63it/s] {'loss': 0.2001, 'grad_norm': 0.5174018144607544, 'learning_rate': 3.7996331693952408e-06, 'epoch': 1.86}
62%|██████▏ | 7140/11526 [1:14:40<44:53, 1.63it/s] 62%|██████▏ | 7141/11526 [1:14:40<44:52, 1.63it/s] {'loss': 0.1815, 'grad_norm': 0.4817260801792145, 'learning_rate': 3.7981631984496568e-06, 'epoch': 1.86}
62%|██████▏ | 7141/11526 [1:14:40<44:52, 1.63it/s] 62%|██████▏ | 7142/11526 [1:14:41<44:51, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.4911251366138458, 'learning_rate': 3.7966933377433814e-06, 'epoch': 1.86}
62%|██████▏ | 7142/11526 [1:14:41<44:51, 1.63it/s] 62%|██████▏ | 7143/11526 [1:14:41<44:50, 1.63it/s] {'loss': 0.2064, 'grad_norm': 0.6015594601631165, 'learning_rate': 3.795223587411241e-06, 'epoch': 1.86}
62%|██████▏ | 7143/11526 [1:14:41<44:50, 1.63it/s] 62%|██████▏ | 7144/11526 [1:14:42<44:49, 1.63it/s] {'loss': 0.1669, 'grad_norm': 0.4625721573829651, 'learning_rate': 3.7937539475880487e-06, 'epoch': 1.86}
62%|██████▏ | 7144/11526 [1:14:42<44:49, 1.63it/s] 62%|██████▏ | 7145/11526 [1:14:42<44:48, 1.63it/s] {'loss': 0.1771, 'grad_norm': 0.4816162884235382, 'learning_rate': 3.7922844184086063e-06, 'epoch': 1.86}
62%|██████▏ | 7145/11526 [1:14:43<44:48, 1.63it/s] 62%|██████▏ | 7146/11526 [1:14:43<44:48, 1.63it/s] {'loss': 0.1981, 'grad_norm': 0.5853617787361145, 'learning_rate': 3.7908150000077076e-06, 'epoch': 1.86}
62%|██████▏ | 7146/11526 [1:14:43<44:48, 1.63it/s] 62%|██████▏ | 7147/11526 [1:14:44<44:47, 1.63it/s] {'loss': 0.2294, 'grad_norm': 0.5913609862327576, 'learning_rate': 3.789345692520139e-06, 'epoch': 1.86}
62%|██████▏ | 7147/11526 [1:14:44<44:47, 1.63it/s] 62%|██████▏ | 7148/11526 [1:14:44<44:47, 1.63it/s] {'loss': 0.2086, 'grad_norm': 0.5423474311828613, 'learning_rate': 3.7878764960806723e-06, 'epoch': 1.86}
62%|██████▏ | 7148/11526 [1:14:44<44:47, 1.63it/s] 62%|██████▏ | 7149/11526 [1:14:45<44:48, 1.63it/s] {'loss': 0.1565, 'grad_norm': 0.47883322834968567, 'learning_rate': 3.786407410824069e-06, 'epoch': 1.86}
62%|██████▏ | 7149/11526 [1:14:45<44:48, 1.63it/s] 62%|██████▏ | 7150/11526 [1:14:46<44:46, 1.63it/s] {'loss': 0.1832, 'grad_norm': 0.4765053391456604, 'learning_rate': 3.7849384368850834e-06, 'epoch': 1.86}
62%|██████▏ | 7150/11526 [1:14:46<44:46, 1.63it/s] 62%|██████▏ | 7151/11526 [1:14:46<44:46, 1.63it/s] {'loss': 0.1833, 'grad_norm': 0.5909844040870667, 'learning_rate': 3.7834695743984564e-06, 'epoch': 1.86}
62%|██████▏ | 7151/11526 [1:14:46<44:46, 1.63it/s] 62%|██████▏ | 7152/11526 [1:14:47<44:46, 1.63it/s] {'loss': 0.1881, 'grad_norm': 0.5223832130432129, 'learning_rate': 3.7820008234989237e-06, 'epoch': 1.86}
62%|██████▏ | 7152/11526 [1:14:47<44:46, 1.63it/s] 62%|██████▏ | 7153/11526 [1:14:47<44:44, 1.63it/s] {'loss': 0.1846, 'grad_norm': 0.47483429312705994, 'learning_rate': 3.780532184321204e-06, 'epoch': 1.86}
62%|██████▏ | 7153/11526 [1:14:48<44:44, 1.63it/s] 62%|██████▏ | 7154/11526 [1:14:48<44:44, 1.63it/s] {'loss': 0.2013, 'grad_norm': 0.5986648797988892, 'learning_rate': 3.779063657000013e-06, 'epoch': 1.86}
62%|██████▏ | 7154/11526 [1:14:48<44:44, 1.63it/s] 62%|██████▏ | 7155/11526 [1:14:49<44:43, 1.63it/s] {'loss': 0.1986, 'grad_norm': 0.487713098526001, 'learning_rate': 3.7775952416700467e-06, 'epoch': 1.86}
62%|██████▏ | 7155/11526 [1:14:49<44:43, 1.63it/s] 62%|██████▏ | 7156/11526 [1:14:49<44:42, 1.63it/s] {'loss': 0.2358, 'grad_norm': 0.5387531518936157, 'learning_rate': 3.776126938466003e-06, 'epoch': 1.86}
62%|██████▏ | 7156/11526 [1:14:49<44:42, 1.63it/s] 62%|██████▏ | 7157/11526 [1:14:50<44:42, 1.63it/s] {'loss': 0.2108, 'grad_norm': 0.5818737149238586, 'learning_rate': 3.774658747522559e-06, 'epoch': 1.86}
62%|██████▏ | 7157/11526 [1:14:50<44:42, 1.63it/s] 62%|██████▏ | 7158/11526 [1:14:50<44:41, 1.63it/s] {'loss': 0.1936, 'grad_norm': 0.6024751663208008, 'learning_rate': 3.773190668974388e-06, 'epoch': 1.86}
62%|██████▏ | 7158/11526 [1:14:51<44:41, 1.63it/s] 62%|██████▏ | 7159/11526 [1:14:51<44:41, 1.63it/s] {'loss': 0.1838, 'grad_norm': 0.47015470266342163, 'learning_rate': 3.771722702956149e-06, 'epoch': 1.86}
62%|██████▏ | 7159/11526 [1:14:51<44:41, 1.63it/s] 62%|██████▏ | 7160/11526 [1:14:52<44:42, 1.63it/s] {'loss': 0.2638, 'grad_norm': 0.6407634615898132, 'learning_rate': 3.770254849602489e-06, 'epoch': 1.86}
62%|██████▏ | 7160/11526 [1:14:52<44:42, 1.63it/s] 62%|██████▏ | 7161/11526 [1:14:52<44:40, 1.63it/s] {'loss': 0.1288, 'grad_norm': 0.44790852069854736, 'learning_rate': 3.7687871090480534e-06, 'epoch': 1.86}
62%|██████▏ | 7161/11526 [1:14:52<44:40, 1.63it/s] 62%|██████▏ | 7162/11526 [1:14:53<44:39, 1.63it/s] {'loss': 0.2128, 'grad_norm': 0.6192987561225891, 'learning_rate': 3.7673194814274697e-06, 'epoch': 1.86}
62%|██████▏ | 7162/11526 [1:14:53<44:39, 1.63it/s] 62%|██████▏ | 7163/11526 [1:14:54<44:37, 1.63it/s] {'loss': 0.1798, 'grad_norm': 0.49587875604629517, 'learning_rate': 3.765851966875358e-06, 'epoch': 1.86}
62%|██████▏ | 7163/11526 [1:14:54<44:37, 1.63it/s] 62%|██████▏ | 7164/11526 [1:14:54<44:38, 1.63it/s] {'loss': 0.1764, 'grad_norm': 0.48650407791137695, 'learning_rate': 3.7643845655263227e-06, 'epoch': 1.86}
62%|██████▏ | 7164/11526 [1:14:54<44:38, 1.63it/s] 62%|██████▏ | 7165/11526 [1:14:55<44:39, 1.63it/s] {'loss': 0.1764, 'grad_norm': 0.5524424910545349, 'learning_rate': 3.762917277514967e-06, 'epoch': 1.86}
62%|██████▏ | 7165/11526 [1:14:55<44:39, 1.63it/s] 62%|██████▏ | 7166/11526 [1:14:55<44:39, 1.63it/s] {'loss': 0.2071, 'grad_norm': 0.5696313977241516, 'learning_rate': 3.761450102975878e-06, 'epoch': 1.87}
62%|██████▏ | 7166/11526 [1:14:56<44:39, 1.63it/s] 62%|██████▏ | 7167/11526 [1:14:56<44:37, 1.63it/s] {'loss': 0.2734, 'grad_norm': 0.7067801356315613, 'learning_rate': 3.759983042043633e-06, 'epoch': 1.87}
62%|██████▏ | 7167/11526 [1:14:56<44:37, 1.63it/s] 62%|██████▏ | 7168/11526 [1:14:57<44:40, 1.63it/s] {'loss': 0.2242, 'grad_norm': 0.6467667818069458, 'learning_rate': 3.7585160948527964e-06, 'epoch': 1.87}
62%|██████▏ | 7168/11526 [1:14:57<44:40, 1.63it/s] 62%|██████▏ | 7169/11526 [1:14:57<44:38, 1.63it/s] {'loss': 0.192, 'grad_norm': 0.5359105467796326, 'learning_rate': 3.7570492615379282e-06, 'epoch': 1.87}
62%|██████▏ | 7169/11526 [1:14:57<44:38, 1.63it/s] 62%|██████▏ | 7170/11526 [1:14:58<44:37, 1.63it/s] {'loss': 0.1497, 'grad_norm': 0.47164005041122437, 'learning_rate': 3.7555825422335758e-06, 'epoch': 1.87}
62%|██████▏ | 7170/11526 [1:14:58<44:37, 1.63it/s] 62%|██████▏ | 7171/11526 [1:14:58<44:36, 1.63it/s] {'loss': 0.2193, 'grad_norm': 0.6007590889930725, 'learning_rate': 3.7541159370742724e-06, 'epoch': 1.87}
62%|██████▏ | 7171/11526 [1:14:59<44:36, 1.63it/s] 62%|██████▏ | 7172/11526 [1:14:59<44:35, 1.63it/s] {'loss': 0.1896, 'grad_norm': 0.5183532238006592, 'learning_rate': 3.752649446194543e-06, 'epoch': 1.87}
62%|██████▏ | 7172/11526 [1:14:59<44:35, 1.63it/s] 62%|██████▏ | 7173/11526 [1:15:00<44:33, 1.63it/s] {'loss': 0.221, 'grad_norm': 0.5954309105873108, 'learning_rate': 3.7511830697289024e-06, 'epoch': 1.87}
62%|██████▏ | 7173/11526 [1:15:00<44:33, 1.63it/s] 62%|██████▏ | 7174/11526 [1:15:00<44:34, 1.63it/s] {'loss': 0.2201, 'grad_norm': 0.5832558870315552, 'learning_rate': 3.7497168078118586e-06, 'epoch': 1.87}
62%|██████▏ | 7174/11526 [1:15:00<44:34, 1.63it/s] 62%|██████▏ | 7175/11526 [1:15:01<44:33, 1.63it/s] {'loss': 0.242, 'grad_norm': 0.6290537714958191, 'learning_rate': 3.7482506605779013e-06, 'epoch': 1.87}
62%|██████▏ | 7175/11526 [1:15:01<44:33, 1.63it/s] 62%|██████▏ | 7176/11526 [1:15:02<44:32, 1.63it/s] {'loss': 0.1968, 'grad_norm': 0.5775386691093445, 'learning_rate': 3.7467846281615173e-06, 'epoch': 1.87}
62%|██████▏ | 7176/11526 [1:15:02<44:32, 1.63it/s] 62%|██████▏ | 7177/11526 [1:15:02<44:31, 1.63it/s] {'loss': 0.2021, 'grad_norm': 0.5466127395629883, 'learning_rate': 3.7453187106971757e-06, 'epoch': 1.87}
62%|██████▏ | 7177/11526 [1:15:02<44:31, 1.63it/s] 62%|██████▏ | 7178/11526 [1:15:03<44:29, 1.63it/s] {'loss': 0.22, 'grad_norm': 0.607427716255188, 'learning_rate': 3.743852908319342e-06, 'epoch': 1.87}
62%|██████▏ | 7178/11526 [1:15:03<44:29, 1.63it/s] 62%|██████▏ | 7179/11526 [1:15:03<44:30, 1.63it/s] {'loss': 0.1905, 'grad_norm': 0.5966889262199402, 'learning_rate': 3.742387221162466e-06, 'epoch': 1.87}
62%|██████▏ | 7179/11526 [1:15:03<44:30, 1.63it/s] 62%|██████▏ | 7180/11526 [1:15:04<44:29, 1.63it/s] {'loss': 0.1621, 'grad_norm': 0.5435986518859863, 'learning_rate': 3.740921649360991e-06, 'epoch': 1.87}
62%|██████▏ | 7180/11526 [1:15:04<44:29, 1.63it/s] 62%|██████▏ | 7181/11526 [1:15:05<44:28, 1.63it/s] {'loss': 0.2028, 'grad_norm': 0.5528957843780518, 'learning_rate': 3.739456193049346e-06, 'epoch': 1.87}
62%|██████▏ | 7181/11526 [1:15:05<44:28, 1.63it/s] 62%|██████▏ | 7182/11526 [1:15:05<44:28, 1.63it/s] {'loss': 0.2322, 'grad_norm': 0.585945725440979, 'learning_rate': 3.737990852361949e-06, 'epoch': 1.87}
62%|██████▏ | 7182/11526 [1:15:05<44:28, 1.63it/s] 62%|██████▏ | 7183/11526 [1:15:06<44:29, 1.63it/s] {'loss': 0.252, 'grad_norm': 0.8022609353065491, 'learning_rate': 3.736525627433213e-06, 'epoch': 1.87}
62%|██████▏ | 7183/11526 [1:15:06<44:29, 1.63it/s] 62%|██████▏ | 7184/11526 [1:15:06<44:31, 1.63it/s] {'loss': 0.2338, 'grad_norm': 0.5810028314590454, 'learning_rate': 3.7350605183975365e-06, 'epoch': 1.87}
62%|██████▏ | 7184/11526 [1:15:07<44:31, 1.63it/s] 62%|██████▏ | 7185/11526 [1:15:07<44:28, 1.63it/s] {'loss': 0.1997, 'grad_norm': 0.522351861000061, 'learning_rate': 3.733595525389306e-06, 'epoch': 1.87}
62%|██████▏ | 7185/11526 [1:15:07<44:28, 1.63it/s] 62%|██████▏ | 7186/11526 [1:15:08<44:27, 1.63it/s] {'loss': 0.2828, 'grad_norm': 0.7966023683547974, 'learning_rate': 3.732130648542897e-06, 'epoch': 1.87}
62%|██████▏ | 7186/11526 [1:15:08<44:27, 1.63it/s] 62%|██████▏ | 7187/11526 [1:15:08<44:24, 1.63it/s] {'loss': 0.176, 'grad_norm': 0.4762219488620758, 'learning_rate': 3.7306658879926804e-06, 'epoch': 1.87}
62%|██████▏ | 7187/11526 [1:15:08<44:24, 1.63it/s] 62%|██████▏ | 7188/11526 [1:15:09<44:25, 1.63it/s] {'loss': 0.2313, 'grad_norm': 0.5656667351722717, 'learning_rate': 3.729201243873012e-06, 'epoch': 1.87}
62%|██████▏ | 7188/11526 [1:15:09<44:25, 1.63it/s] 62%|██████▏ | 7189/11526 [1:15:10<44:25, 1.63it/s] {'loss': 0.182, 'grad_norm': 0.48630475997924805, 'learning_rate': 3.7277367163182355e-06, 'epoch': 1.87}
62%|██████▏ | 7189/11526 [1:15:10<44:25, 1.63it/s] 62%|██████▏ | 7190/11526 [1:15:10<44:23, 1.63it/s] {'loss': 0.256, 'grad_norm': 0.6117269992828369, 'learning_rate': 3.7262723054626854e-06, 'epoch': 1.87}
62%|██████▏ | 7190/11526 [1:15:10<44:23, 1.63it/s] 62%|██████▏ | 7191/11526 [1:15:11<44:22, 1.63it/s] {'loss': 0.1928, 'grad_norm': 0.5051115155220032, 'learning_rate': 3.7248080114406852e-06, 'epoch': 1.87}
62%|██████▏ | 7191/11526 [1:15:11<44:22, 1.63it/s] 62%|██████▏ | 7192/11526 [1:15:11<44:21, 1.63it/s] {'loss': 0.223, 'grad_norm': 0.5736192464828491, 'learning_rate': 3.7233438343865524e-06, 'epoch': 1.87}
62%|██████▏ | 7192/11526 [1:15:11<44:21, 1.63it/s] 62%|██████▏ | 7193/11526 [1:15:12<44:19, 1.63it/s] {'loss': 0.2013, 'grad_norm': 0.5527591109275818, 'learning_rate': 3.7218797744345868e-06, 'epoch': 1.87}
62%|██████▏ | 7193/11526 [1:15:12<44:19, 1.63it/s] 62%|██████▏ | 7194/11526 [1:15:13<44:22, 1.63it/s] {'loss': 0.2338, 'grad_norm': 0.6923272013664246, 'learning_rate': 3.7204158317190796e-06, 'epoch': 1.87}
62%|██████▏ | 7194/11526 [1:15:13<44:22, 1.63it/s] 62%|██████▏ | 7195/11526 [1:15:13<44:21, 1.63it/s] {'loss': 0.1766, 'grad_norm': 0.4684491455554962, 'learning_rate': 3.7189520063743115e-06, 'epoch': 1.87}
62%|██████▏ | 7195/11526 [1:15:13<44:21, 1.63it/s] 62%|██████▏ | 7196/11526 [1:15:14<44:18, 1.63it/s] {'loss': 0.2117, 'grad_norm': 0.6221287846565247, 'learning_rate': 3.7174882985345567e-06, 'epoch': 1.87}
62%|██████▏ | 7196/11526 [1:15:14<44:18, 1.63it/s] 62%|██████▏ | 7197/11526 [1:15:14<44:20, 1.63it/s] {'loss': 0.1917, 'grad_norm': 0.5931541323661804, 'learning_rate': 3.7160247083340723e-06, 'epoch': 1.87}
62%|██████▏ | 7197/11526 [1:15:15<44:20, 1.63it/s] 62%|██████▏ | 7198/11526 [1:15:15<44:18, 1.63it/s] {'loss': 0.1574, 'grad_norm': 0.45345455408096313, 'learning_rate': 3.714561235907106e-06, 'epoch': 1.87}
62%|██████▏ | 7198/11526 [1:15:15<44:18, 1.63it/s] 62%|██████▏ | 7199/11526 [1:15:16<44:17, 1.63it/s] {'loss': 0.2158, 'grad_norm': 0.5927488803863525, 'learning_rate': 3.7130978813878982e-06, 'epoch': 1.87}
62%|██████▏ | 7199/11526 [1:15:16<44:17, 1.63it/s] 62%|██████▏ | 7200/11526 [1:15:16<44:17, 1.63it/s] {'loss': 0.2842, 'grad_norm': 0.7955197095870972, 'learning_rate': 3.7116346449106723e-06, 'epoch': 1.87}
62%|██████▏ | 7200/11526 [1:15:16<44:17, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.34it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.18it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.91it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.78it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5226460099220276, 'eval_runtime': 1.9543, 'eval_samples_per_second': 102.338, 'eval_steps_per_second': 6.652, 'epoch': 1.87}
62%|██████▏ | 7200/11526 [1:15:18<44:17, 1.63it/s]
 62%|██████▏ | 7201/11526 [1:15:19<1:26:38, 1.20s/it] {'loss': 0.2268, 'grad_norm': 0.6207001209259033, 'learning_rate': 3.71017152660965e-06, 'epoch': 1.87}
62%|██████▏ | 7201/11526 [1:15:19<1:26:38, 1.20s/it] 62%|██████▏ | 7202/11526 [1:15:19<1:13:54, 1.03s/it] {'loss': 0.2307, 'grad_norm': 0.8360636830329895, 'learning_rate': 3.7087085266190324e-06, 'epoch': 1.87}
62%|██████▏ | 7202/11526 [1:15:20<1:13:54, 1.03s/it] 62%|██████▏ | 7203/11526 [1:15:20<1:04:59, 1.11it/s] {'loss': 0.1061, 'grad_norm': 0.3405294120311737, 'learning_rate': 3.7072456450730178e-06, 'epoch': 1.87}
62%|██████▏ | 7203/11526 [1:15:20<1:04:59, 1.11it/s] 63%|██████▎ | 7204/11526 [1:15:21<58:45, 1.23it/s] {'loss': 0.1918, 'grad_norm': 0.5559899806976318, 'learning_rate': 3.705782882105785e-06, 'epoch': 1.88}
63%|██████▎ | 7204/11526 [1:15:21<58:45, 1.23it/s] 63%|██████▎ | 7205/11526 [1:15:21<54:24, 1.32it/s] {'loss': 0.1507, 'grad_norm': 0.4223271310329437, 'learning_rate': 3.704320237851513e-06, 'epoch': 1.88}
63%|██████▎ | 7205/11526 [1:15:21<54:24, 1.32it/s] 63%|██████▎ | 7206/11526 [1:15:22<51:22, 1.40it/s] {'loss': 0.1451, 'grad_norm': 0.4021451473236084, 'learning_rate': 3.7028577124443595e-06, 'epoch': 1.88}
63%|██████▎ | 7206/11526 [1:15:22<51:22, 1.40it/s] 63%|██████▎ | 7207/11526 [1:15:23<49:15, 1.46it/s] {'loss': 0.1736, 'grad_norm': 0.5084652304649353, 'learning_rate': 3.701395306018478e-06, 'epoch': 1.88}
63%|██████▎ | 7207/11526 [1:15:23<49:15, 1.46it/s] 63%|██████▎ | 7208/11526 [1:15:23<47:45, 1.51it/s] {'loss': 0.1803, 'grad_norm': 0.5385011434555054, 'learning_rate': 3.6999330187080057e-06, 'epoch': 1.88}
63%|██████▎ | 7208/11526 [1:15:23<47:45, 1.51it/s] 63%|██████▎ | 7209/11526 [1:15:24<46:45, 1.54it/s] {'loss': 0.1776, 'grad_norm': 0.47193872928619385, 'learning_rate': 3.6984708506470757e-06, 'epoch': 1.88}
63%|██████▎ | 7209/11526 [1:15:24<46:45, 1.54it/s] 63%|██████▎ | 7210/11526 [1:15:24<45:58, 1.56it/s] {'loss': 0.1789, 'grad_norm': 0.530078649520874, 'learning_rate': 3.697008801969805e-06, 'epoch': 1.88}
63%|██████▎ | 7210/11526 [1:15:25<45:58, 1.56it/s] 63%|██████▎ | 7211/11526 [1:15:25<45:23, 1.58it/s] {'loss': 0.2048, 'grad_norm': 0.578322172164917, 'learning_rate': 3.6955468728103007e-06, 'epoch': 1.88}
63%|██████▎ | 7211/11526 [1:15:25<45:23, 1.58it/s] 63%|██████▎ | 7212/11526 [1:15:26<45:02, 1.60it/s] {'loss': 0.16, 'grad_norm': 0.514086127281189, 'learning_rate': 3.6940850633026593e-06, 'epoch': 1.88}
63%|██████▎ | 7212/11526 [1:15:26<45:02, 1.60it/s] 63%|██████▎ | 7213/11526 [1:15:26<44:46, 1.61it/s] {'loss': 0.2178, 'grad_norm': 0.5956560969352722, 'learning_rate': 3.6926233735809634e-06, 'epoch': 1.88}
63%|██████▎ | 7213/11526 [1:15:26<44:46, 1.61it/s] 63%|██████▎ | 7214/11526 [1:15:27<44:36, 1.61it/s] {'loss': 0.1875, 'grad_norm': 0.562708854675293, 'learning_rate': 3.6911618037792927e-06, 'epoch': 1.88}
63%|██████▎ | 7214/11526 [1:15:27<44:36, 1.61it/s] 63%|██████▎ | 7215/11526 [1:15:27<44:29, 1.62it/s] {'loss': 0.2368, 'grad_norm': 0.5653893947601318, 'learning_rate': 3.689700354031709e-06, 'epoch': 1.88}
63%|██████▎ | 7215/11526 [1:15:28<44:29, 1.62it/s] 63%|██████▎ | 7216/11526 [1:15:28<44:22, 1.62it/s] {'loss': 0.1956, 'grad_norm': 0.5678914785385132, 'learning_rate': 3.6882390244722643e-06, 'epoch': 1.88}
63%|██████▎ | 7216/11526 [1:15:28<44:22, 1.62it/s] 63%|██████▎ | 7217/11526 [1:15:29<44:16, 1.62it/s] {'loss': 0.1515, 'grad_norm': 0.4219602644443512, 'learning_rate': 3.6867778152349975e-06, 'epoch': 1.88}
63%|██████▎ | 7217/11526 [1:15:29<44:16, 1.62it/s] 63%|██████▎ | 7218/11526 [1:15:29<44:12, 1.62it/s] {'loss': 0.2034, 'grad_norm': 0.5902359485626221, 'learning_rate': 3.6853167264539423e-06, 'epoch': 1.88}
63%|██████▎ | 7218/11526 [1:15:29<44:12, 1.62it/s] 63%|██████▎ | 7219/11526 [1:15:30<44:12, 1.62it/s] {'loss': 0.2496, 'grad_norm': 0.5715998411178589, 'learning_rate': 3.6838557582631183e-06, 'epoch': 1.88}
63%|██████▎ | 7219/11526 [1:15:30<44:12, 1.62it/s] 63%|██████▎ | 7220/11526 [1:15:31<44:08, 1.63it/s] {'loss': 0.2344, 'grad_norm': 0.587290346622467, 'learning_rate': 3.682394910796533e-06, 'epoch': 1.88}
63%|██████▎ | 7220/11526 [1:15:31<44:08, 1.63it/s] 63%|██████▎ | 7221/11526 [1:15:31<44:07, 1.63it/s] {'loss': 0.175, 'grad_norm': 0.4719642400741577, 'learning_rate': 3.680934184188182e-06, 'epoch': 1.88}
63%|██████▎ | 7221/11526 [1:15:31<44:07, 1.63it/s] 63%|██████▎ | 7222/11526 [1:15:32<44:06, 1.63it/s] {'loss': 0.2466, 'grad_norm': 1.000137209892273, 'learning_rate': 3.6794735785720515e-06, 'epoch': 1.88}
63%|██████▎ | 7222/11526 [1:15:32<44:06, 1.63it/s] 63%|██████▎ | 7223/11526 [1:15:32<44:05, 1.63it/s] {'loss': 0.1775, 'grad_norm': 0.5446970462799072, 'learning_rate': 3.6780130940821207e-06, 'epoch': 1.88}
63%|██████▎ | 7223/11526 [1:15:32<44:05, 1.63it/s] 63%|██████▎ | 7224/11526 [1:15:33<44:08, 1.62it/s] {'loss': 0.1937, 'grad_norm': 0.5583338141441345, 'learning_rate': 3.67655273085235e-06, 'epoch': 1.88}
63%|██████▎ | 7224/11526 [1:15:33<44:08, 1.62it/s] 63%|██████▎ | 7225/11526 [1:15:34<44:06, 1.62it/s] {'loss': 0.1511, 'grad_norm': 0.5210551619529724, 'learning_rate': 3.675092489016693e-06, 'epoch': 1.88}
63%|██████▎ | 7225/11526 [1:15:34<44:06, 1.62it/s] 63%|██████▎ | 7226/11526 [1:15:34<44:04, 1.63it/s] {'loss': 0.175, 'grad_norm': 0.5004330277442932, 'learning_rate': 3.673632368709089e-06, 'epoch': 1.88}
63%|██████▎ | 7226/11526 [1:15:34<44:04, 1.63it/s] 63%|██████▎ | 7227/11526 [1:15:35<44:02, 1.63it/s] {'loss': 0.2008, 'grad_norm': 0.563385546207428, 'learning_rate': 3.672172370063474e-06, 'epoch': 1.88}
63%|██████▎ | 7227/11526 [1:15:35<44:02, 1.63it/s] 63%|██████▎ | 7228/11526 [1:15:35<44:01, 1.63it/s] {'loss': 0.1745, 'grad_norm': 0.570486307144165, 'learning_rate': 3.670712493213763e-06, 'epoch': 1.88}
63%|██████▎ | 7228/11526 [1:15:36<44:01, 1.63it/s] 63%|██████▎ | 7229/11526 [1:15:36<44:03, 1.63it/s] {'loss': 0.1577, 'grad_norm': 0.43776169419288635, 'learning_rate': 3.6692527382938654e-06, 'epoch': 1.88}
63%|██████▎ | 7229/11526 [1:15:36<44:03, 1.63it/s] 63%|██████▎ | 7230/11526 [1:15:37<44:00, 1.63it/s] {'loss': 0.235, 'grad_norm': 0.6597014665603638, 'learning_rate': 3.667793105437679e-06, 'epoch': 1.88}
63%|██████▎ | 7230/11526 [1:15:37<44:00, 1.63it/s] 63%|██████▎ | 7231/11526 [1:15:37<43:59, 1.63it/s] {'loss': 0.1641, 'grad_norm': 0.49898451566696167, 'learning_rate': 3.666333594779087e-06, 'epoch': 1.88}
63%|██████▎ | 7231/11526 [1:15:37<43:59, 1.63it/s] 63%|██████▎ | 7232/11526 [1:15:38<43:57, 1.63it/s] {'loss': 0.2666, 'grad_norm': 0.6535360217094421, 'learning_rate': 3.6648742064519675e-06, 'epoch': 1.88}
63%|██████▎ | 7232/11526 [1:15:38<43:57, 1.63it/s] 63%|██████▎ | 7233/11526 [1:15:39<43:58, 1.63it/s] {'loss': 0.1531, 'grad_norm': 0.4629923105239868, 'learning_rate': 3.663414940590182e-06, 'epoch': 1.88}
63%|██████▎ | 7233/11526 [1:15:39<43:58, 1.63it/s] 63%|██████▎ | 7234/11526 [1:15:39<44:00, 1.63it/s] {'loss': 0.192, 'grad_norm': 0.5394689440727234, 'learning_rate': 3.661955797327583e-06, 'epoch': 1.88}
63%|██████▎ | 7234/11526 [1:15:39<44:00, 1.63it/s] 63%|██████▎ | 7235/11526 [1:15:40<43:58, 1.63it/s] {'loss': 0.2702, 'grad_norm': 0.7442535161972046, 'learning_rate': 3.6604967767980106e-06, 'epoch': 1.88}
63%|██████▎ | 7235/11526 [1:15:40<43:58, 1.63it/s] 63%|██████▎ | 7236/11526 [1:15:40<43:57, 1.63it/s] {'loss': 0.1313, 'grad_norm': 0.422684907913208, 'learning_rate': 3.6590378791352965e-06, 'epoch': 1.88}
63%|██████▎ | 7236/11526 [1:15:40<43:57, 1.63it/s] 63%|██████▎ | 7237/11526 [1:15:41<43:57, 1.63it/s] {'loss': 0.1829, 'grad_norm': 0.5019363760948181, 'learning_rate': 3.657579104473258e-06, 'epoch': 1.88}
63%|██████▎ | 7237/11526 [1:15:41<43:57, 1.63it/s] 63%|██████▎ | 7238/11526 [1:15:42<43:54, 1.63it/s] {'loss': 0.18, 'grad_norm': 0.5339580178260803, 'learning_rate': 3.656120452945702e-06, 'epoch': 1.88}
63%|██████▎ | 7238/11526 [1:15:42<43:54, 1.63it/s] 63%|██████▎ | 7239/11526 [1:15:42<43:56, 1.63it/s] {'loss': 0.199, 'grad_norm': 0.55171799659729, 'learning_rate': 3.6546619246864235e-06, 'epoch': 1.88}
63%|██████▎ | 7239/11526 [1:15:42<43:56, 1.63it/s] 63%|██████▎ | 7240/11526 [1:15:43<43:56, 1.63it/s] {'loss': 0.2052, 'grad_norm': 0.5512077808380127, 'learning_rate': 3.653203519829208e-06, 'epoch': 1.88}
63%|██████▎ | 7240/11526 [1:15:43<43:56, 1.63it/s] 63%|██████▎ | 7241/11526 [1:15:43<43:54, 1.63it/s] {'loss': 0.2091, 'grad_norm': 0.6229820847511292, 'learning_rate': 3.651745238507829e-06, 'epoch': 1.88}
63%|██████▎ | 7241/11526 [1:15:44<43:54, 1.63it/s] 63%|██████▎ | 7242/11526 [1:15:44<43:54, 1.63it/s] {'loss': 0.1479, 'grad_norm': 0.45499855279922485, 'learning_rate': 3.6502870808560485e-06, 'epoch': 1.88}
63%|██████▎ | 7242/11526 [1:15:44<43:54, 1.63it/s] 63%|██████▎ | 7243/11526 [1:15:45<43:53, 1.63it/s] {'loss': 0.3009, 'grad_norm': 0.6252976059913635, 'learning_rate': 3.648829047007616e-06, 'epoch': 1.89}
63%|██████▎ | 7243/11526 [1:15:45<43:53, 1.63it/s] 63%|██████▎ | 7244/11526 [1:15:45<43:53, 1.63it/s] {'loss': 0.1971, 'grad_norm': 0.5121604800224304, 'learning_rate': 3.6473711370962706e-06, 'epoch': 1.89}
63%|██████▎ | 7244/11526 [1:15:45<43:53, 1.63it/s] 63%|██████▎ | 7245/11526 [1:15:46<43:51, 1.63it/s] {'loss': 0.257, 'grad_norm': 0.5986030697822571, 'learning_rate': 3.6459133512557408e-06, 'epoch': 1.89}
63%|██████▎ | 7245/11526 [1:15:46<43:51, 1.63it/s] 63%|██████▎ | 7246/11526 [1:15:47<43:50, 1.63it/s] {'loss': 0.1718, 'grad_norm': 0.5473315715789795, 'learning_rate': 3.644455689619744e-06, 'epoch': 1.89}
63%|██████▎ | 7246/11526 [1:15:47<43:50, 1.63it/s] 63%|██████▎ | 7247/11526 [1:15:47<43:48, 1.63it/s] {'loss': 0.179, 'grad_norm': 0.5053600668907166, 'learning_rate': 3.6429981523219832e-06, 'epoch': 1.89}
63%|██████▎ | 7247/11526 [1:15:47<43:48, 1.63it/s] 63%|██████▎ | 7248/11526 [1:15:48<43:48, 1.63it/s] {'loss': 0.2055, 'grad_norm': 0.6183459758758545, 'learning_rate': 3.6415407394961536e-06, 'epoch': 1.89}
63%|██████▎ | 7248/11526 [1:15:48<43:48, 1.63it/s] 63%|██████▎ | 7249/11526 [1:15:48<43:49, 1.63it/s] {'loss': 0.1462, 'grad_norm': 0.4770529866218567, 'learning_rate': 3.6400834512759353e-06, 'epoch': 1.89}
63%|██████▎ | 7249/11526 [1:15:48<43:49, 1.63it/s] 63%|██████▎ | 7250/11526 [1:15:49<43:49, 1.63it/s] {'loss': 0.2737, 'grad_norm': 0.6753323078155518, 'learning_rate': 3.6386262877950018e-06, 'epoch': 1.89}
63%|██████▎ | 7250/11526 [1:15:49<43:49, 1.63it/s] 63%|██████▎ | 7251/11526 [1:15:50<43:49, 1.63it/s] {'loss': 0.1643, 'grad_norm': 0.45065537095069885, 'learning_rate': 3.6371692491870103e-06, 'epoch': 1.89}
63%|██████▎ | 7251/11526 [1:15:50<43:49, 1.63it/s] 63%|██████▎ | 7252/11526 [1:15:50<43:47, 1.63it/s] {'loss': 0.2181, 'grad_norm': 0.5785152912139893, 'learning_rate': 3.6357123355856104e-06, 'epoch': 1.89}
63%|██████▎ | 7252/11526 [1:15:50<43:47, 1.63it/s] 63%|██████▎ | 7253/11526 [1:15:51<43:45, 1.63it/s] {'loss': 0.1442, 'grad_norm': 0.46462392807006836, 'learning_rate': 3.634255547124436e-06, 'epoch': 1.89}
63%|██████▎ | 7253/11526 [1:15:51<43:45, 1.63it/s] 63%|██████▎ | 7254/11526 [1:15:51<43:48, 1.62it/s] {'loss': 0.2446, 'grad_norm': 0.6379711627960205, 'learning_rate': 3.632798883937114e-06, 'epoch': 1.89}
63%|██████▎ | 7254/11526 [1:15:52<43:48, 1.62it/s] 63%|██████▎ | 7255/11526 [1:15:52<43:46, 1.63it/s] {'loss': 0.1787, 'grad_norm': 0.5191456079483032, 'learning_rate': 3.6313423461572584e-06, 'epoch': 1.89}
63%|██████▎ | 7255/11526 [1:15:52<43:46, 1.63it/s] 63%|██████▎ | 7256/11526 [1:15:53<43:45, 1.63it/s] {'loss': 0.1932, 'grad_norm': 0.49874550104141235, 'learning_rate': 3.62988593391847e-06, 'epoch': 1.89}
63%|██████▎ | 7256/11526 [1:15:53<43:45, 1.63it/s] 63%|██████▎ | 7257/11526 [1:15:53<43:42, 1.63it/s] {'loss': 0.1801, 'grad_norm': 0.4940688908100128, 'learning_rate': 3.628429647354338e-06, 'epoch': 1.89}
63%|██████▎ | 7257/11526 [1:15:53<43:42, 1.63it/s] 63%|██████▎ | 7258/11526 [1:15:54<43:42, 1.63it/s] {'loss': 0.1497, 'grad_norm': 0.4371122717857361, 'learning_rate': 3.626973486598444e-06, 'epoch': 1.89}
63%|██████▎ | 7258/11526 [1:15:54<43:42, 1.63it/s] 63%|██████▎ | 7259/11526 [1:15:54<43:43, 1.63it/s] {'loss': 0.2138, 'grad_norm': 0.6568179130554199, 'learning_rate': 3.6255174517843535e-06, 'epoch': 1.89}
63%|██████▎ | 7259/11526 [1:15:55<43:43, 1.63it/s] 63%|██████▎ | 7260/11526 [1:15:55<43:42, 1.63it/s] {'loss': 0.1607, 'grad_norm': 0.5019732713699341, 'learning_rate': 3.624061543045624e-06, 'epoch': 1.89}
63%|██████▎ | 7260/11526 [1:15:55<43:42, 1.63it/s] 63%|██████▎ | 7261/11526 [1:15:56<43:41, 1.63it/s] {'loss': 0.1372, 'grad_norm': 0.42245885729789734, 'learning_rate': 3.622605760515798e-06, 'epoch': 1.89}
63%|██████▎ | 7261/11526 [1:15:56<43:41, 1.63it/s] 63%|██████▎ | 7262/11526 [1:15:56<43:39, 1.63it/s] {'loss': 0.2365, 'grad_norm': 0.6035646796226501, 'learning_rate': 3.621150104328407e-06, 'epoch': 1.89}
63%|██████▎ | 7262/11526 [1:15:56<43:39, 1.63it/s] 63%|██████▎ | 7263/11526 [1:15:57<43:39, 1.63it/s] {'loss': 0.2005, 'grad_norm': 0.5489895343780518, 'learning_rate': 3.6196945746169744e-06, 'epoch': 1.89}
63%|██████▎ | 7263/11526 [1:15:57<43:39, 1.63it/s] 63%|██████▎ | 7264/11526 [1:15:58<43:41, 1.63it/s] {'loss': 0.2257, 'grad_norm': 0.6503371596336365, 'learning_rate': 3.61823917151501e-06, 'epoch': 1.89}
63%|██████▎ | 7264/11526 [1:15:58<43:41, 1.63it/s] 63%|██████▎ | 7265/11526 [1:15:58<43:41, 1.63it/s] {'loss': 0.1898, 'grad_norm': 0.6193427443504333, 'learning_rate': 3.6167838951560116e-06, 'epoch': 1.89}
63%|██████▎ | 7265/11526 [1:15:58<43:41, 1.63it/s] 63%|██████▎ | 7266/11526 [1:15:59<43:39, 1.63it/s] {'loss': 0.2086, 'grad_norm': 0.5584445595741272, 'learning_rate': 3.6153287456734614e-06, 'epoch': 1.89}
63%|██████▎ | 7266/11526 [1:15:59<43:39, 1.63it/s] 63%|██████▎ | 7267/11526 [1:15:59<43:38, 1.63it/s] {'loss': 0.1841, 'grad_norm': 0.5143095254898071, 'learning_rate': 3.6138737232008386e-06, 'epoch': 1.89}
63%|██████▎ | 7267/11526 [1:16:00<43:38, 1.63it/s] 63%|██████▎ | 7268/11526 [1:16:00<43:36, 1.63it/s] {'loss': 0.2073, 'grad_norm': 0.5768393278121948, 'learning_rate': 3.6124188278716055e-06, 'epoch': 1.89}
63%|██████▎ | 7268/11526 [1:16:00<43:36, 1.63it/s] 63%|██████▎ | 7269/11526 [1:16:01<43:49, 1.62it/s] {'loss': 0.1885, 'grad_norm': 0.5071187019348145, 'learning_rate': 3.610964059819213e-06, 'epoch': 1.89}
63%|██████▎ | 7269/11526 [1:16:01<43:49, 1.62it/s] 63%|██████▎ | 7270/11526 [1:16:01<43:44, 1.62it/s] {'loss': 0.215, 'grad_norm': 0.4812282919883728, 'learning_rate': 3.6095094191770984e-06, 'epoch': 1.89}
63%|██████▎ | 7270/11526 [1:16:01<43:44, 1.62it/s] 63%|██████▎ | 7271/11526 [1:16:02<43:40, 1.62it/s] {'loss': 0.1712, 'grad_norm': 0.496812641620636, 'learning_rate': 3.6080549060786914e-06, 'epoch': 1.89}
63%|██████▎ | 7271/11526 [1:16:02<43:40, 1.62it/s] 63%|██████▎ | 7272/11526 [1:16:02<43:37, 1.63it/s] {'loss': 0.1745, 'grad_norm': 0.544966995716095, 'learning_rate': 3.6066005206574095e-06, 'epoch': 1.89}
63%|██████▎ | 7272/11526 [1:16:03<43:37, 1.63it/s] 63%|██████▎ | 7273/11526 [1:16:03<43:35, 1.63it/s] {'loss': 0.2221, 'grad_norm': 0.5776386857032776, 'learning_rate': 3.605146263046656e-06, 'epoch': 1.89}
63%|██████▎ | 7273/11526 [1:16:03<43:35, 1.63it/s] 63%|██████▎ | 7274/11526 [1:16:04<43:39, 1.62it/s] {'loss': 0.213, 'grad_norm': 0.6511527299880981, 'learning_rate': 3.6036921333798236e-06, 'epoch': 1.89}
63%|██████▎ | 7274/11526 [1:16:04<43:39, 1.62it/s] 63%|██████▎ | 7275/11526 [1:16:04<43:35, 1.63it/s] {'loss': 0.2434, 'grad_norm': 0.5948319435119629, 'learning_rate': 3.602238131790291e-06, 'epoch': 1.89}
63%|██████▎ | 7275/11526 [1:16:04<43:35, 1.63it/s] 63%|██████▎ | 7276/11526 [1:16:05<43:32, 1.63it/s] {'loss': 0.1836, 'grad_norm': 0.5603308081626892, 'learning_rate': 3.600784258411433e-06, 'epoch': 1.89}
63%|██████▎ | 7276/11526 [1:16:05<43:32, 1.63it/s] 63%|██████▎ | 7277/11526 [1:16:06<43:31, 1.63it/s] {'loss': 0.2138, 'grad_norm': 0.5678926110267639, 'learning_rate': 3.599330513376603e-06, 'epoch': 1.89}
63%|██████▎ | 7277/11526 [1:16:06<43:31, 1.63it/s] 63%|██████▎ | 7278/11526 [1:16:06<43:30, 1.63it/s] {'loss': 0.1358, 'grad_norm': 0.42057618498802185, 'learning_rate': 3.597876896819148e-06, 'epoch': 1.89}
63%|██████▎ | 7278/11526 [1:16:06<43:30, 1.63it/s] 63%|██████▎ | 7279/11526 [1:16:07<43:30, 1.63it/s] {'loss': 0.2142, 'grad_norm': 0.572735071182251, 'learning_rate': 3.5964234088724014e-06, 'epoch': 1.89}
63%|██████▎ | 7279/11526 [1:16:07<43:30, 1.63it/s] 63%|██████▎ | 7280/11526 [1:16:07<43:29, 1.63it/s] {'loss': 0.1849, 'grad_norm': 0.4867153763771057, 'learning_rate': 3.594970049669685e-06, 'epoch': 1.89}
63%|██████▎ | 7280/11526 [1:16:08<43:29, 1.63it/s] 63%|██████▎ | 7281/11526 [1:16:08<43:26, 1.63it/s] {'loss': 0.1967, 'grad_norm': 0.5769285559654236, 'learning_rate': 3.5935168193443114e-06, 'epoch': 1.9}
63%|██████▎ | 7281/11526 [1:16:08<43:26, 1.63it/s] 63%|██████▎ | 7282/11526 [1:16:09<43:26, 1.63it/s] {'loss': 0.152, 'grad_norm': 0.4810367524623871, 'learning_rate': 3.592063718029577e-06, 'epoch': 1.9}
63%|██████▎ | 7282/11526 [1:16:09<43:26, 1.63it/s] 63%|██████▎ | 7283/11526 [1:16:09<43:25, 1.63it/s] {'loss': 0.1846, 'grad_norm': 0.5422468185424805, 'learning_rate': 3.5906107458587697e-06, 'epoch': 1.9}
63%|██████▎ | 7283/11526 [1:16:09<43:25, 1.63it/s] 63%|██████▎ | 7284/11526 [1:16:10<43:28, 1.63it/s] {'loss': 0.2461, 'grad_norm': 0.5548682808876038, 'learning_rate': 3.5891579029651617e-06, 'epoch': 1.9}
63%|██████▎ | 7284/11526 [1:16:10<43:28, 1.63it/s] 63%|██████▎ | 7285/11526 [1:16:10<43:27, 1.63it/s] {'loss': 0.1867, 'grad_norm': 0.5359472632408142, 'learning_rate': 3.5877051894820207e-06, 'epoch': 1.9}
63%|██████▎ | 7285/11526 [1:16:11<43:27, 1.63it/s] 63%|██████▎ | 7286/11526 [1:16:11<43:26, 1.63it/s] {'loss': 0.2181, 'grad_norm': 0.5831964015960693, 'learning_rate': 3.5862526055425943e-06, 'epoch': 1.9}
63%|██████▎ | 7286/11526 [1:16:11<43:26, 1.63it/s] 63%|██████▎ | 7287/11526 [1:16:12<43:24, 1.63it/s] {'loss': 0.2399, 'grad_norm': 0.7305446267127991, 'learning_rate': 3.5848001512801237e-06, 'epoch': 1.9}
63%|██████▎ | 7287/11526 [1:16:12<43:24, 1.63it/s] 63%|██████▎ | 7288/11526 [1:16:12<43:22, 1.63it/s] {'loss': 0.1407, 'grad_norm': 0.4344012439250946, 'learning_rate': 3.5833478268278355e-06, 'epoch': 1.9}
63%|██████▎ | 7288/11526 [1:16:12<43:22, 1.63it/s] 63%|██████▎ | 7289/11526 [1:16:13<43:33, 1.62it/s] {'loss': 0.1973, 'grad_norm': 0.5590127110481262, 'learning_rate': 3.5818956323189437e-06, 'epoch': 1.9}
63%|██████▎ | 7289/11526 [1:16:13<43:33, 1.62it/s] 63%|██████▎ | 7290/11526 [1:16:14<43:27, 1.62it/s] {'loss': 0.204, 'grad_norm': 0.6096685528755188, 'learning_rate': 3.5804435678866536e-06, 'epoch': 1.9}
63%|██████▎ | 7290/11526 [1:16:14<43:27, 1.62it/s] 63%|██████▎ | 7291/11526 [1:16:14<43:25, 1.63it/s] {'loss': 0.285, 'grad_norm': 0.7525134086608887, 'learning_rate': 3.578991633664157e-06, 'epoch': 1.9}
63%|██████▎ | 7291/11526 [1:16:14<43:25, 1.63it/s] 63%|██████▎ | 7292/11526 [1:16:15<43:25, 1.63it/s] {'loss': 0.2174, 'grad_norm': 0.6427628993988037, 'learning_rate': 3.5775398297846333e-06, 'epoch': 1.9}
63%|██████▎ | 7292/11526 [1:16:15<43:25, 1.63it/s] 63%|██████▎ | 7293/11526 [1:16:15<43:24, 1.63it/s] {'loss': 0.1646, 'grad_norm': 0.48857226967811584, 'learning_rate': 3.5760881563812476e-06, 'epoch': 1.9}
63%|██████▎ | 7293/11526 [1:16:16<43:24, 1.63it/s] 63%|██████▎ | 7294/11526 [1:16:16<43:25, 1.62it/s] {'loss': 0.2022, 'grad_norm': 0.5492984652519226, 'learning_rate': 3.574636613587159e-06, 'epoch': 1.9}
63%|██████▎ | 7294/11526 [1:16:16<43:25, 1.62it/s] 63%|██████▎ | 7295/11526 [1:16:17<43:22, 1.63it/s] {'loss': 0.1766, 'grad_norm': 0.4740538001060486, 'learning_rate': 3.573185201535511e-06, 'epoch': 1.9}
63%|██████▎ | 7295/11526 [1:16:17<43:22, 1.63it/s] 63%|██████▎ | 7296/11526 [1:16:17<43:19, 1.63it/s] {'loss': 0.1958, 'grad_norm': 0.5854569673538208, 'learning_rate': 3.571733920359435e-06, 'epoch': 1.9}
63%|██████▎ | 7296/11526 [1:16:17<43:19, 1.63it/s] 63%|██████▎ | 7297/11526 [1:16:18<43:18, 1.63it/s] {'loss': 0.1328, 'grad_norm': 0.43096229434013367, 'learning_rate': 3.5702827701920494e-06, 'epoch': 1.9}
63%|██████▎ | 7297/11526 [1:16:18<43:18, 1.63it/s] 63%|██████▎ | 7298/11526 [1:16:18<43:19, 1.63it/s] {'loss': 0.2038, 'grad_norm': 0.5609747767448425, 'learning_rate': 3.5688317511664616e-06, 'epoch': 1.9}
63%|██████▎ | 7298/11526 [1:16:19<43:19, 1.63it/s] 63%|██████▎ | 7299/11526 [1:16:19<43:20, 1.63it/s] {'loss': 0.1814, 'grad_norm': 0.5699211359024048, 'learning_rate': 3.5673808634157704e-06, 'epoch': 1.9}
63%|██████▎ | 7299/11526 [1:16:19<43:20, 1.63it/s] 63%|██████▎ | 7300/11526 [1:16:20<43:18, 1.63it/s] {'loss': 0.1773, 'grad_norm': 0.4651049077510834, 'learning_rate': 3.565930107073058e-06, 'epoch': 1.9}
63%|██████▎ | 7300/11526 [1:16:20<43:18, 1.63it/s] 63%|██████▎ | 7301/11526 [1:16:20<43:18, 1.63it/s] {'loss': 0.2587, 'grad_norm': 0.5495930314064026, 'learning_rate': 3.5644794822713947e-06, 'epoch': 1.9}
63%|██████▎ | 7301/11526 [1:16:20<43:18, 1.63it/s] 63%|██████▎ | 7302/11526 [1:16:21<43:16, 1.63it/s] {'loss': 0.1733, 'grad_norm': 0.4797864258289337, 'learning_rate': 3.56302898914384e-06, 'epoch': 1.9}
63%|██████▎ | 7302/11526 [1:16:21<43:16, 1.63it/s] 63%|██████▎ | 7303/11526 [1:16:22<43:14, 1.63it/s] {'loss': 0.2246, 'grad_norm': 0.6003931164741516, 'learning_rate': 3.5615786278234443e-06, 'epoch': 1.9}
63%|██████▎ | 7303/11526 [1:16:22<43:14, 1.63it/s] 63%|██████▎ | 7304/11526 [1:16:22<43:16, 1.63it/s] {'loss': 0.1694, 'grad_norm': 0.4619912803173065, 'learning_rate': 3.5601283984432417e-06, 'epoch': 1.9}
63%|██████▎ | 7304/11526 [1:16:22<43:16, 1.63it/s] 63%|██████▎ | 7305/11526 [1:16:23<43:13, 1.63it/s] {'loss': 0.1846, 'grad_norm': 0.5366246700286865, 'learning_rate': 3.5586783011362536e-06, 'epoch': 1.9}
63%|██████▎ | 7305/11526 [1:16:23<43:13, 1.63it/s] 63%|██████▎ | 7306/11526 [1:16:23<43:10, 1.63it/s] {'loss': 0.184, 'grad_norm': 0.5848516821861267, 'learning_rate': 3.557228336035491e-06, 'epoch': 1.9}
63%|██████▎ | 7306/11526 [1:16:24<43:10, 1.63it/s] 63%|██████▎ | 7307/11526 [1:16:24<43:10, 1.63it/s] {'loss': 0.1728, 'grad_norm': 0.5173238515853882, 'learning_rate': 3.5557785032739567e-06, 'epoch': 1.9}
63%|██████▎ | 7307/11526 [1:16:24<43:10, 1.63it/s] 63%|██████▎ | 7308/11526 [1:16:25<43:10, 1.63it/s] {'loss': 0.2001, 'grad_norm': 0.5781916379928589, 'learning_rate': 3.5543288029846357e-06, 'epoch': 1.9}
63%|██████▎ | 7308/11526 [1:16:25<43:10, 1.63it/s] 63%|██████▎ | 7309/11526 [1:16:25<43:08, 1.63it/s] {'loss': 0.2086, 'grad_norm': 0.5533777475357056, 'learning_rate': 3.5528792353005015e-06, 'epoch': 1.9}
63%|██████▎ | 7309/11526 [1:16:25<43:08, 1.63it/s] 63%|██████▎ | 7310/11526 [1:16:26<43:08, 1.63it/s] {'loss': 0.2686, 'grad_norm': 0.5952131152153015, 'learning_rate': 3.5514298003545185e-06, 'epoch': 1.9}
63%|██████▎ | 7310/11526 [1:16:26<43:08, 1.63it/s] 63%|██████▎ | 7311/11526 [1:16:26<43:08, 1.63it/s] {'loss': 0.2043, 'grad_norm': 0.5788241624832153, 'learning_rate': 3.5499804982796336e-06, 'epoch': 1.9}
63%|██████▎ | 7311/11526 [1:16:27<43:08, 1.63it/s] 63%|██████▎ | 7312/11526 [1:16:27<43:07, 1.63it/s] {'loss': 0.2161, 'grad_norm': 0.5821784138679504, 'learning_rate': 3.5485313292087903e-06, 'epoch': 1.9}
63%|██████▎ | 7312/11526 [1:16:27<43:07, 1.63it/s] 63%|██████▎ | 7313/11526 [1:16:28<43:06, 1.63it/s] {'loss': 0.18, 'grad_norm': 0.5491522550582886, 'learning_rate': 3.5470822932749104e-06, 'epoch': 1.9}
63%|██████▎ | 7313/11526 [1:16:28<43:06, 1.63it/s] 63%|██████▎ | 7314/11526 [1:16:28<43:05, 1.63it/s] {'loss': 0.1603, 'grad_norm': 0.4836592674255371, 'learning_rate': 3.5456333906109113e-06, 'epoch': 1.9}
63%|██████▎ | 7314/11526 [1:16:28<43:05, 1.63it/s] 63%|██████▎ | 7315/11526 [1:16:29<43:04, 1.63it/s] {'loss': 0.2141, 'grad_norm': 0.6388636231422424, 'learning_rate': 3.54418462134969e-06, 'epoch': 1.9}
63%|██████▎ | 7315/11526 [1:16:29<43:04, 1.63it/s] 63%|██████▎ | 7316/11526 [1:16:30<43:04, 1.63it/s] {'loss': 0.1784, 'grad_norm': 0.5744719505310059, 'learning_rate': 3.5427359856241404e-06, 'epoch': 1.9}
63%|██████▎ | 7316/11526 [1:16:30<43:04, 1.63it/s] 63%|██████▎ | 7317/11526 [1:16:30<43:04, 1.63it/s] {'loss': 0.1876, 'grad_norm': 0.5654053688049316, 'learning_rate': 3.541287483567137e-06, 'epoch': 1.9}
63%|██████▎ | 7317/11526 [1:16:30<43:04, 1.63it/s] 63%|██████▎ | 7318/11526 [1:16:31<43:02, 1.63it/s] {'loss': 0.2289, 'grad_norm': 0.5577751398086548, 'learning_rate': 3.5398391153115464e-06, 'epoch': 1.9}
63%|██████▎ | 7318/11526 [1:16:31<43:02, 1.63it/s] 63%|██████▎ | 7319/11526 [1:16:31<43:02, 1.63it/s] {'loss': 0.1903, 'grad_norm': 0.4916341006755829, 'learning_rate': 3.538390880990219e-06, 'epoch': 1.9}
63%|██████▎ | 7319/11526 [1:16:32<43:02, 1.63it/s] 64%|██████▎ | 7320/11526 [1:16:32<43:01, 1.63it/s] {'loss': 0.2286, 'grad_norm': 0.5870441794395447, 'learning_rate': 3.5369427807359955e-06, 'epoch': 1.91}
64%|██████▎ | 7320/11526 [1:16:32<43:01, 1.63it/s] 64%|██████▎ | 7321/11526 [1:16:33<43:02, 1.63it/s] {'loss': 0.161, 'grad_norm': 0.45855963230133057, 'learning_rate': 3.535494814681706e-06, 'epoch': 1.91}
64%|██████▎ | 7321/11526 [1:16:33<43:02, 1.63it/s] 64%|██████▎ | 7322/11526 [1:16:33<43:01, 1.63it/s] {'loss': 0.234, 'grad_norm': 0.5348453521728516, 'learning_rate': 3.5340469829601647e-06, 'epoch': 1.91}
64%|██████▎ | 7322/11526 [1:16:33<43:01, 1.63it/s] 64%|██████▎ | 7323/11526 [1:16:34<43:00, 1.63it/s] {'loss': 0.1401, 'grad_norm': 0.4355834126472473, 'learning_rate': 3.5325992857041746e-06, 'epoch': 1.91}
64%|██████▎ | 7323/11526 [1:16:34<43:00, 1.63it/s] 64%|██████▎ | 7324/11526 [1:16:34<42:58, 1.63it/s] {'loss': 0.1804, 'grad_norm': 0.49435752630233765, 'learning_rate': 3.5311517230465255e-06, 'epoch': 1.91}
64%|██████▎ | 7324/11526 [1:16:35<42:58, 1.63it/s] 64%|██████▎ | 7325/11526 [1:16:35<42:59, 1.63it/s] {'loss': 0.285, 'grad_norm': 0.7039461135864258, 'learning_rate': 3.52970429512e-06, 'epoch': 1.91}
64%|██████▎ | 7325/11526 [1:16:35<42:59, 1.63it/s] 64%|██████▎ | 7326/11526 [1:16:36<42:59, 1.63it/s] {'loss': 0.1825, 'grad_norm': 0.5481545925140381, 'learning_rate': 3.5282570020573626e-06, 'epoch': 1.91}
64%|██████▎ | 7326/11526 [1:16:36<42:59, 1.63it/s] 64%|██████▎ | 7327/11526 [1:16:36<42:58, 1.63it/s] {'loss': 0.2095, 'grad_norm': 0.49936023354530334, 'learning_rate': 3.5268098439913657e-06, 'epoch': 1.91}
64%|██████▎ | 7327/11526 [1:16:36<42:58, 1.63it/s] 64%|██████▎ | 7328/11526 [1:16:37<42:57, 1.63it/s] {'loss': 0.1878, 'grad_norm': 0.476435124874115, 'learning_rate': 3.525362821054753e-06, 'epoch': 1.91}
64%|██████▎ | 7328/11526 [1:16:37<42:57, 1.63it/s] 64%|██████▎ | 7329/11526 [1:16:38<42:57, 1.63it/s] {'loss': 0.2146, 'grad_norm': 0.6905679106712341, 'learning_rate': 3.523915933380251e-06, 'epoch': 1.91}
64%|██████▎ | 7329/11526 [1:16:38<42:57, 1.63it/s] 64%|██████▎ | 7330/11526 [1:16:38<42:55, 1.63it/s] {'loss': 0.1865, 'grad_norm': 0.5464801788330078, 'learning_rate': 3.5224691811005797e-06, 'epoch': 1.91}
64%|██████▎ | 7330/11526 [1:16:38<42:55, 1.63it/s] 64%|██████▎ | 7331/11526 [1:16:39<42:55, 1.63it/s] {'loss': 0.1956, 'grad_norm': 0.53854900598526, 'learning_rate': 3.521022564348441e-06, 'epoch': 1.91}
64%|██████▎ | 7331/11526 [1:16:39<42:55, 1.63it/s] 64%|██████▎ | 7332/11526 [1:16:39<42:54, 1.63it/s] {'loss': 0.2646, 'grad_norm': 0.7601766586303711, 'learning_rate': 3.5195760832565285e-06, 'epoch': 1.91}
64%|██████▎ | 7332/11526 [1:16:39<42:54, 1.63it/s] 64%|██████▎ | 7333/11526 [1:16:40<42:54, 1.63it/s] {'loss': 0.18, 'grad_norm': 0.5089917182922363, 'learning_rate': 3.518129737957519e-06, 'epoch': 1.91}
64%|██████▎ | 7333/11526 [1:16:40<42:54, 1.63it/s] 64%|██████▎ | 7334/11526 [1:16:41<42:55, 1.63it/s] {'loss': 0.1797, 'grad_norm': 0.4403330981731415, 'learning_rate': 3.516683528584084e-06, 'epoch': 1.91}
64%|██████▎ | 7334/11526 [1:16:41<42:55, 1.63it/s] 64%|██████▎ | 7335/11526 [1:16:41<42:56, 1.63it/s] {'loss': 0.1983, 'grad_norm': 0.5540303587913513, 'learning_rate': 3.515237455268874e-06, 'epoch': 1.91}
64%|██████▎ | 7335/11526 [1:16:41<42:56, 1.63it/s] 64%|██████▎ | 7336/11526 [1:16:42<42:53, 1.63it/s] {'loss': 0.254, 'grad_norm': 0.6016185283660889, 'learning_rate': 3.5137915181445335e-06, 'epoch': 1.91}
64%|██████▎ | 7336/11526 [1:16:42<42:53, 1.63it/s] 64%|██████▎ | 7337/11526 [1:16:42<42:53, 1.63it/s] {'loss': 0.2033, 'grad_norm': 0.5480796694755554, 'learning_rate': 3.5123457173436916e-06, 'epoch': 1.91}
64%|██████▎ | 7337/11526 [1:16:43<42:53, 1.63it/s] 64%|██████▎ | 7338/11526 [1:16:43<42:53, 1.63it/s] {'loss': 0.1941, 'grad_norm': 0.5123698115348816, 'learning_rate': 3.510900052998962e-06, 'epoch': 1.91}
64%|██████▎ | 7338/11526 [1:16:43<42:53, 1.63it/s] 64%|██████▎ | 7339/11526 [1:16:44<42:53, 1.63it/s] {'loss': 0.1957, 'grad_norm': 0.4787314236164093, 'learning_rate': 3.509454525242954e-06, 'epoch': 1.91}
64%|██████▎ | 7339/11526 [1:16:44<42:53, 1.63it/s] 64%|██████▎ | 7340/11526 [1:16:44<42:52, 1.63it/s] {'loss': 0.2228, 'grad_norm': 0.6084563136100769, 'learning_rate': 3.508009134208259e-06, 'epoch': 1.91}
64%|██████▎ | 7340/11526 [1:16:44<42:52, 1.63it/s] 64%|██████▎ | 7341/11526 [1:16:45<42:51, 1.63it/s] {'loss': 0.1777, 'grad_norm': 0.45917031168937683, 'learning_rate': 3.5065638800274543e-06, 'epoch': 1.91}
64%|██████▎ | 7341/11526 [1:16:45<42:51, 1.63it/s] 64%|██████▎ | 7342/11526 [1:16:46<42:49, 1.63it/s] {'loss': 0.2083, 'grad_norm': 0.5341155529022217, 'learning_rate': 3.5051187628331062e-06, 'epoch': 1.91}
64%|██████▎ | 7342/11526 [1:16:46<42:49, 1.63it/s] 64%|██████▎ | 7343/11526 [1:16:46<42:47, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.4486672580242157, 'learning_rate': 3.503673782757772e-06, 'epoch': 1.91}
64%|██████▎ | 7343/11526 [1:16:46<42:47, 1.63it/s] 64%|██████▎ | 7344/11526 [1:16:47<42:46, 1.63it/s] {'loss': 0.2579, 'grad_norm': 0.5985079407691956, 'learning_rate': 3.5022289399339933e-06, 'epoch': 1.91}
64%|██████▎ | 7344/11526 [1:16:47<42:46, 1.63it/s] 64%|██████▎ | 7345/11526 [1:16:47<42:46, 1.63it/s] {'loss': 0.2203, 'grad_norm': 0.5910089612007141, 'learning_rate': 3.5007842344942977e-06, 'epoch': 1.91}
64%|██████▎ | 7345/11526 [1:16:47<42:46, 1.63it/s] 64%|██████▎ | 7346/11526 [1:16:48<42:45, 1.63it/s] {'loss': 0.1882, 'grad_norm': 0.517078697681427, 'learning_rate': 3.4993396665712015e-06, 'epoch': 1.91}
64%|██████▎ | 7346/11526 [1:16:48<42:45, 1.63it/s] 64%|██████▎ | 7347/11526 [1:16:49<42:47, 1.63it/s] {'loss': 0.1851, 'grad_norm': 0.5304461121559143, 'learning_rate': 3.4978952362972087e-06, 'epoch': 1.91}
64%|██████▎ | 7347/11526 [1:16:49<42:47, 1.63it/s] 64%|██████▍ | 7348/11526 [1:16:49<42:49, 1.63it/s] {'loss': 0.1995, 'grad_norm': 0.5261043906211853, 'learning_rate': 3.4964509438048134e-06, 'epoch': 1.91}
64%|██████▍ | 7348/11526 [1:16:49<42:49, 1.63it/s] 64%|██████▍ | 7349/11526 [1:16:50<42:49, 1.63it/s] {'loss': 0.2096, 'grad_norm': 0.4907042384147644, 'learning_rate': 3.4950067892264915e-06, 'epoch': 1.91}
64%|██████▍ | 7349/11526 [1:16:50<42:49, 1.63it/s] 64%|██████▍ | 7350/11526 [1:16:50<42:47, 1.63it/s] {'loss': 0.1904, 'grad_norm': 0.5509510040283203, 'learning_rate': 3.49356277269471e-06, 'epoch': 1.91}
64%|██████▍ | 7350/11526 [1:16:51<42:47, 1.63it/s] 64%|██████▍ | 7351/11526 [1:16:51<42:45, 1.63it/s] {'loss': 0.2729, 'grad_norm': 0.7134014368057251, 'learning_rate': 3.492118894341921e-06, 'epoch': 1.91}
64%|██████▍ | 7351/11526 [1:16:51<42:45, 1.63it/s] 64%|██████▍ | 7352/11526 [1:16:52<42:43, 1.63it/s] {'loss': 0.1669, 'grad_norm': 0.4892096519470215, 'learning_rate': 3.4906751543005685e-06, 'epoch': 1.91}
64%|██████▍ | 7352/11526 [1:16:52<42:43, 1.63it/s] 64%|██████▍ | 7353/11526 [1:16:52<42:42, 1.63it/s] {'loss': 0.1673, 'grad_norm': 0.512302041053772, 'learning_rate': 3.4892315527030785e-06, 'epoch': 1.91}
64%|██████▍ | 7353/11526 [1:16:52<42:42, 1.63it/s] 64%|██████▍ | 7354/11526 [1:16:53<42:46, 1.63it/s] {'loss': 0.2073, 'grad_norm': 0.5295395255088806, 'learning_rate': 3.4877880896818655e-06, 'epoch': 1.91}
64%|██████▍ | 7354/11526 [1:16:53<42:46, 1.63it/s] 64%|██████▍ | 7355/11526 [1:16:53<42:43, 1.63it/s] {'loss': 0.1981, 'grad_norm': 0.586807370185852, 'learning_rate': 3.486344765369332e-06, 'epoch': 1.91}
64%|██████▍ | 7355/11526 [1:16:54<42:43, 1.63it/s] 64%|██████▍ | 7356/11526 [1:16:54<42:41, 1.63it/s] {'loss': 0.1942, 'grad_norm': 0.5818963646888733, 'learning_rate': 3.484901579897871e-06, 'epoch': 1.91}
64%|██████▍ | 7356/11526 [1:16:54<42:41, 1.63it/s] 64%|██████▍ | 7357/11526 [1:16:55<42:39, 1.63it/s] {'loss': 0.2051, 'grad_norm': 0.5921418070793152, 'learning_rate': 3.483458533399857e-06, 'epoch': 1.91}
64%|██████▍ | 7357/11526 [1:16:55<42:39, 1.63it/s] 64%|██████▍ | 7358/11526 [1:16:55<42:38, 1.63it/s] {'loss': 0.168, 'grad_norm': 0.45039141178131104, 'learning_rate': 3.482015626007655e-06, 'epoch': 1.92}
64%|██████▍ | 7358/11526 [1:16:55<42:38, 1.63it/s] 64%|██████▍ | 7359/11526 [1:16:56<42:41, 1.63it/s] {'loss': 0.2313, 'grad_norm': 0.5667645931243896, 'learning_rate': 3.4805728578536172e-06, 'epoch': 1.92}
64%|██████▍ | 7359/11526 [1:16:56<42:41, 1.63it/s] 64%|██████▍ | 7360/11526 [1:16:57<42:40, 1.63it/s] {'loss': 0.1745, 'grad_norm': 0.47476309537887573, 'learning_rate': 3.4791302290700803e-06, 'epoch': 1.92}
64%|██████▍ | 7360/11526 [1:16:57<42:40, 1.63it/s] 64%|██████▍ | 7361/11526 [1:16:57<42:38, 1.63it/s] {'loss': 0.193, 'grad_norm': 0.4622335731983185, 'learning_rate': 3.4776877397893744e-06, 'epoch': 1.92}
64%|██████▍ | 7361/11526 [1:16:57<42:38, 1.63it/s] 64%|██████▍ | 7362/11526 [1:16:58<42:37, 1.63it/s] {'loss': 0.2023, 'grad_norm': 0.5282748341560364, 'learning_rate': 3.4762453901438092e-06, 'epoch': 1.92}
64%|██████▍ | 7362/11526 [1:16:58<42:37, 1.63it/s] 64%|██████▍ | 7363/11526 [1:16:58<42:37, 1.63it/s] {'loss': 0.1723, 'grad_norm': 0.4960128366947174, 'learning_rate': 3.474803180265688e-06, 'epoch': 1.92}
64%|██████▍ | 7363/11526 [1:16:59<42:37, 1.63it/s] 64%|██████▍ | 7364/11526 [1:16:59<42:39, 1.63it/s] {'loss': 0.161, 'grad_norm': 0.4664328396320343, 'learning_rate': 3.4733611102872943e-06, 'epoch': 1.92}
64%|██████▍ | 7364/11526 [1:16:59<42:39, 1.63it/s] 64%|██████▍ | 7365/11526 [1:17:00<42:37, 1.63it/s] {'loss': 0.1533, 'grad_norm': 0.4849121868610382, 'learning_rate': 3.4719191803409086e-06, 'epoch': 1.92}
64%|██████▍ | 7365/11526 [1:17:00<42:37, 1.63it/s] 64%|██████▍ | 7366/11526 [1:17:00<42:37, 1.63it/s] {'loss': 0.1961, 'grad_norm': 0.5739268660545349, 'learning_rate': 3.470477390558789e-06, 'epoch': 1.92}
64%|██████▍ | 7366/11526 [1:17:00<42:37, 1.63it/s] 64%|██████▍ | 7367/11526 [1:17:01<42:34, 1.63it/s] {'loss': 0.2354, 'grad_norm': 0.5512015223503113, 'learning_rate': 3.4690357410731867e-06, 'epoch': 1.92}
64%|██████▍ | 7367/11526 [1:17:01<42:34, 1.63it/s] 64%|██████▍ | 7368/11526 [1:17:01<42:33, 1.63it/s] {'loss': 0.1717, 'grad_norm': 0.4597456157207489, 'learning_rate': 3.467594232016337e-06, 'epoch': 1.92}
64%|██████▍ | 7368/11526 [1:17:02<42:33, 1.63it/s] 64%|██████▍ | 7369/11526 [1:17:02<42:33, 1.63it/s] {'loss': 0.1459, 'grad_norm': 0.44288867712020874, 'learning_rate': 3.4661528635204613e-06, 'epoch': 1.92}
64%|██████▍ | 7369/11526 [1:17:02<42:33, 1.63it/s] 64%|██████▍ | 7370/11526 [1:17:03<42:33, 1.63it/s] {'loss': 0.1654, 'grad_norm': 0.4738188683986664, 'learning_rate': 3.4647116357177736e-06, 'epoch': 1.92}
64%|██████▍ | 7370/11526 [1:17:03<42:33, 1.63it/s] 64%|██████▍ | 7371/11526 [1:17:03<42:31, 1.63it/s] {'loss': 0.1742, 'grad_norm': 0.46657541394233704, 'learning_rate': 3.4632705487404705e-06, 'epoch': 1.92}
64%|██████▍ | 7371/11526 [1:17:03<42:31, 1.63it/s] 64%|██████▍ | 7372/11526 [1:17:04<42:31, 1.63it/s] {'loss': 0.2019, 'grad_norm': 0.5201616883277893, 'learning_rate': 3.4618296027207366e-06, 'epoch': 1.92}
64%|██████▍ | 7372/11526 [1:17:04<42:31, 1.63it/s] 64%|██████▍ | 7373/11526 [1:17:05<42:30, 1.63it/s] {'loss': 0.2102, 'grad_norm': 0.551425039768219, 'learning_rate': 3.4603887977907415e-06, 'epoch': 1.92}
64%|██████▍ | 7373/11526 [1:17:05<42:30, 1.63it/s] 64%|██████▍ | 7374/11526 [1:17:05<42:29, 1.63it/s] {'loss': 0.2975, 'grad_norm': 0.7415542602539062, 'learning_rate': 3.458948134082646e-06, 'epoch': 1.92}
64%|██████▍ | 7374/11526 [1:17:05<42:29, 1.63it/s] 64%|██████▍ | 7375/11526 [1:17:06<42:30, 1.63it/s] {'loss': 0.2229, 'grad_norm': 0.5647424459457397, 'learning_rate': 3.4575076117285975e-06, 'epoch': 1.92}
64%|██████▍ | 7375/11526 [1:17:06<42:30, 1.63it/s] 64%|██████▍ | 7376/11526 [1:17:06<42:29, 1.63it/s] {'loss': 0.1496, 'grad_norm': 0.42993277311325073, 'learning_rate': 3.456067230860728e-06, 'epoch': 1.92}
64%|██████▍ | 7376/11526 [1:17:07<42:29, 1.63it/s] 64%|██████▍ | 7377/11526 [1:17:07<42:29, 1.63it/s] {'loss': 0.1759, 'grad_norm': 0.5654239058494568, 'learning_rate': 3.4546269916111547e-06, 'epoch': 1.92}
64%|██████▍ | 7377/11526 [1:17:07<42:29, 1.63it/s] 64%|██████▍ | 7378/11526 [1:17:08<42:29, 1.63it/s] {'loss': 0.1933, 'grad_norm': 0.5128702521324158, 'learning_rate': 3.453186894111986e-06, 'epoch': 1.92}
64%|██████▍ | 7378/11526 [1:17:08<42:29, 1.63it/s] 64%|██████▍ | 7379/11526 [1:17:08<42:28, 1.63it/s] {'loss': 0.2075, 'grad_norm': 0.574367344379425, 'learning_rate': 3.4517469384953183e-06, 'epoch': 1.92}
64%|██████▍ | 7379/11526 [1:17:08<42:28, 1.63it/s] 64%|██████▍ | 7380/11526 [1:17:09<42:29, 1.63it/s] {'loss': 0.1658, 'grad_norm': 0.46286430954933167, 'learning_rate': 3.450307124893231e-06, 'epoch': 1.92}
64%|██████▍ | 7380/11526 [1:17:09<42:29, 1.63it/s] 64%|██████▍ | 7381/11526 [1:17:09<42:28, 1.63it/s] {'loss': 0.1663, 'grad_norm': 0.49174630641937256, 'learning_rate': 3.4488674534377895e-06, 'epoch': 1.92}
64%|██████▍ | 7381/11526 [1:17:10<42:28, 1.63it/s] 64%|██████▍ | 7382/11526 [1:17:10<42:25, 1.63it/s] {'loss': 0.2557, 'grad_norm': 0.7017353177070618, 'learning_rate': 3.4474279242610504e-06, 'epoch': 1.92}
64%|██████▍ | 7382/11526 [1:17:10<42:25, 1.63it/s] 64%|██████▍ | 7383/11526 [1:17:11<42:24, 1.63it/s] {'loss': 0.1801, 'grad_norm': 0.5534937977790833, 'learning_rate': 3.4459885374950577e-06, 'epoch': 1.92}
64%|██████▍ | 7383/11526 [1:17:11<42:24, 1.63it/s] 64%|██████▍ | 7384/11526 [1:17:11<42:27, 1.63it/s] {'loss': 0.131, 'grad_norm': 0.4596095085144043, 'learning_rate': 3.4445492932718372e-06, 'epoch': 1.92}
64%|██████▍ | 7384/11526 [1:17:11<42:27, 1.63it/s] 64%|██████▍ | 7385/11526 [1:17:12<42:25, 1.63it/s] {'loss': 0.1974, 'grad_norm': 0.53948974609375, 'learning_rate': 3.443110191723407e-06, 'epoch': 1.92}
64%|██████▍ | 7385/11526 [1:17:12<42:25, 1.63it/s] 64%|██████▍ | 7386/11526 [1:17:13<42:24, 1.63it/s] {'loss': 0.1894, 'grad_norm': 0.5168489813804626, 'learning_rate': 3.4416712329817686e-06, 'epoch': 1.92}
64%|██████▍ | 7386/11526 [1:17:13<42:24, 1.63it/s] 64%|██████▍ | 7387/11526 [1:17:13<42:24, 1.63it/s] {'loss': 0.2511, 'grad_norm': 0.5916317701339722, 'learning_rate': 3.4402324171789093e-06, 'epoch': 1.92}
64%|██████▍ | 7387/11526 [1:17:13<42:24, 1.63it/s] 64%|██████▍ | 7388/11526 [1:17:14<42:22, 1.63it/s] {'loss': 0.194, 'grad_norm': 0.5068082213401794, 'learning_rate': 3.438793744446808e-06, 'epoch': 1.92}
64%|██████▍ | 7388/11526 [1:17:14<42:22, 1.63it/s] 64%|██████▍ | 7389/11526 [1:17:14<42:21, 1.63it/s] {'loss': 0.2473, 'grad_norm': 0.7148132920265198, 'learning_rate': 3.4373552149174284e-06, 'epoch': 1.92}
64%|██████▍ | 7389/11526 [1:17:15<42:21, 1.63it/s] 64%|██████▍ | 7390/11526 [1:17:15<42:20, 1.63it/s] {'loss': 0.2857, 'grad_norm': 0.6930810213088989, 'learning_rate': 3.435916828722719e-06, 'epoch': 1.92}
64%|██████▍ | 7390/11526 [1:17:15<42:20, 1.63it/s] 64%|██████▍ | 7391/11526 [1:17:16<42:19, 1.63it/s] {'loss': 0.1778, 'grad_norm': 0.5693141222000122, 'learning_rate': 3.434478585994616e-06, 'epoch': 1.92}
64%|██████▍ | 7391/11526 [1:17:16<42:19, 1.63it/s] 64%|██████▍ | 7392/11526 [1:17:16<42:19, 1.63it/s] {'loss': 0.1729, 'grad_norm': 0.5099214315414429, 'learning_rate': 3.4330404868650456e-06, 'epoch': 1.92}
64%|██████▍ | 7392/11526 [1:17:16<42:19, 1.63it/s] 64%|██████▍ | 7393/11526 [1:17:17<42:19, 1.63it/s] {'loss': 0.1544, 'grad_norm': 0.44530975818634033, 'learning_rate': 3.431602531465918e-06, 'epoch': 1.92}
64%|██████▍ | 7393/11526 [1:17:17<42:19, 1.63it/s] 64%|██████▍ | 7394/11526 [1:17:17<42:18, 1.63it/s] {'loss': 0.1914, 'grad_norm': 0.5959989428520203, 'learning_rate': 3.4301647199291297e-06, 'epoch': 1.92}
64%|██████▍ | 7394/11526 [1:17:18<42:18, 1.63it/s] 64%|██████▍ | 7395/11526 [1:17:18<42:16, 1.63it/s] {'loss': 0.2086, 'grad_norm': 0.5866818428039551, 'learning_rate': 3.4287270523865657e-06, 'epoch': 1.92}
64%|██████▍ | 7395/11526 [1:17:18<42:16, 1.63it/s] 64%|██████▍ | 7396/11526 [1:17:19<42:17, 1.63it/s] {'loss': 0.2031, 'grad_norm': 0.5811987519264221, 'learning_rate': 3.427289528970095e-06, 'epoch': 1.93}
64%|██████▍ | 7396/11526 [1:17:19<42:17, 1.63it/s] 64%|██████▍ | 7397/11526 [1:17:19<42:17, 1.63it/s] {'loss': 0.19, 'grad_norm': 0.5544639229774475, 'learning_rate': 3.4258521498115783e-06, 'epoch': 1.93}
64%|██████▍ | 7397/11526 [1:17:19<42:17, 1.63it/s] 64%|██████▍ | 7398/11526 [1:17:20<42:15, 1.63it/s] {'loss': 0.2137, 'grad_norm': 0.6521393060684204, 'learning_rate': 3.4244149150428596e-06, 'epoch': 1.93}
64%|██████▍ | 7398/11526 [1:17:20<42:15, 1.63it/s] 64%|██████▍ | 7399/11526 [1:17:21<42:25, 1.62it/s] {'loss': 0.2181, 'grad_norm': 0.48963066935539246, 'learning_rate': 3.422977824795768e-06, 'epoch': 1.93}
64%|██████▍ | 7399/11526 [1:17:21<42:25, 1.62it/s] 64%|██████▍ | 7400/11526 [1:17:21<42:20, 1.62it/s] {'loss': 0.1958, 'grad_norm': 0.5834378004074097, 'learning_rate': 3.4215408792021227e-06, 'epoch': 1.93}
64%|██████▍ | 7400/11526 [1:17:21<42:20, 1.62it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5210646986961365, 'eval_runtime': 1.9545, 'eval_samples_per_second': 102.33, 'eval_steps_per_second': 6.651, 'epoch': 1.93}
64%|██████▍ | 7401/11526 [1:17:24<1:22:41, 1.20s/it] {'loss': 0.2332, 'grad_norm': 0.5813145637512207, 'learning_rate': 3.4201040783937313e-06, 'epoch': 1.93}
64%|██████▍ | 7401/11526 [1:17:24<1:22:41, 1.20s/it] 64%|██████▍ | 7402/11526 [1:17:24<1:10:31, 1.03s/it] {'loss': 0.2324, 'grad_norm': 0.598505973815918, 'learning_rate': 3.4186674225023826e-06, 'epoch': 1.93}
64%|██████▍ | 7402/11526 [1:17:24<1:10:31, 1.03s/it] 64%|██████▍ | 7403/11526 [1:17:25<1:01:59, 1.11it/s] {'loss': 0.1588, 'grad_norm': 0.43500760197639465, 'learning_rate': 3.4172309116598544e-06, 'epoch': 1.93}
64%|██████▍ | 7403/11526 [1:17:25<1:01:59, 1.11it/s] 64%|██████▍ | 7404/11526 [1:17:26<56:04, 1.23it/s] {'loss': 0.1809, 'grad_norm': 0.5958430767059326, 'learning_rate': 3.4157945459979115e-06, 'epoch': 1.93}
64%|██████▍ | 7404/11526 [1:17:26<56:04, 1.23it/s] 64%|██████▍ | 7405/11526 [1:17:26<51:53, 1.32it/s] {'loss': 0.1505, 'grad_norm': 0.47519737482070923, 'learning_rate': 3.4143583256483086e-06, 'epoch': 1.93}
64%|██████▍ | 7405/11526 [1:17:26<51:53, 1.32it/s] 64%|██████▍ | 7406/11526 [1:17:27<48:56, 1.40it/s] {'loss': 0.1968, 'grad_norm': 0.5371419787406921, 'learning_rate': 3.4129222507427822e-06, 'epoch': 1.93}
64%|██████▍ | 7406/11526 [1:17:27<48:56, 1.40it/s] 64%|██████▍ | 7407/11526 [1:17:27<46:54, 1.46it/s] {'loss': 0.2198, 'grad_norm': 0.5737542510032654, 'learning_rate': 3.4114863214130557e-06, 'epoch': 1.93}
64%|██████▍ | 7407/11526 [1:17:28<46:54, 1.46it/s] 64%|██████▍ | 7408/11526 [1:17:28<45:27, 1.51it/s] {'loss': 0.2539, 'grad_norm': 0.6893150806427002, 'learning_rate': 3.410050537790843e-06, 'epoch': 1.93}
64%|██████▍ | 7408/11526 [1:17:28<45:27, 1.51it/s] 64%|██████▍ | 7409/11526 [1:17:29<44:31, 1.54it/s] {'loss': 0.2314, 'grad_norm': 0.585308313369751, 'learning_rate': 3.4086149000078407e-06, 'epoch': 1.93}
64%|██████▍ | 7409/11526 [1:17:29<44:31, 1.54it/s] 64%|██████▍ | 7410/11526 [1:17:29<43:47, 1.57it/s] {'loss': 0.1844, 'grad_norm': 0.5070647597312927, 'learning_rate': 3.4071794081957354e-06, 'epoch': 1.93}
64%|██████▍ | 7410/11526 [1:17:29<43:47, 1.57it/s] 64%|██████▍ | 7411/11526 [1:17:30<43:28, 1.58it/s] {'loss': 0.1828, 'grad_norm': 0.6332312822341919, 'learning_rate': 3.405744062486196e-06, 'epoch': 1.93}
64%|██████▍ | 7411/11526 [1:17:30<43:28, 1.58it/s] 64%|██████▍ | 7412/11526 [1:17:30<43:02, 1.59it/s] {'loss': 0.1871, 'grad_norm': 0.5661672949790955, 'learning_rate': 3.4043088630108838e-06, 'epoch': 1.93}
64%|██████▍ | 7412/11526 [1:17:31<43:02, 1.59it/s] 64%|██████▍ | 7413/11526 [1:17:31<42:46, 1.60it/s] {'loss': 0.1724, 'grad_norm': 0.5217599272727966, 'learning_rate': 3.4028738099014402e-06, 'epoch': 1.93}
64%|██████▍ | 7413/11526 [1:17:31<42:46, 1.60it/s] 64%|██████▍ | 7414/11526 [1:17:32<42:37, 1.61it/s] {'loss': 0.2515, 'grad_norm': 0.6906412839889526, 'learning_rate': 3.401438903289499e-06, 'epoch': 1.93}
64%|██████▍ | 7414/11526 [1:17:32<42:37, 1.61it/s] 64%|██████▍ | 7415/11526 [1:17:32<42:25, 1.61it/s] {'loss': 0.2439, 'grad_norm': 0.6421289443969727, 'learning_rate': 3.4000041433066767e-06, 'epoch': 1.93}
64%|██████▍ | 7415/11526 [1:17:32<42:25, 1.61it/s] 64%|██████▍ | 7416/11526 [1:17:33<42:18, 1.62it/s] {'loss': 0.2008, 'grad_norm': 0.554893434047699, 'learning_rate': 3.398569530084579e-06, 'epoch': 1.93}
64%|██████▍ | 7416/11526 [1:17:33<42:18, 1.62it/s] 64%|██████▍ | 7417/11526 [1:17:34<42:13, 1.62it/s] {'loss': 0.1972, 'grad_norm': 0.5599725246429443, 'learning_rate': 3.3971350637547956e-06, 'epoch': 1.93}
64%|██████▍ | 7417/11526 [1:17:34<42:13, 1.62it/s] 64%|██████▍ | 7418/11526 [1:17:34<42:10, 1.62it/s] {'loss': 0.1944, 'grad_norm': 0.5591024160385132, 'learning_rate': 3.395700744448902e-06, 'epoch': 1.93}
64%|██████▍ | 7418/11526 [1:17:34<42:10, 1.62it/s] 64%|██████▍ | 7419/11526 [1:17:35<42:14, 1.62it/s] {'loss': 0.182, 'grad_norm': 0.5523084998130798, 'learning_rate': 3.3942665722984653e-06, 'epoch': 1.93}
64%|██████▍ | 7419/11526 [1:17:35<42:14, 1.62it/s] 64%|██████▍ | 7420/11526 [1:17:35<42:11, 1.62it/s] {'loss': 0.2475, 'grad_norm': 0.6597966551780701, 'learning_rate': 3.3928325474350352e-06, 'epoch': 1.93}
64%|██████▍ | 7420/11526 [1:17:36<42:11, 1.62it/s] 64%|██████▍ | 7421/11526 [1:17:36<42:08, 1.62it/s] {'loss': 0.2227, 'grad_norm': 0.5684273838996887, 'learning_rate': 3.3913986699901497e-06, 'epoch': 1.93}
64%|██████▍ | 7421/11526 [1:17:36<42:08, 1.62it/s] 64%|██████▍ | 7422/11526 [1:17:37<42:04, 1.63it/s] {'loss': 0.1967, 'grad_norm': 0.5563762187957764, 'learning_rate': 3.3899649400953277e-06, 'epoch': 1.93}
64%|██████▍ | 7422/11526 [1:17:37<42:04, 1.63it/s] 64%|██████▍ | 7423/11526 [1:17:37<42:03, 1.63it/s] {'loss': 0.2, 'grad_norm': 0.4881194829940796, 'learning_rate': 3.388531357882083e-06, 'epoch': 1.93}
64%|██████▍ | 7423/11526 [1:17:37<42:03, 1.63it/s] 64%|██████▍ | 7424/11526 [1:17:38<42:15, 1.62it/s] {'loss': 0.221, 'grad_norm': 0.5263651013374329, 'learning_rate': 3.3870979234819125e-06, 'epoch': 1.93}
64%|██████▍ | 7424/11526 [1:17:38<42:15, 1.62it/s] 64%|██████▍ | 7425/11526 [1:17:38<42:11, 1.62it/s] {'loss': 0.1886, 'grad_norm': 0.5126994252204895, 'learning_rate': 3.385664637026298e-06, 'epoch': 1.93}
64%|██████▍ | 7425/11526 [1:17:39<42:11, 1.62it/s] 64%|██████▍ | 7426/11526 [1:17:39<42:06, 1.62it/s] {'loss': 0.2204, 'grad_norm': 0.5870233178138733, 'learning_rate': 3.384231498646706e-06, 'epoch': 1.93}
64%|██████▍ | 7426/11526 [1:17:39<42:06, 1.62it/s] 64%|██████▍ | 7427/11526 [1:17:40<42:04, 1.62it/s] {'loss': 0.1613, 'grad_norm': 0.48144158720970154, 'learning_rate': 3.382798508474594e-06, 'epoch': 1.93}
64%|██████▍ | 7427/11526 [1:17:40<42:04, 1.62it/s] 64%|██████▍ | 7428/11526 [1:17:40<42:02, 1.62it/s] {'loss': 0.1875, 'grad_norm': 0.6069017052650452, 'learning_rate': 3.3813656666414065e-06, 'epoch': 1.93}
64%|██████▍ | 7428/11526 [1:17:40<42:02, 1.62it/s] 64%|██████▍ | 7429/11526 [1:17:41<42:03, 1.62it/s] {'loss': 0.1898, 'grad_norm': 0.5271340012550354, 'learning_rate': 3.379932973278569e-06, 'epoch': 1.93}
64%|██████▍ | 7429/11526 [1:17:41<42:03, 1.62it/s] 64%|██████▍ | 7430/11526 [1:17:42<42:01, 1.62it/s] {'loss': 0.1442, 'grad_norm': 0.4334936738014221, 'learning_rate': 3.378500428517496e-06, 'epoch': 1.93}
64%|██████▍ | 7430/11526 [1:17:42<42:01, 1.62it/s] 64%|██████▍ | 7431/11526 [1:17:42<41:58, 1.63it/s] {'loss': 0.2301, 'grad_norm': 0.546100914478302, 'learning_rate': 3.377068032489589e-06, 'epoch': 1.93}
64%|██████▍ | 7431/11526 [1:17:42<41:58, 1.63it/s] 64%|██████▍ | 7432/11526 [1:17:43<41:56, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.43188154697418213, 'learning_rate': 3.3756357853262386e-06, 'epoch': 1.93}
64%|██████▍ | 7432/11526 [1:17:43<41:56, 1.63it/s] 64%|██████▍ | 7433/11526 [1:17:43<41:54, 1.63it/s] {'loss': 0.1701, 'grad_norm': 0.49406683444976807, 'learning_rate': 3.374203687158816e-06, 'epoch': 1.93}
64%|██████▍ | 7433/11526 [1:17:44<41:54, 1.63it/s] 64%|██████▍ | 7434/11526 [1:17:44<41:55, 1.63it/s] {'loss': 0.2827, 'grad_norm': 0.6761515736579895, 'learning_rate': 3.37277173811868e-06, 'epoch': 1.93}
64%|██████▍ | 7434/11526 [1:17:44<41:55, 1.63it/s] 65%|██████▍ | 7435/11526 [1:17:45<41:53, 1.63it/s] {'loss': 0.1886, 'grad_norm': 0.4794254004955292, 'learning_rate': 3.37133993833718e-06, 'epoch': 1.94}
65%|██████▍ | 7435/11526 [1:17:45<41:53, 1.63it/s] 65%|██████▍ | 7436/11526 [1:17:45<41:53, 1.63it/s] {'loss': 0.1724, 'grad_norm': 0.5095937848091125, 'learning_rate': 3.3699082879456456e-06, 'epoch': 1.94}
65%|██████▍ | 7436/11526 [1:17:45<41:53, 1.63it/s] 65%|██████▍ | 7437/11526 [1:17:46<41:51, 1.63it/s] {'loss': 0.1232, 'grad_norm': 0.4042714834213257, 'learning_rate': 3.3684767870753997e-06, 'epoch': 1.94}
65%|██████▍ | 7437/11526 [1:17:46<41:51, 1.63it/s] 65%|██████▍ | 7438/11526 [1:17:46<41:50, 1.63it/s] {'loss': 0.1825, 'grad_norm': 0.490991473197937, 'learning_rate': 3.3670454358577452e-06, 'epoch': 1.94}
65%|██████▍ | 7438/11526 [1:17:47<41:50, 1.63it/s] 65%|██████▍ | 7439/11526 [1:17:47<41:52, 1.63it/s] {'loss': 0.2057, 'grad_norm': 0.6300304532051086, 'learning_rate': 3.3656142344239758e-06, 'epoch': 1.94}
65%|██████▍ | 7439/11526 [1:17:47<41:52, 1.63it/s] 65%|██████▍ | 7440/11526 [1:17:48<41:50, 1.63it/s] {'loss': 0.1776, 'grad_norm': 0.4662487208843231, 'learning_rate': 3.3641831829053652e-06, 'epoch': 1.94}
65%|██████▍ | 7440/11526 [1:17:48<41:50, 1.63it/s] 65%|██████▍ | 7441/11526 [1:17:48<41:49, 1.63it/s] {'loss': 0.187, 'grad_norm': 0.5440546870231628, 'learning_rate': 3.362752281433183e-06, 'epoch': 1.94}
65%|██████▍ | 7441/11526 [1:17:48<41:49, 1.63it/s] 65%|██████▍ | 7442/11526 [1:17:49<41:48, 1.63it/s] {'loss': 0.1994, 'grad_norm': 0.6161431074142456, 'learning_rate': 3.361321530138676e-06, 'epoch': 1.94}
65%|██████▍ | 7442/11526 [1:17:49<41:48, 1.63it/s] 65%|██████▍ | 7443/11526 [1:17:50<41:48, 1.63it/s] {'loss': 0.195, 'grad_norm': 0.49703025817871094, 'learning_rate': 3.3598909291530837e-06, 'epoch': 1.94}
65%|██████▍ | 7443/11526 [1:17:50<41:48, 1.63it/s] 65%|██████▍ | 7444/11526 [1:17:50<41:47, 1.63it/s] {'loss': 0.2411, 'grad_norm': 0.5754651427268982, 'learning_rate': 3.3584604786076247e-06, 'epoch': 1.94}
65%|██████▍ | 7444/11526 [1:17:50<41:47, 1.63it/s] 65%|██████▍ | 7445/11526 [1:17:51<41:47, 1.63it/s] {'loss': 0.2303, 'grad_norm': 0.6313542127609253, 'learning_rate': 3.3570301786335114e-06, 'epoch': 1.94}
65%|██████▍ | 7445/11526 [1:17:51<41:47, 1.63it/s] 65%|██████▍ | 7446/11526 [1:17:51<41:47, 1.63it/s] {'loss': 0.2887, 'grad_norm': 0.6338769197463989, 'learning_rate': 3.3556000293619386e-06, 'epoch': 1.94}
65%|██████▍ | 7446/11526 [1:17:52<41:47, 1.63it/s] 65%|██████▍ | 7447/11526 [1:17:52<41:46, 1.63it/s] {'loss': 0.2333, 'grad_norm': 0.5903488993644714, 'learning_rate': 3.3541700309240875e-06, 'epoch': 1.94}
65%|██████▍ | 7447/11526 [1:17:52<41:46, 1.63it/s] 65%|██████▍ | 7448/11526 [1:17:53<41:45, 1.63it/s] {'loss': 0.1707, 'grad_norm': 0.49006566405296326, 'learning_rate': 3.3527401834511254e-06, 'epoch': 1.94}
65%|██████▍ | 7448/11526 [1:17:53<41:45, 1.63it/s] 65%|██████▍ | 7449/11526 [1:17:53<41:47, 1.63it/s] {'loss': 0.1437, 'grad_norm': 0.40828442573547363, 'learning_rate': 3.351310487074205e-06, 'epoch': 1.94}
65%|██████▍ | 7449/11526 [1:17:53<41:47, 1.63it/s] 65%|██████▍ | 7450/11526 [1:17:54<41:45, 1.63it/s] {'loss': 0.2199, 'grad_norm': 0.5882642269134521, 'learning_rate': 3.349880941924469e-06, 'epoch': 1.94}
65%|██████▍ | 7450/11526 [1:17:54<41:45, 1.63it/s] 65%|██████▍ | 7451/11526 [1:17:54<41:44, 1.63it/s] {'loss': 0.2051, 'grad_norm': 0.5476576685905457, 'learning_rate': 3.3484515481330416e-06, 'epoch': 1.94}
65%|██████▍ | 7451/11526 [1:17:55<41:44, 1.63it/s] 65%|██████▍ | 7452/11526 [1:17:55<41:43, 1.63it/s] {'loss': 0.1714, 'grad_norm': 0.5229399800300598, 'learning_rate': 3.347022305831035e-06, 'epoch': 1.94}
65%|██████▍ | 7452/11526 [1:17:55<41:43, 1.63it/s] 65%|██████▍ | 7453/11526 [1:17:56<41:41, 1.63it/s] {'loss': 0.1979, 'grad_norm': 0.5286787152290344, 'learning_rate': 3.3455932151495465e-06, 'epoch': 1.94}
65%|██████▍ | 7453/11526 [1:17:56<41:41, 1.63it/s] 65%|██████▍ | 7454/11526 [1:17:56<41:42, 1.63it/s] {'loss': 0.1408, 'grad_norm': 0.4174371361732483, 'learning_rate': 3.3441642762196624e-06, 'epoch': 1.94}
65%|██████▍ | 7454/11526 [1:17:56<41:42, 1.63it/s] 65%|██████▍ | 7455/11526 [1:17:57<41:41, 1.63it/s] {'loss': 0.1991, 'grad_norm': 0.6073282361030579, 'learning_rate': 3.342735489172453e-06, 'epoch': 1.94}
65%|██████▍ | 7455/11526 [1:17:57<41:41, 1.63it/s] 65%|██████▍ | 7456/11526 [1:17:58<41:40, 1.63it/s] {'loss': 0.2015, 'grad_norm': 0.5311110615730286, 'learning_rate': 3.3413068541389725e-06, 'epoch': 1.94}
65%|██████▍ | 7456/11526 [1:17:58<41:40, 1.63it/s] 65%|██████▍ | 7457/11526 [1:17:58<41:39, 1.63it/s] {'loss': 0.1193, 'grad_norm': 0.37428516149520874, 'learning_rate': 3.3398783712502656e-06, 'epoch': 1.94}
65%|██████▍ | 7457/11526 [1:17:58<41:39, 1.63it/s] 65%|██████▍ | 7458/11526 [1:17:59<41:39, 1.63it/s] {'loss': 0.1556, 'grad_norm': 0.45990559458732605, 'learning_rate': 3.338450040637359e-06, 'epoch': 1.94}
65%|██████▍ | 7458/11526 [1:17:59<41:39, 1.63it/s] 65%|██████▍ | 7459/11526 [1:17:59<41:40, 1.63it/s] {'loss': 0.2059, 'grad_norm': 0.5263742208480835, 'learning_rate': 3.337021862431271e-06, 'epoch': 1.94}
65%|██████▍ | 7459/11526 [1:18:00<41:40, 1.63it/s] 65%|██████▍ | 7460/11526 [1:18:00<41:39, 1.63it/s] {'loss': 0.2115, 'grad_norm': 0.5624071955680847, 'learning_rate': 3.335593836762998e-06, 'epoch': 1.94}
65%|██████▍ | 7460/11526 [1:18:00<41:39, 1.63it/s] 65%|██████▍ | 7461/11526 [1:18:01<41:38, 1.63it/s] {'loss': 0.2122, 'grad_norm': 0.5538498163223267, 'learning_rate': 3.334165963763529e-06, 'epoch': 1.94}
65%|██████▍ | 7461/11526 [1:18:01<41:38, 1.63it/s] 65%|██████▍ | 7462/11526 [1:18:01<41:38, 1.63it/s] {'loss': 0.2549, 'grad_norm': 0.6089410185813904, 'learning_rate': 3.3327382435638345e-06, 'epoch': 1.94}
65%|██████▍ | 7462/11526 [1:18:01<41:38, 1.63it/s] 65%|██████▍ | 7463/11526 [1:18:02<41:44, 1.62it/s] {'loss': 0.153, 'grad_norm': 0.4851757884025574, 'learning_rate': 3.3313106762948767e-06, 'epoch': 1.94}
65%|██████▍ | 7463/11526 [1:18:02<41:44, 1.62it/s] 65%|██████▍ | 7464/11526 [1:18:02<41:43, 1.62it/s] {'loss': 0.2292, 'grad_norm': 0.6850539445877075, 'learning_rate': 3.329883262087597e-06, 'epoch': 1.94}
65%|██████▍ | 7464/11526 [1:18:03<41:43, 1.62it/s] 65%|██████▍ | 7465/11526 [1:18:03<41:40, 1.62it/s] {'loss': 0.2405, 'grad_norm': 0.5607638359069824, 'learning_rate': 3.328456001072928e-06, 'epoch': 1.94}
65%|██████▍ | 7465/11526 [1:18:03<41:40, 1.62it/s] 65%|██████▍ | 7466/11526 [1:18:04<41:38, 1.62it/s] {'loss': 0.1835, 'grad_norm': 0.5450652837753296, 'learning_rate': 3.327028893381785e-06, 'epoch': 1.94}
65%|██████▍ | 7466/11526 [1:18:04<41:38, 1.62it/s] 65%|██████▍ | 7467/11526 [1:18:04<41:36, 1.63it/s] {'loss': 0.157, 'grad_norm': 0.49002805352211, 'learning_rate': 3.3256019391450696e-06, 'epoch': 1.94}
65%|██████▍ | 7467/11526 [1:18:04<41:36, 1.63it/s] 65%|██████▍ | 7468/11526 [1:18:05<41:35, 1.63it/s] {'loss': 0.1788, 'grad_norm': 0.5613922476768494, 'learning_rate': 3.3241751384936717e-06, 'epoch': 1.94}
65%|██████▍ | 7468/11526 [1:18:05<41:35, 1.63it/s] 65%|██████▍ | 7469/11526 [1:18:06<41:36, 1.63it/s] {'loss': 0.2035, 'grad_norm': 0.5241309404373169, 'learning_rate': 3.3227484915584664e-06, 'epoch': 1.94}
65%|██████▍ | 7469/11526 [1:18:06<41:36, 1.63it/s] 65%|██████▍ | 7470/11526 [1:18:06<41:34, 1.63it/s] {'loss': 0.2036, 'grad_norm': 0.5337450504302979, 'learning_rate': 3.321321998470312e-06, 'epoch': 1.94}
65%|██████▍ | 7470/11526 [1:18:06<41:34, 1.63it/s] 65%|██████▍ | 7471/11526 [1:18:07<41:32, 1.63it/s] {'loss': 0.151, 'grad_norm': 0.44568055868148804, 'learning_rate': 3.319895659360054e-06, 'epoch': 1.94}
65%|██████▍ | 7471/11526 [1:18:07<41:32, 1.63it/s] 65%|██████▍ | 7472/11526 [1:18:07<41:29, 1.63it/s] {'loss': 0.1921, 'grad_norm': 0.480472207069397, 'learning_rate': 3.3184694743585254e-06, 'epoch': 1.94}
65%|██████▍ | 7472/11526 [1:18:08<41:29, 1.63it/s] 65%|██████▍ | 7473/11526 [1:18:08<41:29, 1.63it/s] {'loss': 0.2628, 'grad_norm': 0.6547864675521851, 'learning_rate': 3.317043443596546e-06, 'epoch': 1.95}
65%|██████▍ | 7473/11526 [1:18:08<41:29, 1.63it/s] 65%|██████▍ | 7474/11526 [1:18:09<41:30, 1.63it/s] {'loss': 0.1597, 'grad_norm': 0.47993186116218567, 'learning_rate': 3.3156175672049175e-06, 'epoch': 1.95}
65%|██████▍ | 7474/11526 [1:18:09<41:30, 1.63it/s] 65%|██████▍ | 7475/11526 [1:18:09<41:29, 1.63it/s] {'loss': 0.2014, 'grad_norm': 0.5144858956336975, 'learning_rate': 3.3141918453144283e-06, 'epoch': 1.95}
65%|██████▍ | 7475/11526 [1:18:09<41:29, 1.63it/s] 65%|██████▍ | 7476/11526 [1:18:10<41:28, 1.63it/s] {'loss': 0.2127, 'grad_norm': 0.5714917182922363, 'learning_rate': 3.3127662780558543e-06, 'epoch': 1.95}
65%|██████▍ | 7476/11526 [1:18:10<41:28, 1.63it/s] 65%|██████▍ | 7477/11526 [1:18:10<41:28, 1.63it/s] {'loss': 0.1865, 'grad_norm': 0.5245741009712219, 'learning_rate': 3.311340865559959e-06, 'epoch': 1.95}
65%|██████▍ | 7477/11526 [1:18:11<41:28, 1.63it/s] 65%|██████▍ | 7478/11526 [1:18:11<41:28, 1.63it/s] {'loss': 0.1997, 'grad_norm': 0.5662109851837158, 'learning_rate': 3.309915607957487e-06, 'epoch': 1.95}
65%|██████▍ | 7478/11526 [1:18:11<41:28, 1.63it/s] 65%|██████▍ | 7479/11526 [1:18:12<41:30, 1.63it/s] {'loss': 0.174, 'grad_norm': 0.4688761830329895, 'learning_rate': 3.3084905053791717e-06, 'epoch': 1.95}
65%|██████▍ | 7479/11526 [1:18:12<41:30, 1.63it/s] 65%|██████▍ | 7480/11526 [1:18:12<41:26, 1.63it/s] {'loss': 0.1583, 'grad_norm': 0.4919617474079132, 'learning_rate': 3.3070655579557297e-06, 'epoch': 1.95}
65%|██████▍ | 7480/11526 [1:18:12<41:26, 1.63it/s] 65%|██████▍ | 7481/11526 [1:18:13<41:26, 1.63it/s] {'loss': 0.217, 'grad_norm': 0.5438754558563232, 'learning_rate': 3.305640765817869e-06, 'epoch': 1.95}
65%|██████▍ | 7481/11526 [1:18:13<41:26, 1.63it/s] 65%|██████▍ | 7482/11526 [1:18:14<41:24, 1.63it/s] {'loss': 0.2883, 'grad_norm': 0.6869872212409973, 'learning_rate': 3.304216129096278e-06, 'epoch': 1.95}
65%|██████▍ | 7482/11526 [1:18:14<41:24, 1.63it/s] 65%|██████▍ | 7483/11526 [1:18:14<41:22, 1.63it/s] {'loss': 0.2205, 'grad_norm': 0.6049967408180237, 'learning_rate': 3.3027916479216322e-06, 'epoch': 1.95}
65%|██████▍ | 7483/11526 [1:18:14<41:22, 1.63it/s] 65%|██████▍ | 7484/11526 [1:18:15<41:24, 1.63it/s] {'loss': 0.2513, 'grad_norm': 0.6424980163574219, 'learning_rate': 3.3013673224245936e-06, 'epoch': 1.95}
65%|██████▍ | 7484/11526 [1:18:15<41:24, 1.63it/s] 65%|██████▍ | 7485/11526 [1:18:15<41:22, 1.63it/s] {'loss': 0.182, 'grad_norm': 0.4755328595638275, 'learning_rate': 3.2999431527358063e-06, 'epoch': 1.95}
65%|██████▍ | 7485/11526 [1:18:16<41:22, 1.63it/s] 65%|██████▍ | 7486/11526 [1:18:16<41:21, 1.63it/s] {'loss': 0.1697, 'grad_norm': 0.5260552763938904, 'learning_rate': 3.2985191389859083e-06, 'epoch': 1.95}
65%|██████▍ | 7486/11526 [1:18:16<41:21, 1.63it/s] 65%|██████▍ | 7487/11526 [1:18:17<41:19, 1.63it/s] {'loss': 0.177, 'grad_norm': 0.5399602651596069, 'learning_rate': 3.2970952813055153e-06, 'epoch': 1.95}
65%|██████▍ | 7487/11526 [1:18:17<41:19, 1.63it/s] 65%|██████▍ | 7488/11526 [1:18:17<41:18, 1.63it/s] {'loss': 0.2477, 'grad_norm': 0.6257007718086243, 'learning_rate': 3.2956715798252324e-06, 'epoch': 1.95}
65%|██████▍ | 7488/11526 [1:18:17<41:18, 1.63it/s] 65%|██████▍ | 7489/11526 [1:18:18<41:20, 1.63it/s] {'loss': 0.2264, 'grad_norm': 0.5747504830360413, 'learning_rate': 3.2942480346756482e-06, 'epoch': 1.95}
65%|██████▍ | 7489/11526 [1:18:18<41:20, 1.63it/s] 65%|██████▍ | 7490/11526 [1:18:18<41:18, 1.63it/s] {'loss': 0.1754, 'grad_norm': 0.4968256652355194, 'learning_rate': 3.2928246459873414e-06, 'epoch': 1.95}
65%|██████▍ | 7490/11526 [1:18:19<41:18, 1.63it/s] 65%|██████▍ | 7491/11526 [1:18:19<41:18, 1.63it/s] {'loss': 0.2295, 'grad_norm': 0.6135972738265991, 'learning_rate': 3.2914014138908706e-06, 'epoch': 1.95}
65%|██████▍ | 7491/11526 [1:18:19<41:18, 1.63it/s] 65%|██████▌ | 7492/11526 [1:18:20<41:17, 1.63it/s] {'loss': 0.2273, 'grad_norm': 0.6180579662322998, 'learning_rate': 3.289978338516785e-06, 'epoch': 1.95}
65%|██████▌ | 7492/11526 [1:18:20<41:17, 1.63it/s] 65%|██████▌ | 7493/11526 [1:18:20<41:16, 1.63it/s] {'loss': 0.2071, 'grad_norm': 0.4777679741382599, 'learning_rate': 3.2885554199956147e-06, 'epoch': 1.95}
65%|██████▌ | 7493/11526 [1:18:20<41:16, 1.63it/s] 65%|██████▌ | 7494/11526 [1:18:21<41:19, 1.63it/s] {'loss': 0.1706, 'grad_norm': 0.46959590911865234, 'learning_rate': 3.2871326584578776e-06, 'epoch': 1.95}
65%|██████▌ | 7494/11526 [1:18:21<41:19, 1.63it/s] 65%|██████▌ | 7495/11526 [1:18:22<41:18, 1.63it/s] {'loss': 0.1886, 'grad_norm': 0.49501797556877136, 'learning_rate': 3.2857100540340804e-06, 'epoch': 1.95}
65%|██████▌ | 7495/11526 [1:18:22<41:18, 1.63it/s] 65%|██████▌ | 7496/11526 [1:18:22<41:17, 1.63it/s] {'loss': 0.1817, 'grad_norm': 0.5242252349853516, 'learning_rate': 3.2842876068547115e-06, 'epoch': 1.95}
65%|██████▌ | 7496/11526 [1:18:22<41:17, 1.63it/s] 65%|██████▌ | 7497/11526 [1:18:23<41:15, 1.63it/s] {'loss': 0.1963, 'grad_norm': 0.5442031621932983, 'learning_rate': 3.282865317050245e-06, 'epoch': 1.95}
65%|██████▌ | 7497/11526 [1:18:23<41:15, 1.63it/s] 65%|██████▌ | 7498/11526 [1:18:23<41:13, 1.63it/s] {'loss': 0.1333, 'grad_norm': 0.37216269969940186, 'learning_rate': 3.2814431847511395e-06, 'epoch': 1.95}
65%|██████▌ | 7498/11526 [1:18:23<41:13, 1.63it/s] 65%|██████▌ | 7499/11526 [1:18:24<41:17, 1.63it/s] {'loss': 0.2417, 'grad_norm': 0.5259295105934143, 'learning_rate': 3.2800212100878454e-06, 'epoch': 1.95}
65%|██████▌ | 7499/11526 [1:18:24<41:17, 1.63it/s] 65%|██████▌ | 7500/11526 [1:18:25<41:15, 1.63it/s] {'loss': 0.2013, 'grad_norm': 0.5586903691291809, 'learning_rate': 3.278599393190792e-06, 'epoch': 1.95}
65%|██████▌ | 7500/11526 [1:18:25<41:15, 1.63it/s] 65%|██████▌ | 7501/11526 [1:18:25<41:14, 1.63it/s] {'loss': 0.2394, 'grad_norm': 0.6209220886230469, 'learning_rate': 3.277177734190398e-06, 'epoch': 1.95}
65%|██████▌ | 7501/11526 [1:18:25<41:14, 1.63it/s] 65%|██████▌ | 7502/11526 [1:18:26<41:12, 1.63it/s] {'loss': 0.1986, 'grad_norm': 0.4852820634841919, 'learning_rate': 3.275756233217061e-06, 'epoch': 1.95}
65%|██████▌ | 7502/11526 [1:18:26<41:12, 1.63it/s] 65%|██████▌ | 7503/11526 [1:18:26<41:12, 1.63it/s] {'loss': 0.2395, 'grad_norm': 0.6581979393959045, 'learning_rate': 3.274334890401175e-06, 'epoch': 1.95}
65%|██████▌ | 7503/11526 [1:18:27<41:12, 1.63it/s] 65%|██████▌ | 7504/11526 [1:18:27<41:17, 1.62it/s] {'loss': 0.2052, 'grad_norm': 0.6219602227210999, 'learning_rate': 3.272913705873112e-06, 'epoch': 1.95}
65%|██████▌ | 7504/11526 [1:18:27<41:17, 1.62it/s] 65%|██████▌ | 7505/11526 [1:18:28<41:15, 1.62it/s] {'loss': 0.1975, 'grad_norm': 0.5921702980995178, 'learning_rate': 3.2714926797632306e-06, 'epoch': 1.95}
65%|██████▌ | 7505/11526 [1:18:28<41:15, 1.62it/s] 65%|██████▌ | 7506/11526 [1:18:28<41:14, 1.62it/s] {'loss': 0.2038, 'grad_norm': 0.5578120946884155, 'learning_rate': 3.2700718122018747e-06, 'epoch': 1.95}
65%|██████▌ | 7506/11526 [1:18:28<41:14, 1.62it/s] 65%|██████▌ | 7507/11526 [1:18:29<41:12, 1.63it/s] {'loss': 0.2335, 'grad_norm': 0.6209713220596313, 'learning_rate': 3.268651103319374e-06, 'epoch': 1.95}
65%|██████▌ | 7507/11526 [1:18:29<41:12, 1.63it/s] 65%|██████▌ | 7508/11526 [1:18:30<41:10, 1.63it/s] {'loss': 0.1726, 'grad_norm': 0.5046960711479187, 'learning_rate': 3.267230553246047e-06, 'epoch': 1.95}
65%|██████▌ | 7508/11526 [1:18:30<41:10, 1.63it/s] 65%|██████▌ | 7509/11526 [1:18:30<41:13, 1.62it/s] {'loss': 0.1438, 'grad_norm': 0.4401105046272278, 'learning_rate': 3.265810162112193e-06, 'epoch': 1.95}
65%|██████▌ | 7509/11526 [1:18:30<41:13, 1.62it/s] 65%|██████▌ | 7510/11526 [1:18:31<41:10, 1.63it/s] {'loss': 0.2033, 'grad_norm': 0.6019783020019531, 'learning_rate': 3.2643899300480964e-06, 'epoch': 1.95}
65%|██████▌ | 7510/11526 [1:18:31<41:10, 1.63it/s] 65%|██████▌ | 7511/11526 [1:18:31<41:07, 1.63it/s] {'loss': 0.2519, 'grad_norm': 0.6336613297462463, 'learning_rate': 3.26296985718403e-06, 'epoch': 1.95}
65%|██████▌ | 7511/11526 [1:18:31<41:07, 1.63it/s] 65%|██████▌ | 7512/11526 [1:18:32<41:06, 1.63it/s] {'loss': 0.2707, 'grad_norm': 0.7685343027114868, 'learning_rate': 3.261549943650254e-06, 'epoch': 1.96}
65%|██████▌ | 7512/11526 [1:18:32<41:06, 1.63it/s] 65%|██████▌ | 7513/11526 [1:18:33<41:04, 1.63it/s] {'loss': 0.1927, 'grad_norm': 0.5067522525787354, 'learning_rate': 3.260130189577008e-06, 'epoch': 1.96}
65%|██████▌ | 7513/11526 [1:18:33<41:04, 1.63it/s] 65%|██████▌ | 7514/11526 [1:18:33<41:17, 1.62it/s] {'loss': 0.2041, 'grad_norm': 0.610969066619873, 'learning_rate': 3.25871059509452e-06, 'epoch': 1.96}
65%|██████▌ | 7514/11526 [1:18:33<41:17, 1.62it/s] 65%|██████▌ | 7515/11526 [1:18:34<41:13, 1.62it/s] {'loss': 0.1679, 'grad_norm': 0.4574359357357025, 'learning_rate': 3.2572911603330036e-06, 'epoch': 1.96}
65%|██████▌ | 7515/11526 [1:18:34<41:13, 1.62it/s] 65%|██████▌ | 7516/11526 [1:18:34<41:08, 1.62it/s] {'loss': 0.2076, 'grad_norm': 0.5554258227348328, 'learning_rate': 3.2558718854226567e-06, 'epoch': 1.96}
65%|██████▌ | 7516/11526 [1:18:35<41:08, 1.62it/s] 65%|██████▌ | 7517/11526 [1:18:35<41:05, 1.63it/s] {'loss': 0.2047, 'grad_norm': 0.5356367826461792, 'learning_rate': 3.2544527704936655e-06, 'epoch': 1.96}
65%|██████▌ | 7517/11526 [1:18:35<41:05, 1.63it/s] 65%|██████▌ | 7518/11526 [1:18:36<41:03, 1.63it/s] {'loss': 0.2005, 'grad_norm': 0.5363260507583618, 'learning_rate': 3.2530338156761974e-06, 'epoch': 1.96}
65%|██████▌ | 7518/11526 [1:18:36<41:03, 1.63it/s] 65%|██████▌ | 7519/11526 [1:18:36<41:04, 1.63it/s] {'loss': 0.1701, 'grad_norm': 0.436409056186676, 'learning_rate': 3.2516150211004082e-06, 'epoch': 1.96}
65%|██████▌ | 7519/11526 [1:18:36<41:04, 1.63it/s] 65%|██████▌ | 7520/11526 [1:18:37<41:02, 1.63it/s] {'loss': 0.1707, 'grad_norm': 0.5327011346817017, 'learning_rate': 3.2501963868964358e-06, 'epoch': 1.96}
65%|██████▌ | 7520/11526 [1:18:37<41:02, 1.63it/s] 65%|██████▌ | 7521/11526 [1:18:38<41:00, 1.63it/s] {'loss': 0.1952, 'grad_norm': 0.6075000166893005, 'learning_rate': 3.248777913194408e-06, 'epoch': 1.96}
65%|██████▌ | 7521/11526 [1:18:38<41:00, 1.63it/s] 65%|██████▌ | 7522/11526 [1:18:38<41:00, 1.63it/s] {'loss': 0.2312, 'grad_norm': 0.5648849606513977, 'learning_rate': 3.2473596001244334e-06, 'epoch': 1.96}
65%|██████▌ | 7522/11526 [1:18:38<41:00, 1.63it/s] 65%|██████▌ | 7523/11526 [1:18:39<40:58, 1.63it/s] {'loss': 0.1673, 'grad_norm': 0.449773907661438, 'learning_rate': 3.2459414478166095e-06, 'epoch': 1.96}
65%|██████▌ | 7523/11526 [1:18:39<40:58, 1.63it/s] 65%|██████▌ | 7524/11526 [1:18:39<41:00, 1.63it/s] {'loss': 0.1712, 'grad_norm': 0.5019554495811462, 'learning_rate': 3.2445234564010154e-06, 'epoch': 1.96}
65%|██████▌ | 7524/11526 [1:18:39<41:00, 1.63it/s] 65%|██████▌ | 7525/11526 [1:18:40<40:58, 1.63it/s] {'loss': 0.2445, 'grad_norm': 0.6104938387870789, 'learning_rate': 3.2431056260077175e-06, 'epoch': 1.96}
65%|██████▌ | 7525/11526 [1:18:40<40:58, 1.63it/s] 65%|██████▌ | 7526/11526 [1:18:41<40:57, 1.63it/s] {'loss': 0.19, 'grad_norm': 0.5433170795440674, 'learning_rate': 3.2416879567667683e-06, 'epoch': 1.96}
65%|██████▌ | 7526/11526 [1:18:41<40:57, 1.63it/s] 65%|██████▌ | 7527/11526 [1:18:41<40:56, 1.63it/s] {'loss': 0.3513, 'grad_norm': 0.6779436469078064, 'learning_rate': 3.240270448808205e-06, 'epoch': 1.96}
65%|██████▌ | 7527/11526 [1:18:41<40:56, 1.63it/s] 65%|██████▌ | 7528/11526 [1:18:42<40:56, 1.63it/s] {'loss': 0.223, 'grad_norm': 0.566417932510376, 'learning_rate': 3.2388531022620474e-06, 'epoch': 1.96}
65%|██████▌ | 7528/11526 [1:18:42<40:56, 1.63it/s] 65%|██████▌ | 7529/11526 [1:18:42<40:56, 1.63it/s] {'loss': 0.1632, 'grad_norm': 0.48754432797431946, 'learning_rate': 3.237435917258304e-06, 'epoch': 1.96}
65%|██████▌ | 7529/11526 [1:18:43<40:56, 1.63it/s] 65%|██████▌ | 7530/11526 [1:18:43<40:55, 1.63it/s] {'loss': 0.1745, 'grad_norm': 0.46910494565963745, 'learning_rate': 3.2360188939269664e-06, 'epoch': 1.96}
65%|██████▌ | 7530/11526 [1:18:43<40:55, 1.63it/s] 65%|██████▌ | 7531/11526 [1:18:44<40:54, 1.63it/s] {'loss': 0.1968, 'grad_norm': 0.509126603603363, 'learning_rate': 3.2346020323980137e-06, 'epoch': 1.96}
65%|██████▌ | 7531/11526 [1:18:44<40:54, 1.63it/s] 65%|██████▌ | 7532/11526 [1:18:44<40:53, 1.63it/s] {'loss': 0.1813, 'grad_norm': 0.5260135531425476, 'learning_rate': 3.233185332801406e-06, 'epoch': 1.96}
65%|██████▌ | 7532/11526 [1:18:44<40:53, 1.63it/s] 65%|██████▌ | 7533/11526 [1:18:45<40:53, 1.63it/s] {'loss': 0.2232, 'grad_norm': 0.838839590549469, 'learning_rate': 3.231768795267094e-06, 'epoch': 1.96}
65%|██████▌ | 7533/11526 [1:18:45<40:53, 1.63it/s] 65%|██████▌ | 7534/11526 [1:18:45<40:54, 1.63it/s] {'loss': 0.1946, 'grad_norm': 0.6725423336029053, 'learning_rate': 3.230352419925007e-06, 'epoch': 1.96}
65%|██████▌ | 7534/11526 [1:18:46<40:54, 1.63it/s] 65%|██████▌ | 7535/11526 [1:18:46<40:53, 1.63it/s] {'loss': 0.2421, 'grad_norm': 0.5301441550254822, 'learning_rate': 3.2289362069050665e-06, 'epoch': 1.96}
65%|██████▌ | 7535/11526 [1:18:46<40:53, 1.63it/s] 65%|██████▌ | 7536/11526 [1:18:47<40:52, 1.63it/s] {'loss': 0.1583, 'grad_norm': 0.4583558440208435, 'learning_rate': 3.227520156337173e-06, 'epoch': 1.96}
65%|██████▌ | 7536/11526 [1:18:47<40:52, 1.63it/s] 65%|██████▌ | 7537/11526 [1:18:47<40:50, 1.63it/s] {'loss': 0.1838, 'grad_norm': 0.5439873337745667, 'learning_rate': 3.226104268351217e-06, 'epoch': 1.96}
65%|██████▌ | 7537/11526 [1:18:47<40:50, 1.63it/s] 65%|██████▌ | 7538/11526 [1:18:48<40:49, 1.63it/s] {'loss': 0.2317, 'grad_norm': 0.5614979863166809, 'learning_rate': 3.2246885430770684e-06, 'epoch': 1.96}
65%|██████▌ | 7538/11526 [1:18:48<40:49, 1.63it/s] 65%|██████▌ | 7539/11526 [1:18:49<40:51, 1.63it/s] {'loss': 0.2531, 'grad_norm': 0.6572096943855286, 'learning_rate': 3.22327298064459e-06, 'epoch': 1.96}
65%|██████▌ | 7539/11526 [1:18:49<40:51, 1.63it/s] 65%|██████▌ | 7540/11526 [1:18:49<40:50, 1.63it/s] {'loss': 0.1719, 'grad_norm': 0.5360339879989624, 'learning_rate': 3.2218575811836217e-06, 'epoch': 1.96}
65%|██████▌ | 7540/11526 [1:18:49<40:50, 1.63it/s] 65%|██████▌ | 7541/11526 [1:18:50<40:49, 1.63it/s] {'loss': 0.2135, 'grad_norm': 0.61893230676651, 'learning_rate': 3.220442344823994e-06, 'epoch': 1.96}
65%|██████▌ | 7541/11526 [1:18:50<40:49, 1.63it/s] 65%|██████▌ | 7542/11526 [1:18:50<40:49, 1.63it/s] {'loss': 0.2244, 'grad_norm': 0.5814613103866577, 'learning_rate': 3.2190272716955207e-06, 'epoch': 1.96}
65%|██████▌ | 7542/11526 [1:18:51<40:49, 1.63it/s] 65%|██████▌ | 7543/11526 [1:18:51<40:49, 1.63it/s] {'loss': 0.1614, 'grad_norm': 0.5286638140678406, 'learning_rate': 3.2176123619279963e-06, 'epoch': 1.96}
65%|██████▌ | 7543/11526 [1:18:51<40:49, 1.63it/s] 65%|██████▌ | 7544/11526 [1:18:52<40:57, 1.62it/s] {'loss': 0.1671, 'grad_norm': 0.5679326057434082, 'learning_rate': 3.216197615651209e-06, 'epoch': 1.96}
65%|██████▌ | 7544/11526 [1:18:52<40:57, 1.62it/s] 65%|██████▌ | 7545/11526 [1:18:52<40:53, 1.62it/s] {'loss': 0.1653, 'grad_norm': 0.4656113088130951, 'learning_rate': 3.2147830329949266e-06, 'epoch': 1.96}
65%|██████▌ | 7545/11526 [1:18:52<40:53, 1.62it/s] 65%|██████▌ | 7546/11526 [1:18:53<40:49, 1.62it/s] {'loss': 0.1882, 'grad_norm': 0.470541775226593, 'learning_rate': 3.213368614088902e-06, 'epoch': 1.96}
65%|██████▌ | 7546/11526 [1:18:53<40:49, 1.62it/s] 65%|██████▌ | 7547/11526 [1:18:53<40:48, 1.62it/s] {'loss': 0.2115, 'grad_norm': 0.6270729303359985, 'learning_rate': 3.211954359062871e-06, 'epoch': 1.96}
65%|██████▌ | 7547/11526 [1:18:54<40:48, 1.62it/s] 65%|██████▌ | 7548/11526 [1:18:54<40:49, 1.62it/s] {'loss': 0.1557, 'grad_norm': 0.46934324502944946, 'learning_rate': 3.2105402680465614e-06, 'epoch': 1.96}
65%|██████▌ | 7548/11526 [1:18:54<40:49, 1.62it/s] 65%|██████▌ | 7549/11526 [1:18:55<40:50, 1.62it/s] {'loss': 0.1972, 'grad_norm': 0.5395406484603882, 'learning_rate': 3.209126341169681e-06, 'epoch': 1.96}
65%|██████▌ | 7549/11526 [1:18:55<40:50, 1.62it/s] 66%|██████▌ | 7550/11526 [1:18:55<40:48, 1.62it/s] {'loss': 0.2429, 'grad_norm': 0.6831571459770203, 'learning_rate': 3.2077125785619224e-06, 'epoch': 1.97}
66%|██████▌ | 7550/11526 [1:18:55<40:48, 1.62it/s] 66%|██████▌ | 7551/11526 [1:18:56<40:46, 1.62it/s] {'loss': 0.2148, 'grad_norm': 0.6113748550415039, 'learning_rate': 3.206298980352961e-06, 'epoch': 1.97}
66%|██████▌ | 7551/11526 [1:18:56<40:46, 1.62it/s] 66%|██████▌ | 7552/11526 [1:18:57<40:44, 1.63it/s] {'loss': 0.2023, 'grad_norm': 0.5887892842292786, 'learning_rate': 3.204885546672465e-06, 'epoch': 1.97}
66%|██████▌ | 7552/11526 [1:18:57<40:44, 1.63it/s] 66%|██████▌ | 7553/11526 [1:18:57<40:41, 1.63it/s] {'loss': 0.1753, 'grad_norm': 0.5667897462844849, 'learning_rate': 3.2034722776500805e-06, 'epoch': 1.97}
66%|██████▌ | 7553/11526 [1:18:57<40:41, 1.63it/s] 66%|██████▌ | 7554/11526 [1:18:58<40:44, 1.62it/s] {'loss': 0.1994, 'grad_norm': 0.5246461629867554, 'learning_rate': 3.2020591734154407e-06, 'epoch': 1.97}
66%|██████▌ | 7554/11526 [1:18:58<40:44, 1.62it/s] 66%|██████▌ | 7555/11526 [1:18:58<40:41, 1.63it/s] {'loss': 0.2048, 'grad_norm': 0.588197648525238, 'learning_rate': 3.2006462340981628e-06, 'epoch': 1.97}
66%|██████▌ | 7555/11526 [1:18:59<40:41, 1.63it/s] 66%|██████▌ | 7556/11526 [1:18:59<40:39, 1.63it/s] {'loss': 0.1893, 'grad_norm': 0.5918688774108887, 'learning_rate': 3.1992334598278486e-06, 'epoch': 1.97}
66%|██████▌ | 7556/11526 [1:18:59<40:39, 1.63it/s] 66%|██████▌ | 7557/11526 [1:19:00<40:39, 1.63it/s] {'loss': 0.2284, 'grad_norm': 0.529207706451416, 'learning_rate': 3.19782085073409e-06, 'epoch': 1.97}
66%|██████▌ | 7557/11526 [1:19:00<40:39, 1.63it/s] 66%|██████▌ | 7558/11526 [1:19:00<40:38, 1.63it/s] {'loss': 0.1669, 'grad_norm': 0.4790841341018677, 'learning_rate': 3.196408406946456e-06, 'epoch': 1.97}
66%|██████▌ | 7558/11526 [1:19:00<40:38, 1.63it/s] 66%|██████▌ | 7559/11526 [1:19:01<40:39, 1.63it/s] {'loss': 0.1867, 'grad_norm': 0.4860024154186249, 'learning_rate': 3.194996128594505e-06, 'epoch': 1.97}
66%|██████▌ | 7559/11526 [1:19:01<40:39, 1.63it/s] 66%|██████▌ | 7560/11526 [1:19:01<40:38, 1.63it/s] {'loss': 0.185, 'grad_norm': 0.511823832988739, 'learning_rate': 3.193584015807778e-06, 'epoch': 1.97}
66%|██████▌ | 7560/11526 [1:19:02<40:38, 1.63it/s] 66%|██████▌ | 7561/11526 [1:19:02<40:36, 1.63it/s] {'loss': 0.2128, 'grad_norm': 0.6218997836112976, 'learning_rate': 3.1921720687158047e-06, 'epoch': 1.97}
66%|██████▌ | 7561/11526 [1:19:02<40:36, 1.63it/s] 66%|██████▌ | 7562/11526 [1:19:03<40:35, 1.63it/s] {'loss': 0.1965, 'grad_norm': 0.5394735932350159, 'learning_rate': 3.1907602874480957e-06, 'epoch': 1.97}
66%|██████▌ | 7562/11526 [1:19:03<40:35, 1.63it/s] 66%|██████▌ | 7563/11526 [1:19:03<40:33, 1.63it/s] {'loss': 0.1596, 'grad_norm': 0.4815986454486847, 'learning_rate': 3.189348672134146e-06, 'epoch': 1.97}
66%|██████▌ | 7563/11526 [1:19:03<40:33, 1.63it/s] 66%|██████▌ | 7564/11526 [1:19:04<40:38, 1.63it/s] {'loss': 0.2377, 'grad_norm': 0.7314162850379944, 'learning_rate': 3.1879372229034395e-06, 'epoch': 1.97}
66%|██████▌ | 7564/11526 [1:19:04<40:38, 1.63it/s] 66%|██████▌ | 7565/11526 [1:19:05<40:35, 1.63it/s] {'loss': 0.2001, 'grad_norm': 0.5833917856216431, 'learning_rate': 3.1865259398854388e-06, 'epoch': 1.97}
66%|██████▌ | 7565/11526 [1:19:05<40:35, 1.63it/s] 66%|██████▌ | 7566/11526 [1:19:05<40:34, 1.63it/s] {'loss': 0.1818, 'grad_norm': 0.5520609617233276, 'learning_rate': 3.1851148232095995e-06, 'epoch': 1.97}
66%|██████▌ | 7566/11526 [1:19:05<40:34, 1.63it/s] 66%|██████▌ | 7567/11526 [1:19:06<40:32, 1.63it/s] {'loss': 0.1757, 'grad_norm': 0.45561885833740234, 'learning_rate': 3.1837038730053538e-06, 'epoch': 1.97}
66%|██████▌ | 7567/11526 [1:19:06<40:32, 1.63it/s] 66%|██████▌ | 7568/11526 [1:19:06<40:31, 1.63it/s] {'loss': 0.1765, 'grad_norm': 0.4970478415489197, 'learning_rate': 3.182293089402124e-06, 'epoch': 1.97}
66%|██████▌ | 7568/11526 [1:19:07<40:31, 1.63it/s] 66%|██████▌ | 7569/11526 [1:19:07<40:33, 1.63it/s] {'loss': 0.184, 'grad_norm': 0.49575066566467285, 'learning_rate': 3.1808824725293124e-06, 'epoch': 1.97}
66%|██████▌ | 7569/11526 [1:19:07<40:33, 1.63it/s] 66%|██████▌ | 7570/11526 [1:19:08<40:32, 1.63it/s] {'loss': 0.2471, 'grad_norm': 0.6400864720344543, 'learning_rate': 3.1794720225163133e-06, 'epoch': 1.97}
66%|██████▌ | 7570/11526 [1:19:08<40:32, 1.63it/s] 66%|██████▌ | 7571/11526 [1:19:08<40:30, 1.63it/s] {'loss': 0.1829, 'grad_norm': 0.48349425196647644, 'learning_rate': 3.1780617394924975e-06, 'epoch': 1.97}
66%|██████▌ | 7571/11526 [1:19:08<40:30, 1.63it/s] 66%|██████▌ | 7572/11526 [1:19:09<40:28, 1.63it/s] {'loss': 0.2008, 'grad_norm': 0.5554109811782837, 'learning_rate': 3.176651623587226e-06, 'epoch': 1.97}
66%|██████▌ | 7572/11526 [1:19:09<40:28, 1.63it/s] 66%|██████▌ | 7573/11526 [1:19:09<40:28, 1.63it/s] {'loss': 0.1574, 'grad_norm': 0.47047287225723267, 'learning_rate': 3.175241674929842e-06, 'epoch': 1.97}
66%|██████▌ | 7573/11526 [1:19:10<40:28, 1.63it/s] 66%|██████▌ | 7574/11526 [1:19:10<40:30, 1.63it/s] {'loss': 0.1942, 'grad_norm': 0.5315290689468384, 'learning_rate': 3.1738318936496713e-06, 'epoch': 1.97}
66%|██████▌ | 7574/11526 [1:19:10<40:30, 1.63it/s] 66%|██████▌ | 7575/11526 [1:19:11<40:27, 1.63it/s] {'loss': 0.2422, 'grad_norm': 0.6110561490058899, 'learning_rate': 3.172422279876032e-06, 'epoch': 1.97}
66%|██████▌ | 7575/11526 [1:19:11<40:27, 1.63it/s] 66%|██████▌ | 7576/11526 [1:19:11<40:27, 1.63it/s] {'loss': 0.1836, 'grad_norm': 0.48456084728240967, 'learning_rate': 3.1710128337382206e-06, 'epoch': 1.97}
66%|██████▌ | 7576/11526 [1:19:11<40:27, 1.63it/s] 66%|██████▌ | 7577/11526 [1:19:12<40:26, 1.63it/s] {'loss': 0.2177, 'grad_norm': 0.5741434693336487, 'learning_rate': 3.169603555365518e-06, 'epoch': 1.97}
66%|██████▌ | 7577/11526 [1:19:12<40:26, 1.63it/s] 66%|██████▌ | 7578/11526 [1:19:13<40:24, 1.63it/s] {'loss': 0.1642, 'grad_norm': 0.465180367231369, 'learning_rate': 3.1681944448871905e-06, 'epoch': 1.97}
66%|██████▌ | 7578/11526 [1:19:13<40:24, 1.63it/s] 66%|██████▌ | 7579/11526 [1:19:13<40:26, 1.63it/s] {'loss': 0.1986, 'grad_norm': 0.47477659583091736, 'learning_rate': 3.1667855024324912e-06, 'epoch': 1.97}
66%|██████▌ | 7579/11526 [1:19:13<40:26, 1.63it/s] 66%|██████▌ | 7580/11526 [1:19:14<40:25, 1.63it/s] {'loss': 0.2075, 'grad_norm': 0.552848219871521, 'learning_rate': 3.165376728130658e-06, 'epoch': 1.97}
66%|██████▌ | 7580/11526 [1:19:14<40:25, 1.63it/s] 66%|██████▌ | 7581/11526 [1:19:14<40:24, 1.63it/s] {'loss': 0.1624, 'grad_norm': 0.4590122103691101, 'learning_rate': 3.163968122110909e-06, 'epoch': 1.97}
66%|██████▌ | 7581/11526 [1:19:15<40:24, 1.63it/s] 66%|██████▌ | 7582/11526 [1:19:15<40:22, 1.63it/s] {'loss': 0.1702, 'grad_norm': 0.5375771522521973, 'learning_rate': 3.1625596845024497e-06, 'epoch': 1.97}
66%|██████▌ | 7582/11526 [1:19:15<40:22, 1.63it/s] 66%|██████▌ | 7583/11526 [1:19:16<40:21, 1.63it/s] {'loss': 0.1826, 'grad_norm': 0.48578163981437683, 'learning_rate': 3.1611514154344703e-06, 'epoch': 1.97}
66%|██████▌ | 7583/11526 [1:19:16<40:21, 1.63it/s] 66%|██████▌ | 7584/11526 [1:19:16<40:25, 1.63it/s] {'loss': 0.2724, 'grad_norm': 0.6566855907440186, 'learning_rate': 3.1597433150361477e-06, 'epoch': 1.97}
66%|██████▌ | 7584/11526 [1:19:16<40:25, 1.63it/s] 66%|██████▌ | 7585/11526 [1:19:17<40:23, 1.63it/s] {'loss': 0.1933, 'grad_norm': 0.5420049428939819, 'learning_rate': 3.1583353834366383e-06, 'epoch': 1.97}
66%|██████▌ | 7585/11526 [1:19:17<40:23, 1.63it/s] 66%|██████▌ | 7586/11526 [1:19:17<40:21, 1.63it/s] {'loss': 0.1696, 'grad_norm': 0.5160303115844727, 'learning_rate': 3.1569276207650855e-06, 'epoch': 1.97}
66%|██████▌ | 7586/11526 [1:19:18<40:21, 1.63it/s] 66%|██████▌ | 7587/11526 [1:19:18<40:20, 1.63it/s] {'loss': 0.2566, 'grad_norm': 0.6528110504150391, 'learning_rate': 3.155520027150617e-06, 'epoch': 1.97}
66%|██████▌ | 7587/11526 [1:19:18<40:20, 1.63it/s] 66%|██████▌ | 7588/11526 [1:19:19<40:19, 1.63it/s] {'loss': 0.1819, 'grad_norm': 0.4973883032798767, 'learning_rate': 3.1541126027223478e-06, 'epoch': 1.98}
66%|██████▌ | 7588/11526 [1:19:19<40:19, 1.63it/s] 66%|██████▌ | 7589/11526 [1:19:19<40:19, 1.63it/s] {'loss': 0.2161, 'grad_norm': 0.6104224324226379, 'learning_rate': 3.152705347609374e-06, 'epoch': 1.98}
66%|██████▌ | 7589/11526 [1:19:19<40:19, 1.63it/s] 66%|██████▌ | 7590/11526 [1:19:20<40:20, 1.63it/s] {'loss': 0.1587, 'grad_norm': 0.4822770059108734, 'learning_rate': 3.151298261940775e-06, 'epoch': 1.98}
66%|██████▌ | 7590/11526 [1:19:20<40:20, 1.63it/s] 66%|██████▌ | 7591/11526 [1:19:21<40:19, 1.63it/s] {'loss': 0.2081, 'grad_norm': 0.6194984316825867, 'learning_rate': 3.149891345845619e-06, 'epoch': 1.98}
66%|██████▌ | 7591/11526 [1:19:21<40:19, 1.63it/s] 66%|██████▌ | 7592/11526 [1:19:21<40:16, 1.63it/s] {'loss': 0.2272, 'grad_norm': 0.6496408581733704, 'learning_rate': 3.1484845994529536e-06, 'epoch': 1.98}
66%|██████▌ | 7592/11526 [1:19:21<40:16, 1.63it/s] 66%|██████▌ | 7593/11526 [1:19:22<40:16, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.5311450362205505, 'learning_rate': 3.1470780228918173e-06, 'epoch': 1.98}
66%|██████▌ | 7593/11526 [1:19:22<40:16, 1.63it/s] 66%|██████▌ | 7594/11526 [1:19:22<40:20, 1.62it/s] {'loss': 0.1793, 'grad_norm': 0.5224694609642029, 'learning_rate': 3.145671616291227e-06, 'epoch': 1.98}
66%|██████▌ | 7594/11526 [1:19:23<40:20, 1.62it/s] 66%|██████▌ | 7595/11526 [1:19:23<40:17, 1.63it/s] {'loss': 0.3017, 'grad_norm': 0.7663317918777466, 'learning_rate': 3.144265379780187e-06, 'epoch': 1.98}
66%|██████▌ | 7595/11526 [1:19:23<40:17, 1.63it/s] 66%|██████▌ | 7596/11526 [1:19:24<40:15, 1.63it/s] {'loss': 0.2113, 'grad_norm': 0.5665456652641296, 'learning_rate': 3.142859313487684e-06, 'epoch': 1.98}
66%|██████▌ | 7596/11526 [1:19:24<40:15, 1.63it/s] 66%|██████▌ | 7597/11526 [1:19:24<40:13, 1.63it/s] {'loss': 0.2592, 'grad_norm': 0.6182903051376343, 'learning_rate': 3.1414534175426913e-06, 'epoch': 1.98}
66%|██████▌ | 7597/11526 [1:19:24<40:13, 1.63it/s] 66%|██████▌ | 7598/11526 [1:19:25<40:12, 1.63it/s] {'loss': 0.2145, 'grad_norm': 0.5466854572296143, 'learning_rate': 3.1400476920741673e-06, 'epoch': 1.98}
66%|██████▌ | 7598/11526 [1:19:25<40:12, 1.63it/s] 66%|██████▌ | 7599/11526 [1:19:25<40:15, 1.63it/s] {'loss': 0.187, 'grad_norm': 0.5070269703865051, 'learning_rate': 3.138642137211052e-06, 'epoch': 1.98}
66%|██████▌ | 7599/11526 [1:19:26<40:15, 1.63it/s] 66%|██████▌ | 7600/11526 [1:19:26<40:14, 1.63it/s] {'loss': 0.2379, 'grad_norm': 0.5133275985717773, 'learning_rate': 3.1372367530822685e-06, 'epoch': 1.98}
66%|██████▌ | 7600/11526 [1:19:26<40:14, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.36it/s]
31%|███ | 4/13 [00:00<00:01, 8.40it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.80it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.18it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.02it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.91it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.78it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5165576338768005, 'eval_runtime': 1.9537, 'eval_samples_per_second': 102.37, 'eval_steps_per_second': 6.654, 'epoch': 1.98}
66%|██████▌ | 7600/11526 [1:19:28<40:14, 1.63it/s] 66%|██████▌ | 7601/11526 [1:19:29<1:18:40, 1.20s/it] {'loss': 0.2376, 'grad_norm': 0.5545743703842163, 'learning_rate': 3.1358315398167294e-06, 'epoch': 1.98}
66%|██████▌ | 7601/11526 [1:19:29<1:18:40, 1.20s/it] 66%|██████▌ | 7602/11526 [1:19:29<1:07:06, 1.03s/it] {'loss': 0.2203, 'grad_norm': 0.5689961314201355, 'learning_rate': 3.134426497543329e-06, 'epoch': 1.98}
66%|██████▌ | 7602/11526 [1:19:29<1:07:06, 1.03s/it] 66%|██████▌ | 7603/11526 [1:19:30<59:00, 1.11it/s] {'loss': 0.1627, 'grad_norm': 0.44613149762153625, 'learning_rate': 3.1330216263909453e-06, 'epoch': 1.98}
66%|██████▌ | 7603/11526 [1:19:30<59:00, 1.11it/s] 66%|██████▌ | 7604/11526 [1:19:30<53:22, 1.22it/s] {'loss': 0.2282, 'grad_norm': 0.6634203791618347, 'learning_rate': 3.13161692648844e-06, 'epoch': 1.98}
66%|██████▌ | 7604/11526 [1:19:31<53:22, 1.22it/s] 66%|██████▌ | 7605/11526 [1:19:31<49:23, 1.32it/s] {'loss': 0.1571, 'grad_norm': 0.4983389675617218, 'learning_rate': 3.1302123979646592e-06, 'epoch': 1.98}
66%|██████▌ | 7605/11526 [1:19:31<49:23, 1.32it/s] 66%|██████▌ | 7606/11526 [1:19:32<46:38, 1.40it/s] {'loss': 0.2463, 'grad_norm': 0.616987407207489, 'learning_rate': 3.128808040948438e-06, 'epoch': 1.98}
66%|██████▌ | 7606/11526 [1:19:32<46:38, 1.40it/s] 66%|██████▌ | 7607/11526 [1:19:32<44:41, 1.46it/s] {'loss': 0.1524, 'grad_norm': 0.43783995509147644, 'learning_rate': 3.1274038555685904e-06, 'epoch': 1.98}
66%|██████▌ | 7607/11526 [1:19:32<44:41, 1.46it/s] 66%|██████▌ | 7608/11526 [1:19:33<43:18, 1.51it/s] {'loss': 0.1652, 'grad_norm': 0.44734448194503784, 'learning_rate': 3.125999841953914e-06, 'epoch': 1.98}
66%|██████▌ | 7608/11526 [1:19:33<43:18, 1.51it/s] 66%|██████▌ | 7609/11526 [1:19:34<42:23, 1.54it/s] {'loss': 0.2245, 'grad_norm': 0.6227524876594543, 'learning_rate': 3.124596000233194e-06, 'epoch': 1.98}
66%|██████▌ | 7609/11526 [1:19:34<42:23, 1.54it/s] 66%|██████▌ | 7610/11526 [1:19:34<41:40, 1.57it/s] {'loss': 0.2133, 'grad_norm': 0.5516629815101624, 'learning_rate': 3.123192330535202e-06, 'epoch': 1.98}
66%|██████▌ | 7610/11526 [1:19:34<41:40, 1.57it/s] 66%|██████▌ | 7611/11526 [1:19:35<41:11, 1.58it/s] {'loss': 0.1918, 'grad_norm': 0.529068648815155, 'learning_rate': 3.1217888329886883e-06, 'epoch': 1.98}
66%|██████▌ | 7611/11526 [1:19:35<41:11, 1.58it/s] 66%|██████▌ | 7612/11526 [1:19:35<40:51, 1.60it/s] {'loss': 0.1748, 'grad_norm': 0.633489727973938, 'learning_rate': 3.1203855077223877e-06, 'epoch': 1.98}
66%|██████▌ | 7612/11526 [1:19:36<40:51, 1.60it/s] 66%|██████▌ | 7613/11526 [1:19:36<40:35, 1.61it/s] {'loss': 0.1929, 'grad_norm': 0.5559565424919128, 'learning_rate': 3.1189823548650234e-06, 'epoch': 1.98}
66%|██████▌ | 7613/11526 [1:19:36<40:35, 1.61it/s] 66%|██████▌ | 7614/11526 [1:19:37<40:28, 1.61it/s] {'loss': 0.1928, 'grad_norm': 0.71767657995224, 'learning_rate': 3.1175793745452986e-06, 'epoch': 1.98}
66%|██████▌ | 7614/11526 [1:19:37<40:28, 1.61it/s] 66%|██████▌ | 7615/11526 [1:19:37<40:19, 1.62it/s] {'loss': 0.2607, 'grad_norm': 0.6306091547012329, 'learning_rate': 3.1161765668919065e-06, 'epoch': 1.98}
66%|██████▌ | 7615/11526 [1:19:37<40:19, 1.62it/s] 66%|██████▌ | 7616/11526 [1:19:38<40:12, 1.62it/s] {'loss': 0.2096, 'grad_norm': 0.5305703282356262, 'learning_rate': 3.1147739320335168e-06, 'epoch': 1.98}
66%|██████▌ | 7616/11526 [1:19:38<40:12, 1.62it/s] 66%|██████▌ | 7617/11526 [1:19:38<40:07, 1.62it/s] {'loss': 0.2201, 'grad_norm': 0.5601246953010559, 'learning_rate': 3.1133714700987892e-06, 'epoch': 1.98}
66%|██████▌ | 7617/11526 [1:19:39<40:07, 1.62it/s] 66%|██████▌ | 7618/11526 [1:19:39<40:10, 1.62it/s] {'loss': 0.22, 'grad_norm': 0.5898014307022095, 'learning_rate': 3.111969181216363e-06, 'epoch': 1.98}
66%|██████▌ | 7618/11526 [1:19:39<40:10, 1.62it/s] 66%|██████▌ | 7619/11526 [1:19:40<40:09, 1.62it/s] {'loss': 0.2463, 'grad_norm': 0.6257513165473938, 'learning_rate': 3.110567065514869e-06, 'epoch': 1.98}
66%|██████▌ | 7619/11526 [1:19:40<40:09, 1.62it/s] 66%|██████▌ | 7620/11526 [1:19:40<40:05, 1.62it/s] {'loss': 0.2214, 'grad_norm': 0.6734793186187744, 'learning_rate': 3.1091651231229126e-06, 'epoch': 1.98}
66%|██████▌ | 7620/11526 [1:19:40<40:05, 1.62it/s] 66%|██████▌ | 7621/11526 [1:19:41<40:02, 1.63it/s] {'loss': 0.2311, 'grad_norm': 0.6320680975914001, 'learning_rate': 3.1077633541690898e-06, 'epoch': 1.98}
66%|██████▌ | 7621/11526 [1:19:41<40:02, 1.63it/s] 66%|██████▌ | 7622/11526 [1:19:42<39:59, 1.63it/s] {'loss': 0.2101, 'grad_norm': 0.5462419390678406, 'learning_rate': 3.1063617587819795e-06, 'epoch': 1.98}
66%|██████▌ | 7622/11526 [1:19:42<39:59, 1.63it/s] 66%|██████▌ | 7623/11526 [1:19:42<40:00, 1.63it/s] {'loss': 0.2308, 'grad_norm': 0.6382221579551697, 'learning_rate': 3.1049603370901403e-06, 'epoch': 1.98}
66%|██████▌ | 7623/11526 [1:19:42<40:00, 1.63it/s] 66%|██████▌ | 7624/11526 [1:19:43<39:59, 1.63it/s] {'loss': 0.1884, 'grad_norm': 0.533572256565094, 'learning_rate': 3.1035590892221225e-06, 'epoch': 1.98}
66%|██████▌ | 7624/11526 [1:19:43<39:59, 1.63it/s] 66%|██████▌ | 7625/11526 [1:19:43<39:57, 1.63it/s] {'loss': 0.2305, 'grad_norm': 0.6084513068199158, 'learning_rate': 3.102158015306457e-06, 'epoch': 1.98}
66%|██████▌ | 7625/11526 [1:19:44<39:57, 1.63it/s] 66%|██████▌ | 7626/11526 [1:19:44<39:55, 1.63it/s] {'loss': 0.2448, 'grad_norm': 0.6686943173408508, 'learning_rate': 3.1007571154716555e-06, 'epoch': 1.98}
66%|██████▌ | 7626/11526 [1:19:44<39:55, 1.63it/s] 66%|██████▌ | 7627/11526 [1:19:45<39:55, 1.63it/s] {'loss': 0.2074, 'grad_norm': 0.5509684681892395, 'learning_rate': 3.0993563898462164e-06, 'epoch': 1.99}
66%|██████▌ | 7627/11526 [1:19:45<39:55, 1.63it/s] 66%|██████▌ | 7628/11526 [1:19:45<39:53, 1.63it/s] {'loss': 0.2006, 'grad_norm': 0.48345211148262024, 'learning_rate': 3.097955838558624e-06, 'epoch': 1.99}
66%|██████▌ | 7628/11526 [1:19:45<39:53, 1.63it/s] 66%|██████▌ | 7629/11526 [1:19:46<39:51, 1.63it/s] {'loss': 0.2321, 'grad_norm': 0.6899839639663696, 'learning_rate': 3.0965554617373454e-06, 'epoch': 1.99}
66%|██████▌ | 7629/11526 [1:19:46<39:51, 1.63it/s] 66%|██████▌ | 7630/11526 [1:19:46<39:52, 1.63it/s] {'loss': 0.1764, 'grad_norm': 0.501044511795044, 'learning_rate': 3.0951552595108304e-06, 'epoch': 1.99}
66%|██████▌ | 7630/11526 [1:19:47<39:52, 1.63it/s] 66%|██████▌ | 7631/11526 [1:19:47<39:51, 1.63it/s] {'loss': 0.19, 'grad_norm': 0.5733312964439392, 'learning_rate': 3.0937552320075116e-06, 'epoch': 1.99}
66%|██████▌ | 7631/11526 [1:19:47<39:51, 1.63it/s] 66%|██████▌ | 7632/11526 [1:19:48<39:51, 1.63it/s] {'loss': 0.251, 'grad_norm': 0.5780131816864014, 'learning_rate': 3.092355379355808e-06, 'epoch': 1.99}
66%|██████▌ | 7632/11526 [1:19:48<39:51, 1.63it/s] 66%|██████▌ | 7633/11526 [1:19:48<39:52, 1.63it/s] {'loss': 0.1656, 'grad_norm': 0.48067817091941833, 'learning_rate': 3.090955701684125e-06, 'epoch': 1.99}
66%|██████▌ | 7633/11526 [1:19:48<39:52, 1.63it/s] 66%|██████▌ | 7634/11526 [1:19:49<39:51, 1.63it/s] {'loss': 0.2136, 'grad_norm': 0.5373337864875793, 'learning_rate': 3.089556199120848e-06, 'epoch': 1.99}
66%|██████▌ | 7634/11526 [1:19:49<39:51, 1.63it/s] 66%|██████▌ | 7635/11526 [1:19:50<39:51, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.4700952172279358, 'learning_rate': 3.0881568717943444e-06, 'epoch': 1.99}
66%|██████▌ | 7635/11526 [1:19:50<39:51, 1.63it/s] 66%|██████▋ | 7636/11526 [1:19:50<39:50, 1.63it/s] {'loss': 0.1394, 'grad_norm': 0.4580720067024231, 'learning_rate': 3.08675771983297e-06, 'epoch': 1.99}
66%|██████▋ | 7636/11526 [1:19:50<39:50, 1.63it/s] 66%|██████▋ | 7637/11526 [1:19:51<39:48, 1.63it/s] {'loss': 0.2016, 'grad_norm': 0.5387079119682312, 'learning_rate': 3.085358743365065e-06, 'epoch': 1.99}
66%|██████▋ | 7637/11526 [1:19:51<39:48, 1.63it/s] 66%|██████▋ | 7638/11526 [1:19:51<39:47, 1.63it/s] {'loss': 0.2234, 'grad_norm': 0.7262717485427856, 'learning_rate': 3.08395994251895e-06, 'epoch': 1.99}
66%|██████▋ | 7638/11526 [1:19:52<39:47, 1.63it/s] 66%|██████▋ | 7639/11526 [1:19:52<39:47, 1.63it/s] {'loss': 0.2101, 'grad_norm': 0.5666662454605103, 'learning_rate': 3.08256131742293e-06, 'epoch': 1.99}
66%|██████▋ | 7639/11526 [1:19:52<39:47, 1.63it/s] 66%|██████▋ | 7640/11526 [1:19:53<39:47, 1.63it/s] {'loss': 0.1604, 'grad_norm': 0.4853518307209015, 'learning_rate': 3.081162868205297e-06, 'epoch': 1.99}
66%|██████▋ | 7640/11526 [1:19:53<39:47, 1.63it/s] 66%|██████▋ | 7641/11526 [1:19:53<39:46, 1.63it/s] {'loss': 0.178, 'grad_norm': 0.444379597902298, 'learning_rate': 3.079764594994321e-06, 'epoch': 1.99}
66%|██████▋ | 7641/11526 [1:19:53<39:46, 1.63it/s] 66%|██████▋ | 7642/11526 [1:19:54<39:45, 1.63it/s] {'loss': 0.1881, 'grad_norm': 0.5109987854957581, 'learning_rate': 3.078366497918264e-06, 'epoch': 1.99}
66%|██████▋ | 7642/11526 [1:19:54<39:45, 1.63it/s] 66%|██████▋ | 7643/11526 [1:19:54<39:45, 1.63it/s] {'loss': 0.2288, 'grad_norm': 0.619681179523468, 'learning_rate': 3.0769685771053647e-06, 'epoch': 1.99}
66%|██████▋ | 7643/11526 [1:19:55<39:45, 1.63it/s] 66%|██████▋ | 7644/11526 [1:19:55<39:46, 1.63it/s] {'loss': 0.1467, 'grad_norm': 0.4188116192817688, 'learning_rate': 3.0755708326838507e-06, 'epoch': 1.99}
66%|██████▋ | 7644/11526 [1:19:55<39:46, 1.63it/s] 66%|██████▋ | 7645/11526 [1:19:56<39:45, 1.63it/s] {'loss': 0.2134, 'grad_norm': 0.5948120355606079, 'learning_rate': 3.0741732647819268e-06, 'epoch': 1.99}
66%|██████▋ | 7645/11526 [1:19:56<39:45, 1.63it/s] 66%|██████▋ | 7646/11526 [1:19:56<39:44, 1.63it/s] {'loss': 0.1779, 'grad_norm': 0.5156999826431274, 'learning_rate': 3.07277587352779e-06, 'epoch': 1.99}
66%|██████▋ | 7646/11526 [1:19:56<39:44, 1.63it/s] 66%|██████▋ | 7647/11526 [1:19:57<39:42, 1.63it/s] {'loss': 0.2048, 'grad_norm': 0.5260574221611023, 'learning_rate': 3.0713786590496155e-06, 'epoch': 1.99}
66%|██████▋ | 7647/11526 [1:19:57<39:42, 1.63it/s] 66%|██████▋ | 7648/11526 [1:19:58<39:42, 1.63it/s] {'loss': 0.1873, 'grad_norm': 0.510768473148346, 'learning_rate': 3.0699816214755645e-06, 'epoch': 1.99}
66%|██████▋ | 7648/11526 [1:19:58<39:42, 1.63it/s] 66%|██████▋ | 7649/11526 [1:19:58<39:45, 1.63it/s] {'loss': 0.1888, 'grad_norm': 0.5245544910430908, 'learning_rate': 3.0685847609337784e-06, 'epoch': 1.99}
66%|██████▋ | 7649/11526 [1:19:58<39:45, 1.63it/s] 66%|██████▋ | 7650/11526 [1:19:59<39:43, 1.63it/s] {'loss': 0.3034, 'grad_norm': 0.6989494562149048, 'learning_rate': 3.0671880775523895e-06, 'epoch': 1.99}
66%|██████▋ | 7650/11526 [1:19:59<39:43, 1.63it/s] 66%|██████▋ | 7651/11526 [1:19:59<39:41, 1.63it/s] {'loss': 0.1638, 'grad_norm': 0.43776512145996094, 'learning_rate': 3.0657915714595064e-06, 'epoch': 1.99}
66%|██████▋ | 7651/11526 [1:20:00<39:41, 1.63it/s] 66%|██████▋ | 7652/11526 [1:20:00<39:40, 1.63it/s] {'loss': 0.1833, 'grad_norm': 0.5094297528266907, 'learning_rate': 3.064395242783226e-06, 'epoch': 1.99}
66%|██████▋ | 7652/11526 [1:20:00<39:40, 1.63it/s] 66%|██████▋ | 7653/11526 [1:20:01<39:39, 1.63it/s] {'loss': 0.194, 'grad_norm': 0.5409683585166931, 'learning_rate': 3.062999091651628e-06, 'epoch': 1.99}
66%|██████▋ | 7653/11526 [1:20:01<39:39, 1.63it/s] 66%|██████▋ | 7654/11526 [1:20:01<39:37, 1.63it/s] {'loss': 0.3041, 'grad_norm': 0.6612545847892761, 'learning_rate': 3.0616031181927717e-06, 'epoch': 1.99}
66%|██████▋ | 7654/11526 [1:20:01<39:37, 1.63it/s] 66%|██████▋ | 7655/11526 [1:20:02<39:38, 1.63it/s] {'loss': 0.2203, 'grad_norm': 0.5863503813743591, 'learning_rate': 3.0602073225347074e-06, 'epoch': 1.99}
66%|██████▋ | 7655/11526 [1:20:02<39:38, 1.63it/s] 66%|██████▋ | 7656/11526 [1:20:02<39:36, 1.63it/s] {'loss': 0.202, 'grad_norm': 0.6817415952682495, 'learning_rate': 3.0588117048054656e-06, 'epoch': 1.99}
66%|██████▋ | 7656/11526 [1:20:03<39:36, 1.63it/s] 66%|██████▋ | 7657/11526 [1:20:03<39:34, 1.63it/s] {'loss': 0.1681, 'grad_norm': 0.5002606511116028, 'learning_rate': 3.0574162651330585e-06, 'epoch': 1.99}
66%|██████▋ | 7657/11526 [1:20:03<39:34, 1.63it/s] 66%|██████▋ | 7658/11526 [1:20:04<39:35, 1.63it/s] {'loss': 0.1814, 'grad_norm': 0.5052294135093689, 'learning_rate': 3.056021003645484e-06, 'epoch': 1.99}
66%|██████▋ | 7658/11526 [1:20:04<39:35, 1.63it/s] 66%|██████▋ | 7659/11526 [1:20:04<39:35, 1.63it/s] {'loss': 0.1531, 'grad_norm': 0.465069979429245, 'learning_rate': 3.054625920470724e-06, 'epoch': 1.99}
66%|██████▋ | 7659/11526 [1:20:04<39:35, 1.63it/s] 66%|██████▋ | 7660/11526 [1:20:05<39:34, 1.63it/s] {'loss': 0.2135, 'grad_norm': 0.5772091150283813, 'learning_rate': 3.053231015736744e-06, 'epoch': 1.99}
66%|██████▋ | 7660/11526 [1:20:05<39:34, 1.63it/s] 66%|██████▋ | 7661/11526 [1:20:06<39:34, 1.63it/s] {'loss': 0.2281, 'grad_norm': 0.5728339552879333, 'learning_rate': 3.0518362895714914e-06, 'epoch': 1.99}
66%|██████▋ | 7661/11526 [1:20:06<39:34, 1.63it/s] 66%|██████▋ | 7662/11526 [1:20:06<39:32, 1.63it/s] {'loss': 0.1813, 'grad_norm': 0.547650933265686, 'learning_rate': 3.0504417421028997e-06, 'epoch': 1.99}
66%|██████▋ | 7662/11526 [1:20:06<39:32, 1.63it/s] 66%|██████▋ | 7663/11526 [1:20:07<40:43, 1.58it/s] {'loss': 0.189, 'grad_norm': 0.5451721549034119, 'learning_rate': 3.049047373458882e-06, 'epoch': 1.99}
66%|██████▋ | 7663/11526 [1:20:07<40:43, 1.58it/s] 66%|██████▋ | 7664/11526 [1:20:07<40:24, 1.59it/s] {'loss': 0.1976, 'grad_norm': 0.5507566332817078, 'learning_rate': 3.0476531837673417e-06, 'epoch': 1.99}
66%|██████▋ | 7664/11526 [1:20:08<40:24, 1.59it/s] 67%|██████▋ | 7665/11526 [1:20:08<40:07, 1.60it/s] {'loss': 0.1911, 'grad_norm': 0.5610688924789429, 'learning_rate': 3.0462591731561586e-06, 'epoch': 2.0}
67%|██████▋ | 7665/11526 [1:20:08<40:07, 1.60it/s] 67%|██████▋ | 7666/11526 [1:20:09<41:13, 1.56it/s] {'loss': 0.2305, 'grad_norm': 0.562248170375824, 'learning_rate': 3.0448653417532014e-06, 'epoch': 2.0}
67%|██████▋ | 7666/11526 [1:20:09<41:13, 1.56it/s] 67%|██████▋ | 7667/11526 [1:20:09<40:51, 1.57it/s] {'loss': 0.1413, 'grad_norm': 0.41125330328941345, 'learning_rate': 3.043471689686317e-06, 'epoch': 2.0}
67%|██████▋ | 7667/11526 [1:20:09<40:51, 1.57it/s] 67%|██████▋ | 7668/11526 [1:20:10<40:25, 1.59it/s] {'loss': 0.2213, 'grad_norm': 0.5528373122215271, 'learning_rate': 3.042078217083344e-06, 'epoch': 2.0}
67%|██████▋ | 7668/11526 [1:20:10<40:25, 1.59it/s] 67%|██████▋ | 7669/11526 [1:20:11<40:10, 1.60it/s] {'loss': 0.1764, 'grad_norm': 0.455848753452301, 'learning_rate': 3.040684924072096e-06, 'epoch': 2.0}
67%|██████▋ | 7669/11526 [1:20:11<40:10, 1.60it/s] 67%|██████▋ | 7670/11526 [1:20:11<39:58, 1.61it/s] {'loss': 0.1922, 'grad_norm': 0.49440842866897583, 'learning_rate': 3.039291810780376e-06, 'epoch': 2.0}
67%|██████▋ | 7670/11526 [1:20:11<39:58, 1.61it/s] 67%|██████▋ | 7671/11526 [1:20:12<39:47, 1.61it/s] {'loss': 0.1425, 'grad_norm': 0.43989893794059753, 'learning_rate': 3.0378988773359665e-06, 'epoch': 2.0}
67%|██████▋ | 7671/11526 [1:20:12<39:47, 1.61it/s] 67%|██████▋ | 7672/11526 [1:20:12<40:40, 1.58it/s] {'loss': 0.2025, 'grad_norm': 0.5750511288642883, 'learning_rate': 3.0365061238666336e-06, 'epoch': 2.0}
67%|██████▋ | 7672/11526 [1:20:13<40:40, 1.58it/s] 67%|██████▋ | 7673/11526 [1:20:13<40:26, 1.59it/s] {'loss': 0.1999, 'grad_norm': 0.5940809845924377, 'learning_rate': 3.035113550500133e-06, 'epoch': 2.0}
67%|██████▋ | 7673/11526 [1:20:13<40:26, 1.59it/s] 67%|██████▋ | 7674/11526 [1:20:14<40:10, 1.60it/s] {'loss': 0.1856, 'grad_norm': 0.5382583737373352, 'learning_rate': 3.0337211573641973e-06, 'epoch': 2.0}
67%|██████▋ | 7674/11526 [1:20:14<40:10, 1.60it/s] 67%|██████▋ | 7675/11526 [1:20:14<39:57, 1.61it/s] {'loss': 0.2123, 'grad_norm': 0.5874987840652466, 'learning_rate': 3.032328944586545e-06, 'epoch': 2.0}
67%|██████▋ | 7675/11526 [1:20:14<39:57, 1.61it/s] 67%|██████▋ | 7676/11526 [1:20:15<39:47, 1.61it/s] {'loss': 0.1609, 'grad_norm': 0.4324181377887726, 'learning_rate': 3.0309369122948756e-06, 'epoch': 2.0}
67%|██████▋ | 7676/11526 [1:20:15<39:47, 1.61it/s] 67%|██████▋ | 7677/11526 [1:20:16<39:39, 1.62it/s] {'loss': 0.1745, 'grad_norm': 0.5069663524627686, 'learning_rate': 3.0295450606168775e-06, 'epoch': 2.0}
67%|██████▋ | 7677/11526 [1:20:16<39:39, 1.62it/s] 67%|██████▋ | 7678/11526 [1:20:16<39:34, 1.62it/s] {'loss': 0.2613, 'grad_norm': 0.6848385334014893, 'learning_rate': 3.0281533896802186e-06, 'epoch': 2.0}
67%|██████▋ | 7678/11526 [1:20:16<39:34, 1.62it/s] 67%|██████▋ | 7679/11526 [1:20:17<39:32, 1.62it/s] {'loss': 0.1614, 'grad_norm': 0.4796990156173706, 'learning_rate': 3.026761899612549e-06, 'epoch': 2.0}
67%|██████▋ | 7679/11526 [1:20:17<39:32, 1.62it/s] 67%|██████▋ | 7680/11526 [1:20:17<39:29, 1.62it/s] {'loss': 0.1244, 'grad_norm': 0.41775229573249817, 'learning_rate': 3.0253705905415053e-06, 'epoch': 2.0}
67%|██████▋ | 7680/11526 [1:20:18<39:29, 1.62it/s] 67%|██████▋ | 7681/11526 [1:20:18<39:26, 1.63it/s] {'loss': 0.2372, 'grad_norm': 0.6039578318595886, 'learning_rate': 3.0239794625947057e-06, 'epoch': 2.0}
67%|██████▋ | 7681/11526 [1:20:18<39:26, 1.63it/s] 67%|██████▋ | 7682/11526 [1:20:19<39:24, 1.63it/s] {'loss': 0.1662, 'grad_norm': 0.47038528323173523, 'learning_rate': 3.0225885158997536e-06, 'epoch': 2.0}
67%|██████▋ | 7682/11526 [1:20:19<39:24, 1.63it/s] 67%|██████▋ | 7683/11526 [1:20:19<39:24, 1.63it/s] {'loss': 0.2205, 'grad_norm': 0.6433441042900085, 'learning_rate': 3.021197750584235e-06, 'epoch': 2.0}
67%|██████▋ | 7683/11526 [1:20:19<39:24, 1.63it/s] 67%|██████▋ | 7684/11526 [1:20:20<39:22, 1.63it/s] {'loss': 0.2071, 'grad_norm': 0.5682211518287659, 'learning_rate': 3.019807166775716e-06, 'epoch': 2.0}
67%|██████▋ | 7684/11526 [1:20:20<39:22, 1.63it/s] 67%|██████▋ | 7685/11526 [1:20:20<39:25, 1.62it/s] {'loss': 0.1652, 'grad_norm': 0.45053011178970337, 'learning_rate': 3.01841676460175e-06, 'epoch': 2.0}
67%|██████▋ | 7685/11526 [1:20:21<39:25, 1.62it/s] 67%|██████▋ | 7686/11526 [1:20:21<39:23, 1.62it/s] {'loss': 0.1103, 'grad_norm': 0.3509102463722229, 'learning_rate': 3.0170265441898745e-06, 'epoch': 2.0}
67%|██████▋ | 7686/11526 [1:20:21<39:23, 1.62it/s] 67%|██████▋ | 7687/11526 [1:20:22<39:20, 1.63it/s] {'loss': 0.1522, 'grad_norm': 0.46218428015708923, 'learning_rate': 3.0156365056676074e-06, 'epoch': 2.0}
67%|██████▋ | 7687/11526 [1:20:22<39:20, 1.63it/s] 67%|██████▋ | 7688/11526 [1:20:22<39:20, 1.63it/s] {'loss': 0.1562, 'grad_norm': 0.4674440026283264, 'learning_rate': 3.0142466491624495e-06, 'epoch': 2.0}
67%|██████▋ | 7688/11526 [1:20:22<39:20, 1.63it/s] 67%|██████▋ | 7689/11526 [1:20:23<39:23, 1.62it/s] {'loss': 0.1503, 'grad_norm': 0.46950721740722656, 'learning_rate': 3.012856974801888e-06, 'epoch': 2.0}
67%|██████▋ | 7689/11526 [1:20:23<39:23, 1.62it/s] 67%|██████▋ | 7690/11526 [1:20:24<39:19, 1.63it/s] {'loss': 0.131, 'grad_norm': 0.4119868874549866, 'learning_rate': 3.0114674827133892e-06, 'epoch': 2.0}
67%|██████▋ | 7690/11526 [1:20:24<39:19, 1.63it/s] 67%|██████▋ | 7691/11526 [1:20:24<39:17, 1.63it/s] {'loss': 0.1836, 'grad_norm': 0.5414679646492004, 'learning_rate': 3.01007817302441e-06, 'epoch': 2.0}
67%|██████▋ | 7691/11526 [1:20:24<39:17, 1.63it/s] 67%|██████▋ | 7692/11526 [1:20:25<39:16, 1.63it/s] {'loss': 0.1812, 'grad_norm': 0.4726102352142334, 'learning_rate': 3.0086890458623812e-06, 'epoch': 2.0}
67%|██████▋ | 7692/11526 [1:20:25<39:16, 1.63it/s] 67%|██████▋ | 7693/11526 [1:20:25<39:14, 1.63it/s] {'loss': 0.1625, 'grad_norm': 0.4467054307460785, 'learning_rate': 3.007300101354724e-06, 'epoch': 2.0}
67%|██████▋ | 7693/11526 [1:20:26<39:14, 1.63it/s] 67%|██████▋ | 7694/11526 [1:20:26<39:17, 1.63it/s] {'loss': 0.1423, 'grad_norm': 0.44572514295578003, 'learning_rate': 3.0059113396288386e-06, 'epoch': 2.0}
67%|██████▋ | 7694/11526 [1:20:26<39:17, 1.63it/s] 67%|██████▋ | 7695/11526 [1:20:27<39:16, 1.63it/s] {'loss': 0.1476, 'grad_norm': 0.44619807600975037, 'learning_rate': 3.004522760812112e-06, 'epoch': 2.0}
67%|██████▋ | 7695/11526 [1:20:27<39:16, 1.63it/s] 67%|██████▋ | 7696/11526 [1:20:27<39:14, 1.63it/s] {'loss': 0.1369, 'grad_norm': 0.44742822647094727, 'learning_rate': 3.0031343650319113e-06, 'epoch': 2.0}
67%|██████▋ | 7696/11526 [1:20:27<39:14, 1.63it/s] 67%|██████▋ | 7697/11526 [1:20:28<39:12, 1.63it/s] {'loss': 0.1299, 'grad_norm': 0.4340263605117798, 'learning_rate': 3.0017461524155888e-06, 'epoch': 2.0}
67%|██████▋ | 7697/11526 [1:20:28<39:12, 1.63it/s] 67%|██████▋ | 7698/11526 [1:20:28<39:12, 1.63it/s] {'loss': 0.1786, 'grad_norm': 0.4611908793449402, 'learning_rate': 3.0003581230904766e-06, 'epoch': 2.0}
67%|██████▋ | 7698/11526 [1:20:29<39:12, 1.63it/s] 67%|██████▋ | 7699/11526 [1:20:29<39:24, 1.62it/s] {'loss': 0.1342, 'grad_norm': 0.472720742225647, 'learning_rate': 2.9989702771838973e-06, 'epoch': 2.0}
67%|██████▋ | 7699/11526 [1:20:29<39:24, 1.62it/s] 67%|██████▋ | 7700/11526 [1:20:30<39:18, 1.62it/s] {'loss': 0.1513, 'grad_norm': 0.4652046859264374, 'learning_rate': 2.997582614823149e-06, 'epoch': 2.0}
67%|██████▋ | 7700/11526 [1:20:30<39:18, 1.62it/s] 67%|██████▋ | 7701/11526 [1:20:30<39:15, 1.62it/s] {'loss': 0.1037, 'grad_norm': 0.36768263578414917, 'learning_rate': 2.9961951361355167e-06, 'epoch': 2.0}
67%|██████▋ | 7701/11526 [1:20:30<39:15, 1.62it/s] 67%|██████▋ | 7702/11526 [1:20:31<39:12, 1.63it/s] {'loss': 0.1497, 'grad_norm': 0.4656660258769989, 'learning_rate': 2.9948078412482684e-06, 'epoch': 2.0}
67%|██████▋ | 7702/11526 [1:20:31<39:12, 1.63it/s] 67%|██████▋ | 7703/11526 [1:20:32<39:11, 1.63it/s] {'loss': 0.1712, 'grad_norm': 0.49562105536460876, 'learning_rate': 2.993420730288651e-06, 'epoch': 2.0}
67%|██████▋ | 7703/11526 [1:20:32<39:11, 1.63it/s] 67%|██████▋ | 7704/11526 [1:20:32<39:13, 1.62it/s] {'loss': 0.1656, 'grad_norm': 0.4609875977039337, 'learning_rate': 2.992033803383903e-06, 'epoch': 2.01}
67%|██████▋ | 7704/11526 [1:20:32<39:13, 1.62it/s] 67%|██████▋ | 7705/11526 [1:20:33<39:11, 1.63it/s] {'loss': 0.1415, 'grad_norm': 0.47220805287361145, 'learning_rate': 2.9906470606612403e-06, 'epoch': 2.01}
67%|██████▋ | 7705/11526 [1:20:33<39:11, 1.63it/s] 67%|██████▋ | 7706/11526 [1:20:33<39:09, 1.63it/s] {'loss': 0.1276, 'grad_norm': 0.4391045868396759, 'learning_rate': 2.9892605022478617e-06, 'epoch': 2.01}
67%|██████▋ | 7706/11526 [1:20:34<39:09, 1.63it/s] 67%|██████▋ | 7707/11526 [1:20:34<39:07, 1.63it/s] {'loss': 0.1487, 'grad_norm': 0.7274668216705322, 'learning_rate': 2.987874128270949e-06, 'epoch': 2.01}
67%|██████▋ | 7707/11526 [1:20:34<39:07, 1.63it/s] 67%|██████▋ | 7708/11526 [1:20:35<39:05, 1.63it/s] {'loss': 0.1132, 'grad_norm': 0.4270000457763672, 'learning_rate': 2.9864879388576693e-06, 'epoch': 2.01}
67%|██████▋ | 7708/11526 [1:20:35<39:05, 1.63it/s] 67%|██████▋ | 7709/11526 [1:20:35<39:15, 1.62it/s] {'loss': 0.135, 'grad_norm': 0.48503124713897705, 'learning_rate': 2.985101934135175e-06, 'epoch': 2.01}
67%|██████▋ | 7709/11526 [1:20:35<39:15, 1.62it/s] 67%|██████▋ | 7710/11526 [1:20:36<39:11, 1.62it/s] {'loss': 0.1264, 'grad_norm': 0.38935157656669617, 'learning_rate': 2.9837161142305947e-06, 'epoch': 2.01}
67%|██████▋ | 7710/11526 [1:20:36<39:11, 1.62it/s] 67%|██████▋ | 7711/11526 [1:20:36<39:08, 1.62it/s] {'loss': 0.1651, 'grad_norm': 0.4940428137779236, 'learning_rate': 2.9823304792710437e-06, 'epoch': 2.01}
67%|██████▋ | 7711/11526 [1:20:37<39:08, 1.62it/s] 67%|██████▋ | 7712/11526 [1:20:37<39:04, 1.63it/s] {'loss': 0.1449, 'grad_norm': 0.5100566148757935, 'learning_rate': 2.9809450293836207e-06, 'epoch': 2.01}
67%|██████▋ | 7712/11526 [1:20:37<39:04, 1.63it/s] 67%|██████▋ | 7713/11526 [1:20:38<39:04, 1.63it/s] {'loss': 0.1382, 'grad_norm': 0.48048922419548035, 'learning_rate': 2.979559764695409e-06, 'epoch': 2.01}
67%|██████▋ | 7713/11526 [1:20:38<39:04, 1.63it/s] 67%|██████▋ | 7714/11526 [1:20:38<39:04, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.5279778838157654, 'learning_rate': 2.9781746853334715e-06, 'epoch': 2.01}
67%|██████▋ | 7714/11526 [1:20:38<39:04, 1.63it/s] 67%|██████▋ | 7715/11526 [1:20:39<39:02, 1.63it/s] {'loss': 0.1658, 'grad_norm': 0.5230104327201843, 'learning_rate': 2.976789791424855e-06, 'epoch': 2.01}
67%|██████▋ | 7715/11526 [1:20:39<39:02, 1.63it/s] 67%|██████▋ | 7716/11526 [1:20:40<39:03, 1.63it/s] {'loss': 0.2197, 'grad_norm': 0.5553642511367798, 'learning_rate': 2.975405083096589e-06, 'epoch': 2.01}
67%|██████▋ | 7716/11526 [1:20:40<39:03, 1.63it/s] 67%|██████▋ | 7717/11526 [1:20:40<39:02, 1.63it/s] {'loss': 0.158, 'grad_norm': 0.5750640034675598, 'learning_rate': 2.9740205604756906e-06, 'epoch': 2.01}
67%|██████▋ | 7717/11526 [1:20:40<39:02, 1.63it/s] 67%|██████▋ | 7718/11526 [1:20:41<39:01, 1.63it/s] {'loss': 0.1508, 'grad_norm': 0.5272514224052429, 'learning_rate': 2.9726362236891536e-06, 'epoch': 2.01}
67%|██████▋ | 7718/11526 [1:20:41<39:01, 1.63it/s] 67%|██████▋ | 7719/11526 [1:20:41<39:05, 1.62it/s] {'loss': 0.1895, 'grad_norm': 0.659447193145752, 'learning_rate': 2.971252072863956e-06, 'epoch': 2.01}
67%|██████▋ | 7719/11526 [1:20:42<39:05, 1.62it/s] 67%|██████▋ | 7720/11526 [1:20:42<39:03, 1.62it/s] {'loss': 0.1413, 'grad_norm': 0.5036587119102478, 'learning_rate': 2.969868108127063e-06, 'epoch': 2.01}
67%|██████▋ | 7720/11526 [1:20:42<39:03, 1.62it/s] 67%|██████▋ | 7721/11526 [1:20:43<39:00, 1.63it/s] {'loss': 0.1881, 'grad_norm': 0.5996931791305542, 'learning_rate': 2.9684843296054154e-06, 'epoch': 2.01}
67%|██████▋ | 7721/11526 [1:20:43<39:00, 1.63it/s] 67%|██████▋ | 7722/11526 [1:20:43<38:59, 1.63it/s] {'loss': 0.1602, 'grad_norm': 0.6028060913085938, 'learning_rate': 2.9671007374259466e-06, 'epoch': 2.01}
67%|██████▋ | 7722/11526 [1:20:43<38:59, 1.63it/s] 67%|██████▋ | 7723/11526 [1:20:44<38:57, 1.63it/s] {'loss': 0.1369, 'grad_norm': 0.5055657625198364, 'learning_rate': 2.9657173317155642e-06, 'epoch': 2.01}
67%|██████▋ | 7723/11526 [1:20:44<38:57, 1.63it/s] 67%|██████▋ | 7724/11526 [1:20:44<38:59, 1.63it/s] {'loss': 0.1253, 'grad_norm': 0.5185709595680237, 'learning_rate': 2.964334112601163e-06, 'epoch': 2.01}
67%|██████▋ | 7724/11526 [1:20:45<38:59, 1.63it/s] 67%|██████▋ | 7725/11526 [1:20:45<38:57, 1.63it/s] {'loss': 0.1446, 'grad_norm': 0.5689511895179749, 'learning_rate': 2.962951080209617e-06, 'epoch': 2.01}
67%|██████▋ | 7725/11526 [1:20:45<38:57, 1.63it/s] 67%|██████▋ | 7726/11526 [1:20:46<38:55, 1.63it/s] {'loss': 0.1543, 'grad_norm': 0.5332662463188171, 'learning_rate': 2.961568234667791e-06, 'epoch': 2.01}
67%|██████▋ | 7726/11526 [1:20:46<38:55, 1.63it/s] 67%|██████▋ | 7727/11526 [1:20:46<38:52, 1.63it/s] {'loss': 0.2312, 'grad_norm': 0.7806150317192078, 'learning_rate': 2.9601855761025234e-06, 'epoch': 2.01}
67%|██████▋ | 7727/11526 [1:20:46<38:52, 1.63it/s] 67%|██████▋ | 7728/11526 [1:20:47<38:52, 1.63it/s] {'loss': 0.1612, 'grad_norm': 0.5560540556907654, 'learning_rate': 2.9588031046406426e-06, 'epoch': 2.01}
67%|██████▋ | 7728/11526 [1:20:47<38:52, 1.63it/s] 67%|██████▋ | 7729/11526 [1:20:48<38:54, 1.63it/s] {'loss': 0.1738, 'grad_norm': 0.5663638114929199, 'learning_rate': 2.9574208204089526e-06, 'epoch': 2.01}
67%|██████▋ | 7729/11526 [1:20:48<38:54, 1.63it/s] 67%|██████▋ | 7730/11526 [1:20:48<38:54, 1.63it/s] {'loss': 0.1619, 'grad_norm': 0.5166555047035217, 'learning_rate': 2.956038723534248e-06, 'epoch': 2.01}
67%|██████▋ | 7730/11526 [1:20:48<38:54, 1.63it/s] 67%|██████▋ | 7731/11526 [1:20:49<38:51, 1.63it/s] {'loss': 0.1126, 'grad_norm': 0.3816492259502411, 'learning_rate': 2.9546568141433007e-06, 'epoch': 2.01}
67%|██████▋ | 7731/11526 [1:20:49<38:51, 1.63it/s] 67%|██████▋ | 7732/11526 [1:20:49<38:50, 1.63it/s] {'loss': 0.1525, 'grad_norm': 0.4769774377346039, 'learning_rate': 2.9532750923628694e-06, 'epoch': 2.01}
67%|██████▋ | 7732/11526 [1:20:50<38:50, 1.63it/s] 67%|██████▋ | 7733/11526 [1:20:50<38:49, 1.63it/s] {'loss': 0.1656, 'grad_norm': 0.5986752510070801, 'learning_rate': 2.9518935583196908e-06, 'epoch': 2.01}
67%|██████▋ | 7733/11526 [1:20:50<38:49, 1.63it/s] 67%|██████▋ | 7734/11526 [1:20:51<38:49, 1.63it/s] {'loss': 0.1534, 'grad_norm': 0.56634920835495, 'learning_rate': 2.9505122121404882e-06, 'epoch': 2.01}
67%|██████▋ | 7734/11526 [1:20:51<38:49, 1.63it/s] 67%|██████▋ | 7735/11526 [1:20:51<38:48, 1.63it/s] {'loss': 0.1975, 'grad_norm': 0.6510811448097229, 'learning_rate': 2.9491310539519665e-06, 'epoch': 2.01}
67%|██████▋ | 7735/11526 [1:20:51<38:48, 1.63it/s] 67%|██████▋ | 7736/11526 [1:20:52<38:46, 1.63it/s] {'loss': 0.1621, 'grad_norm': 0.5960936546325684, 'learning_rate': 2.947750083880814e-06, 'epoch': 2.01}
67%|██████▋ | 7736/11526 [1:20:52<38:46, 1.63it/s] 67%|██████▋ | 7737/11526 [1:20:52<38:45, 1.63it/s] {'loss': 0.1547, 'grad_norm': 0.5829259753227234, 'learning_rate': 2.946369302053701e-06, 'epoch': 2.01}
67%|██████▋ | 7737/11526 [1:20:53<38:45, 1.63it/s] 67%|██████▋ | 7738/11526 [1:20:53<38:45, 1.63it/s] {'loss': 0.157, 'grad_norm': 0.5301044583320618, 'learning_rate': 2.94498870859728e-06, 'epoch': 2.01}
67%|██████▋ | 7738/11526 [1:20:53<38:45, 1.63it/s] 67%|██████▋ | 7739/11526 [1:20:54<38:47, 1.63it/s] {'loss': 0.1316, 'grad_norm': 0.4822932779788971, 'learning_rate': 2.9436083036381858e-06, 'epoch': 2.01}
67%|██████▋ | 7739/11526 [1:20:54<38:47, 1.63it/s] 67%|██████▋ | 7740/11526 [1:20:54<38:46, 1.63it/s] {'loss': 0.1664, 'grad_norm': 0.5712140202522278, 'learning_rate': 2.94222808730304e-06, 'epoch': 2.01}
67%|██████▋ | 7740/11526 [1:20:54<38:46, 1.63it/s] 67%|██████▋ | 7741/11526 [1:20:55<38:44, 1.63it/s] {'loss': 0.1837, 'grad_norm': 0.5836939811706543, 'learning_rate': 2.9408480597184417e-06, 'epoch': 2.01}
67%|██████▋ | 7741/11526 [1:20:55<38:44, 1.63it/s] 67%|██████▋ | 7742/11526 [1:20:56<38:43, 1.63it/s] {'loss': 0.1118, 'grad_norm': 0.46531546115875244, 'learning_rate': 2.939468221010976e-06, 'epoch': 2.02}
67%|██████▋ | 7742/11526 [1:20:56<38:43, 1.63it/s] 67%|██████▋ | 7743/11526 [1:20:56<38:43, 1.63it/s] {'loss': 0.1461, 'grad_norm': 0.5426199436187744, 'learning_rate': 2.938088571307208e-06, 'epoch': 2.02}
67%|██████▋ | 7743/11526 [1:20:56<38:43, 1.63it/s] 67%|██████▋ | 7744/11526 [1:20:57<38:45, 1.63it/s] {'loss': 0.1853, 'grad_norm': 0.6213687658309937, 'learning_rate': 2.9367091107336885e-06, 'epoch': 2.02}
67%|██████▋ | 7744/11526 [1:20:57<38:45, 1.63it/s] 67%|██████▋ | 7745/11526 [1:20:57<38:44, 1.63it/s] {'loss': 0.1146, 'grad_norm': 0.4313426613807678, 'learning_rate': 2.9353298394169494e-06, 'epoch': 2.02}
67%|██████▋ | 7745/11526 [1:20:57<38:44, 1.63it/s] 67%|██████▋ | 7746/11526 [1:20:58<38:43, 1.63it/s] {'loss': 0.1671, 'grad_norm': 0.5655930638313293, 'learning_rate': 2.9339507574835052e-06, 'epoch': 2.02}
67%|██████▋ | 7746/11526 [1:20:58<38:43, 1.63it/s] 67%|██████▋ | 7747/11526 [1:20:59<38:42, 1.63it/s] {'loss': 0.1205, 'grad_norm': 0.44999030232429504, 'learning_rate': 2.9325718650598513e-06, 'epoch': 2.02}
67%|██████▋ | 7747/11526 [1:20:59<38:42, 1.63it/s] 67%|██████▋ | 7748/11526 [1:20:59<38:40, 1.63it/s] {'loss': 0.1586, 'grad_norm': 0.6225541234016418, 'learning_rate': 2.9311931622724706e-06, 'epoch': 2.02}
67%|██████▋ | 7748/11526 [1:20:59<38:40, 1.63it/s] 67%|██████▋ | 7749/11526 [1:21:00<38:40, 1.63it/s] {'loss': 0.1499, 'grad_norm': 0.5098994374275208, 'learning_rate': 2.929814649247823e-06, 'epoch': 2.02}
67%|██████▋ | 7749/11526 [1:21:00<38:40, 1.63it/s] 67%|██████▋ | 7750/11526 [1:21:00<38:39, 1.63it/s] {'loss': 0.1296, 'grad_norm': 0.5323485732078552, 'learning_rate': 2.928436326112356e-06, 'epoch': 2.02}
67%|██████▋ | 7750/11526 [1:21:01<38:39, 1.63it/s] 67%|██████▋ | 7751/11526 [1:21:01<38:37, 1.63it/s] {'loss': 0.1913, 'grad_norm': 0.6610565781593323, 'learning_rate': 2.927058192992496e-06, 'epoch': 2.02}
67%|██████▋ | 7751/11526 [1:21:01<38:37, 1.63it/s] 67%|██████▋ | 7752/11526 [1:21:02<38:36, 1.63it/s] {'loss': 0.14, 'grad_norm': 0.5188800096511841, 'learning_rate': 2.9256802500146506e-06, 'epoch': 2.02}
67%|██████▋ | 7752/11526 [1:21:02<38:36, 1.63it/s] 67%|██████▋ | 7753/11526 [1:21:02<38:36, 1.63it/s] {'loss': 0.1939, 'grad_norm': 0.5942791700363159, 'learning_rate': 2.924302497305217e-06, 'epoch': 2.02}
67%|██████▋ | 7753/11526 [1:21:02<38:36, 1.63it/s] 67%|██████▋ | 7754/11526 [1:21:03<38:38, 1.63it/s] {'loss': 0.1329, 'grad_norm': 0.5675233006477356, 'learning_rate': 2.9229249349905686e-06, 'epoch': 2.02}
67%|██████▋ | 7754/11526 [1:21:03<38:38, 1.63it/s] 67%|██████▋ | 7755/11526 [1:21:04<38:35, 1.63it/s] {'loss': 0.1708, 'grad_norm': 0.5571386814117432, 'learning_rate': 2.9215475631970637e-06, 'epoch': 2.02}
67%|██████▋ | 7755/11526 [1:21:04<38:35, 1.63it/s] 67%|██████▋ | 7756/11526 [1:21:04<38:34, 1.63it/s] {'loss': 0.152, 'grad_norm': 0.5274417996406555, 'learning_rate': 2.9201703820510396e-06, 'epoch': 2.02}
67%|██████▋ | 7756/11526 [1:21:04<38:34, 1.63it/s] 67%|██████▋ | 7757/11526 [1:21:05<38:34, 1.63it/s] {'loss': 0.193, 'grad_norm': 0.5733721256256104, 'learning_rate': 2.9187933916788247e-06, 'epoch': 2.02}
67%|██████▋ | 7757/11526 [1:21:05<38:34, 1.63it/s] 67%|██████▋ | 7758/11526 [1:21:05<38:32, 1.63it/s] {'loss': 0.1539, 'grad_norm': 0.5377631783485413, 'learning_rate': 2.91741659220672e-06, 'epoch': 2.02}
67%|██████▋ | 7758/11526 [1:21:05<38:32, 1.63it/s] 67%|██████▋ | 7759/11526 [1:21:06<38:39, 1.62it/s] {'loss': 0.1217, 'grad_norm': 0.49848535656929016, 'learning_rate': 2.9160399837610146e-06, 'epoch': 2.02}
67%|██████▋ | 7759/11526 [1:21:06<38:39, 1.62it/s] 67%|██████▋ | 7760/11526 [1:21:07<38:37, 1.63it/s] {'loss': 0.162, 'grad_norm': 0.6121918559074402, 'learning_rate': 2.914663566467982e-06, 'epoch': 2.02}
67%|██████▋ | 7760/11526 [1:21:07<38:37, 1.63it/s] 67%|██████▋ | 7761/11526 [1:21:07<38:36, 1.63it/s] {'loss': 0.1501, 'grad_norm': 0.5854814648628235, 'learning_rate': 2.9132873404538677e-06, 'epoch': 2.02}
67%|██████▋ | 7761/11526 [1:21:07<38:36, 1.63it/s] 67%|██████▋ | 7762/11526 [1:21:08<38:35, 1.63it/s] {'loss': 0.1746, 'grad_norm': 0.58206707239151, 'learning_rate': 2.9119113058449156e-06, 'epoch': 2.02}
67%|██████▋ | 7762/11526 [1:21:08<38:35, 1.63it/s] 67%|██████▋ | 7763/11526 [1:21:08<38:33, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.5489407777786255, 'learning_rate': 2.9105354627673365e-06, 'epoch': 2.02}
67%|██████▋ | 7763/11526 [1:21:09<38:33, 1.63it/s] 67%|██████▋ | 7764/11526 [1:21:09<38:36, 1.62it/s] {'loss': 0.1903, 'grad_norm': 0.6587295532226562, 'learning_rate': 2.909159811347334e-06, 'epoch': 2.02}
67%|██████▋ | 7764/11526 [1:21:09<38:36, 1.62it/s] 67%|██████▋ | 7765/11526 [1:21:10<38:33, 1.63it/s] {'loss': 0.1925, 'grad_norm': 0.5771775841712952, 'learning_rate': 2.9077843517110897e-06, 'epoch': 2.02}
67%|██████▋ | 7765/11526 [1:21:10<38:33, 1.63it/s] 67%|██████▋ | 7766/11526 [1:21:10<38:31, 1.63it/s] {'loss': 0.1403, 'grad_norm': 0.5043246150016785, 'learning_rate': 2.9064090839847692e-06, 'epoch': 2.02}
67%|██████▋ | 7766/11526 [1:21:10<38:31, 1.63it/s] 67%|██████▋ | 7767/11526 [1:21:11<38:30, 1.63it/s] {'loss': 0.1606, 'grad_norm': 0.5958981513977051, 'learning_rate': 2.9050340082945195e-06, 'epoch': 2.02}
67%|██████▋ | 7767/11526 [1:21:11<38:30, 1.63it/s] 67%|██████▋ | 7768/11526 [1:21:11<38:29, 1.63it/s] {'loss': 0.1729, 'grad_norm': 0.5950961112976074, 'learning_rate': 2.9036591247664727e-06, 'epoch': 2.02}
67%|██████▋ | 7768/11526 [1:21:12<38:29, 1.63it/s] 67%|██████▋ | 7769/11526 [1:21:12<38:30, 1.63it/s] {'loss': 0.1484, 'grad_norm': 0.5187472105026245, 'learning_rate': 2.902284433526736e-06, 'epoch': 2.02}
67%|██████▋ | 7769/11526 [1:21:12<38:30, 1.63it/s] 67%|██████▋ | 7770/11526 [1:21:13<38:27, 1.63it/s] {'loss': 0.1282, 'grad_norm': 0.4887823164463043, 'learning_rate': 2.900909934701407e-06, 'epoch': 2.02}
67%|██████▋ | 7770/11526 [1:21:13<38:27, 1.63it/s] 67%|██████▋ | 7771/11526 [1:21:13<38:27, 1.63it/s] {'loss': 0.1701, 'grad_norm': 0.5632336139678955, 'learning_rate': 2.8995356284165615e-06, 'epoch': 2.02}
67%|██████▋ | 7771/11526 [1:21:13<38:27, 1.63it/s] 67%|██████▋ | 7772/11526 [1:21:14<38:26, 1.63it/s] {'loss': 0.1399, 'grad_norm': 0.45983240008354187, 'learning_rate': 2.89816151479826e-06, 'epoch': 2.02}
67%|██████▋ | 7772/11526 [1:21:14<38:26, 1.63it/s] 67%|██████▋ | 7773/11526 [1:21:15<38:26, 1.63it/s] {'loss': 0.1587, 'grad_norm': 0.5483591556549072, 'learning_rate': 2.896787593972543e-06, 'epoch': 2.02}
67%|██████▋ | 7773/11526 [1:21:15<38:26, 1.63it/s] 67%|██████▋ | 7774/11526 [1:21:15<38:25, 1.63it/s] {'loss': 0.1675, 'grad_norm': 0.6588839888572693, 'learning_rate': 2.8954138660654345e-06, 'epoch': 2.02}
67%|██████▋ | 7774/11526 [1:21:15<38:25, 1.63it/s] 67%|██████▋ | 7775/11526 [1:21:16<38:24, 1.63it/s] {'loss': 0.152, 'grad_norm': 0.5792985558509827, 'learning_rate': 2.8940403312029407e-06, 'epoch': 2.02}
67%|██████▋ | 7775/11526 [1:21:16<38:24, 1.63it/s] 67%|██████▋ | 7776/11526 [1:21:16<38:23, 1.63it/s] {'loss': 0.1193, 'grad_norm': 0.4869558811187744, 'learning_rate': 2.8926669895110525e-06, 'epoch': 2.02}
67%|██████▋ | 7776/11526 [1:21:17<38:23, 1.63it/s] 67%|██████▋ | 7777/11526 [1:21:17<38:22, 1.63it/s] {'loss': 0.1607, 'grad_norm': 0.592850923538208, 'learning_rate': 2.8912938411157366e-06, 'epoch': 2.02}
67%|██████▋ | 7777/11526 [1:21:17<38:22, 1.63it/s] 67%|██████▋ | 7778/11526 [1:21:18<38:22, 1.63it/s] {'loss': 0.1466, 'grad_norm': 0.529350221157074, 'learning_rate': 2.889920886142947e-06, 'epoch': 2.02}
67%|██████▋ | 7778/11526 [1:21:18<38:22, 1.63it/s] 67%|██████▋ | 7779/11526 [1:21:18<38:24, 1.63it/s] {'loss': 0.1255, 'grad_norm': 0.7259747385978699, 'learning_rate': 2.8885481247186202e-06, 'epoch': 2.02}
67%|██████▋ | 7779/11526 [1:21:18<38:24, 1.63it/s] 67%|██████▋ | 7780/11526 [1:21:19<38:23, 1.63it/s] {'loss': 0.1064, 'grad_norm': 0.43385496735572815, 'learning_rate': 2.887175556968673e-06, 'epoch': 2.02}
67%|██████▋ | 7780/11526 [1:21:19<38:23, 1.63it/s] 68%|██████▊ | 7781/11526 [1:21:19<38:21, 1.63it/s] {'loss': 0.1827, 'grad_norm': 0.6821838021278381, 'learning_rate': 2.8858031830190046e-06, 'epoch': 2.03}
68%|██████▊ | 7781/11526 [1:21:20<38:21, 1.63it/s] 68%|██████▊ | 7782/11526 [1:21:20<38:19, 1.63it/s] {'loss': 0.1391, 'grad_norm': 0.5993831157684326, 'learning_rate': 2.8844310029955003e-06, 'epoch': 2.03}
68%|██████▊ | 7782/11526 [1:21:20<38:19, 1.63it/s] 68%|██████▊ | 7783/11526 [1:21:21<38:17, 1.63it/s] {'loss': 0.1103, 'grad_norm': 0.44148024916648865, 'learning_rate': 2.883059017024017e-06, 'epoch': 2.03}
68%|██████▊ | 7783/11526 [1:21:21<38:17, 1.63it/s] 68%|██████▊ | 7784/11526 [1:21:21<38:16, 1.63it/s] {'loss': 0.1339, 'grad_norm': 0.5207849144935608, 'learning_rate': 2.8816872252304095e-06, 'epoch': 2.03}
68%|██████▊ | 7784/11526 [1:21:21<38:16, 1.63it/s] 68%|██████▊ | 7785/11526 [1:21:22<38:16, 1.63it/s] {'loss': 0.1534, 'grad_norm': 0.5246085524559021, 'learning_rate': 2.8803156277405e-06, 'epoch': 2.03}
68%|██████▊ | 7785/11526 [1:21:22<38:16, 1.63it/s] 68%|██████▊ | 7786/11526 [1:21:23<38:18, 1.63it/s] {'loss': 0.1566, 'grad_norm': 0.5035196542739868, 'learning_rate': 2.8789442246801027e-06, 'epoch': 2.03}
68%|██████▊ | 7786/11526 [1:21:23<38:18, 1.63it/s] 68%|██████▊ | 7787/11526 [1:21:23<38:18, 1.63it/s] {'loss': 0.1581, 'grad_norm': 0.5287911295890808, 'learning_rate': 2.8775730161750086e-06, 'epoch': 2.03}
68%|██████▊ | 7787/11526 [1:21:23<38:18, 1.63it/s] 68%|██████▊ | 7788/11526 [1:21:24<38:17, 1.63it/s] {'loss': 0.1651, 'grad_norm': 0.6062538027763367, 'learning_rate': 2.876202002350994e-06, 'epoch': 2.03}
68%|██████▊ | 7788/11526 [1:21:24<38:17, 1.63it/s] 68%|██████▊ | 7789/11526 [1:21:24<38:15, 1.63it/s] {'loss': 0.1315, 'grad_norm': 0.48366960883140564, 'learning_rate': 2.8748311833338154e-06, 'epoch': 2.03}
68%|██████▊ | 7789/11526 [1:21:25<38:15, 1.63it/s] 68%|██████▊ | 7790/11526 [1:21:25<38:17, 1.63it/s] {'loss': 0.1485, 'grad_norm': 0.48531684279441833, 'learning_rate': 2.8734605592492147e-06, 'epoch': 2.03}
68%|██████▊ | 7790/11526 [1:21:25<38:17, 1.63it/s] 68%|██████▊ | 7791/11526 [1:21:26<38:25, 1.62it/s] {'loss': 0.1466, 'grad_norm': 0.5458398461341858, 'learning_rate': 2.8720901302229086e-06, 'epoch': 2.03}
68%|██████▊ | 7791/11526 [1:21:26<38:25, 1.62it/s] 68%|██████▊ | 7792/11526 [1:21:26<38:20, 1.62it/s] {'loss': 0.1472, 'grad_norm': 0.5420532822608948, 'learning_rate': 2.8707198963806033e-06, 'epoch': 2.03}
68%|██████▊ | 7792/11526 [1:21:26<38:20, 1.62it/s] 68%|██████▊ | 7793/11526 [1:21:27<38:17, 1.62it/s] {'loss': 0.1666, 'grad_norm': 0.635770857334137, 'learning_rate': 2.869349857847984e-06, 'epoch': 2.03}
68%|██████▊ | 7793/11526 [1:21:27<38:17, 1.62it/s] 68%|██████▊ | 7794/11526 [1:21:27<38:16, 1.63it/s] {'loss': 0.1699, 'grad_norm': 0.6152536273002625, 'learning_rate': 2.8679800147507186e-06, 'epoch': 2.03}
68%|██████▊ | 7794/11526 [1:21:28<38:16, 1.63it/s] 68%|██████▊ | 7795/11526 [1:21:28<38:14, 1.63it/s] {'loss': 0.1251, 'grad_norm': 0.457194447517395, 'learning_rate': 2.8666103672144597e-06, 'epoch': 2.03}
68%|██████▊ | 7795/11526 [1:21:28<38:14, 1.63it/s] 68%|██████▊ | 7796/11526 [1:21:29<38:13, 1.63it/s] {'loss': 0.1225, 'grad_norm': 0.475854754447937, 'learning_rate': 2.865240915364832e-06, 'epoch': 2.03}
68%|██████▊ | 7796/11526 [1:21:29<38:13, 1.63it/s] 68%|██████▊ | 7797/11526 [1:21:29<38:13, 1.63it/s] {'loss': 0.1455, 'grad_norm': 0.5011893510818481, 'learning_rate': 2.8638716593274553e-06, 'epoch': 2.03}
68%|██████▊ | 7797/11526 [1:21:29<38:13, 1.63it/s] 68%|██████▊ | 7798/11526 [1:21:30<38:10, 1.63it/s] {'loss': 0.2222, 'grad_norm': 0.756392776966095, 'learning_rate': 2.8625025992279263e-06, 'epoch': 2.03}
68%|██████▊ | 7798/11526 [1:21:30<38:10, 1.63it/s] 68%|██████▊ | 7799/11526 [1:21:31<38:12, 1.63it/s] {'loss': 0.1806, 'grad_norm': 0.6209340691566467, 'learning_rate': 2.8611337351918188e-06, 'epoch': 2.03}
68%|██████▊ | 7799/11526 [1:21:31<38:12, 1.63it/s] 68%|██████▊ | 7800/11526 [1:21:31<38:10, 1.63it/s] {'loss': 0.1719, 'grad_norm': 0.7036527991294861, 'learning_rate': 2.859765067344695e-06, 'epoch': 2.03}
68%|██████▊ | 7800/11526 [1:21:31<38:10, 1.63it/s]
100%|██████████| 13/13 [00:01<00:00, 6.74it/s]
{'eval_loss': 0.5458315014839172, 'eval_runtime': 1.9576, 'eval_samples_per_second': 102.168, 'eval_steps_per_second': 6.641, 'epoch': 2.03}
 68%|██████▊ | 7801/11526 [1:21:34<1:14:43, 1.20s/it] {'loss': 0.188, 'grad_norm': 0.6165781021118164, 'learning_rate': 2.8583965958120963e-06, 'epoch': 2.03}
68%|██████▊ | 7801/11526 [1:21:34<1:14:43, 1.20s/it] 68%|██████▊ | 7802/11526 [1:21:34<1:03:45, 1.03s/it] {'loss': 0.1519, 'grad_norm': 0.5653084516525269, 'learning_rate': 2.8570283207195474e-06, 'epoch': 2.03}
68%|██████▊ | 7802/11526 [1:21:34<1:03:45, 1.03s/it] 68%|██████▊ | 7803/11526 [1:21:35<56:04, 1.11it/s] {'loss': 0.1227, 'grad_norm': 0.4960523843765259, 'learning_rate': 2.855660242192556e-06, 'epoch': 2.03}
68%|██████▊ | 7803/11526 [1:21:35<56:04, 1.11it/s] 68%|██████▊ | 7804/11526 [1:21:36<50:43, 1.22it/s] {'loss': 0.1422, 'grad_norm': 0.5459579825401306, 'learning_rate': 2.8542923603566053e-06, 'epoch': 2.03}
68%|██████▊ | 7804/11526 [1:21:36<50:43, 1.22it/s] 68%|██████▊ | 7805/11526 [1:21:36<46:56, 1.32it/s] {'loss': 0.1596, 'grad_norm': 0.6465276479721069, 'learning_rate': 2.8529246753371666e-06, 'epoch': 2.03}
68%|██████▊ | 7805/11526 [1:21:36<46:56, 1.32it/s] 68%|██████▊ | 7806/11526 [1:21:37<44:16, 1.40it/s] {'loss': 0.1756, 'grad_norm': 0.5503053665161133, 'learning_rate': 2.851557187259697e-06, 'epoch': 2.03}
68%|██████▊ | 7806/11526 [1:21:37<44:16, 1.40it/s] 68%|██████▊ | 7807/11526 [1:21:37<42:23, 1.46it/s] {'loss': 0.1328, 'grad_norm': 0.4998021721839905, 'learning_rate': 2.850189896249624e-06, 'epoch': 2.03}
68%|██████▊ | 7807/11526 [1:21:38<42:23, 1.46it/s] 68%|██████▊ | 7808/11526 [1:21:38<41:04, 1.51it/s] {'loss': 0.1105, 'grad_norm': 0.4933095872402191, 'learning_rate': 2.848822802432366e-06, 'epoch': 2.03}
68%|██████▊ | 7808/11526 [1:21:38<41:04, 1.51it/s] 68%|██████▊ | 7809/11526 [1:21:39<40:11, 1.54it/s] {'loss': 0.1547, 'grad_norm': 0.5857586860656738, 'learning_rate': 2.8474559059333218e-06, 'epoch': 2.03}
68%|██████▊ | 7809/11526 [1:21:39<40:11, 1.54it/s] 68%|██████▊ | 7810/11526 [1:21:39<39:32, 1.57it/s] {'loss': 0.1437, 'grad_norm': 0.5687321424484253, 'learning_rate': 2.8460892068778656e-06, 'epoch': 2.03}
68%|██████▊ | 7810/11526 [1:21:39<39:32, 1.57it/s] 68%|██████▊ | 7811/11526 [1:21:40<39:04, 1.58it/s] {'loss': 0.1274, 'grad_norm': 0.6090855002403259, 'learning_rate': 2.8447227053913662e-06, 'epoch': 2.03}
68%|██████▊ | 7811/11526 [1:21:40<39:04, 1.58it/s] 68%|██████▊ | 7812/11526 [1:21:41<38:44, 1.60it/s] {'loss': 0.1423, 'grad_norm': 0.473171591758728, 'learning_rate': 2.843356401599162e-06, 'epoch': 2.03}
68%|██████▊ | 7812/11526 [1:21:41<38:44, 1.60it/s] 68%|██████▊ | 7813/11526 [1:21:41<38:31, 1.61it/s] {'loss': 0.1854, 'grad_norm': 0.6518737077713013, 'learning_rate': 2.8419902956265788e-06, 'epoch': 2.03}
68%|██████▊ | 7813/11526 [1:21:41<38:31, 1.61it/s] 68%|██████▊ | 7814/11526 [1:21:42<38:25, 1.61it/s] {'loss': 0.1429, 'grad_norm': 0.50721675157547, 'learning_rate': 2.8406243875989247e-06, 'epoch': 2.03}
68%|██████▊ | 7814/11526 [1:21:42<38:25, 1.61it/s] 68%|██████▊ | 7815/11526 [1:21:42<38:16, 1.62it/s] {'loss': 0.1408, 'grad_norm': 0.4860514998435974, 'learning_rate': 2.8392586776414877e-06, 'epoch': 2.03}
68%|██████▊ | 7815/11526 [1:21:42<38:16, 1.62it/s] 68%|██████▊ | 7816/11526 [1:21:43<38:10, 1.62it/s] {'loss': 0.1569, 'grad_norm': 0.60845547914505, 'learning_rate': 2.8378931658795396e-06, 'epoch': 2.03}
68%|██████▊ | 7816/11526 [1:21:43<38:10, 1.62it/s] 68%|██████▊ | 7817/11526 [1:21:44<38:05, 1.62it/s] {'loss': 0.1327, 'grad_norm': 0.49716970324516296, 'learning_rate': 2.8365278524383333e-06, 'epoch': 2.03}
68%|██████▊ | 7817/11526 [1:21:44<38:05, 1.62it/s] 68%|██████▊ | 7818/11526 [1:21:44<38:03, 1.62it/s] {'loss': 0.1823, 'grad_norm': 0.6213840842247009, 'learning_rate': 2.8351627374431005e-06, 'epoch': 2.03}
68%|██████▊ | 7818/11526 [1:21:44<38:03, 1.62it/s] 68%|██████▊ | 7819/11526 [1:21:45<38:12, 1.62it/s] {'loss': 0.1394, 'grad_norm': 0.5486015677452087, 'learning_rate': 2.8337978210190586e-06, 'epoch': 2.04}
68%|██████▊ | 7819/11526 [1:21:45<38:12, 1.62it/s] 68%|██████▊ | 7820/11526 [1:21:45<38:06, 1.62it/s] {'loss': 0.1519, 'grad_norm': 0.6609869003295898, 'learning_rate': 2.8324331032914065e-06, 'epoch': 2.04}
68%|██████▊ | 7820/11526 [1:21:46<38:06, 1.62it/s] 68%|██████▊ | 7821/11526 [1:21:46<38:03, 1.62it/s] {'loss': 0.1449, 'grad_norm': 0.49844637513160706, 'learning_rate': 2.8310685843853223e-06, 'epoch': 2.04}
68%|██████▊ | 7821/11526 [1:21:46<38:03, 1.62it/s] 68%|██████▊ | 7822/11526 [1:21:47<38:00, 1.62it/s] {'loss': 0.1505, 'grad_norm': 0.5642498731613159, 'learning_rate': 2.8297042644259703e-06, 'epoch': 2.04}
68%|██████▊ | 7822/11526 [1:21:47<38:00, 1.62it/s] 68%|██████▊ | 7823/11526 [1:21:47<37:57, 1.63it/s] {'loss': 0.1662, 'grad_norm': 0.6580663323402405, 'learning_rate': 2.828340143538488e-06, 'epoch': 2.04}
68%|██████▊ | 7823/11526 [1:21:47<37:57, 1.63it/s] 68%|██████▊ | 7824/11526 [1:21:48<37:59, 1.62it/s] {'loss': 0.2335, 'grad_norm': 0.6023546457290649, 'learning_rate': 2.8269762218480056e-06, 'epoch': 2.04}
68%|██████▊ | 7824/11526 [1:21:48<37:59, 1.62it/s] 68%|██████▊ | 7825/11526 [1:21:49<37:56, 1.63it/s] {'loss': 0.2337, 'grad_norm': 0.67133629322052, 'learning_rate': 2.825612499479631e-06, 'epoch': 2.04}
68%|██████▊ | 7825/11526 [1:21:49<37:56, 1.63it/s] 68%|██████▊ | 7826/11526 [1:21:49<37:54, 1.63it/s] {'loss': 0.1397, 'grad_norm': 0.5108235478401184, 'learning_rate': 2.824248976558447e-06, 'epoch': 2.04}
68%|██████▊ | 7826/11526 [1:21:49<37:54, 1.63it/s] 68%|██████▊ | 7827/11526 [1:21:50<37:53, 1.63it/s] {'loss': 0.1612, 'grad_norm': 0.5509976148605347, 'learning_rate': 2.822885653209526e-06, 'epoch': 2.04}
68%|██████▊ | 7827/11526 [1:21:50<37:53, 1.63it/s] 68%|██████▊ | 7828/11526 [1:21:50<37:51, 1.63it/s] {'loss': 0.1239, 'grad_norm': 0.4282972514629364, 'learning_rate': 2.8215225295579206e-06, 'epoch': 2.04}
68%|██████▊ | 7828/11526 [1:21:50<37:51, 1.63it/s] 68%|██████▊ | 7829/11526 [1:21:51<37:53, 1.63it/s] {'loss': 0.1426, 'grad_norm': 0.5625062584877014, 'learning_rate': 2.820159605728664e-06, 'epoch': 2.04}
68%|██████▊ | 7829/11526 [1:21:51<37:53, 1.63it/s] 68%|██████▊ | 7830/11526 [1:21:52<37:51, 1.63it/s] {'loss': 0.118, 'grad_norm': 0.4649454355239868, 'learning_rate': 2.8187968818467725e-06, 'epoch': 2.04}
68%|██████▊ | 7830/11526 [1:21:52<37:51, 1.63it/s] 68%|██████▊ | 7831/11526 [1:21:52<37:53, 1.63it/s] {'loss': 0.15, 'grad_norm': 0.5591362118721008, 'learning_rate': 2.81743435803724e-06, 'epoch': 2.04}
68%|██████▊ | 7831/11526 [1:21:52<37:53, 1.63it/s] 68%|██████▊ | 7832/11526 [1:21:53<37:51, 1.63it/s] {'loss': 0.1621, 'grad_norm': 0.5474936962127686, 'learning_rate': 2.8160720344250445e-06, 'epoch': 2.04}
68%|██████▊ | 7832/11526 [1:21:53<37:51, 1.63it/s] 68%|██████▊ | 7833/11526 [1:21:53<37:51, 1.63it/s] {'loss': 0.1744, 'grad_norm': 0.5714928507804871, 'learning_rate': 2.8147099111351517e-06, 'epoch': 2.04}
68%|██████▊ | 7833/11526 [1:21:54<37:51, 1.63it/s] 68%|██████▊ | 7834/11526 [1:21:54<37:53, 1.62it/s] {'loss': 0.2089, 'grad_norm': 0.7913138270378113, 'learning_rate': 2.8133479882924985e-06, 'epoch': 2.04}
68%|██████▊ | 7834/11526 [1:21:54<37:53, 1.62it/s] 68%|██████▊ | 7835/11526 [1:21:55<37:50, 1.63it/s] {'loss': 0.1375, 'grad_norm': 0.5120853781700134, 'learning_rate': 2.8119862660220088e-06, 'epoch': 2.04}
68%|██████▊ | 7835/11526 [1:21:55<37:50, 1.63it/s] 68%|██████▊ | 7836/11526 [1:21:55<37:49, 1.63it/s] {'loss': 0.1565, 'grad_norm': 0.5898758769035339, 'learning_rate': 2.810624744448588e-06, 'epoch': 2.04}
68%|██████▊ | 7836/11526 [1:21:55<37:49, 1.63it/s] 68%|██████▊ | 7837/11526 [1:21:56<37:47, 1.63it/s] {'loss': 0.097, 'grad_norm': 0.44519612193107605, 'learning_rate': 2.8092634236971224e-06, 'epoch': 2.04}
68%|██████▊ | 7837/11526 [1:21:56<37:47, 1.63it/s] 68%|██████▊ | 7838/11526 [1:21:57<37:46, 1.63it/s] {'loss': 0.2332, 'grad_norm': 0.5080232620239258, 'learning_rate': 2.8079023038924803e-06, 'epoch': 2.04}
68%|██████▊ | 7838/11526 [1:21:57<37:46, 1.63it/s] 68%|██████▊ | 7839/11526 [1:21:57<37:47, 1.63it/s] {'loss': 0.1277, 'grad_norm': 0.5101184248924255, 'learning_rate': 2.8065413851595143e-06, 'epoch': 2.04}
68%|██████▊ | 7839/11526 [1:21:57<37:47, 1.63it/s] 68%|██████▊ | 7840/11526 [1:21:58<37:47, 1.63it/s] {'loss': 0.1069, 'grad_norm': 0.41910815238952637, 'learning_rate': 2.8051806676230496e-06, 'epoch': 2.04}
68%|██████▊ | 7840/11526 [1:21:58<37:47, 1.63it/s] 68%|██████▊ | 7841/11526 [1:21:58<37:44, 1.63it/s] {'loss': 0.162, 'grad_norm': 0.5382745862007141, 'learning_rate': 2.803820151407903e-06, 'epoch': 2.04}
68%|██████▊ | 7841/11526 [1:21:58<37:44, 1.63it/s] 68%|██████▊ | 7842/11526 [1:21:59<37:47, 1.62it/s] {'loss': 0.1553, 'grad_norm': 0.5557162165641785, 'learning_rate': 2.8024598366388677e-06, 'epoch': 2.04}
68%|██████▊ | 7842/11526 [1:21:59<37:47, 1.62it/s] 68%|██████▊ | 7843/11526 [1:22:00<37:45, 1.63it/s] {'loss': 0.166, 'grad_norm': 0.6208021640777588, 'learning_rate': 2.801099723440719e-06, 'epoch': 2.04}
68%|██████▊ | 7843/11526 [1:22:00<37:45, 1.63it/s] 68%|██████▊ | 7844/11526 [1:22:00<37:47, 1.62it/s] {'loss': 0.1622, 'grad_norm': 0.5634217262268066, 'learning_rate': 2.7997398119382175e-06, 'epoch': 2.04}
68%|██████▊ | 7844/11526 [1:22:00<37:47, 1.62it/s] 68%|██████▊ | 7845/11526 [1:22:01<37:45, 1.62it/s] {'loss': 0.1397, 'grad_norm': 0.5108175277709961, 'learning_rate': 2.798380102256095e-06, 'epoch': 2.04}
68%|██████▊ | 7845/11526 [1:22:01<37:45, 1.62it/s] 68%|██████▊ | 7846/11526 [1:22:01<37:43, 1.63it/s] {'loss': 0.1864, 'grad_norm': 0.660685658454895, 'learning_rate': 2.7970205945190788e-06, 'epoch': 2.04}
68%|██████▊ | 7846/11526 [1:22:02<37:43, 1.63it/s] 68%|██████▊ | 7847/11526 [1:22:02<37:42, 1.63it/s] {'loss': 0.1231, 'grad_norm': 0.5102872848510742, 'learning_rate': 2.79566128885187e-06, 'epoch': 2.04}
68%|██████▊ | 7847/11526 [1:22:02<37:42, 1.63it/s] 68%|██████▊ | 7848/11526 [1:22:03<37:41, 1.63it/s] {'loss': 0.1482, 'grad_norm': 0.5566264986991882, 'learning_rate': 2.7943021853791475e-06, 'epoch': 2.04}
68%|██████▊ | 7848/11526 [1:22:03<37:41, 1.63it/s] 68%|██████▊ | 7849/11526 [1:22:03<37:42, 1.62it/s] {'loss': 0.1541, 'grad_norm': 0.6136370897293091, 'learning_rate': 2.7929432842255787e-06, 'epoch': 2.04}
68%|██████▊ | 7849/11526 [1:22:03<37:42, 1.62it/s] 68%|██████▊ | 7850/11526 [1:22:04<37:42, 1.62it/s] {'loss': 0.1489, 'grad_norm': 0.5580892562866211, 'learning_rate': 2.7915845855158096e-06, 'epoch': 2.04}
68%|██████▊ | 7850/11526 [1:22:04<37:42, 1.62it/s] 68%|██████▊ | 7851/11526 [1:22:05<37:39, 1.63it/s] {'loss': 0.1582, 'grad_norm': 0.5959145426750183, 'learning_rate': 2.7902260893744674e-06, 'epoch': 2.04}
68%|██████▊ | 7851/11526 [1:22:05<37:39, 1.63it/s] 68%|██████▊ | 7852/11526 [1:22:05<37:39, 1.63it/s] {'loss': 0.1018, 'grad_norm': 0.5373513698577881, 'learning_rate': 2.7888677959261635e-06, 'epoch': 2.04}
68%|██████▊ | 7852/11526 [1:22:05<37:39, 1.63it/s] 68%|██████▊ | 7853/11526 [1:22:06<37:37, 1.63it/s] {'loss': 0.1762, 'grad_norm': 0.612394392490387, 'learning_rate': 2.787509705295485e-06, 'epoch': 2.04}
68%|██████▊ | 7853/11526 [1:22:06<37:37, 1.63it/s] 68%|██████▊ | 7854/11526 [1:22:06<37:40, 1.62it/s] {'loss': 0.1569, 'grad_norm': 0.5162434577941895, 'learning_rate': 2.7861518176070017e-06, 'epoch': 2.04}
68%|██████▊ | 7854/11526 [1:22:06<37:40, 1.62it/s] 68%|██████▊ | 7855/11526 [1:22:07<37:38, 1.63it/s] {'loss': 0.19, 'grad_norm': 0.6259366273880005, 'learning_rate': 2.7847941329852745e-06, 'epoch': 2.04}
68%|██████▊ | 7855/11526 [1:22:07<37:38, 1.63it/s] 68%|██████▊ | 7856/11526 [1:22:08<37:37, 1.63it/s] {'loss': 0.1721, 'grad_norm': 0.6373807191848755, 'learning_rate': 2.7834366515548315e-06, 'epoch': 2.04}
68%|██████▊ | 7856/11526 [1:22:08<37:37, 1.63it/s] 68%|██████▊ | 7857/11526 [1:22:08<37:35, 1.63it/s] {'loss': 0.1532, 'grad_norm': 0.6670811176300049, 'learning_rate': 2.7820793734401904e-06, 'epoch': 2.05}
68%|██████▊ | 7857/11526 [1:22:08<37:35, 1.63it/s] 68%|██████▊ | 7858/11526 [1:22:09<37:34, 1.63it/s] {'loss': 0.169, 'grad_norm': 0.5793454647064209, 'learning_rate': 2.78072229876585e-06, 'epoch': 2.05}
68%|██████▊ | 7858/11526 [1:22:09<37:34, 1.63it/s] 68%|██████▊ | 7859/11526 [1:22:09<37:37, 1.62it/s] {'loss': 0.1668, 'grad_norm': 0.4944702386856079, 'learning_rate': 2.779365427656284e-06, 'epoch': 2.05}
68%|██████▊ | 7859/11526 [1:22:10<37:37, 1.62it/s] 68%|██████▊ | 7860/11526 [1:22:10<37:35, 1.63it/s] {'loss': 0.1513, 'grad_norm': 0.5136257410049438, 'learning_rate': 2.778008760235961e-06, 'epoch': 2.05}
68%|██████▊ | 7860/11526 [1:22:10<37:35, 1.63it/s] 68%|██████▊ | 7861/11526 [1:22:11<37:34, 1.63it/s] {'loss': 0.1372, 'grad_norm': 0.512755811214447, 'learning_rate': 2.7766522966293135e-06, 'epoch': 2.05}
68%|██████▊ | 7861/11526 [1:22:11<37:34, 1.63it/s] 68%|██████▊ | 7862/11526 [1:22:11<37:32, 1.63it/s] {'loss': 0.1004, 'grad_norm': 0.48999902606010437, 'learning_rate': 2.7752960369607682e-06, 'epoch': 2.05}
68%|██████▊ | 7862/11526 [1:22:11<37:32, 1.63it/s] 68%|██████▊ | 7863/11526 [1:22:12<37:33, 1.63it/s] {'loss': 0.1479, 'grad_norm': 0.4803529679775238, 'learning_rate': 2.7739399813547284e-06, 'epoch': 2.05}
68%|██████▊ | 7863/11526 [1:22:12<37:33, 1.63it/s] 68%|██████▊ | 7864/11526 [1:22:13<37:43, 1.62it/s] {'loss': 0.1289, 'grad_norm': 0.5620209574699402, 'learning_rate': 2.7725841299355795e-06, 'epoch': 2.05}
68%|██████▊ | 7864/11526 [1:22:13<37:43, 1.62it/s] 68%|██████▊ | 7865/11526 [1:22:13<37:38, 1.62it/s] {'loss': 0.1427, 'grad_norm': 0.5476803779602051, 'learning_rate': 2.771228482827688e-06, 'epoch': 2.05}
68%|██████▊ | 7865/11526 [1:22:13<37:38, 1.62it/s] 68%|██████▊ | 7866/11526 [1:22:14<37:34, 1.62it/s] {'loss': 0.131, 'grad_norm': 0.5042217373847961, 'learning_rate': 2.7698730401554026e-06, 'epoch': 2.05}
68%|██████▊ | 7866/11526 [1:22:14<37:34, 1.62it/s] 68%|██████▊ | 7867/11526 [1:22:14<37:32, 1.62it/s] {'loss': 0.1352, 'grad_norm': 0.528773307800293, 'learning_rate': 2.768517802043049e-06, 'epoch': 2.05}
68%|██████▊ | 7867/11526 [1:22:14<37:32, 1.62it/s] 68%|██████▊ | 7868/11526 [1:22:15<37:30, 1.63it/s] {'loss': 0.1638, 'grad_norm': 0.6155386567115784, 'learning_rate': 2.7671627686149396e-06, 'epoch': 2.05}
68%|██████▊ | 7868/11526 [1:22:15<37:30, 1.63it/s] 68%|██████▊ | 7869/11526 [1:22:16<37:40, 1.62it/s] {'loss': 0.1386, 'grad_norm': 0.5817124843597412, 'learning_rate': 2.765807939995365e-06, 'epoch': 2.05}
68%|██████▊ | 7869/11526 [1:22:16<37:40, 1.62it/s] 68%|██████▊ | 7870/11526 [1:22:16<37:35, 1.62it/s] {'loss': 0.1402, 'grad_norm': 0.5300195217132568, 'learning_rate': 2.764453316308598e-06, 'epoch': 2.05}
68%|██████▊ | 7870/11526 [1:22:16<37:35, 1.62it/s] 68%|██████▊ | 7871/11526 [1:22:17<37:32, 1.62it/s] {'loss': 0.1569, 'grad_norm': 0.5827738642692566, 'learning_rate': 2.7630988976788948e-06, 'epoch': 2.05}
68%|██████▊ | 7871/11526 [1:22:17<37:32, 1.62it/s] 68%|██████▊ | 7872/11526 [1:22:17<37:28, 1.62it/s] {'loss': 0.1405, 'grad_norm': 0.5155806541442871, 'learning_rate': 2.761744684230484e-06, 'epoch': 2.05}
68%|██████▊ | 7872/11526 [1:22:18<37:28, 1.62it/s] 68%|██████▊ | 7873/11526 [1:22:18<37:28, 1.62it/s] {'loss': 0.2021, 'grad_norm': 0.696050763130188, 'learning_rate': 2.7603906760875878e-06, 'epoch': 2.05}
68%|██████▊ | 7873/11526 [1:22:18<37:28, 1.62it/s] 68%|██████▊ | 7874/11526 [1:22:19<37:29, 1.62it/s] {'loss': 0.1191, 'grad_norm': 0.43382346630096436, 'learning_rate': 2.7590368733744042e-06, 'epoch': 2.05}
68%|██████▊ | 7874/11526 [1:22:19<37:29, 1.62it/s] 68%|██████▊ | 7875/11526 [1:22:19<37:26, 1.63it/s] {'loss': 0.1304, 'grad_norm': 0.4960828423500061, 'learning_rate': 2.7576832762151063e-06, 'epoch': 2.05}
68%|██████▊ | 7875/11526 [1:22:19<37:26, 1.63it/s] 68%|██████▊ | 7876/11526 [1:22:20<37:25, 1.63it/s] {'loss': 0.1516, 'grad_norm': 0.5518319010734558, 'learning_rate': 2.7563298847338572e-06, 'epoch': 2.05}
68%|██████▊ | 7876/11526 [1:22:20<37:25, 1.63it/s] 68%|██████▊ | 7877/11526 [1:22:21<37:23, 1.63it/s] {'loss': 0.1951, 'grad_norm': 0.6922118067741394, 'learning_rate': 2.7549766990547973e-06, 'epoch': 2.05}
68%|██████▊ | 7877/11526 [1:22:21<37:23, 1.63it/s] 68%|██████▊ | 7878/11526 [1:22:21<37:22, 1.63it/s] {'loss': 0.1659, 'grad_norm': 0.577890157699585, 'learning_rate': 2.7536237193020477e-06, 'epoch': 2.05}
68%|██████▊ | 7878/11526 [1:22:21<37:22, 1.63it/s] 68%|██████▊ | 7879/11526 [1:22:22<37:25, 1.62it/s] {'loss': 0.1717, 'grad_norm': 0.8871922492980957, 'learning_rate': 2.7522709455997144e-06, 'epoch': 2.05}
68%|██████▊ | 7879/11526 [1:22:22<37:25, 1.62it/s] 68%|██████▊ | 7880/11526 [1:22:22<37:23, 1.63it/s] {'loss': 0.2216, 'grad_norm': 0.7099394202232361, 'learning_rate': 2.7509183780718764e-06, 'epoch': 2.05}
68%|██████▊ | 7880/11526 [1:22:22<37:23, 1.63it/s] 68%|██████▊ | 7881/11526 [1:22:23<37:21, 1.63it/s] {'loss': 0.1289, 'grad_norm': 0.5176161527633667, 'learning_rate': 2.7495660168426e-06, 'epoch': 2.05}
68%|██████▊ | 7881/11526 [1:22:23<37:21, 1.63it/s] 68%|██████▊ | 7882/11526 [1:22:24<37:20, 1.63it/s] {'loss': 0.1634, 'grad_norm': 0.5157429575920105, 'learning_rate': 2.7482138620359365e-06, 'epoch': 2.05}
68%|██████▊ | 7882/11526 [1:22:24<37:20, 1.63it/s] 68%|██████▊ | 7883/11526 [1:22:24<37:19, 1.63it/s] {'loss': 0.2119, 'grad_norm': 0.6791766881942749, 'learning_rate': 2.7468619137759077e-06, 'epoch': 2.05}
68%|██████▊ | 7883/11526 [1:22:24<37:19, 1.63it/s] 68%|██████▊ | 7884/11526 [1:22:25<37:22, 1.62it/s] {'loss': 0.1325, 'grad_norm': 0.5376219153404236, 'learning_rate': 2.745510172186524e-06, 'epoch': 2.05}
68%|██████▊ | 7884/11526 [1:22:25<37:22, 1.62it/s] 68%|██████▊ | 7885/11526 [1:22:25<37:21, 1.62it/s] {'loss': 0.1392, 'grad_norm': 0.5707976222038269, 'learning_rate': 2.744158637391775e-06, 'epoch': 2.05}
68%|██████▊ | 7885/11526 [1:22:26<37:21, 1.62it/s] 68%|██████▊ | 7886/11526 [1:22:26<37:20, 1.62it/s] {'loss': 0.1453, 'grad_norm': 0.6426084637641907, 'learning_rate': 2.7428073095156317e-06, 'epoch': 2.05}
68%|██████▊ | 7886/11526 [1:22:26<37:20, 1.62it/s] 68%|██████▊ | 7887/11526 [1:22:27<37:19, 1.62it/s] {'loss': 0.1381, 'grad_norm': 0.5466899871826172, 'learning_rate': 2.7414561886820456e-06, 'epoch': 2.05}
68%|██████▊ | 7887/11526 [1:22:27<37:19, 1.62it/s] 68%|██████▊ | 7888/11526 [1:22:27<37:18, 1.62it/s] {'loss': 0.1595, 'grad_norm': 0.557465672492981, 'learning_rate': 2.740105275014947e-06, 'epoch': 2.05}
68%|██████▊ | 7888/11526 [1:22:27<37:18, 1.62it/s] 68%|██████▊ | 7889/11526 [1:22:28<37:27, 1.62it/s] {'loss': 0.1802, 'grad_norm': 0.689690887928009, 'learning_rate': 2.738754568638251e-06, 'epoch': 2.05}
68%|██████▊ | 7889/11526 [1:22:28<37:27, 1.62it/s] 68%|██████▊ | 7890/11526 [1:22:29<37:22, 1.62it/s] {'loss': 0.1441, 'grad_norm': 0.565144956111908, 'learning_rate': 2.737404069675852e-06, 'epoch': 2.05}
68%|██████▊ | 7890/11526 [1:22:29<37:22, 1.62it/s] 68%|██████▊ | 7891/11526 [1:22:29<37:18, 1.62it/s] {'loss': 0.1298, 'grad_norm': 0.4465431272983551, 'learning_rate': 2.736053778251625e-06, 'epoch': 2.05}
68%|██████▊ | 7891/11526 [1:22:29<37:18, 1.62it/s] 68%|██████▊ | 7892/11526 [1:22:30<37:17, 1.62it/s] {'loss': 0.1658, 'grad_norm': 0.6513187885284424, 'learning_rate': 2.7347036944894274e-06, 'epoch': 2.05}
68%|██████▊ | 7892/11526 [1:22:30<37:17, 1.62it/s] 68%|██████▊ | 7893/11526 [1:22:30<37:15, 1.63it/s] {'loss': 0.1509, 'grad_norm': 0.5195037722587585, 'learning_rate': 2.733353818513097e-06, 'epoch': 2.05}
68%|██████▊ | 7893/11526 [1:22:30<37:15, 1.63it/s] 68%|██████▊ | 7894/11526 [1:22:31<37:17, 1.62it/s] {'loss': 0.1895, 'grad_norm': 0.685347318649292, 'learning_rate': 2.7320041504464476e-06, 'epoch': 2.05}
68%|██████▊ | 7894/11526 [1:22:31<37:17, 1.62it/s] 68%|██████▊ | 7895/11526 [1:22:32<37:14, 1.63it/s] {'loss': 0.1759, 'grad_norm': 0.6050966382026672, 'learning_rate': 2.730654690413286e-06, 'epoch': 2.05}
68%|██████▊ | 7895/11526 [1:22:32<37:14, 1.63it/s] 69%|██████▊ | 7896/11526 [1:22:32<37:12, 1.63it/s] {'loss': 0.211, 'grad_norm': 0.8184067010879517, 'learning_rate': 2.7293054385373864e-06, 'epoch': 2.06}
69%|██████▊ | 7896/11526 [1:22:32<37:12, 1.63it/s] 69%|██████▊ | 7897/11526 [1:22:33<37:10, 1.63it/s] {'loss': 0.1729, 'grad_norm': 0.5558418035507202, 'learning_rate': 2.7279563949425113e-06, 'epoch': 2.06}
69%|██████▊ | 7897/11526 [1:22:33<37:10, 1.63it/s] 69%|██████▊ | 7898/11526 [1:22:33<37:10, 1.63it/s] {'loss': 0.1309, 'grad_norm': 0.5827943682670593, 'learning_rate': 2.726607559752403e-06, 'epoch': 2.06}
69%|██████▊ | 7898/11526 [1:22:34<37:10, 1.63it/s] 69%|██████▊ | 7899/11526 [1:22:34<37:22, 1.62it/s] {'loss': 0.1401, 'grad_norm': 0.5055042505264282, 'learning_rate': 2.7252589330907845e-06, 'epoch': 2.06}
69%|██████▊ | 7899/11526 [1:22:34<37:22, 1.62it/s] 69%|██████▊ | 7900/11526 [1:22:35<37:17, 1.62it/s] {'loss': 0.1535, 'grad_norm': 0.6549550294876099, 'learning_rate': 2.7239105150813592e-06, 'epoch': 2.06}
69%|██████▊ | 7900/11526 [1:22:35<37:17, 1.62it/s] 69%|██████▊ | 7901/11526 [1:22:35<37:13, 1.62it/s] {'loss': 0.1782, 'grad_norm': 0.6375975012779236, 'learning_rate': 2.7225623058478147e-06, 'epoch': 2.06}
69%|██████▊ | 7901/11526 [1:22:35<37:13, 1.62it/s] 69%|██████▊ | 7902/11526 [1:22:36<37:11, 1.62it/s] {'loss': 0.1502, 'grad_norm': 0.5795047283172607, 'learning_rate': 2.7212143055138106e-06, 'epoch': 2.06}
69%|██████▊ | 7902/11526 [1:22:36<37:11, 1.62it/s] 69%|██████▊ | 7903/11526 [1:22:37<37:10, 1.62it/s] {'loss': 0.1576, 'grad_norm': 0.6276691555976868, 'learning_rate': 2.719866514202997e-06, 'epoch': 2.06}
69%|██████▊ | 7903/11526 [1:22:37<37:10, 1.62it/s] 69%|██████▊ | 7904/11526 [1:22:37<37:10, 1.62it/s] {'loss': 0.1553, 'grad_norm': 0.544472873210907, 'learning_rate': 2.7185189320389995e-06, 'epoch': 2.06}
69%|██████▊ | 7904/11526 [1:22:37<37:10, 1.62it/s] 69%|██████▊ | 7905/11526 [1:22:38<37:08, 1.63it/s] {'loss': 0.1429, 'grad_norm': 0.5272735953330994, 'learning_rate': 2.7171715591454274e-06, 'epoch': 2.06}
69%|██████▊ | 7905/11526 [1:22:38<37:08, 1.63it/s] 69%|██████▊ | 7906/11526 [1:22:38<37:05, 1.63it/s] {'loss': 0.157, 'grad_norm': 0.5772393345832825, 'learning_rate': 2.7158243956458673e-06, 'epoch': 2.06}
69%|██████▊ | 7906/11526 [1:22:38<37:05, 1.63it/s] 69%|██████▊ | 7907/11526 [1:22:39<37:04, 1.63it/s] {'loss': 0.1342, 'grad_norm': 0.527538001537323, 'learning_rate': 2.7144774416638932e-06, 'epoch': 2.06}
69%|██████▊ | 7907/11526 [1:22:39<37:04, 1.63it/s] 69%|██████▊ | 7908/11526 [1:22:40<37:03, 1.63it/s] {'loss': 0.1636, 'grad_norm': 0.6216943860054016, 'learning_rate': 2.7131306973230475e-06, 'epoch': 2.06}
69%|██████▊ | 7908/11526 [1:22:40<37:03, 1.63it/s] 69%|██████▊ | 7909/11526 [1:22:40<37:03, 1.63it/s] {'loss': 0.1675, 'grad_norm': 0.6391927003860474, 'learning_rate': 2.7117841627468713e-06, 'epoch': 2.06}
69%|██████▊ | 7909/11526 [1:22:40<37:03, 1.63it/s] 69%|██████▊ | 7910/11526 [1:22:41<37:01, 1.63it/s] {'loss': 0.1496, 'grad_norm': 0.558174729347229, 'learning_rate': 2.7104378380588677e-06, 'epoch': 2.06}
69%|██████▊ | 7910/11526 [1:22:41<37:01, 1.63it/s] 69%|██████▊ | 7911/11526 [1:22:41<37:00, 1.63it/s] {'loss': 0.1502, 'grad_norm': 0.4790591895580292, 'learning_rate': 2.709091723382533e-06, 'epoch': 2.06}
69%|██████▊ | 7911/11526 [1:22:42<37:00, 1.63it/s] 69%|██████▊ | 7912/11526 [1:22:42<36:59, 1.63it/s] {'loss': 0.1381, 'grad_norm': 0.5051592588424683, 'learning_rate': 2.7077458188413404e-06, 'epoch': 2.06}
69%|██████▊ | 7912/11526 [1:22:42<36:59, 1.63it/s] 69%|██████▊ | 7913/11526 [1:22:43<36:58, 1.63it/s] {'loss': 0.1584, 'grad_norm': 0.5732929706573486, 'learning_rate': 2.7064001245587437e-06, 'epoch': 2.06}
69%|██████▊ | 7913/11526 [1:22:43<36:58, 1.63it/s] 69%|██████▊ | 7914/11526 [1:22:43<36:57, 1.63it/s] {'loss': 0.1615, 'grad_norm': 0.6746164560317993, 'learning_rate': 2.705054640658177e-06, 'epoch': 2.06}
69%|██████▊ | 7914/11526 [1:22:43<36:57, 1.63it/s] 69%|██████▊ | 7915/11526 [1:22:44<36:56, 1.63it/s] {'loss': 0.1483, 'grad_norm': 0.49870678782463074, 'learning_rate': 2.7037093672630595e-06, 'epoch': 2.06}
69%|██████▊ | 7915/11526 [1:22:44<36:56, 1.63it/s] 69%|██████▊ | 7916/11526 [1:22:45<36:56, 1.63it/s] {'loss': 0.1531, 'grad_norm': 0.579077959060669, 'learning_rate': 2.7023643044967813e-06, 'epoch': 2.06}
69%|██████▊ | 7916/11526 [1:22:45<36:56, 1.63it/s] 69%|██████▊ | 7917/11526 [1:22:45<36:56, 1.63it/s] {'loss': 0.1356, 'grad_norm': 0.5293189287185669, 'learning_rate': 2.7010194524827227e-06, 'epoch': 2.06}
69%|██████▊ | 7917/11526 [1:22:45<36:56, 1.63it/s] 69%|██████▊ | 7918/11526 [1:22:46<36:55, 1.63it/s] {'loss': 0.1443, 'grad_norm': 0.5300115942955017, 'learning_rate': 2.6996748113442397e-06, 'epoch': 2.06}
69%|██████▊ | 7918/11526 [1:22:46<36:55, 1.63it/s] 69%|██████▊ | 7919/11526 [1:22:46<36:55, 1.63it/s] {'loss': 0.1271, 'grad_norm': 0.5196948051452637, 'learning_rate': 2.6983303812046724e-06, 'epoch': 2.06}
69%|██████▊ | 7919/11526 [1:22:46<36:55, 1.63it/s] 69%|██████▊ | 7920/11526 [1:22:47<36:54, 1.63it/s] {'loss': 0.1485, 'grad_norm': 0.6586061716079712, 'learning_rate': 2.6969861621873393e-06, 'epoch': 2.06}
69%|██████▊ | 7920/11526 [1:22:47<36:54, 1.63it/s] 69%|██████▊ | 7921/11526 [1:22:48<36:53, 1.63it/s] {'loss': 0.1368, 'grad_norm': 0.5617595314979553, 'learning_rate': 2.695642154415536e-06, 'epoch': 2.06}
69%|██████▊ | 7921/11526 [1:22:48<36:53, 1.63it/s] 69%|██████▊ | 7922/11526 [1:22:48<36:53, 1.63it/s] {'loss': 0.1328, 'grad_norm': 0.5022987723350525, 'learning_rate': 2.694298358012547e-06, 'epoch': 2.06}
69%|██████▊ | 7922/11526 [1:22:48<36:53, 1.63it/s] 69%|██████▊ | 7923/11526 [1:22:49<36:52, 1.63it/s] {'loss': 0.1637, 'grad_norm': 0.5762644410133362, 'learning_rate': 2.692954773101634e-06, 'epoch': 2.06}
69%|██████▊ | 7923/11526 [1:22:49<36:52, 1.63it/s] 69%|██████▊ | 7924/11526 [1:22:49<36:50, 1.63it/s] {'loss': 0.2787, 'grad_norm': 0.6265367269515991, 'learning_rate': 2.6916113998060333e-06, 'epoch': 2.06}
69%|██████▊ | 7924/11526 [1:22:50<36:50, 1.63it/s] 69%|██████▉ | 7925/11526 [1:22:50<36:51, 1.63it/s] {'loss': 0.1538, 'grad_norm': 0.725202202796936, 'learning_rate': 2.6902682382489693e-06, 'epoch': 2.06}
69%|██████▉ | 7925/11526 [1:22:50<36:51, 1.63it/s] 69%|██████▉ | 7926/11526 [1:22:51<36:58, 1.62it/s] {'loss': 0.1441, 'grad_norm': 0.5548332333564758, 'learning_rate': 2.6889252885536445e-06, 'epoch': 2.06}
69%|██████▉ | 7926/11526 [1:22:51<36:58, 1.62it/s] 69%|██████▉ | 7927/11526 [1:22:51<36:54, 1.63it/s] {'loss': 0.1552, 'grad_norm': 0.688299298286438, 'learning_rate': 2.687582550843242e-06, 'epoch': 2.06}
69%|██████▉ | 7927/11526 [1:22:51<36:54, 1.63it/s] 69%|██████▉ | 7928/11526 [1:22:52<36:54, 1.63it/s] {'loss': 0.1433, 'grad_norm': 0.5986329913139343, 'learning_rate': 2.686240025240926e-06, 'epoch': 2.06}
69%|██████▉ | 7928/11526 [1:22:52<36:54, 1.63it/s] 69%|██████▉ | 7929/11526 [1:22:52<36:55, 1.62it/s] {'loss': 0.1252, 'grad_norm': 0.48644399642944336, 'learning_rate': 2.6848977118698382e-06, 'epoch': 2.06}
69%|██████▉ | 7929/11526 [1:22:53<36:55, 1.62it/s] 69%|██████▉ | 7930/11526 [1:22:53<36:53, 1.62it/s] {'loss': 0.144, 'grad_norm': 0.9252856969833374, 'learning_rate': 2.683555610853103e-06, 'epoch': 2.06}
69%|██████▉ | 7930/11526 [1:22:53<36:53, 1.62it/s] 69%|██████▉ | 7931/11526 [1:22:54<36:53, 1.62it/s] {'loss': 0.1072, 'grad_norm': 0.4922001361846924, 'learning_rate': 2.682213722313831e-06, 'epoch': 2.06}
69%|██████▉ | 7931/11526 [1:22:54<36:53, 1.62it/s] 69%|██████▉ | 7932/11526 [1:22:54<36:51, 1.63it/s] {'loss': 0.1183, 'grad_norm': 0.47930675745010376, 'learning_rate': 2.6808720463751014e-06, 'epoch': 2.06}
69%|██████▉ | 7932/11526 [1:22:54<36:51, 1.63it/s] 69%|██████▉ | 7933/11526 [1:22:55<36:49, 1.63it/s] {'loss': 0.1808, 'grad_norm': 0.6869693994522095, 'learning_rate': 2.6795305831599838e-06, 'epoch': 2.06}
69%|██████▉ | 7933/11526 [1:22:55<36:49, 1.63it/s] 69%|██████▉ | 7934/11526 [1:22:56<36:53, 1.62it/s] {'loss': 0.2098, 'grad_norm': 0.7616615891456604, 'learning_rate': 2.6781893327915242e-06, 'epoch': 2.07}
69%|██████▉ | 7934/11526 [1:22:56<36:53, 1.62it/s] 69%|██████▉ | 7935/11526 [1:22:56<36:50, 1.62it/s] {'loss': 0.1502, 'grad_norm': 0.5383204817771912, 'learning_rate': 2.6768482953927487e-06, 'epoch': 2.07}
69%|██████▉ | 7935/11526 [1:22:56<36:50, 1.62it/s] 69%|██████▉ | 7936/11526 [1:22:57<36:49, 1.62it/s] {'loss': 0.1358, 'grad_norm': 0.5600631833076477, 'learning_rate': 2.6755074710866686e-06, 'epoch': 2.07}
69%|██████▉ | 7936/11526 [1:22:57<36:49, 1.62it/s] 69%|██████▉ | 7937/11526 [1:22:57<36:48, 1.63it/s] {'loss': 0.1498, 'grad_norm': 0.5478675365447998, 'learning_rate': 2.6741668599962665e-06, 'epoch': 2.07}
69%|██████▉ | 7937/11526 [1:22:58<36:48, 1.63it/s] 69%|██████▉ | 7938/11526 [1:22:58<36:47, 1.63it/s] {'loss': 0.1761, 'grad_norm': 0.6082763075828552, 'learning_rate': 2.672826462244514e-06, 'epoch': 2.07}
69%|██████▉ | 7938/11526 [1:22:58<36:47, 1.63it/s] 69%|██████▉ | 7939/11526 [1:22:59<36:49, 1.62it/s] {'loss': 0.1627, 'grad_norm': 0.5988122820854187, 'learning_rate': 2.6714862779543586e-06, 'epoch': 2.07}
69%|██████▉ | 7939/11526 [1:22:59<36:49, 1.62it/s] 69%|██████▉ | 7940/11526 [1:22:59<36:46, 1.62it/s] {'loss': 0.1393, 'grad_norm': 0.5194792151451111, 'learning_rate': 2.6701463072487312e-06, 'epoch': 2.07}
69%|██████▉ | 7940/11526 [1:22:59<36:46, 1.62it/s] 69%|██████▉ | 7941/11526 [1:23:00<36:45, 1.63it/s] {'loss': 0.1301, 'grad_norm': 0.5188866257667542, 'learning_rate': 2.6688065502505405e-06, 'epoch': 2.07}
69%|██████▉ | 7941/11526 [1:23:00<36:45, 1.63it/s] 69%|██████▉ | 7942/11526 [1:23:00<36:43, 1.63it/s] {'loss': 0.1171, 'grad_norm': 0.47499844431877136, 'learning_rate': 2.667467007082679e-06, 'epoch': 2.07}
69%|██████▉ | 7942/11526 [1:23:01<36:43, 1.63it/s] 69%|██████▉ | 7943/11526 [1:23:01<36:41, 1.63it/s] {'loss': 0.1969, 'grad_norm': 0.7217071652412415, 'learning_rate': 2.666127677868011e-06, 'epoch': 2.07}
69%|██████▉ | 7943/11526 [1:23:01<36:41, 1.63it/s] 69%|██████▉ | 7944/11526 [1:23:02<36:43, 1.63it/s] {'loss': 0.1707, 'grad_norm': 0.5826088786125183, 'learning_rate': 2.664788562729396e-06, 'epoch': 2.07}
69%|██████▉ | 7944/11526 [1:23:02<36:43, 1.63it/s] 69%|██████▉ | 7945/11526 [1:23:02<36:41, 1.63it/s] {'loss': 0.154, 'grad_norm': 0.5602465867996216, 'learning_rate': 2.663449661789659e-06, 'epoch': 2.07}
69%|██████▉ | 7945/11526 [1:23:02<36:41, 1.63it/s] 69%|██████▉ | 7946/11526 [1:23:03<36:40, 1.63it/s] {'loss': 0.1474, 'grad_norm': 0.6098300814628601, 'learning_rate': 2.6621109751716123e-06, 'epoch': 2.07}
69%|██████▉ | 7946/11526 [1:23:03<36:40, 1.63it/s] 69%|██████▉ | 7947/11526 [1:23:04<36:39, 1.63it/s] {'loss': 0.171, 'grad_norm': 0.5875369906425476, 'learning_rate': 2.6607725029980526e-06, 'epoch': 2.07}
69%|██████▉ | 7947/11526 [1:23:04<36:39, 1.63it/s] 69%|██████▉ | 7948/11526 [1:23:04<36:37, 1.63it/s] {'loss': 0.1493, 'grad_norm': 0.5856122970581055, 'learning_rate': 2.6594342453917433e-06, 'epoch': 2.07}
69%|██████▉ | 7948/11526 [1:23:04<36:37, 1.63it/s] 69%|██████▉ | 7949/11526 [1:23:05<36:47, 1.62it/s] {'loss': 0.178, 'grad_norm': 0.6170784831047058, 'learning_rate': 2.6580962024754447e-06, 'epoch': 2.07}
69%|██████▉ | 7949/11526 [1:23:05<36:47, 1.62it/s] 69%|██████▉ | 7950/11526 [1:23:05<36:44, 1.62it/s] {'loss': 0.1226, 'grad_norm': 0.446206271648407, 'learning_rate': 2.6567583743718895e-06, 'epoch': 2.07}
69%|██████▉ | 7950/11526 [1:23:06<36:44, 1.62it/s] 69%|██████▉ | 7951/11526 [1:23:06<36:41, 1.62it/s] {'loss': 0.1248, 'grad_norm': 0.4902716279029846, 'learning_rate': 2.6554207612037862e-06, 'epoch': 2.07}
69%|██████▉ | 7951/11526 [1:23:06<36:41, 1.62it/s] 69%|██████▉ | 7952/11526 [1:23:07<36:38, 1.63it/s] {'loss': 0.1538, 'grad_norm': 0.5857459902763367, 'learning_rate': 2.6540833630938314e-06, 'epoch': 2.07}
69%|██████▉ | 7952/11526 [1:23:07<36:38, 1.63it/s] 69%|██████▉ | 7953/11526 [1:23:07<36:37, 1.63it/s] {'loss': 0.1251, 'grad_norm': 0.49288254976272583, 'learning_rate': 2.6527461801646973e-06, 'epoch': 2.07}
69%|██████▉ | 7953/11526 [1:23:07<36:37, 1.63it/s] 69%|██████▉ | 7954/11526 [1:23:08<36:36, 1.63it/s] {'loss': 0.1651, 'grad_norm': 0.5834349989891052, 'learning_rate': 2.65140921253904e-06, 'epoch': 2.07}
69%|██████▉ | 7954/11526 [1:23:08<36:36, 1.63it/s] 69%|██████▉ | 7955/11526 [1:23:08<36:35, 1.63it/s] {'loss': 0.1542, 'grad_norm': 0.6038489937782288, 'learning_rate': 2.650072460339494e-06, 'epoch': 2.07}
69%|██████▉ | 7955/11526 [1:23:09<36:35, 1.63it/s] 69%|██████▉ | 7956/11526 [1:23:09<36:33, 1.63it/s] {'loss': 0.1706, 'grad_norm': 0.6710420250892639, 'learning_rate': 2.6487359236886705e-06, 'epoch': 2.07}
69%|██████▉ | 7956/11526 [1:23:09<36:33, 1.63it/s] 69%|██████▉ | 7957/11526 [1:23:10<36:33, 1.63it/s] {'loss': 0.1545, 'grad_norm': 0.6102171540260315, 'learning_rate': 2.6473996027091642e-06, 'epoch': 2.07}
69%|██████▉ | 7957/11526 [1:23:10<36:33, 1.63it/s] 69%|██████▉ | 7958/11526 [1:23:10<36:32, 1.63it/s] {'loss': 0.1827, 'grad_norm': 0.6601142883300781, 'learning_rate': 2.6460634975235566e-06, 'epoch': 2.07}
69%|██████▉ | 7958/11526 [1:23:10<36:32, 1.63it/s] 69%|██████▉ | 7959/11526 [1:23:11<36:34, 1.63it/s] {'loss': 0.1777, 'grad_norm': 0.4988883435726166, 'learning_rate': 2.644727608254396e-06, 'epoch': 2.07}
69%|██████▉ | 7959/11526 [1:23:11<36:34, 1.63it/s] 69%|██████▉ | 7960/11526 [1:23:12<36:32, 1.63it/s] {'loss': 0.2093, 'grad_norm': 0.7350429892539978, 'learning_rate': 2.6433919350242197e-06, 'epoch': 2.07}
69%|██████▉ | 7960/11526 [1:23:12<36:32, 1.63it/s] 69%|██████▉ | 7961/11526 [1:23:12<36:30, 1.63it/s] {'loss': 0.1205, 'grad_norm': 0.5238615870475769, 'learning_rate': 2.6420564779555447e-06, 'epoch': 2.07}
69%|██████▉ | 7961/11526 [1:23:12<36:30, 1.63it/s] 69%|██████▉ | 7962/11526 [1:23:13<36:30, 1.63it/s] {'loss': 0.1521, 'grad_norm': 0.5944420099258423, 'learning_rate': 2.6407212371708646e-06, 'epoch': 2.07}
69%|██████▉ | 7962/11526 [1:23:13<36:30, 1.63it/s] 69%|██████▉ | 7963/11526 [1:23:13<36:28, 1.63it/s] {'loss': 0.14, 'grad_norm': 0.5324707627296448, 'learning_rate': 2.639386212792659e-06, 'epoch': 2.07}
69%|██████▉ | 7963/11526 [1:23:14<36:28, 1.63it/s] 69%|██████▉ | 7964/11526 [1:23:14<36:31, 1.63it/s] {'loss': 0.151, 'grad_norm': 0.5281546711921692, 'learning_rate': 2.6380514049433792e-06, 'epoch': 2.07}
69%|██████▉ | 7964/11526 [1:23:14<36:31, 1.63it/s] 69%|██████▉ | 7965/11526 [1:23:15<36:29, 1.63it/s] {'loss': 0.1405, 'grad_norm': 0.4789679944515228, 'learning_rate': 2.6367168137454635e-06, 'epoch': 2.07}
69%|██████▉ | 7965/11526 [1:23:15<36:29, 1.63it/s] 69%|██████▉ | 7966/11526 [1:23:15<36:28, 1.63it/s] {'loss': 0.1243, 'grad_norm': 0.47206076979637146, 'learning_rate': 2.635382439321328e-06, 'epoch': 2.07}
69%|██████▉ | 7966/11526 [1:23:15<36:28, 1.63it/s] 69%|██████▉ | 7967/11526 [1:23:16<36:26, 1.63it/s] {'loss': 0.1394, 'grad_norm': 0.5441890954971313, 'learning_rate': 2.6340482817933694e-06, 'epoch': 2.07}
69%|██████▉ | 7967/11526 [1:23:16<36:26, 1.63it/s] 69%|██████▉ | 7968/11526 [1:23:16<36:26, 1.63it/s] {'loss': 0.1375, 'grad_norm': 0.5521000623703003, 'learning_rate': 2.632714341283964e-06, 'epoch': 2.07}
69%|██████▉ | 7968/11526 [1:23:17<36:26, 1.63it/s] 69%|██████▉ | 7969/11526 [1:23:17<36:28, 1.63it/s] {'loss': 0.1587, 'grad_norm': 0.5728987455368042, 'learning_rate': 2.6313806179154705e-06, 'epoch': 2.07}
69%|██████▉ | 7969/11526 [1:23:17<36:28, 1.63it/s] 69%|██████▉ | 7970/11526 [1:23:18<36:25, 1.63it/s] {'loss': 0.1336, 'grad_norm': 0.4807518720626831, 'learning_rate': 2.63004711181022e-06, 'epoch': 2.07}
69%|██████▉ | 7970/11526 [1:23:18<36:25, 1.63it/s] 69%|██████▉ | 7971/11526 [1:23:18<36:25, 1.63it/s] {'loss': 0.1508, 'grad_norm': 0.5305424928665161, 'learning_rate': 2.628713823090536e-06, 'epoch': 2.07}
69%|██████▉ | 7971/11526 [1:23:18<36:25, 1.63it/s] 69%|██████▉ | 7972/11526 [1:23:19<36:23, 1.63it/s] {'loss': 0.17, 'grad_norm': 0.6307209730148315, 'learning_rate': 2.627380751878711e-06, 'epoch': 2.07}
69%|██████▉ | 7972/11526 [1:23:19<36:23, 1.63it/s] 69%|██████▉ | 7973/11526 [1:23:20<36:22, 1.63it/s] {'loss': 0.1579, 'grad_norm': 0.554693341255188, 'learning_rate': 2.626047898297023e-06, 'epoch': 2.08}
69%|██████▉ | 7973/11526 [1:23:20<36:22, 1.63it/s] 69%|██████▉ | 7974/11526 [1:23:20<36:26, 1.62it/s] {'loss': 0.1662, 'grad_norm': 0.6615602374076843, 'learning_rate': 2.6247152624677288e-06, 'epoch': 2.08}
69%|██████▉ | 7974/11526 [1:23:20<36:26, 1.62it/s] 69%|██████▉ | 7975/11526 [1:23:21<36:24, 1.63it/s] {'loss': 0.1249, 'grad_norm': 0.5189738869667053, 'learning_rate': 2.6233828445130654e-06, 'epoch': 2.08}
69%|██████▉ | 7975/11526 [1:23:21<36:24, 1.63it/s] 69%|██████▉ | 7976/11526 [1:23:21<36:23, 1.63it/s] {'loss': 0.1357, 'grad_norm': 0.703543484210968, 'learning_rate': 2.62205064455525e-06, 'epoch': 2.08}
69%|██████▉ | 7976/11526 [1:23:22<36:23, 1.63it/s] 69%|██████▉ | 7977/11526 [1:23:22<36:21, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.5767462253570557, 'learning_rate': 2.6207186627164818e-06, 'epoch': 2.08}
69%|██████▉ | 7977/11526 [1:23:22<36:21, 1.63it/s] 69%|██████▉ | 7978/11526 [1:23:23<36:20, 1.63it/s] {'loss': 0.1864, 'grad_norm': 0.6564162373542786, 'learning_rate': 2.619386899118933e-06, 'epoch': 2.08}
69%|██████▉ | 7978/11526 [1:23:23<36:20, 1.63it/s] 69%|██████▉ | 7979/11526 [1:23:23<36:23, 1.62it/s] {'loss': 0.1769, 'grad_norm': 0.6394917368888855, 'learning_rate': 2.618055353884763e-06, 'epoch': 2.08}
69%|██████▉ | 7979/11526 [1:23:23<36:23, 1.62it/s] 69%|██████▉ | 7980/11526 [1:23:24<36:21, 1.63it/s] {'loss': 0.167, 'grad_norm': 0.6056299805641174, 'learning_rate': 2.6167240271361096e-06, 'epoch': 2.08}
69%|██████▉ | 7980/11526 [1:23:24<36:21, 1.63it/s] 69%|██████▉ | 7981/11526 [1:23:24<36:19, 1.63it/s] {'loss': 0.1507, 'grad_norm': 0.5301500558853149, 'learning_rate': 2.615392918995087e-06, 'epoch': 2.08}
69%|██████▉ | 7981/11526 [1:23:25<36:19, 1.63it/s] 69%|██████▉ | 7982/11526 [1:23:25<36:18, 1.63it/s] {'loss': 0.1407, 'grad_norm': 0.5151186585426331, 'learning_rate': 2.614062029583795e-06, 'epoch': 2.08}
69%|██████▉ | 7982/11526 [1:23:25<36:18, 1.63it/s] 69%|██████▉ | 7983/11526 [1:23:26<36:17, 1.63it/s] {'loss': 0.1765, 'grad_norm': 0.6645902991294861, 'learning_rate': 2.6127313590243085e-06, 'epoch': 2.08}
69%|██████▉ | 7983/11526 [1:23:26<36:17, 1.63it/s] 69%|██████▉ | 7984/11526 [1:23:26<36:17, 1.63it/s] {'loss': 0.1613, 'grad_norm': 0.6196467280387878, 'learning_rate': 2.611400907438685e-06, 'epoch': 2.08}
69%|██████▉ | 7984/11526 [1:23:26<36:17, 1.63it/s] 69%|██████▉ | 7985/11526 [1:23:27<36:15, 1.63it/s] {'loss': 0.1502, 'grad_norm': 0.6253953576087952, 'learning_rate': 2.6100706749489623e-06, 'epoch': 2.08}
69%|██████▉ | 7985/11526 [1:23:27<36:15, 1.63it/s] 69%|██████▉ | 7986/11526 [1:23:28<36:14, 1.63it/s] {'loss': 0.1323, 'grad_norm': 0.5890523195266724, 'learning_rate': 2.608740661677154e-06, 'epoch': 2.08}
69%|██████▉ | 7986/11526 [1:23:28<36:14, 1.63it/s] 69%|██████▉ | 7987/11526 [1:23:28<36:13, 1.63it/s] {'loss': 0.1502, 'grad_norm': 0.5927785634994507, 'learning_rate': 2.607410867745258e-06, 'epoch': 2.08}
69%|██████▉ | 7987/11526 [1:23:28<36:13, 1.63it/s] 69%|██████▉ | 7988/11526 [1:23:29<36:12, 1.63it/s] {'loss': 0.1354, 'grad_norm': 0.5230270624160767, 'learning_rate': 2.60608129327525e-06, 'epoch': 2.08}
69%|██████▉ | 7988/11526 [1:23:29<36:12, 1.63it/s] 69%|██████▉ | 7989/11526 [1:23:29<36:14, 1.63it/s] {'loss': 0.1743, 'grad_norm': 0.6299304962158203, 'learning_rate': 2.6047519383890875e-06, 'epoch': 2.08}
69%|██████▉ | 7989/11526 [1:23:30<36:14, 1.63it/s] 69%|██████▉ | 7990/11526 [1:23:30<36:12, 1.63it/s] {'loss': 0.1299, 'grad_norm': 0.48860302567481995, 'learning_rate': 2.603422803208705e-06, 'epoch': 2.08}
69%|██████▉ | 7990/11526 [1:23:30<36:12, 1.63it/s] 69%|██████▉ | 7991/11526 [1:23:31<36:11, 1.63it/s] {'loss': 0.1459, 'grad_norm': 0.5216189622879028, 'learning_rate': 2.602093887856021e-06, 'epoch': 2.08}
69%|██████▉ | 7991/11526 [1:23:31<36:11, 1.63it/s] 69%|██████▉ | 7992/11526 [1:23:31<36:11, 1.63it/s] {'loss': 0.1517, 'grad_norm': 0.5612105131149292, 'learning_rate': 2.6007651924529243e-06, 'epoch': 2.08}
69%|██████▉ | 7992/11526 [1:23:31<36:11, 1.63it/s] 69%|██████▉ | 7993/11526 [1:23:32<36:10, 1.63it/s] {'loss': 0.1865, 'grad_norm': 0.6600630879402161, 'learning_rate': 2.5994367171213003e-06, 'epoch': 2.08}
69%|██████▉ | 7993/11526 [1:23:32<36:10, 1.63it/s] 69%|██████▉ | 7994/11526 [1:23:32<36:09, 1.63it/s] {'loss': 0.1608, 'grad_norm': 0.6943789720535278, 'learning_rate': 2.5981084619829964e-06, 'epoch': 2.08}
69%|██████▉ | 7994/11526 [1:23:33<36:09, 1.63it/s] 69%|██████▉ | 7995/11526 [1:23:33<36:09, 1.63it/s] {'loss': 0.1363, 'grad_norm': 0.5084745287895203, 'learning_rate': 2.5967804271598517e-06, 'epoch': 2.08}
69%|██████▉ | 7995/11526 [1:23:33<36:09, 1.63it/s] 69%|██████▉ | 7996/11526 [1:23:34<36:07, 1.63it/s] {'loss': 0.1476, 'grad_norm': 0.56313157081604, 'learning_rate': 2.595452612773681e-06, 'epoch': 2.08}
69%|██████▉ | 7996/11526 [1:23:34<36:07, 1.63it/s] 69%|██████▉ | 7997/11526 [1:23:34<36:06, 1.63it/s] {'loss': 0.1677, 'grad_norm': 0.6078083515167236, 'learning_rate': 2.5941250189462745e-06, 'epoch': 2.08}
69%|██████▉ | 7997/11526 [1:23:34<36:06, 1.63it/s] 69%|██████▉ | 7998/11526 [1:23:35<36:05, 1.63it/s] {'loss': 0.1602, 'grad_norm': 0.5525612235069275, 'learning_rate': 2.592797645799412e-06, 'epoch': 2.08}
69%|██████▉ | 7998/11526 [1:23:35<36:05, 1.63it/s] 69%|██████▉ | 7999/11526 [1:23:36<36:07, 1.63it/s] {'loss': 0.1377, 'grad_norm': 0.5352036952972412, 'learning_rate': 2.591470493454848e-06, 'epoch': 2.08}
69%|██████▉ | 7999/11526 [1:23:36<36:07, 1.63it/s] 69%|██████▉ | 8000/11526 [1:23:36<36:07, 1.63it/s] {'loss': 0.1386, 'grad_norm': 0.5021727681159973, 'learning_rate': 2.590143562034312e-06, 'epoch': 2.08}
69%|██████▉ | 8000/11526 [1:23:36<36:07, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.32it/s]
31%|███ | 4/13 [00:00<00:01, 8.37it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.77it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.00it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5431423783302307, 'eval_runtime': 1.9562, 'eval_samples_per_second': 102.239, 'eval_steps_per_second': 6.646, 'epoch': 2.08}
69%|██████▉ | 8000/11526 [1:23:38<36:07, 1.63it/s]
 69%|██████▉ | 8001/11526 [1:23:39<1:10:40, 1.20s/it] {'loss': 0.1843, 'grad_norm': 0.7385039329528809, 'learning_rate': 2.58881685165952e-06, 'epoch': 2.08}
69%|██████▉ | 8001/11526 [1:23:39<1:10:40, 1.20s/it] 69%|██████▉ | 8002/11526 [1:23:39<1:00:17, 1.03s/it] {'loss': 0.1823, 'grad_norm': 0.5849365592002869, 'learning_rate': 2.587490362452166e-06, 'epoch': 2.08}
69%|██████▉ | 8002/11526 [1:23:39<1:00:17, 1.03s/it] 69%|██████▉ | 8003/11526 [1:23:40<53:01, 1.11it/s] {'loss': 0.1485, 'grad_norm': 0.6627076864242554, 'learning_rate': 2.5861640945339216e-06, 'epoch': 2.08}
69%|██████▉ | 8003/11526 [1:23:40<53:01, 1.11it/s] 69%|██████▉ | 8004/11526 [1:23:41<47:57, 1.22it/s] {'loss': 0.187, 'grad_norm': 0.7070517539978027, 'learning_rate': 2.5848380480264433e-06, 'epoch': 2.08}
69%|██████▉ | 8004/11526 [1:23:41<47:57, 1.22it/s] 69%|██████▉ | 8005/11526 [1:23:41<44:22, 1.32it/s] {'loss': 0.1694, 'grad_norm': 0.6097087264060974, 'learning_rate': 2.583512223051359e-06, 'epoch': 2.08}
69%|██████▉ | 8005/11526 [1:23:41<44:22, 1.32it/s] 69%|██████▉ | 8006/11526 [1:23:42<41:49, 1.40it/s] {'loss': 0.1877, 'grad_norm': 0.7588161826133728, 'learning_rate': 2.5821866197302803e-06, 'epoch': 2.08}
69%|██████▉ | 8006/11526 [1:23:42<41:49, 1.40it/s] 69%|██████▉ | 8007/11526 [1:23:42<40:05, 1.46it/s] {'loss': 0.1177, 'grad_norm': 0.4551866352558136, 'learning_rate': 2.5808612381848064e-06, 'epoch': 2.08}
69%|██████▉ | 8007/11526 [1:23:43<40:05, 1.46it/s] 69%|██████▉ | 8008/11526 [1:23:43<38:51, 1.51it/s] {'loss': 0.1208, 'grad_norm': 0.4795069694519043, 'learning_rate': 2.5795360785365016e-06, 'epoch': 2.08}
69%|██████▉ | 8008/11526 [1:23:43<38:51, 1.51it/s] 69%|██████▉ | 8009/11526 [1:23:44<38:01, 1.54it/s] {'loss': 0.1729, 'grad_norm': 0.5798162817955017, 'learning_rate': 2.5782111409069198e-06, 'epoch': 2.08}
69%|██████▉ | 8009/11526 [1:23:44<38:01, 1.54it/s] 69%|██████▉ | 8010/11526 [1:23:44<37:24, 1.57it/s] {'loss': 0.1704, 'grad_norm': 0.6597828269004822, 'learning_rate': 2.5768864254175907e-06, 'epoch': 2.08}
69%|██████▉ | 8010/11526 [1:23:44<37:24, 1.57it/s] 70%|██████▉ | 8011/11526 [1:23:45<36:58, 1.58it/s] {'loss': 0.163, 'grad_norm': 0.6485750079154968, 'learning_rate': 2.5755619321900267e-06, 'epoch': 2.09}
70%|██████▉ | 8011/11526 [1:23:45<36:58, 1.58it/s] 70%|██████▉ | 8012/11526 [1:23:45<36:40, 1.60it/s] {'loss': 0.1409, 'grad_norm': 0.5297408699989319, 'learning_rate': 2.574237661345718e-06, 'epoch': 2.09}
70%|██████▉ | 8012/11526 [1:23:46<36:40, 1.60it/s] 70%|██████▉ | 8013/11526 [1:23:46<36:27, 1.61it/s] {'loss': 0.1482, 'grad_norm': 0.5808911323547363, 'learning_rate': 2.5729136130061318e-06, 'epoch': 2.09}
70%|██████▉ | 8013/11526 [1:23:46<36:27, 1.61it/s] 70%|██████▉ | 8014/11526 [1:23:47<36:20, 1.61it/s] {'loss': 0.1605, 'grad_norm': 0.5685950517654419, 'learning_rate': 2.5715897872927176e-06, 'epoch': 2.09}
70%|██████▉ | 8014/11526 [1:23:47<36:20, 1.61it/s] 70%|██████▉ | 8015/11526 [1:23:47<36:12, 1.62it/s] {'loss': 0.1387, 'grad_norm': 0.5913057923316956, 'learning_rate': 2.570266184326906e-06, 'epoch': 2.09}
70%|██████▉ | 8015/11526 [1:23:47<36:12, 1.62it/s] 70%|██████▉ | 8016/11526 [1:23:48<36:06, 1.62it/s] {'loss': 0.1155, 'grad_norm': 0.4859989583492279, 'learning_rate': 2.5689428042301045e-06, 'epoch': 2.09}
70%|██████▉ | 8016/11526 [1:23:48<36:06, 1.62it/s] 70%|██████▉ | 8017/11526 [1:23:49<36:02, 1.62it/s] {'loss': 0.1338, 'grad_norm': 0.6034117341041565, 'learning_rate': 2.5676196471237003e-06, 'epoch': 2.09}
70%|██████▉ | 8017/11526 [1:23:49<36:02, 1.62it/s] 70%|██████▉ | 8018/11526 [1:23:49<35:59, 1.62it/s] {'loss': 0.173, 'grad_norm': 0.5439207553863525, 'learning_rate': 2.566296713129065e-06, 'epoch': 2.09}
70%|██████▉ | 8018/11526 [1:23:49<35:59, 1.62it/s] 70%|██████▉ | 8019/11526 [1:23:50<35:59, 1.62it/s] {'loss': 0.1208, 'grad_norm': 0.4570992887020111, 'learning_rate': 2.5649740023675373e-06, 'epoch': 2.09}
70%|██████▉ | 8019/11526 [1:23:50<35:59, 1.62it/s] 70%|██████▉ | 8020/11526 [1:23:50<35:57, 1.63it/s] {'loss': 0.1404, 'grad_norm': 0.555279552936554, 'learning_rate': 2.5636515149604534e-06, 'epoch': 2.09}
70%|██████▉ | 8020/11526 [1:23:51<35:57, 1.63it/s] 70%|██████▉ | 8021/11526 [1:23:51<35:55, 1.63it/s] {'loss': 0.1324, 'grad_norm': 0.5058407187461853, 'learning_rate': 2.5623292510291124e-06, 'epoch': 2.09}
70%|██████▉ | 8021/11526 [1:23:51<35:55, 1.63it/s] 70%|██████▉ | 8022/11526 [1:23:52<35:53, 1.63it/s] {'loss': 0.1478, 'grad_norm': 0.5669804811477661, 'learning_rate': 2.5610072106948023e-06, 'epoch': 2.09}
70%|██████▉ | 8022/11526 [1:23:52<35:53, 1.63it/s] 70%|██████▉ | 8023/11526 [1:23:52<35:52, 1.63it/s] {'loss': 0.1253, 'grad_norm': 0.5020909905433655, 'learning_rate': 2.559685394078791e-06, 'epoch': 2.09}
70%|██████▉ | 8023/11526 [1:23:52<35:52, 1.63it/s] 70%|██████▉ | 8024/11526 [1:23:53<35:53, 1.63it/s] {'loss': 0.1555, 'grad_norm': 0.5231995582580566, 'learning_rate': 2.5583638013023156e-06, 'epoch': 2.09}
70%|██████▉ | 8024/11526 [1:23:53<35:53, 1.63it/s] 70%|██████▉ | 8025/11526 [1:23:53<35:51, 1.63it/s] {'loss': 0.1507, 'grad_norm': 0.615699052810669, 'learning_rate': 2.557042432486606e-06, 'epoch': 2.09}
70%|██████▉ | 8025/11526 [1:23:54<35:51, 1.63it/s] 70%|██████▉ | 8026/11526 [1:23:54<35:51, 1.63it/s] {'loss': 0.1618, 'grad_norm': 0.5611687898635864, 'learning_rate': 2.5557212877528668e-06, 'epoch': 2.09}
70%|██████▉ | 8026/11526 [1:23:54<35:51, 1.63it/s] 70%|██████▉ | 8027/11526 [1:23:55<35:50, 1.63it/s] {'loss': 0.189, 'grad_norm': 0.6372145414352417, 'learning_rate': 2.5544003672222773e-06, 'epoch': 2.09}
70%|██████▉ | 8027/11526 [1:23:55<35:50, 1.63it/s] 70%|██████▉ | 8028/11526 [1:23:55<35:49, 1.63it/s] {'loss': 0.1335, 'grad_norm': 0.5068512558937073, 'learning_rate': 2.553079671016e-06, 'epoch': 2.09}
70%|██████▉ | 8028/11526 [1:23:55<35:49, 1.63it/s] 70%|██████▉ | 8029/11526 [1:23:56<35:48, 1.63it/s] {'loss': 0.1826, 'grad_norm': 0.6207832098007202, 'learning_rate': 2.551759199255178e-06, 'epoch': 2.09}
70%|██████▉ | 8029/11526 [1:23:56<35:48, 1.63it/s] 70%|██████▉ | 8030/11526 [1:23:57<35:48, 1.63it/s] {'loss': 0.1776, 'grad_norm': 0.6848413348197937, 'learning_rate': 2.550438952060932e-06, 'epoch': 2.09}
70%|██████▉ | 8030/11526 [1:23:57<35:48, 1.63it/s] 70%|██████▉ | 8031/11526 [1:23:57<35:47, 1.63it/s] {'loss': 0.1491, 'grad_norm': 0.5599398612976074, 'learning_rate': 2.549118929554365e-06, 'epoch': 2.09}
70%|██████▉ | 8031/11526 [1:23:57<35:47, 1.63it/s] 70%|██████▉ | 8032/11526 [1:23:58<35:47, 1.63it/s] {'loss': 0.1381, 'grad_norm': 0.47900089621543884, 'learning_rate': 2.547799131856551e-06, 'epoch': 2.09}
70%|██████▉ | 8032/11526 [1:23:58<35:47, 1.63it/s] 70%|██████▉ | 8033/11526 [1:23:58<35:46, 1.63it/s] {'loss': 0.1469, 'grad_norm': 0.5651615262031555, 'learning_rate': 2.546479559088555e-06, 'epoch': 2.09}
70%|██████▉ | 8033/11526 [1:23:59<35:46, 1.63it/s] 70%|██████▉ | 8034/11526 [1:23:59<35:48, 1.63it/s] {'loss': 0.1547, 'grad_norm': 0.555004358291626, 'learning_rate': 2.5451602113714146e-06, 'epoch': 2.09}
70%|██████▉ | 8034/11526 [1:23:59<35:48, 1.63it/s] 70%|██████▉ | 8035/11526 [1:24:00<35:47, 1.63it/s] {'loss': 0.1575, 'grad_norm': 0.6128608584403992, 'learning_rate': 2.5438410888261465e-06, 'epoch': 2.09}
70%|██████▉ | 8035/11526 [1:24:00<35:47, 1.63it/s] 70%|██████▉ | 8036/11526 [1:24:00<35:46, 1.63it/s] {'loss': 0.1534, 'grad_norm': 0.5999276638031006, 'learning_rate': 2.542522191573748e-06, 'epoch': 2.09}
70%|██████▉ | 8036/11526 [1:24:00<35:46, 1.63it/s] 70%|██████▉ | 8037/11526 [1:24:01<35:44, 1.63it/s] {'loss': 0.1582, 'grad_norm': 0.5983501076698303, 'learning_rate': 2.541203519735197e-06, 'epoch': 2.09}
70%|██████▉ | 8037/11526 [1:24:01<35:44, 1.63it/s] 70%|██████▉ | 8038/11526 [1:24:01<35:43, 1.63it/s] {'loss': 0.139, 'grad_norm': 0.603956937789917, 'learning_rate': 2.5398850734314485e-06, 'epoch': 2.09}
70%|██████▉ | 8038/11526 [1:24:02<35:43, 1.63it/s] 70%|██████▉ | 8039/11526 [1:24:02<35:52, 1.62it/s] {'loss': 0.1242, 'grad_norm': 0.47500619292259216, 'learning_rate': 2.5385668527834413e-06, 'epoch': 2.09}
70%|██████▉ | 8039/11526 [1:24:02<35:52, 1.62it/s] 70%|██████▉ | 8040/11526 [1:24:03<35:47, 1.62it/s] {'loss': 0.1689, 'grad_norm': 0.5904012322425842, 'learning_rate': 2.5372488579120848e-06, 'epoch': 2.09}
70%|██████▉ | 8040/11526 [1:24:03<35:47, 1.62it/s] 70%|██████▉ | 8041/11526 [1:24:03<35:44, 1.62it/s] {'loss': 0.1559, 'grad_norm': 0.6129305362701416, 'learning_rate': 2.535931088938274e-06, 'epoch': 2.09}
70%|██████▉ | 8041/11526 [1:24:03<35:44, 1.62it/s] 70%|██████▉ | 8042/11526 [1:24:04<35:42, 1.63it/s] {'loss': 0.1898, 'grad_norm': 0.551466703414917, 'learning_rate': 2.534613545982887e-06, 'epoch': 2.09}
70%|██████▉ | 8042/11526 [1:24:04<35:42, 1.63it/s] 70%|██████▉ | 8043/11526 [1:24:05<35:41, 1.63it/s] {'loss': 0.1611, 'grad_norm': 0.6074867248535156, 'learning_rate': 2.5332962291667716e-06, 'epoch': 2.09}
70%|██████▉ | 8043/11526 [1:24:05<35:41, 1.63it/s] 70%|██████▉ | 8044/11526 [1:24:05<35:48, 1.62it/s] {'loss': 0.1379, 'grad_norm': 0.6069484353065491, 'learning_rate': 2.5319791386107607e-06, 'epoch': 2.09}
70%|██████▉ | 8044/11526 [1:24:05<35:48, 1.62it/s] 70%|██████▉ | 8045/11526 [1:24:06<35:46, 1.62it/s] {'loss': 0.1317, 'grad_norm': 0.5110628604888916, 'learning_rate': 2.530662274435668e-06, 'epoch': 2.09}
70%|██████▉ | 8045/11526 [1:24:06<35:46, 1.62it/s] 70%|██████▉ | 8046/11526 [1:24:06<35:45, 1.62it/s] {'loss': 0.1564, 'grad_norm': 0.6163866519927979, 'learning_rate': 2.529345636762277e-06, 'epoch': 2.09}
70%|██████▉ | 8046/11526 [1:24:07<35:45, 1.62it/s] 70%|██████▉ | 8047/11526 [1:24:07<35:42, 1.62it/s] {'loss': 0.1487, 'grad_norm': 0.55876225233078, 'learning_rate': 2.528029225711366e-06, 'epoch': 2.09}
70%|██████▉ | 8047/11526 [1:24:07<35:42, 1.62it/s] 70%|██████▉ | 8048/11526 [1:24:08<35:41, 1.62it/s] {'loss': 0.1663, 'grad_norm': 0.6520662903785706, 'learning_rate': 2.5267130414036765e-06, 'epoch': 2.09}
70%|██████▉ | 8048/11526 [1:24:08<35:41, 1.62it/s] 70%|██████▉ | 8049/11526 [1:24:08<35:39, 1.62it/s] {'loss': 0.1727, 'grad_norm': 0.6107629537582397, 'learning_rate': 2.525397083959941e-06, 'epoch': 2.1}
70%|██████▉ | 8049/11526 [1:24:08<35:39, 1.62it/s] 70%|██████▉ | 8050/11526 [1:24:09<35:37, 1.63it/s] {'loss': 0.147, 'grad_norm': 0.543532133102417, 'learning_rate': 2.5240813535008634e-06, 'epoch': 2.1}
70%|██████▉ | 8050/11526 [1:24:09<35:37, 1.63it/s] 70%|██████▉ | 8051/11526 [1:24:09<35:36, 1.63it/s] {'loss': 0.156, 'grad_norm': 0.633022665977478, 'learning_rate': 2.522765850147132e-06, 'epoch': 2.1}
70%|██████▉ | 8051/11526 [1:24:10<35:36, 1.63it/s] 70%|██████▉ | 8052/11526 [1:24:10<35:35, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.5612101554870605, 'learning_rate': 2.521450574019412e-06, 'epoch': 2.1}
70%|██████▉ | 8052/11526 [1:24:10<35:35, 1.63it/s] 70%|██████▉ | 8053/11526 [1:24:11<35:34, 1.63it/s] {'loss': 0.1849, 'grad_norm': 0.6236371397972107, 'learning_rate': 2.5201355252383485e-06, 'epoch': 2.1}
70%|██████▉ | 8053/11526 [1:24:11<35:34, 1.63it/s] 70%|██████▉ | 8054/11526 [1:24:11<35:36, 1.63it/s] {'loss': 0.1784, 'grad_norm': 0.5813413262367249, 'learning_rate': 2.5188207039245637e-06, 'epoch': 2.1}
70%|██████▉ | 8054/11526 [1:24:11<35:36, 1.63it/s] 70%|██████▉ | 8055/11526 [1:24:12<35:34, 1.63it/s] {'loss': 0.1831, 'grad_norm': 0.6360166072845459, 'learning_rate': 2.517506110198659e-06, 'epoch': 2.1}
70%|██████▉ | 8055/11526 [1:24:12<35:34, 1.63it/s] 70%|██████▉ | 8056/11526 [1:24:13<35:34, 1.63it/s] {'loss': 0.1889, 'grad_norm': 0.6649025678634644, 'learning_rate': 2.516191744181222e-06, 'epoch': 2.1}
70%|██████▉ | 8056/11526 [1:24:13<35:34, 1.63it/s] 70%|██████▉ | 8057/11526 [1:24:13<35:33, 1.63it/s] {'loss': 0.1756, 'grad_norm': 0.6504244804382324, 'learning_rate': 2.5148776059928092e-06, 'epoch': 2.1}
70%|██████▉ | 8057/11526 [1:24:13<35:33, 1.63it/s] 70%|██████▉ | 8058/11526 [1:24:14<35:32, 1.63it/s] {'loss': 0.1772, 'grad_norm': 0.598876953125, 'learning_rate': 2.5135636957539623e-06, 'epoch': 2.1}
70%|██████▉ | 8058/11526 [1:24:14<35:32, 1.63it/s] 70%|██████▉ | 8059/11526 [1:24:14<35:32, 1.63it/s] {'loss': 0.1514, 'grad_norm': 0.5661754012107849, 'learning_rate': 2.5122500135852e-06, 'epoch': 2.1}
70%|██████▉ | 8059/11526 [1:24:15<35:32, 1.63it/s] 70%|██████▉ | 8060/11526 [1:24:15<35:31, 1.63it/s] {'loss': 0.1254, 'grad_norm': 0.47792330384254456, 'learning_rate': 2.510936559607021e-06, 'epoch': 2.1}
70%|██████▉ | 8060/11526 [1:24:15<35:31, 1.63it/s] 70%|██████▉ | 8061/11526 [1:24:16<35:29, 1.63it/s] {'loss': 0.1837, 'grad_norm': 0.6900224685668945, 'learning_rate': 2.5096233339399058e-06, 'epoch': 2.1}
70%|██████▉ | 8061/11526 [1:24:16<35:29, 1.63it/s] 70%|██████▉ | 8062/11526 [1:24:16<35:27, 1.63it/s] {'loss': 0.1615, 'grad_norm': 0.6360383629798889, 'learning_rate': 2.5083103367043053e-06, 'epoch': 2.1}
70%|██████▉ | 8062/11526 [1:24:16<35:27, 1.63it/s] 70%|██████▉ | 8063/11526 [1:24:17<35:27, 1.63it/s] {'loss': 0.1536, 'grad_norm': 0.5956168174743652, 'learning_rate': 2.5069975680206582e-06, 'epoch': 2.1}
70%|██████▉ | 8063/11526 [1:24:17<35:27, 1.63it/s] 70%|██████▉ | 8064/11526 [1:24:17<35:28, 1.63it/s] {'loss': 0.1509, 'grad_norm': 0.556392252445221, 'learning_rate': 2.5056850280093787e-06, 'epoch': 2.1}
70%|██████▉ | 8064/11526 [1:24:18<35:28, 1.63it/s] 70%|██████▉ | 8065/11526 [1:24:18<35:26, 1.63it/s] {'loss': 0.1861, 'grad_norm': 0.6877421736717224, 'learning_rate': 2.5043727167908605e-06, 'epoch': 2.1}
70%|██████▉ | 8065/11526 [1:24:18<35:26, 1.63it/s] 70%|██████▉ | 8066/11526 [1:24:19<35:25, 1.63it/s] {'loss': 0.1681, 'grad_norm': 0.7147549986839294, 'learning_rate': 2.5030606344854756e-06, 'epoch': 2.1}
70%|██████▉ | 8066/11526 [1:24:19<35:25, 1.63it/s] 70%|██████▉ | 8067/11526 [1:24:19<35:24, 1.63it/s] {'loss': 0.1521, 'grad_norm': 0.5279784202575684, 'learning_rate': 2.5017487812135793e-06, 'epoch': 2.1}
70%|██████▉ | 8067/11526 [1:24:19<35:24, 1.63it/s] 70%|██████▉ | 8068/11526 [1:24:20<35:24, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.5733980536460876, 'learning_rate': 2.500437157095495e-06, 'epoch': 2.1}
70%|██████▉ | 8068/11526 [1:24:20<35:24, 1.63it/s] 70%|███████ | 8069/11526 [1:24:21<35:26, 1.63it/s] {'loss': 0.1339, 'grad_norm': 0.48810115456581116, 'learning_rate': 2.4991257622515404e-06, 'epoch': 2.1}
70%|███████ | 8069/11526 [1:24:21<35:26, 1.63it/s] 70%|███████ | 8070/11526 [1:24:21<35:25, 1.63it/s] {'loss': 0.1131, 'grad_norm': 0.5116328001022339, 'learning_rate': 2.4978145968019984e-06, 'epoch': 2.1}
70%|███████ | 8070/11526 [1:24:21<35:25, 1.63it/s] 70%|███████ | 8071/11526 [1:24:22<35:22, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.5875647068023682, 'learning_rate': 2.4965036608671384e-06, 'epoch': 2.1}
70%|███████ | 8071/11526 [1:24:22<35:22, 1.63it/s] 70%|███████ | 8072/11526 [1:24:22<35:22, 1.63it/s] {'loss': 0.1441, 'grad_norm': 0.5873213410377502, 'learning_rate': 2.4951929545672087e-06, 'epoch': 2.1}
70%|███████ | 8072/11526 [1:24:23<35:22, 1.63it/s] 70%|███████ | 8073/11526 [1:24:23<35:21, 1.63it/s] {'loss': 0.4287, 'grad_norm': 0.680503785610199, 'learning_rate': 2.493882478022429e-06, 'epoch': 2.1}
70%|███████ | 8073/11526 [1:24:23<35:21, 1.63it/s] 70%|███████ | 8074/11526 [1:24:24<35:22, 1.63it/s] {'loss': 0.1539, 'grad_norm': 0.5442952513694763, 'learning_rate': 2.4925722313530097e-06, 'epoch': 2.1}
70%|███████ | 8074/11526 [1:24:24<35:22, 1.63it/s] 70%|███████ | 8075/11526 [1:24:24<35:21, 1.63it/s] {'loss': 0.1383, 'grad_norm': 0.510952353477478, 'learning_rate': 2.491262214679134e-06, 'epoch': 2.1}
70%|███████ | 8075/11526 [1:24:24<35:21, 1.63it/s] 70%|███████ | 8076/11526 [1:24:25<35:19, 1.63it/s] {'loss': 0.1677, 'grad_norm': 0.5822171568870544, 'learning_rate': 2.4899524281209602e-06, 'epoch': 2.1}
70%|███████ | 8076/11526 [1:24:25<35:19, 1.63it/s] 70%|███████ | 8077/11526 [1:24:25<35:18, 1.63it/s] {'loss': 0.1356, 'grad_norm': 0.5770906805992126, 'learning_rate': 2.488642871798631e-06, 'epoch': 2.1}
70%|███████ | 8077/11526 [1:24:26<35:18, 1.63it/s] 70%|███████ | 8078/11526 [1:24:26<35:17, 1.63it/s] {'loss': 0.1566, 'grad_norm': 0.5475075244903564, 'learning_rate': 2.4873335458322674e-06, 'epoch': 2.1}
70%|███████ | 8078/11526 [1:24:26<35:17, 1.63it/s] 70%|███████ | 8079/11526 [1:24:27<35:21, 1.62it/s] {'loss': 0.2031, 'grad_norm': 0.7395617365837097, 'learning_rate': 2.4860244503419666e-06, 'epoch': 2.1}
70%|███████ | 8079/11526 [1:24:27<35:21, 1.62it/s] 70%|███████ | 8080/11526 [1:24:27<35:19, 1.63it/s] {'loss': 0.1439, 'grad_norm': 0.5114082098007202, 'learning_rate': 2.4847155854478096e-06, 'epoch': 2.1}
70%|███████ | 8080/11526 [1:24:27<35:19, 1.63it/s] 70%|███████ | 8081/11526 [1:24:28<35:19, 1.63it/s] {'loss': 0.1179, 'grad_norm': 0.5347930788993835, 'learning_rate': 2.4834069512698466e-06, 'epoch': 2.1}
70%|███████ | 8081/11526 [1:24:28<35:19, 1.63it/s] 70%|███████ | 8082/11526 [1:24:29<35:18, 1.63it/s] {'loss': 0.1905, 'grad_norm': 0.5516717433929443, 'learning_rate': 2.4820985479281184e-06, 'epoch': 2.1}
70%|███████ | 8082/11526 [1:24:29<35:18, 1.63it/s] 70%|███████ | 8083/11526 [1:24:29<35:16, 1.63it/s] {'loss': 0.1603, 'grad_norm': 0.5728520154953003, 'learning_rate': 2.4807903755426398e-06, 'epoch': 2.1}
70%|███████ | 8083/11526 [1:24:29<35:16, 1.63it/s] 70%|███████ | 8084/11526 [1:24:30<35:19, 1.62it/s] {'loss': 0.106, 'grad_norm': 0.431545227766037, 'learning_rate': 2.4794824342333997e-06, 'epoch': 2.1}
70%|███████ | 8084/11526 [1:24:30<35:19, 1.62it/s] 70%|███████ | 8085/11526 [1:24:30<35:17, 1.62it/s] {'loss': 0.1481, 'grad_norm': 0.5442871451377869, 'learning_rate': 2.478174724120372e-06, 'epoch': 2.1}
70%|███████ | 8085/11526 [1:24:31<35:17, 1.62it/s] 70%|███████ | 8086/11526 [1:24:31<35:15, 1.63it/s] {'loss': 0.1759, 'grad_norm': 1.1695282459259033, 'learning_rate': 2.476867245323507e-06, 'epoch': 2.1}
70%|███████ | 8086/11526 [1:24:31<35:15, 1.63it/s] 70%|███████ | 8087/11526 [1:24:32<35:13, 1.63it/s] {'loss': 0.1549, 'grad_norm': 0.5948282480239868, 'learning_rate': 2.4755599979627344e-06, 'epoch': 2.1}
70%|███████ | 8087/11526 [1:24:32<35:13, 1.63it/s] 70%|███████ | 8088/11526 [1:24:32<35:12, 1.63it/s] {'loss': 0.1492, 'grad_norm': 0.5362713932991028, 'learning_rate': 2.474252982157964e-06, 'epoch': 2.11}
70%|███████ | 8088/11526 [1:24:32<35:12, 1.63it/s] 70%|███████ | 8089/11526 [1:24:33<35:13, 1.63it/s] {'loss': 0.1775, 'grad_norm': 0.6031323671340942, 'learning_rate': 2.4729461980290796e-06, 'epoch': 2.11}
70%|███████ | 8089/11526 [1:24:33<35:13, 1.63it/s] 70%|███████ | 8090/11526 [1:24:33<35:11, 1.63it/s] {'loss': 0.1513, 'grad_norm': 0.6583135724067688, 'learning_rate': 2.471639645695946e-06, 'epoch': 2.11}
70%|███████ | 8090/11526 [1:24:34<35:11, 1.63it/s] 70%|███████ | 8091/11526 [1:24:34<35:16, 1.62it/s] {'loss': 0.119, 'grad_norm': 0.4853956699371338, 'learning_rate': 2.4703333252784138e-06, 'epoch': 2.11}
70%|███████ | 8091/11526 [1:24:34<35:16, 1.62it/s] 70%|███████ | 8092/11526 [1:24:35<35:12, 1.63it/s] {'loss': 0.1508, 'grad_norm': 0.6978268623352051, 'learning_rate': 2.4690272368963003e-06, 'epoch': 2.11}
70%|███████ | 8092/11526 [1:24:35<35:12, 1.63it/s] 70%|███████ | 8093/11526 [1:24:35<35:11, 1.63it/s] {'loss': 0.1617, 'grad_norm': 0.6073335409164429, 'learning_rate': 2.467721380669409e-06, 'epoch': 2.11}
70%|███████ | 8093/11526 [1:24:35<35:11, 1.63it/s] 70%|███████ | 8094/11526 [1:24:36<35:10, 1.63it/s] {'loss': 0.1587, 'grad_norm': 0.6367928981781006, 'learning_rate': 2.4664157567175227e-06, 'epoch': 2.11}
70%|███████ | 8094/11526 [1:24:36<35:10, 1.63it/s] 70%|███████ | 8095/11526 [1:24:37<35:07, 1.63it/s] {'loss': 0.1295, 'grad_norm': 0.5184325575828552, 'learning_rate': 2.465110365160395e-06, 'epoch': 2.11}
70%|███████ | 8095/11526 [1:24:37<35:07, 1.63it/s] 70%|███████ | 8096/11526 [1:24:37<35:08, 1.63it/s] {'loss': 0.1238, 'grad_norm': 0.4731839895248413, 'learning_rate': 2.4638052061177715e-06, 'epoch': 2.11}
70%|███████ | 8096/11526 [1:24:37<35:08, 1.63it/s] 70%|███████ | 8097/11526 [1:24:38<35:08, 1.63it/s] {'loss': 0.1451, 'grad_norm': 0.5591551661491394, 'learning_rate': 2.462500279709363e-06, 'epoch': 2.11}
70%|███████ | 8097/11526 [1:24:38<35:08, 1.63it/s] 70%|███████ | 8098/11526 [1:24:38<35:07, 1.63it/s] {'loss': 0.1474, 'grad_norm': 0.5775114297866821, 'learning_rate': 2.4611955860548663e-06, 'epoch': 2.11}
70%|███████ | 8098/11526 [1:24:38<35:07, 1.63it/s] 70%|███████ | 8099/11526 [1:24:39<35:08, 1.63it/s] {'loss': 0.1778, 'grad_norm': 0.6580596566200256, 'learning_rate': 2.4598911252739553e-06, 'epoch': 2.11}
70%|███████ | 8099/11526 [1:24:39<35:08, 1.63it/s] 70%|███████ | 8100/11526 [1:24:40<35:08, 1.63it/s] {'loss': 0.1894, 'grad_norm': 0.6844307780265808, 'learning_rate': 2.4585868974862836e-06, 'epoch': 2.11}
70%|███████ | 8100/11526 [1:24:40<35:08, 1.63it/s] 70%|███████ | 8101/11526 [1:24:40<35:06, 1.63it/s] {'loss': 0.1616, 'grad_norm': 0.5679958462715149, 'learning_rate': 2.4572829028114815e-06, 'epoch': 2.11}
70%|███████ | 8101/11526 [1:24:40<35:06, 1.63it/s] 70%|███████ | 8102/11526 [1:24:41<35:04, 1.63it/s] {'loss': 0.1374, 'grad_norm': 0.4850377142429352, 'learning_rate': 2.455979141369161e-06, 'epoch': 2.11}
70%|███████ | 8102/11526 [1:24:41<35:04, 1.63it/s] 70%|███████ | 8103/11526 [1:24:41<35:03, 1.63it/s] {'loss': 0.1519, 'grad_norm': 0.5850309133529663, 'learning_rate': 2.4546756132789063e-06, 'epoch': 2.11}
70%|███████ | 8103/11526 [1:24:42<35:03, 1.63it/s] 70%|███████ | 8104/11526 [1:24:42<35:06, 1.62it/s] {'loss': 0.1588, 'grad_norm': 0.6501890420913696, 'learning_rate': 2.453372318660287e-06, 'epoch': 2.11}
70%|███████ | 8104/11526 [1:24:42<35:06, 1.62it/s] 70%|███████ | 8105/11526 [1:24:43<35:03, 1.63it/s] {'loss': 0.1814, 'grad_norm': 0.6169329285621643, 'learning_rate': 2.452069257632848e-06, 'epoch': 2.11}
70%|███████ | 8105/11526 [1:24:43<35:03, 1.63it/s] 70%|███████ | 8106/11526 [1:24:43<35:03, 1.63it/s] {'loss': 0.1429, 'grad_norm': 0.521846354007721, 'learning_rate': 2.4507664303161143e-06, 'epoch': 2.11}
70%|███████ | 8106/11526 [1:24:43<35:03, 1.63it/s] 70%|███████ | 8107/11526 [1:24:44<35:02, 1.63it/s] {'loss': 0.135, 'grad_norm': 0.5100751519203186, 'learning_rate': 2.4494638368295897e-06, 'epoch': 2.11}
70%|███████ | 8107/11526 [1:24:44<35:02, 1.63it/s] 70%|███████ | 8108/11526 [1:24:45<35:01, 1.63it/s] {'loss': 0.1793, 'grad_norm': 0.578186571598053, 'learning_rate': 2.4481614772927503e-06, 'epoch': 2.11}
70%|███████ | 8108/11526 [1:24:45<35:01, 1.63it/s] 70%|███████ | 8109/11526 [1:24:45<34:59, 1.63it/s] {'loss': 0.1265, 'grad_norm': 0.5304055213928223, 'learning_rate': 2.4468593518250616e-06, 'epoch': 2.11}
70%|███████ | 8109/11526 [1:24:45<34:59, 1.63it/s] 70%|███████ | 8110/11526 [1:24:46<34:59, 1.63it/s] {'loss': 0.1726, 'grad_norm': 0.639374315738678, 'learning_rate': 2.4455574605459616e-06, 'epoch': 2.11}
70%|███████ | 8110/11526 [1:24:46<34:59, 1.63it/s] 70%|███████ | 8111/11526 [1:24:46<34:58, 1.63it/s] {'loss': 0.1329, 'grad_norm': 0.47561120986938477, 'learning_rate': 2.444255803574864e-06, 'epoch': 2.11}
70%|███████ | 8111/11526 [1:24:46<34:58, 1.63it/s] 70%|███████ | 8112/11526 [1:24:47<35:00, 1.63it/s] {'loss': 0.1478, 'grad_norm': 0.5417733192443848, 'learning_rate': 2.4429543810311646e-06, 'epoch': 2.11}
70%|███████ | 8112/11526 [1:24:47<35:00, 1.63it/s] 70%|███████ | 8113/11526 [1:24:48<34:57, 1.63it/s] {'loss': 0.1675, 'grad_norm': 0.6166174411773682, 'learning_rate': 2.441653193034239e-06, 'epoch': 2.11}
70%|███████ | 8113/11526 [1:24:48<34:57, 1.63it/s] 70%|███████ | 8114/11526 [1:24:48<34:55, 1.63it/s] {'loss': 0.184, 'grad_norm': 0.6800180077552795, 'learning_rate': 2.440352239703439e-06, 'epoch': 2.11}
70%|███████ | 8114/11526 [1:24:48<34:55, 1.63it/s] 70%|███████ | 8115/11526 [1:24:49<34:55, 1.63it/s] {'loss': 0.1223, 'grad_norm': 0.4809109568595886, 'learning_rate': 2.4390515211580956e-06, 'epoch': 2.11}
70%|███████ | 8115/11526 [1:24:49<34:55, 1.63it/s] 70%|███████ | 8116/11526 [1:24:49<34:54, 1.63it/s] {'loss': 0.1487, 'grad_norm': 0.567209780216217, 'learning_rate': 2.4377510375175197e-06, 'epoch': 2.11}
70%|███████ | 8116/11526 [1:24:50<34:54, 1.63it/s] 70%|███████ | 8117/11526 [1:24:50<34:53, 1.63it/s] {'loss': 0.1572, 'grad_norm': 0.539130449295044, 'learning_rate': 2.4364507889009933e-06, 'epoch': 2.11}
70%|███████ | 8117/11526 [1:24:50<34:53, 1.63it/s] 70%|███████ | 8118/11526 [1:24:51<34:52, 1.63it/s] {'loss': 0.1573, 'grad_norm': 0.5775097608566284, 'learning_rate': 2.4351507754277896e-06, 'epoch': 2.11}
70%|███████ | 8118/11526 [1:24:51<34:52, 1.63it/s] 70%|███████ | 8119/11526 [1:24:51<34:51, 1.63it/s] {'loss': 0.1533, 'grad_norm': 0.7373995184898376, 'learning_rate': 2.4338509972171493e-06, 'epoch': 2.11}
70%|███████ | 8119/11526 [1:24:51<34:51, 1.63it/s] 70%|███████ | 8120/11526 [1:24:52<34:50, 1.63it/s] {'loss': 0.176, 'grad_norm': 0.6853962540626526, 'learning_rate': 2.432551454388296e-06, 'epoch': 2.11}
70%|███████ | 8120/11526 [1:24:52<34:50, 1.63it/s] 70%|███████ | 8121/11526 [1:24:53<34:51, 1.63it/s] {'loss': 0.1328, 'grad_norm': 0.6881516575813293, 'learning_rate': 2.4312521470604333e-06, 'epoch': 2.11}
70%|███████ | 8121/11526 [1:24:53<34:51, 1.63it/s] 70%|███████ | 8122/11526 [1:24:53<34:50, 1.63it/s] {'loss': 0.1192, 'grad_norm': 0.45562711358070374, 'learning_rate': 2.4299530753527356e-06, 'epoch': 2.11}
70%|███████ | 8122/11526 [1:24:53<34:50, 1.63it/s] 70%|███████ | 8123/11526 [1:24:54<34:49, 1.63it/s] {'loss': 0.1273, 'grad_norm': 0.5313162803649902, 'learning_rate': 2.4286542393843665e-06, 'epoch': 2.11}
70%|███████ | 8123/11526 [1:24:54<34:49, 1.63it/s] 70%|███████ | 8124/11526 [1:24:54<34:50, 1.63it/s] {'loss': 0.174, 'grad_norm': 0.5124939680099487, 'learning_rate': 2.4273556392744628e-06, 'epoch': 2.11}
70%|███████ | 8124/11526 [1:24:54<34:50, 1.63it/s] 70%|███████ | 8125/11526 [1:24:55<34:48, 1.63it/s] {'loss': 0.1237, 'grad_norm': 0.5427232980728149, 'learning_rate': 2.4260572751421364e-06, 'epoch': 2.11}
70%|███████ | 8125/11526 [1:24:55<34:48, 1.63it/s] 71%|███████ | 8126/11526 [1:24:56<34:48, 1.63it/s] {'loss': 0.1833, 'grad_norm': 0.6885648369789124, 'learning_rate': 2.4247591471064807e-06, 'epoch': 2.12}
71%|███████ | 8126/11526 [1:24:56<34:48, 1.63it/s] 71%|███████ | 8127/11526 [1:24:56<34:47, 1.63it/s] {'loss': 0.1332, 'grad_norm': 0.5962315797805786, 'learning_rate': 2.4234612552865694e-06, 'epoch': 2.12}
71%|███████ | 8127/11526 [1:24:56<34:47, 1.63it/s] 71%|███████ | 8128/11526 [1:24:57<34:47, 1.63it/s] {'loss': 0.1351, 'grad_norm': 0.5028374195098877, 'learning_rate': 2.4221635998014516e-06, 'epoch': 2.12}
71%|███████ | 8128/11526 [1:24:57<34:47, 1.63it/s] 71%|███████ | 8129/11526 [1:24:57<34:45, 1.63it/s] {'loss': 0.1568, 'grad_norm': 0.6276793479919434, 'learning_rate': 2.420866180770157e-06, 'epoch': 2.12}
71%|███████ | 8129/11526 [1:24:58<34:45, 1.63it/s] 71%|███████ | 8130/11526 [1:24:58<34:46, 1.63it/s] {'loss': 0.1679, 'grad_norm': 0.5008382201194763, 'learning_rate': 2.419568998311688e-06, 'epoch': 2.12}
71%|███████ | 8130/11526 [1:24:58<34:46, 1.63it/s] 71%|███████ | 8131/11526 [1:24:59<34:44, 1.63it/s] {'loss': 0.1649, 'grad_norm': 0.5530479550361633, 'learning_rate': 2.4182720525450336e-06, 'epoch': 2.12}
71%|███████ | 8131/11526 [1:24:59<34:44, 1.63it/s] 71%|███████ | 8132/11526 [1:24:59<34:43, 1.63it/s] {'loss': 0.1333, 'grad_norm': 0.4994702637195587, 'learning_rate': 2.4169753435891583e-06, 'epoch': 2.12}
71%|███████ | 8132/11526 [1:24:59<34:43, 1.63it/s] 71%|███████ | 8133/11526 [1:25:00<34:43, 1.63it/s] {'loss': 0.1392, 'grad_norm': 0.5757455825805664, 'learning_rate': 2.4156788715629992e-06, 'epoch': 2.12}
71%|███████ | 8133/11526 [1:25:00<34:43, 1.63it/s] 71%|███████ | 8134/11526 [1:25:00<34:43, 1.63it/s] {'loss': 0.1463, 'grad_norm': 0.49380385875701904, 'learning_rate': 2.4143826365854783e-06, 'epoch': 2.12}
71%|███████ | 8134/11526 [1:25:01<34:43, 1.63it/s] 71%|███████ | 8135/11526 [1:25:01<34:42, 1.63it/s] {'loss': 0.1385, 'grad_norm': 0.5459789633750916, 'learning_rate': 2.4130866387754937e-06, 'epoch': 2.12}
71%|███████ | 8135/11526 [1:25:01<34:42, 1.63it/s] 71%|███████ | 8136/11526 [1:25:02<34:41, 1.63it/s] {'loss': 0.1318, 'grad_norm': 0.5446740388870239, 'learning_rate': 2.4117908782519212e-06, 'epoch': 2.12}
71%|███████ | 8136/11526 [1:25:02<34:41, 1.63it/s] 71%|███████ | 8137/11526 [1:25:02<34:40, 1.63it/s] {'loss': 0.1403, 'grad_norm': 0.52053302526474, 'learning_rate': 2.410495355133618e-06, 'epoch': 2.12}
71%|███████ | 8137/11526 [1:25:02<34:40, 1.63it/s] 71%|███████ | 8138/11526 [1:25:03<34:40, 1.63it/s] {'loss': 0.1907, 'grad_norm': 0.6263159513473511, 'learning_rate': 2.409200069539412e-06, 'epoch': 2.12}
71%|███████ | 8138/11526 [1:25:03<34:40, 1.63it/s] 71%|███████ | 8139/11526 [1:25:04<34:39, 1.63it/s] {'loss': 0.2377, 'grad_norm': 0.761496901512146, 'learning_rate': 2.407905021588115e-06, 'epoch': 2.12}
71%|███████ | 8139/11526 [1:25:04<34:39, 1.63it/s] 71%|███████ | 8140/11526 [1:25:04<34:39, 1.63it/s] {'loss': 0.1593, 'grad_norm': 0.5755504369735718, 'learning_rate': 2.4066102113985216e-06, 'epoch': 2.12}
71%|███████ | 8140/11526 [1:25:04<34:39, 1.63it/s] 71%|███████ | 8141/11526 [1:25:05<34:39, 1.63it/s] {'loss': 0.1484, 'grad_norm': 0.586273729801178, 'learning_rate': 2.405315639089394e-06, 'epoch': 2.12}
71%|███████ | 8141/11526 [1:25:05<34:39, 1.63it/s] 71%|███████ | 8142/11526 [1:25:05<34:39, 1.63it/s] {'loss': 0.1627, 'grad_norm': 0.5897175073623657, 'learning_rate': 2.4040213047794787e-06, 'epoch': 2.12}
71%|███████ | 8142/11526 [1:25:06<34:39, 1.63it/s] 71%|███████ | 8143/11526 [1:25:06<34:37, 1.63it/s] {'loss': 0.1564, 'grad_norm': 0.5590037107467651, 'learning_rate': 2.402727208587502e-06, 'epoch': 2.12}
71%|███████ | 8143/11526 [1:25:06<34:37, 1.63it/s] 71%|███████ | 8144/11526 [1:25:07<34:37, 1.63it/s] {'loss': 0.1137, 'grad_norm': 0.42685750126838684, 'learning_rate': 2.401433350632159e-06, 'epoch': 2.12}
71%|███████ | 8144/11526 [1:25:07<34:37, 1.63it/s] 71%|███████ | 8145/11526 [1:25:07<34:36, 1.63it/s] {'loss': 0.1796, 'grad_norm': 0.5932849645614624, 'learning_rate': 2.400139731032139e-06, 'epoch': 2.12}
71%|███████ | 8145/11526 [1:25:07<34:36, 1.63it/s] 71%|███████ | 8146/11526 [1:25:08<34:36, 1.63it/s] {'loss': 0.1115, 'grad_norm': 0.41508957743644714, 'learning_rate': 2.3988463499060933e-06, 'epoch': 2.12}
71%|███████ | 8146/11526 [1:25:08<34:36, 1.63it/s] 71%|███████ | 8147/11526 [1:25:08<34:35, 1.63it/s] {'loss': 0.1769, 'grad_norm': 0.6479595303535461, 'learning_rate': 2.3975532073726605e-06, 'epoch': 2.12}
71%|███████ | 8147/11526 [1:25:09<34:35, 1.63it/s] 71%|███████ | 8148/11526 [1:25:09<34:35, 1.63it/s] {'loss': 0.1631, 'grad_norm': 0.5469586849212646, 'learning_rate': 2.3962603035504545e-06, 'epoch': 2.12}
71%|███████ | 8148/11526 [1:25:09<34:35, 1.63it/s] 71%|███████ | 8149/11526 [1:25:10<34:35, 1.63it/s] {'loss': 0.1009, 'grad_norm': 0.4440729022026062, 'learning_rate': 2.3949676385580677e-06, 'epoch': 2.12}
71%|███████ | 8149/11526 [1:25:10<34:35, 1.63it/s] 71%|███████ | 8150/11526 [1:25:10<34:34, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.58995121717453, 'learning_rate': 2.3936752125140723e-06, 'epoch': 2.12}
71%|███████ | 8150/11526 [1:25:10<34:34, 1.63it/s] 71%|███████ | 8151/11526 [1:25:11<34:33, 1.63it/s] {'loss': 0.159, 'grad_norm': 0.5658692121505737, 'learning_rate': 2.3923830255370167e-06, 'epoch': 2.12}
71%|███████ | 8151/11526 [1:25:11<34:33, 1.63it/s] 71%|███████ | 8152/11526 [1:25:12<34:32, 1.63it/s] {'loss': 0.1496, 'grad_norm': 0.5282787084579468, 'learning_rate': 2.391091077745425e-06, 'epoch': 2.12}
71%|███████ | 8152/11526 [1:25:12<34:32, 1.63it/s] 71%|███████ | 8153/11526 [1:25:12<34:32, 1.63it/s] {'loss': 0.1711, 'grad_norm': 0.6018747687339783, 'learning_rate': 2.3897993692578043e-06, 'epoch': 2.12}
71%|███████ | 8153/11526 [1:25:12<34:32, 1.63it/s] 71%|███████ | 8154/11526 [1:25:13<34:32, 1.63it/s] {'loss': 0.1481, 'grad_norm': 0.5517609715461731, 'learning_rate': 2.3885079001926364e-06, 'epoch': 2.12}
71%|███████ | 8154/11526 [1:25:13<34:32, 1.63it/s] 71%|███████ | 8155/11526 [1:25:13<34:31, 1.63it/s] {'loss': 0.1526, 'grad_norm': 0.5535705089569092, 'learning_rate': 2.3872166706683826e-06, 'epoch': 2.12}
71%|███████ | 8155/11526 [1:25:14<34:31, 1.63it/s] 71%|███████ | 8156/11526 [1:25:14<34:30, 1.63it/s] {'loss': 0.121, 'grad_norm': 0.4229605793952942, 'learning_rate': 2.3859256808034835e-06, 'epoch': 2.12}
71%|███████ | 8156/11526 [1:25:14<34:30, 1.63it/s] 71%|███████ | 8157/11526 [1:25:15<34:31, 1.63it/s] {'loss': 0.1459, 'grad_norm': 0.6777618527412415, 'learning_rate': 2.3846349307163513e-06, 'epoch': 2.12}
71%|███████ | 8157/11526 [1:25:15<34:31, 1.63it/s] 71%|███████ | 8158/11526 [1:25:15<34:29, 1.63it/s] {'loss': 0.1535, 'grad_norm': 0.5740674138069153, 'learning_rate': 2.3833444205253856e-06, 'epoch': 2.12}
71%|███████ | 8158/11526 [1:25:15<34:29, 1.63it/s] 71%|███████ | 8159/11526 [1:25:16<34:28, 1.63it/s] {'loss': 0.1761, 'grad_norm': 0.6828050017356873, 'learning_rate': 2.38205415034896e-06, 'epoch': 2.12}
71%|███████ | 8159/11526 [1:25:16<34:28, 1.63it/s] 71%|███████ | 8160/11526 [1:25:16<34:28, 1.63it/s] {'loss': 0.1785, 'grad_norm': 0.5960330963134766, 'learning_rate': 2.380764120305421e-06, 'epoch': 2.12}
71%|███████ | 8160/11526 [1:25:17<34:28, 1.63it/s] 71%|███████ | 8161/11526 [1:25:17<34:27, 1.63it/s] {'loss': 0.1362, 'grad_norm': 0.5397813320159912, 'learning_rate': 2.3794743305131e-06, 'epoch': 2.12}
71%|███████ | 8161/11526 [1:25:17<34:27, 1.63it/s] 71%|███████ | 8162/11526 [1:25:18<34:26, 1.63it/s] {'loss': 0.1621, 'grad_norm': 0.5888392925262451, 'learning_rate': 2.3781847810903036e-06, 'epoch': 2.12}
71%|███████ | 8162/11526 [1:25:18<34:26, 1.63it/s] 71%|███████ | 8163/11526 [1:25:18<34:26, 1.63it/s] {'loss': 0.1416, 'grad_norm': 0.5294067859649658, 'learning_rate': 2.376895472155316e-06, 'epoch': 2.12}
71%|███████ | 8163/11526 [1:25:18<34:26, 1.63it/s] 71%|███████ | 8164/11526 [1:25:19<34:26, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.6532846093177795, 'learning_rate': 2.3756064038264033e-06, 'epoch': 2.12}
71%|███████ | 8164/11526 [1:25:19<34:26, 1.63it/s] 71%|███████ | 8165/11526 [1:25:20<34:25, 1.63it/s] {'loss': 0.1916, 'grad_norm': 0.7496652007102966, 'learning_rate': 2.374317576221801e-06, 'epoch': 2.13}
71%|███████ | 8165/11526 [1:25:20<34:25, 1.63it/s] 71%|███████ | 8166/11526 [1:25:20<34:27, 1.62it/s] {'loss': 0.1554, 'grad_norm': 0.6619461178779602, 'learning_rate': 2.3730289894597284e-06, 'epoch': 2.13}
71%|███████ | 8166/11526 [1:25:20<34:27, 1.62it/s] 71%|███████ | 8167/11526 [1:25:21<34:25, 1.63it/s] {'loss': 0.1587, 'grad_norm': 0.5821983814239502, 'learning_rate': 2.371740643658387e-06, 'epoch': 2.13}
71%|███████ | 8167/11526 [1:25:21<34:25, 1.63it/s] 71%|███████ | 8168/11526 [1:25:21<34:25, 1.63it/s] {'loss': 0.1758, 'grad_norm': 0.7299254536628723, 'learning_rate': 2.3704525389359473e-06, 'epoch': 2.13}
71%|███████ | 8168/11526 [1:25:22<34:25, 1.63it/s] 71%|███████ | 8169/11526 [1:25:22<34:24, 1.63it/s] {'loss': 0.1568, 'grad_norm': 0.5430393815040588, 'learning_rate': 2.369164675410561e-06, 'epoch': 2.13}
71%|███████ | 8169/11526 [1:25:22<34:24, 1.63it/s] 71%|███████ | 8170/11526 [1:25:23<34:24, 1.63it/s] {'loss': 0.1368, 'grad_norm': 0.6094874739646912, 'learning_rate': 2.367877053200362e-06, 'epoch': 2.13}
71%|███████ | 8170/11526 [1:25:23<34:24, 1.63it/s] 71%|███████ | 8171/11526 [1:25:23<34:23, 1.63it/s] {'loss': 0.1447, 'grad_norm': 0.6515103578567505, 'learning_rate': 2.3665896724234523e-06, 'epoch': 2.13}
71%|███████ | 8171/11526 [1:25:23<34:23, 1.63it/s] 71%|███████ | 8172/11526 [1:25:24<34:22, 1.63it/s] {'loss': 0.1383, 'grad_norm': 0.501188337802887, 'learning_rate': 2.3653025331979247e-06, 'epoch': 2.13}
71%|███████ | 8172/11526 [1:25:24<34:22, 1.63it/s] 71%|███████ | 8173/11526 [1:25:24<34:22, 1.63it/s] {'loss': 0.1489, 'grad_norm': 0.5567168593406677, 'learning_rate': 2.3640156356418377e-06, 'epoch': 2.13}
71%|███████ | 8173/11526 [1:25:25<34:22, 1.63it/s] 71%|███████ | 8174/11526 [1:25:25<34:28, 1.62it/s] {'loss': 0.1625, 'grad_norm': 0.6426517963409424, 'learning_rate': 2.3627289798732343e-06, 'epoch': 2.13}
71%|███████ | 8174/11526 [1:25:25<34:28, 1.62it/s] 71%|███████ | 8175/11526 [1:25:26<34:25, 1.62it/s] {'loss': 0.1813, 'grad_norm': 0.6597522497177124, 'learning_rate': 2.3614425660101347e-06, 'epoch': 2.13}
71%|███████ | 8175/11526 [1:25:26<34:25, 1.62it/s] 71%|███████ | 8176/11526 [1:25:26<34:22, 1.62it/s] {'loss': 0.1327, 'grad_norm': 0.5346845984458923, 'learning_rate': 2.360156394170536e-06, 'epoch': 2.13}
71%|███████ | 8176/11526 [1:25:26<34:22, 1.62it/s] 71%|███████ | 8177/11526 [1:25:27<34:20, 1.63it/s] {'loss': 0.147, 'grad_norm': 0.5955610275268555, 'learning_rate': 2.3588704644724127e-06, 'epoch': 2.13}
71%|███████ | 8177/11526 [1:25:27<34:20, 1.63it/s] 71%|███████ | 8178/11526 [1:25:28<34:19, 1.63it/s] {'loss': 0.1679, 'grad_norm': 0.570989727973938, 'learning_rate': 2.3575847770337198e-06, 'epoch': 2.13}
71%|███████ | 8178/11526 [1:25:28<34:19, 1.63it/s] 71%|███████ | 8179/11526 [1:25:28<34:22, 1.62it/s] {'loss': 0.1493, 'grad_norm': 0.550952672958374, 'learning_rate': 2.3562993319723815e-06, 'epoch': 2.13}
71%|███████ | 8179/11526 [1:25:28<34:22, 1.62it/s] 71%|███████ | 8180/11526 [1:25:29<34:18, 1.63it/s] {'loss': 0.1598, 'grad_norm': 0.7679899334907532, 'learning_rate': 2.355014129406315e-06, 'epoch': 2.13}
71%|███████ | 8180/11526 [1:25:29<34:18, 1.63it/s] 71%|███████ | 8181/11526 [1:25:29<34:16, 1.63it/s] {'loss': 0.1388, 'grad_norm': 0.6028029918670654, 'learning_rate': 2.3537291694533996e-06, 'epoch': 2.13}
71%|███████ | 8181/11526 [1:25:30<34:16, 1.63it/s] 71%|███████ | 8182/11526 [1:25:30<34:15, 1.63it/s] {'loss': 0.1503, 'grad_norm': 0.5816556811332703, 'learning_rate': 2.3524444522315013e-06, 'epoch': 2.13}
71%|███████ | 8182/11526 [1:25:30<34:15, 1.63it/s] 71%|███████ | 8183/11526 [1:25:31<34:13, 1.63it/s] {'loss': 0.1658, 'grad_norm': 0.5595823526382446, 'learning_rate': 2.351159977858463e-06, 'epoch': 2.13}
71%|███████ | 8183/11526 [1:25:31<34:13, 1.63it/s] 71%|███████ | 8184/11526 [1:25:31<34:16, 1.62it/s] {'loss': 0.1611, 'grad_norm': 0.6077173352241516, 'learning_rate': 2.3498757464521018e-06, 'epoch': 2.13}
71%|███████ | 8184/11526 [1:25:31<34:16, 1.62it/s] 71%|███████ | 8185/11526 [1:25:32<34:15, 1.63it/s] {'loss': 0.184, 'grad_norm': 0.6287395358085632, 'learning_rate': 2.3485917581302154e-06, 'epoch': 2.13}
71%|███████ | 8185/11526 [1:25:32<34:15, 1.63it/s] 71%|███████ | 8186/11526 [1:25:32<34:14, 1.63it/s] {'loss': 0.1817, 'grad_norm': 0.5659155249595642, 'learning_rate': 2.3473080130105813e-06, 'epoch': 2.13}
71%|███████ | 8186/11526 [1:25:33<34:14, 1.63it/s] 71%|███████ | 8187/11526 [1:25:33<34:13, 1.63it/s] {'loss': 0.1153, 'grad_norm': 0.4489125609397888, 'learning_rate': 2.3460245112109474e-06, 'epoch': 2.13}
71%|███████ | 8187/11526 [1:25:33<34:13, 1.63it/s] 71%|███████ | 8188/11526 [1:25:34<34:13, 1.63it/s] {'loss': 0.1512, 'grad_norm': 0.5586099028587341, 'learning_rate': 2.344741252849046e-06, 'epoch': 2.13}
71%|███████ | 8188/11526 [1:25:34<34:13, 1.63it/s] 71%|███████ | 8189/11526 [1:25:34<34:21, 1.62it/s] {'loss': 0.1254, 'grad_norm': 0.5155147314071655, 'learning_rate': 2.343458238042584e-06, 'epoch': 2.13}
71%|███████ | 8189/11526 [1:25:34<34:21, 1.62it/s] 71%|███████ | 8190/11526 [1:25:35<34:17, 1.62it/s] {'loss': 0.118, 'grad_norm': 0.5042359232902527, 'learning_rate': 2.3421754669092483e-06, 'epoch': 2.13}
71%|███████ | 8190/11526 [1:25:35<34:17, 1.62it/s] 71%|███████ | 8191/11526 [1:25:36<34:14, 1.62it/s] {'loss': 0.1686, 'grad_norm': 0.5496435761451721, 'learning_rate': 2.3408929395667013e-06, 'epoch': 2.13}
71%|███████ | 8191/11526 [1:25:36<34:14, 1.62it/s] 71%|███████ | 8192/11526 [1:25:36<34:12, 1.62it/s] {'loss': 0.1357, 'grad_norm': 0.4898000955581665, 'learning_rate': 2.339610656132585e-06, 'epoch': 2.13}
71%|███████ | 8192/11526 [1:25:36<34:12, 1.62it/s] 71%|███████ | 8193/11526 [1:25:37<34:10, 1.63it/s] {'loss': 0.1553, 'grad_norm': 0.5400184988975525, 'learning_rate': 2.3383286167245123e-06, 'epoch': 2.13}
71%|███████ | 8193/11526 [1:25:37<34:10, 1.63it/s] 71%|███████ | 8194/11526 [1:25:37<34:13, 1.62it/s] {'loss': 0.1523, 'grad_norm': 0.558213472366333, 'learning_rate': 2.3370468214600873e-06, 'epoch': 2.13}
71%|███████ | 8194/11526 [1:25:38<34:13, 1.62it/s] 71%|███████ | 8195/11526 [1:25:38<34:10, 1.62it/s] {'loss': 0.2333, 'grad_norm': 0.6418808698654175, 'learning_rate': 2.3357652704568775e-06, 'epoch': 2.13}
71%|███████ | 8195/11526 [1:25:38<34:10, 1.62it/s] 71%|███████ | 8196/11526 [1:25:39<34:09, 1.63it/s] {'loss': 0.1361, 'grad_norm': 0.5383483171463013, 'learning_rate': 2.334483963832436e-06, 'epoch': 2.13}
71%|███████ | 8196/11526 [1:25:39<34:09, 1.63it/s] 71%|███████ | 8197/11526 [1:25:39<34:07, 1.63it/s] {'loss': 0.1512, 'grad_norm': 0.552180826663971, 'learning_rate': 2.333202901704291e-06, 'epoch': 2.13}
71%|███████ | 8197/11526 [1:25:39<34:07, 1.63it/s] 71%|███████ | 8198/11526 [1:25:40<34:06, 1.63it/s] {'loss': 0.1317, 'grad_norm': 0.5143652558326721, 'learning_rate': 2.3319220841899502e-06, 'epoch': 2.13}
71%|███████ | 8198/11526 [1:25:40<34:06, 1.63it/s] 71%|███████ | 8199/11526 [1:25:40<34:07, 1.62it/s] {'loss': 0.2038, 'grad_norm': 0.5913935899734497, 'learning_rate': 2.3306415114068957e-06, 'epoch': 2.13}
71%|███████ | 8199/11526 [1:25:41<34:07, 1.62it/s] 71%|███████ | 8200/11526 [1:25:41<34:07, 1.62it/s] {'loss': 0.1485, 'grad_norm': 0.7279759049415588, 'learning_rate': 2.3293611834725927e-06, 'epoch': 2.13}
71%|███████ | 8200/11526 [1:25:41<34:07, 1.62it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.32it/s]
31%|███ | 4/13 [00:00<00:01, 8.38it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.77it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5458920001983643, 'eval_runtime': 1.9559, 'eval_samples_per_second': 102.255, 'eval_steps_per_second': 6.647, 'epoch': 2.13}
71%|███████ | 8200/11526 [1:25:43<34:07, 1.62it/s]
 71%|███████ | 8201/11526 [1:25:44<1:06:42, 1.20s/it] {'loss': 0.1735, 'grad_norm': 0.6606225371360779, 'learning_rate': 2.3280811005044744e-06, 'epoch': 2.13}
71%|███████ | 8201/11526 [1:25:44<1:06:42, 1.20s/it] 71%|███████ | 8202/11526 [1:25:44<56:53, 1.03s/it] {'loss': 0.1818, 'grad_norm': 0.6054589152336121, 'learning_rate': 2.3268012626199604e-06, 'epoch': 2.13}
71%|███████ | 8202/11526 [1:25:44<56:53, 1.03s/it] 71%|███████ | 8203/11526 [1:25:45<50:00, 1.11it/s] {'loss': 0.2189, 'grad_norm': 0.6954500079154968, 'learning_rate': 2.3255216699364446e-06, 'epoch': 2.14}
71%|███████ | 8203/11526 [1:25:45<50:00, 1.11it/s] 71%|███████ | 8204/11526 [1:25:45<45:15, 1.22it/s] {'loss': 0.1378, 'grad_norm': 0.5834916234016418, 'learning_rate': 2.3242423225712984e-06, 'epoch': 2.14}
71%|███████ | 8204/11526 [1:25:46<45:15, 1.22it/s] 71%|███████ | 8205/11526 [1:25:46<41:52, 1.32it/s] {'loss': 0.1443, 'grad_norm': 0.5944322943687439, 'learning_rate': 2.3229632206418727e-06, 'epoch': 2.14}
71%|███████ | 8205/11526 [1:25:46<41:52, 1.32it/s] 71%|███████ | 8206/11526 [1:25:47<39:29, 1.40it/s] {'loss': 0.1629, 'grad_norm': 0.6283146739006042, 'learning_rate': 2.3216843642654887e-06, 'epoch': 2.14}
71%|███████ | 8206/11526 [1:25:47<39:29, 1.40it/s] 71%|███████ | 8207/11526 [1:25:47<37:49, 1.46it/s] {'loss': 0.1338, 'grad_norm': 0.6075490117073059, 'learning_rate': 2.320405753559455e-06, 'epoch': 2.14}
71%|███████ | 8207/11526 [1:25:47<37:49, 1.46it/s] 71%|███████ | 8208/11526 [1:25:48<36:40, 1.51it/s] {'loss': 0.1801, 'grad_norm': 0.7616950273513794, 'learning_rate': 2.3191273886410542e-06, 'epoch': 2.14}
71%|███████ | 8208/11526 [1:25:48<36:40, 1.51it/s] 71%|███████ | 8209/11526 [1:25:49<35:54, 1.54it/s] {'loss': 0.1306, 'grad_norm': 0.5215824842453003, 'learning_rate': 2.317849269627541e-06, 'epoch': 2.14}
71%|███████ | 8209/11526 [1:25:49<35:54, 1.54it/s] 71%|███████ | 8210/11526 [1:25:49<35:19, 1.56it/s] {'loss': 0.2075, 'grad_norm': 0.7067606449127197, 'learning_rate': 2.316571396636153e-06, 'epoch': 2.14}
71%|███████ | 8210/11526 [1:25:49<35:19, 1.56it/s] 71%|███████ | 8211/11526 [1:25:50<34:53, 1.58it/s] {'loss': 0.1462, 'grad_norm': 0.5585421323776245, 'learning_rate': 2.315293769784105e-06, 'epoch': 2.14}
71%|███████ | 8211/11526 [1:25:50<34:53, 1.58it/s] 71%|███████ | 8212/11526 [1:25:50<34:35, 1.60it/s] {'loss': 0.1464, 'grad_norm': 0.5622038841247559, 'learning_rate': 2.3140163891885875e-06, 'epoch': 2.14}
71%|███████ | 8212/11526 [1:25:51<34:35, 1.60it/s] 71%|███████▏ | 8213/11526 [1:25:51<34:23, 1.61it/s] {'loss': 0.1416, 'grad_norm': 0.5287469029426575, 'learning_rate': 2.3127392549667714e-06, 'epoch': 2.14}
71%|███████▏ | 8213/11526 [1:25:51<34:23, 1.61it/s] 71%|███████▏ | 8214/11526 [1:25:52<34:16, 1.61it/s] {'loss': 0.1703, 'grad_norm': 0.6453402638435364, 'learning_rate': 2.3114623672357993e-06, 'epoch': 2.14}
71%|███████▏ | 8214/11526 [1:25:52<34:16, 1.61it/s] 71%|███████▏ | 8215/11526 [1:25:52<34:09, 1.62it/s] {'loss': 0.1393, 'grad_norm': 0.5060290694236755, 'learning_rate': 2.3101857261127935e-06, 'epoch': 2.14}
71%|███████▏ | 8215/11526 [1:25:52<34:09, 1.62it/s] 71%|███████▏ | 8216/11526 [1:25:53<34:04, 1.62it/s] {'loss': 0.1381, 'grad_norm': 0.5192933082580566, 'learning_rate': 2.3089093317148616e-06, 'epoch': 2.14}
71%|███████▏ | 8216/11526 [1:25:53<34:04, 1.62it/s] 71%|███████▏ | 8217/11526 [1:25:53<34:00, 1.62it/s] {'loss': 0.1441, 'grad_norm': 0.5635449290275574, 'learning_rate': 2.3076331841590756e-06, 'epoch': 2.14}
71%|███████▏ | 8217/11526 [1:25:54<34:00, 1.62it/s] 71%|███████▏ | 8218/11526 [1:25:54<33:57, 1.62it/s] {'loss': 0.1578, 'grad_norm': 0.6039936542510986, 'learning_rate': 2.3063572835624924e-06, 'epoch': 2.14}
71%|███████▏ | 8218/11526 [1:25:54<33:57, 1.62it/s] 71%|███████▏ | 8219/11526 [1:25:55<34:08, 1.61it/s] {'loss': 0.1164, 'grad_norm': 0.48349544405937195, 'learning_rate': 2.3050816300421473e-06, 'epoch': 2.14}
71%|███████▏ | 8219/11526 [1:25:55<34:08, 1.61it/s] 71%|███████▏ | 8220/11526 [1:25:55<34:02, 1.62it/s] {'loss': 0.1785, 'grad_norm': 0.6538219451904297, 'learning_rate': 2.303806223715045e-06, 'epoch': 2.14}
71%|███████▏ | 8220/11526 [1:25:55<34:02, 1.62it/s] 71%|███████▏ | 8221/11526 [1:25:56<33:58, 1.62it/s] {'loss': 0.1502, 'grad_norm': 0.5834782719612122, 'learning_rate': 2.30253106469818e-06, 'epoch': 2.14}
71%|███████▏ | 8221/11526 [1:25:56<33:58, 1.62it/s] 71%|███████▏ | 8222/11526 [1:25:57<33:54, 1.62it/s] {'loss': 0.1553, 'grad_norm': 0.5509215593338013, 'learning_rate': 2.3012561531085125e-06, 'epoch': 2.14}
71%|███████▏ | 8222/11526 [1:25:57<33:54, 1.62it/s] 71%|███████▏ | 8223/11526 [1:25:57<33:52, 1.63it/s] {'loss': 0.1809, 'grad_norm': 0.7043911814689636, 'learning_rate': 2.299981489062986e-06, 'epoch': 2.14}
71%|███████▏ | 8223/11526 [1:25:57<33:52, 1.63it/s] 71%|███████▏ | 8224/11526 [1:25:58<33:53, 1.62it/s] {'loss': 0.1295, 'grad_norm': 0.5216290950775146, 'learning_rate': 2.2987070726785186e-06, 'epoch': 2.14}
71%|███████▏ | 8224/11526 [1:25:58<33:53, 1.62it/s] 71%|███████▏ | 8225/11526 [1:25:58<33:51, 1.62it/s] {'loss': 0.1521, 'grad_norm': 0.5912718176841736, 'learning_rate': 2.297432904072009e-06, 'epoch': 2.14}
71%|███████▏ | 8225/11526 [1:25:59<33:51, 1.62it/s] 71%|███████▏ | 8226/11526 [1:25:59<33:50, 1.62it/s] {'loss': 0.148, 'grad_norm': 0.5648638010025024, 'learning_rate': 2.2961589833603305e-06, 'epoch': 2.14}
71%|███████▏ | 8226/11526 [1:25:59<33:50, 1.62it/s] 71%|███████▏ | 8227/11526 [1:26:00<33:48, 1.63it/s] {'loss': 0.1207, 'grad_norm': 0.5015857219696045, 'learning_rate': 2.294885310660336e-06, 'epoch': 2.14}
71%|███████▏ | 8227/11526 [1:26:00<33:48, 1.63it/s] 71%|███████▏ | 8228/11526 [1:26:00<33:47, 1.63it/s] {'loss': 0.1504, 'grad_norm': 0.5310381054878235, 'learning_rate': 2.2936118860888483e-06, 'epoch': 2.14}
71%|███████▏ | 8228/11526 [1:26:00<33:47, 1.63it/s] 71%|███████▏ | 8229/11526 [1:26:01<33:49, 1.62it/s] {'loss': 0.1817, 'grad_norm': 0.5928050875663757, 'learning_rate': 2.292338709762681e-06, 'epoch': 2.14}
71%|███████▏ | 8229/11526 [1:26:01<33:49, 1.62it/s] 71%|███████▏ | 8230/11526 [1:26:01<33:46, 1.63it/s] {'loss': 0.1277, 'grad_norm': 0.5538178086280823, 'learning_rate': 2.2910657817986117e-06, 'epoch': 2.14}
71%|███████▏ | 8230/11526 [1:26:02<33:46, 1.63it/s] 71%|███████▏ | 8231/11526 [1:26:02<33:45, 1.63it/s] {'loss': 0.1349, 'grad_norm': 0.5737244486808777, 'learning_rate': 2.289793102313401e-06, 'epoch': 2.14}
71%|███████▏ | 8231/11526 [1:26:02<33:45, 1.63it/s] 71%|███████▏ | 8232/11526 [1:26:03<33:43, 1.63it/s] {'loss': 0.1436, 'grad_norm': 0.5962591767311096, 'learning_rate': 2.288520671423791e-06, 'epoch': 2.14}
71%|███████▏ | 8232/11526 [1:26:03<33:43, 1.63it/s] 71%|███████▏ | 8233/11526 [1:26:03<33:43, 1.63it/s] {'loss': 0.1483, 'grad_norm': 0.5235579013824463, 'learning_rate': 2.2872484892464874e-06, 'epoch': 2.14}
71%|███████▏ | 8233/11526 [1:26:03<33:43, 1.63it/s] 71%|███████▏ | 8234/11526 [1:26:04<33:43, 1.63it/s] {'loss': 0.1396, 'grad_norm': 0.5123848915100098, 'learning_rate': 2.2859765558981895e-06, 'epoch': 2.14}
71%|███████▏ | 8234/11526 [1:26:04<33:43, 1.63it/s] 71%|███████▏ | 8235/11526 [1:26:05<33:43, 1.63it/s] {'loss': 0.1359, 'grad_norm': 0.48288393020629883, 'learning_rate': 2.2847048714955663e-06, 'epoch': 2.14}
71%|███████▏ | 8235/11526 [1:26:05<33:43, 1.63it/s] 71%|███████▏ | 8236/11526 [1:26:05<33:41, 1.63it/s] {'loss': 0.1298, 'grad_norm': 0.4929841160774231, 'learning_rate': 2.2834334361552596e-06, 'epoch': 2.14}
71%|███████▏ | 8236/11526 [1:26:05<33:41, 1.63it/s] 71%|███████▏ | 8237/11526 [1:26:06<33:40, 1.63it/s] {'loss': 0.1342, 'grad_norm': 0.5324632525444031, 'learning_rate': 2.282162249993895e-06, 'epoch': 2.14}
71%|███████▏ | 8237/11526 [1:26:06<33:40, 1.63it/s] 71%|███████▏ | 8238/11526 [1:26:06<33:39, 1.63it/s] {'loss': 0.1464, 'grad_norm': 0.5615243911743164, 'learning_rate': 2.2808913131280724e-06, 'epoch': 2.14}
71%|███████▏ | 8238/11526 [1:26:07<33:39, 1.63it/s] 71%|███████▏ | 8239/11526 [1:26:07<33:40, 1.63it/s] {'loss': 0.124, 'grad_norm': 0.46913161873817444, 'learning_rate': 2.27962062567437e-06, 'epoch': 2.14}
71%|███████▏ | 8239/11526 [1:26:07<33:40, 1.63it/s] 71%|███████▏ | 8240/11526 [1:26:08<33:39, 1.63it/s] {'loss': 0.1558, 'grad_norm': 0.553542971611023, 'learning_rate': 2.2783501877493437e-06, 'epoch': 2.14}
71%|███████▏ | 8240/11526 [1:26:08<33:39, 1.63it/s] 71%|███████▏ | 8241/11526 [1:26:08<33:38, 1.63it/s] {'loss': 0.1815, 'grad_norm': 0.6953550577163696, 'learning_rate': 2.277079999469522e-06, 'epoch': 2.14}
71%|███████▏ | 8241/11526 [1:26:08<33:38, 1.63it/s] 72%|███████▏ | 8242/11526 [1:26:09<33:37, 1.63it/s] {'loss': 0.1525, 'grad_norm': 0.5333102941513062, 'learning_rate': 2.275810060951413e-06, 'epoch': 2.15}
72%|███████▏ | 8242/11526 [1:26:09<33:37, 1.63it/s] 72%|███████▏ | 8243/11526 [1:26:09<33:43, 1.62it/s] {'loss': 0.1672, 'grad_norm': 0.622829258441925, 'learning_rate': 2.2745403723115102e-06, 'epoch': 2.15}
72%|███████▏ | 8243/11526 [1:26:10<33:43, 1.62it/s] 72%|███████▏ | 8244/11526 [1:26:10<33:39, 1.63it/s] {'loss': 0.1279, 'grad_norm': 0.45777428150177, 'learning_rate': 2.273270933666269e-06, 'epoch': 2.15}
72%|███████▏ | 8244/11526 [1:26:10<33:39, 1.63it/s] 72%|███████▏ | 8245/11526 [1:26:11<33:37, 1.63it/s] {'loss': 0.1653, 'grad_norm': 0.5331337451934814, 'learning_rate': 2.272001745132132e-06, 'epoch': 2.15}
72%|███████▏ | 8245/11526 [1:26:11<33:37, 1.63it/s] 72%|███████▏ | 8246/11526 [1:26:11<33:35, 1.63it/s] {'loss': 0.1773, 'grad_norm': 0.7392135262489319, 'learning_rate': 2.270732806825517e-06, 'epoch': 2.15}
72%|███████▏ | 8246/11526 [1:26:11<33:35, 1.63it/s] 72%|███████▏ | 8247/11526 [1:26:12<33:34, 1.63it/s] {'loss': 0.1852, 'grad_norm': 0.8457289338111877, 'learning_rate': 2.269464118862818e-06, 'epoch': 2.15}
72%|███████▏ | 8247/11526 [1:26:12<33:34, 1.63it/s] 72%|███████▏ | 8248/11526 [1:26:13<33:33, 1.63it/s] {'loss': 0.1089, 'grad_norm': 0.40349721908569336, 'learning_rate': 2.268195681360408e-06, 'epoch': 2.15}
72%|███████▏ | 8248/11526 [1:26:13<33:33, 1.63it/s] 72%|███████▏ | 8249/11526 [1:26:13<33:33, 1.63it/s] {'loss': 0.1539, 'grad_norm': 0.5397130846977234, 'learning_rate': 2.266927494434631e-06, 'epoch': 2.15}
72%|███████▏ | 8249/11526 [1:26:13<33:33, 1.63it/s] 72%|███████▏ | 8250/11526 [1:26:14<33:32, 1.63it/s] {'loss': 0.1443, 'grad_norm': 0.5650476217269897, 'learning_rate': 2.2656595582018147e-06, 'epoch': 2.15}
72%|███████▏ | 8250/11526 [1:26:14<33:32, 1.63it/s] 72%|███████▏ | 8251/11526 [1:26:14<33:30, 1.63it/s] {'loss': 0.1539, 'grad_norm': 0.5836114883422852, 'learning_rate': 2.264391872778262e-06, 'epoch': 2.15}
72%|███████▏ | 8251/11526 [1:26:15<33:30, 1.63it/s] 72%|███████▏ | 8252/11526 [1:26:15<33:30, 1.63it/s] {'loss': 0.1533, 'grad_norm': 0.5900980234146118, 'learning_rate': 2.2631244382802518e-06, 'epoch': 2.15}
72%|███████▏ | 8252/11526 [1:26:15<33:30, 1.63it/s] 72%|███████▏ | 8253/11526 [1:26:16<33:29, 1.63it/s] {'loss': 0.1492, 'grad_norm': 0.5605417490005493, 'learning_rate': 2.2618572548240403e-06, 'epoch': 2.15}
72%|███████▏ | 8253/11526 [1:26:16<33:29, 1.63it/s] 72%|███████▏ | 8254/11526 [1:26:16<33:30, 1.63it/s] {'loss': 0.1299, 'grad_norm': 0.5158534049987793, 'learning_rate': 2.2605903225258625e-06, 'epoch': 2.15}
72%|███████▏ | 8254/11526 [1:26:16<33:30, 1.63it/s] 72%|███████▏ | 8255/11526 [1:26:17<33:28, 1.63it/s] {'loss': 0.1677, 'grad_norm': 0.5818209052085876, 'learning_rate': 2.2593236415019227e-06, 'epoch': 2.15}
72%|███████▏ | 8255/11526 [1:26:17<33:28, 1.63it/s] 72%|███████▏ | 8256/11526 [1:26:17<33:27, 1.63it/s] {'loss': 0.1518, 'grad_norm': 0.5864853262901306, 'learning_rate': 2.2580572118684162e-06, 'epoch': 2.15}
72%|███████▏ | 8256/11526 [1:26:18<33:27, 1.63it/s] 72%|███████▏ | 8257/11526 [1:26:18<33:25, 1.63it/s] {'loss': 0.1152, 'grad_norm': 0.44723746180534363, 'learning_rate': 2.2567910337415017e-06, 'epoch': 2.15}
72%|███████▏ | 8257/11526 [1:26:18<33:25, 1.63it/s] 72%|███████▏ | 8258/11526 [1:26:19<33:26, 1.63it/s] {'loss': 0.1958, 'grad_norm': 0.7314780354499817, 'learning_rate': 2.255525107237322e-06, 'epoch': 2.15}
72%|███████▏ | 8258/11526 [1:26:19<33:26, 1.63it/s] 72%|███████▏ | 8259/11526 [1:26:19<33:28, 1.63it/s] {'loss': 0.1523, 'grad_norm': 0.5174112915992737, 'learning_rate': 2.2542594324719947e-06, 'epoch': 2.15}
72%|███████▏ | 8259/11526 [1:26:19<33:28, 1.63it/s] 72%|███████▏ | 8260/11526 [1:26:20<33:33, 1.62it/s] {'loss': 0.1566, 'grad_norm': 0.5963327884674072, 'learning_rate': 2.2529940095616142e-06, 'epoch': 2.15}
72%|███████▏ | 8260/11526 [1:26:20<33:33, 1.62it/s] 72%|███████▏ | 8261/11526 [1:26:21<33:30, 1.62it/s] {'loss': 0.1346, 'grad_norm': 0.5653273463249207, 'learning_rate': 2.2517288386222543e-06, 'epoch': 2.15}
72%|███████▏ | 8261/11526 [1:26:21<33:30, 1.62it/s] 72%|███████▏ | 8262/11526 [1:26:21<33:28, 1.63it/s] {'loss': 0.1501, 'grad_norm': 0.5908359289169312, 'learning_rate': 2.250463919769963e-06, 'epoch': 2.15}
72%|███████▏ | 8262/11526 [1:26:21<33:28, 1.63it/s] 72%|███████▏ | 8263/11526 [1:26:22<33:25, 1.63it/s] {'loss': 0.1306, 'grad_norm': 0.5926213264465332, 'learning_rate': 2.2491992531207633e-06, 'epoch': 2.15}
72%|███████▏ | 8263/11526 [1:26:22<33:25, 1.63it/s] 72%|███████▏ | 8264/11526 [1:26:22<33:28, 1.62it/s] {'loss': 0.1668, 'grad_norm': 0.4785595238208771, 'learning_rate': 2.2479348387906587e-06, 'epoch': 2.15}
72%|███████▏ | 8264/11526 [1:26:23<33:28, 1.62it/s] 72%|███████▏ | 8265/11526 [1:26:23<33:25, 1.63it/s] {'loss': 0.1694, 'grad_norm': 0.6221127510070801, 'learning_rate': 2.246670676895632e-06, 'epoch': 2.15}
72%|███████▏ | 8265/11526 [1:26:23<33:25, 1.63it/s] 72%|███████▏ | 8266/11526 [1:26:24<33:25, 1.63it/s] {'loss': 0.189, 'grad_norm': 0.6415732502937317, 'learning_rate': 2.2454067675516355e-06, 'epoch': 2.15}
72%|███████▏ | 8266/11526 [1:26:24<33:25, 1.63it/s] 72%|███████▏ | 8267/11526 [1:26:24<33:23, 1.63it/s] {'loss': 0.1888, 'grad_norm': 0.71381676197052, 'learning_rate': 2.244143110874603e-06, 'epoch': 2.15}
72%|███████▏ | 8267/11526 [1:26:24<33:23, 1.63it/s] 72%|███████▏ | 8268/11526 [1:26:25<33:23, 1.63it/s] {'loss': 0.1331, 'grad_norm': 0.5238995552062988, 'learning_rate': 2.2428797069804466e-06, 'epoch': 2.15}
72%|███████▏ | 8268/11526 [1:26:25<33:23, 1.63it/s] 72%|███████▏ | 8269/11526 [1:26:25<33:26, 1.62it/s] {'loss': 0.1537, 'grad_norm': 0.6753324866294861, 'learning_rate': 2.2416165559850467e-06, 'epoch': 2.15}
72%|███████▏ | 8269/11526 [1:26:26<33:26, 1.62it/s] 72%|███████▏ | 8270/11526 [1:26:26<33:25, 1.62it/s] {'loss': 0.1503, 'grad_norm': 0.5362997055053711, 'learning_rate': 2.2403536580042746e-06, 'epoch': 2.15}
72%|███████▏ | 8270/11526 [1:26:26<33:25, 1.62it/s] 72%|███████▏ | 8271/11526 [1:26:27<33:22, 1.63it/s] {'loss': 0.1546, 'grad_norm': 0.5778745412826538, 'learning_rate': 2.239091013153965e-06, 'epoch': 2.15}
72%|███████▏ | 8271/11526 [1:26:27<33:22, 1.63it/s] 72%|███████▏ | 8272/11526 [1:26:27<33:20, 1.63it/s] {'loss': 0.1466, 'grad_norm': 0.5838703513145447, 'learning_rate': 2.237828621549937e-06, 'epoch': 2.15}
72%|███████▏ | 8272/11526 [1:26:27<33:20, 1.63it/s] 72%|███████▏ | 8273/11526 [1:26:28<33:19, 1.63it/s] {'loss': 0.1438, 'grad_norm': 0.5312973260879517, 'learning_rate': 2.2365664833079833e-06, 'epoch': 2.15}
72%|███████▏ | 8273/11526 [1:26:28<33:19, 1.63it/s] 72%|███████▏ | 8274/11526 [1:26:29<33:21, 1.62it/s] {'loss': 0.1579, 'grad_norm': 0.5065193176269531, 'learning_rate': 2.2353045985438747e-06, 'epoch': 2.15}
72%|███████▏ | 8274/11526 [1:26:29<33:21, 1.62it/s] 72%|███████▏ | 8275/11526 [1:26:29<33:19, 1.63it/s] {'loss': 0.1265, 'grad_norm': 0.5176178812980652, 'learning_rate': 2.23404296737336e-06, 'epoch': 2.15}
72%|███████▏ | 8275/11526 [1:26:29<33:19, 1.63it/s] 72%|███████▏ | 8276/11526 [1:26:30<33:18, 1.63it/s] {'loss': 0.1338, 'grad_norm': 0.4937765300273895, 'learning_rate': 2.232781589912163e-06, 'epoch': 2.15}
72%|███████▏ | 8276/11526 [1:26:30<33:18, 1.63it/s] 72%|███████▏ | 8277/11526 [1:26:30<33:17, 1.63it/s] {'loss': 0.1431, 'grad_norm': 0.5509374737739563, 'learning_rate': 2.23152046627598e-06, 'epoch': 2.15}
72%|███████▏ | 8277/11526 [1:26:31<33:17, 1.63it/s] 72%|███████▏ | 8278/11526 [1:26:31<33:15, 1.63it/s] {'loss': 0.1512, 'grad_norm': 0.5305643677711487, 'learning_rate': 2.230259596580495e-06, 'epoch': 2.15}
72%|███████▏ | 8278/11526 [1:26:31<33:15, 1.63it/s] 72%|███████▏ | 8279/11526 [1:26:32<33:14, 1.63it/s] {'loss': 0.1538, 'grad_norm': 0.6281760334968567, 'learning_rate': 2.2289989809413576e-06, 'epoch': 2.15}
72%|███████▏ | 8279/11526 [1:26:32<33:14, 1.63it/s] 72%|███████▏ | 8280/11526 [1:26:32<33:13, 1.63it/s] {'loss': 0.1729, 'grad_norm': 0.6670506000518799, 'learning_rate': 2.2277386194742003e-06, 'epoch': 2.16}
72%|███████▏ | 8280/11526 [1:26:32<33:13, 1.63it/s] 72%|███████▏ | 8281/11526 [1:26:33<33:12, 1.63it/s] {'loss': 0.1948, 'grad_norm': 0.7241019010543823, 'learning_rate': 2.2264785122946313e-06, 'epoch': 2.16}
72%|███████▏ | 8281/11526 [1:26:33<33:12, 1.63it/s] 72%|███████▏ | 8282/11526 [1:26:33<33:11, 1.63it/s] {'loss': 0.1681, 'grad_norm': 0.6706886887550354, 'learning_rate': 2.2252186595182308e-06, 'epoch': 2.16}
72%|███████▏ | 8282/11526 [1:26:34<33:11, 1.63it/s] 72%|███████▏ | 8283/11526 [1:26:34<33:11, 1.63it/s] {'loss': 0.1516, 'grad_norm': 0.542465090751648, 'learning_rate': 2.2239590612605645e-06, 'epoch': 2.16}
72%|███████▏ | 8283/11526 [1:26:34<33:11, 1.63it/s] 72%|███████▏ | 8284/11526 [1:26:35<33:13, 1.63it/s] {'loss': 0.1378, 'grad_norm': 0.569693922996521, 'learning_rate': 2.22269971763717e-06, 'epoch': 2.16}
72%|███████▏ | 8284/11526 [1:26:35<33:13, 1.63it/s] 72%|███████▏ | 8285/11526 [1:26:35<33:12, 1.63it/s] {'loss': 0.1307, 'grad_norm': 0.5540089011192322, 'learning_rate': 2.2214406287635574e-06, 'epoch': 2.16}
72%|███████▏ | 8285/11526 [1:26:35<33:12, 1.63it/s] 72%|███████▏ | 8286/11526 [1:26:36<33:11, 1.63it/s] {'loss': 0.1489, 'grad_norm': 0.538386344909668, 'learning_rate': 2.2201817947552203e-06, 'epoch': 2.16}
72%|███████▏ | 8286/11526 [1:26:36<33:11, 1.63it/s] 72%|███████▏ | 8287/11526 [1:26:37<33:11, 1.63it/s] {'loss': 0.1694, 'grad_norm': 0.6310000419616699, 'learning_rate': 2.2189232157276247e-06, 'epoch': 2.16}
72%|███████▏ | 8287/11526 [1:26:37<33:11, 1.63it/s] 72%|███████▏ | 8288/11526 [1:26:37<33:09, 1.63it/s] {'loss': 0.1601, 'grad_norm': 0.5675488710403442, 'learning_rate': 2.2176648917962162e-06, 'epoch': 2.16}
72%|███████▏ | 8288/11526 [1:26:37<33:09, 1.63it/s] 72%|███████▏ | 8289/11526 [1:26:38<33:10, 1.63it/s] {'loss': 0.322, 'grad_norm': 0.7946687936782837, 'learning_rate': 2.2164068230764163e-06, 'epoch': 2.16}
72%|███████▏ | 8289/11526 [1:26:38<33:10, 1.63it/s] 72%|███████▏ | 8290/11526 [1:26:38<33:09, 1.63it/s] {'loss': 0.1776, 'grad_norm': 0.6819049119949341, 'learning_rate': 2.2151490096836183e-06, 'epoch': 2.16}
72%|███████▏ | 8290/11526 [1:26:39<33:09, 1.63it/s] 72%|███████▏ | 8291/11526 [1:26:39<33:06, 1.63it/s] {'loss': 0.1553, 'grad_norm': 0.6014243960380554, 'learning_rate': 2.2138914517331965e-06, 'epoch': 2.16}
72%|███████▏ | 8291/11526 [1:26:39<33:06, 1.63it/s] 72%|███████▏ | 8292/11526 [1:26:40<33:06, 1.63it/s] {'loss': 0.1654, 'grad_norm': 0.6184793710708618, 'learning_rate': 2.2126341493405076e-06, 'epoch': 2.16}
72%|███████▏ | 8292/11526 [1:26:40<33:06, 1.63it/s] 72%|███████▏ | 8293/11526 [1:26:40<33:05, 1.63it/s] {'loss': 0.1506, 'grad_norm': 0.5949475765228271, 'learning_rate': 2.211377102620871e-06, 'epoch': 2.16}
72%|███████▏ | 8293/11526 [1:26:40<33:05, 1.63it/s] 72%|███████▏ | 8294/11526 [1:26:41<33:06, 1.63it/s] {'loss': 0.1818, 'grad_norm': 0.7307855486869812, 'learning_rate': 2.2101203116895936e-06, 'epoch': 2.16}
72%|███████▏ | 8294/11526 [1:26:41<33:06, 1.63it/s] 72%|███████▏ | 8295/11526 [1:26:41<33:06, 1.63it/s] {'loss': 0.1471, 'grad_norm': 0.536896824836731, 'learning_rate': 2.2088637766619554e-06, 'epoch': 2.16}
72%|███████▏ | 8295/11526 [1:26:42<33:06, 1.63it/s] 72%|███████▏ | 8296/11526 [1:26:42<33:05, 1.63it/s] {'loss': 0.1279, 'grad_norm': 0.4925210773944855, 'learning_rate': 2.2076074976532117e-06, 'epoch': 2.16}
72%|███████▏ | 8296/11526 [1:26:42<33:05, 1.63it/s] 72%|███████▏ | 8297/11526 [1:26:43<33:04, 1.63it/s] {'loss': 0.1224, 'grad_norm': 0.5381885766983032, 'learning_rate': 2.2063514747785987e-06, 'epoch': 2.16}
72%|███████▏ | 8297/11526 [1:26:43<33:04, 1.63it/s] 72%|███████▏ | 8298/11526 [1:26:43<33:04, 1.63it/s] {'loss': 0.1474, 'grad_norm': 0.6012558341026306, 'learning_rate': 2.205095708153322e-06, 'epoch': 2.16}
72%|███████▏ | 8298/11526 [1:26:43<33:04, 1.63it/s] 72%|███████▏ | 8299/11526 [1:26:44<33:06, 1.62it/s] {'loss': 0.1594, 'grad_norm': 0.6764559745788574, 'learning_rate': 2.203840197892569e-06, 'epoch': 2.16}
72%|███████▏ | 8299/11526 [1:26:44<33:06, 1.62it/s] 72%|███████▏ | 8300/11526 [1:26:45<33:04, 1.63it/s] {'loss': 0.1443, 'grad_norm': 0.5993819236755371, 'learning_rate': 2.202584944111503e-06, 'epoch': 2.16}
72%|███████▏ | 8300/11526 [1:26:45<33:04, 1.63it/s] 72%|███████▏ | 8301/11526 [1:26:45<33:03, 1.63it/s] {'loss': 0.1205, 'grad_norm': 0.47191306948661804, 'learning_rate': 2.2013299469252615e-06, 'epoch': 2.16}
72%|███████▏ | 8301/11526 [1:26:45<33:03, 1.63it/s] 72%|███████▏ | 8302/11526 [1:26:46<33:01, 1.63it/s] {'loss': 0.1336, 'grad_norm': 0.5411447286605835, 'learning_rate': 2.2000752064489617e-06, 'epoch': 2.16}
72%|███████▏ | 8302/11526 [1:26:46<33:01, 1.63it/s] 72%|███████▏ | 8303/11526 [1:26:46<33:00, 1.63it/s] {'loss': 0.177, 'grad_norm': 0.6066616177558899, 'learning_rate': 2.1988207227976967e-06, 'epoch': 2.16}
72%|███████▏ | 8303/11526 [1:26:46<33:00, 1.63it/s] 72%|███████▏ | 8304/11526 [1:26:47<33:02, 1.63it/s] {'loss': 0.1162, 'grad_norm': 0.4724459648132324, 'learning_rate': 2.1975664960865294e-06, 'epoch': 2.16}
72%|███████▏ | 8304/11526 [1:26:47<33:02, 1.63it/s] 72%|███████▏ | 8305/11526 [1:26:48<33:01, 1.63it/s] {'loss': 0.1869, 'grad_norm': 0.6639047265052795, 'learning_rate': 2.1963125264305117e-06, 'epoch': 2.16}
72%|███████▏ | 8305/11526 [1:26:48<33:01, 1.63it/s] 72%|███████▏ | 8306/11526 [1:26:48<33:00, 1.63it/s] {'loss': 0.1537, 'grad_norm': 0.6347536444664001, 'learning_rate': 2.19505881394466e-06, 'epoch': 2.16}
72%|███████▏ | 8306/11526 [1:26:48<33:00, 1.63it/s] 72%|███████▏ | 8307/11526 [1:26:49<32:58, 1.63it/s] {'loss': 0.1287, 'grad_norm': 0.4936181306838989, 'learning_rate': 2.1938053587439734e-06, 'epoch': 2.16}
72%|███████▏ | 8307/11526 [1:26:49<32:58, 1.63it/s] 72%|███████▏ | 8308/11526 [1:26:49<32:56, 1.63it/s] {'loss': 0.148, 'grad_norm': 0.6082388758659363, 'learning_rate': 2.1925521609434275e-06, 'epoch': 2.16}
72%|███████▏ | 8308/11526 [1:26:50<32:56, 1.63it/s] 72%|███████▏ | 8309/11526 [1:26:50<32:59, 1.62it/s] {'loss': 0.1589, 'grad_norm': 0.6085417866706848, 'learning_rate': 2.191299220657967e-06, 'epoch': 2.16}
72%|███████▏ | 8309/11526 [1:26:50<32:59, 1.62it/s] 72%|███████▏ | 8310/11526 [1:26:51<32:59, 1.62it/s] {'loss': 0.1292, 'grad_norm': 0.5012262463569641, 'learning_rate': 2.1900465380025253e-06, 'epoch': 2.16}
72%|███████▏ | 8310/11526 [1:26:51<32:59, 1.62it/s] 72%|███████▏ | 8311/11526 [1:26:51<32:58, 1.63it/s] {'loss': 0.1464, 'grad_norm': 0.5414086580276489, 'learning_rate': 2.188794113092005e-06, 'epoch': 2.16}
72%|███████▏ | 8311/11526 [1:26:51<32:58, 1.63it/s] 72%|███████▏ | 8312/11526 [1:26:52<32:56, 1.63it/s] {'loss': 0.1703, 'grad_norm': 0.5407191514968872, 'learning_rate': 2.187541946041281e-06, 'epoch': 2.16}
72%|███████▏ | 8312/11526 [1:26:52<32:56, 1.63it/s] 72%|███████▏ | 8313/11526 [1:26:53<32:56, 1.63it/s] {'loss': 0.1522, 'grad_norm': 0.508511483669281, 'learning_rate': 2.186290036965213e-06, 'epoch': 2.16}
72%|███████▏ | 8313/11526 [1:26:53<32:56, 1.63it/s] 72%|███████▏ | 8314/11526 [1:26:53<32:57, 1.62it/s] {'loss': 0.1589, 'grad_norm': 0.5723199844360352, 'learning_rate': 2.185038385978632e-06, 'epoch': 2.16}
72%|███████▏ | 8314/11526 [1:26:53<32:57, 1.62it/s] 72%|███████▏ | 8315/11526 [1:26:54<32:53, 1.63it/s] {'loss': 0.136, 'grad_norm': 0.531221866607666, 'learning_rate': 2.1837869931963455e-06, 'epoch': 2.16}
72%|███████▏ | 8315/11526 [1:26:54<32:53, 1.63it/s] 72%|███████▏ | 8316/11526 [1:26:54<32:53, 1.63it/s] {'loss': 0.1297, 'grad_norm': 0.5350104570388794, 'learning_rate': 2.182535858733143e-06, 'epoch': 2.16}
72%|███████▏ | 8316/11526 [1:26:54<32:53, 1.63it/s] 72%|███████▏ | 8317/11526 [1:26:55<32:52, 1.63it/s] {'loss': 0.1652, 'grad_norm': 0.6654622554779053, 'learning_rate': 2.1812849827037795e-06, 'epoch': 2.16}
72%|███████▏ | 8317/11526 [1:26:55<32:52, 1.63it/s] 72%|███████▏ | 8318/11526 [1:26:56<32:51, 1.63it/s] {'loss': 0.1527, 'grad_norm': 0.6085385680198669, 'learning_rate': 2.1800343652229936e-06, 'epoch': 2.17}
72%|███████▏ | 8318/11526 [1:26:56<32:51, 1.63it/s] 72%|███████▏ | 8319/11526 [1:26:56<32:53, 1.63it/s] {'loss': 0.1483, 'grad_norm': 0.5847705602645874, 'learning_rate': 2.1787840064055043e-06, 'epoch': 2.17}
72%|███████▏ | 8319/11526 [1:26:56<32:53, 1.63it/s] 72%|███████▏ | 8320/11526 [1:26:57<32:51, 1.63it/s] {'loss': 0.153, 'grad_norm': 0.5579280853271484, 'learning_rate': 2.177533906365996e-06, 'epoch': 2.17}
72%|███████▏ | 8320/11526 [1:26:57<32:51, 1.63it/s] 72%|███████▏ | 8321/11526 [1:26:57<32:49, 1.63it/s] {'loss': 0.1717, 'grad_norm': 0.6183887720108032, 'learning_rate': 2.1762840652191375e-06, 'epoch': 2.17}
72%|███████▏ | 8321/11526 [1:26:58<32:49, 1.63it/s] 72%|███████▏ | 8322/11526 [1:26:58<32:49, 1.63it/s] {'loss': 0.1848, 'grad_norm': 0.6377242207527161, 'learning_rate': 2.1750344830795705e-06, 'epoch': 2.17}
72%|███████▏ | 8322/11526 [1:26:58<32:49, 1.63it/s] 72%|███████▏ | 8323/11526 [1:26:59<32:48, 1.63it/s] {'loss': 0.1591, 'grad_norm': 0.5655726790428162, 'learning_rate': 2.173785160061914e-06, 'epoch': 2.17}
72%|███████▏ | 8323/11526 [1:26:59<32:48, 1.63it/s] 72%|███████▏ | 8324/11526 [1:26:59<32:50, 1.63it/s] {'loss': 0.1398, 'grad_norm': 0.6014202237129211, 'learning_rate': 2.1725360962807635e-06, 'epoch': 2.17}
72%|███████▏ | 8324/11526 [1:26:59<32:50, 1.63it/s] 72%|███████▏ | 8325/11526 [1:27:00<32:48, 1.63it/s] {'loss': 0.2097, 'grad_norm': 0.5820503830909729, 'learning_rate': 2.171287291850691e-06, 'epoch': 2.17}
72%|███████▏ | 8325/11526 [1:27:00<32:48, 1.63it/s] 72%|███████▏ | 8326/11526 [1:27:01<32:47, 1.63it/s] {'loss': 0.1455, 'grad_norm': 0.5329025387763977, 'learning_rate': 2.17003874688624e-06, 'epoch': 2.17}
72%|███████▏ | 8326/11526 [1:27:01<32:47, 1.63it/s] 72%|███████▏ | 8327/11526 [1:27:01<32:46, 1.63it/s] {'loss': 0.1427, 'grad_norm': 0.5784018635749817, 'learning_rate': 2.16879046150194e-06, 'epoch': 2.17}
72%|███████▏ | 8327/11526 [1:27:01<32:46, 1.63it/s] 72%|███████▏ | 8328/11526 [1:27:02<32:46, 1.63it/s] {'loss': 0.1318, 'grad_norm': 0.4954712390899658, 'learning_rate': 2.167542435812286e-06, 'epoch': 2.17}
72%|███████▏ | 8328/11526 [1:27:02<32:46, 1.63it/s] 72%|███████▏ | 8329/11526 [1:27:02<32:56, 1.62it/s] {'loss': 0.1334, 'grad_norm': 0.5055112838745117, 'learning_rate': 2.166294669931756e-06, 'epoch': 2.17}
72%|███████▏ | 8329/11526 [1:27:02<32:56, 1.62it/s] 72%|███████▏ | 8330/11526 [1:27:03<32:52, 1.62it/s] {'loss': 0.1978, 'grad_norm': 0.7175705432891846, 'learning_rate': 2.1650471639748037e-06, 'epoch': 2.17}
72%|███████▏ | 8330/11526 [1:27:03<32:52, 1.62it/s] 72%|███████▏ | 8331/11526 [1:27:04<32:49, 1.62it/s] {'loss': 0.1373, 'grad_norm': 0.7985043525695801, 'learning_rate': 2.1637999180558524e-06, 'epoch': 2.17}
72%|███████▏ | 8331/11526 [1:27:04<32:49, 1.62it/s] 72%|███████▏ | 8332/11526 [1:27:04<32:46, 1.62it/s] {'loss': 0.1357, 'grad_norm': 0.5764251947402954, 'learning_rate': 2.162552932289311e-06, 'epoch': 2.17}
72%|███████▏ | 8332/11526 [1:27:04<32:46, 1.62it/s] 72%|███████▏ | 8333/11526 [1:27:05<32:43, 1.63it/s] {'loss': 0.1808, 'grad_norm': 0.6258105039596558, 'learning_rate': 2.161306206789561e-06, 'epoch': 2.17}
72%|███████▏ | 8333/11526 [1:27:05<32:43, 1.63it/s] 72%|███████▏ | 8334/11526 [1:27:05<32:44, 1.62it/s] {'loss': 0.1896, 'grad_norm': 0.6958464980125427, 'learning_rate': 2.1600597416709562e-06, 'epoch': 2.17}
72%|███████▏ | 8334/11526 [1:27:06<32:44, 1.62it/s] 72%|███████▏ | 8335/11526 [1:27:06<32:42, 1.63it/s] {'loss': 0.2226, 'grad_norm': 0.8388484120368958, 'learning_rate': 2.15881353704783e-06, 'epoch': 2.17}
72%|███████▏ | 8335/11526 [1:27:06<32:42, 1.63it/s] 72%|███████▏ | 8336/11526 [1:27:07<32:40, 1.63it/s] {'loss': 0.1265, 'grad_norm': 0.5033448934555054, 'learning_rate': 2.157567593034492e-06, 'epoch': 2.17}
72%|███████▏ | 8336/11526 [1:27:07<32:40, 1.63it/s] 72%|███████▏ | 8337/11526 [1:27:07<32:39, 1.63it/s] {'loss': 0.1834, 'grad_norm': 0.6761311888694763, 'learning_rate': 2.156321909745227e-06, 'epoch': 2.17}
72%|███████▏ | 8337/11526 [1:27:07<32:39, 1.63it/s] 72%|███████▏ | 8338/11526 [1:27:08<32:38, 1.63it/s] {'loss': 0.1591, 'grad_norm': 0.662070631980896, 'learning_rate': 2.155076487294298e-06, 'epoch': 2.17}
72%|███████▏ | 8338/11526 [1:27:08<32:38, 1.63it/s] 72%|███████▏ | 8339/11526 [1:27:09<32:40, 1.63it/s] {'loss': 0.1825, 'grad_norm': 0.5884599685668945, 'learning_rate': 2.153831325795939e-06, 'epoch': 2.17}
72%|███████▏ | 8339/11526 [1:27:09<32:40, 1.63it/s] 72%|███████▏ | 8340/11526 [1:27:09<32:39, 1.63it/s] {'loss': 0.1594, 'grad_norm': 0.6596765518188477, 'learning_rate': 2.1525864253643632e-06, 'epoch': 2.17}
72%|███████▏ | 8340/11526 [1:27:09<32:39, 1.63it/s] 72%|███████▏ | 8341/11526 [1:27:10<32:38, 1.63it/s] {'loss': 0.1605, 'grad_norm': 0.578164279460907, 'learning_rate': 2.151341786113765e-06, 'epoch': 2.17}
72%|███████▏ | 8341/11526 [1:27:10<32:38, 1.63it/s] 72%|███████▏ | 8342/11526 [1:27:10<32:37, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.6746026277542114, 'learning_rate': 2.1500974081583044e-06, 'epoch': 2.17}
72%|███████▏ | 8342/11526 [1:27:10<32:37, 1.63it/s] 72%|███████▏ | 8343/11526 [1:27:11<32:36, 1.63it/s] {'loss': 0.1588, 'grad_norm': 0.531009316444397, 'learning_rate': 2.148853291612125e-06, 'epoch': 2.17}
72%|███████▏ | 8343/11526 [1:27:11<32:36, 1.63it/s] 72%|███████▏ | 8344/11526 [1:27:12<32:35, 1.63it/s] {'loss': 0.1355, 'grad_norm': 0.4763471782207489, 'learning_rate': 2.1476094365893434e-06, 'epoch': 2.17}
72%|███████▏ | 8344/11526 [1:27:12<32:35, 1.63it/s] 72%|███████▏ | 8345/11526 [1:27:12<32:37, 1.62it/s] {'loss': 0.1592, 'grad_norm': 0.5484766960144043, 'learning_rate': 2.1463658432040542e-06, 'epoch': 2.17}
72%|███████▏ | 8345/11526 [1:27:12<32:37, 1.62it/s] 72%|███████▏ | 8346/11526 [1:27:13<32:35, 1.63it/s] {'loss': 0.1253, 'grad_norm': 0.4859631657600403, 'learning_rate': 2.1451225115703285e-06, 'epoch': 2.17}
72%|███████▏ | 8346/11526 [1:27:13<32:35, 1.63it/s] 72%|███████▏ | 8347/11526 [1:27:13<32:34, 1.63it/s] {'loss': 0.1731, 'grad_norm': 0.651066780090332, 'learning_rate': 2.1438794418022073e-06, 'epoch': 2.17}
72%|███████▏ | 8347/11526 [1:27:14<32:34, 1.63it/s] 72%|███████▏ | 8348/11526 [1:27:14<32:32, 1.63it/s] {'loss': 0.1181, 'grad_norm': 0.4716893136501312, 'learning_rate': 2.1426366340137145e-06, 'epoch': 2.17}
72%|███████▏ | 8348/11526 [1:27:14<32:32, 1.63it/s] 72%|███████▏ | 8349/11526 [1:27:15<32:34, 1.63it/s] {'loss': 0.1434, 'grad_norm': 0.5766528248786926, 'learning_rate': 2.141394088318847e-06, 'epoch': 2.17}
72%|███████▏ | 8349/11526 [1:27:15<32:34, 1.63it/s] 72%|███████▏ | 8350/11526 [1:27:15<32:32, 1.63it/s] {'loss': 0.1455, 'grad_norm': 0.6247525215148926, 'learning_rate': 2.1401518048315796e-06, 'epoch': 2.17}
72%|███████▏ | 8350/11526 [1:27:15<32:32, 1.63it/s] 72%|███████▏ | 8351/11526 [1:27:16<32:31, 1.63it/s] {'loss': 0.2248, 'grad_norm': 0.7079320549964905, 'learning_rate': 2.1389097836658594e-06, 'epoch': 2.17}
72%|███████▏ | 8351/11526 [1:27:16<32:31, 1.63it/s] 72%|███████▏ | 8352/11526 [1:27:16<32:29, 1.63it/s] {'loss': 0.1669, 'grad_norm': 0.6454307436943054, 'learning_rate': 2.137668024935615e-06, 'epoch': 2.17}
72%|███████▏ | 8352/11526 [1:27:17<32:29, 1.63it/s] 72%|███████▏ | 8353/11526 [1:27:17<32:29, 1.63it/s] {'loss': 0.1685, 'grad_norm': 0.6052002906799316, 'learning_rate': 2.136426528754742e-06, 'epoch': 2.17}
72%|███████▏ | 8353/11526 [1:27:17<32:29, 1.63it/s] 72%|███████▏ | 8354/11526 [1:27:18<32:30, 1.63it/s] {'loss': 0.1975, 'grad_norm': 0.6519799828529358, 'learning_rate': 2.1351852952371244e-06, 'epoch': 2.17}
72%|███████▏ | 8354/11526 [1:27:18<32:30, 1.63it/s] 72%|███████▏ | 8355/11526 [1:27:18<32:31, 1.62it/s] {'loss': 0.1852, 'grad_norm': 0.6283969283103943, 'learning_rate': 2.1339443244966098e-06, 'epoch': 2.17}
72%|███████▏ | 8355/11526 [1:27:18<32:31, 1.62it/s] 72%|███████▏ | 8356/11526 [1:27:19<32:30, 1.63it/s] {'loss': 0.1375, 'grad_norm': 0.5123996734619141, 'learning_rate': 2.1327036166470285e-06, 'epoch': 2.17}
72%|███████▏ | 8356/11526 [1:27:19<32:30, 1.63it/s] 73%|███████▎ | 8357/11526 [1:27:20<32:28, 1.63it/s] {'loss': 0.1344, 'grad_norm': 0.506442666053772, 'learning_rate': 2.131463171802188e-06, 'epoch': 2.18}
73%|███████▎ | 8357/11526 [1:27:20<32:28, 1.63it/s] 73%|███████▎ | 8358/11526 [1:27:20<32:27, 1.63it/s] {'loss': 0.1819, 'grad_norm': 0.6033913493156433, 'learning_rate': 2.1302229900758625e-06, 'epoch': 2.18}
73%|███████▎ | 8358/11526 [1:27:20<32:27, 1.63it/s] 73%|███████▎ | 8359/11526 [1:27:21<32:28, 1.63it/s] {'loss': 0.1415, 'grad_norm': 0.5147992372512817, 'learning_rate': 2.128983071581815e-06, 'epoch': 2.18}
73%|███████▎ | 8359/11526 [1:27:21<32:28, 1.63it/s] 73%|███████▎ | 8360/11526 [1:27:21<32:27, 1.63it/s] {'loss': 0.1562, 'grad_norm': 0.5466569662094116, 'learning_rate': 2.1277434164337773e-06, 'epoch': 2.18}
73%|███████▎ | 8360/11526 [1:27:22<32:27, 1.63it/s] 73%|███████▎ | 8361/11526 [1:27:22<32:25, 1.63it/s] {'loss': 0.1814, 'grad_norm': 0.6256712675094604, 'learning_rate': 2.1265040247454543e-06, 'epoch': 2.18}
73%|███████▎ | 8361/11526 [1:27:22<32:25, 1.63it/s] 73%|███████▎ | 8362/11526 [1:27:23<32:24, 1.63it/s] {'loss': 0.1214, 'grad_norm': 0.4804351031780243, 'learning_rate': 2.1252648966305317e-06, 'epoch': 2.18}
73%|███████▎ | 8362/11526 [1:27:23<32:24, 1.63it/s] 73%|███████▎ | 8363/11526 [1:27:23<32:22, 1.63it/s] {'loss': 0.1348, 'grad_norm': 0.5310905575752258, 'learning_rate': 2.124026032202669e-06, 'epoch': 2.18}
73%|███████▎ | 8363/11526 [1:27:23<32:22, 1.63it/s] 73%|███████▎ | 8364/11526 [1:27:24<32:24, 1.63it/s] {'loss': 0.169, 'grad_norm': 0.6207589507102966, 'learning_rate': 2.122787431575502e-06, 'epoch': 2.18}
73%|███████▎ | 8364/11526 [1:27:24<32:24, 1.63it/s] 73%|███████▎ | 8365/11526 [1:27:25<33:13, 1.59it/s] {'loss': 0.1653, 'grad_norm': 0.6207730174064636, 'learning_rate': 2.121549094862645e-06, 'epoch': 2.18}
73%|███████▎ | 8365/11526 [1:27:25<33:13, 1.59it/s] 73%|███████▎ | 8366/11526 [1:27:25<33:04, 1.59it/s] {'loss': 0.1629, 'grad_norm': 0.5465573072433472, 'learning_rate': 2.1203110221776785e-06, 'epoch': 2.18}
73%|███████▎ | 8366/11526 [1:27:25<33:04, 1.59it/s] 73%|███████▎ | 8367/11526 [1:27:26<32:50, 1.60it/s] {'loss': 0.1467, 'grad_norm': 0.6069388389587402, 'learning_rate': 2.1190732136341718e-06, 'epoch': 2.18}
73%|███████▎ | 8367/11526 [1:27:26<32:50, 1.60it/s] 73%|███████▎ | 8368/11526 [1:27:26<33:40, 1.56it/s] {'loss': 0.1575, 'grad_norm': 0.5784884691238403, 'learning_rate': 2.1178356693456635e-06, 'epoch': 2.18}
73%|███████▎ | 8368/11526 [1:27:27<33:40, 1.56it/s] 73%|███████▎ | 8369/11526 [1:27:27<33:19, 1.58it/s] {'loss': 0.1533, 'grad_norm': 0.5575758218765259, 'learning_rate': 2.1165983894256647e-06, 'epoch': 2.18}
73%|███████▎ | 8369/11526 [1:27:27<33:19, 1.58it/s] 73%|███████▎ | 8370/11526 [1:27:28<32:58, 1.60it/s] {'loss': 0.1866, 'grad_norm': 0.6388062238693237, 'learning_rate': 2.1153613739876676e-06, 'epoch': 2.18}
73%|███████▎ | 8370/11526 [1:27:28<32:58, 1.60it/s] 73%|███████▎ | 8371/11526 [1:27:28<33:38, 1.56it/s] {'loss': 0.1627, 'grad_norm': 0.656924843788147, 'learning_rate': 2.1141246231451385e-06, 'epoch': 2.18}
73%|███████▎ | 8371/11526 [1:27:29<33:38, 1.56it/s] 73%|███████▎ | 8372/11526 [1:27:29<33:23, 1.57it/s] {'loss': 0.1557, 'grad_norm': 0.5960628390312195, 'learning_rate': 2.1128881370115186e-06, 'epoch': 2.18}
73%|███████▎ | 8372/11526 [1:27:29<33:23, 1.57it/s] 73%|███████▎ | 8373/11526 [1:27:30<33:02, 1.59it/s] {'loss': 0.1209, 'grad_norm': 0.5080195665359497, 'learning_rate': 2.1116519157002275e-06, 'epoch': 2.18}
73%|███████▎ | 8373/11526 [1:27:30<33:02, 1.59it/s] 73%|███████▎ | 8374/11526 [1:27:30<32:48, 1.60it/s] {'loss': 0.1589, 'grad_norm': 0.5442765355110168, 'learning_rate': 2.1104159593246547e-06, 'epoch': 2.18}
73%|███████▎ | 8374/11526 [1:27:30<32:48, 1.60it/s] 73%|███████▎ | 8375/11526 [1:27:31<32:39, 1.61it/s] {'loss': 0.1592, 'grad_norm': 0.6641260981559753, 'learning_rate': 2.109180267998169e-06, 'epoch': 2.18}
73%|███████▎ | 8375/11526 [1:27:31<32:39, 1.61it/s] 73%|███████▎ | 8376/11526 [1:27:31<32:32, 1.61it/s] {'loss': 0.1483, 'grad_norm': 0.6538058519363403, 'learning_rate': 2.107944841834122e-06, 'epoch': 2.18}
73%|███████▎ | 8376/11526 [1:27:32<32:32, 1.61it/s] 73%|███████▎ | 8377/11526 [1:27:32<32:34, 1.61it/s] {'loss': 0.1438, 'grad_norm': 0.4993155002593994, 'learning_rate': 2.1067096809458267e-06, 'epoch': 2.18}
73%|███████▎ | 8377/11526 [1:27:32<32:34, 1.61it/s] 73%|███████▎ | 8378/11526 [1:27:33<32:28, 1.62it/s] {'loss': 0.15, 'grad_norm': 0.5469313859939575, 'learning_rate': 2.105474785446582e-06, 'epoch': 2.18}
73%|███████▎ | 8378/11526 [1:27:33<32:28, 1.62it/s] 73%|███████▎ | 8379/11526 [1:27:33<32:22, 1.62it/s] {'loss': 0.1467, 'grad_norm': 0.606773316860199, 'learning_rate': 2.1042401554496603e-06, 'epoch': 2.18}
73%|███████▎ | 8379/11526 [1:27:33<32:22, 1.62it/s] 73%|███████▎ | 8380/11526 [1:27:34<32:18, 1.62it/s] {'loss': 0.1429, 'grad_norm': 0.5308037400245667, 'learning_rate': 2.1030057910683043e-06, 'epoch': 2.18}
73%|███████▎ | 8380/11526 [1:27:34<32:18, 1.62it/s] 73%|███████▎ | 8381/11526 [1:27:35<32:16, 1.62it/s] {'loss': 0.1479, 'grad_norm': 0.5625312924385071, 'learning_rate': 2.1017716924157443e-06, 'epoch': 2.18}
73%|███████▎ | 8381/11526 [1:27:35<32:16, 1.62it/s] 73%|███████▎ | 8382/11526 [1:27:35<32:15, 1.62it/s] {'loss': 0.1457, 'grad_norm': 0.618724524974823, 'learning_rate': 2.1005378596051727e-06, 'epoch': 2.18}
73%|███████▎ | 8382/11526 [1:27:35<32:15, 1.62it/s] 73%|███████▎ | 8383/11526 [1:27:36<32:14, 1.63it/s] {'loss': 0.1261, 'grad_norm': 0.5456370711326599, 'learning_rate': 2.0993042927497663e-06, 'epoch': 2.18}
73%|███████▎ | 8383/11526 [1:27:36<32:14, 1.63it/s] 73%|███████▎ | 8384/11526 [1:27:36<32:12, 1.63it/s] {'loss': 0.1655, 'grad_norm': 0.5977212190628052, 'learning_rate': 2.0980709919626747e-06, 'epoch': 2.18}
73%|███████▎ | 8384/11526 [1:27:36<32:12, 1.63it/s] 73%|███████▎ | 8385/11526 [1:27:37<32:10, 1.63it/s] {'loss': 0.1444, 'grad_norm': 0.5759695172309875, 'learning_rate': 2.0968379573570226e-06, 'epoch': 2.18}
73%|███████▎ | 8385/11526 [1:27:37<32:10, 1.63it/s] 73%|███████▎ | 8386/11526 [1:27:38<32:09, 1.63it/s] {'loss': 0.1433, 'grad_norm': 0.5581424236297607, 'learning_rate': 2.0956051890459113e-06, 'epoch': 2.18}
73%|███████▎ | 8386/11526 [1:27:38<32:09, 1.63it/s] 73%|███████▎ | 8387/11526 [1:27:38<32:12, 1.62it/s] {'loss': 0.1521, 'grad_norm': 0.6054892539978027, 'learning_rate': 2.094372687142419e-06, 'epoch': 2.18}
73%|███████▎ | 8387/11526 [1:27:38<32:12, 1.62it/s] 73%|███████▎ | 8388/11526 [1:27:39<32:09, 1.63it/s] {'loss': 0.1663, 'grad_norm': 0.6773520112037659, 'learning_rate': 2.093140451759595e-06, 'epoch': 2.18}
73%|███████▎ | 8388/11526 [1:27:39<32:09, 1.63it/s] 73%|███████▎ | 8389/11526 [1:27:39<32:07, 1.63it/s] {'loss': 0.1529, 'grad_norm': 0.5966917276382446, 'learning_rate': 2.0919084830104674e-06, 'epoch': 2.18}
73%|███████▎ | 8389/11526 [1:27:40<32:07, 1.63it/s] 73%|███████▎ | 8390/11526 [1:27:40<32:06, 1.63it/s] {'loss': 0.1584, 'grad_norm': 0.5063797831535339, 'learning_rate': 2.09067678100804e-06, 'epoch': 2.18}
73%|███████▎ | 8390/11526 [1:27:40<32:06, 1.63it/s] 73%|███████▎ | 8391/11526 [1:27:41<32:04, 1.63it/s] {'loss': 0.1485, 'grad_norm': 0.5686753988265991, 'learning_rate': 2.0894453458652923e-06, 'epoch': 2.18}
73%|███████▎ | 8391/11526 [1:27:41<32:04, 1.63it/s] 73%|███████▎ | 8392/11526 [1:27:41<32:08, 1.63it/s] {'loss': 0.1455, 'grad_norm': 0.5921817421913147, 'learning_rate': 2.088214177695178e-06, 'epoch': 2.18}
73%|███████▎ | 8392/11526 [1:27:41<32:08, 1.63it/s] 73%|███████▎ | 8393/11526 [1:27:42<32:06, 1.63it/s] {'loss': 0.1448, 'grad_norm': 0.5996673107147217, 'learning_rate': 2.0869832766106265e-06, 'epoch': 2.18}
73%|███████▎ | 8393/11526 [1:27:42<32:06, 1.63it/s] 73%|███████▎ | 8394/11526 [1:27:43<32:05, 1.63it/s] {'loss': 0.159, 'grad_norm': 0.5162465572357178, 'learning_rate': 2.0857526427245426e-06, 'epoch': 2.18}
73%|███████▎ | 8394/11526 [1:27:43<32:05, 1.63it/s] 73%|███████▎ | 8395/11526 [1:27:43<32:04, 1.63it/s] {'loss': 0.1791, 'grad_norm': 0.6736350655555725, 'learning_rate': 2.08452227614981e-06, 'epoch': 2.19}
73%|███████▎ | 8395/11526 [1:27:43<32:04, 1.63it/s] 73%|███████▎ | 8396/11526 [1:27:44<32:07, 1.62it/s] {'loss': 0.1358, 'grad_norm': 0.5711714029312134, 'learning_rate': 2.0832921769992808e-06, 'epoch': 2.19}
73%|███████▎ | 8396/11526 [1:27:44<32:07, 1.62it/s] 73%|███████▎ | 8397/11526 [1:27:44<32:09, 1.62it/s] {'loss': 0.1679, 'grad_norm': 0.598861038684845, 'learning_rate': 2.0820623453857884e-06, 'epoch': 2.19}
73%|███████▎ | 8397/11526 [1:27:44<32:09, 1.62it/s] 73%|███████▎ | 8398/11526 [1:27:45<32:07, 1.62it/s] {'loss': 0.1704, 'grad_norm': 0.6754838824272156, 'learning_rate': 2.0808327814221403e-06, 'epoch': 2.19}
73%|███████▎ | 8398/11526 [1:27:45<32:07, 1.62it/s] 73%|███████▎ | 8399/11526 [1:27:46<32:04, 1.62it/s] {'loss': 0.1289, 'grad_norm': 0.5424450039863586, 'learning_rate': 2.079603485221118e-06, 'epoch': 2.19}
73%|███████▎ | 8399/11526 [1:27:46<32:04, 1.62it/s] 73%|███████▎ | 8400/11526 [1:27:46<32:02, 1.63it/s] {'loss': 0.1572, 'grad_norm': 0.5805314779281616, 'learning_rate': 2.0783744568954815e-06, 'epoch': 2.19}
73%|███████▎ | 8400/11526 [1:27:46<32:02, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.32it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.77it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.16it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.78it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.544998049736023, 'eval_runtime': 1.9556, 'eval_samples_per_second': 102.27, 'eval_steps_per_second': 6.648, 'epoch': 2.19}
73%|███████▎ | 8400/11526 [1:27:48<32:02, 1.63it/s]
 73%|███████▎ | 8401/11526 [1:27:49<1:02:38, 1.20s/it] {'loss': 0.1157, 'grad_norm': 0.4748297929763794, 'learning_rate': 2.0771456965579645e-06, 'epoch': 2.19}
73%|███████▎ | 8401/11526 [1:27:49<1:02:38, 1.20s/it] 73%|███████▎ | 8402/11526 [1:27:49<53:26, 1.03s/it] {'loss': 0.1592, 'grad_norm': 0.5704538822174072, 'learning_rate': 2.075917204321271e-06, 'epoch': 2.19}
73%|███████▎ | 8402/11526 [1:27:50<53:26, 1.03s/it] 73%|███████▎ | 8403/11526 [1:27:50<46:58, 1.11it/s] {'loss': 0.1787, 'grad_norm': 0.6146233677864075, 'learning_rate': 2.0746889802980926e-06, 'epoch': 2.19}
73%|███████▎ | 8403/11526 [1:27:50<46:58, 1.11it/s] 73%|███████▎ | 8404/11526 [1:27:51<42:27, 1.23it/s] {'loss': 0.1796, 'grad_norm': 0.6476463079452515, 'learning_rate': 2.073461024601083e-06, 'epoch': 2.19}
73%|███████▎ | 8404/11526 [1:27:51<42:27, 1.23it/s] 73%|███████▎ | 8405/11526 [1:27:51<39:17, 1.32it/s] {'loss': 0.1549, 'grad_norm': 0.5519036650657654, 'learning_rate': 2.0722333373428806e-06, 'epoch': 2.19}
73%|███████▎ | 8405/11526 [1:27:51<39:17, 1.32it/s] 73%|███████▎ | 8406/11526 [1:27:52<37:03, 1.40it/s] {'loss': 0.1417, 'grad_norm': 0.5329334735870361, 'learning_rate': 2.0710059186360953e-06, 'epoch': 2.19}
73%|███████▎ | 8406/11526 [1:27:52<37:03, 1.40it/s] 73%|███████▎ | 8407/11526 [1:27:52<35:32, 1.46it/s] {'loss': 0.131, 'grad_norm': 0.545513391494751, 'learning_rate': 2.0697787685933094e-06, 'epoch': 2.19}
73%|███████▎ | 8407/11526 [1:27:53<35:32, 1.46it/s] 73%|███████▎ | 8408/11526 [1:27:53<34:28, 1.51it/s] {'loss': 0.1736, 'grad_norm': 0.6651443839073181, 'learning_rate': 2.0685518873270894e-06, 'epoch': 2.19}
73%|███████▎ | 8408/11526 [1:27:53<34:28, 1.51it/s] 73%|███████▎ | 8409/11526 [1:27:54<33:42, 1.54it/s] {'loss': 0.1386, 'grad_norm': 0.5238356590270996, 'learning_rate': 2.0673252749499704e-06, 'epoch': 2.19}
73%|███████▎ | 8409/11526 [1:27:54<33:42, 1.54it/s] 73%|███████▎ | 8410/11526 [1:27:54<33:09, 1.57it/s] {'loss': 0.1745, 'grad_norm': 0.5809915661811829, 'learning_rate': 2.0660989315744624e-06, 'epoch': 2.19}
73%|███████▎ | 8410/11526 [1:27:54<33:09, 1.57it/s] 73%|███████▎ | 8411/11526 [1:27:55<32:46, 1.58it/s] {'loss': 0.1547, 'grad_norm': 0.5548303723335266, 'learning_rate': 2.0648728573130532e-06, 'epoch': 2.19}
73%|███████▎ | 8411/11526 [1:27:55<32:46, 1.58it/s] 73%|███████▎ | 8412/11526 [1:27:56<32:33, 1.59it/s] {'loss': 0.1067, 'grad_norm': 0.43080899119377136, 'learning_rate': 2.063647052278206e-06, 'epoch': 2.19}
73%|███████▎ | 8412/11526 [1:27:56<32:33, 1.59it/s] 73%|███████▎ | 8413/11526 [1:27:56<32:19, 1.60it/s] {'loss': 0.1514, 'grad_norm': 0.5829296708106995, 'learning_rate': 2.062421516582358e-06, 'epoch': 2.19}
73%|███████▎ | 8413/11526 [1:27:56<32:19, 1.60it/s] 73%|███████▎ | 8414/11526 [1:27:57<32:10, 1.61it/s] {'loss': 0.1274, 'grad_norm': 0.59073406457901, 'learning_rate': 2.0611962503379247e-06, 'epoch': 2.19}
73%|███████▎ | 8414/11526 [1:27:57<32:10, 1.61it/s] 73%|███████▎ | 8415/11526 [1:27:57<32:03, 1.62it/s] {'loss': 0.1487, 'grad_norm': 0.5892074108123779, 'learning_rate': 2.0599712536572887e-06, 'epoch': 2.19}
73%|███████▎ | 8415/11526 [1:27:58<32:03, 1.62it/s] 73%|███████▎ | 8416/11526 [1:27:58<31:59, 1.62it/s] {'loss': 0.1513, 'grad_norm': 0.6086567044258118, 'learning_rate': 2.0587465266528188e-06, 'epoch': 2.19}
73%|███████▎ | 8416/11526 [1:27:58<31:59, 1.62it/s] 73%|███████▎ | 8417/11526 [1:27:59<31:59, 1.62it/s] {'loss': 0.1498, 'grad_norm': 0.5517714619636536, 'learning_rate': 2.057522069436855e-06, 'epoch': 2.19}
73%|███████▎ | 8417/11526 [1:27:59<31:59, 1.62it/s] 73%|███████▎ | 8418/11526 [1:27:59<31:55, 1.62it/s] {'loss': 0.1819, 'grad_norm': 0.5809224843978882, 'learning_rate': 2.0562978821217066e-06, 'epoch': 2.19}
73%|███████▎ | 8418/11526 [1:27:59<31:55, 1.62it/s] 73%|███████▎ | 8419/11526 [1:28:00<31:52, 1.62it/s] {'loss': 0.1332, 'grad_norm': 0.5175097584724426, 'learning_rate': 2.055073964819666e-06, 'epoch': 2.19}
73%|███████▎ | 8419/11526 [1:28:00<31:52, 1.62it/s] 73%|███████▎ | 8420/11526 [1:28:00<31:50, 1.63it/s] {'loss': 0.1185, 'grad_norm': 0.4930518567562103, 'learning_rate': 2.0538503176429967e-06, 'epoch': 2.19}
73%|███████▎ | 8420/11526 [1:28:01<31:50, 1.63it/s] 73%|███████▎ | 8421/11526 [1:28:01<31:48, 1.63it/s] {'loss': 0.1365, 'grad_norm': 0.5493196845054626, 'learning_rate': 2.0526269407039394e-06, 'epoch': 2.19}
73%|███████▎ | 8421/11526 [1:28:01<31:48, 1.63it/s] 73%|███████▎ | 8422/11526 [1:28:02<31:49, 1.63it/s] {'loss': 0.094, 'grad_norm': 0.3700513243675232, 'learning_rate': 2.0514038341147112e-06, 'epoch': 2.19}
73%|███████▎ | 8422/11526 [1:28:02<31:49, 1.63it/s] 73%|███████▎ | 8423/11526 [1:28:02<31:48, 1.63it/s] {'loss': 0.1272, 'grad_norm': 0.5421152114868164, 'learning_rate': 2.050180997987498e-06, 'epoch': 2.19}
73%|███████▎ | 8423/11526 [1:28:02<31:48, 1.63it/s] 73%|███████▎ | 8424/11526 [1:28:03<31:47, 1.63it/s] {'loss': 0.1135, 'grad_norm': 0.449718713760376, 'learning_rate': 2.048958432434466e-06, 'epoch': 2.19}
73%|███████▎ | 8424/11526 [1:28:03<31:47, 1.63it/s] 73%|███████▎ | 8425/11526 [1:28:04<31:45, 1.63it/s] {'loss': 0.1411, 'grad_norm': 0.5769368410110474, 'learning_rate': 2.0477361375677603e-06, 'epoch': 2.19}
73%|███████▎ | 8425/11526 [1:28:04<31:45, 1.63it/s] 73%|███████▎ | 8426/11526 [1:28:04<31:45, 1.63it/s] {'loss': 0.1817, 'grad_norm': 0.6157791614532471, 'learning_rate': 2.046514113499492e-06, 'epoch': 2.19}
73%|███████▎ | 8426/11526 [1:28:04<31:45, 1.63it/s] 73%|███████▎ | 8427/11526 [1:28:05<31:45, 1.63it/s] {'loss': 0.1325, 'grad_norm': 0.6079584360122681, 'learning_rate': 2.045292360341754e-06, 'epoch': 2.19}
73%|███████▎ | 8427/11526 [1:28:05<31:45, 1.63it/s] 73%|███████▎ | 8428/11526 [1:28:05<31:44, 1.63it/s] {'loss': 0.1148, 'grad_norm': 0.4470120072364807, 'learning_rate': 2.0440708782066143e-06, 'epoch': 2.19}
73%|███████▎ | 8428/11526 [1:28:06<31:44, 1.63it/s] 73%|███████▎ | 8429/11526 [1:28:06<31:43, 1.63it/s] {'loss': 0.1525, 'grad_norm': 0.5364553928375244, 'learning_rate': 2.042849667206108e-06, 'epoch': 2.19}
73%|███████▎ | 8429/11526 [1:28:06<31:43, 1.63it/s] 73%|███████▎ | 8430/11526 [1:28:07<31:41, 1.63it/s] {'loss': 0.1481, 'grad_norm': 0.6646642088890076, 'learning_rate': 2.04162872745226e-06, 'epoch': 2.19}
73%|███████▎ | 8430/11526 [1:28:07<31:41, 1.63it/s] 73%|███████▎ | 8431/11526 [1:28:07<31:41, 1.63it/s] {'loss': 0.1666, 'grad_norm': 0.5557377934455872, 'learning_rate': 2.040408059057056e-06, 'epoch': 2.19}
73%|███████▎ | 8431/11526 [1:28:07<31:41, 1.63it/s] 73%|███████▎ | 8432/11526 [1:28:08<31:44, 1.62it/s] {'loss': 0.1216, 'grad_norm': 0.5159407258033752, 'learning_rate': 2.039187662132463e-06, 'epoch': 2.19}
73%|███████▎ | 8432/11526 [1:28:08<31:44, 1.62it/s] 73%|███████▎ | 8433/11526 [1:28:08<31:42, 1.63it/s] {'loss': 0.1593, 'grad_norm': 0.5695891380310059, 'learning_rate': 2.037967536790425e-06, 'epoch': 2.19}
73%|███████▎ | 8433/11526 [1:28:09<31:42, 1.63it/s] 73%|███████▎ | 8434/11526 [1:28:09<31:40, 1.63it/s] {'loss': 0.1579, 'grad_norm': 0.5741772651672363, 'learning_rate': 2.0367476831428574e-06, 'epoch': 2.2}
73%|███████▎ | 8434/11526 [1:28:09<31:40, 1.63it/s] 73%|███████▎ | 8435/11526 [1:28:10<31:40, 1.63it/s] {'loss': 0.1523, 'grad_norm': 0.6357660889625549, 'learning_rate': 2.0355281013016526e-06, 'epoch': 2.2}
73%|███████▎ | 8435/11526 [1:28:10<31:40, 1.63it/s] 73%|███████▎ | 8436/11526 [1:28:10<31:39, 1.63it/s] {'loss': 0.1814, 'grad_norm': 0.6416739225387573, 'learning_rate': 2.034308791378679e-06, 'epoch': 2.2}
73%|███████▎ | 8436/11526 [1:28:10<31:39, 1.63it/s] 73%|███████▎ | 8437/11526 [1:28:11<31:43, 1.62it/s] {'loss': 0.1391, 'grad_norm': 0.5734087228775024, 'learning_rate': 2.0330897534857756e-06, 'epoch': 2.2}
73%|███████▎ | 8437/11526 [1:28:11<31:43, 1.62it/s] 73%|███████▎ | 8438/11526 [1:28:12<31:39, 1.63it/s] {'loss': 0.1574, 'grad_norm': 0.6264486312866211, 'learning_rate': 2.0318709877347616e-06, 'epoch': 2.2}
73%|███████▎ | 8438/11526 [1:28:12<31:39, 1.63it/s] 73%|███████▎ | 8439/11526 [1:28:12<31:38, 1.63it/s] {'loss': 0.1228, 'grad_norm': 0.49313247203826904, 'learning_rate': 2.0306524942374277e-06, 'epoch': 2.2}
73%|███████▎ | 8439/11526 [1:28:12<31:38, 1.63it/s] 73%|███████▎ | 8440/11526 [1:28:13<31:35, 1.63it/s] {'loss': 0.1781, 'grad_norm': 0.5893586874008179, 'learning_rate': 2.029434273105542e-06, 'epoch': 2.2}
73%|███████▎ | 8440/11526 [1:28:13<31:35, 1.63it/s] 73%|███████▎ | 8441/11526 [1:28:13<31:36, 1.63it/s] {'loss': 0.1407, 'grad_norm': 0.5396973490715027, 'learning_rate': 2.0282163244508484e-06, 'epoch': 2.2}
73%|███████▎ | 8441/11526 [1:28:14<31:36, 1.63it/s] 73%|███████▎ | 8442/11526 [1:28:14<31:36, 1.63it/s] {'loss': 0.1281, 'grad_norm': 0.49350443482398987, 'learning_rate': 2.026998648385059e-06, 'epoch': 2.2}
73%|███████▎ | 8442/11526 [1:28:14<31:36, 1.63it/s] 73%|███████▎ | 8443/11526 [1:28:15<31:35, 1.63it/s] {'loss': 0.1565, 'grad_norm': 0.6217325925827026, 'learning_rate': 2.025781245019871e-06, 'epoch': 2.2}
73%|███████▎ | 8443/11526 [1:28:15<31:35, 1.63it/s] 73%|███████▎ | 8444/11526 [1:28:15<31:33, 1.63it/s] {'loss': 0.1351, 'grad_norm': 0.5109628438949585, 'learning_rate': 2.0245641144669516e-06, 'epoch': 2.2}
73%|███████▎ | 8444/11526 [1:28:15<31:33, 1.63it/s] 73%|███████▎ | 8445/11526 [1:28:16<31:32, 1.63it/s] {'loss': 0.1973, 'grad_norm': 0.6944909691810608, 'learning_rate': 2.0233472568379393e-06, 'epoch': 2.2}
73%|███████▎ | 8445/11526 [1:28:16<31:32, 1.63it/s] 73%|███████▎ | 8446/11526 [1:28:16<31:31, 1.63it/s] {'loss': 0.109, 'grad_norm': 0.4196477234363556, 'learning_rate': 2.0221306722444533e-06, 'epoch': 2.2}
73%|███████▎ | 8446/11526 [1:28:17<31:31, 1.63it/s] 73%|███████▎ | 8447/11526 [1:28:17<31:30, 1.63it/s] {'loss': 0.162, 'grad_norm': 0.5669395923614502, 'learning_rate': 2.020914360798085e-06, 'epoch': 2.2}
73%|███████▎ | 8447/11526 [1:28:17<31:30, 1.63it/s] 73%|███████▎ | 8448/11526 [1:28:18<31:29, 1.63it/s] {'loss': 0.1431, 'grad_norm': 0.7142566442489624, 'learning_rate': 2.0196983226104024e-06, 'epoch': 2.2}
73%|███████▎ | 8448/11526 [1:28:18<31:29, 1.63it/s] 73%|███████▎ | 8449/11526 [1:28:18<31:28, 1.63it/s] {'loss': 0.1663, 'grad_norm': 0.6983550786972046, 'learning_rate': 2.0184825577929477e-06, 'epoch': 2.2}
73%|███████▎ | 8449/11526 [1:28:18<31:28, 1.63it/s] 73%|███████▎ | 8450/11526 [1:28:19<31:28, 1.63it/s] {'loss': 0.1639, 'grad_norm': 0.6063715815544128, 'learning_rate': 2.017267066457236e-06, 'epoch': 2.2}
73%|███████▎ | 8450/11526 [1:28:19<31:28, 1.63it/s] 73%|███████▎ | 8451/11526 [1:28:20<31:28, 1.63it/s] {'loss': 0.1395, 'grad_norm': 0.49577659368515015, 'learning_rate': 2.016051848714758e-06, 'epoch': 2.2}
73%|███████▎ | 8451/11526 [1:28:20<31:28, 1.63it/s] 73%|███████▎ | 8452/11526 [1:28:20<31:28, 1.63it/s] {'loss': 0.1458, 'grad_norm': 0.5523250699043274, 'learning_rate': 2.0148369046769854e-06, 'epoch': 2.2}
73%|███████▎ | 8452/11526 [1:28:20<31:28, 1.63it/s] 73%|███████▎ | 8453/11526 [1:28:21<31:27, 1.63it/s] {'loss': 0.2175, 'grad_norm': 0.7136170268058777, 'learning_rate': 2.0136222344553557e-06, 'epoch': 2.2}
73%|███████▎ | 8453/11526 [1:28:21<31:27, 1.63it/s] 73%|███████▎ | 8454/11526 [1:28:21<31:26, 1.63it/s] {'loss': 0.1578, 'grad_norm': 0.6251581311225891, 'learning_rate': 2.0124078381612868e-06, 'epoch': 2.2}
73%|███████▎ | 8454/11526 [1:28:21<31:26, 1.63it/s] 73%|███████▎ | 8455/11526 [1:28:22<31:27, 1.63it/s] {'loss': 0.1554, 'grad_norm': 0.48310133814811707, 'learning_rate': 2.0111937159061708e-06, 'epoch': 2.2}
73%|███████▎ | 8455/11526 [1:28:22<31:27, 1.63it/s] 73%|███████▎ | 8456/11526 [1:28:23<31:26, 1.63it/s] {'loss': 0.1364, 'grad_norm': 0.5636410713195801, 'learning_rate': 2.0099798678013687e-06, 'epoch': 2.2}
73%|███████▎ | 8456/11526 [1:28:23<31:26, 1.63it/s] 73%|███████▎ | 8457/11526 [1:28:23<31:28, 1.63it/s] {'loss': 0.1705, 'grad_norm': 0.6930277347564697, 'learning_rate': 2.0087662939582303e-06, 'epoch': 2.2}
73%|███████▎ | 8457/11526 [1:28:23<31:28, 1.63it/s] 73%|███████▎ | 8458/11526 [1:28:24<31:25, 1.63it/s] {'loss': 0.1428, 'grad_norm': 0.5698708891868591, 'learning_rate': 2.0075529944880646e-06, 'epoch': 2.2}
73%|███████▎ | 8458/11526 [1:28:24<31:25, 1.63it/s] 73%|███████▎ | 8459/11526 [1:28:24<31:24, 1.63it/s] {'loss': 0.1773, 'grad_norm': 0.6468815803527832, 'learning_rate': 2.006339969502164e-06, 'epoch': 2.2}
73%|███████▎ | 8459/11526 [1:28:25<31:24, 1.63it/s] 73%|███████▎ | 8460/11526 [1:28:25<31:23, 1.63it/s] {'loss': 0.1338, 'grad_norm': 0.48844295740127563, 'learning_rate': 2.0051272191117955e-06, 'epoch': 2.2}
73%|███████▎ | 8460/11526 [1:28:25<31:23, 1.63it/s] 73%|███████▎ | 8461/11526 [1:28:26<31:23, 1.63it/s] {'loss': 0.1342, 'grad_norm': 0.5795053243637085, 'learning_rate': 2.0039147434281982e-06, 'epoch': 2.2}
73%|███████▎ | 8461/11526 [1:28:26<31:23, 1.63it/s] 73%|███████▎ | 8462/11526 [1:28:26<31:32, 1.62it/s] {'loss': 0.1806, 'grad_norm': 0.6673493385314941, 'learning_rate': 2.002702542562588e-06, 'epoch': 2.2}
73%|███████▎ | 8462/11526 [1:28:26<31:32, 1.62it/s] 73%|███████▎ | 8463/11526 [1:28:27<31:28, 1.62it/s] {'loss': 0.1615, 'grad_norm': 0.6560284495353699, 'learning_rate': 2.0014906166261556e-06, 'epoch': 2.2}
73%|███████▎ | 8463/11526 [1:28:27<31:28, 1.62it/s] 73%|███████▎ | 8464/11526 [1:28:28<31:25, 1.62it/s] {'loss': 0.1347, 'grad_norm': 0.5146645307540894, 'learning_rate': 2.0002789657300615e-06, 'epoch': 2.2}
73%|███████▎ | 8464/11526 [1:28:28<31:25, 1.62it/s] 73%|███████▎ | 8465/11526 [1:28:28<31:23, 1.63it/s] {'loss': 0.1732, 'grad_norm': 0.6627865433692932, 'learning_rate': 1.9990675899854522e-06, 'epoch': 2.2}
73%|███████▎ | 8465/11526 [1:28:28<31:23, 1.63it/s] 73%|███████▎ | 8466/11526 [1:28:29<31:21, 1.63it/s] {'loss': 0.1513, 'grad_norm': 0.566474437713623, 'learning_rate': 1.9978564895034357e-06, 'epoch': 2.2}
73%|███████▎ | 8466/11526 [1:28:29<31:21, 1.63it/s] 73%|███████▎ | 8467/11526 [1:28:29<31:22, 1.63it/s] {'loss': 0.2218, 'grad_norm': 0.9432145953178406, 'learning_rate': 1.996645664395104e-06, 'epoch': 2.2}
73%|███████▎ | 8467/11526 [1:28:29<31:22, 1.63it/s] 73%|███████▎ | 8468/11526 [1:28:30<31:22, 1.62it/s] {'loss': 0.1688, 'grad_norm': 0.6962944269180298, 'learning_rate': 1.99543511477152e-06, 'epoch': 2.2}
73%|███████▎ | 8468/11526 [1:28:30<31:22, 1.62it/s] 73%|███████▎ | 8469/11526 [1:28:31<31:20, 1.63it/s] {'loss': 0.1503, 'grad_norm': 0.569614827632904, 'learning_rate': 1.994224840743723e-06, 'epoch': 2.2}
73%|███████▎ | 8469/11526 [1:28:31<31:20, 1.63it/s] 73%|███████▎ | 8470/11526 [1:28:31<31:18, 1.63it/s] {'loss': 0.1864, 'grad_norm': 0.6336362957954407, 'learning_rate': 1.9930148424227254e-06, 'epoch': 2.2}
73%|███████▎ | 8470/11526 [1:28:31<31:18, 1.63it/s] 73%|███████▎ | 8471/11526 [1:28:32<31:18, 1.63it/s] {'loss': 0.1502, 'grad_norm': 0.5887694358825684, 'learning_rate': 1.9918051199195175e-06, 'epoch': 2.2}
73%|███████▎ | 8471/11526 [1:28:32<31:18, 1.63it/s] 74%|███████▎ | 8472/11526 [1:28:32<31:18, 1.63it/s] {'loss': 0.214, 'grad_norm': 0.6975953578948975, 'learning_rate': 1.9905956733450583e-06, 'epoch': 2.21}
74%|███████▎ | 8472/11526 [1:28:33<31:18, 1.63it/s] 74%|███████▎ | 8473/11526 [1:28:33<31:16, 1.63it/s] {'loss': 0.1461, 'grad_norm': 0.5471853613853455, 'learning_rate': 1.9893865028102843e-06, 'epoch': 2.21}
74%|███████▎ | 8473/11526 [1:28:33<31:16, 1.63it/s] 74%|███████▎ | 8474/11526 [1:28:34<31:15, 1.63it/s] {'loss': 0.1717, 'grad_norm': 0.6034783720970154, 'learning_rate': 1.988177608426114e-06, 'epoch': 2.21}
74%|███████▎ | 8474/11526 [1:28:34<31:15, 1.63it/s] 74%|███████▎ | 8475/11526 [1:28:34<31:14, 1.63it/s] {'loss': 0.1589, 'grad_norm': 0.6087373495101929, 'learning_rate': 1.9869689903034285e-06, 'epoch': 2.21}
74%|███████▎ | 8475/11526 [1:28:34<31:14, 1.63it/s] 74%|███████▎ | 8476/11526 [1:28:35<31:13, 1.63it/s] {'loss': 0.1745, 'grad_norm': 0.6346095204353333, 'learning_rate': 1.98576064855309e-06, 'epoch': 2.21}
74%|███████▎ | 8476/11526 [1:28:35<31:13, 1.63it/s] 74%|███████▎ | 8477/11526 [1:28:36<31:14, 1.63it/s] {'loss': 0.1187, 'grad_norm': 0.4830820560455322, 'learning_rate': 1.984552583285937e-06, 'epoch': 2.21}
74%|███████▎ | 8477/11526 [1:28:36<31:14, 1.63it/s] 74%|███████▎ | 8478/11526 [1:28:36<31:13, 1.63it/s] {'loss': 0.1463, 'grad_norm': 0.5360310077667236, 'learning_rate': 1.9833447946127748e-06, 'epoch': 2.21}
74%|███████▎ | 8478/11526 [1:28:36<31:13, 1.63it/s] 74%|███████▎ | 8479/11526 [1:28:37<31:17, 1.62it/s] {'loss': 0.1246, 'grad_norm': 0.563264012336731, 'learning_rate': 1.982137282644396e-06, 'epoch': 2.21}
74%|███████▎ | 8479/11526 [1:28:37<31:17, 1.62it/s] 74%|███████▎ | 8480/11526 [1:28:37<31:15, 1.62it/s] {'loss': 0.1642, 'grad_norm': 0.615913450717926, 'learning_rate': 1.980930047491554e-06, 'epoch': 2.21}
74%|███████▎ | 8480/11526 [1:28:37<31:15, 1.62it/s] 74%|███████▎ | 8481/11526 [1:28:38<31:13, 1.62it/s] {'loss': 0.1346, 'grad_norm': 0.5467604994773865, 'learning_rate': 1.9797230892649867e-06, 'epoch': 2.21}
74%|███████▎ | 8481/11526 [1:28:38<31:13, 1.62it/s] 74%|███████▎ | 8482/11526 [1:28:39<31:14, 1.62it/s] {'loss': 0.1372, 'grad_norm': 0.6119275093078613, 'learning_rate': 1.978516408075402e-06, 'epoch': 2.21}
74%|███████▎ | 8482/11526 [1:28:39<31:14, 1.62it/s] 74%|███████▎ | 8483/11526 [1:28:39<31:12, 1.63it/s] {'loss': 0.1292, 'grad_norm': 0.540126383304596, 'learning_rate': 1.977310004033484e-06, 'epoch': 2.21}
74%|███████▎ | 8483/11526 [1:28:39<31:12, 1.63it/s] 74%|███████▎ | 8484/11526 [1:28:40<31:10, 1.63it/s] {'loss': 0.1764, 'grad_norm': 0.6190468072891235, 'learning_rate': 1.976103877249891e-06, 'epoch': 2.21}
74%|███████▎ | 8484/11526 [1:28:40<31:10, 1.63it/s] 74%|███████▎ | 8485/11526 [1:28:40<31:08, 1.63it/s] {'loss': 0.1476, 'grad_norm': 0.5681361556053162, 'learning_rate': 1.974898027835257e-06, 'epoch': 2.21}
74%|███████▎ | 8485/11526 [1:28:41<31:08, 1.63it/s] 74%|███████▎ | 8486/11526 [1:28:41<31:09, 1.63it/s] {'loss': 0.1478, 'grad_norm': 0.604262113571167, 'learning_rate': 1.9736924559001865e-06, 'epoch': 2.21}
74%|███████▎ | 8486/11526 [1:28:41<31:09, 1.63it/s] 74%|███████▎ | 8487/11526 [1:28:42<31:10, 1.62it/s] {'loss': 0.17, 'grad_norm': 0.5902095437049866, 'learning_rate': 1.972487161555262e-06, 'epoch': 2.21}
74%|███████▎ | 8487/11526 [1:28:42<31:10, 1.62it/s] 74%|███████▎ | 8488/11526 [1:28:42<31:08, 1.63it/s] {'loss': 0.1357, 'grad_norm': 0.559669017791748, 'learning_rate': 1.971282144911042e-06, 'epoch': 2.21}
74%|███████▎ | 8488/11526 [1:28:42<31:08, 1.63it/s] 74%|███████▎ | 8489/11526 [1:28:43<31:07, 1.63it/s] {'loss': 0.1347, 'grad_norm': 0.5874924063682556, 'learning_rate': 1.9700774060780558e-06, 'epoch': 2.21}
74%|███████▎ | 8489/11526 [1:28:43<31:07, 1.63it/s] 74%|███████▎ | 8490/11526 [1:28:44<31:05, 1.63it/s] {'loss': 0.1638, 'grad_norm': 0.5977917909622192, 'learning_rate': 1.9688729451668116e-06, 'epoch': 2.21}
74%|███████▎ | 8490/11526 [1:28:44<31:05, 1.63it/s] 74%|███████▎ | 8491/11526 [1:28:44<31:04, 1.63it/s] {'loss': 0.1519, 'grad_norm': 0.5938209891319275, 'learning_rate': 1.967668762287784e-06, 'epoch': 2.21}
74%|███████▎ | 8491/11526 [1:28:44<31:04, 1.63it/s] 74%|███████▎ | 8492/11526 [1:28:45<31:04, 1.63it/s] {'loss': 0.1607, 'grad_norm': 0.6107929944992065, 'learning_rate': 1.9664648575514316e-06, 'epoch': 2.21}
74%|███████▎ | 8492/11526 [1:28:45<31:04, 1.63it/s] 74%|███████▎ | 8493/11526 [1:28:45<31:03, 1.63it/s] {'loss': 0.18, 'grad_norm': 0.6612604856491089, 'learning_rate': 1.965261231068185e-06, 'epoch': 2.21}
74%|███████▎ | 8493/11526 [1:28:45<31:03, 1.63it/s] 74%|███████▎ | 8494/11526 [1:28:46<31:01, 1.63it/s] {'loss': 0.1313, 'grad_norm': 0.5425468683242798, 'learning_rate': 1.964057882948443e-06, 'epoch': 2.21}
74%|███████▎ | 8494/11526 [1:28:46<31:01, 1.63it/s] 74%|███████▎ | 8495/11526 [1:28:47<31:00, 1.63it/s] {'loss': 0.1615, 'grad_norm': 0.6652870178222656, 'learning_rate': 1.9628548133025866e-06, 'epoch': 2.21}
74%|███████▎ | 8495/11526 [1:28:47<31:00, 1.63it/s] 74%|███████▎ | 8496/11526 [1:28:47<31:03, 1.63it/s] {'loss': 0.1872, 'grad_norm': 0.6651562452316284, 'learning_rate': 1.961652022240967e-06, 'epoch': 2.21}
74%|███████▎ | 8496/11526 [1:28:47<31:03, 1.63it/s] 74%|███████▎ | 8497/11526 [1:28:48<31:03, 1.63it/s] {'loss': 0.1403, 'grad_norm': 0.5395395755767822, 'learning_rate': 1.960449509873911e-06, 'epoch': 2.21}
74%|███████▎ | 8497/11526 [1:28:48<31:03, 1.63it/s] 74%|███████▎ | 8498/11526 [1:28:48<31:01, 1.63it/s] {'loss': 0.1831, 'grad_norm': 0.5723538398742676, 'learning_rate': 1.959247276311722e-06, 'epoch': 2.21}
74%|███████▎ | 8498/11526 [1:28:49<31:01, 1.63it/s] 74%|███████▎ | 8499/11526 [1:28:49<30:59, 1.63it/s] {'loss': 0.1634, 'grad_norm': 0.6473268270492554, 'learning_rate': 1.958045321664673e-06, 'epoch': 2.21}
74%|███████▎ | 8499/11526 [1:28:49<30:59, 1.63it/s] 74%|███████▎ | 8500/11526 [1:28:50<30:58, 1.63it/s] {'loss': 0.1233, 'grad_norm': 0.47176918387413025, 'learning_rate': 1.9568436460430125e-06, 'epoch': 2.21}
74%|███████▎ | 8500/11526 [1:28:50<30:58, 1.63it/s] 74%|███████▍ | 8501/11526 [1:28:50<30:56, 1.63it/s] {'loss': 0.162, 'grad_norm': 0.5915610790252686, 'learning_rate': 1.9556422495569714e-06, 'epoch': 2.21}
74%|███████▍ | 8501/11526 [1:28:50<30:56, 1.63it/s] 74%|███████▍ | 8502/11526 [1:28:51<30:57, 1.63it/s] {'loss': 0.1587, 'grad_norm': 0.6315069794654846, 'learning_rate': 1.9544411323167433e-06, 'epoch': 2.21}
74%|███████▍ | 8502/11526 [1:28:51<30:57, 1.63it/s] 74%|███████▍ | 8503/11526 [1:28:51<30:57, 1.63it/s] {'loss': 0.1808, 'grad_norm': 0.9103038311004639, 'learning_rate': 1.953240294432503e-06, 'epoch': 2.21}
74%|███████▍ | 8503/11526 [1:28:52<30:57, 1.63it/s] 74%|███████▍ | 8504/11526 [1:28:52<30:56, 1.63it/s] {'loss': 0.1446, 'grad_norm': 0.583886444568634, 'learning_rate': 1.9520397360144e-06, 'epoch': 2.21}
74%|███████▍ | 8504/11526 [1:28:52<30:56, 1.63it/s] 74%|███████▍ | 8505/11526 [1:28:53<30:56, 1.63it/s] {'loss': 0.1644, 'grad_norm': 0.7547414898872375, 'learning_rate': 1.9508394571725507e-06, 'epoch': 2.21}
74%|███████▍ | 8505/11526 [1:28:53<30:56, 1.63it/s] 74%|███████▍ | 8506/11526 [1:28:53<30:55, 1.63it/s] {'loss': 0.1417, 'grad_norm': 0.5680786967277527, 'learning_rate': 1.94963945801706e-06, 'epoch': 2.21}
74%|███████▍ | 8506/11526 [1:28:53<30:55, 1.63it/s] 74%|███████▍ | 8507/11526 [1:28:54<30:54, 1.63it/s] {'loss': 0.1265, 'grad_norm': 0.5222706198692322, 'learning_rate': 1.948439738657991e-06, 'epoch': 2.21}
74%|███████▍ | 8507/11526 [1:28:54<30:54, 1.63it/s] 74%|███████▍ | 8508/11526 [1:28:55<30:54, 1.63it/s] {'loss': 0.1649, 'grad_norm': 0.6666382551193237, 'learning_rate': 1.9472402992053923e-06, 'epoch': 2.21}
74%|███████▍ | 8508/11526 [1:28:55<30:54, 1.63it/s] 74%|███████▍ | 8509/11526 [1:28:55<30:53, 1.63it/s] {'loss': 0.1314, 'grad_norm': 0.5622068643569946, 'learning_rate': 1.946041139769283e-06, 'epoch': 2.21}
74%|███████▍ | 8509/11526 [1:28:55<30:53, 1.63it/s] 74%|███████▍ | 8510/11526 [1:28:56<30:53, 1.63it/s] {'loss': 0.1489, 'grad_norm': 0.5844137072563171, 'learning_rate': 1.944842260459657e-06, 'epoch': 2.21}
74%|███████▍ | 8510/11526 [1:28:56<30:53, 1.63it/s] 74%|███████▍ | 8511/11526 [1:28:56<30:52, 1.63it/s] {'loss': 0.2137, 'grad_norm': 0.7157076597213745, 'learning_rate': 1.943643661386481e-06, 'epoch': 2.22}
74%|███████▍ | 8511/11526 [1:28:57<30:52, 1.63it/s] 74%|███████▍ | 8512/11526 [1:28:57<30:51, 1.63it/s] {'loss': 0.169, 'grad_norm': 0.6465024948120117, 'learning_rate': 1.942445342659701e-06, 'epoch': 2.22}
74%|███████▍ | 8512/11526 [1:28:57<30:51, 1.63it/s] 74%|███████▍ | 8513/11526 [1:28:58<30:50, 1.63it/s] {'loss': 0.1626, 'grad_norm': 0.6351872086524963, 'learning_rate': 1.941247304389227e-06, 'epoch': 2.22}
74%|███████▍ | 8513/11526 [1:28:58<30:50, 1.63it/s] 74%|███████▍ | 8514/11526 [1:28:58<30:50, 1.63it/s] {'loss': 0.1717, 'grad_norm': 0.6030579209327698, 'learning_rate': 1.940049546684958e-06, 'epoch': 2.22}
74%|███████▍ | 8514/11526 [1:28:58<30:50, 1.63it/s] 74%|███████▍ | 8515/11526 [1:28:59<30:49, 1.63it/s] {'loss': 0.1877, 'grad_norm': 0.758524477481842, 'learning_rate': 1.9388520696567524e-06, 'epoch': 2.22}
74%|███████▍ | 8515/11526 [1:28:59<30:49, 1.63it/s] 74%|███████▍ | 8516/11526 [1:28:59<30:49, 1.63it/s] {'loss': 0.1362, 'grad_norm': 0.5481259226799011, 'learning_rate': 1.937654873414452e-06, 'epoch': 2.22}
74%|███████▍ | 8516/11526 [1:29:00<30:49, 1.63it/s] 74%|███████▍ | 8517/11526 [1:29:00<30:48, 1.63it/s] {'loss': 0.1626, 'grad_norm': 0.6475717425346375, 'learning_rate': 1.9364579580678734e-06, 'epoch': 2.22}
74%|███████▍ | 8517/11526 [1:29:00<30:48, 1.63it/s] 74%|███████▍ | 8518/11526 [1:29:01<30:47, 1.63it/s] {'loss': 0.1476, 'grad_norm': 0.6134359836578369, 'learning_rate': 1.9352613237267974e-06, 'epoch': 2.22}
74%|███████▍ | 8518/11526 [1:29:01<30:47, 1.63it/s] 74%|███████▍ | 8519/11526 [1:29:01<30:46, 1.63it/s] {'loss': 0.1526, 'grad_norm': 0.8404259085655212, 'learning_rate': 1.9340649705009925e-06, 'epoch': 2.22}
74%|███████▍ | 8519/11526 [1:29:01<30:46, 1.63it/s] 74%|███████▍ | 8520/11526 [1:29:02<30:47, 1.63it/s] {'loss': 0.1715, 'grad_norm': 0.6907621622085571, 'learning_rate': 1.9328688985001948e-06, 'epoch': 2.22}
74%|███████▍ | 8520/11526 [1:29:02<30:47, 1.63it/s] 74%|███████▍ | 8521/11526 [1:29:03<30:45, 1.63it/s] {'loss': 0.1309, 'grad_norm': 0.5664962530136108, 'learning_rate': 1.931673107834111e-06, 'epoch': 2.22}
74%|███████▍ | 8521/11526 [1:29:03<30:45, 1.63it/s] 74%|███████▍ | 8522/11526 [1:29:03<30:45, 1.63it/s] {'loss': 0.1423, 'grad_norm': 0.595417857170105, 'learning_rate': 1.9304775986124275e-06, 'epoch': 2.22}
74%|███████▍ | 8522/11526 [1:29:03<30:45, 1.63it/s] 74%|███████▍ | 8523/11526 [1:29:04<30:45, 1.63it/s] {'loss': 0.1272, 'grad_norm': 0.4935171604156494, 'learning_rate': 1.929282370944804e-06, 'epoch': 2.22}
74%|███████▍ | 8523/11526 [1:29:04<30:45, 1.63it/s] 74%|███████▍ | 8524/11526 [1:29:04<30:43, 1.63it/s] {'loss': 0.1487, 'grad_norm': 0.5608553290367126, 'learning_rate': 1.928087424940873e-06, 'epoch': 2.22}
74%|███████▍ | 8524/11526 [1:29:05<30:43, 1.63it/s] 74%|███████▍ | 8525/11526 [1:29:05<30:43, 1.63it/s] {'loss': 0.1617, 'grad_norm': 0.5361735224723816, 'learning_rate': 1.926892760710244e-06, 'epoch': 2.22}
74%|███████▍ | 8525/11526 [1:29:05<30:43, 1.63it/s] 74%|███████▍ | 8526/11526 [1:29:06<30:43, 1.63it/s] {'loss': 0.1126, 'grad_norm': 0.4653090834617615, 'learning_rate': 1.925698378362494e-06, 'epoch': 2.22}
74%|███████▍ | 8526/11526 [1:29:06<30:43, 1.63it/s] 74%|███████▍ | 8527/11526 [1:29:06<30:43, 1.63it/s] {'loss': 0.1614, 'grad_norm': 0.5946760773658752, 'learning_rate': 1.9245042780071786e-06, 'epoch': 2.22}
74%|███████▍ | 8527/11526 [1:29:06<30:43, 1.63it/s] 74%|███████▍ | 8528/11526 [1:29:07<30:42, 1.63it/s] {'loss': 0.1172, 'grad_norm': 0.4819985628128052, 'learning_rate': 1.9233104597538337e-06, 'epoch': 2.22}
74%|███████▍ | 8528/11526 [1:29:07<30:42, 1.63it/s] 74%|███████▍ | 8529/11526 [1:29:07<30:41, 1.63it/s] {'loss': 0.1461, 'grad_norm': 0.5993649959564209, 'learning_rate': 1.9221169237119573e-06, 'epoch': 2.22}
74%|███████▍ | 8529/11526 [1:29:08<30:41, 1.63it/s] 74%|███████▍ | 8530/11526 [1:29:08<30:40, 1.63it/s] {'loss': 0.1486, 'grad_norm': 0.5720258951187134, 'learning_rate': 1.9209236699910293e-06, 'epoch': 2.22}
74%|███████▍ | 8530/11526 [1:29:08<30:40, 1.63it/s] 74%|███████▍ | 8531/11526 [1:29:09<30:39, 1.63it/s] {'loss': 0.1939, 'grad_norm': 0.7445575594902039, 'learning_rate': 1.9197306987005015e-06, 'epoch': 2.22}
74%|███████▍ | 8531/11526 [1:29:09<30:39, 1.63it/s] 74%|███████▍ | 8532/11526 [1:29:09<30:41, 1.63it/s] {'loss': 0.1531, 'grad_norm': 0.5943021774291992, 'learning_rate': 1.9185380099497997e-06, 'epoch': 2.22}
74%|███████▍ | 8532/11526 [1:29:09<30:41, 1.63it/s] 74%|███████▍ | 8533/11526 [1:29:10<30:40, 1.63it/s] {'loss': 0.1473, 'grad_norm': 0.5577079057693481, 'learning_rate': 1.9173456038483244e-06, 'epoch': 2.22}
74%|███████▍ | 8533/11526 [1:29:10<30:40, 1.63it/s] 74%|███████▍ | 8534/11526 [1:29:11<30:38, 1.63it/s] {'loss': 0.0986, 'grad_norm': 0.4556047320365906, 'learning_rate': 1.9161534805054523e-06, 'epoch': 2.22}
74%|███████▍ | 8534/11526 [1:29:11<30:38, 1.63it/s] 74%|███████▍ | 8535/11526 [1:29:11<30:37, 1.63it/s] {'loss': 0.1246, 'grad_norm': 0.4930405914783478, 'learning_rate': 1.914961640030527e-06, 'epoch': 2.22}
74%|███████▍ | 8535/11526 [1:29:11<30:37, 1.63it/s] 74%|███████▍ | 8536/11526 [1:29:12<30:37, 1.63it/s] {'loss': 0.1641, 'grad_norm': 0.6282840967178345, 'learning_rate': 1.913770082532873e-06, 'epoch': 2.22}
74%|███████▍ | 8536/11526 [1:29:12<30:37, 1.63it/s] 74%|███████▍ | 8537/11526 [1:29:12<30:37, 1.63it/s] {'loss': 0.1635, 'grad_norm': 0.6171577572822571, 'learning_rate': 1.912578808121787e-06, 'epoch': 2.22}
74%|███████▍ | 8537/11526 [1:29:13<30:37, 1.63it/s] 74%|███████▍ | 8538/11526 [1:29:13<30:36, 1.63it/s] {'loss': 0.135, 'grad_norm': 0.5510342121124268, 'learning_rate': 1.9113878169065403e-06, 'epoch': 2.22}
74%|███████▍ | 8538/11526 [1:29:13<30:36, 1.63it/s] 74%|███████▍ | 8539/11526 [1:29:14<30:35, 1.63it/s] {'loss': 0.1327, 'grad_norm': 0.5473871827125549, 'learning_rate': 1.910197108996377e-06, 'epoch': 2.22}
74%|███████▍ | 8539/11526 [1:29:14<30:35, 1.63it/s] 74%|███████▍ | 8540/11526 [1:29:14<30:34, 1.63it/s] {'loss': 0.1337, 'grad_norm': 0.5595588684082031, 'learning_rate': 1.909006684500513e-06, 'epoch': 2.22}
74%|███████▍ | 8540/11526 [1:29:14<30:34, 1.63it/s] 74%|███████▍ | 8541/11526 [1:29:15<30:34, 1.63it/s] {'loss': 0.1449, 'grad_norm': 0.5097293257713318, 'learning_rate': 1.907816543528144e-06, 'epoch': 2.22}
74%|███████▍ | 8541/11526 [1:29:15<30:34, 1.63it/s] 74%|███████▍ | 8542/11526 [1:29:15<30:33, 1.63it/s] {'loss': 0.1366, 'grad_norm': 0.7222911715507507, 'learning_rate': 1.906626686188437e-06, 'epoch': 2.22}
74%|███████▍ | 8542/11526 [1:29:16<30:33, 1.63it/s] 74%|███████▍ | 8543/11526 [1:29:16<30:31, 1.63it/s] {'loss': 0.1466, 'grad_norm': 0.5576021671295166, 'learning_rate': 1.905437112590529e-06, 'epoch': 2.22}
74%|███████▍ | 8543/11526 [1:29:16<30:31, 1.63it/s] 74%|███████▍ | 8544/11526 [1:29:17<30:32, 1.63it/s] {'loss': 0.1316, 'grad_norm': 0.450194388628006, 'learning_rate': 1.9042478228435368e-06, 'epoch': 2.22}
74%|███████▍ | 8544/11526 [1:29:17<30:32, 1.63it/s] 74%|███████▍ | 8545/11526 [1:29:17<30:32, 1.63it/s] {'loss': 0.2015, 'grad_norm': 0.570811927318573, 'learning_rate': 1.9030588170565478e-06, 'epoch': 2.22}
74%|███████▍ | 8545/11526 [1:29:17<30:32, 1.63it/s] 74%|███████▍ | 8546/11526 [1:29:18<30:30, 1.63it/s] {'loss': 0.1283, 'grad_norm': 0.49691241979599, 'learning_rate': 1.9018700953386253e-06, 'epoch': 2.22}
74%|███████▍ | 8546/11526 [1:29:18<30:30, 1.63it/s] 74%|███████▍ | 8547/11526 [1:29:19<30:30, 1.63it/s] {'loss': 0.1894, 'grad_norm': 0.6141167283058167, 'learning_rate': 1.9006816577988075e-06, 'epoch': 2.22}
74%|███████▍ | 8547/11526 [1:29:19<30:30, 1.63it/s] 74%|███████▍ | 8548/11526 [1:29:19<30:29, 1.63it/s] {'loss': 0.1299, 'grad_norm': 0.5868151187896729, 'learning_rate': 1.8994935045461005e-06, 'epoch': 2.22}
74%|███████▍ | 8548/11526 [1:29:19<30:29, 1.63it/s] 74%|███████▍ | 8549/11526 [1:29:20<30:28, 1.63it/s] {'loss': 0.1462, 'grad_norm': 0.6574309468269348, 'learning_rate': 1.8983056356894885e-06, 'epoch': 2.23}
74%|███████▍ | 8549/11526 [1:29:20<30:28, 1.63it/s] 74%|███████▍ | 8550/11526 [1:29:20<30:27, 1.63it/s] {'loss': 0.1338, 'grad_norm': 0.5190168619155884, 'learning_rate': 1.897118051337935e-06, 'epoch': 2.23}
74%|███████▍ | 8550/11526 [1:29:20<30:27, 1.63it/s] 74%|███████▍ | 8551/11526 [1:29:21<30:28, 1.63it/s] {'loss': 0.1652, 'grad_norm': 0.6947648525238037, 'learning_rate': 1.8959307516003666e-06, 'epoch': 2.23}
74%|███████▍ | 8551/11526 [1:29:21<30:28, 1.63it/s] 74%|███████▍ | 8552/11526 [1:29:22<30:30, 1.62it/s] {'loss': 0.3024, 'grad_norm': 0.6514483094215393, 'learning_rate': 1.8947437365856914e-06, 'epoch': 2.23}
74%|███████▍ | 8552/11526 [1:29:22<30:30, 1.62it/s] 74%|███████▍ | 8553/11526 [1:29:22<30:28, 1.63it/s] {'loss': 0.17, 'grad_norm': 0.6226493716239929, 'learning_rate': 1.8935570064027903e-06, 'epoch': 2.23}
74%|███████▍ | 8553/11526 [1:29:22<30:28, 1.63it/s] 74%|███████▍ | 8554/11526 [1:29:23<30:26, 1.63it/s] {'loss': 0.1332, 'grad_norm': 0.5718969106674194, 'learning_rate': 1.892370561160512e-06, 'epoch': 2.23}
74%|███████▍ | 8554/11526 [1:29:23<30:26, 1.63it/s] 74%|███████▍ | 8555/11526 [1:29:23<30:24, 1.63it/s] {'loss': 0.1552, 'grad_norm': 0.5283840894699097, 'learning_rate': 1.891184400967691e-06, 'epoch': 2.23}
74%|███████▍ | 8555/11526 [1:29:24<30:24, 1.63it/s] 74%|███████▍ | 8556/11526 [1:29:24<30:25, 1.63it/s] {'loss': 0.1499, 'grad_norm': 0.5646435618400574, 'learning_rate': 1.8899985259331238e-06, 'epoch': 2.23}
74%|███████▍ | 8556/11526 [1:29:24<30:25, 1.63it/s] 74%|███████▍ | 8557/11526 [1:29:25<30:25, 1.63it/s] {'loss': 0.1909, 'grad_norm': 0.7117833495140076, 'learning_rate': 1.8888129361655866e-06, 'epoch': 2.23}
74%|███████▍ | 8557/11526 [1:29:25<30:25, 1.63it/s] 74%|███████▍ | 8558/11526 [1:29:25<30:23, 1.63it/s] {'loss': 0.1918, 'grad_norm': 0.7893514633178711, 'learning_rate': 1.887627631773829e-06, 'epoch': 2.23}
74%|███████▍ | 8558/11526 [1:29:25<30:23, 1.63it/s] 74%|███████▍ | 8559/11526 [1:29:26<30:23, 1.63it/s] {'loss': 0.1327, 'grad_norm': 0.5011346340179443, 'learning_rate': 1.8864426128665736e-06, 'epoch': 2.23}
74%|███████▍ | 8559/11526 [1:29:26<30:23, 1.63it/s] 74%|███████▍ | 8560/11526 [1:29:27<30:23, 1.63it/s] {'loss': 0.1409, 'grad_norm': 0.5151160359382629, 'learning_rate': 1.8852578795525172e-06, 'epoch': 2.23}
74%|███████▍ | 8560/11526 [1:29:27<30:23, 1.63it/s] 74%|███████▍ | 8561/11526 [1:29:27<30:23, 1.63it/s] {'loss': 0.1119, 'grad_norm': 0.4462321400642395, 'learning_rate': 1.8840734319403325e-06, 'epoch': 2.23}
74%|███████▍ | 8561/11526 [1:29:27<30:23, 1.63it/s] 74%|███████▍ | 8562/11526 [1:29:28<30:24, 1.62it/s] {'loss': 0.152, 'grad_norm': 0.5768293142318726, 'learning_rate': 1.8828892701386574e-06, 'epoch': 2.23}
74%|███████▍ | 8562/11526 [1:29:28<30:24, 1.62it/s] 74%|███████▍ | 8563/11526 [1:29:28<30:22, 1.63it/s] {'loss': 0.1463, 'grad_norm': 0.5327551960945129, 'learning_rate': 1.8817053942561176e-06, 'epoch': 2.23}
74%|███████▍ | 8563/11526 [1:29:28<30:22, 1.63it/s] 74%|███████▍ | 8564/11526 [1:29:29<30:20, 1.63it/s] {'loss': 0.142, 'grad_norm': 0.576043426990509, 'learning_rate': 1.8805218044012996e-06, 'epoch': 2.23}
74%|███████▍ | 8564/11526 [1:29:29<30:20, 1.63it/s] 74%|███████▍ | 8565/11526 [1:29:30<30:21, 1.63it/s] {'loss': 0.1278, 'grad_norm': 0.5622308254241943, 'learning_rate': 1.8793385006827713e-06, 'epoch': 2.23}
74%|███████▍ | 8565/11526 [1:29:30<30:21, 1.63it/s] 74%|███████▍ | 8566/11526 [1:29:30<30:21, 1.62it/s] {'loss': 0.1861, 'grad_norm': 0.7252774834632874, 'learning_rate': 1.8781554832090726e-06, 'epoch': 2.23}
74%|███████▍ | 8566/11526 [1:29:30<30:21, 1.62it/s] 74%|███████▍ | 8567/11526 [1:29:31<30:23, 1.62it/s] {'loss': 0.1506, 'grad_norm': 0.5273796319961548, 'learning_rate': 1.8769727520887128e-06, 'epoch': 2.23}
74%|███████▍ | 8567/11526 [1:29:31<30:23, 1.62it/s] 74%|███████▍ | 8568/11526 [1:29:31<30:21, 1.62it/s] {'loss': 0.1567, 'grad_norm': 0.6240941882133484, 'learning_rate': 1.8757903074301826e-06, 'epoch': 2.23}
74%|███████▍ | 8568/11526 [1:29:32<30:21, 1.62it/s] 74%|███████▍ | 8569/11526 [1:29:32<30:19, 1.62it/s] {'loss': 0.129, 'grad_norm': 0.5244702696800232, 'learning_rate': 1.8746081493419432e-06, 'epoch': 2.23}
74%|███████▍ | 8569/11526 [1:29:32<30:19, 1.62it/s] 74%|███████▍ | 8570/11526 [1:29:33<30:17, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.6276236772537231, 'learning_rate': 1.8734262779324258e-06, 'epoch': 2.23}
74%|███████▍ | 8570/11526 [1:29:33<30:17, 1.63it/s] 74%|███████▍ | 8571/11526 [1:29:33<30:16, 1.63it/s] {'loss': 0.1628, 'grad_norm': 0.6007559895515442, 'learning_rate': 1.8722446933100391e-06, 'epoch': 2.23}
74%|███████▍ | 8571/11526 [1:29:33<30:16, 1.63it/s] 74%|███████▍ | 8572/11526 [1:29:34<30:17, 1.63it/s] {'loss': 0.1545, 'grad_norm': 0.5560280084609985, 'learning_rate': 1.871063395583166e-06, 'epoch': 2.23}
74%|███████▍ | 8572/11526 [1:29:34<30:17, 1.63it/s] 74%|███████▍ | 8573/11526 [1:29:35<30:16, 1.63it/s] {'loss': 0.1405, 'grad_norm': 0.5351617336273193, 'learning_rate': 1.8698823848601604e-06, 'epoch': 2.23}
74%|███████▍ | 8573/11526 [1:29:35<30:16, 1.63it/s] 74%|███████▍ | 8574/11526 [1:29:35<30:14, 1.63it/s] {'loss': 0.1564, 'grad_norm': 0.5572490096092224, 'learning_rate': 1.8687016612493542e-06, 'epoch': 2.23}
74%|███████▍ | 8574/11526 [1:29:35<30:14, 1.63it/s] 74%|███████▍ | 8575/11526 [1:29:36<30:13, 1.63it/s] {'loss': 0.1015, 'grad_norm': 0.41605880856513977, 'learning_rate': 1.8675212248590462e-06, 'epoch': 2.23}
74%|███████▍ | 8575/11526 [1:29:36<30:13, 1.63it/s] 74%|███████▍ | 8576/11526 [1:29:36<30:12, 1.63it/s] {'loss': 0.1383, 'grad_norm': 0.5599823594093323, 'learning_rate': 1.8663410757975131e-06, 'epoch': 2.23}
74%|███████▍ | 8576/11526 [1:29:36<30:12, 1.63it/s] 74%|███████▍ | 8577/11526 [1:29:37<30:14, 1.62it/s] {'loss': 0.1891, 'grad_norm': 0.6589640378952026, 'learning_rate': 1.865161214173009e-06, 'epoch': 2.23}
74%|███████▍ | 8577/11526 [1:29:37<30:14, 1.62it/s] 74%|███████▍ | 8578/11526 [1:29:38<30:11, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.6179867386817932, 'learning_rate': 1.8639816400937538e-06, 'epoch': 2.23}
74%|███████▍ | 8578/11526 [1:29:38<30:11, 1.63it/s] 74%|███████▍ | 8579/11526 [1:29:38<30:11, 1.63it/s] {'loss': 0.1389, 'grad_norm': 0.48194828629493713, 'learning_rate': 1.8628023536679458e-06, 'epoch': 2.23}
74%|███████▍ | 8579/11526 [1:29:38<30:11, 1.63it/s] 74%|███████▍ | 8580/11526 [1:29:39<30:10, 1.63it/s] {'loss': 0.1394, 'grad_norm': 0.5579119324684143, 'learning_rate': 1.8616233550037555e-06, 'epoch': 2.23}
74%|███████▍ | 8580/11526 [1:29:39<30:10, 1.63it/s] 74%|███████▍ | 8581/11526 [1:29:39<30:09, 1.63it/s] {'loss': 0.1709, 'grad_norm': 0.6641929149627686, 'learning_rate': 1.8604446442093276e-06, 'epoch': 2.23}
74%|███████▍ | 8581/11526 [1:29:40<30:09, 1.63it/s] 74%|███████▍ | 8582/11526 [1:29:40<30:10, 1.63it/s] {'loss': 0.1421, 'grad_norm': 0.5318847894668579, 'learning_rate': 1.8592662213927826e-06, 'epoch': 2.23}
74%|███████▍ | 8582/11526 [1:29:40<30:10, 1.63it/s] 74%|███████▍ | 8583/11526 [1:29:41<30:08, 1.63it/s] {'loss': 0.1056, 'grad_norm': 0.5149203538894653, 'learning_rate': 1.8580880866622074e-06, 'epoch': 2.23}
74%|███████▍ | 8583/11526 [1:29:41<30:08, 1.63it/s] 74%|███████▍ | 8584/11526 [1:29:41<30:07, 1.63it/s] {'loss': 0.1047, 'grad_norm': 0.43430978059768677, 'learning_rate': 1.85691024012567e-06, 'epoch': 2.23}
74%|███████▍ | 8584/11526 [1:29:41<30:07, 1.63it/s] 74%|███████▍ | 8585/11526 [1:29:42<30:07, 1.63it/s] {'loss': 0.1346, 'grad_norm': 0.5482231378555298, 'learning_rate': 1.855732681891209e-06, 'epoch': 2.23}
74%|███████▍ | 8585/11526 [1:29:42<30:07, 1.63it/s] 74%|███████▍ | 8586/11526 [1:29:43<30:07, 1.63it/s] {'loss': 0.1231, 'grad_norm': 0.501240074634552, 'learning_rate': 1.8545554120668362e-06, 'epoch': 2.23}
74%|███████▍ | 8586/11526 [1:29:43<30:07, 1.63it/s] 75%|███████▍ | 8587/11526 [1:29:43<30:06, 1.63it/s] {'loss': 0.177, 'grad_norm': 0.6311327815055847, 'learning_rate': 1.853378430760538e-06, 'epoch': 2.24}
75%|███████▍ | 8587/11526 [1:29:43<30:06, 1.63it/s] 75%|███████▍ | 8588/11526 [1:29:44<30:04, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.6143361926078796, 'learning_rate': 1.8522017380802754e-06, 'epoch': 2.24}
75%|███████▍ | 8588/11526 [1:29:44<30:04, 1.63it/s] 75%|███████▍ | 8589/11526 [1:29:44<30:04, 1.63it/s] {'loss': 0.1584, 'grad_norm': 0.6226780414581299, 'learning_rate': 1.8510253341339762e-06, 'epoch': 2.24}
75%|███████▍ | 8589/11526 [1:29:44<30:04, 1.63it/s] 75%|███████▍ | 8590/11526 [1:29:45<30:03, 1.63it/s] {'loss': 0.213, 'grad_norm': 0.8026469945907593, 'learning_rate': 1.8498492190295536e-06, 'epoch': 2.24}
75%|███████▍ | 8590/11526 [1:29:45<30:03, 1.63it/s] 75%|███████▍ | 8591/11526 [1:29:46<30:03, 1.63it/s] {'loss': 0.1404, 'grad_norm': 0.5349567532539368, 'learning_rate': 1.8486733928748824e-06, 'epoch': 2.24}
75%|███████▍ | 8591/11526 [1:29:46<30:03, 1.63it/s] 75%|███████▍ | 8592/11526 [1:29:46<30:02, 1.63it/s] {'loss': 0.1824, 'grad_norm': 0.7034109830856323, 'learning_rate': 1.8474978557778183e-06, 'epoch': 2.24}
75%|███████▍ | 8592/11526 [1:29:46<30:02, 1.63it/s] 75%|███████▍ | 8593/11526 [1:29:47<30:02, 1.63it/s] {'loss': 0.1455, 'grad_norm': 0.5738634467124939, 'learning_rate': 1.8463226078461876e-06, 'epoch': 2.24}
75%|███████▍ | 8593/11526 [1:29:47<30:02, 1.63it/s] 75%|███████▍ | 8594/11526 [1:29:47<30:03, 1.63it/s] {'loss': 0.1441, 'grad_norm': 0.5624711513519287, 'learning_rate': 1.8451476491877913e-06, 'epoch': 2.24}
75%|███████▍ | 8594/11526 [1:29:48<30:03, 1.63it/s] 75%|███████▍ | 8595/11526 [1:29:48<30:01, 1.63it/s] {'loss': 0.15, 'grad_norm': 0.5477032661437988, 'learning_rate': 1.8439729799104022e-06, 'epoch': 2.24}
75%|███████▍ | 8595/11526 [1:29:48<30:01, 1.63it/s] 75%|███████▍ | 8596/11526 [1:29:49<30:00, 1.63it/s] {'loss': 0.1614, 'grad_norm': 0.565072774887085, 'learning_rate': 1.8427986001217713e-06, 'epoch': 2.24}
75%|███████▍ | 8596/11526 [1:29:49<30:00, 1.63it/s] 75%|███████▍ | 8597/11526 [1:29:49<30:02, 1.63it/s] {'loss': 0.1592, 'grad_norm': 0.6138553619384766, 'learning_rate': 1.8416245099296137e-06, 'epoch': 2.24}
75%|███████▍ | 8597/11526 [1:29:49<30:02, 1.63it/s] 75%|███████▍ | 8598/11526 [1:29:50<30:00, 1.63it/s] {'loss': 0.1796, 'grad_norm': 0.7372974157333374, 'learning_rate': 1.8404507094416273e-06, 'epoch': 2.24}
75%|███████▍ | 8598/11526 [1:29:50<30:00, 1.63it/s] 75%|███████▍ | 8599/11526 [1:29:50<29:59, 1.63it/s] {'loss': 0.1506, 'grad_norm': 0.6308121085166931, 'learning_rate': 1.8392771987654785e-06, 'epoch': 2.24}
75%|███████▍ | 8599/11526 [1:29:51<29:59, 1.63it/s] 75%|███████▍ | 8600/11526 [1:29:51<29:58, 1.63it/s] {'loss': 0.1951, 'grad_norm': 0.7891608476638794, 'learning_rate': 1.838103978008809e-06, 'epoch': 2.24}
75%|███████▍ | 8600/11526 [1:29:51<29:58, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.35it/s]
31%|███ | 4/13 [00:00<00:01, 8.40it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.80it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.18it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.78it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.76it/s]
{'eval_loss': 0.5459461808204651, 'eval_runtime': 1.9534, 'eval_samples_per_second': 102.387, 'eval_steps_per_second': 6.655, 'epoch': 2.24}
 75%|███████▍ | 8601/11526 [1:29:54<58:35, 1.20s/it] {'loss': 0.1591, 'grad_norm': 0.6245896816253662, 'learning_rate': 1.8369310472792329e-06, 'epoch': 2.24}
75%|███████▍ | 8601/11526 [1:29:54<58:35, 1.20s/it] 75%|███████▍ | 8602/11526 [1:29:54<49:57, 1.03s/it] {'loss': 0.168, 'grad_norm': 0.668510913848877, 'learning_rate': 1.83575840668434e-06, 'epoch': 2.24}
75%|███████▍ | 8602/11526 [1:29:54<49:57, 1.03s/it] 75%|███████▍ | 8603/11526 [1:29:55<43:55, 1.11it/s] {'loss': 0.1492, 'grad_norm': 0.5749463438987732, 'learning_rate': 1.8345860563316865e-06, 'epoch': 2.24}
75%|███████▍ | 8603/11526 [1:29:55<43:55, 1.11it/s] 75%|███████▍ | 8604/11526 [1:29:56<39:43, 1.23it/s] {'loss': 0.1243, 'grad_norm': 0.5680091977119446, 'learning_rate': 1.8334139963288138e-06, 'epoch': 2.24}
75%|███████▍ | 8604/11526 [1:29:56<39:43, 1.23it/s] 75%|███████▍ | 8605/11526 [1:29:56<36:47, 1.32it/s] {'loss': 0.1709, 'grad_norm': 0.6319881677627563, 'learning_rate': 1.8322422267832246e-06, 'epoch': 2.24}
75%|███████▍ | 8605/11526 [1:29:56<36:47, 1.32it/s] 75%|███████▍ | 8606/11526 [1:29:57<34:43, 1.40it/s] {'loss': 0.1363, 'grad_norm': 0.5302412509918213, 'learning_rate': 1.831070747802402e-06, 'epoch': 2.24}
75%|███████▍ | 8606/11526 [1:29:57<34:43, 1.40it/s] 75%|███████▍ | 8607/11526 [1:29:57<33:18, 1.46it/s] {'loss': 0.1573, 'grad_norm': 0.641804575920105, 'learning_rate': 1.8298995594938012e-06, 'epoch': 2.24}
75%|███████▍ | 8607/11526 [1:29:57<33:18, 1.46it/s] 75%|███████▍ | 8608/11526 [1:29:58<32:16, 1.51it/s] {'loss': 0.1436, 'grad_norm': 0.5695658922195435, 'learning_rate': 1.8287286619648504e-06, 'epoch': 2.24}
75%|███████▍ | 8608/11526 [1:29:58<32:16, 1.51it/s] 75%|███████▍ | 8609/11526 [1:29:59<31:32, 1.54it/s] {'loss': 0.1367, 'grad_norm': 0.5249311923980713, 'learning_rate': 1.82755805532295e-06, 'epoch': 2.24}
75%|███████▍ | 8609/11526 [1:29:59<31:32, 1.54it/s] 75%|███████▍ | 8610/11526 [1:29:59<31:01, 1.57it/s] {'loss': 0.1205, 'grad_norm': 0.5002131462097168, 'learning_rate': 1.8263877396754765e-06, 'epoch': 2.24}
75%|███████▍ | 8610/11526 [1:29:59<31:01, 1.57it/s] 75%|███████▍ | 8611/11526 [1:30:00<30:39, 1.58it/s] {'loss': 0.1399, 'grad_norm': 0.5214263796806335, 'learning_rate': 1.825217715129774e-06, 'epoch': 2.24}
75%|███████▍ | 8611/11526 [1:30:00<30:39, 1.58it/s] 75%|███████▍ | 8612/11526 [1:30:00<30:33, 1.59it/s] {'loss': 0.124, 'grad_norm': 0.48691102862358093, 'learning_rate': 1.8240479817931694e-06, 'epoch': 2.24}
75%|███████▍ | 8612/11526 [1:30:01<30:33, 1.59it/s] 75%|███████▍ | 8613/11526 [1:30:01<30:19, 1.60it/s] {'loss': 0.1658, 'grad_norm': 0.5573247075080872, 'learning_rate': 1.8228785397729526e-06, 'epoch': 2.24}
75%|███████▍ | 8613/11526 [1:30:01<30:19, 1.60it/s] 75%|███████▍ | 8614/11526 [1:30:02<30:09, 1.61it/s] {'loss': 0.1886, 'grad_norm': 0.5299180150032043, 'learning_rate': 1.8217093891763933e-06, 'epoch': 2.24}
75%|███████▍ | 8614/11526 [1:30:02<30:09, 1.61it/s] 75%|███████▍ | 8615/11526 [1:30:02<30:03, 1.61it/s] {'loss': 0.1697, 'grad_norm': 0.6093220114707947, 'learning_rate': 1.8205405301107343e-06, 'epoch': 2.24}
75%|███████▍ | 8615/11526 [1:30:02<30:03, 1.61it/s] 75%|███████▍ | 8616/11526 [1:30:03<29:58, 1.62it/s] {'loss': 0.1383, 'grad_norm': 0.5669288039207458, 'learning_rate': 1.8193719626831846e-06, 'epoch': 2.24}
75%|███████▍ | 8616/11526 [1:30:03<29:58, 1.62it/s] 75%|███████▍ | 8617/11526 [1:30:04<29:57, 1.62it/s] {'loss': 0.1039, 'grad_norm': 0.5215912461280823, 'learning_rate': 1.8182036870009378e-06, 'epoch': 2.24}
75%|███████▍ | 8617/11526 [1:30:04<29:57, 1.62it/s] 75%|███████▍ | 8618/11526 [1:30:04<29:54, 1.62it/s] {'loss': 0.1642, 'grad_norm': 0.599324107170105, 'learning_rate': 1.8170357031711538e-06, 'epoch': 2.24}
75%|███████▍ | 8618/11526 [1:30:04<29:54, 1.62it/s] 75%|███████▍ | 8619/11526 [1:30:05<29:50, 1.62it/s] {'loss': 0.141, 'grad_norm': 0.5692152380943298, 'learning_rate': 1.8158680113009637e-06, 'epoch': 2.24}
75%|███████▍ | 8619/11526 [1:30:05<29:50, 1.62it/s] 75%|███████▍ | 8620/11526 [1:30:05<29:49, 1.62it/s] {'loss': 0.1301, 'grad_norm': 0.5144198536872864, 'learning_rate': 1.8147006114974764e-06, 'epoch': 2.24}
75%|███████▍ | 8620/11526 [1:30:06<29:49, 1.62it/s] 75%|███████▍ | 8621/11526 [1:30:06<29:47, 1.63it/s] {'loss': 0.1281, 'grad_norm': 0.5797700881958008, 'learning_rate': 1.8135335038677731e-06, 'epoch': 2.24}
75%|███████▍ | 8621/11526 [1:30:06<29:47, 1.63it/s] 75%|███████▍ | 8622/11526 [1:30:07<29:52, 1.62it/s] {'loss': 0.1604, 'grad_norm': 0.6062191128730774, 'learning_rate': 1.8123666885189066e-06, 'epoch': 2.24}
75%|███████▍ | 8622/11526 [1:30:07<29:52, 1.62it/s] 75%|███████▍ | 8623/11526 [1:30:07<29:48, 1.62it/s] {'loss': 0.1629, 'grad_norm': 0.5533252358436584, 'learning_rate': 1.8112001655579065e-06, 'epoch': 2.24}
75%|███████▍ | 8623/11526 [1:30:07<29:48, 1.62it/s] 75%|███████▍ | 8624/11526 [1:30:08<29:46, 1.62it/s] {'loss': 0.1237, 'grad_norm': 0.5043208599090576, 'learning_rate': 1.8100339350917684e-06, 'epoch': 2.24}
75%|███████▍ | 8624/11526 [1:30:08<29:46, 1.62it/s] 75%|███████▍ | 8625/11526 [1:30:08<29:44, 1.63it/s] {'loss': 0.1588, 'grad_norm': 0.5204381346702576, 'learning_rate': 1.808867997227467e-06, 'epoch': 2.24}
75%|███████▍ | 8625/11526 [1:30:09<29:44, 1.63it/s] 75%|███████▍ | 8626/11526 [1:30:09<29:42, 1.63it/s] {'loss': 0.1444, 'grad_norm': 0.5572690963745117, 'learning_rate': 1.8077023520719522e-06, 'epoch': 2.25}
75%|███████▍ | 8626/11526 [1:30:09<29:42, 1.63it/s] 75%|███████▍ | 8627/11526 [1:30:10<29:44, 1.62it/s] {'loss': 0.1381, 'grad_norm': 0.5013654232025146, 'learning_rate': 1.8065369997321402e-06, 'epoch': 2.25}
75%|███████▍ | 8627/11526 [1:30:10<29:44, 1.62it/s] 75%|███████▍ | 8628/11526 [1:30:10<29:42, 1.63it/s] {'loss': 0.1483, 'grad_norm': 0.580693244934082, 'learning_rate': 1.8053719403149245e-06, 'epoch': 2.25}
75%|███████▍ | 8628/11526 [1:30:10<29:42, 1.63it/s] 75%|███████▍ | 8629/11526 [1:30:11<29:40, 1.63it/s] {'loss': 0.1677, 'grad_norm': 0.6440860629081726, 'learning_rate': 1.8042071739271704e-06, 'epoch': 2.25}
75%|███████▍ | 8629/11526 [1:30:11<29:40, 1.63it/s] 75%|███████▍ | 8630/11526 [1:30:12<29:39, 1.63it/s] {'loss': 0.1345, 'grad_norm': 0.5200802087783813, 'learning_rate': 1.8030427006757184e-06, 'epoch': 2.25}
75%|███████▍ | 8630/11526 [1:30:12<29:39, 1.63it/s] 75%|███████▍ | 8631/11526 [1:30:12<29:38, 1.63it/s] {'loss': 0.1366, 'grad_norm': 0.4959962069988251, 'learning_rate': 1.8018785206673811e-06, 'epoch': 2.25}
75%|███████▍ | 8631/11526 [1:30:12<29:38, 1.63it/s] 75%|███████▍ | 8632/11526 [1:30:13<29:40, 1.63it/s] {'loss': 0.1314, 'grad_norm': 0.4742683172225952, 'learning_rate': 1.800714634008941e-06, 'epoch': 2.25}
75%|███████▍ | 8632/11526 [1:30:13<29:40, 1.63it/s] 75%|███████▍ | 8633/11526 [1:30:13<29:38, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.5472671985626221, 'learning_rate': 1.7995510408071582e-06, 'epoch': 2.25}
75%|███████▍ | 8633/11526 [1:30:13<29:38, 1.63it/s] 75%|███████▍ | 8634/11526 [1:30:14<29:37, 1.63it/s] {'loss': 0.1636, 'grad_norm': 0.6458108425140381, 'learning_rate': 1.7983877411687627e-06, 'epoch': 2.25}
75%|███████▍ | 8634/11526 [1:30:14<29:37, 1.63it/s] 75%|███████▍ | 8635/11526 [1:30:15<29:35, 1.63it/s] {'loss': 0.1814, 'grad_norm': 0.6678019762039185, 'learning_rate': 1.7972247352004607e-06, 'epoch': 2.25}
75%|███████▍ | 8635/11526 [1:30:15<29:35, 1.63it/s] 75%|███████▍ | 8636/11526 [1:30:15<29:35, 1.63it/s] {'loss': 0.2098, 'grad_norm': 0.7481626272201538, 'learning_rate': 1.7960620230089292e-06, 'epoch': 2.25}
75%|███████▍ | 8636/11526 [1:30:15<29:35, 1.63it/s] 75%|███████▍ | 8637/11526 [1:30:16<29:37, 1.63it/s] {'loss': 0.2606, 'grad_norm': 1.0377216339111328, 'learning_rate': 1.7948996047008198e-06, 'epoch': 2.25}
75%|███████▍ | 8637/11526 [1:30:16<29:37, 1.63it/s] 75%|███████▍ | 8638/11526 [1:30:16<29:35, 1.63it/s] {'loss': 0.1436, 'grad_norm': 0.5988994240760803, 'learning_rate': 1.7937374803827518e-06, 'epoch': 2.25}
75%|███████▍ | 8638/11526 [1:30:17<29:35, 1.63it/s] 75%|███████▍ | 8639/11526 [1:30:17<29:34, 1.63it/s] {'loss': 0.2178, 'grad_norm': 0.5687957406044006, 'learning_rate': 1.7925756501613284e-06, 'epoch': 2.25}
75%|███████▍ | 8639/11526 [1:30:17<29:34, 1.63it/s] 75%|███████▍ | 8640/11526 [1:30:18<29:32, 1.63it/s] {'loss': 0.1461, 'grad_norm': 0.617480993270874, 'learning_rate': 1.7914141141431141e-06, 'epoch': 2.25}
75%|███████▍ | 8640/11526 [1:30:18<29:32, 1.63it/s] 75%|███████▍ | 8641/11526 [1:30:18<29:32, 1.63it/s] {'loss': 0.1963, 'grad_norm': 0.6989611983299255, 'learning_rate': 1.7902528724346534e-06, 'epoch': 2.25}
75%|███████▍ | 8641/11526 [1:30:18<29:32, 1.63it/s] 75%|███████▍ | 8642/11526 [1:30:19<29:34, 1.63it/s] {'loss': 0.1516, 'grad_norm': 0.58207768201828, 'learning_rate': 1.7890919251424637e-06, 'epoch': 2.25}
75%|███████▍ | 8642/11526 [1:30:19<29:34, 1.63it/s] 75%|███████▍ | 8643/11526 [1:30:20<29:33, 1.63it/s] {'loss': 0.1622, 'grad_norm': 0.5783748626708984, 'learning_rate': 1.7879312723730284e-06, 'epoch': 2.25}
75%|███████▍ | 8643/11526 [1:30:20<29:33, 1.63it/s] 75%|███████▍ | 8644/11526 [1:30:20<29:31, 1.63it/s] {'loss': 0.1617, 'grad_norm': 0.5708690285682678, 'learning_rate': 1.7867709142328143e-06, 'epoch': 2.25}
75%|███████▍ | 8644/11526 [1:30:20<29:31, 1.63it/s] 75%|███████▌ | 8645/11526 [1:30:21<29:30, 1.63it/s] {'loss': 0.1597, 'grad_norm': 0.5893325805664062, 'learning_rate': 1.7856108508282566e-06, 'epoch': 2.25}
75%|███████▌ | 8645/11526 [1:30:21<29:30, 1.63it/s] 75%|███████▌ | 8646/11526 [1:30:21<29:29, 1.63it/s] {'loss': 0.1573, 'grad_norm': 0.6776777505874634, 'learning_rate': 1.784451082265759e-06, 'epoch': 2.25}
75%|███████▌ | 8646/11526 [1:30:21<29:29, 1.63it/s] 75%|███████▌ | 8647/11526 [1:30:22<29:29, 1.63it/s] {'loss': 0.1118, 'grad_norm': 0.45135125517845154, 'learning_rate': 1.7832916086517033e-06, 'epoch': 2.25}
75%|███████▌ | 8647/11526 [1:30:22<29:29, 1.63it/s] 75%|███████▌ | 8648/11526 [1:30:23<29:29, 1.63it/s] {'loss': 0.1431, 'grad_norm': 0.6715752482414246, 'learning_rate': 1.7821324300924443e-06, 'epoch': 2.25}
75%|███████▌ | 8648/11526 [1:30:23<29:29, 1.63it/s] 75%|███████▌ | 8649/11526 [1:30:23<29:28, 1.63it/s] {'loss': 0.1693, 'grad_norm': 0.6128302812576294, 'learning_rate': 1.7809735466943073e-06, 'epoch': 2.25}
75%|███████▌ | 8649/11526 [1:30:23<29:28, 1.63it/s] 75%|███████▌ | 8650/11526 [1:30:24<29:27, 1.63it/s] {'loss': 0.1416, 'grad_norm': 0.5316421985626221, 'learning_rate': 1.7798149585635938e-06, 'epoch': 2.25}
75%|███████▌ | 8650/11526 [1:30:24<29:27, 1.63it/s] 75%|███████▌ | 8651/11526 [1:30:24<29:28, 1.63it/s] {'loss': 0.1406, 'grad_norm': 0.5814611315727234, 'learning_rate': 1.7786566658065723e-06, 'epoch': 2.25}
75%|███████▌ | 8651/11526 [1:30:25<29:28, 1.63it/s] 75%|███████▌ | 8652/11526 [1:30:25<29:29, 1.62it/s] {'loss': 0.1537, 'grad_norm': 0.6084991097450256, 'learning_rate': 1.7774986685294894e-06, 'epoch': 2.25}
75%|███████▌ | 8652/11526 [1:30:25<29:29, 1.62it/s] 75%|███████▌ | 8653/11526 [1:30:26<29:29, 1.62it/s] {'loss': 0.1365, 'grad_norm': 0.5112936496734619, 'learning_rate': 1.7763409668385666e-06, 'epoch': 2.25}
75%|███████▌ | 8653/11526 [1:30:26<29:29, 1.62it/s] 75%|███████▌ | 8654/11526 [1:30:26<29:27, 1.62it/s] {'loss': 0.1385, 'grad_norm': 0.6009647250175476, 'learning_rate': 1.7751835608399903e-06, 'epoch': 2.25}
75%|███████▌ | 8654/11526 [1:30:26<29:27, 1.62it/s] 75%|███████▌ | 8655/11526 [1:30:27<29:25, 1.63it/s] {'loss': 0.1884, 'grad_norm': 0.7109799385070801, 'learning_rate': 1.774026450639927e-06, 'epoch': 2.25}
75%|███████▌ | 8655/11526 [1:30:27<29:25, 1.63it/s] 75%|███████▌ | 8656/11526 [1:30:28<29:24, 1.63it/s] {'loss': 0.1394, 'grad_norm': 0.5477400422096252, 'learning_rate': 1.772869636344512e-06, 'epoch': 2.25}
75%|███████▌ | 8656/11526 [1:30:28<29:24, 1.63it/s] 75%|███████▌ | 8657/11526 [1:30:28<29:25, 1.62it/s] {'loss': 0.1138, 'grad_norm': 0.4510226547718048, 'learning_rate': 1.7717131180598556e-06, 'epoch': 2.25}
75%|███████▌ | 8657/11526 [1:30:28<29:25, 1.62it/s] 75%|███████▌ | 8658/11526 [1:30:29<29:23, 1.63it/s] {'loss': 0.2002, 'grad_norm': 0.6530742049217224, 'learning_rate': 1.7705568958920426e-06, 'epoch': 2.25}
75%|███████▌ | 8658/11526 [1:30:29<29:23, 1.63it/s] 75%|███████▌ | 8659/11526 [1:30:29<29:22, 1.63it/s] {'loss': 0.1357, 'grad_norm': 0.5534534454345703, 'learning_rate': 1.7694009699471238e-06, 'epoch': 2.25}
75%|███████▌ | 8659/11526 [1:30:29<29:22, 1.63it/s] 75%|███████▌ | 8660/11526 [1:30:30<29:21, 1.63it/s] {'loss': 0.1502, 'grad_norm': 0.5841659307479858, 'learning_rate': 1.7682453403311272e-06, 'epoch': 2.25}
75%|███████▌ | 8660/11526 [1:30:30<29:21, 1.63it/s] 75%|███████▌ | 8661/11526 [1:30:31<29:20, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.5497185587882996, 'learning_rate': 1.7670900071500596e-06, 'epoch': 2.25}
75%|███████▌ | 8661/11526 [1:30:31<29:20, 1.63it/s] 75%|███████▌ | 8662/11526 [1:30:31<29:21, 1.63it/s] {'loss': 0.1142, 'grad_norm': 0.5362561941146851, 'learning_rate': 1.76593497050989e-06, 'epoch': 2.25}
75%|███████▌ | 8662/11526 [1:30:31<29:21, 1.63it/s] 75%|███████▌ | 8663/11526 [1:30:32<29:20, 1.63it/s] {'loss': 0.1814, 'grad_norm': 0.6205732226371765, 'learning_rate': 1.764780230516565e-06, 'epoch': 2.25}
75%|███████▌ | 8663/11526 [1:30:32<29:20, 1.63it/s] 75%|███████▌ | 8664/11526 [1:30:32<29:19, 1.63it/s] {'loss': 0.1338, 'grad_norm': 0.5571140646934509, 'learning_rate': 1.7636257872760072e-06, 'epoch': 2.26}
75%|███████▌ | 8664/11526 [1:30:33<29:19, 1.63it/s] 75%|███████▌ | 8665/11526 [1:30:33<29:17, 1.63it/s] {'loss': 0.1397, 'grad_norm': 0.46645689010620117, 'learning_rate': 1.7624716408941029e-06, 'epoch': 2.26}
75%|███████▌ | 8665/11526 [1:30:33<29:17, 1.63it/s] 75%|███████▌ | 8666/11526 [1:30:34<29:17, 1.63it/s] {'loss': 0.2667, 'grad_norm': 0.5412365794181824, 'learning_rate': 1.7613177914767237e-06, 'epoch': 2.26}
75%|███████▌ | 8666/11526 [1:30:34<29:17, 1.63it/s] 75%|███████▌ | 8667/11526 [1:30:34<29:20, 1.62it/s] {'loss': 0.1809, 'grad_norm': 0.6731945872306824, 'learning_rate': 1.7601642391297024e-06, 'epoch': 2.26}
75%|███████▌ | 8667/11526 [1:30:34<29:20, 1.62it/s] 75%|███████▌ | 8668/11526 [1:30:35<29:19, 1.62it/s] {'loss': 0.1885, 'grad_norm': 0.6154832243919373, 'learning_rate': 1.759010983958851e-06, 'epoch': 2.26}
75%|███████▌ | 8668/11526 [1:30:35<29:19, 1.62it/s] 75%|███████▌ | 8669/11526 [1:30:36<29:17, 1.63it/s] {'loss': 0.1567, 'grad_norm': 0.5714876651763916, 'learning_rate': 1.7578580260699524e-06, 'epoch': 2.26}
75%|███████▌ | 8669/11526 [1:30:36<29:17, 1.63it/s] 75%|███████▌ | 8670/11526 [1:30:36<29:16, 1.63it/s] {'loss': 0.1457, 'grad_norm': 0.5784781575202942, 'learning_rate': 1.7567053655687633e-06, 'epoch': 2.26}
75%|███████▌ | 8670/11526 [1:30:36<29:16, 1.63it/s] 75%|███████▌ | 8671/11526 [1:30:37<29:14, 1.63it/s] {'loss': 0.153, 'grad_norm': 0.5161809325218201, 'learning_rate': 1.7555530025610113e-06, 'epoch': 2.26}
75%|███████▌ | 8671/11526 [1:30:37<29:14, 1.63it/s] 75%|███████▌ | 8672/11526 [1:30:37<29:17, 1.62it/s] {'loss': 0.1394, 'grad_norm': 0.4932239055633545, 'learning_rate': 1.7544009371523996e-06, 'epoch': 2.26}
75%|███████▌ | 8672/11526 [1:30:37<29:17, 1.62it/s] 75%|███████▌ | 8673/11526 [1:30:38<29:15, 1.63it/s] {'loss': 0.1655, 'grad_norm': 0.676135241985321, 'learning_rate': 1.7532491694485988e-06, 'epoch': 2.26}
75%|███████▌ | 8673/11526 [1:30:38<29:15, 1.63it/s] 75%|███████▌ | 8674/11526 [1:30:39<29:13, 1.63it/s] {'loss': 0.1409, 'grad_norm': 0.5618002414703369, 'learning_rate': 1.752097699555258e-06, 'epoch': 2.26}
75%|███████▌ | 8674/11526 [1:30:39<29:13, 1.63it/s] 75%|███████▌ | 8675/11526 [1:30:39<29:12, 1.63it/s] {'loss': 0.1611, 'grad_norm': 0.6216199994087219, 'learning_rate': 1.7509465275779953e-06, 'epoch': 2.26}
75%|███████▌ | 8675/11526 [1:30:39<29:12, 1.63it/s] 75%|███████▌ | 8676/11526 [1:30:40<29:14, 1.62it/s] {'loss': 0.1776, 'grad_norm': 0.6876410841941833, 'learning_rate': 1.7497956536224036e-06, 'epoch': 2.26}
75%|███████▌ | 8676/11526 [1:30:40<29:14, 1.62it/s] 75%|███████▌ | 8677/11526 [1:30:40<29:21, 1.62it/s] {'loss': 0.1663, 'grad_norm': 0.5140557885169983, 'learning_rate': 1.7486450777940479e-06, 'epoch': 2.26}
75%|███████▌ | 8677/11526 [1:30:41<29:21, 1.62it/s] 75%|███████▌ | 8678/11526 [1:30:41<29:18, 1.62it/s] {'loss': 0.1442, 'grad_norm': 0.5673971772193909, 'learning_rate': 1.7474948001984649e-06, 'epoch': 2.26}
75%|███████▌ | 8678/11526 [1:30:41<29:18, 1.62it/s] 75%|███████▌ | 8679/11526 [1:30:42<29:18, 1.62it/s] {'loss': 0.1294, 'grad_norm': 0.540118396282196, 'learning_rate': 1.7463448209411649e-06, 'epoch': 2.26}
75%|███████▌ | 8679/11526 [1:30:42<29:18, 1.62it/s] 75%|███████▌ | 8680/11526 [1:30:42<29:14, 1.62it/s] {'loss': 0.1892, 'grad_norm': 0.614761471748352, 'learning_rate': 1.7451951401276318e-06, 'epoch': 2.26}
75%|███████▌ | 8680/11526 [1:30:42<29:14, 1.62it/s] 75%|███████▌ | 8681/11526 [1:30:43<29:13, 1.62it/s] {'loss': 0.1858, 'grad_norm': 0.6380756497383118, 'learning_rate': 1.744045757863318e-06, 'epoch': 2.26}
75%|███████▌ | 8681/11526 [1:30:43<29:13, 1.62it/s] 75%|███████▌ | 8682/11526 [1:30:44<29:14, 1.62it/s] {'loss': 0.1337, 'grad_norm': 0.4879712462425232, 'learning_rate': 1.742896674253653e-06, 'epoch': 2.26}
75%|███████▌ | 8682/11526 [1:30:44<29:14, 1.62it/s] 75%|███████▌ | 8683/11526 [1:30:44<29:12, 1.62it/s] {'loss': 0.245, 'grad_norm': 0.7590476274490356, 'learning_rate': 1.7417478894040374e-06, 'epoch': 2.26}
75%|███████▌ | 8683/11526 [1:30:44<29:12, 1.62it/s] 75%|███████▌ | 8684/11526 [1:30:45<29:10, 1.62it/s] {'loss': 0.1494, 'grad_norm': 0.5990495085716248, 'learning_rate': 1.7405994034198437e-06, 'epoch': 2.26}
75%|███████▌ | 8684/11526 [1:30:45<29:10, 1.62it/s] 75%|███████▌ | 8685/11526 [1:30:45<29:08, 1.62it/s] {'loss': 0.1855, 'grad_norm': 0.6685135364532471, 'learning_rate': 1.7394512164064182e-06, 'epoch': 2.26}
75%|███████▌ | 8685/11526 [1:30:45<29:08, 1.62it/s] 75%|███████▌ | 8686/11526 [1:30:46<29:06, 1.63it/s] {'loss': 0.1279, 'grad_norm': 0.5200831294059753, 'learning_rate': 1.7383033284690804e-06, 'epoch': 2.26}
75%|███████▌ | 8686/11526 [1:30:46<29:06, 1.63it/s] 75%|███████▌ | 8687/11526 [1:30:47<29:15, 1.62it/s] {'loss': 0.1364, 'grad_norm': 0.5644316673278809, 'learning_rate': 1.7371557397131161e-06, 'epoch': 2.26}
75%|███████▌ | 8687/11526 [1:30:47<29:15, 1.62it/s] 75%|███████▌ | 8688/11526 [1:30:47<29:11, 1.62it/s] {'loss': 0.1591, 'grad_norm': 0.6000602841377258, 'learning_rate': 1.7360084502437957e-06, 'epoch': 2.26}
75%|███████▌ | 8688/11526 [1:30:47<29:11, 1.62it/s] 75%|███████▌ | 8689/11526 [1:30:48<29:07, 1.62it/s] {'loss': 0.138, 'grad_norm': 0.535276472568512, 'learning_rate': 1.73486146016635e-06, 'epoch': 2.26}
75%|███████▌ | 8689/11526 [1:30:48<29:07, 1.62it/s] 75%|███████▌ | 8690/11526 [1:30:48<29:06, 1.62it/s] {'loss': 0.1785, 'grad_norm': 0.6605271100997925, 'learning_rate': 1.733714769585989e-06, 'epoch': 2.26}
75%|███████▌ | 8690/11526 [1:30:49<29:06, 1.62it/s] 75%|███████▌ | 8691/11526 [1:30:49<29:04, 1.63it/s] {'loss': 0.2122, 'grad_norm': 0.6913619041442871, 'learning_rate': 1.732568378607895e-06, 'epoch': 2.26}
75%|███████▌ | 8691/11526 [1:30:49<29:04, 1.63it/s] 75%|███████▌ | 8692/11526 [1:30:50<29:06, 1.62it/s] {'loss': 0.1416, 'grad_norm': 0.5008960962295532, 'learning_rate': 1.7314222873372177e-06, 'epoch': 2.26}
75%|███████▌ | 8692/11526 [1:30:50<29:06, 1.62it/s] 75%|███████▌ | 8693/11526 [1:30:50<29:04, 1.62it/s] {'loss': 0.176, 'grad_norm': 0.6157169342041016, 'learning_rate': 1.730276495879087e-06, 'epoch': 2.26}
75%|███████▌ | 8693/11526 [1:30:50<29:04, 1.62it/s] 75%|███████▌ | 8694/11526 [1:30:51<29:03, 1.62it/s] {'loss': 0.1699, 'grad_norm': 0.722813606262207, 'learning_rate': 1.7291310043386011e-06, 'epoch': 2.26}
75%|███████▌ | 8694/11526 [1:30:51<29:03, 1.62it/s] 75%|███████▌ | 8695/11526 [1:30:52<29:01, 1.63it/s] {'loss': 0.1439, 'grad_norm': 0.6820262670516968, 'learning_rate': 1.7279858128208287e-06, 'epoch': 2.26}
75%|███████▌ | 8695/11526 [1:30:52<29:01, 1.63it/s] 75%|███████▌ | 8696/11526 [1:30:52<29:00, 1.63it/s] {'loss': 0.124, 'grad_norm': 0.5887527465820312, 'learning_rate': 1.7268409214308152e-06, 'epoch': 2.26}
75%|███████▌ | 8696/11526 [1:30:52<29:00, 1.63it/s] 75%|███████▌ | 8697/11526 [1:30:53<29:02, 1.62it/s] {'loss': 0.1641, 'grad_norm': 0.6402209997177124, 'learning_rate': 1.7256963302735752e-06, 'epoch': 2.26}
75%|███████▌ | 8697/11526 [1:30:53<29:02, 1.62it/s] 75%|███████▌ | 8698/11526 [1:30:53<29:00, 1.63it/s] {'loss': 0.1846, 'grad_norm': 0.632451057434082, 'learning_rate': 1.7245520394540977e-06, 'epoch': 2.26}
75%|███████▌ | 8698/11526 [1:30:53<29:00, 1.63it/s] 75%|███████▌ | 8699/11526 [1:30:54<28:59, 1.63it/s] {'loss': 0.2125, 'grad_norm': 0.7227647304534912, 'learning_rate': 1.723408049077346e-06, 'epoch': 2.26}
75%|███████▌ | 8699/11526 [1:30:54<28:59, 1.63it/s] 75%|███████▌ | 8700/11526 [1:30:55<28:57, 1.63it/s] {'loss': 0.1444, 'grad_norm': 0.51820307970047, 'learning_rate': 1.7222643592482486e-06, 'epoch': 2.26}
75%|███████▌ | 8700/11526 [1:30:55<28:57, 1.63it/s] 75%|███████▌ | 8701/11526 [1:30:55<28:56, 1.63it/s] {'loss': 0.1534, 'grad_norm': 0.5837035775184631, 'learning_rate': 1.7211209700717123e-06, 'epoch': 2.26}
75%|███████▌ | 8701/11526 [1:30:55<28:56, 1.63it/s] 75%|███████▌ | 8702/11526 [1:30:56<28:58, 1.62it/s] {'loss': 0.1378, 'grad_norm': 0.5251055955886841, 'learning_rate': 1.7199778816526202e-06, 'epoch': 2.26}
75%|███████▌ | 8702/11526 [1:30:56<28:58, 1.62it/s] 76%|███████▌ | 8703/11526 [1:30:56<28:58, 1.62it/s] {'loss': 0.1317, 'grad_norm': 0.518406867980957, 'learning_rate': 1.7188350940958176e-06, 'epoch': 2.27}
76%|███████▌ | 8703/11526 [1:30:57<28:58, 1.62it/s] 76%|███████▌ | 8704/11526 [1:30:57<28:55, 1.63it/s] {'loss': 0.1712, 'grad_norm': 0.6340608596801758, 'learning_rate': 1.717692607506129e-06, 'epoch': 2.27}
76%|███████▌ | 8704/11526 [1:30:57<28:55, 1.63it/s] 76%|███████▌ | 8705/11526 [1:30:58<28:54, 1.63it/s] {'loss': 0.1634, 'grad_norm': 0.631791353225708, 'learning_rate': 1.7165504219883505e-06, 'epoch': 2.27}
76%|███████▌ | 8705/11526 [1:30:58<28:54, 1.63it/s] 76%|███████▌ | 8706/11526 [1:30:58<28:53, 1.63it/s] {'loss': 0.1411, 'grad_norm': 0.5739006400108337, 'learning_rate': 1.715408537647249e-06, 'epoch': 2.27}
76%|███████▌ | 8706/11526 [1:30:58<28:53, 1.63it/s] 76%|███████▌ | 8707/11526 [1:30:59<28:55, 1.62it/s] {'loss': 0.1404, 'grad_norm': 0.6494494676589966, 'learning_rate': 1.7142669545875667e-06, 'epoch': 2.27}
76%|███████▌ | 8707/11526 [1:30:59<28:55, 1.62it/s] 76%|███████▌ | 8708/11526 [1:31:00<28:53, 1.63it/s] {'loss': 0.1352, 'grad_norm': 0.5496934652328491, 'learning_rate': 1.7131256729140128e-06, 'epoch': 2.27}
76%|███████▌ | 8708/11526 [1:31:00<28:53, 1.63it/s] 76%|███████▌ | 8709/11526 [1:31:00<28:51, 1.63it/s] {'loss': 0.1222, 'grad_norm': 0.5120833516120911, 'learning_rate': 1.7119846927312722e-06, 'epoch': 2.27}
76%|███████▌ | 8709/11526 [1:31:00<28:51, 1.63it/s] 76%|███████▌ | 8710/11526 [1:31:01<28:50, 1.63it/s] {'loss': 0.1715, 'grad_norm': 0.6127618551254272, 'learning_rate': 1.7108440141440064e-06, 'epoch': 2.27}
76%|███████▌ | 8710/11526 [1:31:01<28:50, 1.63it/s] 76%|███████▌ | 8711/11526 [1:31:01<28:50, 1.63it/s] {'loss': 0.1424, 'grad_norm': 0.5723620057106018, 'learning_rate': 1.709703637256841e-06, 'epoch': 2.27}
76%|███████▌ | 8711/11526 [1:31:01<28:50, 1.63it/s] 76%|███████▌ | 8712/11526 [1:31:02<28:58, 1.62it/s] {'loss': 0.1831, 'grad_norm': 0.6596749424934387, 'learning_rate': 1.708563562174379e-06, 'epoch': 2.27}
76%|███████▌ | 8712/11526 [1:31:02<28:58, 1.62it/s] 76%|███████▌ | 8713/11526 [1:31:03<28:53, 1.62it/s] {'loss': 0.1455, 'grad_norm': 0.6410313248634338, 'learning_rate': 1.7074237890011958e-06, 'epoch': 2.27}
76%|███████▌ | 8713/11526 [1:31:03<28:53, 1.62it/s] 76%|███████▌ | 8714/11526 [1:31:03<28:51, 1.62it/s] {'loss': 0.1087, 'grad_norm': 0.44170886278152466, 'learning_rate': 1.7062843178418337e-06, 'epoch': 2.27}
76%|███████▌ | 8714/11526 [1:31:03<28:51, 1.62it/s] 76%|███████▌ | 8715/11526 [1:31:04<28:49, 1.63it/s] {'loss': 0.1585, 'grad_norm': 0.6397140622138977, 'learning_rate': 1.7051451488008174e-06, 'epoch': 2.27}
76%|███████▌ | 8715/11526 [1:31:04<28:49, 1.63it/s] 76%|███████▌ | 8716/11526 [1:31:04<28:47, 1.63it/s] {'loss': 0.1658, 'grad_norm': 0.551471471786499, 'learning_rate': 1.7040062819826342e-06, 'epoch': 2.27}
76%|███████▌ | 8716/11526 [1:31:05<28:47, 1.63it/s] 76%|███████▌ | 8717/11526 [1:31:05<28:54, 1.62it/s] {'loss': 0.162, 'grad_norm': 0.6671346426010132, 'learning_rate': 1.7028677174917474e-06, 'epoch': 2.27}
76%|███████▌ | 8717/11526 [1:31:05<28:54, 1.62it/s] 76%|███████▌ | 8718/11526 [1:31:06<28:53, 1.62it/s] {'loss': 0.1376, 'grad_norm': 0.6163726449012756, 'learning_rate': 1.7017294554325937e-06, 'epoch': 2.27}
76%|███████▌ | 8718/11526 [1:31:06<28:53, 1.62it/s] 76%|███████▌ | 8719/11526 [1:31:06<28:49, 1.62it/s] {'loss': 0.1366, 'grad_norm': 0.552501916885376, 'learning_rate': 1.7005914959095809e-06, 'epoch': 2.27}
76%|███████▌ | 8719/11526 [1:31:06<28:49, 1.62it/s] 76%|███████▌ | 8720/11526 [1:31:07<28:47, 1.62it/s] {'loss': 0.1654, 'grad_norm': 0.6141290068626404, 'learning_rate': 1.6994538390270887e-06, 'epoch': 2.27}
76%|███████▌ | 8720/11526 [1:31:07<28:47, 1.62it/s] 76%|███████▌ | 8721/11526 [1:31:08<28:46, 1.62it/s] {'loss': 0.1398, 'grad_norm': 0.5687982439994812, 'learning_rate': 1.6983164848894712e-06, 'epoch': 2.27}
76%|███████▌ | 8721/11526 [1:31:08<28:46, 1.62it/s] 76%|███████▌ | 8722/11526 [1:31:08<28:46, 1.62it/s] {'loss': 0.1578, 'grad_norm': 0.6570899486541748, 'learning_rate': 1.6971794336010506e-06, 'epoch': 2.27}
76%|███████▌ | 8722/11526 [1:31:08<28:46, 1.62it/s] 76%|███████▌ | 8723/11526 [1:31:09<28:45, 1.62it/s] {'loss': 0.1233, 'grad_norm': 0.45937931537628174, 'learning_rate': 1.6960426852661239e-06, 'epoch': 2.27}
76%|███████▌ | 8723/11526 [1:31:09<28:45, 1.62it/s] 76%|███████▌ | 8724/11526 [1:31:09<28:43, 1.63it/s] {'loss': 0.1827, 'grad_norm': 0.63831627368927, 'learning_rate': 1.6949062399889609e-06, 'epoch': 2.27}
76%|███████▌ | 8724/11526 [1:31:09<28:43, 1.63it/s] 76%|███████▌ | 8725/11526 [1:31:10<28:42, 1.63it/s] {'loss': 0.1331, 'grad_norm': 0.5141412615776062, 'learning_rate': 1.693770097873803e-06, 'epoch': 2.27}
76%|███████▌ | 8725/11526 [1:31:10<28:42, 1.63it/s] 76%|███████▌ | 8726/11526 [1:31:11<28:40, 1.63it/s] {'loss': 0.1515, 'grad_norm': 0.5562852621078491, 'learning_rate': 1.692634259024865e-06, 'epoch': 2.27}
76%|███████▌ | 8726/11526 [1:31:11<28:40, 1.63it/s] 76%|███████▌ | 8727/11526 [1:31:11<28:44, 1.62it/s] {'loss': 0.2327, 'grad_norm': 0.710246741771698, 'learning_rate': 1.6914987235463275e-06, 'epoch': 2.27}
76%|███████▌ | 8727/11526 [1:31:11<28:44, 1.62it/s] 76%|███████▌ | 8728/11526 [1:31:12<28:42, 1.62it/s] {'loss': 0.1527, 'grad_norm': 0.6922497749328613, 'learning_rate': 1.6903634915423539e-06, 'epoch': 2.27}
76%|███████▌ | 8728/11526 [1:31:12<28:42, 1.62it/s] 76%|███████▌ | 8729/11526 [1:31:12<28:39, 1.63it/s] {'loss': 0.1471, 'grad_norm': 0.5663548707962036, 'learning_rate': 1.6892285631170729e-06, 'epoch': 2.27}
76%|███████▌ | 8729/11526 [1:31:13<28:39, 1.63it/s] 76%|███████▌ | 8730/11526 [1:31:13<28:38, 1.63it/s] {'loss': 0.1591, 'grad_norm': 0.6114604473114014, 'learning_rate': 1.6880939383745842e-06, 'epoch': 2.27}
76%|███████▌ | 8730/11526 [1:31:13<28:38, 1.63it/s] 76%|███████▌ | 8731/11526 [1:31:14<28:38, 1.63it/s] {'loss': 0.1243, 'grad_norm': 0.5350226163864136, 'learning_rate': 1.6869596174189635e-06, 'epoch': 2.27}
76%|███████▌ | 8731/11526 [1:31:14<28:38, 1.63it/s] 76%|███████▌ | 8732/11526 [1:31:14<28:40, 1.62it/s] {'loss': 0.1571, 'grad_norm': 0.5812162756919861, 'learning_rate': 1.6858256003542566e-06, 'epoch': 2.27}
76%|███████▌ | 8732/11526 [1:31:14<28:40, 1.62it/s] 76%|███████▌ | 8733/11526 [1:31:15<28:37, 1.63it/s] {'loss': 0.1558, 'grad_norm': 0.5955747961997986, 'learning_rate': 1.684691887284483e-06, 'epoch': 2.27}
76%|███████▌ | 8733/11526 [1:31:15<28:37, 1.63it/s] 76%|███████▌ | 8734/11526 [1:31:16<28:36, 1.63it/s] {'loss': 0.1635, 'grad_norm': 0.6205931901931763, 'learning_rate': 1.6835584783136345e-06, 'epoch': 2.27}
76%|███████▌ | 8734/11526 [1:31:16<28:36, 1.63it/s] 76%|███████▌ | 8735/11526 [1:31:16<28:35, 1.63it/s] {'loss': 0.1427, 'grad_norm': 0.5522063374519348, 'learning_rate': 1.6824253735456703e-06, 'epoch': 2.27}
76%|███████▌ | 8735/11526 [1:31:16<28:35, 1.63it/s] 76%|███████▌ | 8736/11526 [1:31:17<28:35, 1.63it/s] {'loss': 0.1449, 'grad_norm': 0.5184552073478699, 'learning_rate': 1.6812925730845258e-06, 'epoch': 2.27}
76%|███████▌ | 8736/11526 [1:31:17<28:35, 1.63it/s] 76%|███████▌ | 8737/11526 [1:31:17<28:36, 1.62it/s] {'loss': 0.1296, 'grad_norm': 0.5381393432617188, 'learning_rate': 1.680160077034112e-06, 'epoch': 2.27}
76%|███████▌ | 8737/11526 [1:31:17<28:36, 1.62it/s] 76%|███████▌ | 8738/11526 [1:31:18<28:34, 1.63it/s] {'loss': 0.2036, 'grad_norm': 0.6794476509094238, 'learning_rate': 1.6790278854983033e-06, 'epoch': 2.27}
76%|███████▌ | 8738/11526 [1:31:18<28:34, 1.63it/s] 76%|███████▌ | 8739/11526 [1:31:19<28:34, 1.63it/s] {'loss': 0.1534, 'grad_norm': 0.7321059703826904, 'learning_rate': 1.677895998580953e-06, 'epoch': 2.27}
76%|███████▌ | 8739/11526 [1:31:19<28:34, 1.63it/s] 76%|███████▌ | 8740/11526 [1:31:19<28:32, 1.63it/s] {'loss': 0.1515, 'grad_norm': 0.5554350018501282, 'learning_rate': 1.6767644163858848e-06, 'epoch': 2.27}
76%|███████▌ | 8740/11526 [1:31:19<28:32, 1.63it/s] 76%|███████▌ | 8741/11526 [1:31:20<28:31, 1.63it/s] {'loss': 0.1743, 'grad_norm': 0.6348845362663269, 'learning_rate': 1.67563313901689e-06, 'epoch': 2.28}
76%|███████▌ | 8741/11526 [1:31:20<28:31, 1.63it/s] 76%|███████▌ | 8742/11526 [1:31:20<28:33, 1.62it/s] {'loss': 0.1271, 'grad_norm': 0.4957828223705292, 'learning_rate': 1.6745021665777417e-06, 'epoch': 2.28}
76%|███████▌ | 8742/11526 [1:31:21<28:33, 1.62it/s] 76%|███████▌ | 8743/11526 [1:31:21<28:31, 1.63it/s] {'loss': 0.1456, 'grad_norm': 0.5464438199996948, 'learning_rate': 1.6733714991721738e-06, 'epoch': 2.28}
76%|███████▌ | 8743/11526 [1:31:21<28:31, 1.63it/s] 76%|███████▌ | 8744/11526 [1:31:22<28:29, 1.63it/s] {'loss': 0.1465, 'grad_norm': 0.549994170665741, 'learning_rate': 1.6722411369039004e-06, 'epoch': 2.28}
76%|███████▌ | 8744/11526 [1:31:22<28:29, 1.63it/s] 76%|███████▌ | 8745/11526 [1:31:22<28:30, 1.63it/s] {'loss': 0.1807, 'grad_norm': 0.6867144703865051, 'learning_rate': 1.671111079876604e-06, 'epoch': 2.28}
76%|███████▌ | 8745/11526 [1:31:22<28:30, 1.63it/s] 76%|███████▌ | 8746/11526 [1:31:23<28:29, 1.63it/s] {'loss': 0.1681, 'grad_norm': 0.6484703421592712, 'learning_rate': 1.6699813281939398e-06, 'epoch': 2.28}
76%|███████▌ | 8746/11526 [1:31:23<28:29, 1.63it/s] 76%|███████▌ | 8747/11526 [1:31:24<28:29, 1.63it/s] {'loss': 0.1803, 'grad_norm': 0.7691338658332825, 'learning_rate': 1.668851881959535e-06, 'epoch': 2.28}
76%|███████▌ | 8747/11526 [1:31:24<28:29, 1.63it/s] 76%|███████▌ | 8748/11526 [1:31:24<28:27, 1.63it/s] {'loss': 0.133, 'grad_norm': 0.506417453289032, 'learning_rate': 1.667722741276991e-06, 'epoch': 2.28}
76%|███████▌ | 8748/11526 [1:31:24<28:27, 1.63it/s] 76%|███████▌ | 8749/11526 [1:31:25<28:26, 1.63it/s] {'loss': 0.1619, 'grad_norm': 0.6253814697265625, 'learning_rate': 1.6665939062498753e-06, 'epoch': 2.28}
76%|███████▌ | 8749/11526 [1:31:25<28:26, 1.63it/s] 76%|███████▌ | 8750/11526 [1:31:25<28:25, 1.63it/s] {'loss': 0.1471, 'grad_norm': 0.60845947265625, 'learning_rate': 1.6654653769817315e-06, 'epoch': 2.28}
76%|███████▌ | 8750/11526 [1:31:25<28:25, 1.63it/s] 76%|███████▌ | 8751/11526 [1:31:26<28:24, 1.63it/s] {'loss': 0.1539, 'grad_norm': 0.6441958546638489, 'learning_rate': 1.6643371535760788e-06, 'epoch': 2.28}
76%|███████▌ | 8751/11526 [1:31:26<28:24, 1.63it/s] 76%|███████▌ | 8752/11526 [1:31:27<28:25, 1.63it/s] {'loss': 0.1357, 'grad_norm': 0.5395568013191223, 'learning_rate': 1.6632092361364e-06, 'epoch': 2.28}
76%|███████▌ | 8752/11526 [1:31:27<28:25, 1.63it/s] 76%|███████▌ | 8753/11526 [1:31:27<28:24, 1.63it/s] {'loss': 0.1618, 'grad_norm': 0.6293237209320068, 'learning_rate': 1.6620816247661559e-06, 'epoch': 2.28}
76%|███████▌ | 8753/11526 [1:31:27<28:24, 1.63it/s] 76%|███████▌ | 8754/11526 [1:31:28<28:22, 1.63it/s] {'loss': 0.1605, 'grad_norm': 0.6294264793395996, 'learning_rate': 1.6609543195687765e-06, 'epoch': 2.28}
76%|███████▌ | 8754/11526 [1:31:28<28:22, 1.63it/s] 76%|███████▌ | 8755/11526 [1:31:28<28:21, 1.63it/s] {'loss': 0.1687, 'grad_norm': 0.7382712364196777, 'learning_rate': 1.6598273206476655e-06, 'epoch': 2.28}
76%|███████▌ | 8755/11526 [1:31:29<28:21, 1.63it/s] 76%|███████▌ | 8756/11526 [1:31:29<28:21, 1.63it/s] {'loss': 0.1159, 'grad_norm': 0.4743589162826538, 'learning_rate': 1.6587006281061996e-06, 'epoch': 2.28}
76%|███████▌ | 8756/11526 [1:31:29<28:21, 1.63it/s] 76%|███████▌ | 8757/11526 [1:31:30<28:22, 1.63it/s] {'loss': 0.2073, 'grad_norm': 0.741549551486969, 'learning_rate': 1.6575742420477214e-06, 'epoch': 2.28}
76%|███████▌ | 8757/11526 [1:31:30<28:22, 1.63it/s] 76%|███████▌ | 8758/11526 [1:31:30<28:20, 1.63it/s] {'loss': 0.1564, 'grad_norm': 0.5831612348556519, 'learning_rate': 1.6564481625755497e-06, 'epoch': 2.28}
76%|███████▌ | 8758/11526 [1:31:30<28:20, 1.63it/s] 76%|███████▌ | 8759/11526 [1:31:31<28:20, 1.63it/s] {'loss': 0.1488, 'grad_norm': 0.5602267980575562, 'learning_rate': 1.6553223897929798e-06, 'epoch': 2.28}
76%|███████▌ | 8759/11526 [1:31:31<28:20, 1.63it/s] 76%|███████▌ | 8760/11526 [1:31:32<28:19, 1.63it/s] {'loss': 0.172, 'grad_norm': 0.6543534398078918, 'learning_rate': 1.654196923803269e-06, 'epoch': 2.28}
76%|███████▌ | 8760/11526 [1:31:32<28:19, 1.63it/s] 76%|███████▌ | 8761/11526 [1:31:32<28:19, 1.63it/s] {'loss': 0.1664, 'grad_norm': 0.6610387563705444, 'learning_rate': 1.6530717647096533e-06, 'epoch': 2.28}
76%|███████▌ | 8761/11526 [1:31:32<28:19, 1.63it/s] 76%|███████▌ | 8762/11526 [1:31:33<28:19, 1.63it/s] {'loss': 0.1913, 'grad_norm': 0.6263650059700012, 'learning_rate': 1.6519469126153404e-06, 'epoch': 2.28}
76%|███████▌ | 8762/11526 [1:31:33<28:19, 1.63it/s] 76%|███████▌ | 8763/11526 [1:31:33<28:18, 1.63it/s] {'loss': 0.1383, 'grad_norm': 0.5387486815452576, 'learning_rate': 1.6508223676235025e-06, 'epoch': 2.28}
76%|███████▌ | 8763/11526 [1:31:33<28:18, 1.63it/s] 76%|███████▌ | 8764/11526 [1:31:34<28:17, 1.63it/s] {'loss': 0.1496, 'grad_norm': 0.6114454865455627, 'learning_rate': 1.649698129837297e-06, 'epoch': 2.28}
76%|███████▌ | 8764/11526 [1:31:34<28:17, 1.63it/s] 76%|███████▌ | 8765/11526 [1:31:35<28:16, 1.63it/s] {'loss': 0.1728, 'grad_norm': 0.7674050331115723, 'learning_rate': 1.6485741993598392e-06, 'epoch': 2.28}
76%|███████▌ | 8765/11526 [1:31:35<28:16, 1.63it/s] 76%|███████▌ | 8766/11526 [1:31:35<28:14, 1.63it/s] {'loss': 0.1594, 'grad_norm': 0.5504546165466309, 'learning_rate': 1.647450576294225e-06, 'epoch': 2.28}
76%|███████▌ | 8766/11526 [1:31:35<28:14, 1.63it/s] 76%|███████▌ | 8767/11526 [1:31:36<28:23, 1.62it/s] {'loss': 0.1468, 'grad_norm': 0.5629093050956726, 'learning_rate': 1.6463272607435194e-06, 'epoch': 2.28}
76%|███████▌ | 8767/11526 [1:31:36<28:23, 1.62it/s] 76%|███████▌ | 8768/11526 [1:31:36<28:20, 1.62it/s] {'loss': 0.1219, 'grad_norm': 0.4868939518928528, 'learning_rate': 1.6452042528107587e-06, 'epoch': 2.28}
76%|███████▌ | 8768/11526 [1:31:37<28:20, 1.62it/s] 76%|███████▌ | 8769/11526 [1:31:37<28:16, 1.62it/s] {'loss': 0.1173, 'grad_norm': 0.5105841159820557, 'learning_rate': 1.6440815525989517e-06, 'epoch': 2.28}
76%|███████▌ | 8769/11526 [1:31:37<28:16, 1.62it/s] 76%|███████▌ | 8770/11526 [1:31:38<28:15, 1.63it/s] {'loss': 0.1648, 'grad_norm': 0.635399580001831, 'learning_rate': 1.642959160211081e-06, 'epoch': 2.28}
76%|███████▌ | 8770/11526 [1:31:38<28:15, 1.63it/s] 76%|███████▌ | 8771/11526 [1:31:38<28:14, 1.63it/s] {'loss': 0.1384, 'grad_norm': 0.6286176443099976, 'learning_rate': 1.6418370757500956e-06, 'epoch': 2.28}
76%|███████▌ | 8771/11526 [1:31:38<28:14, 1.63it/s] 76%|███████▌ | 8772/11526 [1:31:39<28:15, 1.62it/s] {'loss': 0.1401, 'grad_norm': 0.5110074877738953, 'learning_rate': 1.6407152993189208e-06, 'epoch': 2.28}
76%|███████▌ | 8772/11526 [1:31:39<28:15, 1.62it/s] 76%|███████▌ | 8773/11526 [1:31:40<28:14, 1.62it/s] {'loss': 0.1773, 'grad_norm': 0.7313774824142456, 'learning_rate': 1.6395938310204517e-06, 'epoch': 2.28}
76%|███████▌ | 8773/11526 [1:31:40<28:14, 1.62it/s] 76%|███████▌ | 8774/11526 [1:31:40<28:12, 1.63it/s] {'loss': 0.1317, 'grad_norm': 0.5533020496368408, 'learning_rate': 1.6384726709575566e-06, 'epoch': 2.28}
76%|███████▌ | 8774/11526 [1:31:40<28:12, 1.63it/s] 76%|███████▌ | 8775/11526 [1:31:41<28:10, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.6160313487052917, 'learning_rate': 1.6373518192330762e-06, 'epoch': 2.28}
76%|███████▌ | 8775/11526 [1:31:41<28:10, 1.63it/s] 76%|███████▌ | 8776/11526 [1:31:41<28:09, 1.63it/s] {'loss': 0.1443, 'grad_norm': 0.6114586591720581, 'learning_rate': 1.6362312759498161e-06, 'epoch': 2.28}
76%|███████▌ | 8776/11526 [1:31:41<28:09, 1.63it/s] 76%|███████▌ | 8777/11526 [1:31:42<28:16, 1.62it/s] {'loss': 0.1377, 'grad_norm': 0.510876476764679, 'learning_rate': 1.6351110412105643e-06, 'epoch': 2.28}
76%|███████▌ | 8777/11526 [1:31:42<28:16, 1.62it/s] 76%|███████▌ | 8778/11526 [1:31:43<28:14, 1.62it/s] {'loss': 0.1217, 'grad_norm': 0.4647568464279175, 'learning_rate': 1.6339911151180743e-06, 'epoch': 2.28}
76%|███████▌ | 8778/11526 [1:31:43<28:14, 1.62it/s] 76%|███████▌ | 8779/11526 [1:31:43<28:12, 1.62it/s] {'loss': 0.1693, 'grad_norm': 0.6064291000366211, 'learning_rate': 1.6328714977750698e-06, 'epoch': 2.29}
76%|███████▌ | 8779/11526 [1:31:43<28:12, 1.62it/s] 76%|███████▌ | 8780/11526 [1:31:44<28:11, 1.62it/s] {'loss': 0.1665, 'grad_norm': 0.5954320430755615, 'learning_rate': 1.6317521892842497e-06, 'epoch': 2.29}
76%|███████▌ | 8780/11526 [1:31:44<28:11, 1.62it/s] 76%|███████▌ | 8781/11526 [1:31:44<28:09, 1.62it/s] {'loss': 0.1561, 'grad_norm': 0.6202597618103027, 'learning_rate': 1.6306331897482836e-06, 'epoch': 2.29}
76%|███████▌ | 8781/11526 [1:31:45<28:09, 1.62it/s] 76%|███████▌ | 8782/11526 [1:31:45<28:10, 1.62it/s] {'loss': 0.1348, 'grad_norm': 0.5026199817657471, 'learning_rate': 1.629514499269812e-06, 'epoch': 2.29}
76%|███████▌ | 8782/11526 [1:31:45<28:10, 1.62it/s] 76%|███████▌ | 8783/11526 [1:31:46<28:08, 1.62it/s] {'loss': 0.1826, 'grad_norm': 0.7001839876174927, 'learning_rate': 1.628396117951449e-06, 'epoch': 2.29}
76%|███████▌ | 8783/11526 [1:31:46<28:08, 1.62it/s] 76%|███████▌ | 8784/11526 [1:31:46<28:06, 1.63it/s] {'loss': 0.126, 'grad_norm': 0.5176535844802856, 'learning_rate': 1.6272780458957765e-06, 'epoch': 2.29}
76%|███████▌ | 8784/11526 [1:31:46<28:06, 1.63it/s] 76%|███████▌ | 8785/11526 [1:31:47<28:05, 1.63it/s] {'loss': 0.1618, 'grad_norm': 0.6298236846923828, 'learning_rate': 1.6261602832053502e-06, 'epoch': 2.29}
76%|███████▌ | 8785/11526 [1:31:47<28:05, 1.63it/s] 76%|███████▌ | 8786/11526 [1:31:47<28:03, 1.63it/s] {'loss': 0.1442, 'grad_norm': 0.5727644562721252, 'learning_rate': 1.625042829982702e-06, 'epoch': 2.29}
76%|███████▌ | 8786/11526 [1:31:48<28:03, 1.63it/s] 76%|███████▌ | 8787/11526 [1:31:48<28:04, 1.63it/s] {'loss': 0.1265, 'grad_norm': 0.5042742490768433, 'learning_rate': 1.6239256863303266e-06, 'epoch': 2.29}
76%|███████▌ | 8787/11526 [1:31:48<28:04, 1.63it/s] 76%|███████▌ | 8788/11526 [1:31:49<28:03, 1.63it/s] {'loss': 0.1367, 'grad_norm': 0.5832157731056213, 'learning_rate': 1.6228088523506963e-06, 'epoch': 2.29}
76%|███████▌ | 8788/11526 [1:31:49<28:03, 1.63it/s] 76%|███████▋ | 8789/11526 [1:31:49<28:02, 1.63it/s] {'loss': 0.1197, 'grad_norm': 0.4336841106414795, 'learning_rate': 1.6216923281462555e-06, 'epoch': 2.29}
76%|███████▋ | 8789/11526 [1:31:49<28:02, 1.63it/s] 76%|███████▋ | 8790/11526 [1:31:50<28:01, 1.63it/s] {'loss': 0.1212, 'grad_norm': 0.5122340321540833, 'learning_rate': 1.6205761138194121e-06, 'epoch': 2.29}
76%|███████▋ | 8790/11526 [1:31:50<28:01, 1.63it/s] 76%|███████▋ | 8791/11526 [1:31:51<28:00, 1.63it/s] {'loss': 0.1411, 'grad_norm': 0.6204702854156494, 'learning_rate': 1.6194602094725598e-06, 'epoch': 2.29}
76%|███████▋ | 8791/11526 [1:31:51<28:00, 1.63it/s] 76%|███████▋ | 8792/11526 [1:31:51<28:01, 1.63it/s] {'loss': 0.131, 'grad_norm': 0.49855563044548035, 'learning_rate': 1.6183446152080495e-06, 'epoch': 2.29}
76%|███████▋ | 8792/11526 [1:31:51<28:01, 1.63it/s] 76%|███████▋ | 8793/11526 [1:31:52<28:00, 1.63it/s] {'loss': 0.1663, 'grad_norm': 0.6464234590530396, 'learning_rate': 1.6172293311282117e-06, 'epoch': 2.29}
76%|███████▋ | 8793/11526 [1:31:52<28:00, 1.63it/s] 76%|███████▋ | 8794/11526 [1:31:52<28:00, 1.63it/s] {'loss': 0.1235, 'grad_norm': 0.5360401272773743, 'learning_rate': 1.616114357335347e-06, 'epoch': 2.29}
76%|███████▋ | 8794/11526 [1:31:53<28:00, 1.63it/s] 76%|███████▋ | 8795/11526 [1:31:53<27:59, 1.63it/s] {'loss': 0.1583, 'grad_norm': 0.5177309513092041, 'learning_rate': 1.614999693931727e-06, 'epoch': 2.29}
76%|███████▋ | 8795/11526 [1:31:53<27:59, 1.63it/s] 76%|███████▋ | 8796/11526 [1:31:54<27:59, 1.63it/s] {'loss': 0.2235, 'grad_norm': 0.7706599831581116, 'learning_rate': 1.6138853410195948e-06, 'epoch': 2.29}
76%|███████▋ | 8796/11526 [1:31:54<27:59, 1.63it/s] 76%|███████▋ | 8797/11526 [1:31:54<28:00, 1.62it/s] {'loss': 0.1765, 'grad_norm': 0.6650514602661133, 'learning_rate': 1.6127712987011674e-06, 'epoch': 2.29}
76%|███████▋ | 8797/11526 [1:31:54<28:00, 1.62it/s] 76%|███████▋ | 8798/11526 [1:31:55<27:59, 1.62it/s] {'loss': 0.1313, 'grad_norm': 0.5880774259567261, 'learning_rate': 1.6116575670786266e-06, 'epoch': 2.29}
76%|███████▋ | 8798/11526 [1:31:55<27:59, 1.62it/s] 76%|███████▋ | 8799/11526 [1:31:55<27:57, 1.63it/s] {'loss': 0.146, 'grad_norm': 0.5643131732940674, 'learning_rate': 1.6105441462541333e-06, 'epoch': 2.29}
76%|███████▋ | 8799/11526 [1:31:56<27:57, 1.63it/s] 76%|███████▋ | 8800/11526 [1:31:56<27:56, 1.63it/s] {'loss': 0.2433, 'grad_norm': 1.0140581130981445, 'learning_rate': 1.609431036329816e-06, 'epoch': 2.29}
76%|███████▋ | 8800/11526 [1:31:56<27:56, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.33it/s]
31%|███ | 4/13 [00:00<00:01, 8.38it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.78it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.78it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5416151285171509, 'eval_runtime': 1.955, 'eval_samples_per_second': 102.301, 'eval_steps_per_second': 6.65, 'epoch': 2.29}
76%|███████▋ | 8800/11526 [1:31:58<27:56, 1.63it/s]
 76%|███████▋ | 8801/11526 [1:31:59<54:38, 1.20s/it] {'loss': 0.1171, 'grad_norm': 0.5108508467674255, 'learning_rate': 1.6083182374077755e-06, 'epoch': 2.29}
76%|███████▋ | 8801/11526 [1:31:59<54:38, 1.20s/it] 76%|███████▋ | 8802/11526 [1:31:59<46:35, 1.03s/it] {'loss': 0.1124, 'grad_norm': 0.46391764283180237, 'learning_rate': 1.6072057495900855e-06, 'epoch': 2.29}
76%|███████▋ | 8802/11526 [1:31:59<46:35, 1.03s/it] 76%|███████▋ | 8803/11526 [1:32:00<40:57, 1.11it/s] {'loss': 0.1925, 'grad_norm': 0.645709216594696, 'learning_rate': 1.6060935729787857e-06, 'epoch': 2.29}
76%|███████▋ | 8803/11526 [1:32:00<40:57, 1.11it/s] 76%|███████▋ | 8804/11526 [1:32:01<37:00, 1.23it/s] {'loss': 0.1215, 'grad_norm': 0.49662601947784424, 'learning_rate': 1.604981707675895e-06, 'epoch': 2.29}
76%|███████▋ | 8804/11526 [1:32:01<37:00, 1.23it/s] 76%|███████▋ | 8805/11526 [1:32:01<34:15, 1.32it/s] {'loss': 0.1226, 'grad_norm': 0.5364681482315063, 'learning_rate': 1.6038701537834007e-06, 'epoch': 2.29}
76%|███████▋ | 8805/11526 [1:32:01<34:15, 1.32it/s] 76%|███████▋ | 8806/11526 [1:32:02<32:18, 1.40it/s] {'loss': 0.1662, 'grad_norm': 0.5496565699577332, 'learning_rate': 1.6027589114032576e-06, 'epoch': 2.29}
76%|███████▋ | 8806/11526 [1:32:02<32:18, 1.40it/s] 76%|███████▋ | 8807/11526 [1:32:02<31:06, 1.46it/s] {'loss': 0.1708, 'grad_norm': 0.6008120179176331, 'learning_rate': 1.6016479806373963e-06, 'epoch': 2.29}
76%|███████▋ | 8807/11526 [1:32:03<31:06, 1.46it/s] 76%|███████▋ | 8808/11526 [1:32:03<30:07, 1.50it/s] {'loss': 0.1211, 'grad_norm': 0.5388187170028687, 'learning_rate': 1.6005373615877185e-06, 'epoch': 2.29}
76%|███████▋ | 8808/11526 [1:32:03<30:07, 1.50it/s] 76%|███████▋ | 8809/11526 [1:32:04<29:25, 1.54it/s] {'loss': 0.1422, 'grad_norm': 0.5353203415870667, 'learning_rate': 1.5994270543560958e-06, 'epoch': 2.29}
76%|███████▋ | 8809/11526 [1:32:04<29:25, 1.54it/s] 76%|███████▋ | 8810/11526 [1:32:04<28:56, 1.56it/s] {'loss': 0.1394, 'grad_norm': 0.5067267417907715, 'learning_rate': 1.598317059044372e-06, 'epoch': 2.29}
76%|███████▋ | 8810/11526 [1:32:04<28:56, 1.56it/s] 76%|███████▋ | 8811/11526 [1:32:05<28:34, 1.58it/s] {'loss': 0.1522, 'grad_norm': 0.6075836420059204, 'learning_rate': 1.5972073757543643e-06, 'epoch': 2.29}
76%|███████▋ | 8811/11526 [1:32:05<28:34, 1.58it/s] 76%|███████▋ | 8812/11526 [1:32:05<28:23, 1.59it/s] {'loss': 0.129, 'grad_norm': 0.538530170917511, 'learning_rate': 1.5960980045878538e-06, 'epoch': 2.29}
76%|███████▋ | 8812/11526 [1:32:06<28:23, 1.59it/s] 76%|███████▋ | 8813/11526 [1:32:06<28:12, 1.60it/s] {'loss': 0.1392, 'grad_norm': 0.6007243990898132, 'learning_rate': 1.594988945646605e-06, 'epoch': 2.29}
76%|███████▋ | 8813/11526 [1:32:06<28:12, 1.60it/s] 76%|███████▋ | 8814/11526 [1:32:07<28:04, 1.61it/s] {'loss': 0.1179, 'grad_norm': 0.5579546689987183, 'learning_rate': 1.5938801990323422e-06, 'epoch': 2.29}
76%|███████▋ | 8814/11526 [1:32:07<28:04, 1.61it/s] 76%|███████▋ | 8815/11526 [1:32:07<27:57, 1.62it/s] {'loss': 0.1657, 'grad_norm': 0.5776947736740112, 'learning_rate': 1.5927717648467673e-06, 'epoch': 2.29}
76%|███████▋ | 8815/11526 [1:32:07<27:57, 1.62it/s] 76%|███████▋ | 8816/11526 [1:32:08<27:53, 1.62it/s] {'loss': 0.1425, 'grad_norm': 0.5501949191093445, 'learning_rate': 1.5916636431915528e-06, 'epoch': 2.29}
76%|███████▋ | 8816/11526 [1:32:08<27:53, 1.62it/s] 76%|███████▋ | 8817/11526 [1:32:09<27:53, 1.62it/s] {'loss': 0.1529, 'grad_norm': 0.5664679408073425, 'learning_rate': 1.5905558341683414e-06, 'epoch': 2.29}
76%|███████▋ | 8817/11526 [1:32:09<27:53, 1.62it/s] 77%|███████▋ | 8818/11526 [1:32:09<27:50, 1.62it/s] {'loss': 0.1493, 'grad_norm': 0.5974206328392029, 'learning_rate': 1.5894483378787479e-06, 'epoch': 2.3}
77%|███████▋ | 8818/11526 [1:32:09<27:50, 1.62it/s] 77%|███████▋ | 8819/11526 [1:32:10<27:48, 1.62it/s] {'loss': 0.1256, 'grad_norm': 0.513643205165863, 'learning_rate': 1.5883411544243594e-06, 'epoch': 2.3}
77%|███████▋ | 8819/11526 [1:32:10<27:48, 1.62it/s] 77%|███████▋ | 8820/11526 [1:32:10<27:47, 1.62it/s] {'loss': 0.1318, 'grad_norm': 0.48103052377700806, 'learning_rate': 1.5872342839067305e-06, 'epoch': 2.3}
77%|███████▋ | 8820/11526 [1:32:11<27:47, 1.62it/s] 77%|███████▋ | 8821/11526 [1:32:11<27:46, 1.62it/s] {'loss': 0.1192, 'grad_norm': 0.476402223110199, 'learning_rate': 1.5861277264273905e-06, 'epoch': 2.3}
77%|███████▋ | 8821/11526 [1:32:11<27:46, 1.62it/s] 77%|███████▋ | 8822/11526 [1:32:12<27:47, 1.62it/s] {'loss': 0.1287, 'grad_norm': 0.5179280638694763, 'learning_rate': 1.5850214820878401e-06, 'epoch': 2.3}
77%|███████▋ | 8822/11526 [1:32:12<27:47, 1.62it/s] 77%|███████▋ | 8823/11526 [1:32:12<27:45, 1.62it/s] {'loss': 0.1285, 'grad_norm': 0.4927467405796051, 'learning_rate': 1.5839155509895493e-06, 'epoch': 2.3}
77%|███████▋ | 8823/11526 [1:32:12<27:45, 1.62it/s] 77%|███████▋ | 8824/11526 [1:32:13<27:43, 1.62it/s] {'loss': 0.1522, 'grad_norm': 0.5919942259788513, 'learning_rate': 1.582809933233963e-06, 'epoch': 2.3}
77%|███████▋ | 8824/11526 [1:32:13<27:43, 1.62it/s] 77%|███████▋ | 8825/11526 [1:32:13<27:41, 1.63it/s] {'loss': 0.1108, 'grad_norm': 0.46264299750328064, 'learning_rate': 1.581704628922489e-06, 'epoch': 2.3}
77%|███████▋ | 8825/11526 [1:32:14<27:41, 1.63it/s] 77%|███████▋ | 8826/11526 [1:32:14<27:40, 1.63it/s] {'loss': 0.2198, 'grad_norm': 0.7086575031280518, 'learning_rate': 1.5805996381565174e-06, 'epoch': 2.3}
77%|███████▋ | 8826/11526 [1:32:14<27:40, 1.63it/s] 77%|███████▋ | 8827/11526 [1:32:15<27:40, 1.63it/s] {'loss': 0.1238, 'grad_norm': 0.501856803894043, 'learning_rate': 1.5794949610374045e-06, 'epoch': 2.3}
77%|███████▋ | 8827/11526 [1:32:15<27:40, 1.63it/s] 77%|███████▋ | 8828/11526 [1:32:15<27:39, 1.63it/s] {'loss': 0.1705, 'grad_norm': 0.6370775699615479, 'learning_rate': 1.5783905976664738e-06, 'epoch': 2.3}
77%|███████▋ | 8828/11526 [1:32:15<27:39, 1.63it/s] 77%|███████▋ | 8829/11526 [1:32:16<27:37, 1.63it/s] {'loss': 0.1456, 'grad_norm': 0.5787657499313354, 'learning_rate': 1.5772865481450256e-06, 'epoch': 2.3}
77%|███████▋ | 8829/11526 [1:32:16<27:37, 1.63it/s] 77%|███████▋ | 8830/11526 [1:32:17<27:37, 1.63it/s] {'loss': 0.1372, 'grad_norm': 0.5532965064048767, 'learning_rate': 1.5761828125743296e-06, 'epoch': 2.3}
77%|███████▋ | 8830/11526 [1:32:17<27:37, 1.63it/s] 77%|███████▋ | 8831/11526 [1:32:17<27:36, 1.63it/s] {'loss': 0.1205, 'grad_norm': 0.4914114773273468, 'learning_rate': 1.5750793910556266e-06, 'epoch': 2.3}
77%|███████▋ | 8831/11526 [1:32:17<27:36, 1.63it/s] 77%|███████▋ | 8832/11526 [1:32:18<27:44, 1.62it/s] {'loss': 0.1375, 'grad_norm': 0.5553314685821533, 'learning_rate': 1.5739762836901312e-06, 'epoch': 2.3}
77%|███████▋ | 8832/11526 [1:32:18<27:44, 1.62it/s] 77%|███████▋ | 8833/11526 [1:32:18<27:40, 1.62it/s] {'loss': 0.1549, 'grad_norm': 0.6705570816993713, 'learning_rate': 1.5728734905790222e-06, 'epoch': 2.3}
77%|███████▋ | 8833/11526 [1:32:19<27:40, 1.62it/s] 77%|███████▋ | 8834/11526 [1:32:19<27:37, 1.62it/s] {'loss': 0.1735, 'grad_norm': 0.6360194683074951, 'learning_rate': 1.5717710118234546e-06, 'epoch': 2.3}
77%|███████▋ | 8834/11526 [1:32:19<27:37, 1.62it/s] 77%|███████▋ | 8835/11526 [1:32:20<27:35, 1.63it/s] {'loss': 0.184, 'grad_norm': 0.6829579472541809, 'learning_rate': 1.5706688475245584e-06, 'epoch': 2.3}
77%|███████▋ | 8835/11526 [1:32:20<27:35, 1.63it/s] 77%|███████▋ | 8836/11526 [1:32:20<27:33, 1.63it/s] {'loss': 0.1691, 'grad_norm': 0.5758550763130188, 'learning_rate': 1.5695669977834265e-06, 'epoch': 2.3}
77%|███████▋ | 8836/11526 [1:32:20<27:33, 1.63it/s] 77%|███████▋ | 8837/11526 [1:32:21<27:39, 1.62it/s] {'loss': 0.1651, 'grad_norm': 0.6858059167861938, 'learning_rate': 1.568465462701128e-06, 'epoch': 2.3}
77%|███████▋ | 8837/11526 [1:32:21<27:39, 1.62it/s] 77%|███████▋ | 8838/11526 [1:32:21<27:37, 1.62it/s] {'loss': 0.1279, 'grad_norm': 0.5405189990997314, 'learning_rate': 1.5673642423787033e-06, 'epoch': 2.3}
77%|███████▋ | 8838/11526 [1:32:22<27:37, 1.62it/s] 77%|███████▋ | 8839/11526 [1:32:22<27:34, 1.62it/s] {'loss': 0.2356, 'grad_norm': 0.7835726141929626, 'learning_rate': 1.566263336917158e-06, 'epoch': 2.3}
77%|███████▋ | 8839/11526 [1:32:22<27:34, 1.62it/s] 77%|███████▋ | 8840/11526 [1:32:23<27:32, 1.63it/s] {'loss': 0.1623, 'grad_norm': 0.6513432860374451, 'learning_rate': 1.5651627464174795e-06, 'epoch': 2.3}
77%|███████▋ | 8840/11526 [1:32:23<27:32, 1.63it/s] 77%|███████▋ | 8841/11526 [1:32:23<27:30, 1.63it/s] {'loss': 0.1371, 'grad_norm': 0.5636640191078186, 'learning_rate': 1.5640624709806162e-06, 'epoch': 2.3}
77%|███████▋ | 8841/11526 [1:32:23<27:30, 1.63it/s] 77%|███████▋ | 8842/11526 [1:32:24<27:31, 1.62it/s] {'loss': 0.1476, 'grad_norm': 0.5050799250602722, 'learning_rate': 1.5629625107074926e-06, 'epoch': 2.3}
77%|███████▋ | 8842/11526 [1:32:24<27:31, 1.62it/s] 77%|███████▋ | 8843/11526 [1:32:25<27:29, 1.63it/s] {'loss': 0.1331, 'grad_norm': 0.5503177046775818, 'learning_rate': 1.5618628656990037e-06, 'epoch': 2.3}
77%|███████▋ | 8843/11526 [1:32:25<27:29, 1.63it/s] 77%|███████▋ | 8844/11526 [1:32:25<27:28, 1.63it/s] {'loss': 0.1332, 'grad_norm': 0.536226212978363, 'learning_rate': 1.560763536056014e-06, 'epoch': 2.3}
77%|███████▋ | 8844/11526 [1:32:25<27:28, 1.63it/s] 77%|███████▋ | 8845/11526 [1:32:26<27:28, 1.63it/s] {'loss': 0.1366, 'grad_norm': 0.6000056266784668, 'learning_rate': 1.559664521879362e-06, 'epoch': 2.3}
77%|███████▋ | 8845/11526 [1:32:26<27:28, 1.63it/s] 77%|███████▋ | 8846/11526 [1:32:26<27:27, 1.63it/s] {'loss': 0.1361, 'grad_norm': 0.5342493057250977, 'learning_rate': 1.5585658232698564e-06, 'epoch': 2.3}
77%|███████▋ | 8846/11526 [1:32:27<27:27, 1.63it/s] 77%|███████▋ | 8847/11526 [1:32:27<27:28, 1.63it/s] {'loss': 0.165, 'grad_norm': 0.6292035579681396, 'learning_rate': 1.557467440328273e-06, 'epoch': 2.3}
77%|███████▋ | 8847/11526 [1:32:27<27:28, 1.63it/s] 77%|███████▋ | 8848/11526 [1:32:28<27:27, 1.63it/s] {'loss': 0.1356, 'grad_norm': 0.5756126046180725, 'learning_rate': 1.5563693731553626e-06, 'epoch': 2.3}
77%|███████▋ | 8848/11526 [1:32:28<27:27, 1.63it/s] 77%|███████▋ | 8849/11526 [1:32:28<27:25, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.6608495116233826, 'learning_rate': 1.555271621851847e-06, 'epoch': 2.3}
77%|███████▋ | 8849/11526 [1:32:28<27:25, 1.63it/s] 77%|███████▋ | 8850/11526 [1:32:29<27:24, 1.63it/s] {'loss': 0.1506, 'grad_norm': 0.6381732821464539, 'learning_rate': 1.5541741865184179e-06, 'epoch': 2.3}
77%|███████▋ | 8850/11526 [1:32:29<27:24, 1.63it/s] 77%|███████▋ | 8851/11526 [1:32:29<27:23, 1.63it/s] {'loss': 0.1616, 'grad_norm': 0.59759521484375, 'learning_rate': 1.55307706725574e-06, 'epoch': 2.3}
77%|███████▋ | 8851/11526 [1:32:30<27:23, 1.63it/s] 77%|███████▋ | 8852/11526 [1:32:30<27:24, 1.63it/s] {'loss': 0.1668, 'grad_norm': 0.6908975839614868, 'learning_rate': 1.5519802641644422e-06, 'epoch': 2.3}
77%|███████▋ | 8852/11526 [1:32:30<27:24, 1.63it/s] 77%|███████▋ | 8853/11526 [1:32:31<27:23, 1.63it/s] {'loss': 0.1554, 'grad_norm': 0.6139684915542603, 'learning_rate': 1.5508837773451352e-06, 'epoch': 2.3}
77%|███████▋ | 8853/11526 [1:32:31<27:23, 1.63it/s] 77%|███████▋ | 8854/11526 [1:32:31<27:22, 1.63it/s] {'loss': 0.1506, 'grad_norm': 0.6131900548934937, 'learning_rate': 1.5497876068983942e-06, 'epoch': 2.3}
77%|███████▋ | 8854/11526 [1:32:31<27:22, 1.63it/s] 77%|███████▋ | 8855/11526 [1:32:32<27:20, 1.63it/s] {'loss': 0.1517, 'grad_norm': 0.5876078009605408, 'learning_rate': 1.5486917529247624e-06, 'epoch': 2.3}
77%|███████▋ | 8855/11526 [1:32:32<27:20, 1.63it/s] 77%|███████▋ | 8856/11526 [1:32:33<27:19, 1.63it/s] {'loss': 0.1542, 'grad_norm': 0.61721271276474, 'learning_rate': 1.547596215524761e-06, 'epoch': 2.31}
77%|███████▋ | 8856/11526 [1:32:33<27:19, 1.63it/s] 77%|███████▋ | 8857/11526 [1:32:33<27:20, 1.63it/s] {'loss': 0.1465, 'grad_norm': 0.5424014925956726, 'learning_rate': 1.546500994798878e-06, 'epoch': 2.31}
77%|███████▋ | 8857/11526 [1:32:33<27:20, 1.63it/s] 77%|███████▋ | 8858/11526 [1:32:34<27:18, 1.63it/s] {'loss': 0.1416, 'grad_norm': 0.5506722927093506, 'learning_rate': 1.5454060908475733e-06, 'epoch': 2.31}
77%|███████▋ | 8858/11526 [1:32:34<27:18, 1.63it/s] 77%|███████▋ | 8859/11526 [1:32:34<27:18, 1.63it/s] {'loss': 0.1334, 'grad_norm': 0.5143094062805176, 'learning_rate': 1.544311503771279e-06, 'epoch': 2.31}
77%|███████▋ | 8859/11526 [1:32:34<27:18, 1.63it/s] 77%|███████▋ | 8860/11526 [1:32:35<27:17, 1.63it/s] {'loss': 0.1714, 'grad_norm': 0.6080276370048523, 'learning_rate': 1.5432172336703944e-06, 'epoch': 2.31}
77%|███████▋ | 8860/11526 [1:32:35<27:17, 1.63it/s] 77%|███████▋ | 8861/11526 [1:32:36<27:17, 1.63it/s] {'loss': 0.2149, 'grad_norm': 0.7451002597808838, 'learning_rate': 1.542123280645292e-06, 'epoch': 2.31}
77%|███████▋ | 8861/11526 [1:32:36<27:17, 1.63it/s] 77%|███████▋ | 8862/11526 [1:32:36<27:20, 1.62it/s] {'loss': 0.1675, 'grad_norm': 0.5827478766441345, 'learning_rate': 1.5410296447963197e-06, 'epoch': 2.31}
77%|███████▋ | 8862/11526 [1:32:36<27:20, 1.62it/s] 77%|███████▋ | 8863/11526 [1:32:37<27:20, 1.62it/s] {'loss': 0.1345, 'grad_norm': 0.5292099118232727, 'learning_rate': 1.5399363262237877e-06, 'epoch': 2.31}
77%|███████▋ | 8863/11526 [1:32:37<27:20, 1.62it/s] 77%|███████▋ | 8864/11526 [1:32:37<27:18, 1.63it/s] {'loss': 0.1333, 'grad_norm': 0.5138031244277954, 'learning_rate': 1.5388433250279827e-06, 'epoch': 2.31}
77%|███████▋ | 8864/11526 [1:32:38<27:18, 1.63it/s] 77%|███████▋ | 8865/11526 [1:32:38<27:16, 1.63it/s] {'loss': 0.1379, 'grad_norm': 0.5113317966461182, 'learning_rate': 1.5377506413091608e-06, 'epoch': 2.31}
77%|███████▋ | 8865/11526 [1:32:38<27:16, 1.63it/s] 77%|███████▋ | 8866/11526 [1:32:39<27:15, 1.63it/s] {'loss': 0.1387, 'grad_norm': 0.5175746083259583, 'learning_rate': 1.5366582751675497e-06, 'epoch': 2.31}
77%|███████▋ | 8866/11526 [1:32:39<27:15, 1.63it/s] 77%|███████▋ | 8867/11526 [1:32:39<27:16, 1.62it/s] {'loss': 0.0983, 'grad_norm': 0.4487150013446808, 'learning_rate': 1.535566226703349e-06, 'epoch': 2.31}
77%|███████▋ | 8867/11526 [1:32:39<27:16, 1.62it/s] 77%|███████▋ | 8868/11526 [1:32:40<27:15, 1.63it/s] {'loss': 0.134, 'grad_norm': 0.4850304126739502, 'learning_rate': 1.5344744960167241e-06, 'epoch': 2.31}
77%|███████▋ | 8868/11526 [1:32:40<27:15, 1.63it/s] 77%|███████▋ | 8869/11526 [1:32:41<27:13, 1.63it/s] {'loss': 0.1554, 'grad_norm': 0.6765167713165283, 'learning_rate': 1.533383083207816e-06, 'epoch': 2.31}
77%|███████▋ | 8869/11526 [1:32:41<27:13, 1.63it/s] 77%|███████▋ | 8870/11526 [1:32:41<27:12, 1.63it/s] {'loss': 0.1638, 'grad_norm': 0.6078419089317322, 'learning_rate': 1.5322919883767362e-06, 'epoch': 2.31}
77%|███████▋ | 8870/11526 [1:32:41<27:12, 1.63it/s] 77%|███████▋ | 8871/11526 [1:32:42<27:10, 1.63it/s] {'loss': 0.1248, 'grad_norm': 0.5672053694725037, 'learning_rate': 1.5312012116235659e-06, 'epoch': 2.31}
77%|███████▋ | 8871/11526 [1:32:42<27:10, 1.63it/s] 77%|███████▋ | 8872/11526 [1:32:42<27:12, 1.63it/s] {'loss': 0.2695, 'grad_norm': 0.5949676632881165, 'learning_rate': 1.5301107530483566e-06, 'epoch': 2.31}
77%|███████▋ | 8872/11526 [1:32:42<27:12, 1.63it/s] 77%|███████▋ | 8873/11526 [1:32:43<27:10, 1.63it/s] {'loss': 0.1468, 'grad_norm': 0.656792163848877, 'learning_rate': 1.5290206127511337e-06, 'epoch': 2.31}
77%|███████▋ | 8873/11526 [1:32:43<27:10, 1.63it/s] 77%|███████▋ | 8874/11526 [1:32:44<27:09, 1.63it/s] {'loss': 0.173, 'grad_norm': 0.6379019618034363, 'learning_rate': 1.5279307908318858e-06, 'epoch': 2.31}
77%|███████▋ | 8874/11526 [1:32:44<27:09, 1.63it/s] 77%|███████▋ | 8875/11526 [1:32:44<27:08, 1.63it/s] {'loss': 0.159, 'grad_norm': 0.5997089743614197, 'learning_rate': 1.5268412873905848e-06, 'epoch': 2.31}
77%|███████▋ | 8875/11526 [1:32:44<27:08, 1.63it/s] 77%|███████▋ | 8876/11526 [1:32:45<27:07, 1.63it/s] {'loss': 0.1404, 'grad_norm': 0.6348556280136108, 'learning_rate': 1.5257521025271604e-06, 'epoch': 2.31}
77%|███████▋ | 8876/11526 [1:32:45<27:07, 1.63it/s] 77%|███████▋ | 8877/11526 [1:32:45<27:08, 1.63it/s] {'loss': 0.1616, 'grad_norm': 0.6848830580711365, 'learning_rate': 1.524663236341521e-06, 'epoch': 2.31}
77%|███████▋ | 8877/11526 [1:32:46<27:08, 1.63it/s] 77%|███████▋ | 8878/11526 [1:32:46<27:08, 1.63it/s] {'loss': 0.1228, 'grad_norm': 0.5070384740829468, 'learning_rate': 1.523574688933543e-06, 'epoch': 2.31}
77%|███████▋ | 8878/11526 [1:32:46<27:08, 1.63it/s] 77%|███████▋ | 8879/11526 [1:32:47<27:07, 1.63it/s] {'loss': 0.1985, 'grad_norm': 0.7301344275474548, 'learning_rate': 1.5224864604030749e-06, 'epoch': 2.31}
77%|███████▋ | 8879/11526 [1:32:47<27:07, 1.63it/s] 77%|███████▋ | 8880/11526 [1:32:47<27:05, 1.63it/s] {'loss': 0.1821, 'grad_norm': 0.6682087779045105, 'learning_rate': 1.521398550849935e-06, 'epoch': 2.31}
77%|███████▋ | 8880/11526 [1:32:47<27:05, 1.63it/s] 77%|███████▋ | 8881/11526 [1:32:48<27:05, 1.63it/s] {'loss': 0.1404, 'grad_norm': 0.5369552373886108, 'learning_rate': 1.5203109603739136e-06, 'epoch': 2.31}
77%|███████▋ | 8881/11526 [1:32:48<27:05, 1.63it/s] 77%|███████▋ | 8882/11526 [1:32:49<27:06, 1.63it/s] {'loss': 0.1426, 'grad_norm': 0.5901002287864685, 'learning_rate': 1.5192236890747685e-06, 'epoch': 2.31}
77%|███████▋ | 8882/11526 [1:32:49<27:06, 1.63it/s] 77%|███████▋ | 8883/11526 [1:32:49<27:04, 1.63it/s] {'loss': 0.1731, 'grad_norm': 0.6518903970718384, 'learning_rate': 1.5181367370522309e-06, 'epoch': 2.31}
77%|███████▋ | 8883/11526 [1:32:49<27:04, 1.63it/s] 77%|███████▋ | 8884/11526 [1:32:50<27:03, 1.63it/s] {'loss': 0.1516, 'grad_norm': 0.5835035443305969, 'learning_rate': 1.5170501044060031e-06, 'epoch': 2.31}
77%|███████▋ | 8884/11526 [1:32:50<27:03, 1.63it/s] 77%|███████▋ | 8885/11526 [1:32:50<27:02, 1.63it/s] {'loss': 0.1627, 'grad_norm': 0.732899010181427, 'learning_rate': 1.5159637912357566e-06, 'epoch': 2.31}
77%|███████▋ | 8885/11526 [1:32:50<27:02, 1.63it/s] 77%|███████▋ | 8886/11526 [1:32:51<27:01, 1.63it/s] {'loss': 0.1651, 'grad_norm': 0.620129406452179, 'learning_rate': 1.514877797641135e-06, 'epoch': 2.31}
77%|███████▋ | 8886/11526 [1:32:51<27:01, 1.63it/s] 77%|███████▋ | 8887/11526 [1:32:52<27:02, 1.63it/s] {'loss': 0.1313, 'grad_norm': 0.491670697927475, 'learning_rate': 1.5137921237217523e-06, 'epoch': 2.31}
77%|███████▋ | 8887/11526 [1:32:52<27:02, 1.63it/s] 77%|███████▋ | 8888/11526 [1:32:52<27:02, 1.63it/s] {'loss': 0.1419, 'grad_norm': 0.5689735412597656, 'learning_rate': 1.512706769577189e-06, 'epoch': 2.31}
77%|███████▋ | 8888/11526 [1:32:52<27:02, 1.63it/s] 77%|███████▋ | 8889/11526 [1:32:53<27:00, 1.63it/s] {'loss': 0.146, 'grad_norm': 0.5570707321166992, 'learning_rate': 1.5116217353070056e-06, 'epoch': 2.31}
77%|███████▋ | 8889/11526 [1:32:53<27:00, 1.63it/s] 77%|███████▋ | 8890/11526 [1:32:53<26:59, 1.63it/s] {'loss': 0.1372, 'grad_norm': 0.5264554023742676, 'learning_rate': 1.5105370210107233e-06, 'epoch': 2.31}
77%|███████▋ | 8890/11526 [1:32:54<26:59, 1.63it/s] 77%|███████▋ | 8891/11526 [1:32:54<26:58, 1.63it/s] {'loss': 0.1788, 'grad_norm': 0.6475868225097656, 'learning_rate': 1.5094526267878395e-06, 'epoch': 2.31}
77%|███████▋ | 8891/11526 [1:32:54<26:58, 1.63it/s] 77%|███████▋ | 8892/11526 [1:32:55<26:59, 1.63it/s] {'loss': 0.1717, 'grad_norm': 0.7157787680625916, 'learning_rate': 1.5083685527378217e-06, 'epoch': 2.31}
77%|███████▋ | 8892/11526 [1:32:55<26:59, 1.63it/s] 77%|███████▋ | 8893/11526 [1:32:55<26:57, 1.63it/s] {'loss': 0.1936, 'grad_norm': 0.5870731472969055, 'learning_rate': 1.5072847989601064e-06, 'epoch': 2.31}
77%|███████▋ | 8893/11526 [1:32:55<26:57, 1.63it/s] 77%|███████▋ | 8894/11526 [1:32:56<27:03, 1.62it/s] {'loss': 0.1509, 'grad_norm': 0.5242484211921692, 'learning_rate': 1.506201365554102e-06, 'epoch': 2.31}
77%|███████▋ | 8894/11526 [1:32:56<27:03, 1.62it/s] 77%|███████▋ | 8895/11526 [1:32:57<27:00, 1.62it/s] {'loss': 0.1354, 'grad_norm': 0.498103529214859, 'learning_rate': 1.5051182526191888e-06, 'epoch': 2.32}
77%|███████▋ | 8895/11526 [1:32:57<27:00, 1.62it/s] 77%|███████▋ | 8896/11526 [1:32:57<26:57, 1.63it/s] {'loss': 0.1484, 'grad_norm': 0.5975452661514282, 'learning_rate': 1.504035460254713e-06, 'epoch': 2.32}
77%|███████▋ | 8896/11526 [1:32:57<26:57, 1.63it/s] 77%|███████▋ | 8897/11526 [1:32:58<26:57, 1.62it/s] {'loss': 0.1461, 'grad_norm': 1.0666425228118896, 'learning_rate': 1.502952988559996e-06, 'epoch': 2.32}
77%|███████▋ | 8897/11526 [1:32:58<26:57, 1.62it/s] 77%|███████▋ | 8898/11526 [1:32:58<26:55, 1.63it/s] {'loss': 0.1255, 'grad_norm': 0.5307433009147644, 'learning_rate': 1.5018708376343283e-06, 'epoch': 2.32}
77%|███████▋ | 8898/11526 [1:32:58<26:55, 1.63it/s] 77%|███████▋ | 8899/11526 [1:32:59<26:54, 1.63it/s] {'loss': 0.1937, 'grad_norm': 0.6922629475593567, 'learning_rate': 1.5007890075769704e-06, 'epoch': 2.32}
77%|███████▋ | 8899/11526 [1:32:59<26:54, 1.63it/s] 77%|███████▋ | 8900/11526 [1:33:00<26:53, 1.63it/s] {'loss': 0.1508, 'grad_norm': 0.6077615022659302, 'learning_rate': 1.4997074984871567e-06, 'epoch': 2.32}
77%|███████▋ | 8900/11526 [1:33:00<26:53, 1.63it/s] 77%|███████▋ | 8901/11526 [1:33:00<26:52, 1.63it/s] {'loss': 0.1651, 'grad_norm': 0.6101649403572083, 'learning_rate': 1.4986263104640835e-06, 'epoch': 2.32}
77%|███████▋ | 8901/11526 [1:33:00<26:52, 1.63it/s] 77%|███████▋ | 8902/11526 [1:33:01<26:54, 1.63it/s] {'loss': 0.1816, 'grad_norm': 0.6717856526374817, 'learning_rate': 1.4975454436069292e-06, 'epoch': 2.32}
77%|███████▋ | 8902/11526 [1:33:01<26:54, 1.63it/s] 77%|███████▋ | 8903/11526 [1:33:01<26:52, 1.63it/s] {'loss': 0.1642, 'grad_norm': 0.5219039916992188, 'learning_rate': 1.4964648980148361e-06, 'epoch': 2.32}
77%|███████▋ | 8903/11526 [1:33:02<26:52, 1.63it/s] 77%|███████▋ | 8904/11526 [1:33:02<26:52, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.5640570521354675, 'learning_rate': 1.4953846737869155e-06, 'epoch': 2.32}
77%|███████▋ | 8904/11526 [1:33:02<26:52, 1.63it/s] 77%|███████▋ | 8905/11526 [1:33:03<26:51, 1.63it/s] {'loss': 0.2056, 'grad_norm': 0.7393317222595215, 'learning_rate': 1.4943047710222525e-06, 'epoch': 2.32}
77%|███████▋ | 8905/11526 [1:33:03<26:51, 1.63it/s] 77%|███████▋ | 8906/11526 [1:33:03<26:50, 1.63it/s] {'loss': 0.1443, 'grad_norm': 0.5474609732627869, 'learning_rate': 1.4932251898199029e-06, 'epoch': 2.32}
77%|███████▋ | 8906/11526 [1:33:03<26:50, 1.63it/s] 77%|███████▋ | 8907/11526 [1:33:04<26:53, 1.62it/s] {'loss': 0.1704, 'grad_norm': 0.6378577351570129, 'learning_rate': 1.4921459302788916e-06, 'epoch': 2.32}
77%|███████▋ | 8907/11526 [1:33:04<26:53, 1.62it/s] 77%|███████▋ | 8908/11526 [1:33:04<26:51, 1.62it/s] {'loss': 0.1539, 'grad_norm': 0.5997334122657776, 'learning_rate': 1.4910669924982162e-06, 'epoch': 2.32}
77%|███████▋ | 8908/11526 [1:33:05<26:51, 1.62it/s] 77%|███████▋ | 8909/11526 [1:33:05<26:49, 1.63it/s] {'loss': 0.1488, 'grad_norm': 0.5539162755012512, 'learning_rate': 1.4899883765768397e-06, 'epoch': 2.32}
77%|███████▋ | 8909/11526 [1:33:05<26:49, 1.63it/s] 77%|███████▋ | 8910/11526 [1:33:06<26:47, 1.63it/s] {'loss': 0.1362, 'grad_norm': 0.5920711159706116, 'learning_rate': 1.4889100826136987e-06, 'epoch': 2.32}
77%|███████▋ | 8910/11526 [1:33:06<26:47, 1.63it/s] 77%|███████▋ | 8911/11526 [1:33:06<26:46, 1.63it/s] {'loss': 0.1343, 'grad_norm': 0.5293447971343994, 'learning_rate': 1.4878321107077053e-06, 'epoch': 2.32}
77%|███████▋ | 8911/11526 [1:33:06<26:46, 1.63it/s] 77%|███████▋ | 8912/11526 [1:33:07<26:46, 1.63it/s] {'loss': 0.1531, 'grad_norm': 0.6852651834487915, 'learning_rate': 1.4867544609577322e-06, 'epoch': 2.32}
77%|███████▋ | 8912/11526 [1:33:07<26:46, 1.63it/s] 77%|███████▋ | 8913/11526 [1:33:08<26:46, 1.63it/s] {'loss': 0.172, 'grad_norm': 0.6602678894996643, 'learning_rate': 1.48567713346263e-06, 'epoch': 2.32}
77%|███████▋ | 8913/11526 [1:33:08<26:46, 1.63it/s] 77%|███████▋ | 8914/11526 [1:33:08<26:45, 1.63it/s] {'loss': 0.1767, 'grad_norm': 0.6167036890983582, 'learning_rate': 1.484600128321217e-06, 'epoch': 2.32}
77%|███████▋ | 8914/11526 [1:33:08<26:45, 1.63it/s] 77%|███████▋ | 8915/11526 [1:33:09<26:44, 1.63it/s] {'loss': 0.1728, 'grad_norm': 0.6291491985321045, 'learning_rate': 1.4835234456322812e-06, 'epoch': 2.32}
77%|███████▋ | 8915/11526 [1:33:09<26:44, 1.63it/s] 77%|███████▋ | 8916/11526 [1:33:09<26:43, 1.63it/s] {'loss': 0.147, 'grad_norm': 0.5737622976303101, 'learning_rate': 1.4824470854945849e-06, 'epoch': 2.32}
77%|███████▋ | 8916/11526 [1:33:10<26:43, 1.63it/s] 77%|███████▋ | 8917/11526 [1:33:10<26:43, 1.63it/s] {'loss': 0.1663, 'grad_norm': 1.216631531715393, 'learning_rate': 1.4813710480068543e-06, 'epoch': 2.32}
77%|███████▋ | 8917/11526 [1:33:10<26:43, 1.63it/s] 77%|███████▋ | 8918/11526 [1:33:11<26:42, 1.63it/s] {'loss': 0.1312, 'grad_norm': 0.49199411273002625, 'learning_rate': 1.4802953332677915e-06, 'epoch': 2.32}
77%|███████▋ | 8918/11526 [1:33:11<26:42, 1.63it/s] 77%|███████▋ | 8919/11526 [1:33:11<26:42, 1.63it/s] {'loss': 0.1828, 'grad_norm': 0.6187573075294495, 'learning_rate': 1.4792199413760671e-06, 'epoch': 2.32}
77%|███████▋ | 8919/11526 [1:33:11<26:42, 1.63it/s] 77%|███████▋ | 8920/11526 [1:33:12<26:42, 1.63it/s] {'loss': 0.2154, 'grad_norm': 0.6591649055480957, 'learning_rate': 1.4781448724303222e-06, 'epoch': 2.32}
77%|███████▋ | 8920/11526 [1:33:12<26:42, 1.63it/s] 77%|███████▋ | 8921/11526 [1:33:12<26:42, 1.63it/s] {'loss': 0.1398, 'grad_norm': 0.5717276930809021, 'learning_rate': 1.4770701265291682e-06, 'epoch': 2.32}
77%|███████▋ | 8921/11526 [1:33:13<26:42, 1.63it/s] 77%|███████▋ | 8922/11526 [1:33:13<26:41, 1.63it/s] {'loss': 0.1413, 'grad_norm': 0.5270786285400391, 'learning_rate': 1.475995703771188e-06, 'epoch': 2.32}
77%|███████▋ | 8922/11526 [1:33:13<26:41, 1.63it/s] 77%|███████▋ | 8923/11526 [1:33:14<26:40, 1.63it/s] {'loss': 0.1369, 'grad_norm': 0.5191637277603149, 'learning_rate': 1.4749216042549297e-06, 'epoch': 2.32}
77%|███████▋ | 8923/11526 [1:33:14<26:40, 1.63it/s] 77%|███████▋ | 8924/11526 [1:33:14<26:38, 1.63it/s] {'loss': 0.2224, 'grad_norm': 0.6824501156806946, 'learning_rate': 1.4738478280789215e-06, 'epoch': 2.32}
77%|███████▋ | 8924/11526 [1:33:14<26:38, 1.63it/s] 77%|███████▋ | 8925/11526 [1:33:15<26:38, 1.63it/s] {'loss': 0.136, 'grad_norm': 0.530282735824585, 'learning_rate': 1.4727743753416524e-06, 'epoch': 2.32}
77%|███████▋ | 8925/11526 [1:33:15<26:38, 1.63it/s] 77%|███████▋ | 8926/11526 [1:33:16<26:36, 1.63it/s] {'loss': 0.1371, 'grad_norm': 0.548755407333374, 'learning_rate': 1.4717012461415863e-06, 'epoch': 2.32}
77%|███████▋ | 8926/11526 [1:33:16<26:36, 1.63it/s] 77%|███████▋ | 8927/11526 [1:33:16<26:39, 1.63it/s] {'loss': 0.1816, 'grad_norm': 0.6806468367576599, 'learning_rate': 1.4706284405771587e-06, 'epoch': 2.32}
77%|███████▋ | 8927/11526 [1:33:16<26:39, 1.63it/s] 77%|███████▋ | 8928/11526 [1:33:17<26:37, 1.63it/s] {'loss': 0.1491, 'grad_norm': 0.5620914697647095, 'learning_rate': 1.469555958746769e-06, 'epoch': 2.32}
77%|███████▋ | 8928/11526 [1:33:17<26:37, 1.63it/s] 77%|███████▋ | 8929/11526 [1:33:17<26:36, 1.63it/s] {'loss': 0.178, 'grad_norm': 0.702404260635376, 'learning_rate': 1.4684838007487957e-06, 'epoch': 2.32}
77%|███████▋ | 8929/11526 [1:33:18<26:36, 1.63it/s] 77%|███████▋ | 8930/11526 [1:33:18<26:35, 1.63it/s] {'loss': 0.1206, 'grad_norm': 0.4985222816467285, 'learning_rate': 1.4674119666815828e-06, 'epoch': 2.32}
77%|███████▋ | 8930/11526 [1:33:18<26:35, 1.63it/s] 77%|███████▋ | 8931/11526 [1:33:19<26:33, 1.63it/s] {'loss': 0.1625, 'grad_norm': 0.6718385219573975, 'learning_rate': 1.4663404566434425e-06, 'epoch': 2.32}
77%|███████▋ | 8931/11526 [1:33:19<26:33, 1.63it/s] 77%|███████▋ | 8932/11526 [1:33:19<26:34, 1.63it/s] {'loss': 0.129, 'grad_norm': 0.6102374792098999, 'learning_rate': 1.4652692707326616e-06, 'epoch': 2.32}
77%|███████▋ | 8932/11526 [1:33:19<26:34, 1.63it/s] 78%|███████▊ | 8933/11526 [1:33:20<26:34, 1.63it/s] {'loss': 0.1452, 'grad_norm': 0.5291335582733154, 'learning_rate': 1.4641984090474948e-06, 'epoch': 2.33}
78%|███████▊ | 8933/11526 [1:33:20<26:34, 1.63it/s] 78%|███████▊ | 8934/11526 [1:33:20<26:32, 1.63it/s] {'loss': 0.1086, 'grad_norm': 0.4638914167881012, 'learning_rate': 1.4631278716861674e-06, 'epoch': 2.33}
78%|███████▊ | 8934/11526 [1:33:21<26:32, 1.63it/s] 78%|███████▊ | 8935/11526 [1:33:21<26:32, 1.63it/s] {'loss': 0.177, 'grad_norm': 0.6851697564125061, 'learning_rate': 1.4620576587468777e-06, 'epoch': 2.33}
78%|███████▊ | 8935/11526 [1:33:21<26:32, 1.63it/s] 78%|███████▊ | 8936/11526 [1:33:22<26:31, 1.63it/s] {'loss': 0.1234, 'grad_norm': 0.4799026548862457, 'learning_rate': 1.460987770327788e-06, 'epoch': 2.33}
78%|███████▊ | 8936/11526 [1:33:22<26:31, 1.63it/s] 78%|███████▊ | 8937/11526 [1:33:22<26:32, 1.63it/s] {'loss': 0.1293, 'grad_norm': 0.5537145733833313, 'learning_rate': 1.459918206527034e-06, 'epoch': 2.33}
78%|███████▊ | 8937/11526 [1:33:22<26:32, 1.63it/s] 78%|███████▊ | 8938/11526 [1:33:23<26:32, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.5794025659561157, 'learning_rate': 1.458848967442728e-06, 'epoch': 2.33}
78%|███████▊ | 8938/11526 [1:33:23<26:32, 1.63it/s] 78%|███████▊ | 8939/11526 [1:33:24<26:31, 1.63it/s] {'loss': 0.1418, 'grad_norm': 0.597206711769104, 'learning_rate': 1.4577800531729413e-06, 'epoch': 2.33}
78%|███████▊ | 8939/11526 [1:33:24<26:31, 1.63it/s] 78%|███████▊ | 8940/11526 [1:33:24<26:29, 1.63it/s] {'loss': 0.1885, 'grad_norm': 0.6783007383346558, 'learning_rate': 1.4567114638157225e-06, 'epoch': 2.33}
78%|███████▊ | 8940/11526 [1:33:24<26:29, 1.63it/s] 78%|███████▊ | 8941/11526 [1:33:25<26:28, 1.63it/s] {'loss': 0.1301, 'grad_norm': 0.5481235384941101, 'learning_rate': 1.455643199469089e-06, 'epoch': 2.33}
78%|███████▊ | 8941/11526 [1:33:25<26:28, 1.63it/s] 78%|███████▊ | 8942/11526 [1:33:25<26:29, 1.63it/s] {'loss': 0.1685, 'grad_norm': 0.637937068939209, 'learning_rate': 1.4545752602310277e-06, 'epoch': 2.33}
78%|███████▊ | 8942/11526 [1:33:26<26:29, 1.63it/s] 78%|███████▊ | 8943/11526 [1:33:26<26:28, 1.63it/s] {'loss': 0.1264, 'grad_norm': 0.5258358120918274, 'learning_rate': 1.4535076461994974e-06, 'epoch': 2.33}
78%|███████▊ | 8943/11526 [1:33:26<26:28, 1.63it/s] 78%|███████▊ | 8944/11526 [1:33:27<26:28, 1.63it/s] {'loss': 0.2244, 'grad_norm': 0.6388576626777649, 'learning_rate': 1.4524403574724233e-06, 'epoch': 2.33}
78%|███████▊ | 8944/11526 [1:33:27<26:28, 1.63it/s] 78%|███████▊ | 8945/11526 [1:33:27<26:26, 1.63it/s] {'loss': 0.1341, 'grad_norm': 0.5240498781204224, 'learning_rate': 1.4513733941477027e-06, 'epoch': 2.33}
78%|███████▊ | 8945/11526 [1:33:27<26:26, 1.63it/s] 78%|███████▊ | 8946/11526 [1:33:28<26:26, 1.63it/s] {'loss': 0.1668, 'grad_norm': 0.540454089641571, 'learning_rate': 1.4503067563232081e-06, 'epoch': 2.33}
78%|███████▊ | 8946/11526 [1:33:28<26:26, 1.63it/s] 78%|███████▊ | 8947/11526 [1:33:28<26:26, 1.63it/s] {'loss': 0.1159, 'grad_norm': 0.4853179454803467, 'learning_rate': 1.4492404440967733e-06, 'epoch': 2.33}
78%|███████▊ | 8947/11526 [1:33:29<26:26, 1.63it/s] 78%|███████▊ | 8948/11526 [1:33:29<26:25, 1.63it/s] {'loss': 0.1443, 'grad_norm': 0.5558112263679504, 'learning_rate': 1.4481744575662082e-06, 'epoch': 2.33}
78%|███████▊ | 8948/11526 [1:33:29<26:25, 1.63it/s] 78%|███████▊ | 8949/11526 [1:33:30<26:23, 1.63it/s] {'loss': 0.1195, 'grad_norm': 0.5033069252967834, 'learning_rate': 1.4471087968292925e-06, 'epoch': 2.33}
78%|███████▊ | 8949/11526 [1:33:30<26:23, 1.63it/s] 78%|███████▊ | 8950/11526 [1:33:30<26:23, 1.63it/s] {'loss': 0.1838, 'grad_norm': 0.6701622009277344, 'learning_rate': 1.4460434619837693e-06, 'epoch': 2.33}
78%|███████▊ | 8950/11526 [1:33:30<26:23, 1.63it/s] 78%|███████▊ | 8951/11526 [1:33:31<26:21, 1.63it/s] {'loss': 0.1409, 'grad_norm': 0.49765944480895996, 'learning_rate': 1.4449784531273648e-06, 'epoch': 2.33}
78%|███████▊ | 8951/11526 [1:33:31<26:21, 1.63it/s] 78%|███████▊ | 8952/11526 [1:33:32<26:24, 1.62it/s] {'loss': 0.1697, 'grad_norm': 0.654125452041626, 'learning_rate': 1.4439137703577622e-06, 'epoch': 2.33}
78%|███████▊ | 8952/11526 [1:33:32<26:24, 1.62it/s] 78%|███████▊ | 8953/11526 [1:33:32<26:23, 1.63it/s] {'loss': 0.1555, 'grad_norm': 0.6408305764198303, 'learning_rate': 1.4428494137726213e-06, 'epoch': 2.33}
78%|███████▊ | 8953/11526 [1:33:32<26:23, 1.63it/s] 78%|███████▊ | 8954/11526 [1:33:33<26:21, 1.63it/s] {'loss': 0.1321, 'grad_norm': 0.5744127035140991, 'learning_rate': 1.4417853834695721e-06, 'epoch': 2.33}
78%|███████▊ | 8954/11526 [1:33:33<26:21, 1.63it/s] 78%|███████▊ | 8955/11526 [1:33:33<26:20, 1.63it/s] {'loss': 0.1444, 'grad_norm': 0.5630233287811279, 'learning_rate': 1.4407216795462126e-06, 'epoch': 2.33}
78%|███████▊ | 8955/11526 [1:33:34<26:20, 1.63it/s] 78%|███████▊ | 8956/11526 [1:33:34<26:20, 1.63it/s] {'loss': 0.1358, 'grad_norm': 0.5630348324775696, 'learning_rate': 1.439658302100112e-06, 'epoch': 2.33}
78%|███████▊ | 8956/11526 [1:33:34<26:20, 1.63it/s] 78%|███████▊ | 8957/11526 [1:33:35<26:20, 1.63it/s] {'loss': 0.1727, 'grad_norm': 0.5889248251914978, 'learning_rate': 1.4385952512288115e-06, 'epoch': 2.33}
78%|███████▊ | 8957/11526 [1:33:35<26:20, 1.63it/s] 78%|███████▊ | 8958/11526 [1:33:35<26:18, 1.63it/s] {'loss': 0.1388, 'grad_norm': 0.5795228481292725, 'learning_rate': 1.4375325270298163e-06, 'epoch': 2.33}
78%|███████▊ | 8958/11526 [1:33:35<26:18, 1.63it/s] 78%|███████▊ | 8959/11526 [1:33:36<26:17, 1.63it/s] {'loss': 0.1386, 'grad_norm': 0.5454730987548828, 'learning_rate': 1.4364701296006055e-06, 'epoch': 2.33}
78%|███████▊ | 8959/11526 [1:33:36<26:17, 1.63it/s] 78%|███████▊ | 8960/11526 [1:33:36<26:16, 1.63it/s] {'loss': 0.1931, 'grad_norm': 0.7215837240219116, 'learning_rate': 1.4354080590386338e-06, 'epoch': 2.33}
78%|███████▊ | 8960/11526 [1:33:37<26:16, 1.63it/s] 78%|███████▊ | 8961/11526 [1:33:37<26:15, 1.63it/s] {'loss': 0.1541, 'grad_norm': 0.5687815546989441, 'learning_rate': 1.4343463154413145e-06, 'epoch': 2.33}
78%|███████▊ | 8961/11526 [1:33:37<26:15, 1.63it/s] 78%|███████▊ | 8962/11526 [1:33:38<26:15, 1.63it/s] {'loss': 0.1554, 'grad_norm': 0.6873444318771362, 'learning_rate': 1.433284898906039e-06, 'epoch': 2.33}
78%|███████▊ | 8962/11526 [1:33:38<26:15, 1.63it/s] 78%|███████▊ | 8963/11526 [1:33:38<26:14, 1.63it/s] {'loss': 0.1434, 'grad_norm': 0.5920166373252869, 'learning_rate': 1.4322238095301665e-06, 'epoch': 2.33}
78%|███████▊ | 8963/11526 [1:33:38<26:14, 1.63it/s] 78%|███████▊ | 8964/11526 [1:33:39<26:14, 1.63it/s] {'loss': 0.1251, 'grad_norm': 0.5173030495643616, 'learning_rate': 1.4311630474110262e-06, 'epoch': 2.33}
78%|███████▊ | 8964/11526 [1:33:39<26:14, 1.63it/s] 78%|███████▊ | 8965/11526 [1:33:40<26:13, 1.63it/s] {'loss': 0.1111, 'grad_norm': 0.4812847375869751, 'learning_rate': 1.4301026126459177e-06, 'epoch': 2.33}
78%|███████▊ | 8965/11526 [1:33:40<26:13, 1.63it/s] 78%|███████▊ | 8966/11526 [1:33:40<26:11, 1.63it/s] {'loss': 0.1903, 'grad_norm': 0.6516045331954956, 'learning_rate': 1.4290425053321088e-06, 'epoch': 2.33}
78%|███████▊ | 8966/11526 [1:33:40<26:11, 1.63it/s] 78%|███████▊ | 8967/11526 [1:33:41<26:10, 1.63it/s] {'loss': 0.1881, 'grad_norm': 0.7571191787719727, 'learning_rate': 1.4279827255668387e-06, 'epoch': 2.33}
78%|███████▊ | 8967/11526 [1:33:41<26:10, 1.63it/s] 78%|███████▊ | 8968/11526 [1:33:41<26:10, 1.63it/s] {'loss': 0.1404, 'grad_norm': 0.5115664601325989, 'learning_rate': 1.4269232734473171e-06, 'epoch': 2.33}
78%|███████▊ | 8968/11526 [1:33:42<26:10, 1.63it/s] 78%|███████▊ | 8969/11526 [1:33:42<26:09, 1.63it/s] {'loss': 0.1709, 'grad_norm': 0.6342775225639343, 'learning_rate': 1.425864149070723e-06, 'epoch': 2.33}
78%|███████▊ | 8969/11526 [1:33:42<26:09, 1.63it/s] 78%|███████▊ | 8970/11526 [1:33:43<26:08, 1.63it/s] {'loss': 0.1174, 'grad_norm': 0.5135000944137573, 'learning_rate': 1.4248053525342053e-06, 'epoch': 2.33}
78%|███████▊ | 8970/11526 [1:33:43<26:08, 1.63it/s] 78%|███████▊ | 8971/11526 [1:33:43<26:08, 1.63it/s] {'loss': 0.1498, 'grad_norm': 0.6425220370292664, 'learning_rate': 1.423746883934884e-06, 'epoch': 2.33}
78%|███████▊ | 8971/11526 [1:33:43<26:08, 1.63it/s] 78%|███████▊ | 8972/11526 [1:33:44<26:07, 1.63it/s] {'loss': 0.1636, 'grad_norm': 0.6263210773468018, 'learning_rate': 1.4226887433698433e-06, 'epoch': 2.34}
78%|███████▊ | 8972/11526 [1:33:44<26:07, 1.63it/s] 78%|███████▊ | 8973/11526 [1:33:44<26:07, 1.63it/s] {'loss': 0.1673, 'grad_norm': 0.5771814584732056, 'learning_rate': 1.421630930936148e-06, 'epoch': 2.34}
78%|███████▊ | 8973/11526 [1:33:45<26:07, 1.63it/s] 78%|███████▊ | 8974/11526 [1:33:45<26:06, 1.63it/s] {'loss': 0.1664, 'grad_norm': 0.6530445218086243, 'learning_rate': 1.420573446730823e-06, 'epoch': 2.34}
78%|███████▊ | 8974/11526 [1:33:45<26:06, 1.63it/s] 78%|███████▊ | 8975/11526 [1:33:46<26:06, 1.63it/s] {'loss': 0.1656, 'grad_norm': 0.6173193454742432, 'learning_rate': 1.419516290850867e-06, 'epoch': 2.34}
78%|███████▊ | 8975/11526 [1:33:46<26:06, 1.63it/s] 78%|███████▊ | 8976/11526 [1:33:46<26:05, 1.63it/s] {'loss': 0.1432, 'grad_norm': 0.5714867115020752, 'learning_rate': 1.4184594633932502e-06, 'epoch': 2.34}
78%|███████▊ | 8976/11526 [1:33:46<26:05, 1.63it/s] 78%|███████▊ | 8977/11526 [1:33:47<26:04, 1.63it/s] {'loss': 0.1503, 'grad_norm': 0.5832103490829468, 'learning_rate': 1.417402964454907e-06, 'epoch': 2.34}
78%|███████▊ | 8977/11526 [1:33:47<26:04, 1.63it/s] 78%|███████▊ | 8978/11526 [1:33:48<26:03, 1.63it/s] {'loss': 0.1708, 'grad_norm': 0.6513274908065796, 'learning_rate': 1.4163467941327498e-06, 'epoch': 2.34}
78%|███████▊ | 8978/11526 [1:33:48<26:03, 1.63it/s] 78%|███████▊ | 8979/11526 [1:33:48<26:02, 1.63it/s] {'loss': 0.1436, 'grad_norm': 0.5625723600387573, 'learning_rate': 1.4152909525236557e-06, 'epoch': 2.34}
78%|███████▊ | 8979/11526 [1:33:48<26:02, 1.63it/s] 78%|███████▊ | 8980/11526 [1:33:49<26:01, 1.63it/s] {'loss': 0.1595, 'grad_norm': 0.6542501449584961, 'learning_rate': 1.41423543972447e-06, 'epoch': 2.34}
78%|███████▊ | 8980/11526 [1:33:49<26:01, 1.63it/s] 78%|███████▊ | 8981/11526 [1:33:49<26:01, 1.63it/s] {'loss': 0.1348, 'grad_norm': 0.5347817540168762, 'learning_rate': 1.4131802558320124e-06, 'epoch': 2.34}
78%|███████▊ | 8981/11526 [1:33:49<26:01, 1.63it/s] 78%|███████▊ | 8982/11526 [1:33:50<26:01, 1.63it/s] {'loss': 0.1324, 'grad_norm': 0.5755084156990051, 'learning_rate': 1.4121254009430696e-06, 'epoch': 2.34}
78%|███████▊ | 8982/11526 [1:33:50<26:01, 1.63it/s] 78%|███████▊ | 8983/11526 [1:33:51<26:00, 1.63it/s] {'loss': 0.1416, 'grad_norm': 0.5461104512214661, 'learning_rate': 1.4110708751543995e-06, 'epoch': 2.34}
78%|███████▊ | 8983/11526 [1:33:51<26:00, 1.63it/s] 78%|███████▊ | 8984/11526 [1:33:51<26:00, 1.63it/s] {'loss': 0.1728, 'grad_norm': 0.6479084491729736, 'learning_rate': 1.4100166785627301e-06, 'epoch': 2.34}
78%|███████▊ | 8984/11526 [1:33:51<26:00, 1.63it/s] 78%|███████▊ | 8985/11526 [1:33:52<25:59, 1.63it/s] {'loss': 0.1676, 'grad_norm': 0.5907288193702698, 'learning_rate': 1.4089628112647557e-06, 'epoch': 2.34}
78%|███████▊ | 8985/11526 [1:33:52<25:59, 1.63it/s] 78%|███████▊ | 8986/11526 [1:33:52<26:00, 1.63it/s] {'loss': 0.1821, 'grad_norm': 0.6565757989883423, 'learning_rate': 1.4079092733571432e-06, 'epoch': 2.34}
78%|███████▊ | 8986/11526 [1:33:53<26:00, 1.63it/s] 78%|███████▊ | 8987/11526 [1:33:53<26:00, 1.63it/s] {'loss': 0.1259, 'grad_norm': 0.5469707250595093, 'learning_rate': 1.406856064936533e-06, 'epoch': 2.34}
78%|███████▊ | 8987/11526 [1:33:53<26:00, 1.63it/s] 78%|███████▊ | 8988/11526 [1:33:54<25:58, 1.63it/s] {'loss': 0.1657, 'grad_norm': 0.7027230262756348, 'learning_rate': 1.4058031860995269e-06, 'epoch': 2.34}
78%|███████▊ | 8988/11526 [1:33:54<25:58, 1.63it/s] 78%|███████▊ | 8989/11526 [1:33:54<25:57, 1.63it/s] {'loss': 0.1576, 'grad_norm': 0.5957316756248474, 'learning_rate': 1.4047506369427034e-06, 'epoch': 2.34}
78%|███████▊ | 8989/11526 [1:33:54<25:57, 1.63it/s] 78%|███████▊ | 8990/11526 [1:33:55<25:58, 1.63it/s] {'loss': 0.1383, 'grad_norm': 0.56580650806427, 'learning_rate': 1.4036984175626068e-06, 'epoch': 2.34}
78%|███████▊ | 8990/11526 [1:33:55<25:58, 1.63it/s] 78%|███████▊ | 8991/11526 [1:33:55<25:57, 1.63it/s] {'loss': 0.1713, 'grad_norm': 0.6362360119819641, 'learning_rate': 1.4026465280557538e-06, 'epoch': 2.34}
78%|███████▊ | 8991/11526 [1:33:56<25:57, 1.63it/s] 78%|███████▊ | 8992/11526 [1:33:56<25:56, 1.63it/s] {'loss': 0.1833, 'grad_norm': 0.7455258965492249, 'learning_rate': 1.4015949685186315e-06, 'epoch': 2.34}
78%|███████▊ | 8992/11526 [1:33:56<25:56, 1.63it/s] 78%|███████▊ | 8993/11526 [1:33:57<25:56, 1.63it/s] {'loss': 0.1237, 'grad_norm': 0.4985145330429077, 'learning_rate': 1.4005437390476906e-06, 'epoch': 2.34}
78%|███████▊ | 8993/11526 [1:33:57<25:56, 1.63it/s] 78%|███████▊ | 8994/11526 [1:33:57<25:56, 1.63it/s] {'loss': 0.1954, 'grad_norm': 0.6560027003288269, 'learning_rate': 1.3994928397393565e-06, 'epoch': 2.34}
78%|███████▊ | 8994/11526 [1:33:57<25:56, 1.63it/s] 78%|███████▊ | 8995/11526 [1:33:58<25:54, 1.63it/s] {'loss': 0.2005, 'grad_norm': 0.7346134781837463, 'learning_rate': 1.398442270690028e-06, 'epoch': 2.34}
78%|███████▊ | 8995/11526 [1:33:58<25:54, 1.63it/s] 78%|███████▊ | 8996/11526 [1:33:59<25:54, 1.63it/s] {'loss': 0.1244, 'grad_norm': 0.5812108516693115, 'learning_rate': 1.3973920319960654e-06, 'epoch': 2.34}
78%|███████▊ | 8996/11526 [1:33:59<25:54, 1.63it/s] 78%|███████▊ | 8997/11526 [1:33:59<25:53, 1.63it/s] {'loss': 0.1526, 'grad_norm': 0.5214060544967651, 'learning_rate': 1.3963421237538033e-06, 'epoch': 2.34}
78%|███████▊ | 8997/11526 [1:33:59<25:53, 1.63it/s] 78%|███████▊ | 8998/11526 [1:34:00<25:52, 1.63it/s] {'loss': 0.1262, 'grad_norm': 0.5659401416778564, 'learning_rate': 1.3952925460595473e-06, 'epoch': 2.34}
78%|███████▊ | 8998/11526 [1:34:00<25:52, 1.63it/s] 78%|███████▊ | 8999/11526 [1:34:00<25:52, 1.63it/s] {'loss': 0.173, 'grad_norm': 0.6428674459457397, 'learning_rate': 1.3942432990095655e-06, 'epoch': 2.34}
78%|███████▊ | 8999/11526 [1:34:01<25:52, 1.63it/s] 78%|███████▊ | 9000/11526 [1:34:01<25:51, 1.63it/s] {'loss': 0.208, 'grad_norm': 0.8110184073448181, 'learning_rate': 1.3931943827001077e-06, 'epoch': 2.34}
78%|███████▊ | 9000/11526 [1:34:01<25:51, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.35it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5437502861022949, 'eval_runtime': 1.9541, 'eval_samples_per_second': 102.347, 'eval_steps_per_second': 6.653, 'epoch': 2.34}
78%|███████▊ | 9000/11526 [1:34:03<25:51, 1.63it/s]
 78%|███████▊ | 9001/11526 [1:34:04<50:34, 1.20s/it] {'loss': 0.1495, 'grad_norm': 0.5822898149490356, 'learning_rate': 1.3921457972273815e-06, 'epoch': 2.34}
78%|███████▊ | 9001/11526 [1:34:04<50:34, 1.20s/it] 78%|███████▊ | 9002/11526 [1:34:04<43:08, 1.03s/it] {'loss': 0.1684, 'grad_norm': 0.6106717586517334, 'learning_rate': 1.3910975426875706e-06, 'epoch': 2.34}
78%|███████▊ | 9002/11526 [1:34:04<43:08, 1.03s/it] 78%|███████▊ | 9003/11526 [1:34:05<37:55, 1.11it/s] {'loss': 0.1559, 'grad_norm': 0.640407919883728, 'learning_rate': 1.3900496191768265e-06, 'epoch': 2.34}
78%|███████▊ | 9003/11526 [1:34:05<37:55, 1.11it/s] 78%|███████▊ | 9004/11526 [1:34:05<34:16, 1.23it/s] {'loss': 0.1431, 'grad_norm': 0.5458511710166931, 'learning_rate': 1.3890020267912712e-06, 'epoch': 2.34}
78%|███████▊ | 9004/11526 [1:34:06<34:16, 1.23it/s] 78%|███████▊ | 9005/11526 [1:34:06<31:43, 1.32it/s] {'loss': 0.1608, 'grad_norm': 0.5619826316833496, 'learning_rate': 1.387954765626996e-06, 'epoch': 2.34}
78%|███████▊ | 9005/11526 [1:34:06<31:43, 1.32it/s] 78%|███████▊ | 9006/11526 [1:34:07<29:55, 1.40it/s] {'loss': 0.1215, 'grad_norm': 0.4727800190448761, 'learning_rate': 1.3869078357800625e-06, 'epoch': 2.34}
78%|███████▊ | 9006/11526 [1:34:07<29:55, 1.40it/s] 78%|███████▊ | 9007/11526 [1:34:07<28:40, 1.46it/s] {'loss': 0.1316, 'grad_norm': 0.559736967086792, 'learning_rate': 1.3858612373464992e-06, 'epoch': 2.34}
78%|███████▊ | 9007/11526 [1:34:07<28:40, 1.46it/s] 78%|███████▊ | 9008/11526 [1:34:08<27:47, 1.51it/s] {'loss': 0.1476, 'grad_norm': 0.5609815120697021, 'learning_rate': 1.3848149704223062e-06, 'epoch': 2.34}
78%|███████▊ | 9008/11526 [1:34:08<27:47, 1.51it/s] 78%|███████▊ | 9009/11526 [1:34:09<27:10, 1.54it/s] {'loss': 0.1645, 'grad_norm': 0.6564740538597107, 'learning_rate': 1.3837690351034539e-06, 'epoch': 2.34}
78%|███████▊ | 9009/11526 [1:34:09<27:10, 1.54it/s] 78%|███████▊ | 9010/11526 [1:34:09<26:44, 1.57it/s] {'loss': 0.1408, 'grad_norm': 0.6113739013671875, 'learning_rate': 1.3827234314858806e-06, 'epoch': 2.35}
78%|███████▊ | 9010/11526 [1:34:09<26:44, 1.57it/s] 78%|███████▊ | 9011/11526 [1:34:10<26:26, 1.59it/s] {'loss': 0.1743, 'grad_norm': 0.6447719931602478, 'learning_rate': 1.3816781596654976e-06, 'epoch': 2.35}
78%|███████▊ | 9011/11526 [1:34:10<26:26, 1.59it/s] 78%|███████▊ | 9012/11526 [1:34:10<26:12, 1.60it/s] {'loss': 0.2263, 'grad_norm': 0.6734469532966614, 'learning_rate': 1.3806332197381783e-06, 'epoch': 2.35}
78%|███████▊ | 9012/11526 [1:34:10<26:12, 1.60it/s] 78%|███████▊ | 9013/11526 [1:34:11<26:03, 1.61it/s] {'loss': 0.1777, 'grad_norm': 0.6195014119148254, 'learning_rate': 1.3795886117997748e-06, 'epoch': 2.35}
78%|███████▊ | 9013/11526 [1:34:11<26:03, 1.61it/s] 78%|███████▊ | 9014/11526 [1:34:12<25:56, 1.61it/s] {'loss': 0.1247, 'grad_norm': 0.5216385722160339, 'learning_rate': 1.378544335946105e-06, 'epoch': 2.35}
78%|███████▊ | 9014/11526 [1:34:12<25:56, 1.61it/s] 78%|███████▊ | 9015/11526 [1:34:12<25:51, 1.62it/s] {'loss': 0.1998, 'grad_norm': 0.699129045009613, 'learning_rate': 1.3775003922729518e-06, 'epoch': 2.35}
78%|███████▊ | 9015/11526 [1:34:12<25:51, 1.62it/s] 78%|███████▊ | 9016/11526 [1:34:13<25:48, 1.62it/s] {'loss': 0.1502, 'grad_norm': 0.5635551810264587, 'learning_rate': 1.3764567808760743e-06, 'epoch': 2.35}
78%|███████▊ | 9016/11526 [1:34:13<25:48, 1.62it/s] 78%|███████▊ | 9017/11526 [1:34:13<25:45, 1.62it/s] {'loss': 0.1382, 'grad_norm': 0.516681432723999, 'learning_rate': 1.3754135018511978e-06, 'epoch': 2.35}
78%|███████▊ | 9017/11526 [1:34:14<25:45, 1.62it/s] 78%|███████▊ | 9018/11526 [1:34:14<25:43, 1.63it/s] {'loss': 0.1412, 'grad_norm': 0.5998542904853821, 'learning_rate': 1.3743705552940178e-06, 'epoch': 2.35}
78%|███████▊ | 9018/11526 [1:34:14<25:43, 1.63it/s] 78%|███████▊ | 9019/11526 [1:34:15<25:41, 1.63it/s] {'loss': 0.1409, 'grad_norm': 0.5252413749694824, 'learning_rate': 1.3733279413001998e-06, 'epoch': 2.35}
78%|███████▊ | 9019/11526 [1:34:15<25:41, 1.63it/s] 78%|███████▊ | 9020/11526 [1:34:15<25:40, 1.63it/s] {'loss': 0.1872, 'grad_norm': 0.6263416409492493, 'learning_rate': 1.3722856599653788e-06, 'epoch': 2.35}
78%|███████▊ | 9020/11526 [1:34:15<25:40, 1.63it/s] 78%|███████▊ | 9021/11526 [1:34:16<25:39, 1.63it/s] {'loss': 0.149, 'grad_norm': 0.597007691860199, 'learning_rate': 1.3712437113851556e-06, 'epoch': 2.35}
78%|███████▊ | 9021/11526 [1:34:16<25:39, 1.63it/s] 78%|███████▊ | 9022/11526 [1:34:16<25:38, 1.63it/s] {'loss': 0.1807, 'grad_norm': 0.6945601105690002, 'learning_rate': 1.3702020956551087e-06, 'epoch': 2.35}
78%|███████▊ | 9022/11526 [1:34:17<25:38, 1.63it/s] 78%|███████▊ | 9023/11526 [1:34:17<25:37, 1.63it/s] {'loss': 0.169, 'grad_norm': 0.6334457993507385, 'learning_rate': 1.3691608128707767e-06, 'epoch': 2.35}
78%|███████▊ | 9023/11526 [1:34:17<25:37, 1.63it/s] 78%|███████▊ | 9024/11526 [1:34:18<25:36, 1.63it/s] {'loss': 0.1236, 'grad_norm': 0.47630971670150757, 'learning_rate': 1.3681198631276737e-06, 'epoch': 2.35}
78%|███████▊ | 9024/11526 [1:34:18<25:36, 1.63it/s] 78%|███████▊ | 9025/11526 [1:34:18<25:36, 1.63it/s] {'loss': 0.1742, 'grad_norm': 0.6430226564407349, 'learning_rate': 1.3670792465212828e-06, 'epoch': 2.35}
78%|███████▊ | 9025/11526 [1:34:18<25:36, 1.63it/s] 78%|███████▊ | 9026/11526 [1:34:19<25:35, 1.63it/s] {'loss': 0.1396, 'grad_norm': 0.5459347367286682, 'learning_rate': 1.3660389631470505e-06, 'epoch': 2.35}
78%|███████▊ | 9026/11526 [1:34:19<25:35, 1.63it/s] 78%|███████▊ | 9027/11526 [1:34:20<25:34, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.6231408715248108, 'learning_rate': 1.3649990131004032e-06, 'epoch': 2.35}
78%|███████▊ | 9027/11526 [1:34:20<25:34, 1.63it/s] 78%|███████▊ | 9028/11526 [1:34:20<25:34, 1.63it/s] {'loss': 0.1322, 'grad_norm': 0.504542887210846, 'learning_rate': 1.3639593964767295e-06, 'epoch': 2.35}
78%|███████▊ | 9028/11526 [1:34:20<25:34, 1.63it/s] 78%|███████▊ | 9029/11526 [1:34:21<25:33, 1.63it/s] {'loss': 0.1534, 'grad_norm': 0.6793603897094727, 'learning_rate': 1.362920113371387e-06, 'epoch': 2.35}
78%|███████▊ | 9029/11526 [1:34:21<25:33, 1.63it/s] 78%|███████▊ | 9030/11526 [1:34:21<25:33, 1.63it/s] {'loss': 0.133, 'grad_norm': 0.665596067905426, 'learning_rate': 1.3618811638797058e-06, 'epoch': 2.35}
78%|███████▊ | 9030/11526 [1:34:22<25:33, 1.63it/s] 78%|███████▊ | 9031/11526 [1:34:22<25:33, 1.63it/s] {'loss': 0.1544, 'grad_norm': 0.5897590517997742, 'learning_rate': 1.3608425480969846e-06, 'epoch': 2.35}
78%|███████▊ | 9031/11526 [1:34:22<25:33, 1.63it/s] 78%|███████▊ | 9032/11526 [1:34:23<25:34, 1.63it/s] {'loss': 0.1756, 'grad_norm': 0.6866048574447632, 'learning_rate': 1.35980426611849e-06, 'epoch': 2.35}
78%|███████▊ | 9032/11526 [1:34:23<25:34, 1.63it/s] 78%|███████▊ | 9033/11526 [1:34:23<25:33, 1.63it/s] {'loss': 0.1824, 'grad_norm': 0.6649227738380432, 'learning_rate': 1.3587663180394622e-06, 'epoch': 2.35}
78%|███████▊ | 9033/11526 [1:34:23<25:33, 1.63it/s] 78%|███████▊ | 9034/11526 [1:34:24<25:33, 1.62it/s] {'loss': 0.1682, 'grad_norm': 0.5764189958572388, 'learning_rate': 1.357728703955104e-06, 'epoch': 2.35}
78%|███████▊ | 9034/11526 [1:34:24<25:33, 1.62it/s] 78%|███████▊ | 9035/11526 [1:34:24<25:31, 1.63it/s] {'loss': 0.1378, 'grad_norm': 0.5344220995903015, 'learning_rate': 1.356691423960591e-06, 'epoch': 2.35}
78%|███████▊ | 9035/11526 [1:34:25<25:31, 1.63it/s] 78%|███████▊ | 9036/11526 [1:34:25<25:31, 1.63it/s] {'loss': 0.1741, 'grad_norm': 0.8227381706237793, 'learning_rate': 1.3556544781510732e-06, 'epoch': 2.35}
78%|███████▊ | 9036/11526 [1:34:25<25:31, 1.63it/s] 78%|███████▊ | 9037/11526 [1:34:26<25:31, 1.62it/s] {'loss': 0.1267, 'grad_norm': 0.5525941848754883, 'learning_rate': 1.3546178666216603e-06, 'epoch': 2.35}
78%|███████▊ | 9037/11526 [1:34:26<25:31, 1.62it/s] 78%|███████▊ | 9038/11526 [1:34:26<25:31, 1.62it/s] {'loss': 0.1677, 'grad_norm': 0.6406201720237732, 'learning_rate': 1.3535815894674386e-06, 'epoch': 2.35}
78%|███████▊ | 9038/11526 [1:34:26<25:31, 1.62it/s] 78%|███████▊ | 9039/11526 [1:34:27<25:29, 1.63it/s] {'loss': 0.1249, 'grad_norm': 0.4618340730667114, 'learning_rate': 1.3525456467834607e-06, 'epoch': 2.35}
78%|███████▊ | 9039/11526 [1:34:27<25:29, 1.63it/s] 78%|███████▊ | 9040/11526 [1:34:28<25:28, 1.63it/s] {'loss': 0.6708, 'grad_norm': 0.8966801166534424, 'learning_rate': 1.351510038664749e-06, 'epoch': 2.35}
78%|███████▊ | 9040/11526 [1:34:28<25:28, 1.63it/s] 78%|███████▊ | 9041/11526 [1:34:28<25:27, 1.63it/s] {'loss': 0.0926, 'grad_norm': 0.3894195556640625, 'learning_rate': 1.350474765206297e-06, 'epoch': 2.35}
78%|███████▊ | 9041/11526 [1:34:28<25:27, 1.63it/s] 78%|███████▊ | 9042/11526 [1:34:29<25:27, 1.63it/s] {'loss': 0.1579, 'grad_norm': 0.6330950260162354, 'learning_rate': 1.3494398265030635e-06, 'epoch': 2.35}
78%|███████▊ | 9042/11526 [1:34:29<25:27, 1.63it/s] 78%|███████▊ | 9043/11526 [1:34:29<25:27, 1.63it/s] {'loss': 0.1597, 'grad_norm': 0.5905683636665344, 'learning_rate': 1.3484052226499777e-06, 'epoch': 2.35}
78%|███████▊ | 9043/11526 [1:34:30<25:27, 1.63it/s] 78%|███████▊ | 9044/11526 [1:34:30<25:26, 1.63it/s] {'loss': 0.1596, 'grad_norm': 0.7040612101554871, 'learning_rate': 1.347370953741945e-06, 'epoch': 2.35}
78%|███████▊ | 9044/11526 [1:34:30<25:26, 1.63it/s] 78%|███████▊ | 9045/11526 [1:34:31<25:25, 1.63it/s] {'loss': 0.1282, 'grad_norm': 0.4750531017780304, 'learning_rate': 1.3463370198738295e-06, 'epoch': 2.35}
78%|███████▊ | 9045/11526 [1:34:31<25:25, 1.63it/s] 78%|███████▊ | 9046/11526 [1:34:31<25:24, 1.63it/s] {'loss': 0.1453, 'grad_norm': 0.5509241819381714, 'learning_rate': 1.3453034211404714e-06, 'epoch': 2.35}
78%|███████▊ | 9046/11526 [1:34:31<25:24, 1.63it/s] 78%|███████▊ | 9047/11526 [1:34:32<25:25, 1.62it/s] {'loss': 0.1273, 'grad_norm': 0.546957790851593, 'learning_rate': 1.344270157636679e-06, 'epoch': 2.35}
78%|███████▊ | 9047/11526 [1:34:32<25:25, 1.62it/s] 79%|███████▊ | 9048/11526 [1:34:32<25:24, 1.63it/s] {'loss': 0.115, 'grad_norm': 0.5057303309440613, 'learning_rate': 1.343237229457225e-06, 'epoch': 2.36}
79%|███████▊ | 9048/11526 [1:34:33<25:24, 1.63it/s] 79%|███████▊ | 9049/11526 [1:34:33<25:22, 1.63it/s] {'loss': 0.1528, 'grad_norm': 0.6128002405166626, 'learning_rate': 1.3422046366968611e-06, 'epoch': 2.36}
79%|███████▊ | 9049/11526 [1:34:33<25:22, 1.63it/s] 79%|███████▊ | 9050/11526 [1:34:34<25:21, 1.63it/s] {'loss': 0.1664, 'grad_norm': 0.5298196077346802, 'learning_rate': 1.341172379450299e-06, 'epoch': 2.36}
79%|███████▊ | 9050/11526 [1:34:34<25:21, 1.63it/s] 79%|███████▊ | 9051/11526 [1:34:34<25:20, 1.63it/s] {'loss': 0.1312, 'grad_norm': 0.5414793491363525, 'learning_rate': 1.340140457812224e-06, 'epoch': 2.36}
79%|███████▊ | 9051/11526 [1:34:34<25:20, 1.63it/s] 79%|███████▊ | 9052/11526 [1:34:35<25:19, 1.63it/s] {'loss': 0.1371, 'grad_norm': 0.5174211263656616, 'learning_rate': 1.3391088718772893e-06, 'epoch': 2.36}
79%|███████▊ | 9052/11526 [1:34:35<25:19, 1.63it/s] 79%|███████▊ | 9053/11526 [1:34:36<25:20, 1.63it/s] {'loss': 0.128, 'grad_norm': 0.5253357291221619, 'learning_rate': 1.3380776217401192e-06, 'epoch': 2.36}
79%|███████▊ | 9053/11526 [1:34:36<25:20, 1.63it/s] 79%|███████▊ | 9054/11526 [1:34:36<25:19, 1.63it/s] {'loss': 0.1399, 'grad_norm': 0.566878080368042, 'learning_rate': 1.3370467074953053e-06, 'epoch': 2.36}
79%|███████▊ | 9054/11526 [1:34:36<25:19, 1.63it/s] 79%|███████▊ | 9055/11526 [1:34:37<25:18, 1.63it/s] {'loss': 0.1257, 'grad_norm': 0.49009576439857483, 'learning_rate': 1.3360161292374097e-06, 'epoch': 2.36}
79%|███████▊ | 9055/11526 [1:34:37<25:18, 1.63it/s] 79%|███████▊ | 9056/11526 [1:34:37<25:17, 1.63it/s] {'loss': 0.1383, 'grad_norm': 0.5876243710517883, 'learning_rate': 1.3349858870609606e-06, 'epoch': 2.36}
79%|███████▊ | 9056/11526 [1:34:38<25:17, 1.63it/s] 79%|███████▊ | 9057/11526 [1:34:38<25:17, 1.63it/s] {'loss': 0.1696, 'grad_norm': 0.6098654270172119, 'learning_rate': 1.333955981060459e-06, 'epoch': 2.36}
79%|███████▊ | 9057/11526 [1:34:38<25:17, 1.63it/s] 79%|███████▊ | 9058/11526 [1:34:39<25:16, 1.63it/s] {'loss': 0.1332, 'grad_norm': 0.47515425086021423, 'learning_rate': 1.3329264113303735e-06, 'epoch': 2.36}
79%|███████▊ | 9058/11526 [1:34:39<25:16, 1.63it/s] 79%|███████▊ | 9059/11526 [1:34:39<25:15, 1.63it/s] {'loss': 0.1507, 'grad_norm': 0.608245849609375, 'learning_rate': 1.331897177965143e-06, 'epoch': 2.36}
79%|███████▊ | 9059/11526 [1:34:39<25:15, 1.63it/s] 79%|███████▊ | 9060/11526 [1:34:40<25:14, 1.63it/s] {'loss': 0.1403, 'grad_norm': 0.5593333840370178, 'learning_rate': 1.330868281059175e-06, 'epoch': 2.36}
79%|███████▊ | 9060/11526 [1:34:40<25:14, 1.63it/s] 79%|███████▊ | 9061/11526 [1:34:40<25:13, 1.63it/s] {'loss': 0.1232, 'grad_norm': 0.4743400514125824, 'learning_rate': 1.3298397207068414e-06, 'epoch': 2.36}
79%|███████▊ | 9061/11526 [1:34:41<25:13, 1.63it/s] 79%|███████▊ | 9062/11526 [1:34:41<25:13, 1.63it/s] {'loss': 0.2264, 'grad_norm': 0.7731077671051025, 'learning_rate': 1.328811497002493e-06, 'epoch': 2.36}
79%|███████▊ | 9062/11526 [1:34:41<25:13, 1.63it/s] 79%|███████▊ | 9063/11526 [1:34:42<25:54, 1.58it/s] {'loss': 0.1156, 'grad_norm': 0.43650397658348083, 'learning_rate': 1.327783610040444e-06, 'epoch': 2.36}
79%|███████▊ | 9063/11526 [1:34:42<25:54, 1.58it/s] 79%|███████▊ | 9064/11526 [1:34:42<25:46, 1.59it/s] {'loss': 0.1486, 'grad_norm': 0.6194279789924622, 'learning_rate': 1.3267560599149742e-06, 'epoch': 2.36}
79%|███████▊ | 9064/11526 [1:34:42<25:46, 1.59it/s] 79%|███████▊ | 9065/11526 [1:34:43<25:36, 1.60it/s] {'loss': 0.1384, 'grad_norm': 0.5299244523048401, 'learning_rate': 1.325728846720339e-06, 'epoch': 2.36}
79%|███████▊ | 9065/11526 [1:34:43<25:36, 1.60it/s] 79%|███████▊ | 9066/11526 [1:34:44<25:27, 1.61it/s] {'loss': 0.1667, 'grad_norm': 0.7893844246864319, 'learning_rate': 1.3247019705507596e-06, 'epoch': 2.36}
79%|███████▊ | 9066/11526 [1:34:44<25:27, 1.61it/s] 79%|███████▊ | 9067/11526 [1:34:44<25:24, 1.61it/s] {'loss': 0.143, 'grad_norm': 0.5003603100776672, 'learning_rate': 1.323675431500427e-06, 'epoch': 2.36}
79%|███████▊ | 9067/11526 [1:34:44<25:24, 1.61it/s] 79%|███████▊ | 9068/11526 [1:34:45<25:19, 1.62it/s] {'loss': 0.1378, 'grad_norm': 0.5390474796295166, 'learning_rate': 1.3226492296635023e-06, 'epoch': 2.36}
79%|███████▊ | 9068/11526 [1:34:45<25:19, 1.62it/s] 79%|███████▊ | 9069/11526 [1:34:45<25:54, 1.58it/s] {'loss': 0.1773, 'grad_norm': 0.6710186004638672, 'learning_rate': 1.3216233651341126e-06, 'epoch': 2.36}
79%|███████▊ | 9069/11526 [1:34:46<25:54, 1.58it/s] 79%|███████▊ | 9070/11526 [1:34:46<25:46, 1.59it/s] {'loss': 0.1305, 'grad_norm': 0.6072589755058289, 'learning_rate': 1.3205978380063556e-06, 'epoch': 2.36}
79%|███████▊ | 9070/11526 [1:34:46<25:46, 1.59it/s] 79%|███████▊ | 9071/11526 [1:34:47<25:34, 1.60it/s] {'loss': 0.132, 'grad_norm': 0.548701286315918, 'learning_rate': 1.319572648374302e-06, 'epoch': 2.36}
79%|███████▊ | 9071/11526 [1:34:47<25:34, 1.60it/s] 79%|███████▊ | 9072/11526 [1:34:47<26:17, 1.56it/s] {'loss': 0.1402, 'grad_norm': 0.5692879557609558, 'learning_rate': 1.3185477963319847e-06, 'epoch': 2.36}
79%|███████▊ | 9072/11526 [1:34:48<26:17, 1.56it/s] 79%|███████▊ | 9073/11526 [1:34:48<25:56, 1.58it/s] {'loss': 0.16, 'grad_norm': 0.5415201187133789, 'learning_rate': 1.3175232819734106e-06, 'epoch': 2.36}
79%|███████▊ | 9073/11526 [1:34:48<25:56, 1.58it/s] 79%|███████▊ | 9074/11526 [1:34:49<25:41, 1.59it/s] {'loss': 0.1297, 'grad_norm': 0.508195161819458, 'learning_rate': 1.3164991053925547e-06, 'epoch': 2.36}
79%|███████▊ | 9074/11526 [1:34:49<25:41, 1.59it/s] 79%|███████▊ | 9075/11526 [1:34:49<25:30, 1.60it/s] {'loss': 0.1561, 'grad_norm': 0.5566919445991516, 'learning_rate': 1.315475266683356e-06, 'epoch': 2.36}
79%|███████▊ | 9075/11526 [1:34:49<25:30, 1.60it/s] 79%|███████▊ | 9076/11526 [1:34:50<25:22, 1.61it/s] {'loss': 0.1636, 'grad_norm': 0.6077998280525208, 'learning_rate': 1.3144517659397332e-06, 'epoch': 2.36}
79%|███████▊ | 9076/11526 [1:34:50<25:22, 1.61it/s] 79%|███████▉ | 9077/11526 [1:34:50<25:16, 1.61it/s] {'loss': 0.204, 'grad_norm': 0.6538534164428711, 'learning_rate': 1.3134286032555626e-06, 'epoch': 2.36}
79%|███████▉ | 9077/11526 [1:34:51<25:16, 1.61it/s] 79%|███████▉ | 9078/11526 [1:34:51<25:11, 1.62it/s] {'loss': 0.1487, 'grad_norm': 0.5248966217041016, 'learning_rate': 1.3124057787246963e-06, 'epoch': 2.36}
79%|███████▉ | 9078/11526 [1:34:51<25:11, 1.62it/s] 79%|███████▉ | 9079/11526 [1:34:52<25:08, 1.62it/s] {'loss': 0.1627, 'grad_norm': 0.6350533962249756, 'learning_rate': 1.3113832924409537e-06, 'epoch': 2.36}
79%|███████▉ | 9079/11526 [1:34:52<25:08, 1.62it/s] 79%|███████▉ | 9080/11526 [1:34:52<25:05, 1.62it/s] {'loss': 0.1841, 'grad_norm': 0.645596981048584, 'learning_rate': 1.3103611444981224e-06, 'epoch': 2.36}
79%|███████▉ | 9080/11526 [1:34:52<25:05, 1.62it/s] 79%|███████▉ | 9081/11526 [1:34:53<25:03, 1.63it/s] {'loss': 0.1707, 'grad_norm': 0.5668038129806519, 'learning_rate': 1.3093393349899603e-06, 'epoch': 2.36}
79%|███████▉ | 9081/11526 [1:34:53<25:03, 1.63it/s] 79%|███████▉ | 9082/11526 [1:34:54<25:02, 1.63it/s] {'loss': 0.1279, 'grad_norm': 0.5389478802680969, 'learning_rate': 1.3083178640101951e-06, 'epoch': 2.36}
79%|███████▉ | 9082/11526 [1:34:54<25:02, 1.63it/s] 79%|███████▉ | 9083/11526 [1:34:54<25:02, 1.63it/s] {'loss': 0.1518, 'grad_norm': 0.552025556564331, 'learning_rate': 1.3072967316525187e-06, 'epoch': 2.36}
79%|███████▉ | 9083/11526 [1:34:54<25:02, 1.63it/s] 79%|███████▉ | 9084/11526 [1:34:55<25:00, 1.63it/s] {'loss': 0.142, 'grad_norm': 0.5296077132225037, 'learning_rate': 1.3062759380105967e-06, 'epoch': 2.36}
79%|███████▉ | 9084/11526 [1:34:55<25:00, 1.63it/s] 79%|███████▉ | 9085/11526 [1:34:55<24:59, 1.63it/s] {'loss': 0.2416, 'grad_norm': 0.7269709706306458, 'learning_rate': 1.3052554831780618e-06, 'epoch': 2.36}
79%|███████▉ | 9085/11526 [1:34:56<24:59, 1.63it/s] 79%|███████▉ | 9086/11526 [1:34:56<24:59, 1.63it/s] {'loss': 0.1528, 'grad_norm': 0.623208224773407, 'learning_rate': 1.3042353672485163e-06, 'epoch': 2.36}
79%|███████▉ | 9086/11526 [1:34:56<24:59, 1.63it/s] 79%|███████▉ | 9087/11526 [1:34:57<24:58, 1.63it/s] {'loss': 0.1623, 'grad_norm': 0.6529412269592285, 'learning_rate': 1.3032155903155313e-06, 'epoch': 2.37}
79%|███████▉ | 9087/11526 [1:34:57<24:58, 1.63it/s] 79%|███████▉ | 9088/11526 [1:34:57<24:56, 1.63it/s] {'loss': 0.1334, 'grad_norm': 0.5164181590080261, 'learning_rate': 1.3021961524726457e-06, 'epoch': 2.37}
79%|███████▉ | 9088/11526 [1:34:57<24:56, 1.63it/s] 79%|███████▉ | 9089/11526 [1:34:58<24:56, 1.63it/s] {'loss': 0.1405, 'grad_norm': 0.7524504661560059, 'learning_rate': 1.301177053813369e-06, 'epoch': 2.37}
79%|███████▉ | 9089/11526 [1:34:58<24:56, 1.63it/s] 79%|███████▉ | 9090/11526 [1:34:58<24:56, 1.63it/s] {'loss': 0.1568, 'grad_norm': 0.6450844407081604, 'learning_rate': 1.3001582944311798e-06, 'epoch': 2.37}
79%|███████▉ | 9090/11526 [1:34:59<24:56, 1.63it/s] 79%|███████▉ | 9091/11526 [1:34:59<24:55, 1.63it/s] {'loss': 0.1966, 'grad_norm': 0.7245790958404541, 'learning_rate': 1.299139874419521e-06, 'epoch': 2.37}
79%|███████▉ | 9091/11526 [1:34:59<24:55, 1.63it/s] 79%|███████▉ | 9092/11526 [1:35:00<24:54, 1.63it/s] {'loss': 0.1475, 'grad_norm': 0.6211736798286438, 'learning_rate': 1.2981217938718104e-06, 'epoch': 2.37}
79%|███████▉ | 9092/11526 [1:35:00<24:54, 1.63it/s] 79%|███████▉ | 9093/11526 [1:35:00<24:54, 1.63it/s] {'loss': 0.1782, 'grad_norm': 0.5835933685302734, 'learning_rate': 1.2971040528814321e-06, 'epoch': 2.37}
79%|███████▉ | 9093/11526 [1:35:00<24:54, 1.63it/s] 79%|███████▉ | 9094/11526 [1:35:01<24:53, 1.63it/s] {'loss': 0.1155, 'grad_norm': 0.501556932926178, 'learning_rate': 1.296086651541738e-06, 'epoch': 2.37}
79%|███████▉ | 9094/11526 [1:35:01<24:53, 1.63it/s] 79%|███████▉ | 9095/11526 [1:35:02<24:52, 1.63it/s] {'loss': 0.16, 'grad_norm': 0.5686661005020142, 'learning_rate': 1.2950695899460509e-06, 'epoch': 2.37}
79%|███████▉ | 9095/11526 [1:35:02<24:52, 1.63it/s] 79%|███████▉ | 9096/11526 [1:35:02<24:51, 1.63it/s] {'loss': 0.165, 'grad_norm': 0.5951306223869324, 'learning_rate': 1.294052868187663e-06, 'epoch': 2.37}
79%|███████▉ | 9096/11526 [1:35:02<24:51, 1.63it/s] 79%|███████▉ | 9097/11526 [1:35:03<24:50, 1.63it/s] {'loss': 0.1744, 'grad_norm': 0.5692095756530762, 'learning_rate': 1.2930364863598282e-06, 'epoch': 2.37}
79%|███████▉ | 9097/11526 [1:35:03<24:50, 1.63it/s] 79%|███████▉ | 9098/11526 [1:35:03<24:50, 1.63it/s] {'loss': 0.1329, 'grad_norm': 0.5715782046318054, 'learning_rate': 1.292020444555782e-06, 'epoch': 2.37}
79%|███████▉ | 9098/11526 [1:35:04<24:50, 1.63it/s] 79%|███████▉ | 9099/11526 [1:35:04<24:50, 1.63it/s] {'loss': 0.1621, 'grad_norm': 0.6364665031433105, 'learning_rate': 1.2910047428687173e-06, 'epoch': 2.37}
79%|███████▉ | 9099/11526 [1:35:04<24:50, 1.63it/s] 79%|███████▉ | 9100/11526 [1:35:05<24:49, 1.63it/s] {'loss': 0.172, 'grad_norm': 0.6259024143218994, 'learning_rate': 1.2899893813918002e-06, 'epoch': 2.37}
79%|███████▉ | 9100/11526 [1:35:05<24:49, 1.63it/s] 79%|███████▉ | 9101/11526 [1:35:05<24:49, 1.63it/s] {'loss': 0.1335, 'grad_norm': 0.538275957107544, 'learning_rate': 1.2889743602181665e-06, 'epoch': 2.37}
79%|███████▉ | 9101/11526 [1:35:05<24:49, 1.63it/s] 79%|███████▉ | 9102/11526 [1:35:06<24:49, 1.63it/s] {'loss': 0.1396, 'grad_norm': 0.5546454191207886, 'learning_rate': 1.287959679440919e-06, 'epoch': 2.37}
79%|███████▉ | 9102/11526 [1:35:06<24:49, 1.63it/s] 79%|███████▉ | 9103/11526 [1:35:06<24:48, 1.63it/s] {'loss': 0.1379, 'grad_norm': 0.7344808578491211, 'learning_rate': 1.2869453391531306e-06, 'epoch': 2.37}
79%|███████▉ | 9103/11526 [1:35:07<24:48, 1.63it/s] 79%|███████▉ | 9104/11526 [1:35:07<24:47, 1.63it/s] {'loss': 0.1503, 'grad_norm': 0.5443531274795532, 'learning_rate': 1.2859313394478434e-06, 'epoch': 2.37}
79%|███████▉ | 9104/11526 [1:35:07<24:47, 1.63it/s] 79%|███████▉ | 9105/11526 [1:35:08<24:47, 1.63it/s] {'loss': 0.1141, 'grad_norm': 0.5191530585289001, 'learning_rate': 1.2849176804180647e-06, 'epoch': 2.37}
79%|███████▉ | 9105/11526 [1:35:08<24:47, 1.63it/s] 79%|███████▉ | 9106/11526 [1:35:08<24:45, 1.63it/s] {'loss': 0.2203, 'grad_norm': 0.7841927409172058, 'learning_rate': 1.2839043621567738e-06, 'epoch': 2.37}
79%|███████▉ | 9106/11526 [1:35:08<24:45, 1.63it/s] 79%|███████▉ | 9107/11526 [1:35:09<24:45, 1.63it/s] {'loss': 0.1555, 'grad_norm': 0.5838181972503662, 'learning_rate': 1.2828913847569185e-06, 'epoch': 2.37}
79%|███████▉ | 9107/11526 [1:35:09<24:45, 1.63it/s] 79%|███████▉ | 9108/11526 [1:35:10<24:45, 1.63it/s] {'loss': 0.1518, 'grad_norm': 0.6289310455322266, 'learning_rate': 1.2818787483114154e-06, 'epoch': 2.37}
79%|███████▉ | 9108/11526 [1:35:10<24:45, 1.63it/s] 79%|███████▉ | 9109/11526 [1:35:10<24:44, 1.63it/s] {'loss': 0.1484, 'grad_norm': 0.563339352607727, 'learning_rate': 1.2808664529131504e-06, 'epoch': 2.37}
79%|███████▉ | 9109/11526 [1:35:10<24:44, 1.63it/s] 79%|███████▉ | 9110/11526 [1:35:11<24:43, 1.63it/s] {'loss': 0.2074, 'grad_norm': 0.5709774494171143, 'learning_rate': 1.2798544986549715e-06, 'epoch': 2.37}
79%|███████▉ | 9110/11526 [1:35:11<24:43, 1.63it/s] 79%|███████▉ | 9111/11526 [1:35:11<24:42, 1.63it/s] {'loss': 0.1502, 'grad_norm': 0.5733843445777893, 'learning_rate': 1.278842885629707e-06, 'epoch': 2.37}
79%|███████▉ | 9111/11526 [1:35:11<24:42, 1.63it/s] 79%|███████▉ | 9112/11526 [1:35:12<24:42, 1.63it/s] {'loss': 0.1459, 'grad_norm': 0.5977748036384583, 'learning_rate': 1.2778316139301467e-06, 'epoch': 2.37}
79%|███████▉ | 9112/11526 [1:35:12<24:42, 1.63it/s] 79%|███████▉ | 9113/11526 [1:35:13<24:41, 1.63it/s] {'loss': 0.1417, 'grad_norm': 0.5577558279037476, 'learning_rate': 1.2768206836490476e-06, 'epoch': 2.37}
79%|███████▉ | 9113/11526 [1:35:13<24:41, 1.63it/s] 79%|███████▉ | 9114/11526 [1:35:13<24:40, 1.63it/s] {'loss': 0.1633, 'grad_norm': 0.6653910279273987, 'learning_rate': 1.275810094879139e-06, 'epoch': 2.37}
79%|███████▉ | 9114/11526 [1:35:13<24:40, 1.63it/s] 79%|███████▉ | 9115/11526 [1:35:14<24:39, 1.63it/s] {'loss': 0.157, 'grad_norm': 0.5868679285049438, 'learning_rate': 1.274799847713119e-06, 'epoch': 2.37}
79%|███████▉ | 9115/11526 [1:35:14<24:39, 1.63it/s] 79%|███████▉ | 9116/11526 [1:35:14<24:40, 1.63it/s] {'loss': 0.1367, 'grad_norm': 0.6972008943557739, 'learning_rate': 1.2737899422436523e-06, 'epoch': 2.37}
79%|███████▉ | 9116/11526 [1:35:15<24:40, 1.63it/s] 79%|███████▉ | 9117/11526 [1:35:15<24:39, 1.63it/s] {'loss': 0.1551, 'grad_norm': 0.5502164959907532, 'learning_rate': 1.2727803785633742e-06, 'epoch': 2.37}
79%|███████▉ | 9117/11526 [1:35:15<24:39, 1.63it/s] 79%|███████▉ | 9118/11526 [1:35:16<24:39, 1.63it/s] {'loss': 0.1208, 'grad_norm': 0.47268298268318176, 'learning_rate': 1.271771156764886e-06, 'epoch': 2.37}
79%|███████▉ | 9118/11526 [1:35:16<24:39, 1.63it/s] 79%|███████▉ | 9119/11526 [1:35:16<24:38, 1.63it/s] {'loss': 0.1858, 'grad_norm': 0.6809082627296448, 'learning_rate': 1.2707622769407585e-06, 'epoch': 2.37}
79%|███████▉ | 9119/11526 [1:35:16<24:38, 1.63it/s] 79%|███████▉ | 9120/11526 [1:35:17<24:38, 1.63it/s] {'loss': 0.181, 'grad_norm': 0.6918680667877197, 'learning_rate': 1.2697537391835356e-06, 'epoch': 2.37}
79%|███████▉ | 9120/11526 [1:35:17<24:38, 1.63it/s] 79%|███████▉ | 9121/11526 [1:35:18<24:36, 1.63it/s] {'loss': 0.1543, 'grad_norm': 0.6612601280212402, 'learning_rate': 1.2687455435857239e-06, 'epoch': 2.37}
79%|███████▉ | 9121/11526 [1:35:18<24:36, 1.63it/s] 79%|███████▉ | 9122/11526 [1:35:18<24:36, 1.63it/s] {'loss': 0.1203, 'grad_norm': 0.4938737452030182, 'learning_rate': 1.2677376902397998e-06, 'epoch': 2.37}
79%|███████▉ | 9122/11526 [1:35:18<24:36, 1.63it/s] 79%|███████▉ | 9123/11526 [1:35:19<24:35, 1.63it/s] {'loss': 0.143, 'grad_norm': 0.5988520383834839, 'learning_rate': 1.266730179238212e-06, 'epoch': 2.37}
79%|███████▉ | 9123/11526 [1:35:19<24:35, 1.63it/s] 79%|███████▉ | 9124/11526 [1:35:19<24:34, 1.63it/s] {'loss': 0.1339, 'grad_norm': 0.5425729751586914, 'learning_rate': 1.265723010673371e-06, 'epoch': 2.37}
79%|███████▉ | 9124/11526 [1:35:19<24:34, 1.63it/s] 79%|███████▉ | 9125/11526 [1:35:20<24:36, 1.63it/s] {'loss': 0.2453, 'grad_norm': 0.6685412526130676, 'learning_rate': 1.2647161846376654e-06, 'epoch': 2.38}
79%|███████▉ | 9125/11526 [1:35:20<24:36, 1.63it/s] 79%|███████▉ | 9126/11526 [1:35:21<24:35, 1.63it/s] {'loss': 0.1209, 'grad_norm': 0.4720262885093689, 'learning_rate': 1.2637097012234428e-06, 'epoch': 2.38}
79%|███████▉ | 9126/11526 [1:35:21<24:35, 1.63it/s] 79%|███████▉ | 9127/11526 [1:35:21<24:33, 1.63it/s] {'loss': 0.1184, 'grad_norm': 0.499628484249115, 'learning_rate': 1.2627035605230253e-06, 'epoch': 2.38}
79%|███████▉ | 9127/11526 [1:35:21<24:33, 1.63it/s] 79%|███████▉ | 9128/11526 [1:35:22<24:33, 1.63it/s] {'loss': 0.1309, 'grad_norm': 0.49489498138427734, 'learning_rate': 1.2616977626287014e-06, 'epoch': 2.38}
79%|███████▉ | 9128/11526 [1:35:22<24:33, 1.63it/s] 79%|███████▉ | 9129/11526 [1:35:22<24:32, 1.63it/s] {'loss': 0.1441, 'grad_norm': 0.5842120051383972, 'learning_rate': 1.260692307632728e-06, 'epoch': 2.38}
79%|███████▉ | 9129/11526 [1:35:23<24:32, 1.63it/s] 79%|███████▉ | 9130/11526 [1:35:23<24:38, 1.62it/s] {'loss': 0.1283, 'grad_norm': 0.5871896147727966, 'learning_rate': 1.2596871956273332e-06, 'epoch': 2.38}
79%|███████▉ | 9130/11526 [1:35:23<24:38, 1.62it/s] 79%|███████▉ | 9131/11526 [1:35:24<24:35, 1.62it/s] {'loss': 0.175, 'grad_norm': 0.651127278804779, 'learning_rate': 1.258682426704711e-06, 'epoch': 2.38}
79%|███████▉ | 9131/11526 [1:35:24<24:35, 1.62it/s] 79%|███████▉ | 9132/11526 [1:35:24<24:34, 1.62it/s] {'loss': 0.1896, 'grad_norm': 0.6429955363273621, 'learning_rate': 1.2576780009570228e-06, 'epoch': 2.38}
79%|███████▉ | 9132/11526 [1:35:24<24:34, 1.62it/s] 79%|███████▉ | 9133/11526 [1:35:25<24:32, 1.63it/s] {'loss': 0.1412, 'grad_norm': 0.5161025524139404, 'learning_rate': 1.256673918476401e-06, 'epoch': 2.38}
79%|███████▉ | 9133/11526 [1:35:25<24:32, 1.63it/s] 79%|███████▉ | 9134/11526 [1:35:26<24:30, 1.63it/s] {'loss': 0.1442, 'grad_norm': 0.6118475794792175, 'learning_rate': 1.2556701793549458e-06, 'epoch': 2.38}
79%|███████▉ | 9134/11526 [1:35:26<24:30, 1.63it/s] 79%|███████▉ | 9135/11526 [1:35:26<24:32, 1.62it/s] {'loss': 0.2181, 'grad_norm': 0.6692938208580017, 'learning_rate': 1.2546667836847265e-06, 'epoch': 2.38}
79%|███████▉ | 9135/11526 [1:35:26<24:32, 1.62it/s] 79%|███████▉ | 9136/11526 [1:35:27<24:30, 1.63it/s] {'loss': 0.1915, 'grad_norm': 0.5791694521903992, 'learning_rate': 1.2536637315577816e-06, 'epoch': 2.38}
79%|███████▉ | 9136/11526 [1:35:27<24:30, 1.63it/s] 79%|███████▉ | 9137/11526 [1:35:27<24:29, 1.63it/s] {'loss': 0.1524, 'grad_norm': 0.5469754338264465, 'learning_rate': 1.2526610230661112e-06, 'epoch': 2.38}
79%|███████▉ | 9137/11526 [1:35:27<24:29, 1.63it/s] 79%|███████▉ | 9138/11526 [1:35:28<24:28, 1.63it/s] {'loss': 0.1362, 'grad_norm': 0.5968782901763916, 'learning_rate': 1.2516586583016953e-06, 'epoch': 2.38}
79%|███████▉ | 9138/11526 [1:35:28<24:28, 1.63it/s] 79%|███████▉ | 9139/11526 [1:35:29<24:27, 1.63it/s] {'loss': 0.1833, 'grad_norm': 0.6290566325187683, 'learning_rate': 1.250656637356476e-06, 'epoch': 2.38}
79%|███████▉ | 9139/11526 [1:35:29<24:27, 1.63it/s] 79%|███████▉ | 9140/11526 [1:35:29<24:27, 1.63it/s] {'loss': 0.1253, 'grad_norm': 0.4783994257450104, 'learning_rate': 1.2496549603223612e-06, 'epoch': 2.38}
79%|███████▉ | 9140/11526 [1:35:29<24:27, 1.63it/s] 79%|███████▉ | 9141/11526 [1:35:30<24:25, 1.63it/s] {'loss': 0.1331, 'grad_norm': 0.5590322613716125, 'learning_rate': 1.248653627291232e-06, 'epoch': 2.38}
79%|███████▉ | 9141/11526 [1:35:30<24:25, 1.63it/s] 79%|███████▉ | 9142/11526 [1:35:30<24:24, 1.63it/s] {'loss': 0.137, 'grad_norm': 0.5417456030845642, 'learning_rate': 1.2476526383549365e-06, 'epoch': 2.38}
79%|███████▉ | 9142/11526 [1:35:31<24:24, 1.63it/s] 79%|███████▉ | 9143/11526 [1:35:31<24:23, 1.63it/s] {'loss': 0.1532, 'grad_norm': 0.5867708325386047, 'learning_rate': 1.2466519936052907e-06, 'epoch': 2.38}
79%|███████▉ | 9143/11526 [1:35:31<24:23, 1.63it/s] 79%|███████▉ | 9144/11526 [1:35:32<24:22, 1.63it/s] {'loss': 0.1343, 'grad_norm': 0.5484371185302734, 'learning_rate': 1.2456516931340813e-06, 'epoch': 2.38}
79%|███████▉ | 9144/11526 [1:35:32<24:22, 1.63it/s] 79%|███████▉ | 9145/11526 [1:35:32<24:24, 1.63it/s] {'loss': 0.2073, 'grad_norm': 0.704052209854126, 'learning_rate': 1.2446517370330585e-06, 'epoch': 2.38}
79%|███████▉ | 9145/11526 [1:35:32<24:24, 1.63it/s] 79%|███████▉ | 9146/11526 [1:35:33<24:23, 1.63it/s] {'loss': 0.1464, 'grad_norm': 0.590675950050354, 'learning_rate': 1.2436521253939437e-06, 'epoch': 2.38}
79%|███████▉ | 9146/11526 [1:35:33<24:23, 1.63it/s] 79%|███████▉ | 9147/11526 [1:35:33<24:23, 1.63it/s] {'loss': 0.1269, 'grad_norm': 0.47727465629577637, 'learning_rate': 1.2426528583084325e-06, 'epoch': 2.38}
79%|███████▉ | 9147/11526 [1:35:34<24:23, 1.63it/s] 79%|███████▉ | 9148/11526 [1:35:34<24:23, 1.63it/s] {'loss': 0.137, 'grad_norm': 0.52681565284729, 'learning_rate': 1.2416539358681772e-06, 'epoch': 2.38}
79%|███████▉ | 9148/11526 [1:35:34<24:23, 1.63it/s] 79%|███████▉ | 9149/11526 [1:35:35<24:22, 1.63it/s] {'loss': 0.1375, 'grad_norm': 0.5478277206420898, 'learning_rate': 1.2406553581648073e-06, 'epoch': 2.38}
79%|███████▉ | 9149/11526 [1:35:35<24:22, 1.63it/s] 79%|███████▉ | 9150/11526 [1:35:35<24:22, 1.62it/s] {'loss': 0.1307, 'grad_norm': 0.4844534695148468, 'learning_rate': 1.2396571252899182e-06, 'epoch': 2.38}
79%|███████▉ | 9150/11526 [1:35:35<24:22, 1.62it/s] 79%|███████▉ | 9151/11526 [1:35:36<24:20, 1.63it/s] {'loss': 0.1561, 'grad_norm': 0.6058218479156494, 'learning_rate': 1.2386592373350726e-06, 'epoch': 2.38}
79%|███████▉ | 9151/11526 [1:35:36<24:20, 1.63it/s] 79%|███████▉ | 9152/11526 [1:35:37<24:18, 1.63it/s] {'loss': 0.1262, 'grad_norm': 0.520460307598114, 'learning_rate': 1.237661694391804e-06, 'epoch': 2.38}
79%|███████▉ | 9152/11526 [1:35:37<24:18, 1.63it/s] 79%|███████▉ | 9153/11526 [1:35:37<24:17, 1.63it/s] {'loss': 0.1431, 'grad_norm': 0.6654089093208313, 'learning_rate': 1.2366644965516106e-06, 'epoch': 2.38}
79%|███████▉ | 9153/11526 [1:35:37<24:17, 1.63it/s] 79%|███████▉ | 9154/11526 [1:35:38<24:16, 1.63it/s] {'loss': 0.1175, 'grad_norm': 0.484759658575058, 'learning_rate': 1.2356676439059618e-06, 'epoch': 2.38}
79%|███████▉ | 9154/11526 [1:35:38<24:16, 1.63it/s] 79%|███████▉ | 9155/11526 [1:35:38<24:17, 1.63it/s] {'loss': 0.123, 'grad_norm': 0.48456820845603943, 'learning_rate': 1.2346711365462955e-06, 'epoch': 2.38}
79%|███████▉ | 9155/11526 [1:35:39<24:17, 1.63it/s] 79%|███████▉ | 9156/11526 [1:35:39<24:16, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.6698471307754517, 'learning_rate': 1.233674974564016e-06, 'epoch': 2.38}
79%|███████▉ | 9156/11526 [1:35:39<24:16, 1.63it/s] 79%|███████▉ | 9157/11526 [1:35:40<24:15, 1.63it/s] {'loss': 0.1768, 'grad_norm': 0.7445210814476013, 'learning_rate': 1.232679158050497e-06, 'epoch': 2.38}
79%|███████▉ | 9157/11526 [1:35:40<24:15, 1.63it/s] 79%|███████▉ | 9158/11526 [1:35:40<24:13, 1.63it/s] {'loss': 0.1649, 'grad_norm': 0.6395494937896729, 'learning_rate': 1.2316836870970827e-06, 'epoch': 2.38}
79%|███████▉ | 9158/11526 [1:35:40<24:13, 1.63it/s] 79%|███████▉ | 9159/11526 [1:35:41<24:13, 1.63it/s] {'loss': 0.1606, 'grad_norm': 0.6373420357704163, 'learning_rate': 1.230688561795078e-06, 'epoch': 2.38}
79%|███████▉ | 9159/11526 [1:35:41<24:13, 1.63it/s] 79%|███████▉ | 9160/11526 [1:35:41<24:22, 1.62it/s] {'loss': 0.1391, 'grad_norm': 0.5185761451721191, 'learning_rate': 1.2296937822357686e-06, 'epoch': 2.38}
79%|███████▉ | 9160/11526 [1:35:42<24:22, 1.62it/s] 79%|███████▉ | 9161/11526 [1:35:42<24:18, 1.62it/s] {'loss': 0.1679, 'grad_norm': 0.643128514289856, 'learning_rate': 1.2286993485103953e-06, 'epoch': 2.38}
79%|███████▉ | 9161/11526 [1:35:42<24:18, 1.62it/s] 79%|███████▉ | 9162/11526 [1:35:43<24:16, 1.62it/s] {'loss': 0.1582, 'grad_norm': 0.5308807492256165, 'learning_rate': 1.2277052607101763e-06, 'epoch': 2.38}
79%|███████▉ | 9162/11526 [1:35:43<24:16, 1.62it/s] 79%|███████▉ | 9163/11526 [1:35:43<24:14, 1.62it/s] {'loss': 0.1938, 'grad_norm': 0.6905574202537537, 'learning_rate': 1.226711518926294e-06, 'epoch': 2.38}
79%|███████▉ | 9163/11526 [1:35:43<24:14, 1.62it/s] 80%|███████▉ | 9164/11526 [1:35:44<24:12, 1.63it/s] {'loss': 0.1943, 'grad_norm': 0.7062472701072693, 'learning_rate': 1.2257181232499005e-06, 'epoch': 2.39}
80%|███████▉ | 9164/11526 [1:35:44<24:12, 1.63it/s] 80%|███████▉ | 9165/11526 [1:35:45<24:13, 1.62it/s] {'loss': 0.1367, 'grad_norm': 0.5832489132881165, 'learning_rate': 1.2247250737721151e-06, 'epoch': 2.39}
80%|███████▉ | 9165/11526 [1:35:45<24:13, 1.62it/s] 80%|███████▉ | 9166/11526 [1:35:45<24:12, 1.63it/s] {'loss': 0.1564, 'grad_norm': 0.6103530526161194, 'learning_rate': 1.223732370584028e-06, 'epoch': 2.39}
80%|███████▉ | 9166/11526 [1:35:45<24:12, 1.63it/s] 80%|███████▉ | 9167/11526 [1:35:46<24:10, 1.63it/s] {'loss': 0.1296, 'grad_norm': 0.5184571743011475, 'learning_rate': 1.2227400137766925e-06, 'epoch': 2.39}
80%|███████▉ | 9167/11526 [1:35:46<24:10, 1.63it/s] 80%|███████▉ | 9168/11526 [1:35:46<24:09, 1.63it/s] {'loss': 0.1429, 'grad_norm': 0.5728943943977356, 'learning_rate': 1.2217480034411333e-06, 'epoch': 2.39}
80%|███████▉ | 9168/11526 [1:35:47<24:09, 1.63it/s] 80%|███████▉ | 9169/11526 [1:35:47<24:09, 1.63it/s] {'loss': 0.1305, 'grad_norm': 0.5830782651901245, 'learning_rate': 1.2207563396683475e-06, 'epoch': 2.39}
80%|███████▉ | 9169/11526 [1:35:47<24:09, 1.63it/s] 80%|███████▉ | 9170/11526 [1:35:48<24:09, 1.63it/s] {'loss': 0.1363, 'grad_norm': 0.5541354417800903, 'learning_rate': 1.2197650225492919e-06, 'epoch': 2.39}
80%|███████▉ | 9170/11526 [1:35:48<24:09, 1.63it/s] 80%|███████▉ | 9171/11526 [1:35:48<24:07, 1.63it/s] {'loss': 0.1532, 'grad_norm': 0.5878540277481079, 'learning_rate': 1.218774052174897e-06, 'epoch': 2.39}
80%|███████▉ | 9171/11526 [1:35:48<24:07, 1.63it/s] 80%|███████▉ | 9172/11526 [1:35:49<24:06, 1.63it/s] {'loss': 0.1357, 'grad_norm': 0.498953253030777, 'learning_rate': 1.2177834286360612e-06, 'epoch': 2.39}
80%|███████▉ | 9172/11526 [1:35:49<24:06, 1.63it/s] 80%|███████▉ | 9173/11526 [1:35:49<24:05, 1.63it/s] {'loss': 0.1141, 'grad_norm': 0.5166007876396179, 'learning_rate': 1.216793152023647e-06, 'epoch': 2.39}
80%|███████▉ | 9173/11526 [1:35:50<24:05, 1.63it/s] 80%|███████▉ | 9174/11526 [1:35:50<24:05, 1.63it/s] {'loss': 0.1821, 'grad_norm': 0.6347969770431519, 'learning_rate': 1.2158032224284933e-06, 'epoch': 2.39}
80%|███████▉ | 9174/11526 [1:35:50<24:05, 1.63it/s] 80%|███████▉ | 9175/11526 [1:35:51<24:05, 1.63it/s] {'loss': 0.1602, 'grad_norm': 0.7039428353309631, 'learning_rate': 1.214813639941398e-06, 'epoch': 2.39}
80%|███████▉ | 9175/11526 [1:35:51<24:05, 1.63it/s] 80%|███████▉ | 9176/11526 [1:35:51<24:04, 1.63it/s] {'loss': 0.1286, 'grad_norm': 0.4910382926464081, 'learning_rate': 1.2138244046531316e-06, 'epoch': 2.39}
80%|███████▉ | 9176/11526 [1:35:51<24:04, 1.63it/s] 80%|███████▉ | 9177/11526 [1:35:52<24:03, 1.63it/s] {'loss': 0.1539, 'grad_norm': 0.6225078105926514, 'learning_rate': 1.2128355166544343e-06, 'epoch': 2.39}
80%|███████▉ | 9177/11526 [1:35:52<24:03, 1.63it/s] 80%|███████▉ | 9178/11526 [1:35:53<24:03, 1.63it/s] {'loss': 0.1396, 'grad_norm': 0.5909313559532166, 'learning_rate': 1.2118469760360107e-06, 'epoch': 2.39}
80%|███████▉ | 9178/11526 [1:35:53<24:03, 1.63it/s] 80%|███████▉ | 9179/11526 [1:35:53<24:01, 1.63it/s] {'loss': 0.1613, 'grad_norm': 0.5890839099884033, 'learning_rate': 1.2108587828885366e-06, 'epoch': 2.39}
80%|███████▉ | 9179/11526 [1:35:53<24:01, 1.63it/s] 80%|███████▉ | 9180/11526 [1:35:54<24:02, 1.63it/s] {'loss': 0.1499, 'grad_norm': 0.5601792335510254, 'learning_rate': 1.2098709373026552e-06, 'epoch': 2.39}
80%|███████▉ | 9180/11526 [1:35:54<24:02, 1.63it/s] 80%|███████▉ | 9181/11526 [1:35:54<24:01, 1.63it/s] {'loss': 0.1474, 'grad_norm': 0.6132181286811829, 'learning_rate': 1.2088834393689753e-06, 'epoch': 2.39}
80%|███████▉ | 9181/11526 [1:35:55<24:01, 1.63it/s] 80%|███████▉ | 9182/11526 [1:35:55<23:59, 1.63it/s] {'loss': 0.1819, 'grad_norm': 0.6276120543479919, 'learning_rate': 1.2078962891780771e-06, 'epoch': 2.39}
80%|███████▉ | 9182/11526 [1:35:55<23:59, 1.63it/s] 80%|███████▉ | 9183/11526 [1:35:56<23:58, 1.63it/s] {'loss': 0.149, 'grad_norm': 0.5419355034828186, 'learning_rate': 1.2069094868205072e-06, 'epoch': 2.39}
80%|███████▉ | 9183/11526 [1:35:56<23:58, 1.63it/s] 80%|███████▉ | 9184/11526 [1:35:56<23:58, 1.63it/s] {'loss': 0.125, 'grad_norm': 0.5414787530899048, 'learning_rate': 1.205923032386781e-06, 'epoch': 2.39}
80%|███████▉ | 9184/11526 [1:35:56<23:58, 1.63it/s] 80%|███████▉ | 9185/11526 [1:35:57<24:00, 1.63it/s] {'loss': 0.1255, 'grad_norm': 0.5600021481513977, 'learning_rate': 1.2049369259673833e-06, 'epoch': 2.39}
80%|███████▉ | 9185/11526 [1:35:57<24:00, 1.63it/s] 80%|███████▉ | 9186/11526 [1:35:57<23:59, 1.63it/s] {'loss': 0.1788, 'grad_norm': 0.6447593569755554, 'learning_rate': 1.2039511676527605e-06, 'epoch': 2.39}
80%|███████▉ | 9186/11526 [1:35:58<23:59, 1.63it/s] 80%|███████▉ | 9187/11526 [1:35:58<23:57, 1.63it/s] {'loss': 0.1509, 'grad_norm': 0.5869770646095276, 'learning_rate': 1.202965757533337e-06, 'epoch': 2.39}
80%|███████▉ | 9187/11526 [1:35:58<23:57, 1.63it/s] 80%|███████▉ | 9188/11526 [1:35:59<23:56, 1.63it/s] {'loss': 0.1579, 'grad_norm': 0.5652978420257568, 'learning_rate': 1.2019806956994996e-06, 'epoch': 2.39}
80%|███████▉ | 9188/11526 [1:35:59<23:56, 1.63it/s] 80%|███████▉ | 9189/11526 [1:35:59<23:55, 1.63it/s] {'loss': 0.1405, 'grad_norm': 0.5724868774414062, 'learning_rate': 1.2009959822416012e-06, 'epoch': 2.39}
80%|███████▉ | 9189/11526 [1:35:59<23:55, 1.63it/s] 80%|███████▉ | 9190/11526 [1:36:00<23:57, 1.63it/s] {'loss': 0.1883, 'grad_norm': 0.6453364491462708, 'learning_rate': 1.200011617249967e-06, 'epoch': 2.39}
80%|███████▉ | 9190/11526 [1:36:00<23:57, 1.63it/s] 80%|███████▉ | 9191/11526 [1:36:01<23:55, 1.63it/s] {'loss': 0.1506, 'grad_norm': 0.6133716106414795, 'learning_rate': 1.1990276008148883e-06, 'epoch': 2.39}
80%|███████▉ | 9191/11526 [1:36:01<23:55, 1.63it/s] 80%|███████▉ | 9192/11526 [1:36:01<23:55, 1.63it/s] {'loss': 0.1496, 'grad_norm': 0.5780921578407288, 'learning_rate': 1.1980439330266242e-06, 'epoch': 2.39}
80%|███████▉ | 9192/11526 [1:36:01<23:55, 1.63it/s] 80%|███████▉ | 9193/11526 [1:36:02<23:54, 1.63it/s] {'loss': 0.1313, 'grad_norm': 0.5312519073486328, 'learning_rate': 1.197060613975405e-06, 'epoch': 2.39}
80%|███████▉ | 9193/11526 [1:36:02<23:54, 1.63it/s] 80%|███████▉ | 9194/11526 [1:36:02<23:52, 1.63it/s] {'loss': 0.1356, 'grad_norm': 0.49267005920410156, 'learning_rate': 1.1960776437514222e-06, 'epoch': 2.39}
80%|███████▉ | 9194/11526 [1:36:03<23:52, 1.63it/s] 80%|███████▉ | 9195/11526 [1:36:03<23:53, 1.63it/s] {'loss': 0.1654, 'grad_norm': 0.6135675311088562, 'learning_rate': 1.1950950224448398e-06, 'epoch': 2.39}
80%|███████▉ | 9195/11526 [1:36:03<23:53, 1.63it/s] 80%|███████▉ | 9196/11526 [1:36:04<23:51, 1.63it/s] {'loss': 0.1023, 'grad_norm': 0.4059127867221832, 'learning_rate': 1.1941127501457939e-06, 'epoch': 2.39}
80%|███████▉ | 9196/11526 [1:36:04<23:51, 1.63it/s] 80%|███████▉ | 9197/11526 [1:36:04<23:50, 1.63it/s] {'loss': 0.1754, 'grad_norm': 0.6807244420051575, 'learning_rate': 1.1931308269443798e-06, 'epoch': 2.39}
80%|███████▉ | 9197/11526 [1:36:04<23:50, 1.63it/s] 80%|███████▉ | 9198/11526 [1:36:05<23:50, 1.63it/s] {'loss': 0.149, 'grad_norm': 0.5635837316513062, 'learning_rate': 1.1921492529306666e-06, 'epoch': 2.39}
80%|███████▉ | 9198/11526 [1:36:05<23:50, 1.63it/s] 80%|███████▉ | 9199/11526 [1:36:05<23:49, 1.63it/s] {'loss': 0.1637, 'grad_norm': 0.6160342693328857, 'learning_rate': 1.1911680281946897e-06, 'epoch': 2.39}
80%|███████▉ | 9199/11526 [1:36:06<23:49, 1.63it/s] 80%|███████▉ | 9200/11526 [1:36:06<23:56, 1.62it/s] {'loss': 0.1691, 'grad_norm': 0.6273287534713745, 'learning_rate': 1.1901871528264524e-06, 'epoch': 2.39}
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.35it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5426750183105469, 'eval_runtime': 1.9547, 'eval_samples_per_second': 102.319, 'eval_steps_per_second': 6.651, 'epoch': 2.39}
 80%|███████▉ | 9201/11526 [1:36:09<46:40, 1.20s/it] {'loss': 0.1745, 'grad_norm': 0.732605516910553, 'learning_rate': 1.189206626915928e-06, 'epoch': 2.39}
80%|███████▉ | 9201/11526 [1:36:09<46:40, 1.20s/it] 80%|███████▉ | 9202/11526 [1:36:09<39:48, 1.03s/it] {'loss': 0.1548, 'grad_norm': 0.6473396420478821, 'learning_rate': 1.1882264505530527e-06, 'epoch': 2.4}
80%|███████▉ | 9202/11526 [1:36:09<39:48, 1.03s/it] 80%|███████▉ | 9203/11526 [1:36:10<34:59, 1.11it/s] {'loss': 0.1545, 'grad_norm': 0.5929003357887268, 'learning_rate': 1.1872466238277357e-06, 'epoch': 2.4}
80%|███████▉ | 9203/11526 [1:36:10<34:59, 1.11it/s] 80%|███████▉ | 9204/11526 [1:36:11<31:36, 1.22it/s] {'loss': 0.1512, 'grad_norm': 0.5354161858558655, 'learning_rate': 1.186267146829852e-06, 'epoch': 2.4}
80%|███████▉ | 9204/11526 [1:36:11<31:36, 1.22it/s] 80%|███████▉ | 9205/11526 [1:36:11<29:14, 1.32it/s] {'loss': 0.1787, 'grad_norm': 0.6769199967384338, 'learning_rate': 1.1852880196492444e-06, 'epoch': 2.4}
80%|███████▉ | 9205/11526 [1:36:11<29:14, 1.32it/s] 80%|███████▉ | 9206/11526 [1:36:12<27:33, 1.40it/s] {'loss': 0.1313, 'grad_norm': 0.5716630816459656, 'learning_rate': 1.1843092423757246e-06, 'epoch': 2.4}
80%|███████▉ | 9206/11526 [1:36:12<27:33, 1.40it/s] 80%|███████▉ | 9207/11526 [1:36:12<26:25, 1.46it/s] {'loss': 0.155, 'grad_norm': 0.6086785793304443, 'learning_rate': 1.1833308150990735e-06, 'epoch': 2.4}
80%|███████▉ | 9207/11526 [1:36:12<26:25, 1.46it/s] 80%|███████▉ | 9208/11526 [1:36:13<25:36, 1.51it/s] {'loss': 0.1351, 'grad_norm': 0.5831606984138489, 'learning_rate': 1.1823527379090327e-06, 'epoch': 2.4}
80%|███████▉ | 9208/11526 [1:36:13<25:36, 1.51it/s] 80%|███████▉ | 9209/11526 [1:36:14<25:01, 1.54it/s] {'loss': 0.1456, 'grad_norm': 0.606380820274353, 'learning_rate': 1.1813750108953232e-06, 'epoch': 2.4}
80%|███████▉ | 9209/11526 [1:36:14<25:01, 1.54it/s] 80%|███████▉ | 9210/11526 [1:36:14<24:37, 1.57it/s] {'loss': 0.142, 'grad_norm': 0.5830070376396179, 'learning_rate': 1.180397634147623e-06, 'epoch': 2.4}
80%|███████▉ | 9210/11526 [1:36:14<24:37, 1.57it/s] 80%|███████▉ | 9211/11526 [1:36:15<24:20, 1.59it/s] {'loss': 0.1447, 'grad_norm': 0.5521222949028015, 'learning_rate': 1.1794206077555847e-06, 'epoch': 2.4}
80%|███████▉ | 9211/11526 [1:36:15<24:20, 1.59it/s] 80%|███████▉ | 9212/11526 [1:36:15<24:08, 1.60it/s] {'loss': 0.1494, 'grad_norm': 0.6234821081161499, 'learning_rate': 1.1784439318088282e-06, 'epoch': 2.4}
80%|███████▉ | 9212/11526 [1:36:16<24:08, 1.60it/s] 80%|███████▉ | 9213/11526 [1:36:16<23:59, 1.61it/s] {'loss': 0.1679, 'grad_norm': 0.7114114761352539, 'learning_rate': 1.1774676063969349e-06, 'epoch': 2.4}
80%|███████▉ | 9213/11526 [1:36:16<23:59, 1.61it/s] 80%|███████▉ | 9214/11526 [1:36:17<23:53, 1.61it/s] {'loss': 0.1673, 'grad_norm': 0.7149627804756165, 'learning_rate': 1.1764916316094632e-06, 'epoch': 2.4}
80%|███████▉ | 9214/11526 [1:36:17<23:53, 1.61it/s] 80%|███████▉ | 9215/11526 [1:36:17<23:48, 1.62it/s] {'loss': 0.1174, 'grad_norm': 0.49418649077415466, 'learning_rate': 1.1755160075359357e-06, 'epoch': 2.4}
80%|███████▉ | 9215/11526 [1:36:17<23:48, 1.62it/s] 80%|███████▉ | 9216/11526 [1:36:18<23:45, 1.62it/s] {'loss': 0.1479, 'grad_norm': 0.644390881061554, 'learning_rate': 1.1745407342658389e-06, 'epoch': 2.4}
80%|███████▉ | 9216/11526 [1:36:18<23:45, 1.62it/s] 80%|███████▉ | 9217/11526 [1:36:18<23:42, 1.62it/s] {'loss': 0.1693, 'grad_norm': 0.6177300810813904, 'learning_rate': 1.1735658118886316e-06, 'epoch': 2.4}
80%|███████▉ | 9217/11526 [1:36:19<23:42, 1.62it/s] 80%|███████▉ | 9218/11526 [1:36:19<23:40, 1.62it/s] {'loss': 0.1361, 'grad_norm': 0.5312216877937317, 'learning_rate': 1.1725912404937402e-06, 'epoch': 2.4}
80%|███████▉ | 9218/11526 [1:36:19<23:40, 1.62it/s] 80%|███████▉ | 9219/11526 [1:36:20<23:38, 1.63it/s] {'loss': 0.1252, 'grad_norm': 0.5706817507743835, 'learning_rate': 1.1716170201705567e-06, 'epoch': 2.4}
80%|███████▉ | 9219/11526 [1:36:20<23:38, 1.63it/s] 80%|███████▉ | 9220/11526 [1:36:20<23:39, 1.62it/s] {'loss': 0.112, 'grad_norm': 0.4646773636341095, 'learning_rate': 1.170643151008445e-06, 'epoch': 2.4}
80%|███████▉ | 9220/11526 [1:36:20<23:39, 1.62it/s] 80%|████████ | 9221/11526 [1:36:21<23:38, 1.62it/s] {'loss': 0.1659, 'grad_norm': 0.6138778924942017, 'learning_rate': 1.1696696330967295e-06, 'epoch': 2.4}
80%|████████ | 9221/11526 [1:36:21<23:38, 1.62it/s] 80%|████████ | 9222/11526 [1:36:22<23:37, 1.62it/s] {'loss': 0.144, 'grad_norm': 0.5955736637115479, 'learning_rate': 1.1686964665247075e-06, 'epoch': 2.4}
80%|████████ | 9222/11526 [1:36:22<23:37, 1.62it/s] 80%|████████ | 9223/11526 [1:36:22<23:36, 1.63it/s] {'loss': 0.1526, 'grad_norm': 0.6261533498764038, 'learning_rate': 1.1677236513816476e-06, 'epoch': 2.4}
80%|████████ | 9223/11526 [1:36:22<23:36, 1.63it/s] 80%|████████ | 9224/11526 [1:36:23<23:34, 1.63it/s] {'loss': 0.1866, 'grad_norm': 0.6642406582832336, 'learning_rate': 1.1667511877567778e-06, 'epoch': 2.4}
80%|████████ | 9224/11526 [1:36:23<23:34, 1.63it/s] 80%|████████ | 9225/11526 [1:36:23<23:36, 1.62it/s] {'loss': 0.1522, 'grad_norm': 0.6074286699295044, 'learning_rate': 1.1657790757392994e-06, 'epoch': 2.4}
80%|████████ | 9225/11526 [1:36:24<23:36, 1.62it/s] 80%|████████ | 9226/11526 [1:36:24<23:38, 1.62it/s] {'loss': 0.1519, 'grad_norm': 0.5587487816810608, 'learning_rate': 1.1648073154183799e-06, 'epoch': 2.4}
80%|████████ | 9226/11526 [1:36:24<23:38, 1.62it/s] 80%|████████ | 9227/11526 [1:36:25<23:36, 1.62it/s] {'loss': 0.1938, 'grad_norm': 0.6371143460273743, 'learning_rate': 1.163835906883155e-06, 'epoch': 2.4}
80%|████████ | 9227/11526 [1:36:25<23:36, 1.62it/s] 80%|████████ | 9228/11526 [1:36:25<23:34, 1.62it/s] {'loss': 0.184, 'grad_norm': 0.682195246219635, 'learning_rate': 1.1628648502227274e-06, 'epoch': 2.4}
80%|████████ | 9228/11526 [1:36:25<23:34, 1.62it/s] 80%|████████ | 9229/11526 [1:36:26<23:33, 1.62it/s] {'loss': 0.1505, 'grad_norm': 0.6217867732048035, 'learning_rate': 1.1618941455261695e-06, 'epoch': 2.4}
80%|████████ | 9229/11526 [1:36:26<23:33, 1.62it/s] 80%|████████ | 9230/11526 [1:36:27<23:39, 1.62it/s] {'loss': 0.1552, 'grad_norm': 0.6206703186035156, 'learning_rate': 1.1609237928825174e-06, 'epoch': 2.4}
80%|████████ | 9230/11526 [1:36:27<23:39, 1.62it/s] 80%|████████ | 9231/11526 [1:36:27<23:36, 1.62it/s] {'loss': 0.1327, 'grad_norm': 0.5654293298721313, 'learning_rate': 1.159953792380778e-06, 'epoch': 2.4}
80%|████████ | 9231/11526 [1:36:27<23:36, 1.62it/s] 80%|████████ | 9232/11526 [1:36:28<23:33, 1.62it/s] {'loss': 0.1722, 'grad_norm': 0.6117693185806274, 'learning_rate': 1.1589841441099264e-06, 'epoch': 2.4}
80%|████████ | 9232/11526 [1:36:28<23:33, 1.62it/s] 80%|████████ | 9233/11526 [1:36:28<23:31, 1.62it/s] {'loss': 0.1635, 'grad_norm': 0.6749207973480225, 'learning_rate': 1.1580148481589032e-06, 'epoch': 2.4}
80%|████████ | 9233/11526 [1:36:28<23:31, 1.62it/s] 80%|████████ | 9234/11526 [1:36:29<23:29, 1.63it/s] {'loss': 0.1788, 'grad_norm': 0.5779091119766235, 'learning_rate': 1.1570459046166194e-06, 'epoch': 2.4}
80%|████████ | 9234/11526 [1:36:29<23:29, 1.63it/s] 80%|████████ | 9235/11526 [1:36:30<23:31, 1.62it/s] {'loss': 0.1639, 'grad_norm': 0.5742413997650146, 'learning_rate': 1.1560773135719478e-06, 'epoch': 2.4}
80%|████████ | 9235/11526 [1:36:30<23:31, 1.62it/s] 80%|████████ | 9236/11526 [1:36:30<23:28, 1.63it/s] {'loss': 0.1481, 'grad_norm': 0.5620416402816772, 'learning_rate': 1.1551090751137378e-06, 'epoch': 2.4}
80%|████████ | 9236/11526 [1:36:30<23:28, 1.63it/s] 80%|████████ | 9237/11526 [1:36:31<23:26, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.5838567018508911, 'learning_rate': 1.1541411893308014e-06, 'epoch': 2.4}
80%|████████ | 9237/11526 [1:36:31<23:26, 1.63it/s] 80%|████████ | 9238/11526 [1:36:31<23:26, 1.63it/s] {'loss': 0.1618, 'grad_norm': 0.5831524133682251, 'learning_rate': 1.153173656311915e-06, 'epoch': 2.4}
80%|████████ | 9238/11526 [1:36:32<23:26, 1.63it/s] 80%|████████ | 9239/11526 [1:36:32<23:24, 1.63it/s] {'loss': 0.1509, 'grad_norm': 0.5084181427955627, 'learning_rate': 1.152206476145829e-06, 'epoch': 2.4}
80%|████████ | 9239/11526 [1:36:32<23:24, 1.63it/s] 80%|████████ | 9240/11526 [1:36:33<23:25, 1.63it/s] {'loss': 0.1438, 'grad_norm': 0.5579636693000793, 'learning_rate': 1.151239648921258e-06, 'epoch': 2.4}
80%|████████ | 9240/11526 [1:36:33<23:25, 1.63it/s] 80%|████████ | 9241/11526 [1:36:33<23:24, 1.63it/s] {'loss': 0.1439, 'grad_norm': 0.5791940689086914, 'learning_rate': 1.1502731747268846e-06, 'epoch': 2.41}
80%|████████ | 9241/11526 [1:36:33<23:24, 1.63it/s] 80%|████████ | 9242/11526 [1:36:34<23:25, 1.63it/s] {'loss': 0.2185, 'grad_norm': 0.6925150752067566, 'learning_rate': 1.1493070536513618e-06, 'epoch': 2.41}
80%|████████ | 9242/11526 [1:36:34<23:25, 1.63it/s] 80%|████████ | 9243/11526 [1:36:34<23:24, 1.63it/s] {'loss': 0.1556, 'grad_norm': 0.6032105684280396, 'learning_rate': 1.1483412857833037e-06, 'epoch': 2.41}
80%|████████ | 9243/11526 [1:36:35<23:24, 1.63it/s] 80%|████████ | 9244/11526 [1:36:35<23:23, 1.63it/s] {'loss': 0.1743, 'grad_norm': 0.5831597447395325, 'learning_rate': 1.1473758712112964e-06, 'epoch': 2.41}
80%|████████ | 9244/11526 [1:36:35<23:23, 1.63it/s] 80%|████████ | 9245/11526 [1:36:36<23:32, 1.62it/s] {'loss': 0.1803, 'grad_norm': 0.6473203897476196, 'learning_rate': 1.146410810023898e-06, 'epoch': 2.41}
80%|████████ | 9245/11526 [1:36:36<23:32, 1.62it/s] 80%|████████ | 9246/11526 [1:36:36<23:28, 1.62it/s] {'loss': 0.1855, 'grad_norm': 0.5936540365219116, 'learning_rate': 1.1454461023096242e-06, 'epoch': 2.41}
80%|████████ | 9246/11526 [1:36:36<23:28, 1.62it/s] 80%|████████ | 9247/11526 [1:36:37<23:25, 1.62it/s] {'loss': 0.1698, 'grad_norm': 0.6638535261154175, 'learning_rate': 1.1444817481569652e-06, 'epoch': 2.41}
80%|████████ | 9247/11526 [1:36:37<23:25, 1.62it/s] 80%|████████ | 9248/11526 [1:36:38<23:23, 1.62it/s] {'loss': 0.1366, 'grad_norm': 0.4966513514518738, 'learning_rate': 1.1435177476543779e-06, 'epoch': 2.41}
80%|████████ | 9248/11526 [1:36:38<23:23, 1.62it/s] 80%|████████ | 9249/11526 [1:36:38<23:21, 1.62it/s] {'loss': 0.1565, 'grad_norm': 0.6474933624267578, 'learning_rate': 1.1425541008902852e-06, 'epoch': 2.41}
80%|████████ | 9249/11526 [1:36:38<23:21, 1.62it/s] 80%|████████ | 9250/11526 [1:36:39<23:22, 1.62it/s] {'loss': 0.121, 'grad_norm': 0.5459997653961182, 'learning_rate': 1.1415908079530796e-06, 'epoch': 2.41}
80%|████████ | 9250/11526 [1:36:39<23:22, 1.62it/s] 80%|████████ | 9251/11526 [1:36:39<23:20, 1.62it/s] {'loss': 0.1587, 'grad_norm': 0.637424647808075, 'learning_rate': 1.1406278689311169e-06, 'epoch': 2.41}
80%|████████ | 9251/11526 [1:36:40<23:20, 1.62it/s] 80%|████████ | 9252/11526 [1:36:40<23:18, 1.63it/s] {'loss': 0.159, 'grad_norm': 0.6657803654670715, 'learning_rate': 1.1396652839127253e-06, 'epoch': 2.41}
80%|████████ | 9252/11526 [1:36:40<23:18, 1.63it/s] 80%|████████ | 9253/11526 [1:36:41<23:17, 1.63it/s] {'loss': 0.1329, 'grad_norm': 0.5138235092163086, 'learning_rate': 1.1387030529861982e-06, 'epoch': 2.41}
80%|████████ | 9253/11526 [1:36:41<23:17, 1.63it/s] 80%|████████ | 9254/11526 [1:36:41<23:16, 1.63it/s] {'loss': 0.1746, 'grad_norm': 0.6238129734992981, 'learning_rate': 1.1377411762397967e-06, 'epoch': 2.41}
80%|████████ | 9254/11526 [1:36:41<23:16, 1.63it/s] 80%|████████ | 9255/11526 [1:36:42<23:16, 1.63it/s] {'loss': 0.1882, 'grad_norm': 0.6355192065238953, 'learning_rate': 1.13677965376175e-06, 'epoch': 2.41}
80%|████████ | 9255/11526 [1:36:42<23:16, 1.63it/s] 80%|████████ | 9256/11526 [1:36:43<23:14, 1.63it/s] {'loss': 0.1525, 'grad_norm': 0.570368230342865, 'learning_rate': 1.1358184856402555e-06, 'epoch': 2.41}
80%|████████ | 9256/11526 [1:36:43<23:14, 1.63it/s] 80%|████████ | 9257/11526 [1:36:43<23:14, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.867875337600708, 'learning_rate': 1.1348576719634735e-06, 'epoch': 2.41}
80%|████████ | 9257/11526 [1:36:43<23:14, 1.63it/s] 80%|████████ | 9258/11526 [1:36:44<23:12, 1.63it/s] {'loss': 0.187, 'grad_norm': 0.6390416622161865, 'learning_rate': 1.1338972128195396e-06, 'epoch': 2.41}
80%|████████ | 9258/11526 [1:36:44<23:12, 1.63it/s] 80%|████████ | 9259/11526 [1:36:44<23:12, 1.63it/s] {'loss': 0.1726, 'grad_norm': 0.5365878343582153, 'learning_rate': 1.1329371082965502e-06, 'epoch': 2.41}
80%|████████ | 9259/11526 [1:36:44<23:12, 1.63it/s] 80%|████████ | 9260/11526 [1:36:45<23:15, 1.62it/s] {'loss': 0.1339, 'grad_norm': 0.5557528734207153, 'learning_rate': 1.1319773584825711e-06, 'epoch': 2.41}
80%|████████ | 9260/11526 [1:36:45<23:15, 1.62it/s] 80%|████████ | 9261/11526 [1:36:46<23:13, 1.63it/s] {'loss': 0.1707, 'grad_norm': 0.638895571231842, 'learning_rate': 1.1310179634656383e-06, 'epoch': 2.41}
80%|████████ | 9261/11526 [1:36:46<23:13, 1.63it/s] 80%|████████ | 9262/11526 [1:36:46<23:12, 1.63it/s] {'loss': 0.3048, 'grad_norm': 0.6884590983390808, 'learning_rate': 1.130058923333749e-06, 'epoch': 2.41}
80%|████████ | 9262/11526 [1:36:46<23:12, 1.63it/s] 80%|████████ | 9263/11526 [1:36:47<23:11, 1.63it/s] {'loss': 0.1935, 'grad_norm': 0.6154118180274963, 'learning_rate': 1.1291002381748756e-06, 'epoch': 2.41}
80%|████████ | 9263/11526 [1:36:47<23:11, 1.63it/s] 80%|████████ | 9264/11526 [1:36:47<23:10, 1.63it/s] {'loss': 0.1227, 'grad_norm': 0.5659749507904053, 'learning_rate': 1.128141908076954e-06, 'epoch': 2.41}
80%|████████ | 9264/11526 [1:36:48<23:10, 1.63it/s] 80%|████████ | 9265/11526 [1:36:48<23:10, 1.63it/s] {'loss': 0.1498, 'grad_norm': 0.5516424775123596, 'learning_rate': 1.1271839331278855e-06, 'epoch': 2.41}
80%|████████ | 9265/11526 [1:36:48<23:10, 1.63it/s] 80%|████████ | 9266/11526 [1:36:49<23:10, 1.63it/s] {'loss': 0.1337, 'grad_norm': 0.501522958278656, 'learning_rate': 1.1262263134155416e-06, 'epoch': 2.41}
80%|████████ | 9266/11526 [1:36:49<23:10, 1.63it/s] 80%|████████ | 9267/11526 [1:36:49<23:08, 1.63it/s] {'loss': 0.136, 'grad_norm': 0.543192982673645, 'learning_rate': 1.1252690490277618e-06, 'epoch': 2.41}
80%|████████ | 9267/11526 [1:36:49<23:08, 1.63it/s] 80%|████████ | 9268/11526 [1:36:50<23:07, 1.63it/s] {'loss': 0.116, 'grad_norm': 0.4830274283885956, 'learning_rate': 1.1243121400523504e-06, 'epoch': 2.41}
80%|████████ | 9268/11526 [1:36:50<23:07, 1.63it/s] 80%|████████ | 9269/11526 [1:36:50<23:06, 1.63it/s] {'loss': 0.1783, 'grad_norm': 0.6690633893013, 'learning_rate': 1.1233555865770829e-06, 'epoch': 2.41}
80%|████████ | 9269/11526 [1:36:51<23:06, 1.63it/s] 80%|████████ | 9270/11526 [1:36:51<23:06, 1.63it/s] {'loss': 0.1509, 'grad_norm': 0.6657900214195251, 'learning_rate': 1.1223993886896972e-06, 'epoch': 2.41}
80%|████████ | 9270/11526 [1:36:51<23:06, 1.63it/s] 80%|████████ | 9271/11526 [1:36:52<23:05, 1.63it/s] {'loss': 0.1364, 'grad_norm': 0.5153056383132935, 'learning_rate': 1.1214435464779006e-06, 'epoch': 2.41}
80%|████████ | 9271/11526 [1:36:52<23:05, 1.63it/s] 80%|████████ | 9272/11526 [1:36:52<23:04, 1.63it/s] {'loss': 0.1527, 'grad_norm': 0.6014441251754761, 'learning_rate': 1.1204880600293728e-06, 'epoch': 2.41}
80%|████████ | 9272/11526 [1:36:52<23:04, 1.63it/s] 80%|████████ | 9273/11526 [1:36:53<23:03, 1.63it/s] {'loss': 0.153, 'grad_norm': 0.5584700703620911, 'learning_rate': 1.119532929431752e-06, 'epoch': 2.41}
80%|████████ | 9273/11526 [1:36:53<23:03, 1.63it/s] 80%|████████ | 9274/11526 [1:36:54<23:02, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.6440594792366028, 'learning_rate': 1.1185781547726498e-06, 'epoch': 2.41}
80%|████████ | 9274/11526 [1:36:54<23:02, 1.63it/s] 80%|████████ | 9275/11526 [1:36:54<23:07, 1.62it/s] {'loss': 0.1576, 'grad_norm': 0.6521345973014832, 'learning_rate': 1.1176237361396442e-06, 'epoch': 2.41}
80%|████████ | 9275/11526 [1:36:54<23:07, 1.62it/s] 80%|████████ | 9276/11526 [1:36:55<23:06, 1.62it/s] {'loss': 0.1444, 'grad_norm': 0.6543902158737183, 'learning_rate': 1.1166696736202787e-06, 'epoch': 2.41}
80%|████████ | 9276/11526 [1:36:55<23:06, 1.62it/s] 80%|████████ | 9277/11526 [1:36:55<23:04, 1.62it/s] {'loss': 0.1301, 'grad_norm': 0.495528906583786, 'learning_rate': 1.1157159673020678e-06, 'epoch': 2.41}
80%|████████ | 9277/11526 [1:36:56<23:04, 1.62it/s] 80%|████████ | 9278/11526 [1:36:56<23:03, 1.63it/s] {'loss': 0.1386, 'grad_norm': 0.5442638397216797, 'learning_rate': 1.1147626172724875e-06, 'epoch': 2.41}
80%|████████ | 9278/11526 [1:36:56<23:03, 1.63it/s] 81%|████████ | 9279/11526 [1:36:57<23:00, 1.63it/s] {'loss': 0.1343, 'grad_norm': 0.4920389950275421, 'learning_rate': 1.113809623618986e-06, 'epoch': 2.42}
81%|████████ | 9279/11526 [1:36:57<23:00, 1.63it/s] 81%|████████ | 9280/11526 [1:36:57<23:07, 1.62it/s] {'loss': 0.143, 'grad_norm': 0.5750491619110107, 'learning_rate': 1.1128569864289773e-06, 'epoch': 2.42}
81%|████████ | 9280/11526 [1:36:57<23:07, 1.62it/s] 81%|████████ | 9281/11526 [1:36:58<23:04, 1.62it/s] {'loss': 0.1243, 'grad_norm': 0.49905315041542053, 'learning_rate': 1.1119047057898425e-06, 'epoch': 2.42}
81%|████████ | 9281/11526 [1:36:58<23:04, 1.62it/s] 81%|████████ | 9282/11526 [1:36:58<23:02, 1.62it/s] {'loss': 0.1499, 'grad_norm': 0.5997071266174316, 'learning_rate': 1.1109527817889309e-06, 'epoch': 2.42}
81%|████████ | 9282/11526 [1:36:59<23:02, 1.62it/s] 81%|████████ | 9283/11526 [1:36:59<23:00, 1.62it/s] {'loss': 0.1642, 'grad_norm': 0.8534607291221619, 'learning_rate': 1.1100012145135592e-06, 'epoch': 2.42}
81%|████████ | 9283/11526 [1:36:59<23:00, 1.62it/s] 81%|████████ | 9284/11526 [1:37:00<22:59, 1.63it/s] {'loss': 0.1683, 'grad_norm': 0.6106391549110413, 'learning_rate': 1.1090500040510066e-06, 'epoch': 2.42}
81%|████████ | 9284/11526 [1:37:00<22:59, 1.63it/s] 81%|████████ | 9285/11526 [1:37:00<23:00, 1.62it/s] {'loss': 0.165, 'grad_norm': 0.610698938369751, 'learning_rate': 1.1080991504885285e-06, 'epoch': 2.42}
81%|████████ | 9285/11526 [1:37:00<23:00, 1.62it/s] 81%|████████ | 9286/11526 [1:37:01<23:00, 1.62it/s] {'loss': 0.1332, 'grad_norm': 0.54386305809021, 'learning_rate': 1.107148653913339e-06, 'epoch': 2.42}
81%|████████ | 9286/11526 [1:37:01<23:00, 1.62it/s] 81%|████████ | 9287/11526 [1:37:02<22:58, 1.62it/s] {'loss': 0.125, 'grad_norm': 0.4694758653640747, 'learning_rate': 1.1061985144126247e-06, 'epoch': 2.42}
81%|████████ | 9287/11526 [1:37:02<22:58, 1.62it/s] 81%|████████ | 9288/11526 [1:37:02<22:57, 1.62it/s] {'loss': 0.1433, 'grad_norm': 0.5521489381790161, 'learning_rate': 1.1052487320735377e-06, 'epoch': 2.42}
81%|████████ | 9288/11526 [1:37:02<22:57, 1.62it/s] 81%|████████ | 9289/11526 [1:37:03<22:56, 1.62it/s] {'loss': 0.1671, 'grad_norm': 0.6931485533714294, 'learning_rate': 1.104299306983197e-06, 'epoch': 2.42}
81%|████████ | 9289/11526 [1:37:03<22:56, 1.62it/s] 81%|████████ | 9290/11526 [1:37:03<22:58, 1.62it/s] {'loss': 0.139, 'grad_norm': 0.5319863557815552, 'learning_rate': 1.1033502392286893e-06, 'epoch': 2.42}
81%|████████ | 9290/11526 [1:37:04<22:58, 1.62it/s] 81%|████████ | 9291/11526 [1:37:04<22:56, 1.62it/s] {'loss': 0.1367, 'grad_norm': 0.5627626776695251, 'learning_rate': 1.1024015288970702e-06, 'epoch': 2.42}
81%|████████ | 9291/11526 [1:37:04<22:56, 1.62it/s] 81%|████████ | 9292/11526 [1:37:05<22:54, 1.63it/s] {'loss': 0.2212, 'grad_norm': 0.8307976126670837, 'learning_rate': 1.1014531760753577e-06, 'epoch': 2.42}
81%|████████ | 9292/11526 [1:37:05<22:54, 1.63it/s] 81%|████████ | 9293/11526 [1:37:05<22:52, 1.63it/s] {'loss': 0.1663, 'grad_norm': 0.5664447546005249, 'learning_rate': 1.1005051808505418e-06, 'epoch': 2.42}
81%|████████ | 9293/11526 [1:37:05<22:52, 1.63it/s] 81%|████████ | 9294/11526 [1:37:06<22:51, 1.63it/s] {'loss': 0.164, 'grad_norm': 0.5903083086013794, 'learning_rate': 1.0995575433095783e-06, 'epoch': 2.42}
81%|████████ | 9294/11526 [1:37:06<22:51, 1.63it/s] 81%|████████ | 9295/11526 [1:37:06<22:52, 1.63it/s] {'loss': 0.1514, 'grad_norm': 0.5948007702827454, 'learning_rate': 1.0986102635393891e-06, 'epoch': 2.42}
81%|████████ | 9295/11526 [1:37:07<22:52, 1.63it/s] 81%|████████ | 9296/11526 [1:37:07<22:51, 1.63it/s] {'loss': 0.1515, 'grad_norm': 0.687231183052063, 'learning_rate': 1.0976633416268645e-06, 'epoch': 2.42}
81%|████████ | 9296/11526 [1:37:07<22:51, 1.63it/s] 81%|████████ | 9297/11526 [1:37:08<22:49, 1.63it/s] {'loss': 0.1631, 'grad_norm': 0.5416368246078491, 'learning_rate': 1.0967167776588622e-06, 'epoch': 2.42}
81%|████████ | 9297/11526 [1:37:08<22:49, 1.63it/s] 81%|████████ | 9298/11526 [1:37:08<22:48, 1.63it/s] {'loss': 0.1482, 'grad_norm': 0.5401707887649536, 'learning_rate': 1.095770571722205e-06, 'epoch': 2.42}
81%|████████ | 9298/11526 [1:37:08<22:48, 1.63it/s] 81%|████████ | 9299/11526 [1:37:09<22:47, 1.63it/s] {'loss': 0.1516, 'grad_norm': 0.5644132494926453, 'learning_rate': 1.0948247239036868e-06, 'epoch': 2.42}
81%|████████ | 9299/11526 [1:37:09<22:47, 1.63it/s] 81%|████████ | 9300/11526 [1:37:10<22:54, 1.62it/s] {'loss': 0.1107, 'grad_norm': 0.4827902019023895, 'learning_rate': 1.0938792342900633e-06, 'epoch': 2.42}
81%|████████ | 9300/11526 [1:37:10<22:54, 1.62it/s] 81%|████████ | 9301/11526 [1:37:10<22:51, 1.62it/s] {'loss': 0.1734, 'grad_norm': 0.5831023454666138, 'learning_rate': 1.0929341029680612e-06, 'epoch': 2.42}
81%|████████ | 9301/11526 [1:37:10<22:51, 1.62it/s] 81%|████████ | 9302/11526 [1:37:11<22:50, 1.62it/s] {'loss': 0.1348, 'grad_norm': 0.5471182465553284, 'learning_rate': 1.0919893300243733e-06, 'epoch': 2.42}
81%|████████ | 9302/11526 [1:37:11<22:50, 1.62it/s] 81%|████████ | 9303/11526 [1:37:11<22:48, 1.62it/s] {'loss': 0.1543, 'grad_norm': 0.6968185305595398, 'learning_rate': 1.0910449155456599e-06, 'epoch': 2.42}
81%|████████ | 9303/11526 [1:37:12<22:48, 1.62it/s] 81%|████████ | 9304/11526 [1:37:12<22:47, 1.63it/s] {'loss': 0.1415, 'grad_norm': 0.5366839170455933, 'learning_rate': 1.0901008596185481e-06, 'epoch': 2.42}
81%|████████ | 9304/11526 [1:37:12<22:47, 1.63it/s] 81%|████████ | 9305/11526 [1:37:13<22:47, 1.62it/s] {'loss': 0.1653, 'grad_norm': 0.72939532995224, 'learning_rate': 1.0891571623296327e-06, 'epoch': 2.42}
81%|████████ | 9305/11526 [1:37:13<22:47, 1.62it/s] 81%|████████ | 9306/11526 [1:37:13<22:45, 1.63it/s] {'loss': 0.1956, 'grad_norm': 0.7165901064872742, 'learning_rate': 1.0882138237654716e-06, 'epoch': 2.42}
81%|████████ | 9306/11526 [1:37:13<22:45, 1.63it/s] 81%|████████ | 9307/11526 [1:37:14<22:44, 1.63it/s] {'loss': 0.1146, 'grad_norm': 0.4837689995765686, 'learning_rate': 1.0872708440125984e-06, 'epoch': 2.42}
81%|████████ | 9307/11526 [1:37:14<22:44, 1.63it/s] 81%|████████ | 9308/11526 [1:37:14<22:43, 1.63it/s] {'loss': 0.1469, 'grad_norm': 0.5686240792274475, 'learning_rate': 1.086328223157505e-06, 'epoch': 2.42}
81%|████████ | 9308/11526 [1:37:15<22:43, 1.63it/s] 81%|████████ | 9309/11526 [1:37:15<22:42, 1.63it/s] {'loss': 0.1585, 'grad_norm': 0.5863259434700012, 'learning_rate': 1.0853859612866542e-06, 'epoch': 2.42}
81%|████████ | 9309/11526 [1:37:15<22:42, 1.63it/s] 81%|████████ | 9310/11526 [1:37:16<22:48, 1.62it/s] {'loss': 0.1564, 'grad_norm': 0.6141121983528137, 'learning_rate': 1.084444058486478e-06, 'epoch': 2.42}
81%|████████ | 9310/11526 [1:37:16<22:48, 1.62it/s] 81%|████████ | 9311/11526 [1:37:16<22:45, 1.62it/s] {'loss': 0.1508, 'grad_norm': 0.6277586221694946, 'learning_rate': 1.0835025148433686e-06, 'epoch': 2.42}
81%|████████ | 9311/11526 [1:37:16<22:45, 1.62it/s] 81%|████████ | 9312/11526 [1:37:17<22:44, 1.62it/s] {'loss': 0.1529, 'grad_norm': 0.5987240076065063, 'learning_rate': 1.0825613304436938e-06, 'epoch': 2.42}
81%|████████ | 9312/11526 [1:37:17<22:44, 1.62it/s] 81%|████████ | 9313/11526 [1:37:18<22:41, 1.63it/s] {'loss': 0.1493, 'grad_norm': 0.5617390871047974, 'learning_rate': 1.0816205053737843e-06, 'epoch': 2.42}
81%|████████ | 9313/11526 [1:37:18<22:41, 1.63it/s] 81%|████████ | 9314/11526 [1:37:18<22:40, 1.63it/s] {'loss': 0.15, 'grad_norm': 0.5355840921401978, 'learning_rate': 1.0806800397199356e-06, 'epoch': 2.42}
81%|████████ | 9314/11526 [1:37:18<22:40, 1.63it/s] 81%|████████ | 9315/11526 [1:37:19<22:41, 1.62it/s] {'loss': 0.1392, 'grad_norm': 0.6850712299346924, 'learning_rate': 1.0797399335684132e-06, 'epoch': 2.42}
81%|████████ | 9315/11526 [1:37:19<22:41, 1.62it/s] 81%|████████ | 9316/11526 [1:37:19<22:39, 1.63it/s] {'loss': 0.127, 'grad_norm': 0.49982166290283203, 'learning_rate': 1.0788001870054503e-06, 'epoch': 2.42}
81%|████████ | 9316/11526 [1:37:20<22:39, 1.63it/s] 81%|████████ | 9317/11526 [1:37:20<22:38, 1.63it/s] {'loss': 0.1422, 'grad_norm': 0.5689600110054016, 'learning_rate': 1.0778608001172442e-06, 'epoch': 2.43}
81%|████████ | 9317/11526 [1:37:20<22:38, 1.63it/s] 81%|████████ | 9318/11526 [1:37:21<22:37, 1.63it/s] {'loss': 0.1322, 'grad_norm': 0.5496605634689331, 'learning_rate': 1.0769217729899634e-06, 'epoch': 2.43}
81%|████████ | 9318/11526 [1:37:21<22:37, 1.63it/s] 81%|████████ | 9319/11526 [1:37:21<22:36, 1.63it/s] {'loss': 0.1197, 'grad_norm': 0.5726926922798157, 'learning_rate': 1.0759831057097376e-06, 'epoch': 2.43}
81%|████████ | 9319/11526 [1:37:21<22:36, 1.63it/s] 81%|████████ | 9320/11526 [1:37:22<22:38, 1.62it/s] {'loss': 0.1531, 'grad_norm': 0.6067259907722473, 'learning_rate': 1.075044798362666e-06, 'epoch': 2.43}
81%|████████ | 9320/11526 [1:37:22<22:38, 1.62it/s] 81%|████████ | 9321/11526 [1:37:22<22:36, 1.63it/s] {'loss': 0.1084, 'grad_norm': 0.4453759789466858, 'learning_rate': 1.0741068510348213e-06, 'epoch': 2.43}
81%|████████ | 9321/11526 [1:37:23<22:36, 1.63it/s] 81%|████████ | 9322/11526 [1:37:23<22:34, 1.63it/s] {'loss': 0.1643, 'grad_norm': 0.6253727078437805, 'learning_rate': 1.0731692638122315e-06, 'epoch': 2.43}
81%|████████ | 9322/11526 [1:37:23<22:34, 1.63it/s] 81%|████████ | 9323/11526 [1:37:24<22:34, 1.63it/s] {'loss': 0.1644, 'grad_norm': 0.6367841362953186, 'learning_rate': 1.0722320367808996e-06, 'epoch': 2.43}
81%|████████ | 9323/11526 [1:37:24<22:34, 1.63it/s] 81%|████████ | 9324/11526 [1:37:24<22:32, 1.63it/s] {'loss': 0.1274, 'grad_norm': 0.5114697813987732, 'learning_rate': 1.0712951700267936e-06, 'epoch': 2.43}
81%|████████ | 9324/11526 [1:37:24<22:32, 1.63it/s] 81%|████████ | 9325/11526 [1:37:25<22:33, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.5318324565887451, 'learning_rate': 1.0703586636358482e-06, 'epoch': 2.43}
81%|████████ | 9325/11526 [1:37:25<22:33, 1.63it/s] 81%|████████ | 9326/11526 [1:37:26<22:33, 1.63it/s] {'loss': 0.1465, 'grad_norm': 0.5710562467575073, 'learning_rate': 1.069422517693966e-06, 'epoch': 2.43}
81%|████████ | 9326/11526 [1:37:26<22:33, 1.63it/s] 81%|████████ | 9327/11526 [1:37:26<22:32, 1.63it/s] {'loss': 0.1333, 'grad_norm': 0.48722124099731445, 'learning_rate': 1.0684867322870135e-06, 'epoch': 2.43}
81%|████████ | 9327/11526 [1:37:26<22:32, 1.63it/s] 81%|████████ | 9328/11526 [1:37:27<22:31, 1.63it/s] {'loss': 0.1661, 'grad_norm': 0.6057515144348145, 'learning_rate': 1.0675513075008271e-06, 'epoch': 2.43}
81%|████████ | 9328/11526 [1:37:27<22:31, 1.63it/s] 81%|████████ | 9329/11526 [1:37:27<22:30, 1.63it/s] {'loss': 0.1568, 'grad_norm': 0.643145740032196, 'learning_rate': 1.0666162434212096e-06, 'epoch': 2.43}
81%|████████ | 9329/11526 [1:37:28<22:30, 1.63it/s] 81%|████████ | 9330/11526 [1:37:28<22:32, 1.62it/s] {'loss': 0.1364, 'grad_norm': 0.5977466106414795, 'learning_rate': 1.0656815401339299e-06, 'epoch': 2.43}
81%|████████ | 9330/11526 [1:37:28<22:32, 1.62it/s] 81%|████████ | 9331/11526 [1:37:29<22:31, 1.62it/s] {'loss': 0.1588, 'grad_norm': 0.6314675211906433, 'learning_rate': 1.0647471977247247e-06, 'epoch': 2.43}
81%|████████ | 9331/11526 [1:37:29<22:31, 1.62it/s] 81%|████████ | 9332/11526 [1:37:29<22:29, 1.63it/s] {'loss': 0.1449, 'grad_norm': 0.5942897200584412, 'learning_rate': 1.0638132162792991e-06, 'epoch': 2.43}
81%|████████ | 9332/11526 [1:37:29<22:29, 1.63it/s] 81%|████████ | 9333/11526 [1:37:30<22:28, 1.63it/s] {'loss': 0.1298, 'grad_norm': 0.5445210337638855, 'learning_rate': 1.0628795958833182e-06, 'epoch': 2.43}
81%|████████ | 9333/11526 [1:37:30<22:28, 1.63it/s] 81%|████████ | 9334/11526 [1:37:30<22:27, 1.63it/s] {'loss': 0.1885, 'grad_norm': 0.7973215579986572, 'learning_rate': 1.0619463366224247e-06, 'epoch': 2.43}
81%|████████ | 9334/11526 [1:37:31<22:27, 1.63it/s] 81%|████████ | 9335/11526 [1:37:31<22:28, 1.62it/s] {'loss': 0.1407, 'grad_norm': 0.5912553668022156, 'learning_rate': 1.0610134385822186e-06, 'epoch': 2.43}
81%|████████ | 9335/11526 [1:37:31<22:28, 1.62it/s] 81%|████████ | 9336/11526 [1:37:32<22:26, 1.63it/s] {'loss': 0.1111, 'grad_norm': 0.5090587139129639, 'learning_rate': 1.0600809018482716e-06, 'epoch': 2.43}
81%|████████ | 9336/11526 [1:37:32<22:26, 1.63it/s] 81%|████████ | 9337/11526 [1:37:32<22:25, 1.63it/s] {'loss': 0.1208, 'grad_norm': 0.5036975145339966, 'learning_rate': 1.0591487265061218e-06, 'epoch': 2.43}
81%|████████ | 9337/11526 [1:37:32<22:25, 1.63it/s] 81%|████████ | 9338/11526 [1:37:33<22:24, 1.63it/s] {'loss': 0.1439, 'grad_norm': 0.5509248375892639, 'learning_rate': 1.0582169126412734e-06, 'epoch': 2.43}
81%|████████ | 9338/11526 [1:37:33<22:24, 1.63it/s] 81%|████████ | 9339/11526 [1:37:34<22:23, 1.63it/s] {'loss': 0.1789, 'grad_norm': 0.6640250086784363, 'learning_rate': 1.0572854603391975e-06, 'epoch': 2.43}
81%|████████ | 9339/11526 [1:37:34<22:23, 1.63it/s] 81%|████████ | 9340/11526 [1:37:34<22:22, 1.63it/s] {'loss': 0.126, 'grad_norm': 0.5581595301628113, 'learning_rate': 1.0563543696853334e-06, 'epoch': 2.43}
81%|████████ | 9340/11526 [1:37:34<22:22, 1.63it/s] 81%|████████ | 9341/11526 [1:37:35<22:23, 1.63it/s] {'loss': 0.1418, 'grad_norm': 0.5814012885093689, 'learning_rate': 1.0554236407650837e-06, 'epoch': 2.43}
81%|████████ | 9341/11526 [1:37:35<22:23, 1.63it/s] 81%|████████ | 9342/11526 [1:37:35<22:22, 1.63it/s] {'loss': 0.1146, 'grad_norm': 0.44271135330200195, 'learning_rate': 1.0544932736638213e-06, 'epoch': 2.43}
81%|████████ | 9342/11526 [1:37:36<22:22, 1.63it/s] 81%|████████ | 9343/11526 [1:37:36<22:21, 1.63it/s] {'loss': 0.1394, 'grad_norm': 0.5230066180229187, 'learning_rate': 1.053563268466885e-06, 'epoch': 2.43}
81%|████████ | 9343/11526 [1:37:36<22:21, 1.63it/s] 81%|████████ | 9344/11526 [1:37:37<22:20, 1.63it/s] {'loss': 0.1558, 'grad_norm': 0.6380265355110168, 'learning_rate': 1.0526336252595798e-06, 'epoch': 2.43}
81%|████████ | 9344/11526 [1:37:37<22:20, 1.63it/s] 81%|████████ | 9345/11526 [1:37:37<22:22, 1.62it/s] {'loss': 0.1412, 'grad_norm': 0.5466626882553101, 'learning_rate': 1.0517043441271796e-06, 'epoch': 2.43}
81%|████████ | 9345/11526 [1:37:37<22:22, 1.62it/s] 81%|████████ | 9346/11526 [1:37:38<22:20, 1.63it/s] {'loss': 0.2053, 'grad_norm': 0.6328216791152954, 'learning_rate': 1.050775425154919e-06, 'epoch': 2.43}
81%|████████ | 9346/11526 [1:37:38<22:20, 1.63it/s] 81%|████████ | 9347/11526 [1:37:38<22:19, 1.63it/s] {'loss': 0.1501, 'grad_norm': 0.6850627660751343, 'learning_rate': 1.0498468684280084e-06, 'epoch': 2.43}
81%|████████ | 9347/11526 [1:37:39<22:19, 1.63it/s] 81%|████████ | 9348/11526 [1:37:39<22:18, 1.63it/s] {'loss': 0.1392, 'grad_norm': 0.5729288458824158, 'learning_rate': 1.0489186740316193e-06, 'epoch': 2.43}
81%|████████ | 9348/11526 [1:37:39<22:18, 1.63it/s] 81%|████████ | 9349/11526 [1:37:40<22:17, 1.63it/s] {'loss': 0.177, 'grad_norm': 0.6886467933654785, 'learning_rate': 1.0479908420508889e-06, 'epoch': 2.43}
81%|████████ | 9349/11526 [1:37:40<22:17, 1.63it/s] 81%|████████ | 9350/11526 [1:37:40<22:19, 1.62it/s] {'loss': 0.1253, 'grad_norm': 0.5595749616622925, 'learning_rate': 1.0470633725709246e-06, 'epoch': 2.43}
81%|████████ | 9350/11526 [1:37:40<22:19, 1.62it/s] 81%|████████ | 9351/11526 [1:37:41<22:17, 1.63it/s] {'loss': 0.1615, 'grad_norm': 0.6689026355743408, 'learning_rate': 1.0461362656767992e-06, 'epoch': 2.43}
81%|████████ | 9351/11526 [1:37:41<22:17, 1.63it/s] 81%|████████ | 9352/11526 [1:37:42<22:16, 1.63it/s] {'loss': 0.184, 'grad_norm': 0.9727226495742798, 'learning_rate': 1.0452095214535517e-06, 'epoch': 2.43}
81%|████████ | 9352/11526 [1:37:42<22:16, 1.63it/s] 81%|████████ | 9353/11526 [1:37:42<22:15, 1.63it/s] {'loss': 0.1385, 'grad_norm': 0.5874255299568176, 'learning_rate': 1.0442831399861903e-06, 'epoch': 2.43}
81%|████████ | 9353/11526 [1:37:42<22:15, 1.63it/s] 81%|████████ | 9354/11526 [1:37:43<22:14, 1.63it/s] {'loss': 0.1186, 'grad_norm': 0.4972309470176697, 'learning_rate': 1.043357121359685e-06, 'epoch': 2.43}
81%|████████ | 9354/11526 [1:37:43<22:14, 1.63it/s] 81%|████████ | 9355/11526 [1:37:43<22:18, 1.62it/s] {'loss': 0.1735, 'grad_norm': 0.6872150897979736, 'learning_rate': 1.0424314656589752e-06, 'epoch': 2.43}
81%|████████ | 9355/11526 [1:37:44<22:18, 1.62it/s] 81%|████████ | 9356/11526 [1:37:44<22:16, 1.62it/s] {'loss': 0.1526, 'grad_norm': 0.5877953171730042, 'learning_rate': 1.0415061729689723e-06, 'epoch': 2.44}
81%|████████ | 9356/11526 [1:37:44<22:16, 1.62it/s] 81%|████████ | 9357/11526 [1:37:45<22:14, 1.62it/s] {'loss': 0.2005, 'grad_norm': 0.6932991743087769, 'learning_rate': 1.0405812433745443e-06, 'epoch': 2.44}
81%|████████ | 9357/11526 [1:37:45<22:14, 1.62it/s] 81%|████████ | 9358/11526 [1:37:45<22:13, 1.63it/s] {'loss': 0.1284, 'grad_norm': 0.5051872730255127, 'learning_rate': 1.0396566769605332e-06, 'epoch': 2.44}
81%|████████ | 9358/11526 [1:37:45<22:13, 1.63it/s] 81%|████████ | 9359/11526 [1:37:46<22:12, 1.63it/s] {'loss': 0.1694, 'grad_norm': 0.6613579392433167, 'learning_rate': 1.038732473811746e-06, 'epoch': 2.44}
81%|████████ | 9359/11526 [1:37:46<22:12, 1.63it/s] 81%|████████ | 9360/11526 [1:37:46<22:13, 1.62it/s] {'loss': 0.1546, 'grad_norm': 0.6229975819587708, 'learning_rate': 1.0378086340129528e-06, 'epoch': 2.44}
81%|████████ | 9360/11526 [1:37:47<22:13, 1.62it/s] 81%|████████ | 9361/11526 [1:37:47<22:12, 1.62it/s] {'loss': 0.2296, 'grad_norm': 0.6285672187805176, 'learning_rate': 1.036885157648898e-06, 'epoch': 2.44}
81%|████████ | 9361/11526 [1:37:47<22:12, 1.62it/s] 81%|████████ | 9362/11526 [1:37:48<22:11, 1.63it/s] {'loss': 0.1418, 'grad_norm': 0.6156869530677795, 'learning_rate': 1.0359620448042845e-06, 'epoch': 2.44}
81%|████████ | 9362/11526 [1:37:48<22:11, 1.63it/s] 81%|████████ | 9363/11526 [1:37:48<22:09, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.6363781690597534, 'learning_rate': 1.0350392955637866e-06, 'epoch': 2.44}
81%|████████ | 9363/11526 [1:37:48<22:09, 1.63it/s] 81%|████████ | 9364/11526 [1:37:49<22:08, 1.63it/s] {'loss': 0.1559, 'grad_norm': 0.6645429730415344, 'learning_rate': 1.034116910012044e-06, 'epoch': 2.44}
81%|████████ | 9364/11526 [1:37:49<22:08, 1.63it/s] 81%|████████▏ | 9365/11526 [1:37:50<22:10, 1.62it/s] {'loss': 0.1963, 'grad_norm': 0.6750583648681641, 'learning_rate': 1.033194888233664e-06, 'epoch': 2.44}
81%|████████▏ | 9365/11526 [1:37:50<22:10, 1.62it/s] 81%|████████▏ | 9366/11526 [1:37:50<22:09, 1.62it/s] {'loss': 0.1684, 'grad_norm': 0.6040905714035034, 'learning_rate': 1.0322732303132183e-06, 'epoch': 2.44}
81%|████████▏ | 9366/11526 [1:37:50<22:09, 1.62it/s] 81%|████████▏ | 9367/11526 [1:37:51<22:07, 1.63it/s] {'loss': 0.1445, 'grad_norm': 0.6008939743041992, 'learning_rate': 1.0313519363352492e-06, 'epoch': 2.44}
81%|████████▏ | 9367/11526 [1:37:51<22:07, 1.63it/s] 81%|████████▏ | 9368/11526 [1:37:51<22:06, 1.63it/s] {'loss': 0.1303, 'grad_norm': 0.5677539706230164, 'learning_rate': 1.0304310063842598e-06, 'epoch': 2.44}
81%|████████▏ | 9368/11526 [1:37:52<22:06, 1.63it/s] 81%|████████▏ | 9369/11526 [1:37:52<22:05, 1.63it/s] {'loss': 0.1218, 'grad_norm': 0.47802531719207764, 'learning_rate': 1.0295104405447244e-06, 'epoch': 2.44}
81%|████████▏ | 9369/11526 [1:37:52<22:05, 1.63it/s] 81%|████████▏ | 9370/11526 [1:37:53<22:08, 1.62it/s] {'loss': 0.1453, 'grad_norm': 0.6605640053749084, 'learning_rate': 1.028590238901082e-06, 'epoch': 2.44}
81%|████████▏ | 9370/11526 [1:37:53<22:08, 1.62it/s] 81%|████████▏ | 9371/11526 [1:37:53<22:06, 1.62it/s] {'loss': 0.1147, 'grad_norm': 0.4707765579223633, 'learning_rate': 1.0276704015377397e-06, 'epoch': 2.44}
81%|████████▏ | 9371/11526 [1:37:53<22:06, 1.62it/s] 81%|████████▏ | 9372/11526 [1:37:54<22:05, 1.63it/s] {'loss': 0.1455, 'grad_norm': 0.6320602297782898, 'learning_rate': 1.0267509285390697e-06, 'epoch': 2.44}
81%|████████▏ | 9372/11526 [1:37:54<22:05, 1.63it/s] 81%|████████▏ | 9373/11526 [1:37:54<22:04, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.7119061350822449, 'learning_rate': 1.0258318199894119e-06, 'epoch': 2.44}
81%|████████▏ | 9373/11526 [1:37:55<22:04, 1.63it/s] 81%|████████▏ | 9374/11526 [1:37:55<22:02, 1.63it/s] {'loss': 0.1685, 'grad_norm': 0.6077119708061218, 'learning_rate': 1.0249130759730708e-06, 'epoch': 2.44}
81%|████████▏ | 9374/11526 [1:37:55<22:02, 1.63it/s] 81%|████████▏ | 9375/11526 [1:37:56<22:04, 1.62it/s] {'loss': 0.1629, 'grad_norm': 0.5790470242500305, 'learning_rate': 1.0239946965743219e-06, 'epoch': 2.44}
81%|████████▏ | 9375/11526 [1:37:56<22:04, 1.62it/s] 81%|████████▏ | 9376/11526 [1:37:56<22:03, 1.62it/s] {'loss': 0.152, 'grad_norm': 0.6030625104904175, 'learning_rate': 1.0230766818774001e-06, 'epoch': 2.44}
81%|████████▏ | 9376/11526 [1:37:56<22:03, 1.62it/s] 81%|████████▏ | 9377/11526 [1:37:57<22:01, 1.63it/s] {'loss': 0.1443, 'grad_norm': 0.5688457489013672, 'learning_rate': 1.0221590319665125e-06, 'epoch': 2.44}
81%|████████▏ | 9377/11526 [1:37:57<22:01, 1.63it/s] 81%|████████▏ | 9378/11526 [1:37:58<22:01, 1.63it/s] {'loss': 0.1879, 'grad_norm': 0.7245306372642517, 'learning_rate': 1.0212417469258323e-06, 'epoch': 2.44}
81%|████████▏ | 9378/11526 [1:37:58<22:01, 1.63it/s] 81%|████████▏ | 9379/11526 [1:37:58<22:00, 1.63it/s] {'loss': 0.1706, 'grad_norm': 0.6238027811050415, 'learning_rate': 1.0203248268394965e-06, 'epoch': 2.44}
81%|████████▏ | 9379/11526 [1:37:58<22:00, 1.63it/s] 81%|████████▏ | 9380/11526 [1:37:59<22:00, 1.63it/s] {'loss': 0.1369, 'grad_norm': 0.5202620625495911, 'learning_rate': 1.0194082717916116e-06, 'epoch': 2.44}
81%|████████▏ | 9380/11526 [1:37:59<22:00, 1.63it/s] 81%|████████▏ | 9381/11526 [1:37:59<21:59, 1.63it/s] {'loss': 0.1638, 'grad_norm': 0.5930326581001282, 'learning_rate': 1.0184920818662496e-06, 'epoch': 2.44}
81%|████████▏ | 9381/11526 [1:38:00<21:59, 1.63it/s] 81%|████████▏ | 9382/11526 [1:38:00<21:58, 1.63it/s] {'loss': 0.1336, 'grad_norm': 0.5188186168670654, 'learning_rate': 1.017576257147445e-06, 'epoch': 2.44}
81%|████████▏ | 9382/11526 [1:38:00<21:58, 1.63it/s] 81%|████████▏ | 9383/11526 [1:38:01<21:57, 1.63it/s] {'loss': 0.1455, 'grad_norm': 0.5964707732200623, 'learning_rate': 1.0166607977192084e-06, 'epoch': 2.44}
81%|████████▏ | 9383/11526 [1:38:01<21:57, 1.63it/s] 81%|████████▏ | 9384/11526 [1:38:01<21:56, 1.63it/s] {'loss': 0.1638, 'grad_norm': 0.6256024241447449, 'learning_rate': 1.0157457036655055e-06, 'epoch': 2.44}
81%|████████▏ | 9384/11526 [1:38:01<21:56, 1.63it/s] 81%|████████▏ | 9385/11526 [1:38:02<21:56, 1.63it/s] {'loss': 0.1591, 'grad_norm': 0.7257683873176575, 'learning_rate': 1.0148309750702767e-06, 'epoch': 2.44}
81%|████████▏ | 9385/11526 [1:38:02<21:56, 1.63it/s] 81%|████████▏ | 9386/11526 [1:38:02<21:55, 1.63it/s] {'loss': 0.1553, 'grad_norm': 0.5994715094566345, 'learning_rate': 1.0139166120174254e-06, 'epoch': 2.44}
81%|████████▏ | 9386/11526 [1:38:03<21:55, 1.63it/s] 81%|████████▏ | 9387/11526 [1:38:03<21:53, 1.63it/s] {'loss': 0.1492, 'grad_norm': 0.6360860466957092, 'learning_rate': 1.0130026145908222e-06, 'epoch': 2.44}
81%|████████▏ | 9387/11526 [1:38:03<21:53, 1.63it/s] 81%|████████▏ | 9388/11526 [1:38:04<21:53, 1.63it/s] {'loss': 0.1221, 'grad_norm': 0.5282250046730042, 'learning_rate': 1.0120889828743047e-06, 'epoch': 2.44}
81%|████████▏ | 9388/11526 [1:38:04<21:53, 1.63it/s] 81%|████████▏ | 9389/11526 [1:38:04<21:52, 1.63it/s] {'loss': 0.1882, 'grad_norm': 0.6360235810279846, 'learning_rate': 1.0111757169516766e-06, 'epoch': 2.44}
81%|████████▏ | 9389/11526 [1:38:04<21:52, 1.63it/s] 81%|████████▏ | 9390/11526 [1:38:05<21:53, 1.63it/s] {'loss': 0.1633, 'grad_norm': 0.6494753956794739, 'learning_rate': 1.0102628169067063e-06, 'epoch': 2.44}
81%|████████▏ | 9390/11526 [1:38:05<21:53, 1.63it/s] 81%|████████▏ | 9391/11526 [1:38:06<21:52, 1.63it/s] {'loss': 0.1394, 'grad_norm': 0.5879151821136475, 'learning_rate': 1.009350282823131e-06, 'epoch': 2.44}
81%|████████▏ | 9391/11526 [1:38:06<21:52, 1.63it/s] 81%|████████▏ | 9392/11526 [1:38:06<21:51, 1.63it/s] {'loss': 0.1642, 'grad_norm': 0.614220917224884, 'learning_rate': 1.008438114784654e-06, 'epoch': 2.44}
81%|████████▏ | 9392/11526 [1:38:06<21:51, 1.63it/s] 81%|████████▏ | 9393/11526 [1:38:07<21:50, 1.63it/s] {'loss': 0.1776, 'grad_norm': 0.6335324048995972, 'learning_rate': 1.0075263128749436e-06, 'epoch': 2.44}
81%|████████▏ | 9393/11526 [1:38:07<21:50, 1.63it/s] 82%|████████▏ | 9394/11526 [1:38:07<21:49, 1.63it/s] {'loss': 0.1243, 'grad_norm': 0.487432599067688, 'learning_rate': 1.006614877177638e-06, 'epoch': 2.45}
82%|████████▏ | 9394/11526 [1:38:08<21:49, 1.63it/s] 82%|████████▏ | 9395/11526 [1:38:08<21:52, 1.62it/s] {'loss': 0.1096, 'grad_norm': 0.49976611137390137, 'learning_rate': 1.0057038077763338e-06, 'epoch': 2.45}
82%|████████▏ | 9395/11526 [1:38:08<21:52, 1.62it/s] 82%|████████▏ | 9396/11526 [1:38:09<21:51, 1.62it/s] {'loss': 0.1253, 'grad_norm': 0.5224179029464722, 'learning_rate': 1.004793104754605e-06, 'epoch': 2.45}
82%|████████▏ | 9396/11526 [1:38:09<21:51, 1.62it/s] 82%|████████▏ | 9397/11526 [1:38:09<21:49, 1.63it/s] {'loss': 0.1234, 'grad_norm': 0.4748661518096924, 'learning_rate': 1.003882768195985e-06, 'epoch': 2.45}
82%|████████▏ | 9397/11526 [1:38:09<21:49, 1.63it/s] 82%|████████▏ | 9398/11526 [1:38:10<21:48, 1.63it/s] {'loss': 0.1663, 'grad_norm': 0.6009488105773926, 'learning_rate': 1.0029727981839738e-06, 'epoch': 2.45}
82%|████████▏ | 9398/11526 [1:38:10<21:48, 1.63it/s] 82%|████████▏ | 9399/11526 [1:38:10<21:47, 1.63it/s] {'loss': 0.1445, 'grad_norm': 0.6044520735740662, 'learning_rate': 1.0020631948020398e-06, 'epoch': 2.45}
82%|████████▏ | 9399/11526 [1:38:11<21:47, 1.63it/s] 82%|████████▏ | 9400/11526 [1:38:11<21:48, 1.62it/s] {'loss': 0.1562, 'grad_norm': 0.6026831269264221, 'learning_rate': 1.0011539581336166e-06, 'epoch': 2.45}
82%|████████▏ | 9400/11526 [1:38:11<21:48, 1.62it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.35it/s]
31%|███ | 4/13 [00:00<00:01, 8.40it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.80it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.18it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5431468486785889, 'eval_runtime': 1.9543, 'eval_samples_per_second': 102.341, 'eval_steps_per_second': 6.652, 'epoch': 2.45}
82%|████████▏ | 9400/11526 [1:38:13<21:48, 1.62it/s]
82%|████████▏ | 9401/11526 [1:38:14<42:36, 1.20s/it] {'loss': 0.1445, 'grad_norm': 0.5998348593711853, 'learning_rate': 1.0002450882621044e-06, 'epoch': 2.45}
82%|████████▏ | 9401/11526 [1:38:14<42:36, 1.20s/it] 82%|████████▏ | 9402/11526 [1:38:14<36:20, 1.03s/it] {'loss': 0.1302, 'grad_norm': 0.5128745436668396, 'learning_rate': 9.993365852708726e-07, 'epoch': 2.45}
82%|████████▏ | 9402/11526 [1:38:14<36:20, 1.03s/it] 82%|████████▏ | 9403/11526 [1:38:15<31:56, 1.11it/s] {'loss': 0.1377, 'grad_norm': 0.5585492253303528, 'learning_rate': 9.9842844924325e-07, 'epoch': 2.45}
82%|████████▏ | 9403/11526 [1:38:15<31:56, 1.11it/s] 82%|████████▏ | 9404/11526 [1:38:15<28:51, 1.23it/s] {'loss': 0.1406, 'grad_norm': 0.5964882969856262, 'learning_rate': 9.97520680262537e-07, 'epoch': 2.45}
82%|████████▏ | 9404/11526 [1:38:16<28:51, 1.23it/s] 82%|████████▏ | 9405/11526 [1:38:16<26:42, 1.32it/s] {'loss': 0.1863, 'grad_norm': 0.6693088412284851, 'learning_rate': 9.96613278412003e-07, 'epoch': 2.45}
82%|████████▏ | 9405/11526 [1:38:16<26:42, 1.32it/s] 82%|████████▏ | 9406/11526 [1:38:17<25:10, 1.40it/s] {'loss': 0.1542, 'grad_norm': 0.8591525554656982, 'learning_rate': 9.95706243774876e-07, 'epoch': 2.45}
82%|████████▏ | 9406/11526 [1:38:17<25:10, 1.40it/s] 82%|████████▏ | 9407/11526 [1:38:17<24:07, 1.46it/s] {'loss': 0.1756, 'grad_norm': 0.6949842572212219, 'learning_rate': 9.947995764343559e-07, 'epoch': 2.45}
82%|████████▏ | 9407/11526 [1:38:17<24:07, 1.46it/s] 82%|████████▏ | 9408/11526 [1:38:18<23:23, 1.51it/s] {'loss': 0.1229, 'grad_norm': 0.5065658688545227, 'learning_rate': 9.938932764736086e-07, 'epoch': 2.45}
82%|████████▏ | 9408/11526 [1:38:18<23:23, 1.51it/s] 82%|████████▏ | 9409/11526 [1:38:19<22:52, 1.54it/s] {'loss': 0.1726, 'grad_norm': 0.5488141775131226, 'learning_rate': 9.929873439757614e-07, 'epoch': 2.45}
82%|████████▏ | 9409/11526 [1:38:19<22:52, 1.54it/s] 82%|████████▏ | 9410/11526 [1:38:19<22:29, 1.57it/s] {'loss': 0.1822, 'grad_norm': 0.6400343775749207, 'learning_rate': 9.920817790239163e-07, 'epoch': 2.45}
82%|████████▏ | 9410/11526 [1:38:19<22:29, 1.57it/s] 82%|████████▏ | 9411/11526 [1:38:20<22:14, 1.58it/s] {'loss': 0.132, 'grad_norm': 0.5934661626815796, 'learning_rate': 9.911765817011332e-07, 'epoch': 2.45}
82%|████████▏ | 9411/11526 [1:38:20<22:14, 1.58it/s] 82%|████████▏ | 9412/11526 [1:38:20<22:03, 1.60it/s] {'loss': 0.1699, 'grad_norm': 0.6811599731445312, 'learning_rate': 9.902717520904437e-07, 'epoch': 2.45}
82%|████████▏ | 9412/11526 [1:38:21<22:03, 1.60it/s] 82%|████████▏ | 9413/11526 [1:38:21<21:55, 1.61it/s] {'loss': 0.1323, 'grad_norm': 0.5221576690673828, 'learning_rate': 9.89367290274843e-07, 'epoch': 2.45}
82%|████████▏ | 9413/11526 [1:38:21<21:55, 1.61it/s] 82%|████████▏ | 9414/11526 [1:38:22<21:50, 1.61it/s] {'loss': 0.1282, 'grad_norm': 0.5507398843765259, 'learning_rate': 9.884631963372943e-07, 'epoch': 2.45}
82%|████████▏ | 9414/11526 [1:38:22<21:50, 1.61it/s] 82%|████████▏ | 9415/11526 [1:38:22<21:46, 1.62it/s] {'loss': 0.1786, 'grad_norm': 0.7108251452445984, 'learning_rate': 9.875594703607255e-07, 'epoch': 2.45}
82%|████████▏ | 9415/11526 [1:38:22<21:46, 1.62it/s] 82%|████████▏ | 9416/11526 [1:38:23<21:42, 1.62it/s] {'loss': 0.1367, 'grad_norm': 0.5336089134216309, 'learning_rate': 9.866561124280338e-07, 'epoch': 2.45}
82%|████████▏ | 9416/11526 [1:38:23<21:42, 1.62it/s] 82%|████████▏ | 9417/11526 [1:38:23<21:40, 1.62it/s] {'loss': 0.2044, 'grad_norm': 0.6797589659690857, 'learning_rate': 9.857531226220768e-07, 'epoch': 2.45}
82%|████████▏ | 9417/11526 [1:38:24<21:40, 1.62it/s] 82%|████████▏ | 9418/11526 [1:38:24<21:38, 1.62it/s] {'loss': 0.1575, 'grad_norm': 0.6632845997810364, 'learning_rate': 9.848505010256837e-07, 'epoch': 2.45}
82%|████████▏ | 9418/11526 [1:38:24<21:38, 1.62it/s] 82%|████████▏ | 9419/11526 [1:38:25<21:37, 1.62it/s] {'loss': 0.1501, 'grad_norm': 0.5230267643928528, 'learning_rate': 9.839482477216477e-07, 'epoch': 2.45}
82%|████████▏ | 9419/11526 [1:38:25<21:37, 1.62it/s] 82%|████████▏ | 9420/11526 [1:38:25<21:36, 1.62it/s] {'loss': 0.1402, 'grad_norm': 0.6111360192298889, 'learning_rate': 9.83046362792729e-07, 'epoch': 2.45}
82%|████████▏ | 9420/11526 [1:38:25<21:36, 1.62it/s] 82%|████████▏ | 9421/11526 [1:38:26<21:34, 1.63it/s] {'loss': 0.1597, 'grad_norm': 0.6563177704811096, 'learning_rate': 9.821448463216548e-07, 'epoch': 2.45}
82%|████████▏ | 9421/11526 [1:38:26<21:34, 1.63it/s] 82%|████████▏ | 9422/11526 [1:38:27<21:33, 1.63it/s] {'loss': 0.1317, 'grad_norm': 0.579817533493042, 'learning_rate': 9.812436983911134e-07, 'epoch': 2.45}
82%|████████▏ | 9422/11526 [1:38:27<21:33, 1.63it/s] 82%|████████▏ | 9423/11526 [1:38:27<21:32, 1.63it/s] {'loss': 0.1606, 'grad_norm': 0.6632596254348755, 'learning_rate': 9.803429190837666e-07, 'epoch': 2.45}
82%|████████▏ | 9423/11526 [1:38:27<21:32, 1.63it/s] 82%|████████▏ | 9424/11526 [1:38:28<21:31, 1.63it/s] {'loss': 0.1315, 'grad_norm': 0.48832350969314575, 'learning_rate': 9.794425084822401e-07, 'epoch': 2.45}
82%|████████▏ | 9424/11526 [1:38:28<21:31, 1.63it/s] 82%|████████▏ | 9425/11526 [1:38:28<21:32, 1.63it/s] {'loss': 0.1535, 'grad_norm': 0.6484544277191162, 'learning_rate': 9.785424666691206e-07, 'epoch': 2.45}
82%|████████▏ | 9425/11526 [1:38:29<21:32, 1.63it/s] 82%|████████▏ | 9426/11526 [1:38:29<21:30, 1.63it/s] {'loss': 0.1107, 'grad_norm': 0.484916627407074, 'learning_rate': 9.776427937269678e-07, 'epoch': 2.45}
82%|████████▏ | 9426/11526 [1:38:29<21:30, 1.63it/s] 82%|████████▏ | 9427/11526 [1:38:30<21:30, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.6285005807876587, 'learning_rate': 9.767434897383044e-07, 'epoch': 2.45}
82%|████████▏ | 9427/11526 [1:38:30<21:30, 1.63it/s] 82%|████████▏ | 9428/11526 [1:38:30<21:29, 1.63it/s] {'loss': 0.1826, 'grad_norm': 0.6837602853775024, 'learning_rate': 9.758445547856187e-07, 'epoch': 2.45}
82%|████████▏ | 9428/11526 [1:38:30<21:29, 1.63it/s] 82%|████████▏ | 9429/11526 [1:38:31<21:29, 1.63it/s] {'loss': 0.1801, 'grad_norm': 0.6850413084030151, 'learning_rate': 9.749459889513691e-07, 'epoch': 2.45}
82%|████████▏ | 9429/11526 [1:38:31<21:29, 1.63it/s] 82%|████████▏ | 9430/11526 [1:38:31<21:30, 1.62it/s] {'loss': 0.1481, 'grad_norm': 1.0706685781478882, 'learning_rate': 9.740477923179737e-07, 'epoch': 2.45}
82%|████████▏ | 9430/11526 [1:38:32<21:30, 1.62it/s] 82%|████████▏ | 9431/11526 [1:38:32<21:28, 1.63it/s] {'loss': 0.1586, 'grad_norm': 0.5883994698524475, 'learning_rate': 9.731499649678195e-07, 'epoch': 2.45}
82%|████████▏ | 9431/11526 [1:38:32<21:28, 1.63it/s] 82%|████████▏ | 9432/11526 [1:38:33<21:26, 1.63it/s] {'loss': 0.1296, 'grad_norm': 0.519993782043457, 'learning_rate': 9.722525069832656e-07, 'epoch': 2.45}
82%|████████▏ | 9432/11526 [1:38:33<21:26, 1.63it/s] 82%|████████▏ | 9433/11526 [1:38:33<21:25, 1.63it/s] {'loss': 0.1598, 'grad_norm': 0.6475309133529663, 'learning_rate': 9.713554184466267e-07, 'epoch': 2.46}
82%|████████▏ | 9433/11526 [1:38:33<21:25, 1.63it/s] 82%|████████▏ | 9434/11526 [1:38:34<21:24, 1.63it/s] {'loss': 0.1477, 'grad_norm': 0.6407644748687744, 'learning_rate': 9.704586994401917e-07, 'epoch': 2.46}
82%|████████▏ | 9434/11526 [1:38:34<21:24, 1.63it/s] 82%|████████▏ | 9435/11526 [1:38:35<21:26, 1.63it/s] {'loss': 0.1237, 'grad_norm': 0.499483585357666, 'learning_rate': 9.695623500462114e-07, 'epoch': 2.46}
82%|████████▏ | 9435/11526 [1:38:35<21:26, 1.63it/s] 82%|████████▏ | 9436/11526 [1:38:35<21:25, 1.63it/s] {'loss': 0.1558, 'grad_norm': 0.5312414169311523, 'learning_rate': 9.686663703469046e-07, 'epoch': 2.46}
82%|████████▏ | 9436/11526 [1:38:35<21:25, 1.63it/s] 82%|████████▏ | 9437/11526 [1:38:36<21:24, 1.63it/s] {'loss': 0.1632, 'grad_norm': 0.6834866404533386, 'learning_rate': 9.677707604244574e-07, 'epoch': 2.46}
82%|████████▏ | 9437/11526 [1:38:36<21:24, 1.63it/s] 82%|████████▏ | 9438/11526 [1:38:36<21:23, 1.63it/s] {'loss': 0.1313, 'grad_norm': 0.5119972229003906, 'learning_rate': 9.668755203610164e-07, 'epoch': 2.46}
82%|████████▏ | 9438/11526 [1:38:37<21:23, 1.63it/s] 82%|████████▏ | 9439/11526 [1:38:37<21:22, 1.63it/s] {'loss': 0.1444, 'grad_norm': 0.5744871497154236, 'learning_rate': 9.65980650238701e-07, 'epoch': 2.46}
82%|████████▏ | 9439/11526 [1:38:37<21:22, 1.63it/s] 82%|████████▏ | 9440/11526 [1:38:38<21:23, 1.63it/s] {'loss': 0.158, 'grad_norm': 0.6905094981193542, 'learning_rate': 9.650861501395925e-07, 'epoch': 2.46}
82%|████████▏ | 9440/11526 [1:38:38<21:23, 1.63it/s] 82%|████████▏ | 9441/11526 [1:38:38<21:22, 1.63it/s] {'loss': 0.1579, 'grad_norm': 0.5997456908226013, 'learning_rate': 9.6419202014574e-07, 'epoch': 2.46}
82%|████████▏ | 9441/11526 [1:38:38<21:22, 1.63it/s] 82%|████████▏ | 9442/11526 [1:38:39<21:21, 1.63it/s] {'loss': 0.1635, 'grad_norm': 0.5977472066879272, 'learning_rate': 9.632982603391583e-07, 'epoch': 2.46}
82%|████████▏ | 9442/11526 [1:38:39<21:21, 1.63it/s] 82%|████████▏ | 9443/11526 [1:38:39<21:20, 1.63it/s] {'loss': 0.1586, 'grad_norm': 0.608453631401062, 'learning_rate': 9.624048708018297e-07, 'epoch': 2.46}
82%|████████▏ | 9443/11526 [1:38:40<21:20, 1.63it/s] 82%|████████▏ | 9444/11526 [1:38:40<21:20, 1.63it/s] {'loss': 0.2222, 'grad_norm': 0.6713441014289856, 'learning_rate': 9.615118516156963e-07, 'epoch': 2.46}
82%|████████▏ | 9444/11526 [1:38:40<21:20, 1.63it/s] 82%|████████▏ | 9445/11526 [1:38:41<21:21, 1.62it/s] {'loss': 0.1541, 'grad_norm': 0.6028808951377869, 'learning_rate': 9.606192028626755e-07, 'epoch': 2.46}
82%|████████▏ | 9445/11526 [1:38:41<21:21, 1.62it/s] 82%|████████▏ | 9446/11526 [1:38:41<21:19, 1.63it/s] {'loss': 0.1583, 'grad_norm': 0.6487381458282471, 'learning_rate': 9.597269246246454e-07, 'epoch': 2.46}
82%|████████▏ | 9446/11526 [1:38:41<21:19, 1.63it/s] 82%|████████▏ | 9447/11526 [1:38:42<21:18, 1.63it/s] {'loss': 0.1726, 'grad_norm': 0.6185675263404846, 'learning_rate': 9.588350169834493e-07, 'epoch': 2.46}
82%|████████▏ | 9447/11526 [1:38:42<21:18, 1.63it/s] 82%|████████▏ | 9448/11526 [1:38:43<21:17, 1.63it/s] {'loss': 0.1496, 'grad_norm': 0.6029420495033264, 'learning_rate': 9.579434800208981e-07, 'epoch': 2.46}
82%|████████▏ | 9448/11526 [1:38:43<21:17, 1.63it/s] 82%|████████▏ | 9449/11526 [1:38:43<21:16, 1.63it/s] {'loss': 0.1726, 'grad_norm': 0.6659913063049316, 'learning_rate': 9.570523138187698e-07, 'epoch': 2.46}
82%|████████▏ | 9449/11526 [1:38:43<21:16, 1.63it/s] 82%|████████▏ | 9450/11526 [1:38:44<21:16, 1.63it/s] {'loss': 0.1315, 'grad_norm': 0.5141522884368896, 'learning_rate': 9.561615184588064e-07, 'epoch': 2.46}
82%|████████▏ | 9450/11526 [1:38:44<21:16, 1.63it/s] 82%|████████▏ | 9451/11526 [1:38:44<21:15, 1.63it/s] {'loss': 0.117, 'grad_norm': 0.55939120054245, 'learning_rate': 9.552710940227184e-07, 'epoch': 2.46}
82%|████████▏ | 9451/11526 [1:38:45<21:15, 1.63it/s] 82%|████████▏ | 9452/11526 [1:38:45<21:14, 1.63it/s] {'loss': 0.1685, 'grad_norm': 0.7025426626205444, 'learning_rate': 9.543810405921776e-07, 'epoch': 2.46}
82%|████████▏ | 9452/11526 [1:38:45<21:14, 1.63it/s] 82%|████████▏ | 9453/11526 [1:38:46<21:13, 1.63it/s] {'loss': 0.184, 'grad_norm': 0.6494020223617554, 'learning_rate': 9.534913582488259e-07, 'epoch': 2.46}
82%|████████▏ | 9453/11526 [1:38:46<21:13, 1.63it/s] 82%|████████▏ | 9454/11526 [1:38:46<21:12, 1.63it/s] {'loss': 0.1462, 'grad_norm': 0.5848464369773865, 'learning_rate': 9.526020470742725e-07, 'epoch': 2.46}
82%|████████▏ | 9454/11526 [1:38:46<21:12, 1.63it/s] 82%|████████▏ | 9455/11526 [1:38:47<21:13, 1.63it/s] {'loss': 0.1306, 'grad_norm': 0.5202495455741882, 'learning_rate': 9.517131071500868e-07, 'epoch': 2.46}
82%|████████▏ | 9455/11526 [1:38:47<21:13, 1.63it/s] 82%|████████▏ | 9456/11526 [1:38:47<21:13, 1.63it/s] {'loss': 0.1197, 'grad_norm': 0.639578640460968, 'learning_rate': 9.508245385578085e-07, 'epoch': 2.46}
82%|████████▏ | 9456/11526 [1:38:48<21:13, 1.63it/s] 82%|████████▏ | 9457/11526 [1:38:48<21:11, 1.63it/s] {'loss': 0.1488, 'grad_norm': 0.7340441942214966, 'learning_rate': 9.499363413789442e-07, 'epoch': 2.46}
82%|████████▏ | 9457/11526 [1:38:48<21:11, 1.63it/s] 82%|████████▏ | 9458/11526 [1:38:49<21:11, 1.63it/s] {'loss': 0.1725, 'grad_norm': 0.7651368975639343, 'learning_rate': 9.490485156949597e-07, 'epoch': 2.46}
82%|████████▏ | 9458/11526 [1:38:49<21:11, 1.63it/s] 82%|████████▏ | 9459/11526 [1:38:49<21:10, 1.63it/s] {'loss': 0.1209, 'grad_norm': 0.4942057728767395, 'learning_rate': 9.481610615872971e-07, 'epoch': 2.46}
82%|████████▏ | 9459/11526 [1:38:49<21:10, 1.63it/s] 82%|████████▏ | 9460/11526 [1:38:50<21:11, 1.62it/s] {'loss': 0.1445, 'grad_norm': 0.6049733757972717, 'learning_rate': 9.472739791373542e-07, 'epoch': 2.46}
82%|████████▏ | 9460/11526 [1:38:50<21:11, 1.62it/s] 82%|████████▏ | 9461/11526 [1:38:51<21:11, 1.62it/s] {'loss': 0.1257, 'grad_norm': 0.5051506161689758, 'learning_rate': 9.463872684265013e-07, 'epoch': 2.46}
82%|████████▏ | 9461/11526 [1:38:51<21:11, 1.62it/s] 82%|████████▏ | 9462/11526 [1:38:51<21:09, 1.63it/s] {'loss': 0.1566, 'grad_norm': 0.5996916890144348, 'learning_rate': 9.455009295360723e-07, 'epoch': 2.46}
82%|████████▏ | 9462/11526 [1:38:51<21:09, 1.63it/s] 82%|████████▏ | 9463/11526 [1:38:52<21:08, 1.63it/s] {'loss': 0.1178, 'grad_norm': 0.5190837383270264, 'learning_rate': 9.446149625473672e-07, 'epoch': 2.46}
82%|████████▏ | 9463/11526 [1:38:52<21:08, 1.63it/s] 82%|████████▏ | 9464/11526 [1:38:52<21:07, 1.63it/s] {'loss': 0.1514, 'grad_norm': 0.6937994360923767, 'learning_rate': 9.43729367541652e-07, 'epoch': 2.46}
82%|████████▏ | 9464/11526 [1:38:53<21:07, 1.63it/s] 82%|████████▏ | 9465/11526 [1:38:53<21:09, 1.62it/s] {'loss': 0.1447, 'grad_norm': 0.5139386057853699, 'learning_rate': 9.428441446001596e-07, 'epoch': 2.46}
82%|████████▏ | 9465/11526 [1:38:53<21:09, 1.62it/s] 82%|████████▏ | 9466/11526 [1:38:54<21:07, 1.63it/s] {'loss': 0.2128, 'grad_norm': 0.7240824699401855, 'learning_rate': 9.419592938040851e-07, 'epoch': 2.46}
82%|████████▏ | 9466/11526 [1:38:54<21:07, 1.63it/s] 82%|████████▏ | 9467/11526 [1:38:54<21:06, 1.63it/s] {'loss': 0.1423, 'grad_norm': 0.6775356531143188, 'learning_rate': 9.410748152345933e-07, 'epoch': 2.46}
82%|████████▏ | 9467/11526 [1:38:54<21:06, 1.63it/s] 82%|████████▏ | 9468/11526 [1:38:55<21:05, 1.63it/s] {'loss': 0.1816, 'grad_norm': 0.657512903213501, 'learning_rate': 9.40190708972814e-07, 'epoch': 2.46}
82%|████████▏ | 9468/11526 [1:38:55<21:05, 1.63it/s] 82%|████████▏ | 9469/11526 [1:38:55<21:03, 1.63it/s] {'loss': 0.1813, 'grad_norm': 0.6818289756774902, 'learning_rate': 9.393069750998424e-07, 'epoch': 2.46}
82%|████████▏ | 9469/11526 [1:38:56<21:03, 1.63it/s] 82%|████████▏ | 9470/11526 [1:38:56<21:02, 1.63it/s] {'loss': 0.1779, 'grad_norm': 0.6359460949897766, 'learning_rate': 9.384236136967406e-07, 'epoch': 2.46}
82%|████████▏ | 9470/11526 [1:38:56<21:02, 1.63it/s] 82%|████████▏ | 9471/11526 [1:38:57<21:01, 1.63it/s] {'loss': 0.1715, 'grad_norm': 0.6390657424926758, 'learning_rate': 9.375406248445318e-07, 'epoch': 2.47}
82%|████████▏ | 9471/11526 [1:38:57<21:01, 1.63it/s] 82%|████████▏ | 9472/11526 [1:38:57<21:00, 1.63it/s] {'loss': 0.1952, 'grad_norm': 0.6557399034500122, 'learning_rate': 9.366580086242128e-07, 'epoch': 2.47}
82%|████████▏ | 9472/11526 [1:38:57<21:00, 1.63it/s] 82%|████████▏ | 9473/11526 [1:38:58<21:00, 1.63it/s] {'loss': 0.16, 'grad_norm': 0.5821563005447388, 'learning_rate': 9.357757651167415e-07, 'epoch': 2.47}
82%|████████▏ | 9473/11526 [1:38:58<21:00, 1.63it/s] 82%|████████▏ | 9474/11526 [1:38:59<21:00, 1.63it/s] {'loss': 0.1562, 'grad_norm': 0.5745429992675781, 'learning_rate': 9.348938944030405e-07, 'epoch': 2.47}
82%|████████▏ | 9474/11526 [1:38:59<21:00, 1.63it/s] 82%|████████▏ | 9475/11526 [1:38:59<21:02, 1.62it/s] {'loss': 0.1476, 'grad_norm': 0.5814233422279358, 'learning_rate': 9.340123965640014e-07, 'epoch': 2.47}
82%|████████▏ | 9475/11526 [1:38:59<21:02, 1.62it/s] 82%|████████▏ | 9476/11526 [1:39:00<21:01, 1.63it/s] {'loss': 0.13, 'grad_norm': 0.5421966910362244, 'learning_rate': 9.331312716804791e-07, 'epoch': 2.47}
82%|████████▏ | 9476/11526 [1:39:00<21:01, 1.63it/s] 82%|████████▏ | 9477/11526 [1:39:00<21:00, 1.63it/s] {'loss': 0.1436, 'grad_norm': 0.5621383786201477, 'learning_rate': 9.322505198332965e-07, 'epoch': 2.47}
82%|████████▏ | 9477/11526 [1:39:00<21:00, 1.63it/s] 82%|████████▏ | 9478/11526 [1:39:01<20:59, 1.63it/s] {'loss': 0.1354, 'grad_norm': 0.5030449032783508, 'learning_rate': 9.313701411032422e-07, 'epoch': 2.47}
82%|████████▏ | 9478/11526 [1:39:01<20:59, 1.63it/s] 82%|████████▏ | 9479/11526 [1:39:02<20:59, 1.63it/s] {'loss': 0.1546, 'grad_norm': 0.6885364055633545, 'learning_rate': 9.304901355710666e-07, 'epoch': 2.47}
82%|████████▏ | 9479/11526 [1:39:02<20:59, 1.63it/s] 82%|████████▏ | 9480/11526 [1:39:02<20:59, 1.62it/s] {'loss': 0.183, 'grad_norm': 0.6976229548454285, 'learning_rate': 9.296105033174891e-07, 'epoch': 2.47}
82%|████████▏ | 9480/11526 [1:39:02<20:59, 1.62it/s] 82%|████████▏ | 9481/11526 [1:39:03<20:57, 1.63it/s] {'loss': 0.1141, 'grad_norm': 0.5112779140472412, 'learning_rate': 9.287312444231982e-07, 'epoch': 2.47}
82%|████████▏ | 9481/11526 [1:39:03<20:57, 1.63it/s] 82%|████████▏ | 9482/11526 [1:39:03<20:56, 1.63it/s] {'loss': 0.1386, 'grad_norm': 0.5595030188560486, 'learning_rate': 9.27852358968841e-07, 'epoch': 2.47}
82%|████████▏ | 9482/11526 [1:39:04<20:56, 1.63it/s] 82%|████████▏ | 9483/11526 [1:39:04<20:55, 1.63it/s] {'loss': 0.1811, 'grad_norm': 0.6023558378219604, 'learning_rate': 9.269738470350347e-07, 'epoch': 2.47}
82%|████████▏ | 9483/11526 [1:39:04<20:55, 1.63it/s] 82%|████████▏ | 9484/11526 [1:39:05<20:54, 1.63it/s] {'loss': 0.1583, 'grad_norm': 0.6176456212997437, 'learning_rate': 9.26095708702362e-07, 'epoch': 2.47}
82%|████████▏ | 9484/11526 [1:39:05<20:54, 1.63it/s] 82%|████████▏ | 9485/11526 [1:39:05<20:54, 1.63it/s] {'loss': 0.1426, 'grad_norm': 0.6250483989715576, 'learning_rate': 9.252179440513704e-07, 'epoch': 2.47}
82%|████████▏ | 9485/11526 [1:39:05<20:54, 1.63it/s] 82%|████████▏ | 9486/11526 [1:39:06<20:53, 1.63it/s] {'loss': 0.1579, 'grad_norm': 0.600035548210144, 'learning_rate': 9.243405531625749e-07, 'epoch': 2.47}
82%|████████▏ | 9486/11526 [1:39:06<20:53, 1.63it/s] 82%|████████▏ | 9487/11526 [1:39:07<20:54, 1.63it/s] {'loss': 0.178, 'grad_norm': 0.6317929625511169, 'learning_rate': 9.234635361164518e-07, 'epoch': 2.47}
82%|████████▏ | 9487/11526 [1:39:07<20:54, 1.63it/s] 82%|████████▏ | 9488/11526 [1:39:07<20:53, 1.63it/s] {'loss': 0.1246, 'grad_norm': 0.5557188391685486, 'learning_rate': 9.225868929934478e-07, 'epoch': 2.47}
82%|████████▏ | 9488/11526 [1:39:07<20:53, 1.63it/s] 82%|████████▏ | 9489/11526 [1:39:08<20:52, 1.63it/s] {'loss': 0.1405, 'grad_norm': 0.5515019297599792, 'learning_rate': 9.217106238739737e-07, 'epoch': 2.47}
82%|████████▏ | 9489/11526 [1:39:08<20:52, 1.63it/s] 82%|████████▏ | 9490/11526 [1:39:08<20:52, 1.63it/s] {'loss': 0.133, 'grad_norm': 0.5311288833618164, 'learning_rate': 9.208347288384056e-07, 'epoch': 2.47}
82%|████████▏ | 9490/11526 [1:39:08<20:52, 1.63it/s] 82%|████████▏ | 9491/11526 [1:39:09<20:51, 1.63it/s] {'loss': 0.136, 'grad_norm': 0.6491488218307495, 'learning_rate': 9.199592079670855e-07, 'epoch': 2.47}
82%|████████▏ | 9491/11526 [1:39:09<20:51, 1.63it/s] 82%|████████▏ | 9492/11526 [1:39:10<20:50, 1.63it/s] {'loss': 0.1889, 'grad_norm': 0.7049227952957153, 'learning_rate': 9.190840613403229e-07, 'epoch': 2.47}
82%|████████▏ | 9492/11526 [1:39:10<20:50, 1.63it/s] 82%|████████▏ | 9493/11526 [1:39:10<20:49, 1.63it/s] {'loss': 0.1331, 'grad_norm': 0.5604729056358337, 'learning_rate': 9.18209289038387e-07, 'epoch': 2.47}
82%|████████▏ | 9493/11526 [1:39:10<20:49, 1.63it/s] 82%|████████▏ | 9494/11526 [1:39:11<20:48, 1.63it/s] {'loss': 0.211, 'grad_norm': 0.7181100249290466, 'learning_rate': 9.17334891141522e-07, 'epoch': 2.47}
82%|████████▏ | 9494/11526 [1:39:11<20:48, 1.63it/s] 82%|████████▏ | 9495/11526 [1:39:11<20:49, 1.63it/s] {'loss': 0.1508, 'grad_norm': 0.6249817609786987, 'learning_rate': 9.164608677299291e-07, 'epoch': 2.47}
82%|████████▏ | 9495/11526 [1:39:12<20:49, 1.63it/s] 82%|████████▏ | 9496/11526 [1:39:12<20:48, 1.63it/s] {'loss': 0.181, 'grad_norm': 0.6862343549728394, 'learning_rate': 9.155872188837794e-07, 'epoch': 2.47}
82%|████████▏ | 9496/11526 [1:39:12<20:48, 1.63it/s] 82%|████████▏ | 9497/11526 [1:39:13<20:47, 1.63it/s] {'loss': 0.1804, 'grad_norm': 0.6595476269721985, 'learning_rate': 9.147139446832109e-07, 'epoch': 2.47}
82%|████████▏ | 9497/11526 [1:39:13<20:47, 1.63it/s] 82%|████████▏ | 9498/11526 [1:39:13<20:46, 1.63it/s] {'loss': 0.1269, 'grad_norm': 0.5776819586753845, 'learning_rate': 9.138410452083213e-07, 'epoch': 2.47}
82%|████████▏ | 9498/11526 [1:39:13<20:46, 1.63it/s] 82%|████████▏ | 9499/11526 [1:39:14<20:45, 1.63it/s] {'loss': 0.1409, 'grad_norm': 0.8261678218841553, 'learning_rate': 9.129685205391814e-07, 'epoch': 2.47}
82%|████████▏ | 9499/11526 [1:39:14<20:45, 1.63it/s] 82%|████████▏ | 9500/11526 [1:39:15<20:46, 1.63it/s] {'loss': 0.144, 'grad_norm': 0.5372279286384583, 'learning_rate': 9.120963707558244e-07, 'epoch': 2.47}
82%|████████▏ | 9500/11526 [1:39:15<20:46, 1.63it/s] 82%|████████▏ | 9501/11526 [1:39:15<20:46, 1.62it/s] {'loss': 0.1439, 'grad_norm': 0.5833467841148376, 'learning_rate': 9.112245959382459e-07, 'epoch': 2.47}
82%|████████▏ | 9501/11526 [1:39:15<20:46, 1.62it/s] 82%|████████▏ | 9502/11526 [1:39:16<20:44, 1.63it/s] {'loss': 0.1756, 'grad_norm': 0.6321326494216919, 'learning_rate': 9.10353196166412e-07, 'epoch': 2.47}
82%|████████▏ | 9502/11526 [1:39:16<20:44, 1.63it/s] 82%|████████▏ | 9503/11526 [1:39:16<20:43, 1.63it/s] {'loss': 0.1356, 'grad_norm': 0.5541460514068604, 'learning_rate': 9.094821715202524e-07, 'epoch': 2.47}
82%|████████▏ | 9503/11526 [1:39:16<20:43, 1.63it/s] 82%|████████▏ | 9504/11526 [1:39:17<20:42, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.5945034623146057, 'learning_rate': 9.086115220796615e-07, 'epoch': 2.47}
82%|████████▏ | 9504/11526 [1:39:17<20:42, 1.63it/s] 82%|████████▏ | 9505/11526 [1:39:18<20:44, 1.62it/s] {'loss': 0.1628, 'grad_norm': 0.7072017192840576, 'learning_rate': 9.077412479245007e-07, 'epoch': 2.47}
82%|████████▏ | 9505/11526 [1:39:18<20:44, 1.62it/s] 82%|████████▏ | 9506/11526 [1:39:18<20:42, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.726801872253418, 'learning_rate': 9.068713491345987e-07, 'epoch': 2.47}
82%|████████▏ | 9506/11526 [1:39:18<20:42, 1.63it/s] 82%|████████▏ | 9507/11526 [1:39:19<20:46, 1.62it/s] {'loss': 0.1307, 'grad_norm': 0.5233331918716431, 'learning_rate': 9.060018257897424e-07, 'epoch': 2.47}
82%|████████▏ | 9507/11526 [1:39:19<20:46, 1.62it/s] 82%|████████▏ | 9508/11526 [1:39:19<20:43, 1.62it/s] {'loss': 0.1256, 'grad_norm': 0.46933338046073914, 'learning_rate': 9.05132677969695e-07, 'epoch': 2.47}
82%|████████▏ | 9508/11526 [1:39:20<20:43, 1.62it/s] 83%|████████▎ | 9509/11526 [1:39:20<20:41, 1.62it/s] {'loss': 0.1446, 'grad_norm': 0.6156032085418701, 'learning_rate': 9.042639057541758e-07, 'epoch': 2.48}
83%|████████▎ | 9509/11526 [1:39:20<20:41, 1.62it/s] 83%|████████▎ | 9510/11526 [1:39:21<20:47, 1.62it/s] {'loss': 0.143, 'grad_norm': 0.6009649038314819, 'learning_rate': 9.033955092228752e-07, 'epoch': 2.48}
83%|████████▎ | 9510/11526 [1:39:21<20:47, 1.62it/s] 83%|████████▎ | 9511/11526 [1:39:21<20:43, 1.62it/s] {'loss': 0.1363, 'grad_norm': 0.5293670892715454, 'learning_rate': 9.025274884554475e-07, 'epoch': 2.48}
83%|████████▎ | 9511/11526 [1:39:21<20:43, 1.62it/s] 83%|████████▎ | 9512/11526 [1:39:22<20:41, 1.62it/s] {'loss': 0.1499, 'grad_norm': 0.5620976090431213, 'learning_rate': 9.016598435315121e-07, 'epoch': 2.48}
83%|████████▎ | 9512/11526 [1:39:22<20:41, 1.62it/s] 83%|████████▎ | 9513/11526 [1:39:23<20:39, 1.62it/s] {'loss': 0.1688, 'grad_norm': 0.6122113466262817, 'learning_rate': 9.007925745306539e-07, 'epoch': 2.48}
83%|████████▎ | 9513/11526 [1:39:23<20:39, 1.62it/s] 83%|████████▎ | 9514/11526 [1:39:23<20:38, 1.62it/s] {'loss': 0.1381, 'grad_norm': 0.574013352394104, 'learning_rate': 8.999256815324264e-07, 'epoch': 2.48}
83%|████████▎ | 9514/11526 [1:39:23<20:38, 1.62it/s] 83%|████████▎ | 9515/11526 [1:39:24<20:38, 1.62it/s] {'loss': 0.1668, 'grad_norm': 0.6076105237007141, 'learning_rate': 8.990591646163421e-07, 'epoch': 2.48}
83%|████████▎ | 9515/11526 [1:39:24<20:38, 1.62it/s] 83%|████████▎ | 9516/11526 [1:39:24<20:36, 1.63it/s] {'loss': 0.1728, 'grad_norm': 0.6111630201339722, 'learning_rate': 8.981930238618846e-07, 'epoch': 2.48}
83%|████████▎ | 9516/11526 [1:39:24<20:36, 1.63it/s] 83%|████████▎ | 9517/11526 [1:39:25<20:35, 1.63it/s] {'loss': 0.1506, 'grad_norm': 0.581558108329773, 'learning_rate': 8.973272593485011e-07, 'epoch': 2.48}
83%|████████▎ | 9517/11526 [1:39:25<20:35, 1.63it/s] 83%|████████▎ | 9518/11526 [1:39:26<20:34, 1.63it/s] {'loss': 0.1084, 'grad_norm': 0.42319512367248535, 'learning_rate': 8.96461871155605e-07, 'epoch': 2.48}
83%|████████▎ | 9518/11526 [1:39:26<20:34, 1.63it/s] 83%|████████▎ | 9519/11526 [1:39:26<20:33, 1.63it/s] {'loss': 0.1708, 'grad_norm': 0.6366462111473083, 'learning_rate': 8.955968593625752e-07, 'epoch': 2.48}
83%|████████▎ | 9519/11526 [1:39:26<20:33, 1.63it/s] 83%|████████▎ | 9520/11526 [1:39:27<20:33, 1.63it/s] {'loss': 0.1385, 'grad_norm': 0.49409985542297363, 'learning_rate': 8.947322240487521e-07, 'epoch': 2.48}
83%|████████▎ | 9520/11526 [1:39:27<20:33, 1.63it/s] 83%|████████▎ | 9521/11526 [1:39:27<20:31, 1.63it/s] {'loss': 0.1484, 'grad_norm': 0.5776953101158142, 'learning_rate': 8.93867965293449e-07, 'epoch': 2.48}
83%|████████▎ | 9521/11526 [1:39:28<20:31, 1.63it/s] 83%|████████▎ | 9522/11526 [1:39:28<20:31, 1.63it/s] {'loss': 0.1221, 'grad_norm': 0.5189456939697266, 'learning_rate': 8.930040831759401e-07, 'epoch': 2.48}
83%|████████▎ | 9522/11526 [1:39:28<20:31, 1.63it/s] 83%|████████▎ | 9523/11526 [1:39:29<20:30, 1.63it/s] {'loss': 0.1406, 'grad_norm': 0.5534934401512146, 'learning_rate': 8.921405777754632e-07, 'epoch': 2.48}
83%|████████▎ | 9523/11526 [1:39:29<20:30, 1.63it/s] 83%|████████▎ | 9524/11526 [1:39:29<20:29, 1.63it/s] {'loss': 0.1283, 'grad_norm': 0.5686557292938232, 'learning_rate': 8.91277449171225e-07, 'epoch': 2.48}
83%|████████▎ | 9524/11526 [1:39:29<20:29, 1.63it/s] 83%|████████▎ | 9525/11526 [1:39:30<20:30, 1.63it/s] {'loss': 0.1361, 'grad_norm': 0.5001038908958435, 'learning_rate': 8.904146974423972e-07, 'epoch': 2.48}
83%|████████▎ | 9525/11526 [1:39:30<20:30, 1.63it/s] 83%|████████▎ | 9526/11526 [1:39:31<20:29, 1.63it/s] {'loss': 0.131, 'grad_norm': 0.4932493567466736, 'learning_rate': 8.895523226681157e-07, 'epoch': 2.48}
83%|████████▎ | 9526/11526 [1:39:31<20:29, 1.63it/s] 83%|████████▎ | 9527/11526 [1:39:31<20:29, 1.63it/s] {'loss': 0.2134, 'grad_norm': 0.7252442836761475, 'learning_rate': 8.886903249274841e-07, 'epoch': 2.48}
83%|████████▎ | 9527/11526 [1:39:31<20:29, 1.63it/s] 83%|████████▎ | 9528/11526 [1:39:32<20:28, 1.63it/s] {'loss': 0.1404, 'grad_norm': 0.5172348022460938, 'learning_rate': 8.87828704299567e-07, 'epoch': 2.48}
83%|████████▎ | 9528/11526 [1:39:32<20:28, 1.63it/s] 83%|████████▎ | 9529/11526 [1:39:32<20:27, 1.63it/s] {'loss': 0.1658, 'grad_norm': 0.5930273532867432, 'learning_rate': 8.869674608633972e-07, 'epoch': 2.48}
83%|████████▎ | 9529/11526 [1:39:32<20:27, 1.63it/s] 83%|████████▎ | 9530/11526 [1:39:33<20:27, 1.63it/s] {'loss': 0.157, 'grad_norm': 0.6428400874137878, 'learning_rate': 8.861065946979763e-07, 'epoch': 2.48}
83%|████████▎ | 9530/11526 [1:39:33<20:27, 1.63it/s] 83%|████████▎ | 9531/11526 [1:39:34<20:27, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.6156269907951355, 'learning_rate': 8.852461058822648e-07, 'epoch': 2.48}
83%|████████▎ | 9531/11526 [1:39:34<20:27, 1.63it/s] 83%|████████▎ | 9532/11526 [1:39:34<20:26, 1.63it/s] {'loss': 0.1402, 'grad_norm': 0.5502901673316956, 'learning_rate': 8.843859944951922e-07, 'epoch': 2.48}
83%|████████▎ | 9532/11526 [1:39:34<20:26, 1.63it/s] 83%|████████▎ | 9533/11526 [1:39:35<20:25, 1.63it/s] {'loss': 0.1263, 'grad_norm': 0.4915001690387726, 'learning_rate': 8.83526260615653e-07, 'epoch': 2.48}
83%|████████▎ | 9533/11526 [1:39:35<20:25, 1.63it/s] 83%|████████▎ | 9534/11526 [1:39:35<20:24, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.549921452999115, 'learning_rate': 8.826669043225072e-07, 'epoch': 2.48}
83%|████████▎ | 9534/11526 [1:39:36<20:24, 1.63it/s] 83%|████████▎ | 9535/11526 [1:39:36<20:25, 1.62it/s] {'loss': 0.148, 'grad_norm': 0.6605267524719238, 'learning_rate': 8.818079256945805e-07, 'epoch': 2.48}
83%|████████▎ | 9535/11526 [1:39:36<20:25, 1.62it/s] 83%|████████▎ | 9536/11526 [1:39:37<20:24, 1.63it/s] {'loss': 0.1378, 'grad_norm': 0.5566823482513428, 'learning_rate': 8.809493248106616e-07, 'epoch': 2.48}
83%|████████▎ | 9536/11526 [1:39:37<20:24, 1.63it/s] 83%|████████▎ | 9537/11526 [1:39:37<20:23, 1.63it/s] {'loss': 0.137, 'grad_norm': 0.5659559965133667, 'learning_rate': 8.800911017495067e-07, 'epoch': 2.48}
83%|████████▎ | 9537/11526 [1:39:37<20:23, 1.63it/s] 83%|████████▎ | 9538/11526 [1:39:38<20:23, 1.63it/s] {'loss': 0.1709, 'grad_norm': 0.6671651601791382, 'learning_rate': 8.792332565898376e-07, 'epoch': 2.48}
83%|████████▎ | 9538/11526 [1:39:38<20:23, 1.63it/s] 83%|████████▎ | 9539/11526 [1:39:39<20:22, 1.63it/s] {'loss': 0.1551, 'grad_norm': 0.5976393818855286, 'learning_rate': 8.783757894103401e-07, 'epoch': 2.48}
83%|████████▎ | 9539/11526 [1:39:39<20:22, 1.63it/s] 83%|████████▎ | 9540/11526 [1:39:39<20:22, 1.63it/s] {'loss': 0.1632, 'grad_norm': 0.6077564358711243, 'learning_rate': 8.775187002896662e-07, 'epoch': 2.48}
83%|████████▎ | 9540/11526 [1:39:39<20:22, 1.63it/s] 83%|████████▎ | 9541/11526 [1:39:40<20:21, 1.63it/s] {'loss': 0.1464, 'grad_norm': 0.5565925240516663, 'learning_rate': 8.766619893064349e-07, 'epoch': 2.48}
83%|████████▎ | 9541/11526 [1:39:40<20:21, 1.63it/s] 83%|████████▎ | 9542/11526 [1:39:40<20:20, 1.63it/s] {'loss': 0.1752, 'grad_norm': 0.6953280568122864, 'learning_rate': 8.758056565392236e-07, 'epoch': 2.48}
83%|████████▎ | 9542/11526 [1:39:40<20:20, 1.63it/s] 83%|████████▎ | 9543/11526 [1:39:41<20:19, 1.63it/s] {'loss': 0.1192, 'grad_norm': 0.439850389957428, 'learning_rate': 8.74949702066586e-07, 'epoch': 2.48}
83%|████████▎ | 9543/11526 [1:39:41<20:19, 1.63it/s] 83%|████████▎ | 9544/11526 [1:39:42<20:20, 1.62it/s] {'loss': 0.1807, 'grad_norm': 0.6152529716491699, 'learning_rate': 8.74094125967031e-07, 'epoch': 2.48}
83%|████████▎ | 9544/11526 [1:39:42<20:20, 1.62it/s] 83%|████████▎ | 9545/11526 [1:39:42<20:19, 1.62it/s] {'loss': 0.141, 'grad_norm': 0.6341289281845093, 'learning_rate': 8.732389283190379e-07, 'epoch': 2.48}
83%|████████▎ | 9545/11526 [1:39:42<20:19, 1.62it/s] 83%|████████▎ | 9546/11526 [1:39:43<20:18, 1.63it/s] {'loss': 0.1269, 'grad_norm': 0.5358812808990479, 'learning_rate': 8.723841092010521e-07, 'epoch': 2.48}
83%|████████▎ | 9546/11526 [1:39:43<20:18, 1.63it/s] 83%|████████▎ | 9547/11526 [1:39:43<20:17, 1.63it/s] {'loss': 0.1528, 'grad_norm': 0.5773465633392334, 'learning_rate': 8.715296686914787e-07, 'epoch': 2.48}
83%|████████▎ | 9547/11526 [1:39:44<20:17, 1.63it/s] 83%|████████▎ | 9548/11526 [1:39:44<20:23, 1.62it/s] {'loss': 0.1509, 'grad_norm': 0.7142391204833984, 'learning_rate': 8.70675606868695e-07, 'epoch': 2.49}
83%|████████▎ | 9548/11526 [1:39:44<20:23, 1.62it/s] 83%|████████▎ | 9549/11526 [1:39:45<20:19, 1.62it/s] {'loss': 0.1373, 'grad_norm': 0.6876763105392456, 'learning_rate': 8.698219238110405e-07, 'epoch': 2.49}
83%|████████▎ | 9549/11526 [1:39:45<20:19, 1.62it/s] 83%|████████▎ | 9550/11526 [1:39:45<20:19, 1.62it/s] {'loss': 0.1505, 'grad_norm': 0.5704160332679749, 'learning_rate': 8.689686195968183e-07, 'epoch': 2.49}
83%|████████▎ | 9550/11526 [1:39:45<20:19, 1.62it/s] 83%|████████▎ | 9551/11526 [1:39:46<20:17, 1.62it/s] {'loss': 0.1271, 'grad_norm': 0.5173086524009705, 'learning_rate': 8.681156943042985e-07, 'epoch': 2.49}
83%|████████▎ | 9551/11526 [1:39:46<20:17, 1.62it/s] 83%|████████▎ | 9552/11526 [1:39:47<20:15, 1.62it/s] {'loss': 0.1588, 'grad_norm': 0.597203254699707, 'learning_rate': 8.672631480117172e-07, 'epoch': 2.49}
83%|████████▎ | 9552/11526 [1:39:47<20:15, 1.62it/s] 83%|████████▎ | 9553/11526 [1:39:47<20:13, 1.63it/s] {'loss': 0.1731, 'grad_norm': 0.7537393569946289, 'learning_rate': 8.664109807972737e-07, 'epoch': 2.49}
83%|████████▎ | 9553/11526 [1:39:47<20:13, 1.63it/s] 83%|████████▎ | 9554/11526 [1:39:48<20:13, 1.63it/s] {'loss': 0.1632, 'grad_norm': 0.5729901194572449, 'learning_rate': 8.655591927391355e-07, 'epoch': 2.49}
83%|████████▎ | 9554/11526 [1:39:48<20:13, 1.63it/s] 83%|████████▎ | 9555/11526 [1:39:48<20:13, 1.62it/s] {'loss': 0.1504, 'grad_norm': 0.588355302810669, 'learning_rate': 8.647077839154311e-07, 'epoch': 2.49}
83%|████████▎ | 9555/11526 [1:39:48<20:13, 1.62it/s] 83%|████████▎ | 9556/11526 [1:39:49<20:12, 1.62it/s] {'loss': 0.1805, 'grad_norm': 0.7055484056472778, 'learning_rate': 8.638567544042565e-07, 'epoch': 2.49}
83%|████████▎ | 9556/11526 [1:39:49<20:12, 1.62it/s] 83%|████████▎ | 9557/11526 [1:39:50<20:11, 1.63it/s] {'loss': 0.1391, 'grad_norm': 0.534791111946106, 'learning_rate': 8.630061042836763e-07, 'epoch': 2.49}
83%|████████▎ | 9557/11526 [1:39:50<20:11, 1.63it/s] 83%|████████▎ | 9558/11526 [1:39:50<20:10, 1.63it/s] {'loss': 0.1128, 'grad_norm': 0.4202081561088562, 'learning_rate': 8.621558336317132e-07, 'epoch': 2.49}
83%|████████▎ | 9558/11526 [1:39:50<20:10, 1.63it/s] 83%|████████▎ | 9559/11526 [1:39:51<20:08, 1.63it/s] {'loss': 0.1393, 'grad_norm': 0.600702166557312, 'learning_rate': 8.613059425263609e-07, 'epoch': 2.49}
83%|████████▎ | 9559/11526 [1:39:51<20:08, 1.63it/s] 83%|████████▎ | 9560/11526 [1:39:51<20:10, 1.62it/s] {'loss': 0.151, 'grad_norm': 0.6018631458282471, 'learning_rate': 8.604564310455754e-07, 'epoch': 2.49}
83%|████████▎ | 9560/11526 [1:39:52<20:10, 1.62it/s] 83%|████████▎ | 9561/11526 [1:39:52<20:08, 1.63it/s] {'loss': 0.1696, 'grad_norm': 0.6597316861152649, 'learning_rate': 8.596072992672793e-07, 'epoch': 2.49}
83%|████████▎ | 9561/11526 [1:39:52<20:08, 1.63it/s] 83%|████████▎ | 9562/11526 [1:39:53<20:07, 1.63it/s] {'loss': 0.156, 'grad_norm': 0.6537373661994934, 'learning_rate': 8.58758547269361e-07, 'epoch': 2.49}
83%|████████▎ | 9562/11526 [1:39:53<20:07, 1.63it/s] 83%|████████▎ | 9563/11526 [1:39:53<20:06, 1.63it/s] {'loss': 0.1241, 'grad_norm': 0.5839985609054565, 'learning_rate': 8.579101751296698e-07, 'epoch': 2.49}
83%|████████▎ | 9563/11526 [1:39:53<20:06, 1.63it/s] 83%|████████▎ | 9564/11526 [1:39:54<20:05, 1.63it/s] {'loss': 0.1849, 'grad_norm': 0.7303462624549866, 'learning_rate': 8.570621829260251e-07, 'epoch': 2.49}
83%|████████▎ | 9564/11526 [1:39:54<20:05, 1.63it/s] 83%|████████▎ | 9565/11526 [1:39:55<20:05, 1.63it/s] {'loss': 0.1579, 'grad_norm': 0.5893614292144775, 'learning_rate': 8.56214570736209e-07, 'epoch': 2.49}
83%|████████▎ | 9565/11526 [1:39:55<20:05, 1.63it/s] 83%|████████▎ | 9566/11526 [1:39:55<20:05, 1.63it/s] {'loss': 0.1627, 'grad_norm': 0.6109346151351929, 'learning_rate': 8.553673386379702e-07, 'epoch': 2.49}
83%|████████▎ | 9566/11526 [1:39:55<20:05, 1.63it/s] 83%|████████▎ | 9567/11526 [1:39:56<20:04, 1.63it/s] {'loss': 0.1552, 'grad_norm': 0.5935704708099365, 'learning_rate': 8.545204867090207e-07, 'epoch': 2.49}
83%|████████▎ | 9567/11526 [1:39:56<20:04, 1.63it/s] 83%|████████▎ | 9568/11526 [1:39:56<20:04, 1.63it/s] {'loss': 0.1758, 'grad_norm': 0.6272428631782532, 'learning_rate': 8.536740150270401e-07, 'epoch': 2.49}
83%|████████▎ | 9568/11526 [1:39:56<20:04, 1.63it/s] 83%|████████▎ | 9569/11526 [1:39:57<20:03, 1.63it/s] {'loss': 0.1329, 'grad_norm': 0.559107780456543, 'learning_rate': 8.528279236696679e-07, 'epoch': 2.49}
83%|████████▎ | 9569/11526 [1:39:57<20:03, 1.63it/s] 83%|████████▎ | 9570/11526 [1:39:58<20:03, 1.62it/s] {'loss': 0.1571, 'grad_norm': 0.5599580407142639, 'learning_rate': 8.519822127145177e-07, 'epoch': 2.49}
83%|████████▎ | 9570/11526 [1:39:58<20:03, 1.62it/s] 83%|████████▎ | 9571/11526 [1:39:58<20:02, 1.63it/s] {'loss': 0.147, 'grad_norm': 0.6097681522369385, 'learning_rate': 8.511368822391592e-07, 'epoch': 2.49}
83%|████████▎ | 9571/11526 [1:39:58<20:02, 1.63it/s] 83%|████████▎ | 9572/11526 [1:39:59<20:01, 1.63it/s] {'loss': 0.1736, 'grad_norm': 0.6726387143135071, 'learning_rate': 8.502919323211317e-07, 'epoch': 2.49}
83%|████████▎ | 9572/11526 [1:39:59<20:01, 1.63it/s] 83%|████████▎ | 9573/11526 [1:39:59<20:00, 1.63it/s] {'loss': 0.12, 'grad_norm': 0.49821552634239197, 'learning_rate': 8.494473630379391e-07, 'epoch': 2.49}
83%|████████▎ | 9573/11526 [1:40:00<20:00, 1.63it/s] 83%|████████▎ | 9574/11526 [1:40:00<20:00, 1.63it/s] {'loss': 0.1766, 'grad_norm': 0.6506961584091187, 'learning_rate': 8.486031744670497e-07, 'epoch': 2.49}
83%|████████▎ | 9574/11526 [1:40:00<20:00, 1.63it/s] 83%|████████▎ | 9575/11526 [1:40:01<20:00, 1.62it/s] {'loss': 0.1469, 'grad_norm': 0.5823773145675659, 'learning_rate': 8.47759366685898e-07, 'epoch': 2.49}
83%|████████▎ | 9575/11526 [1:40:01<20:00, 1.62it/s] 83%|████████▎ | 9576/11526 [1:40:01<19:59, 1.63it/s] {'loss': 0.163, 'grad_norm': 0.6071286201477051, 'learning_rate': 8.469159397718834e-07, 'epoch': 2.49}
83%|████████▎ | 9576/11526 [1:40:01<19:59, 1.63it/s] 83%|████████▎ | 9577/11526 [1:40:02<19:57, 1.63it/s] {'loss': 0.2004, 'grad_norm': 0.7119973301887512, 'learning_rate': 8.460728938023677e-07, 'epoch': 2.49}
83%|████████▎ | 9577/11526 [1:40:02<19:57, 1.63it/s] 83%|████████▎ | 9578/11526 [1:40:03<19:57, 1.63it/s] {'loss': 0.1433, 'grad_norm': 0.5501281023025513, 'learning_rate': 8.452302288546815e-07, 'epoch': 2.49}
83%|████████▎ | 9578/11526 [1:40:03<19:57, 1.63it/s] 83%|████████▎ | 9579/11526 [1:40:03<19:56, 1.63it/s] {'loss': 0.1491, 'grad_norm': 0.5774813890457153, 'learning_rate': 8.443879450061176e-07, 'epoch': 2.49}
83%|████████▎ | 9579/11526 [1:40:03<19:56, 1.63it/s] 83%|████████▎ | 9580/11526 [1:40:04<19:56, 1.63it/s] {'loss': 0.1754, 'grad_norm': 0.7212733030319214, 'learning_rate': 8.435460423339365e-07, 'epoch': 2.49}
83%|████████▎ | 9580/11526 [1:40:04<19:56, 1.63it/s] 83%|████████▎ | 9581/11526 [1:40:04<19:56, 1.63it/s] {'loss': 0.1608, 'grad_norm': 0.5923661589622498, 'learning_rate': 8.427045209153617e-07, 'epoch': 2.49}
83%|████████▎ | 9581/11526 [1:40:04<19:56, 1.63it/s] 83%|████████▎ | 9582/11526 [1:40:05<19:55, 1.63it/s] {'loss': 0.1855, 'grad_norm': 0.655109703540802, 'learning_rate': 8.418633808275817e-07, 'epoch': 2.49}
83%|████████▎ | 9582/11526 [1:40:05<19:55, 1.63it/s] 83%|████████▎ | 9583/11526 [1:40:06<19:54, 1.63it/s] {'loss': 0.1107, 'grad_norm': 0.45690983533859253, 'learning_rate': 8.410226221477513e-07, 'epoch': 2.49}
83%|████████▎ | 9583/11526 [1:40:06<19:54, 1.63it/s] 83%|████████▎ | 9584/11526 [1:40:06<19:54, 1.63it/s] {'loss': 0.1185, 'grad_norm': 0.5631211400032043, 'learning_rate': 8.401822449529911e-07, 'epoch': 2.49}
83%|████████▎ | 9584/11526 [1:40:06<19:54, 1.63it/s] 83%|████████▎ | 9585/11526 [1:40:07<19:54, 1.62it/s] {'loss': 0.1352, 'grad_norm': 0.5669726133346558, 'learning_rate': 8.393422493203829e-07, 'epoch': 2.49}
83%|████████▎ | 9585/11526 [1:40:07<19:54, 1.62it/s] 83%|████████▎ | 9586/11526 [1:40:07<19:55, 1.62it/s] {'loss': 0.178, 'grad_norm': 0.8014151453971863, 'learning_rate': 8.385026353269759e-07, 'epoch': 2.5}
83%|████████▎ | 9586/11526 [1:40:08<19:55, 1.62it/s] 83%|████████▎ | 9587/11526 [1:40:08<19:54, 1.62it/s] {'loss': 0.18, 'grad_norm': 0.6104777455329895, 'learning_rate': 8.37663403049786e-07, 'epoch': 2.5}
83%|████████▎ | 9587/11526 [1:40:08<19:54, 1.62it/s] 83%|████████▎ | 9588/11526 [1:40:09<19:52, 1.62it/s] {'loss': 0.1612, 'grad_norm': 0.6039347052574158, 'learning_rate': 8.368245525657909e-07, 'epoch': 2.5}
83%|████████▎ | 9588/11526 [1:40:09<19:52, 1.62it/s] 83%|████████▎ | 9589/11526 [1:40:09<19:52, 1.62it/s] {'loss': 0.1182, 'grad_norm': 0.4923924207687378, 'learning_rate': 8.359860839519357e-07, 'epoch': 2.5}
83%|████████▎ | 9589/11526 [1:40:09<19:52, 1.62it/s] 83%|████████▎ | 9590/11526 [1:40:10<19:52, 1.62it/s] {'loss': 0.1148, 'grad_norm': 0.4649714529514313, 'learning_rate': 8.3514799728513e-07, 'epoch': 2.5}
83%|████████▎ | 9590/11526 [1:40:10<19:52, 1.62it/s] 83%|████████▎ | 9591/11526 [1:40:11<19:50, 1.63it/s] {'loss': 0.1608, 'grad_norm': 0.6065909266471863, 'learning_rate': 8.343102926422447e-07, 'epoch': 2.5}
83%|████████▎ | 9591/11526 [1:40:11<19:50, 1.63it/s] 83%|████████▎ | 9592/11526 [1:40:11<19:49, 1.63it/s] {'loss': 0.1595, 'grad_norm': 0.6109460592269897, 'learning_rate': 8.334729701001238e-07, 'epoch': 2.5}
83%|████████▎ | 9592/11526 [1:40:11<19:49, 1.63it/s] 83%|████████▎ | 9593/11526 [1:40:12<19:48, 1.63it/s] {'loss': 0.1232, 'grad_norm': 0.4974806308746338, 'learning_rate': 8.326360297355668e-07, 'epoch': 2.5}
83%|████████▎ | 9593/11526 [1:40:12<19:48, 1.63it/s] 83%|████████▎ | 9594/11526 [1:40:12<19:48, 1.63it/s] {'loss': 0.2066, 'grad_norm': 0.504088282585144, 'learning_rate': 8.317994716253453e-07, 'epoch': 2.5}
83%|████████▎ | 9594/11526 [1:40:12<19:48, 1.63it/s] 83%|████████▎ | 9595/11526 [1:40:13<19:47, 1.63it/s] {'loss': 0.1913, 'grad_norm': 0.7318309545516968, 'learning_rate': 8.309632958461933e-07, 'epoch': 2.5}
83%|████████▎ | 9595/11526 [1:40:13<19:47, 1.63it/s] 83%|████████▎ | 9596/11526 [1:40:14<19:47, 1.63it/s] {'loss': 0.1586, 'grad_norm': 0.6029744744300842, 'learning_rate': 8.301275024748056e-07, 'epoch': 2.5}
83%|████████▎ | 9596/11526 [1:40:14<19:47, 1.63it/s] 83%|████████▎ | 9597/11526 [1:40:14<19:46, 1.63it/s] {'loss': 0.163, 'grad_norm': 0.5369658470153809, 'learning_rate': 8.292920915878511e-07, 'epoch': 2.5}
83%|████████▎ | 9597/11526 [1:40:14<19:46, 1.63it/s] 83%|████████▎ | 9598/11526 [1:40:15<19:45, 1.63it/s] {'loss': 0.1367, 'grad_norm': 0.5631152391433716, 'learning_rate': 8.284570632619571e-07, 'epoch': 2.5}
83%|████████▎ | 9598/11526 [1:40:15<19:45, 1.63it/s] 83%|████████▎ | 9599/11526 [1:40:15<19:44, 1.63it/s] {'loss': 0.1515, 'grad_norm': 0.5777652263641357, 'learning_rate': 8.276224175737152e-07, 'epoch': 2.5}
83%|████████▎ | 9599/11526 [1:40:16<19:44, 1.63it/s] 83%|████████▎ | 9600/11526 [1:40:16<19:44, 1.63it/s] {'loss': 0.2038, 'grad_norm': 0.5662856698036194, 'learning_rate': 8.267881545996848e-07, 'epoch': 2.5}
83%|████████▎ | 9600/11526 [1:40:16<19:44, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.30it/s]
31%|███ | 4/13 [00:00<00:01, 8.38it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5440368056297302, 'eval_runtime': 1.9556, 'eval_samples_per_second': 102.271, 'eval_steps_per_second': 6.648, 'epoch': 2.5}
83%|████████▎ | 9600/11526 [1:40:18<19:44, 1.63it/s]
83%|████████▎ | 9601/11526 [1:40:19<38:36, 1.20s/it] {'loss': 0.1736, 'grad_norm': 0.6734690070152283, 'learning_rate': 8.259542744163895e-07, 'epoch': 2.5}
83%|████████▎ | 9601/11526 [1:40:19<38:36, 1.20s/it] 83%|████████▎ | 9602/11526 [1:40:19<32:55, 1.03s/it] {'loss': 0.1531, 'grad_norm': 0.522200345993042, 'learning_rate': 8.251207771003177e-07, 'epoch': 2.5}
83%|████████▎ | 9602/11526 [1:40:19<32:55, 1.03s/it] 83%|████████▎ | 9603/11526 [1:40:20<28:55, 1.11it/s] {'loss': 0.1318, 'grad_norm': 0.5286691188812256, 'learning_rate': 8.242876627279234e-07, 'epoch': 2.5}
83%|████████▎ | 9603/11526 [1:40:20<28:55, 1.11it/s] 83%|████████▎ | 9604/11526 [1:40:20<26:08, 1.23it/s] {'loss': 0.152, 'grad_norm': 0.5988606214523315, 'learning_rate': 8.234549313756224e-07, 'epoch': 2.5}
83%|████████▎ | 9604/11526 [1:40:21<26:08, 1.23it/s] 83%|████████▎ | 9605/11526 [1:40:21<24:11, 1.32it/s] {'loss': 0.2044, 'grad_norm': 0.7336329221725464, 'learning_rate': 8.226225831197971e-07, 'epoch': 2.5}
83%|████████▎ | 9605/11526 [1:40:21<24:11, 1.32it/s] 83%|████████▎ | 9606/11526 [1:40:22<22:48, 1.40it/s] {'loss': 0.1804, 'grad_norm': 0.6194395422935486, 'learning_rate': 8.217906180367996e-07, 'epoch': 2.5}
83%|████████▎ | 9606/11526 [1:40:22<22:48, 1.40it/s] 83%|████████▎ | 9607/11526 [1:40:22<21:51, 1.46it/s] {'loss': 0.1785, 'grad_norm': 0.6333456635475159, 'learning_rate': 8.209590362029379e-07, 'epoch': 2.5}
83%|████████▎ | 9607/11526 [1:40:22<21:51, 1.46it/s] 83%|████████▎ | 9608/11526 [1:40:23<21:10, 1.51it/s] {'loss': 0.1614, 'grad_norm': 0.6187954545021057, 'learning_rate': 8.201278376944916e-07, 'epoch': 2.5}
83%|████████▎ | 9608/11526 [1:40:23<21:10, 1.51it/s] 83%|████████▎ | 9609/11526 [1:40:24<20:42, 1.54it/s] {'loss': 0.1318, 'grad_norm': 0.5287304520606995, 'learning_rate': 8.192970225877028e-07, 'epoch': 2.5}
83%|████████▎ | 9609/11526 [1:40:24<20:42, 1.54it/s] 83%|████████▎ | 9610/11526 [1:40:24<20:22, 1.57it/s] {'loss': 0.1653, 'grad_norm': 0.6505612134933472, 'learning_rate': 8.18466590958778e-07, 'epoch': 2.5}
83%|████████▎ | 9610/11526 [1:40:24<20:22, 1.57it/s] 83%|████████▎ | 9611/11526 [1:40:25<20:08, 1.59it/s] {'loss': 0.1859, 'grad_norm': 0.5757305026054382, 'learning_rate': 8.176365428838906e-07, 'epoch': 2.5}
83%|████████▎ | 9611/11526 [1:40:25<20:08, 1.59it/s] 83%|████████▎ | 9612/11526 [1:40:25<19:57, 1.60it/s] {'loss': 0.1459, 'grad_norm': 0.5938222408294678, 'learning_rate': 8.168068784391747e-07, 'epoch': 2.5}
83%|████████▎ | 9612/11526 [1:40:25<19:57, 1.60it/s] 83%|████████▎ | 9613/11526 [1:40:26<19:50, 1.61it/s] {'loss': 0.1787, 'grad_norm': 0.6812059879302979, 'learning_rate': 8.159775977007334e-07, 'epoch': 2.5}
83%|████████▎ | 9613/11526 [1:40:26<19:50, 1.61it/s] 83%|████████▎ | 9614/11526 [1:40:27<19:45, 1.61it/s] {'loss': 0.1696, 'grad_norm': 0.6306744813919067, 'learning_rate': 8.151487007446335e-07, 'epoch': 2.5}
83%|████████▎ | 9614/11526 [1:40:27<19:45, 1.61it/s] 83%|████████▎ | 9615/11526 [1:40:27<19:43, 1.62it/s] {'loss': 0.1282, 'grad_norm': 0.5205495953559875, 'learning_rate': 8.143201876469048e-07, 'epoch': 2.5}
83%|████████▎ | 9615/11526 [1:40:27<19:43, 1.62it/s] 83%|████████▎ | 9616/11526 [1:40:28<19:39, 1.62it/s] {'loss': 0.1275, 'grad_norm': 0.4911995530128479, 'learning_rate': 8.134920584835443e-07, 'epoch': 2.5}
83%|████████▎ | 9616/11526 [1:40:28<19:39, 1.62it/s] 83%|████████▎ | 9617/11526 [1:40:28<19:36, 1.62it/s] {'loss': 0.1531, 'grad_norm': 0.6224273443222046, 'learning_rate': 8.12664313330514e-07, 'epoch': 2.5}
83%|████████▎ | 9617/11526 [1:40:29<19:36, 1.62it/s] 83%|████████▎ | 9618/11526 [1:40:29<19:35, 1.62it/s] {'loss': 0.1684, 'grad_norm': 0.7048724293708801, 'learning_rate': 8.118369522637348e-07, 'epoch': 2.5}
83%|████████▎ | 9618/11526 [1:40:29<19:35, 1.62it/s] 83%|████████▎ | 9619/11526 [1:40:30<19:33, 1.62it/s] {'loss': 0.1464, 'grad_norm': 0.5423153042793274, 'learning_rate': 8.110099753591033e-07, 'epoch': 2.5}
83%|████████▎ | 9619/11526 [1:40:30<19:33, 1.62it/s] 83%|████████▎ | 9620/11526 [1:40:30<19:32, 1.63it/s] {'loss': 0.107, 'grad_norm': 0.4765382707118988, 'learning_rate': 8.101833826924693e-07, 'epoch': 2.5}
83%|████████▎ | 9620/11526 [1:40:30<19:32, 1.63it/s] 83%|████████▎ | 9621/11526 [1:40:31<19:32, 1.63it/s] {'loss': 0.1608, 'grad_norm': 0.638895571231842, 'learning_rate': 8.093571743396544e-07, 'epoch': 2.5}
83%|████████▎ | 9621/11526 [1:40:31<19:32, 1.63it/s] 83%|████████▎ | 9622/11526 [1:40:32<19:30, 1.63it/s] {'loss': 0.1531, 'grad_norm': 0.6265671253204346, 'learning_rate': 8.085313503764435e-07, 'epoch': 2.5}
83%|████████▎ | 9622/11526 [1:40:32<19:30, 1.63it/s] 83%|████████▎ | 9623/11526 [1:40:32<19:29, 1.63it/s] {'loss': 0.1573, 'grad_norm': 0.6051944494247437, 'learning_rate': 8.077059108785862e-07, 'epoch': 2.5}
83%|████████▎ | 9623/11526 [1:40:32<19:29, 1.63it/s] 83%|████████▎ | 9624/11526 [1:40:33<19:29, 1.63it/s] {'loss': 0.1517, 'grad_norm': 0.6099645495414734, 'learning_rate': 8.06880855921795e-07, 'epoch': 2.5}
83%|████████▎ | 9624/11526 [1:40:33<19:29, 1.63it/s] 84%|████████▎ | 9625/11526 [1:40:33<19:28, 1.63it/s] {'loss': 0.1471, 'grad_norm': 0.572357177734375, 'learning_rate': 8.060561855817511e-07, 'epoch': 2.51}
84%|████████▎ | 9625/11526 [1:40:33<19:28, 1.63it/s] 84%|████████▎ | 9626/11526 [1:40:34<19:28, 1.63it/s] {'loss': 0.1932, 'grad_norm': 0.7221324443817139, 'learning_rate': 8.052318999340953e-07, 'epoch': 2.51}
84%|████████▎ | 9626/11526 [1:40:34<19:28, 1.63it/s] 84%|████████▎ | 9627/11526 [1:40:35<19:27, 1.63it/s] {'loss': 0.139, 'grad_norm': 0.5925169587135315, 'learning_rate': 8.044079990544368e-07, 'epoch': 2.51}
84%|████████▎ | 9627/11526 [1:40:35<19:27, 1.63it/s] 84%|████████▎ | 9628/11526 [1:40:35<19:26, 1.63it/s] {'loss': 0.1598, 'grad_norm': 0.5987269282341003, 'learning_rate': 8.035844830183487e-07, 'epoch': 2.51}
84%|████████▎ | 9628/11526 [1:40:35<19:26, 1.63it/s] 84%|████████▎ | 9629/11526 [1:40:36<19:26, 1.63it/s] {'loss': 0.1467, 'grad_norm': 0.6331163644790649, 'learning_rate': 8.027613519013683e-07, 'epoch': 2.51}
84%|████████▎ | 9629/11526 [1:40:36<19:26, 1.63it/s] 84%|████████▎ | 9630/11526 [1:40:36<19:25, 1.63it/s] {'loss': 0.1535, 'grad_norm': 0.5910627841949463, 'learning_rate': 8.019386057789996e-07, 'epoch': 2.51}
84%|████████▎ | 9630/11526 [1:40:37<19:25, 1.63it/s] 84%|████████▎ | 9631/11526 [1:40:37<19:24, 1.63it/s] {'loss': 0.1098, 'grad_norm': 0.4467330574989319, 'learning_rate': 8.011162447267051e-07, 'epoch': 2.51}
84%|████████▎ | 9631/11526 [1:40:37<19:24, 1.63it/s] 84%|████████▎ | 9632/11526 [1:40:38<19:23, 1.63it/s] {'loss': 0.1518, 'grad_norm': 0.5619364380836487, 'learning_rate': 8.002942688199205e-07, 'epoch': 2.51}
84%|████████▎ | 9632/11526 [1:40:38<19:23, 1.63it/s] 84%|████████▎ | 9633/11526 [1:40:38<19:23, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.5720078349113464, 'learning_rate': 7.994726781340422e-07, 'epoch': 2.51}
84%|████████▎ | 9633/11526 [1:40:38<19:23, 1.63it/s] 84%|████████▎ | 9634/11526 [1:40:39<19:22, 1.63it/s] {'loss': 0.154, 'grad_norm': 0.587344229221344, 'learning_rate': 7.986514727444289e-07, 'epoch': 2.51}
84%|████████▎ | 9634/11526 [1:40:39<19:22, 1.63it/s] 84%|████████▎ | 9635/11526 [1:40:40<19:22, 1.63it/s] {'loss': 0.1776, 'grad_norm': 0.6081303358078003, 'learning_rate': 7.978306527264068e-07, 'epoch': 2.51}
84%|████████▎ | 9635/11526 [1:40:40<19:22, 1.63it/s] 84%|████████▎ | 9636/11526 [1:40:40<19:21, 1.63it/s] {'loss': 0.1484, 'grad_norm': 0.5301871299743652, 'learning_rate': 7.970102181552669e-07, 'epoch': 2.51}
84%|████████▎ | 9636/11526 [1:40:40<19:21, 1.63it/s] 84%|████████▎ | 9637/11526 [1:40:41<19:21, 1.63it/s] {'loss': 0.1525, 'grad_norm': 0.5881807208061218, 'learning_rate': 7.961901691062635e-07, 'epoch': 2.51}
84%|████████▎ | 9637/11526 [1:40:41<19:21, 1.63it/s] 84%|████████▎ | 9638/11526 [1:40:41<19:20, 1.63it/s] {'loss': 0.1466, 'grad_norm': 0.5944358706474304, 'learning_rate': 7.95370505654618e-07, 'epoch': 2.51}
84%|████████▎ | 9638/11526 [1:40:41<19:20, 1.63it/s] 84%|████████▎ | 9639/11526 [1:40:42<19:21, 1.62it/s] {'loss': 0.1992, 'grad_norm': 0.6487146019935608, 'learning_rate': 7.945512278755119e-07, 'epoch': 2.51}
84%|████████▎ | 9639/11526 [1:40:42<19:21, 1.62it/s] 84%|████████▎ | 9640/11526 [1:40:43<19:19, 1.63it/s] {'loss': 0.1632, 'grad_norm': 0.6847327351570129, 'learning_rate': 7.937323358440935e-07, 'epoch': 2.51}
84%|████████▎ | 9640/11526 [1:40:43<19:19, 1.63it/s] 84%|████████▎ | 9641/11526 [1:40:43<19:18, 1.63it/s] {'loss': 0.1833, 'grad_norm': 0.7048696279525757, 'learning_rate': 7.92913829635481e-07, 'epoch': 2.51}
84%|████████▎ | 9641/11526 [1:40:43<19:18, 1.63it/s] 84%|████████▎ | 9642/11526 [1:40:44<19:17, 1.63it/s] {'loss': 0.1462, 'grad_norm': 0.611409604549408, 'learning_rate': 7.920957093247478e-07, 'epoch': 2.51}
84%|████████▎ | 9642/11526 [1:40:44<19:17, 1.63it/s] 84%|████████▎ | 9643/11526 [1:40:44<19:17, 1.63it/s] {'loss': 0.146, 'grad_norm': 0.6262607574462891, 'learning_rate': 7.912779749869381e-07, 'epoch': 2.51}
84%|████████▎ | 9643/11526 [1:40:45<19:17, 1.63it/s] 84%|████████▎ | 9644/11526 [1:40:45<19:16, 1.63it/s] {'loss': 0.17, 'grad_norm': 0.589229941368103, 'learning_rate': 7.904606266970605e-07, 'epoch': 2.51}
84%|████████▎ | 9644/11526 [1:40:45<19:16, 1.63it/s] 84%|████████▎ | 9645/11526 [1:40:46<19:16, 1.63it/s] {'loss': 0.1243, 'grad_norm': 0.6897417306900024, 'learning_rate': 7.896436645300831e-07, 'epoch': 2.51}
84%|████████▎ | 9645/11526 [1:40:46<19:16, 1.63it/s] 84%|████████▎ | 9646/11526 [1:40:46<19:15, 1.63it/s] {'loss': 0.1837, 'grad_norm': 0.6541534066200256, 'learning_rate': 7.888270885609467e-07, 'epoch': 2.51}
84%|████████▎ | 9646/11526 [1:40:46<19:15, 1.63it/s] 84%|████████▎ | 9647/11526 [1:40:47<19:14, 1.63it/s] {'loss': 0.1379, 'grad_norm': 0.5263383984565735, 'learning_rate': 7.880108988645496e-07, 'epoch': 2.51}
84%|████████▎ | 9647/11526 [1:40:47<19:14, 1.63it/s] 84%|████████▎ | 9648/11526 [1:40:47<19:14, 1.63it/s] {'loss': 0.1372, 'grad_norm': 0.5832012891769409, 'learning_rate': 7.871950955157581e-07, 'epoch': 2.51}
84%|████████▎ | 9648/11526 [1:40:48<19:14, 1.63it/s] 84%|████████▎ | 9649/11526 [1:40:48<19:13, 1.63it/s] {'loss': 0.1673, 'grad_norm': 0.6834632754325867, 'learning_rate': 7.863796785894023e-07, 'epoch': 2.51}
84%|████████▎ | 9649/11526 [1:40:48<19:13, 1.63it/s] 84%|████████▎ | 9650/11526 [1:40:49<19:18, 1.62it/s] {'loss': 0.1334, 'grad_norm': 0.5324795246124268, 'learning_rate': 7.855646481602769e-07, 'epoch': 2.51}
84%|████████▎ | 9650/11526 [1:40:49<19:18, 1.62it/s] 84%|████████▎ | 9651/11526 [1:40:49<19:15, 1.62it/s] {'loss': 0.1546, 'grad_norm': 0.54853755235672, 'learning_rate': 7.847500043031408e-07, 'epoch': 2.51}
84%|████████▎ | 9651/11526 [1:40:49<19:15, 1.62it/s] 84%|████████▎ | 9652/11526 [1:40:50<19:13, 1.62it/s] {'loss': 0.1627, 'grad_norm': 0.5546526908874512, 'learning_rate': 7.839357470927195e-07, 'epoch': 2.51}
84%|████████▎ | 9652/11526 [1:40:50<19:13, 1.62it/s] 84%|████████▎ | 9653/11526 [1:40:51<19:12, 1.63it/s] {'loss': 0.144, 'grad_norm': 0.5554543733596802, 'learning_rate': 7.831218766036985e-07, 'epoch': 2.51}
84%|████████▎ | 9653/11526 [1:40:51<19:12, 1.63it/s] 84%|████████▍ | 9654/11526 [1:40:51<19:10, 1.63it/s] {'loss': 0.126, 'grad_norm': 0.5173906683921814, 'learning_rate': 7.823083929107312e-07, 'epoch': 2.51}
84%|████████▍ | 9654/11526 [1:40:51<19:10, 1.63it/s] 84%|████████▍ | 9655/11526 [1:40:52<19:10, 1.63it/s] {'loss': 0.1236, 'grad_norm': 0.5198594331741333, 'learning_rate': 7.814952960884381e-07, 'epoch': 2.51}
84%|████████▍ | 9655/11526 [1:40:52<19:10, 1.63it/s] 84%|████████▍ | 9656/11526 [1:40:52<19:09, 1.63it/s] {'loss': 0.1423, 'grad_norm': 0.5334432125091553, 'learning_rate': 7.806825862113975e-07, 'epoch': 2.51}
84%|████████▍ | 9656/11526 [1:40:53<19:09, 1.63it/s] 84%|████████▍ | 9657/11526 [1:40:53<19:08, 1.63it/s] {'loss': 0.1745, 'grad_norm': 0.6511579155921936, 'learning_rate': 7.798702633541566e-07, 'epoch': 2.51}
84%|████████▍ | 9657/11526 [1:40:53<19:08, 1.63it/s] 84%|████████▍ | 9658/11526 [1:40:54<19:07, 1.63it/s] {'loss': 0.156, 'grad_norm': 0.596222460269928, 'learning_rate': 7.790583275912272e-07, 'epoch': 2.51}
84%|████████▍ | 9658/11526 [1:40:54<19:07, 1.63it/s] 84%|████████▍ | 9659/11526 [1:40:54<19:06, 1.63it/s] {'loss': 0.1644, 'grad_norm': 0.6222246885299683, 'learning_rate': 7.782467789970843e-07, 'epoch': 2.51}
84%|████████▍ | 9659/11526 [1:40:54<19:06, 1.63it/s] 84%|████████▍ | 9660/11526 [1:40:55<19:06, 1.63it/s] {'loss': 0.1313, 'grad_norm': 0.5234001278877258, 'learning_rate': 7.774356176461684e-07, 'epoch': 2.51}
84%|████████▍ | 9660/11526 [1:40:55<19:06, 1.63it/s] 84%|████████▍ | 9661/11526 [1:40:55<19:05, 1.63it/s] {'loss': 0.1575, 'grad_norm': 0.6052023768424988, 'learning_rate': 7.76624843612882e-07, 'epoch': 2.51}
84%|████████▍ | 9661/11526 [1:40:56<19:05, 1.63it/s] 84%|████████▍ | 9662/11526 [1:40:56<19:04, 1.63it/s] {'loss': 0.1473, 'grad_norm': 0.5889374017715454, 'learning_rate': 7.758144569715947e-07, 'epoch': 2.51}
84%|████████▍ | 9662/11526 [1:40:56<19:04, 1.63it/s] 84%|████████▍ | 9663/11526 [1:40:57<19:03, 1.63it/s] {'loss': 0.1329, 'grad_norm': 0.5782482028007507, 'learning_rate': 7.750044577966404e-07, 'epoch': 2.52}
84%|████████▍ | 9663/11526 [1:40:57<19:03, 1.63it/s] 84%|████████▍ | 9664/11526 [1:40:57<19:02, 1.63it/s] {'loss': 0.1764, 'grad_norm': 0.7032333612442017, 'learning_rate': 7.74194846162316e-07, 'epoch': 2.52}
84%|████████▍ | 9664/11526 [1:40:57<19:02, 1.63it/s] 84%|████████▍ | 9665/11526 [1:40:58<19:02, 1.63it/s] {'loss': 0.1598, 'grad_norm': 0.5232377648353577, 'learning_rate': 7.733856221428848e-07, 'epoch': 2.52}
84%|████████▍ | 9665/11526 [1:40:58<19:02, 1.63it/s] 84%|████████▍ | 9666/11526 [1:40:59<19:01, 1.63it/s] {'loss': 0.1317, 'grad_norm': 0.5342010855674744, 'learning_rate': 7.725767858125738e-07, 'epoch': 2.52}
84%|████████▍ | 9666/11526 [1:40:59<19:01, 1.63it/s] 84%|████████▍ | 9667/11526 [1:40:59<19:01, 1.63it/s] {'loss': 0.1601, 'grad_norm': 0.691975474357605, 'learning_rate': 7.717683372455703e-07, 'epoch': 2.52}
84%|████████▍ | 9667/11526 [1:40:59<19:01, 1.63it/s] 84%|████████▍ | 9668/11526 [1:41:00<19:01, 1.63it/s] {'loss': 0.1633, 'grad_norm': 0.6629750728607178, 'learning_rate': 7.709602765160351e-07, 'epoch': 2.52}
84%|████████▍ | 9668/11526 [1:41:00<19:01, 1.63it/s] 84%|████████▍ | 9669/11526 [1:41:00<19:00, 1.63it/s] {'loss': 0.1654, 'grad_norm': 0.5887560248374939, 'learning_rate': 7.701526036980844e-07, 'epoch': 2.52}
84%|████████▍ | 9669/11526 [1:41:01<19:00, 1.63it/s] 84%|████████▍ | 9670/11526 [1:41:01<19:00, 1.63it/s] {'loss': 0.1525, 'grad_norm': 0.6151532530784607, 'learning_rate': 7.69345318865804e-07, 'epoch': 2.52}
84%|████████▍ | 9670/11526 [1:41:01<19:00, 1.63it/s] 84%|████████▍ | 9671/11526 [1:41:02<18:59, 1.63it/s] {'loss': 0.1263, 'grad_norm': 0.5066173672676086, 'learning_rate': 7.685384220932418e-07, 'epoch': 2.52}
84%|████████▍ | 9671/11526 [1:41:02<18:59, 1.63it/s] 84%|████████▍ | 9672/11526 [1:41:02<18:58, 1.63it/s] {'loss': 0.147, 'grad_norm': 0.6083590984344482, 'learning_rate': 7.67731913454412e-07, 'epoch': 2.52}
84%|████████▍ | 9672/11526 [1:41:02<18:58, 1.63it/s] 84%|████████▍ | 9673/11526 [1:41:03<18:57, 1.63it/s] {'loss': 0.1586, 'grad_norm': 0.6047111749649048, 'learning_rate': 7.669257930232915e-07, 'epoch': 2.52}
84%|████████▍ | 9673/11526 [1:41:03<18:57, 1.63it/s] 84%|████████▍ | 9674/11526 [1:41:03<18:57, 1.63it/s] {'loss': 0.1788, 'grad_norm': 0.6623516082763672, 'learning_rate': 7.66120060873824e-07, 'epoch': 2.52}
84%|████████▍ | 9674/11526 [1:41:04<18:57, 1.63it/s] 84%|████████▍ | 9675/11526 [1:41:04<18:56, 1.63it/s] {'loss': 0.1498, 'grad_norm': 0.673691987991333, 'learning_rate': 7.65314717079913e-07, 'epoch': 2.52}
84%|████████▍ | 9675/11526 [1:41:04<18:56, 1.63it/s] 84%|████████▍ | 9676/11526 [1:41:05<18:56, 1.63it/s] {'loss': 0.1224, 'grad_norm': 0.5110170841217041, 'learning_rate': 7.645097617154302e-07, 'epoch': 2.52}
84%|████████▍ | 9676/11526 [1:41:05<18:56, 1.63it/s] 84%|████████▍ | 9677/11526 [1:41:05<18:55, 1.63it/s] {'loss': 0.1539, 'grad_norm': 0.5560922622680664, 'learning_rate': 7.637051948542112e-07, 'epoch': 2.52}
84%|████████▍ | 9677/11526 [1:41:05<18:55, 1.63it/s] 84%|████████▍ | 9678/11526 [1:41:06<18:54, 1.63it/s] {'loss': 0.1389, 'grad_norm': 0.7883645296096802, 'learning_rate': 7.629010165700551e-07, 'epoch': 2.52}
84%|████████▍ | 9678/11526 [1:41:06<18:54, 1.63it/s] 84%|████████▍ | 9679/11526 [1:41:07<18:54, 1.63it/s] {'loss': 0.1445, 'grad_norm': 0.5638949275016785, 'learning_rate': 7.620972269367271e-07, 'epoch': 2.52}
84%|████████▍ | 9679/11526 [1:41:07<18:54, 1.63it/s] 84%|████████▍ | 9680/11526 [1:41:07<18:53, 1.63it/s] {'loss': 0.1606, 'grad_norm': 0.6080358028411865, 'learning_rate': 7.612938260279517e-07, 'epoch': 2.52}
84%|████████▍ | 9680/11526 [1:41:07<18:53, 1.63it/s] 84%|████████▍ | 9681/11526 [1:41:08<18:52, 1.63it/s] {'loss': 0.1765, 'grad_norm': 0.6925003528594971, 'learning_rate': 7.604908139174255e-07, 'epoch': 2.52}
84%|████████▍ | 9681/11526 [1:41:08<18:52, 1.63it/s] 84%|████████▍ | 9682/11526 [1:41:08<18:53, 1.63it/s] {'loss': 0.1731, 'grad_norm': 0.691969096660614, 'learning_rate': 7.596881906788045e-07, 'epoch': 2.52}
84%|████████▍ | 9682/11526 [1:41:09<18:53, 1.63it/s] 84%|████████▍ | 9683/11526 [1:41:09<18:52, 1.63it/s] {'loss': 0.1612, 'grad_norm': 0.5815396308898926, 'learning_rate': 7.588859563857076e-07, 'epoch': 2.52}
84%|████████▍ | 9683/11526 [1:41:09<18:52, 1.63it/s] 84%|████████▍ | 9684/11526 [1:41:10<18:51, 1.63it/s] {'loss': 0.1448, 'grad_norm': 0.5961727499961853, 'learning_rate': 7.580841111117221e-07, 'epoch': 2.52}
84%|████████▍ | 9684/11526 [1:41:10<18:51, 1.63it/s] 84%|████████▍ | 9685/11526 [1:41:10<18:53, 1.62it/s] {'loss': 0.1387, 'grad_norm': 0.6081237196922302, 'learning_rate': 7.572826549303969e-07, 'epoch': 2.52}
84%|████████▍ | 9685/11526 [1:41:10<18:53, 1.62it/s] 84%|████████▍ | 9686/11526 [1:41:11<18:51, 1.63it/s] {'loss': 0.1689, 'grad_norm': 0.618696928024292, 'learning_rate': 7.56481587915247e-07, 'epoch': 2.52}
84%|████████▍ | 9686/11526 [1:41:11<18:51, 1.63it/s] 84%|████████▍ | 9687/11526 [1:41:11<18:50, 1.63it/s] {'loss': 0.209, 'grad_norm': 0.737750768661499, 'learning_rate': 7.556809101397517e-07, 'epoch': 2.52}
84%|████████▍ | 9687/11526 [1:41:12<18:50, 1.63it/s] 84%|████████▍ | 9688/11526 [1:41:12<18:50, 1.63it/s] {'loss': 0.1699, 'grad_norm': 0.6752309799194336, 'learning_rate': 7.548806216773514e-07, 'epoch': 2.52}
84%|████████▍ | 9688/11526 [1:41:12<18:50, 1.63it/s] 84%|████████▍ | 9689/11526 [1:41:13<18:49, 1.63it/s] {'loss': 0.1478, 'grad_norm': 0.5645025968551636, 'learning_rate': 7.54080722601453e-07, 'epoch': 2.52}
84%|████████▍ | 9689/11526 [1:41:13<18:49, 1.63it/s] 84%|████████▍ | 9690/11526 [1:41:13<18:53, 1.62it/s] {'loss': 0.1282, 'grad_norm': 0.5236591100692749, 'learning_rate': 7.532812129854311e-07, 'epoch': 2.52}
84%|████████▍ | 9690/11526 [1:41:13<18:53, 1.62it/s] 84%|████████▍ | 9691/11526 [1:41:14<18:52, 1.62it/s] {'loss': 0.1536, 'grad_norm': 0.7428906559944153, 'learning_rate': 7.524820929026184e-07, 'epoch': 2.52}
84%|████████▍ | 9691/11526 [1:41:14<18:52, 1.62it/s] 84%|████████▍ | 9692/11526 [1:41:15<18:50, 1.62it/s] {'loss': 0.1387, 'grad_norm': 0.5531002879142761, 'learning_rate': 7.516833624263153e-07, 'epoch': 2.52}
84%|████████▍ | 9692/11526 [1:41:15<18:50, 1.62it/s] 84%|████████▍ | 9693/11526 [1:41:15<18:48, 1.62it/s] {'loss': 0.1272, 'grad_norm': 0.5270191431045532, 'learning_rate': 7.508850216297875e-07, 'epoch': 2.52}
84%|████████▍ | 9693/11526 [1:41:15<18:48, 1.62it/s] 84%|████████▍ | 9694/11526 [1:41:16<18:48, 1.62it/s] {'loss': 0.1405, 'grad_norm': 0.5111185908317566, 'learning_rate': 7.500870705862595e-07, 'epoch': 2.52}
84%|████████▍ | 9694/11526 [1:41:16<18:48, 1.62it/s] 84%|████████▍ | 9695/11526 [1:41:16<18:47, 1.62it/s] {'loss': 0.1589, 'grad_norm': 0.6992964148521423, 'learning_rate': 7.492895093689284e-07, 'epoch': 2.52}
84%|████████▍ | 9695/11526 [1:41:17<18:47, 1.62it/s] 84%|████████▍ | 9696/11526 [1:41:17<18:46, 1.63it/s] {'loss': 0.1516, 'grad_norm': 0.6390594244003296, 'learning_rate': 7.484923380509485e-07, 'epoch': 2.52}
84%|████████▍ | 9696/11526 [1:41:17<18:46, 1.63it/s] 84%|████████▍ | 9697/11526 [1:41:18<18:44, 1.63it/s] {'loss': 0.1372, 'grad_norm': 0.570393979549408, 'learning_rate': 7.476955567054406e-07, 'epoch': 2.52}
84%|████████▍ | 9697/11526 [1:41:18<18:44, 1.63it/s] 84%|████████▍ | 9698/11526 [1:41:18<18:43, 1.63it/s] {'loss': 0.1599, 'grad_norm': 0.5901169180870056, 'learning_rate': 7.468991654054913e-07, 'epoch': 2.52}
84%|████████▍ | 9698/11526 [1:41:18<18:43, 1.63it/s] 84%|████████▍ | 9699/11526 [1:41:19<18:42, 1.63it/s] {'loss': 0.2103, 'grad_norm': 0.7428701519966125, 'learning_rate': 7.461031642241501e-07, 'epoch': 2.52}
84%|████████▍ | 9699/11526 [1:41:19<18:42, 1.63it/s] 84%|████████▍ | 9700/11526 [1:41:19<18:43, 1.63it/s] {'loss': 0.1793, 'grad_norm': 0.6454610824584961, 'learning_rate': 7.453075532344301e-07, 'epoch': 2.52}
84%|████████▍ | 9700/11526 [1:41:20<18:43, 1.63it/s] 84%|████████▍ | 9701/11526 [1:41:20<18:41, 1.63it/s] {'loss': 0.214, 'grad_norm': 0.779880166053772, 'learning_rate': 7.445123325093101e-07, 'epoch': 2.52}
84%|████████▍ | 9701/11526 [1:41:20<18:41, 1.63it/s] 84%|████████▍ | 9702/11526 [1:41:21<18:40, 1.63it/s] {'loss': 0.1759, 'grad_norm': 0.6954355835914612, 'learning_rate': 7.437175021217314e-07, 'epoch': 2.53}
84%|████████▍ | 9702/11526 [1:41:21<18:40, 1.63it/s] 84%|████████▍ | 9703/11526 [1:41:21<18:40, 1.63it/s] {'loss': 0.1521, 'grad_norm': 0.5806578397750854, 'learning_rate': 7.429230621445999e-07, 'epoch': 2.53}
84%|████████▍ | 9703/11526 [1:41:21<18:40, 1.63it/s] 84%|████████▍ | 9704/11526 [1:41:22<18:38, 1.63it/s] {'loss': 0.1269, 'grad_norm': 0.48683419823646545, 'learning_rate': 7.421290126507874e-07, 'epoch': 2.53}
84%|████████▍ | 9704/11526 [1:41:22<18:38, 1.63it/s] 84%|████████▍ | 9705/11526 [1:41:23<18:39, 1.63it/s] {'loss': 0.1221, 'grad_norm': 0.5080927610397339, 'learning_rate': 7.41335353713128e-07, 'epoch': 2.53}
84%|████████▍ | 9705/11526 [1:41:23<18:39, 1.63it/s] 84%|████████▍ | 9706/11526 [1:41:23<18:38, 1.63it/s] {'loss': 0.124, 'grad_norm': 0.540823221206665, 'learning_rate': 7.405420854044221e-07, 'epoch': 2.53}
84%|████████▍ | 9706/11526 [1:41:23<18:38, 1.63it/s] 84%|████████▍ | 9707/11526 [1:41:24<18:37, 1.63it/s] {'loss': 0.1427, 'grad_norm': 0.5327385663986206, 'learning_rate': 7.397492077974289e-07, 'epoch': 2.53}
84%|████████▍ | 9707/11526 [1:41:24<18:37, 1.63it/s] 84%|████████▍ | 9708/11526 [1:41:24<18:36, 1.63it/s] {'loss': 0.1734, 'grad_norm': 0.707642674446106, 'learning_rate': 7.389567209648795e-07, 'epoch': 2.53}
84%|████████▍ | 9708/11526 [1:41:25<18:36, 1.63it/s] 84%|████████▍ | 9709/11526 [1:41:25<18:36, 1.63it/s] {'loss': 0.1495, 'grad_norm': 0.591035008430481, 'learning_rate': 7.381646249794649e-07, 'epoch': 2.53}
84%|████████▍ | 9709/11526 [1:41:25<18:36, 1.63it/s] 84%|████████▍ | 9710/11526 [1:41:26<18:42, 1.62it/s] {'loss': 0.1323, 'grad_norm': 0.7113676071166992, 'learning_rate': 7.373729199138385e-07, 'epoch': 2.53}
84%|████████▍ | 9710/11526 [1:41:26<18:42, 1.62it/s] 84%|████████▍ | 9711/11526 [1:41:26<18:40, 1.62it/s] {'loss': 0.1494, 'grad_norm': 0.6163077354431152, 'learning_rate': 7.365816058406217e-07, 'epoch': 2.53}
84%|████████▍ | 9711/11526 [1:41:26<18:40, 1.62it/s] 84%|████████▍ | 9712/11526 [1:41:27<18:38, 1.62it/s] {'loss': 0.182, 'grad_norm': 0.5751841068267822, 'learning_rate': 7.357906828323974e-07, 'epoch': 2.53}
84%|████████▍ | 9712/11526 [1:41:27<18:38, 1.62it/s] 84%|████████▍ | 9713/11526 [1:41:27<18:36, 1.62it/s] {'loss': 0.1698, 'grad_norm': 0.5452237725257874, 'learning_rate': 7.350001509617144e-07, 'epoch': 2.53}
84%|████████▍ | 9713/11526 [1:41:28<18:36, 1.62it/s] 84%|████████▍ | 9714/11526 [1:41:28<18:35, 1.62it/s] {'loss': 0.1477, 'grad_norm': 0.5564009547233582, 'learning_rate': 7.342100103010841e-07, 'epoch': 2.53}
84%|████████▍ | 9714/11526 [1:41:28<18:35, 1.62it/s] 84%|████████▍ | 9715/11526 [1:41:29<18:34, 1.62it/s] {'loss': 0.1366, 'grad_norm': 0.5384077429771423, 'learning_rate': 7.334202609229845e-07, 'epoch': 2.53}
84%|████████▍ | 9715/11526 [1:41:29<18:34, 1.62it/s] 84%|████████▍ | 9716/11526 [1:41:29<18:33, 1.63it/s] {'loss': 0.1575, 'grad_norm': 0.6610180735588074, 'learning_rate': 7.326309028998519e-07, 'epoch': 2.53}
84%|████████▍ | 9716/11526 [1:41:29<18:33, 1.63it/s] 84%|████████▍ | 9717/11526 [1:41:30<18:32, 1.63it/s] {'loss': 0.1154, 'grad_norm': 0.4815067648887634, 'learning_rate': 7.318419363040952e-07, 'epoch': 2.53}
84%|████████▍ | 9717/11526 [1:41:30<18:32, 1.63it/s] 84%|████████▍ | 9718/11526 [1:41:31<18:31, 1.63it/s] {'loss': 0.1368, 'grad_norm': 0.555347204208374, 'learning_rate': 7.3105336120808e-07, 'epoch': 2.53}
84%|████████▍ | 9718/11526 [1:41:31<18:31, 1.63it/s] 84%|████████▍ | 9719/11526 [1:41:31<18:31, 1.63it/s] {'loss': 0.1289, 'grad_norm': 0.6116462349891663, 'learning_rate': 7.302651776841402e-07, 'epoch': 2.53}
84%|████████▍ | 9719/11526 [1:41:31<18:31, 1.63it/s] 84%|████████▍ | 9720/11526 [1:41:32<18:30, 1.63it/s] {'loss': 0.1889, 'grad_norm': 0.7254675030708313, 'learning_rate': 7.294773858045717e-07, 'epoch': 2.53}
84%|████████▍ | 9720/11526 [1:41:32<18:30, 1.63it/s] 84%|████████▍ | 9721/11526 [1:41:32<18:29, 1.63it/s] {'loss': 0.1338, 'grad_norm': 0.5659306049346924, 'learning_rate': 7.286899856416357e-07, 'epoch': 2.53}
84%|████████▍ | 9721/11526 [1:41:33<18:29, 1.63it/s] 84%|████████▍ | 9722/11526 [1:41:33<18:28, 1.63it/s] {'loss': 0.118, 'grad_norm': 0.5001912713050842, 'learning_rate': 7.279029772675572e-07, 'epoch': 2.53}
84%|████████▍ | 9722/11526 [1:41:33<18:28, 1.63it/s] 84%|████████▍ | 9723/11526 [1:41:34<18:27, 1.63it/s] {'loss': 0.1577, 'grad_norm': 0.6302550435066223, 'learning_rate': 7.271163607545257e-07, 'epoch': 2.53}
84%|████████▍ | 9723/11526 [1:41:34<18:27, 1.63it/s] 84%|████████▍ | 9724/11526 [1:41:34<18:27, 1.63it/s] {'loss': 0.1613, 'grad_norm': 0.6154893040657043, 'learning_rate': 7.263301361746922e-07, 'epoch': 2.53}
84%|████████▍ | 9724/11526 [1:41:34<18:27, 1.63it/s] 84%|████████▍ | 9725/11526 [1:41:35<18:28, 1.62it/s] {'loss': 0.167, 'grad_norm': 0.6244373917579651, 'learning_rate': 7.255443036001747e-07, 'epoch': 2.53}
84%|████████▍ | 9725/11526 [1:41:35<18:28, 1.62it/s] 84%|████████▍ | 9726/11526 [1:41:35<18:27, 1.63it/s] {'loss': 0.1459, 'grad_norm': 0.7139115929603577, 'learning_rate': 7.247588631030545e-07, 'epoch': 2.53}
84%|████████▍ | 9726/11526 [1:41:36<18:27, 1.63it/s] 84%|████████▍ | 9727/11526 [1:41:36<18:25, 1.63it/s] {'loss': 0.1433, 'grad_norm': 0.5819151997566223, 'learning_rate': 7.239738147553765e-07, 'epoch': 2.53}
84%|████████▍ | 9727/11526 [1:41:36<18:25, 1.63it/s] 84%|████████▍ | 9728/11526 [1:41:37<18:25, 1.63it/s] {'loss': 0.157, 'grad_norm': 0.6054345965385437, 'learning_rate': 7.231891586291506e-07, 'epoch': 2.53}
84%|████████▍ | 9728/11526 [1:41:37<18:25, 1.63it/s] 84%|████████▍ | 9729/11526 [1:41:37<18:25, 1.63it/s] {'loss': 0.2052, 'grad_norm': 0.8509754538536072, 'learning_rate': 7.224048947963475e-07, 'epoch': 2.53}
84%|████████▍ | 9729/11526 [1:41:37<18:25, 1.63it/s] 84%|████████▍ | 9730/11526 [1:41:38<18:29, 1.62it/s] {'loss': 0.1672, 'grad_norm': 0.6999646425247192, 'learning_rate': 7.216210233289067e-07, 'epoch': 2.53}
84%|████████▍ | 9730/11526 [1:41:38<18:29, 1.62it/s] 84%|████████▍ | 9731/11526 [1:41:39<18:27, 1.62it/s] {'loss': 0.1454, 'grad_norm': 0.5966212153434753, 'learning_rate': 7.208375442987298e-07, 'epoch': 2.53}
84%|████████▍ | 9731/11526 [1:41:39<18:27, 1.62it/s] 84%|████████▍ | 9732/11526 [1:41:39<18:25, 1.62it/s] {'loss': 0.1357, 'grad_norm': 0.49508506059646606, 'learning_rate': 7.200544577776791e-07, 'epoch': 2.53}
84%|████████▍ | 9732/11526 [1:41:39<18:25, 1.62it/s] 84%|████████▍ | 9733/11526 [1:41:40<18:23, 1.62it/s] {'loss': 0.1412, 'grad_norm': 0.5997925996780396, 'learning_rate': 7.192717638375862e-07, 'epoch': 2.53}
84%|████████▍ | 9733/11526 [1:41:40<18:23, 1.62it/s] 84%|████████▍ | 9734/11526 [1:41:40<18:22, 1.62it/s] {'loss': 0.1541, 'grad_norm': 0.5851051211357117, 'learning_rate': 7.184894625502431e-07, 'epoch': 2.53}
84%|████████▍ | 9734/11526 [1:41:41<18:22, 1.62it/s] 84%|████████▍ | 9735/11526 [1:41:41<18:22, 1.62it/s] {'loss': 0.1696, 'grad_norm': 0.6541768908500671, 'learning_rate': 7.177075539874073e-07, 'epoch': 2.53}
84%|████████▍ | 9735/11526 [1:41:41<18:22, 1.62it/s] 84%|████████▍ | 9736/11526 [1:41:42<18:21, 1.63it/s] {'loss': 0.1251, 'grad_norm': 0.484311580657959, 'learning_rate': 7.16926038220801e-07, 'epoch': 2.53}
84%|████████▍ | 9736/11526 [1:41:42<18:21, 1.63it/s] 84%|████████▍ | 9737/11526 [1:41:42<18:20, 1.63it/s] {'loss': 0.1232, 'grad_norm': 0.6400712728500366, 'learning_rate': 7.161449153221072e-07, 'epoch': 2.53}
84%|████████▍ | 9737/11526 [1:41:42<18:20, 1.63it/s] 84%|████████▍ | 9738/11526 [1:41:43<18:18, 1.63it/s] {'loss': 0.1578, 'grad_norm': 0.6841614842414856, 'learning_rate': 7.153641853629744e-07, 'epoch': 2.53}
84%|████████▍ | 9738/11526 [1:41:43<18:18, 1.63it/s] 84%|████████▍ | 9739/11526 [1:41:43<18:18, 1.63it/s] {'loss': 0.1307, 'grad_norm': 0.5125433802604675, 'learning_rate': 7.145838484150197e-07, 'epoch': 2.53}
84%|████████▍ | 9739/11526 [1:41:44<18:18, 1.63it/s] 85%|████████▍ | 9740/11526 [1:41:44<18:18, 1.63it/s] {'loss': 0.1757, 'grad_norm': 0.5813874006271362, 'learning_rate': 7.138039045498163e-07, 'epoch': 2.54}
85%|████████▍ | 9740/11526 [1:41:44<18:18, 1.63it/s] 85%|████████▍ | 9741/11526 [1:41:45<18:17, 1.63it/s] {'loss': 0.1271, 'grad_norm': 0.5033528804779053, 'learning_rate': 7.130243538389064e-07, 'epoch': 2.54}
85%|████████▍ | 9741/11526 [1:41:45<18:17, 1.63it/s] 85%|████████▍ | 9742/11526 [1:41:45<18:17, 1.63it/s] {'loss': 0.1395, 'grad_norm': 0.5414456129074097, 'learning_rate': 7.122451963537952e-07, 'epoch': 2.54}
85%|████████▍ | 9742/11526 [1:41:45<18:17, 1.63it/s] 85%|████████▍ | 9743/11526 [1:41:46<18:16, 1.63it/s] {'loss': 0.1639, 'grad_norm': 0.6383630633354187, 'learning_rate': 7.114664321659492e-07, 'epoch': 2.54}
85%|████████▍ | 9743/11526 [1:41:46<18:16, 1.63it/s] 85%|████████▍ | 9744/11526 [1:41:47<18:14, 1.63it/s] {'loss': 0.0926, 'grad_norm': 0.4173796772956848, 'learning_rate': 7.106880613468047e-07, 'epoch': 2.54}
85%|████████▍ | 9744/11526 [1:41:47<18:14, 1.63it/s] 85%|████████▍ | 9745/11526 [1:41:47<18:15, 1.63it/s] {'loss': 0.1488, 'grad_norm': 0.653873860836029, 'learning_rate': 7.099100839677553e-07, 'epoch': 2.54}
85%|████████▍ | 9745/11526 [1:41:47<18:15, 1.63it/s] 85%|████████▍ | 9746/11526 [1:41:48<18:14, 1.63it/s] {'loss': 0.1572, 'grad_norm': 0.6184573769569397, 'learning_rate': 7.091325001001631e-07, 'epoch': 2.54}
85%|████████▍ | 9746/11526 [1:41:48<18:14, 1.63it/s] 85%|████████▍ | 9747/11526 [1:41:48<18:12, 1.63it/s] {'loss': 0.1478, 'grad_norm': 0.5734347105026245, 'learning_rate': 7.083553098153512e-07, 'epoch': 2.54}
85%|████████▍ | 9747/11526 [1:41:49<18:12, 1.63it/s] 85%|████████▍ | 9748/11526 [1:41:49<18:12, 1.63it/s] {'loss': 0.1817, 'grad_norm': 0.6806484460830688, 'learning_rate': 7.075785131846091e-07, 'epoch': 2.54}
85%|████████▍ | 9748/11526 [1:41:49<18:12, 1.63it/s] 85%|████████▍ | 9749/11526 [1:41:50<18:12, 1.63it/s] {'loss': 0.1271, 'grad_norm': 0.5661699771881104, 'learning_rate': 7.068021102791889e-07, 'epoch': 2.54}
85%|████████▍ | 9749/11526 [1:41:50<18:12, 1.63it/s] 85%|████████▍ | 9750/11526 [1:41:50<18:12, 1.63it/s] {'loss': 0.1749, 'grad_norm': 0.6661866307258606, 'learning_rate': 7.060261011703073e-07, 'epoch': 2.54}
85%|████████▍ | 9750/11526 [1:41:50<18:12, 1.63it/s] 85%|████████▍ | 9751/11526 [1:41:51<18:11, 1.63it/s] {'loss': 0.1183, 'grad_norm': 0.5274088382720947, 'learning_rate': 7.052504859291426e-07, 'epoch': 2.54}
85%|████████▍ | 9751/11526 [1:41:51<18:11, 1.63it/s] 85%|████████▍ | 9752/11526 [1:41:51<18:09, 1.63it/s] {'loss': 0.1526, 'grad_norm': 0.5470888018608093, 'learning_rate': 7.044752646268388e-07, 'epoch': 2.54}
85%|████████▍ | 9752/11526 [1:41:52<18:09, 1.63it/s] 85%|████████▍ | 9753/11526 [1:41:52<18:09, 1.63it/s] {'loss': 0.1387, 'grad_norm': 0.5566385984420776, 'learning_rate': 7.037004373345047e-07, 'epoch': 2.54}
85%|████████▍ | 9753/11526 [1:41:52<18:09, 1.63it/s] 85%|████████▍ | 9754/11526 [1:41:53<18:08, 1.63it/s] {'loss': 0.1462, 'grad_norm': 0.5827347040176392, 'learning_rate': 7.029260041232111e-07, 'epoch': 2.54}
85%|████████▍ | 9754/11526 [1:41:53<18:08, 1.63it/s] 85%|████████▍ | 9755/11526 [1:41:53<18:10, 1.62it/s] {'loss': 0.1598, 'grad_norm': 0.5680102109909058, 'learning_rate': 7.021519650639952e-07, 'epoch': 2.54}
85%|████████▍ | 9755/11526 [1:41:53<18:10, 1.62it/s] 85%|████████▍ | 9756/11526 [1:41:54<18:10, 1.62it/s] {'loss': 0.1049, 'grad_norm': 0.4576590657234192, 'learning_rate': 7.013783202278524e-07, 'epoch': 2.54}
85%|████████▍ | 9756/11526 [1:41:54<18:10, 1.62it/s] 85%|████████▍ | 9757/11526 [1:41:55<18:09, 1.62it/s] {'loss': 0.1194, 'grad_norm': 0.5146333575248718, 'learning_rate': 7.006050696857497e-07, 'epoch': 2.54}
85%|████████▍ | 9757/11526 [1:41:55<18:09, 1.62it/s] 85%|████████▍ | 9758/11526 [1:41:55<18:08, 1.62it/s] {'loss': 0.1734, 'grad_norm': 0.6301488280296326, 'learning_rate': 6.998322135086133e-07, 'epoch': 2.54}
85%|████████▍ | 9758/11526 [1:41:55<18:08, 1.62it/s] 85%|████████▍ | 9759/11526 [1:41:56<18:07, 1.62it/s] {'loss': 0.1357, 'grad_norm': 0.5136576890945435, 'learning_rate': 6.99059751767332e-07, 'epoch': 2.54}
85%|████████▍ | 9759/11526 [1:41:56<18:07, 1.62it/s] 85%|████████▍ | 9760/11526 [1:41:56<18:10, 1.62it/s] {'loss': 0.1722, 'grad_norm': 0.631274938583374, 'learning_rate': 6.982876845327619e-07, 'epoch': 2.54}
85%|████████▍ | 9760/11526 [1:41:57<18:10, 1.62it/s] 85%|████████▍ | 9761/11526 [1:41:57<18:08, 1.62it/s] {'loss': 0.1626, 'grad_norm': 0.684476375579834, 'learning_rate': 6.975160118757213e-07, 'epoch': 2.54}
85%|████████▍ | 9761/11526 [1:41:57<18:08, 1.62it/s] 85%|████████▍ | 9762/11526 [1:41:58<18:43, 1.57it/s] {'loss': 0.1519, 'grad_norm': 0.5753541588783264, 'learning_rate': 6.967447338669919e-07, 'epoch': 2.54}
85%|████████▍ | 9762/11526 [1:41:58<18:43, 1.57it/s] 85%|████████▍ | 9763/11526 [1:41:58<18:34, 1.58it/s] {'loss': 0.1856, 'grad_norm': 0.7172896265983582, 'learning_rate': 6.959738505773211e-07, 'epoch': 2.54}
85%|████████▍ | 9763/11526 [1:41:58<18:34, 1.58it/s] 85%|████████▍ | 9764/11526 [1:41:59<18:24, 1.59it/s] {'loss': 0.1396, 'grad_norm': 0.5370270609855652, 'learning_rate': 6.952033620774162e-07, 'epoch': 2.54}
85%|████████▍ | 9764/11526 [1:41:59<18:24, 1.59it/s] 85%|████████▍ | 9765/11526 [1:42:00<18:17, 1.60it/s] {'loss': 0.1555, 'grad_norm': 0.5938766598701477, 'learning_rate': 6.944332684379518e-07, 'epoch': 2.54}
85%|████████▍ | 9765/11526 [1:42:00<18:17, 1.60it/s] 85%|████████▍ | 9766/11526 [1:42:00<18:12, 1.61it/s] {'loss': 0.1333, 'grad_norm': 0.599841833114624, 'learning_rate': 6.936635697295674e-07, 'epoch': 2.54}
85%|████████▍ | 9766/11526 [1:42:00<18:12, 1.61it/s] 85%|████████▍ | 9767/11526 [1:42:01<18:08, 1.62it/s] {'loss': 0.15, 'grad_norm': 0.5749219655990601, 'learning_rate': 6.928942660228615e-07, 'epoch': 2.54}
85%|████████▍ | 9767/11526 [1:42:01<18:08, 1.62it/s] 85%|████████▍ | 9768/11526 [1:42:01<18:05, 1.62it/s] {'loss': 0.1379, 'grad_norm': 0.5632360577583313, 'learning_rate': 6.921253573884002e-07, 'epoch': 2.54}
85%|████████▍ | 9768/11526 [1:42:02<18:05, 1.62it/s] 85%|████████▍ | 9769/11526 [1:42:02<18:02, 1.62it/s] {'loss': 0.1674, 'grad_norm': 0.6948402523994446, 'learning_rate': 6.913568438967111e-07, 'epoch': 2.54}
85%|████████▍ | 9769/11526 [1:42:02<18:02, 1.62it/s] 85%|████████▍ | 9770/11526 [1:42:03<18:00, 1.62it/s] {'loss': 0.121, 'grad_norm': 0.50669926404953, 'learning_rate': 6.905887256182881e-07, 'epoch': 2.54}
85%|████████▍ | 9770/11526 [1:42:03<18:00, 1.62it/s] 85%|████████▍ | 9771/11526 [1:42:03<17:59, 1.63it/s] {'loss': 0.2029, 'grad_norm': 0.8074679374694824, 'learning_rate': 6.89821002623588e-07, 'epoch': 2.54}
85%|████████▍ | 9771/11526 [1:42:03<17:59, 1.63it/s] 85%|████████▍ | 9772/11526 [1:42:04<17:57, 1.63it/s] {'loss': 0.1414, 'grad_norm': 0.6333832144737244, 'learning_rate': 6.890536749830279e-07, 'epoch': 2.54}
85%|████████▍ | 9772/11526 [1:42:04<17:57, 1.63it/s] 85%|████████▍ | 9773/11526 [1:42:04<17:57, 1.63it/s] {'loss': 0.1216, 'grad_norm': 0.5063647627830505, 'learning_rate': 6.882867427669931e-07, 'epoch': 2.54}
85%|████████▍ | 9773/11526 [1:42:05<17:57, 1.63it/s] 85%|████████▍ | 9774/11526 [1:42:05<18:25, 1.58it/s] {'loss': 0.1766, 'grad_norm': 0.6606283187866211, 'learning_rate': 6.875202060458308e-07, 'epoch': 2.54}
85%|████████▍ | 9774/11526 [1:42:05<18:25, 1.58it/s] 85%|████████▍ | 9775/11526 [1:42:06<18:20, 1.59it/s] {'loss': 0.1481, 'grad_norm': 0.5984211564064026, 'learning_rate': 6.867540648898524e-07, 'epoch': 2.54}
85%|████████▍ | 9775/11526 [1:42:06<18:20, 1.59it/s] 85%|████████▍ | 9776/11526 [1:42:06<18:12, 1.60it/s] {'loss': 0.1407, 'grad_norm': 0.5736774802207947, 'learning_rate': 6.859883193693329e-07, 'epoch': 2.54}
85%|████████▍ | 9776/11526 [1:42:06<18:12, 1.60it/s] 85%|████████▍ | 9777/11526 [1:42:07<18:40, 1.56it/s] {'loss': 0.1698, 'grad_norm': 0.6347048282623291, 'learning_rate': 6.85222969554511e-07, 'epoch': 2.54}
85%|████████▍ | 9777/11526 [1:42:07<18:40, 1.56it/s] 85%|████████▍ | 9778/11526 [1:42:08<18:27, 1.58it/s] {'loss': 0.1803, 'grad_norm': 0.6658459901809692, 'learning_rate': 6.844580155155861e-07, 'epoch': 2.55}
85%|████████▍ | 9778/11526 [1:42:08<18:27, 1.58it/s] 85%|████████▍ | 9779/11526 [1:42:08<18:16, 1.59it/s] {'loss': 0.1815, 'grad_norm': 0.5085653066635132, 'learning_rate': 6.836934573227289e-07, 'epoch': 2.55}
85%|████████▍ | 9779/11526 [1:42:08<18:16, 1.59it/s] 85%|████████▍ | 9780/11526 [1:42:09<18:08, 1.60it/s] {'loss': 0.1491, 'grad_norm': 0.5705053806304932, 'learning_rate': 6.829292950460653e-07, 'epoch': 2.55}
85%|████████▍ | 9780/11526 [1:42:09<18:08, 1.60it/s] 85%|████████▍ | 9781/11526 [1:42:09<18:04, 1.61it/s] {'loss': 0.1679, 'grad_norm': 0.7009082436561584, 'learning_rate': 6.8216552875569e-07, 'epoch': 2.55}
85%|████████▍ | 9781/11526 [1:42:10<18:04, 1.61it/s] 85%|████████▍ | 9782/11526 [1:42:10<17:59, 1.62it/s] {'loss': 0.1824, 'grad_norm': 0.7206979393959045, 'learning_rate': 6.814021585216601e-07, 'epoch': 2.55}
85%|████████▍ | 9782/11526 [1:42:10<17:59, 1.62it/s] 85%|████████▍ | 9783/11526 [1:42:11<17:58, 1.62it/s] {'loss': 0.1363, 'grad_norm': 0.5319392681121826, 'learning_rate': 6.806391844139959e-07, 'epoch': 2.55}
85%|████████▍ | 9783/11526 [1:42:11<17:58, 1.62it/s] 85%|████████▍ | 9784/11526 [1:42:11<17:54, 1.62it/s] {'loss': 0.1389, 'grad_norm': 0.5706934332847595, 'learning_rate': 6.798766065026819e-07, 'epoch': 2.55}
85%|████████▍ | 9784/11526 [1:42:11<17:54, 1.62it/s] 85%|████████▍ | 9785/11526 [1:42:12<17:52, 1.62it/s] {'loss': 0.1325, 'grad_norm': 0.5234972834587097, 'learning_rate': 6.79114424857667e-07, 'epoch': 2.55}
85%|████████▍ | 9785/11526 [1:42:12<17:52, 1.62it/s] 85%|████████▍ | 9786/11526 [1:42:13<17:51, 1.62it/s] {'loss': 0.1326, 'grad_norm': 0.5223056674003601, 'learning_rate': 6.783526395488605e-07, 'epoch': 2.55}
85%|████████▍ | 9786/11526 [1:42:13<17:51, 1.62it/s] 85%|████████▍ | 9787/11526 [1:42:13<17:49, 1.63it/s] {'loss': 0.1384, 'grad_norm': 0.5629771947860718, 'learning_rate': 6.775912506461391e-07, 'epoch': 2.55}
85%|████████▍ | 9787/11526 [1:42:13<17:49, 1.63it/s] 85%|████████▍ | 9788/11526 [1:42:14<17:49, 1.62it/s] {'loss': 0.165, 'grad_norm': 0.6434540152549744, 'learning_rate': 6.76830258219342e-07, 'epoch': 2.55}
85%|████████▍ | 9788/11526 [1:42:14<17:49, 1.62it/s] 85%|████████▍ | 9789/11526 [1:42:14<17:49, 1.62it/s] {'loss': 0.1273, 'grad_norm': 0.5334280729293823, 'learning_rate': 6.760696623382712e-07, 'epoch': 2.55}
85%|████████▍ | 9789/11526 [1:42:15<17:49, 1.62it/s] 85%|████████▍ | 9790/11526 [1:42:15<17:47, 1.63it/s] {'loss': 0.1383, 'grad_norm': 0.6366612315177917, 'learning_rate': 6.753094630726937e-07, 'epoch': 2.55}
85%|████████▍ | 9790/11526 [1:42:15<17:47, 1.63it/s] 85%|████████▍ | 9791/11526 [1:42:16<17:47, 1.63it/s] {'loss': 0.1536, 'grad_norm': 0.5917164087295532, 'learning_rate': 6.745496604923391e-07, 'epoch': 2.55}
85%|████████▍ | 9791/11526 [1:42:16<17:47, 1.63it/s] 85%|████████▍ | 9792/11526 [1:42:16<17:46, 1.63it/s] {'loss': 0.1216, 'grad_norm': 0.4942651093006134, 'learning_rate': 6.737902546668984e-07, 'epoch': 2.55}
85%|████████▍ | 9792/11526 [1:42:16<17:46, 1.63it/s] 85%|████████▍ | 9793/11526 [1:42:17<17:47, 1.62it/s] {'loss': 0.1297, 'grad_norm': 0.5462005138397217, 'learning_rate': 6.730312456660331e-07, 'epoch': 2.55}
85%|████████▍ | 9793/11526 [1:42:17<17:47, 1.62it/s] 85%|████████▍ | 9794/11526 [1:42:17<17:46, 1.62it/s] {'loss': 0.1526, 'grad_norm': 0.6543107032775879, 'learning_rate': 6.722726335593605e-07, 'epoch': 2.55}
85%|████████▍ | 9794/11526 [1:42:18<17:46, 1.62it/s] 85%|████████▍ | 9795/11526 [1:42:18<17:45, 1.63it/s] {'loss': 0.1497, 'grad_norm': 0.8020053505897522, 'learning_rate': 6.715144184164652e-07, 'epoch': 2.55}
85%|████████▍ | 9795/11526 [1:42:18<17:45, 1.63it/s] 85%|████████▍ | 9796/11526 [1:42:19<17:44, 1.62it/s] {'loss': 0.1804, 'grad_norm': 0.7016348838806152, 'learning_rate': 6.707566003068955e-07, 'epoch': 2.55}
85%|████████▍ | 9796/11526 [1:42:19<17:44, 1.62it/s] 85%|████████▍ | 9797/11526 [1:42:19<17:44, 1.62it/s] {'loss': 0.1843, 'grad_norm': 0.6689432859420776, 'learning_rate': 6.699991793001631e-07, 'epoch': 2.55}
85%|████████▍ | 9797/11526 [1:42:19<17:44, 1.62it/s] 85%|████████▌ | 9798/11526 [1:42:20<17:43, 1.62it/s] {'loss': 0.1635, 'grad_norm': 0.6035934090614319, 'learning_rate': 6.692421554657425e-07, 'epoch': 2.55}
85%|████████▌ | 9798/11526 [1:42:20<17:43, 1.62it/s] 85%|████████▌ | 9799/11526 [1:42:21<17:43, 1.62it/s] {'loss': 0.1174, 'grad_norm': 1.3215889930725098, 'learning_rate': 6.684855288730735e-07, 'epoch': 2.55}
85%|████████▌ | 9799/11526 [1:42:21<17:43, 1.62it/s] 85%|████████▌ | 9800/11526 [1:42:21<17:42, 1.62it/s] {'loss': 0.1456, 'grad_norm': 0.5886273980140686, 'learning_rate': 6.677292995915563e-07, 'epoch': 2.55}
85%|████████▌ | 9800/11526 [1:42:21<17:42, 1.62it/s]
100%|██████████| 13/13 [00:01<00:00, 6.76it/s]
{'eval_loss': 0.5430423617362976, 'eval_runtime': 1.9526, 'eval_samples_per_second': 102.43, 'eval_steps_per_second': 6.658, 'epoch': 2.55}
85%|████████▌ | 9800/11526 [1:42:23<17:42, 1.62it/s]
 85%|████████▌ | 9801/11526 [1:42:24<34:34, 1.20s/it] {'loss': 0.1438, 'grad_norm': 0.5856814980506897, 'learning_rate': 6.669734676905571e-07, 'epoch': 2.55}
85%|████████▌ | 9801/11526 [1:42:24<34:34, 1.20s/it] 85%|████████▌ | 9802/11526 [1:42:24<29:28, 1.03s/it] {'loss': 0.1288, 'grad_norm': 0.579575777053833, 'learning_rate': 6.662180332394052e-07, 'epoch': 2.55}
85%|████████▌ | 9802/11526 [1:42:24<29:28, 1.03s/it] 85%|████████▌ | 9803/11526 [1:42:25<25:55, 1.11it/s] {'loss': 0.1856, 'grad_norm': 0.6935302019119263, 'learning_rate': 6.654629963073933e-07, 'epoch': 2.55}
85%|████████▌ | 9803/11526 [1:42:25<25:55, 1.11it/s] 85%|████████▌ | 9804/11526 [1:42:26<23:25, 1.22it/s] {'loss': 0.1287, 'grad_norm': 0.5039154291152954, 'learning_rate': 6.647083569637797e-07, 'epoch': 2.55}
85%|████████▌ | 9804/11526 [1:42:26<23:25, 1.22it/s] 85%|████████▌ | 9805/11526 [1:42:26<21:40, 1.32it/s] {'loss': 0.1732, 'grad_norm': 0.6622030138969421, 'learning_rate': 6.639541152777795e-07, 'epoch': 2.55}
85%|████████▌ | 9805/11526 [1:42:26<21:40, 1.32it/s] 85%|████████▌ | 9806/11526 [1:42:27<20:25, 1.40it/s] {'loss': 0.159, 'grad_norm': 0.6362423896789551, 'learning_rate': 6.632002713185803e-07, 'epoch': 2.55}
85%|████████▌ | 9806/11526 [1:42:27<20:25, 1.40it/s] 85%|████████▌ | 9807/11526 [1:42:27<19:35, 1.46it/s] {'loss': 0.1278, 'grad_norm': 0.6040448546409607, 'learning_rate': 6.624468251553284e-07, 'epoch': 2.55}
85%|████████▌ | 9807/11526 [1:42:28<19:35, 1.46it/s] 85%|████████▌ | 9808/11526 [1:42:28<18:59, 1.51it/s] {'loss': 0.1833, 'grad_norm': 0.704028308391571, 'learning_rate': 6.61693776857133e-07, 'epoch': 2.55}
85%|████████▌ | 9808/11526 [1:42:28<18:59, 1.51it/s] 85%|████████▌ | 9809/11526 [1:42:29<18:33, 1.54it/s] {'loss': 0.1471, 'grad_norm': 0.5861223340034485, 'learning_rate': 6.609411264930676e-07, 'epoch': 2.55}
85%|████████▌ | 9809/11526 [1:42:29<18:33, 1.54it/s] 85%|████████▌ | 9810/11526 [1:42:29<18:15, 1.57it/s] {'loss': 0.1328, 'grad_norm': 0.5305044054985046, 'learning_rate': 6.601888741321705e-07, 'epoch': 2.55}
85%|████████▌ | 9810/11526 [1:42:29<18:15, 1.57it/s] 85%|████████▌ | 9811/11526 [1:42:30<18:02, 1.58it/s] {'loss': 0.1519, 'grad_norm': 0.7523414492607117, 'learning_rate': 6.594370198434424e-07, 'epoch': 2.55}
85%|████████▌ | 9811/11526 [1:42:30<18:02, 1.58it/s] 85%|████████▌ | 9812/11526 [1:42:31<17:52, 1.60it/s] {'loss': 0.1528, 'grad_norm': 0.650900661945343, 'learning_rate': 6.586855636958489e-07, 'epoch': 2.55}
85%|████████▌ | 9812/11526 [1:42:31<17:52, 1.60it/s] 85%|████████▌ | 9813/11526 [1:42:31<17:47, 1.60it/s] {'loss': 0.1359, 'grad_norm': 0.6103355288505554, 'learning_rate': 6.579345057583154e-07, 'epoch': 2.55}
85%|████████▌ | 9813/11526 [1:42:31<17:47, 1.60it/s] 85%|████████▌ | 9814/11526 [1:42:32<17:41, 1.61it/s] {'loss': 0.1492, 'grad_norm': 0.5782403945922852, 'learning_rate': 6.571838460997331e-07, 'epoch': 2.55}
85%|████████▌ | 9814/11526 [1:42:32<17:41, 1.61it/s] 85%|████████▌ | 9815/11526 [1:42:32<17:37, 1.62it/s] {'loss': 0.1566, 'grad_norm': 0.624849259853363, 'learning_rate': 6.564335847889602e-07, 'epoch': 2.55}
85%|████████▌ | 9815/11526 [1:42:32<17:37, 1.62it/s] 85%|████████▌ | 9816/11526 [1:42:33<17:35, 1.62it/s] {'loss': 0.166, 'grad_norm': 0.5969599485397339, 'learning_rate': 6.55683721894812e-07, 'epoch': 2.55}
85%|████████▌ | 9816/11526 [1:42:33<17:35, 1.62it/s] 85%|████████▌ | 9817/11526 [1:42:34<17:32, 1.62it/s] {'loss': 0.1318, 'grad_norm': 0.5254287123680115, 'learning_rate': 6.549342574860706e-07, 'epoch': 2.56}
85%|████████▌ | 9817/11526 [1:42:34<17:32, 1.62it/s] 85%|████████▌ | 9818/11526 [1:42:34<17:30, 1.63it/s] {'loss': 0.1503, 'grad_norm': 0.6244193911552429, 'learning_rate': 6.541851916314818e-07, 'epoch': 2.56}
85%|████████▌ | 9818/11526 [1:42:34<17:30, 1.63it/s] 85%|████████▌ | 9819/11526 [1:42:35<17:30, 1.62it/s] {'loss': 0.152, 'grad_norm': 0.4707071781158447, 'learning_rate': 6.534365243997537e-07, 'epoch': 2.56}
85%|████████▌ | 9819/11526 [1:42:35<17:30, 1.62it/s] 85%|████████▌ | 9820/11526 [1:42:35<17:30, 1.62it/s] {'loss': 0.1508, 'grad_norm': 0.6270984411239624, 'learning_rate': 6.526882558595599e-07, 'epoch': 2.56}
85%|████████▌ | 9820/11526 [1:42:36<17:30, 1.62it/s] 85%|████████▌ | 9821/11526 [1:42:36<17:29, 1.63it/s] {'loss': 0.173, 'grad_norm': 0.681904673576355, 'learning_rate': 6.519403860795331e-07, 'epoch': 2.56}
85%|████████▌ | 9821/11526 [1:42:36<17:29, 1.63it/s] 85%|████████▌ | 9822/11526 [1:42:37<17:27, 1.63it/s] {'loss': 0.1515, 'grad_norm': 0.6147057414054871, 'learning_rate': 6.511929151282736e-07, 'epoch': 2.56}
85%|████████▌ | 9822/11526 [1:42:37<17:27, 1.63it/s] 85%|████████▌ | 9823/11526 [1:42:37<17:28, 1.62it/s] {'loss': 0.1602, 'grad_norm': 0.6116187572479248, 'learning_rate': 6.504458430743432e-07, 'epoch': 2.56}
85%|████████▌ | 9823/11526 [1:42:37<17:28, 1.62it/s] 85%|████████▌ | 9824/11526 [1:42:38<17:27, 1.63it/s] {'loss': 0.1417, 'grad_norm': 0.5571762919425964, 'learning_rate': 6.496991699862687e-07, 'epoch': 2.56}
85%|████████▌ | 9824/11526 [1:42:38<17:27, 1.63it/s] 85%|████████▌ | 9825/11526 [1:42:38<17:26, 1.63it/s] {'loss': 0.1546, 'grad_norm': 0.5595890283584595, 'learning_rate': 6.48952895932538e-07, 'epoch': 2.56}
85%|████████▌ | 9825/11526 [1:42:39<17:26, 1.63it/s] 85%|████████▌ | 9826/11526 [1:42:39<17:25, 1.63it/s] {'loss': 0.1382, 'grad_norm': 0.5948118567466736, 'learning_rate': 6.482070209816055e-07, 'epoch': 2.56}
85%|████████▌ | 9826/11526 [1:42:39<17:25, 1.63it/s] 85%|████████▌ | 9827/11526 [1:42:40<17:24, 1.63it/s] {'loss': 0.1726, 'grad_norm': 0.6877224445343018, 'learning_rate': 6.474615452018834e-07, 'epoch': 2.56}
85%|████████▌ | 9827/11526 [1:42:40<17:24, 1.63it/s] 85%|████████▌ | 9828/11526 [1:42:40<17:25, 1.62it/s] {'loss': 0.1651, 'grad_norm': 0.6825730204582214, 'learning_rate': 6.467164686617555e-07, 'epoch': 2.56}
85%|████████▌ | 9828/11526 [1:42:40<17:25, 1.62it/s] 85%|████████▌ | 9829/11526 [1:42:41<17:24, 1.62it/s] {'loss': 0.1478, 'grad_norm': 0.5831253528594971, 'learning_rate': 6.459717914295615e-07, 'epoch': 2.56}
85%|████████▌ | 9829/11526 [1:42:41<17:24, 1.62it/s] 85%|████████▌ | 9830/11526 [1:42:42<17:22, 1.63it/s] {'loss': 0.1672, 'grad_norm': 0.6352212429046631, 'learning_rate': 6.45227513573608e-07, 'epoch': 2.56}
85%|████████▌ | 9830/11526 [1:42:42<17:22, 1.63it/s] 85%|████████▌ | 9831/11526 [1:42:42<17:22, 1.63it/s] {'loss': 0.1602, 'grad_norm': 0.6118616461753845, 'learning_rate': 6.444836351621653e-07, 'epoch': 2.56}
85%|████████▌ | 9831/11526 [1:42:42<17:22, 1.63it/s] 85%|████████▌ | 9832/11526 [1:42:43<17:20, 1.63it/s] {'loss': 0.1513, 'grad_norm': 0.581291913986206, 'learning_rate': 6.437401562634637e-07, 'epoch': 2.56}
85%|████████▌ | 9832/11526 [1:42:43<17:20, 1.63it/s] 85%|████████▌ | 9833/11526 [1:42:43<17:20, 1.63it/s] {'loss': 0.1381, 'grad_norm': 0.5815891027450562, 'learning_rate': 6.429970769457017e-07, 'epoch': 2.56}
85%|████████▌ | 9833/11526 [1:42:44<17:20, 1.63it/s] 85%|████████▌ | 9834/11526 [1:42:44<17:19, 1.63it/s] {'loss': 0.1584, 'grad_norm': 0.6026893854141235, 'learning_rate': 6.422543972770395e-07, 'epoch': 2.56}
85%|████████▌ | 9834/11526 [1:42:44<17:19, 1.63it/s] 85%|████████▌ | 9835/11526 [1:42:45<17:19, 1.63it/s] {'loss': 0.1506, 'grad_norm': 0.6865195035934448, 'learning_rate': 6.415121173255978e-07, 'epoch': 2.56}
85%|████████▌ | 9835/11526 [1:42:45<17:19, 1.63it/s] 85%|████████▌ | 9836/11526 [1:42:45<17:19, 1.63it/s] {'loss': 0.1458, 'grad_norm': 0.5163726210594177, 'learning_rate': 6.407702371594626e-07, 'epoch': 2.56}
85%|████████▌ | 9836/11526 [1:42:45<17:19, 1.63it/s] 85%|████████▌ | 9837/11526 [1:42:46<17:18, 1.63it/s] {'loss': 0.1257, 'grad_norm': 0.49086669087409973, 'learning_rate': 6.400287568466851e-07, 'epoch': 2.56}
85%|████████▌ | 9837/11526 [1:42:46<17:18, 1.63it/s] 85%|████████▌ | 9838/11526 [1:42:46<17:19, 1.62it/s] {'loss': 0.1912, 'grad_norm': 1.0210336446762085, 'learning_rate': 6.392876764552769e-07, 'epoch': 2.56}
85%|████████▌ | 9838/11526 [1:42:47<17:19, 1.62it/s] 85%|████████▌ | 9839/11526 [1:42:47<17:17, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.5671388506889343, 'learning_rate': 6.385469960532154e-07, 'epoch': 2.56}
85%|████████▌ | 9839/11526 [1:42:47<17:17, 1.63it/s] 85%|████████▌ | 9840/11526 [1:42:48<17:17, 1.63it/s] {'loss': 0.239, 'grad_norm': 0.6773092746734619, 'learning_rate': 6.378067157084383e-07, 'epoch': 2.56}
85%|████████▌ | 9840/11526 [1:42:48<17:17, 1.63it/s] 85%|████████▌ | 9841/11526 [1:42:48<17:17, 1.62it/s] {'loss': 0.1398, 'grad_norm': 0.5566588044166565, 'learning_rate': 6.370668354888476e-07, 'epoch': 2.56}
85%|████████▌ | 9841/11526 [1:42:48<17:17, 1.62it/s] 85%|████████▌ | 9842/11526 [1:42:49<17:17, 1.62it/s] {'loss': 0.1612, 'grad_norm': 0.6143399477005005, 'learning_rate': 6.363273554623134e-07, 'epoch': 2.56}
85%|████████▌ | 9842/11526 [1:42:49<17:17, 1.62it/s] 85%|████████▌ | 9843/11526 [1:42:50<17:16, 1.62it/s] {'loss': 0.1456, 'grad_norm': 0.6093246340751648, 'learning_rate': 6.355882756966608e-07, 'epoch': 2.56}
85%|████████▌ | 9843/11526 [1:42:50<17:16, 1.62it/s] 85%|████████▌ | 9844/11526 [1:42:50<17:14, 1.63it/s] {'loss': 0.167, 'grad_norm': 0.65690016746521, 'learning_rate': 6.348495962596846e-07, 'epoch': 2.56}
85%|████████▌ | 9844/11526 [1:42:50<17:14, 1.63it/s] 85%|████████▌ | 9845/11526 [1:42:51<17:13, 1.63it/s] {'loss': 0.1746, 'grad_norm': 0.6295498013496399, 'learning_rate': 6.341113172191399e-07, 'epoch': 2.56}
85%|████████▌ | 9845/11526 [1:42:51<17:13, 1.63it/s] 85%|████████▌ | 9846/11526 [1:42:51<17:13, 1.63it/s] {'loss': 0.1456, 'grad_norm': 0.5024082064628601, 'learning_rate': 6.333734386427459e-07, 'epoch': 2.56}
85%|████████▌ | 9846/11526 [1:42:52<17:13, 1.63it/s] 85%|████████▌ | 9847/11526 [1:42:52<17:13, 1.62it/s] {'loss': 0.1666, 'grad_norm': 0.5713688135147095, 'learning_rate': 6.326359605981863e-07, 'epoch': 2.56}
85%|████████▌ | 9847/11526 [1:42:52<17:13, 1.62it/s] 85%|████████▌ | 9848/11526 [1:42:53<17:12, 1.62it/s] {'loss': 0.1633, 'grad_norm': 0.6547908186912537, 'learning_rate': 6.318988831531048e-07, 'epoch': 2.56}
85%|████████▌ | 9848/11526 [1:42:53<17:12, 1.62it/s] 85%|████████▌ | 9849/11526 [1:42:53<17:11, 1.63it/s] {'loss': 0.1321, 'grad_norm': 0.6144531965255737, 'learning_rate': 6.311622063751111e-07, 'epoch': 2.56}
85%|████████▌ | 9849/11526 [1:42:53<17:11, 1.63it/s] 85%|████████▌ | 9850/11526 [1:42:54<17:10, 1.63it/s] {'loss': 0.1559, 'grad_norm': 0.578322172164917, 'learning_rate': 6.304259303317773e-07, 'epoch': 2.56}
85%|████████▌ | 9850/11526 [1:42:54<17:10, 1.63it/s] 85%|████████▌ | 9851/11526 [1:42:54<17:11, 1.62it/s] {'loss': 0.1269, 'grad_norm': 0.5219969749450684, 'learning_rate': 6.296900550906393e-07, 'epoch': 2.56}
85%|████████▌ | 9851/11526 [1:42:55<17:11, 1.62it/s] 85%|████████▌ | 9852/11526 [1:42:55<17:11, 1.62it/s] {'loss': 0.1497, 'grad_norm': 0.5898949503898621, 'learning_rate': 6.289545807191955e-07, 'epoch': 2.56}
85%|████████▌ | 9852/11526 [1:42:55<17:11, 1.62it/s] 85%|████████▌ | 9853/11526 [1:42:56<17:10, 1.62it/s] {'loss': 0.1438, 'grad_norm': 0.5577769875526428, 'learning_rate': 6.282195072849084e-07, 'epoch': 2.56}
85%|████████▌ | 9853/11526 [1:42:56<17:10, 1.62it/s] 85%|████████▌ | 9854/11526 [1:42:56<17:08, 1.63it/s] {'loss': 0.1453, 'grad_norm': 0.5636736750602722, 'learning_rate': 6.27484834855201e-07, 'epoch': 2.56}
85%|████████▌ | 9854/11526 [1:42:56<17:08, 1.63it/s] 86%|████████▌ | 9855/11526 [1:42:57<17:07, 1.63it/s] {'loss': 0.1431, 'grad_norm': 0.5624427795410156, 'learning_rate': 6.267505634974652e-07, 'epoch': 2.57}
86%|████████▌ | 9855/11526 [1:42:57<17:07, 1.63it/s] 86%|████████▌ | 9856/11526 [1:42:58<17:07, 1.62it/s] {'loss': 0.147, 'grad_norm': 0.6868594884872437, 'learning_rate': 6.26016693279049e-07, 'epoch': 2.57}
86%|████████▌ | 9856/11526 [1:42:58<17:07, 1.62it/s] 86%|████████▌ | 9857/11526 [1:42:58<17:07, 1.62it/s] {'loss': 0.143, 'grad_norm': 0.604425311088562, 'learning_rate': 6.252832242672685e-07, 'epoch': 2.57}
86%|████████▌ | 9857/11526 [1:42:58<17:07, 1.62it/s] 86%|████████▌ | 9858/11526 [1:42:59<17:06, 1.62it/s] {'loss': 0.1612, 'grad_norm': 0.6153068542480469, 'learning_rate': 6.245501565294021e-07, 'epoch': 2.57}
86%|████████▌ | 9858/11526 [1:42:59<17:06, 1.62it/s] 86%|████████▌ | 9859/11526 [1:42:59<17:05, 1.63it/s] {'loss': 0.1542, 'grad_norm': 0.6305254101753235, 'learning_rate': 6.23817490132691e-07, 'epoch': 2.57}
86%|████████▌ | 9859/11526 [1:43:00<17:05, 1.63it/s] 86%|████████▌ | 9860/11526 [1:43:00<17:04, 1.63it/s] {'loss': 0.155, 'grad_norm': 0.6427944302558899, 'learning_rate': 6.230852251443386e-07, 'epoch': 2.57}
86%|████████▌ | 9860/11526 [1:43:00<17:04, 1.63it/s] 86%|████████▌ | 9861/11526 [1:43:01<17:04, 1.63it/s] {'loss': 0.1525, 'grad_norm': 0.5663949847221375, 'learning_rate': 6.223533616315142e-07, 'epoch': 2.57}
86%|████████▌ | 9861/11526 [1:43:01<17:04, 1.63it/s] 86%|████████▌ | 9862/11526 [1:43:01<17:03, 1.63it/s] {'loss': 0.1665, 'grad_norm': 0.5961875915527344, 'learning_rate': 6.21621899661346e-07, 'epoch': 2.57}
86%|████████▌ | 9862/11526 [1:43:01<17:03, 1.63it/s] 86%|████████▌ | 9863/11526 [1:43:02<17:03, 1.62it/s] {'loss': 0.1462, 'grad_norm': 0.6382938623428345, 'learning_rate': 6.208908393009283e-07, 'epoch': 2.57}
86%|████████▌ | 9863/11526 [1:43:02<17:03, 1.62it/s] 86%|████████▌ | 9864/11526 [1:43:02<17:02, 1.63it/s] {'loss': 0.1584, 'grad_norm': 0.7002846002578735, 'learning_rate': 6.201601806173213e-07, 'epoch': 2.57}
86%|████████▌ | 9864/11526 [1:43:03<17:02, 1.63it/s] 86%|████████▌ | 9865/11526 [1:43:03<17:01, 1.63it/s] {'loss': 0.1212, 'grad_norm': 0.5305130481719971, 'learning_rate': 6.194299236775414e-07, 'epoch': 2.57}
86%|████████▌ | 9865/11526 [1:43:03<17:01, 1.63it/s] 86%|████████▌ | 9866/11526 [1:43:04<17:01, 1.63it/s] {'loss': 0.1935, 'grad_norm': 0.6163979768753052, 'learning_rate': 6.187000685485733e-07, 'epoch': 2.57}
86%|████████▌ | 9866/11526 [1:43:04<17:01, 1.63it/s] 86%|████████▌ | 9867/11526 [1:43:04<16:59, 1.63it/s] {'loss': 0.1568, 'grad_norm': 0.6167327165603638, 'learning_rate': 6.179706152973636e-07, 'epoch': 2.57}
86%|████████▌ | 9867/11526 [1:43:04<16:59, 1.63it/s] 86%|████████▌ | 9868/11526 [1:43:05<17:00, 1.62it/s] {'loss': 0.137, 'grad_norm': 0.5581744909286499, 'learning_rate': 6.172415639908213e-07, 'epoch': 2.57}
86%|████████▌ | 9868/11526 [1:43:05<17:00, 1.62it/s] 86%|████████▌ | 9869/11526 [1:43:06<16:59, 1.63it/s] {'loss': 0.1739, 'grad_norm': 0.6878037452697754, 'learning_rate': 6.165129146958209e-07, 'epoch': 2.57}
86%|████████▌ | 9869/11526 [1:43:06<16:59, 1.63it/s] 86%|████████▌ | 9870/11526 [1:43:06<16:58, 1.63it/s] {'loss': 0.1419, 'grad_norm': 0.5509684681892395, 'learning_rate': 6.157846674791961e-07, 'epoch': 2.57}
86%|████████▌ | 9870/11526 [1:43:06<16:58, 1.63it/s] 86%|████████▌ | 9871/11526 [1:43:07<16:59, 1.62it/s] {'loss': 0.1562, 'grad_norm': 0.5624169707298279, 'learning_rate': 6.150568224077469e-07, 'epoch': 2.57}
86%|████████▌ | 9871/11526 [1:43:07<16:59, 1.62it/s] 86%|████████▌ | 9872/11526 [1:43:07<16:57, 1.62it/s] {'loss': 0.1385, 'grad_norm': 0.5707820057868958, 'learning_rate': 6.143293795482352e-07, 'epoch': 2.57}
86%|████████▌ | 9872/11526 [1:43:08<16:57, 1.62it/s] 86%|████████▌ | 9873/11526 [1:43:08<16:58, 1.62it/s] {'loss': 0.1615, 'grad_norm': 0.6001137495040894, 'learning_rate': 6.136023389673862e-07, 'epoch': 2.57}
86%|████████▌ | 9873/11526 [1:43:08<16:58, 1.62it/s] 86%|████████▌ | 9874/11526 [1:43:09<16:56, 1.63it/s] {'loss': 0.139, 'grad_norm': 0.6125667691230774, 'learning_rate': 6.128757007318881e-07, 'epoch': 2.57}
86%|████████▌ | 9874/11526 [1:43:09<16:56, 1.63it/s] 86%|████████▌ | 9875/11526 [1:43:09<16:55, 1.63it/s] {'loss': 0.1308, 'grad_norm': 0.5557531118392944, 'learning_rate': 6.121494649083937e-07, 'epoch': 2.57}
86%|████████▌ | 9875/11526 [1:43:09<16:55, 1.63it/s] 86%|████████▌ | 9876/11526 [1:43:10<16:54, 1.63it/s] {'loss': 0.1581, 'grad_norm': 0.6240020394325256, 'learning_rate': 6.114236315635136e-07, 'epoch': 2.57}
86%|████████▌ | 9876/11526 [1:43:10<16:54, 1.63it/s] 86%|████████▌ | 9877/11526 [1:43:10<16:54, 1.63it/s] {'loss': 0.1687, 'grad_norm': 0.6546675562858582, 'learning_rate': 6.106982007638307e-07, 'epoch': 2.57}
86%|████████▌ | 9877/11526 [1:43:11<16:54, 1.63it/s] 86%|████████▌ | 9878/11526 [1:43:11<16:55, 1.62it/s] {'loss': 0.1293, 'grad_norm': 0.5502102375030518, 'learning_rate': 6.099731725758817e-07, 'epoch': 2.57}
86%|████████▌ | 9878/11526 [1:43:11<16:55, 1.62it/s] 86%|████████▌ | 9879/11526 [1:43:12<16:54, 1.62it/s] {'loss': 0.1305, 'grad_norm': 0.5078849196434021, 'learning_rate': 6.092485470661713e-07, 'epoch': 2.57}
86%|████████▌ | 9879/11526 [1:43:12<16:54, 1.62it/s] 86%|████████▌ | 9880/11526 [1:43:12<16:53, 1.62it/s] {'loss': 0.1351, 'grad_norm': 0.5390763282775879, 'learning_rate': 6.085243243011679e-07, 'epoch': 2.57}
86%|████████▌ | 9880/11526 [1:43:12<16:53, 1.62it/s] 86%|████████▌ | 9881/11526 [1:43:13<16:52, 1.62it/s] {'loss': 0.1875, 'grad_norm': 0.6662602424621582, 'learning_rate': 6.078005043472973e-07, 'epoch': 2.57}
86%|████████▌ | 9881/11526 [1:43:13<16:52, 1.62it/s] 86%|████████▌ | 9882/11526 [1:43:14<16:51, 1.63it/s] {'loss': 0.1653, 'grad_norm': 0.7293750643730164, 'learning_rate': 6.070770872709564e-07, 'epoch': 2.57}
86%|████████▌ | 9882/11526 [1:43:14<16:51, 1.63it/s] 86%|████████▌ | 9883/11526 [1:43:14<16:54, 1.62it/s] {'loss': 0.1831, 'grad_norm': 0.6287393569946289, 'learning_rate': 6.063540731385009e-07, 'epoch': 2.57}
86%|████████▌ | 9883/11526 [1:43:14<16:54, 1.62it/s] 86%|████████▌ | 9884/11526 [1:43:15<16:53, 1.62it/s] {'loss': 0.1903, 'grad_norm': 0.6597061157226562, 'learning_rate': 6.056314620162479e-07, 'epoch': 2.57}
86%|████████▌ | 9884/11526 [1:43:15<16:53, 1.62it/s] 86%|████████▌ | 9885/11526 [1:43:15<16:50, 1.62it/s] {'loss': 0.1502, 'grad_norm': 0.5588484406471252, 'learning_rate': 6.049092539704792e-07, 'epoch': 2.57}
86%|████████▌ | 9885/11526 [1:43:16<16:50, 1.62it/s] 86%|████████▌ | 9886/11526 [1:43:16<16:53, 1.62it/s] {'loss': 0.1376, 'grad_norm': 0.5717185139656067, 'learning_rate': 6.041874490674416e-07, 'epoch': 2.57}
86%|████████▌ | 9886/11526 [1:43:16<16:53, 1.62it/s] 86%|████████▌ | 9887/11526 [1:43:17<16:51, 1.62it/s] {'loss': 0.1431, 'grad_norm': 0.6708319783210754, 'learning_rate': 6.034660473733423e-07, 'epoch': 2.57}
86%|████████▌ | 9887/11526 [1:43:17<16:51, 1.62it/s] 86%|████████▌ | 9888/11526 [1:43:17<16:50, 1.62it/s] {'loss': 0.1401, 'grad_norm': 0.5568540692329407, 'learning_rate': 6.02745048954353e-07, 'epoch': 2.57}
86%|████████▌ | 9888/11526 [1:43:17<16:50, 1.62it/s] 86%|████████▌ | 9889/11526 [1:43:18<16:48, 1.62it/s] {'loss': 0.1555, 'grad_norm': 0.6130145788192749, 'learning_rate': 6.020244538766062e-07, 'epoch': 2.57}
86%|████████▌ | 9889/11526 [1:43:18<16:48, 1.62it/s] 86%|████████▌ | 9890/11526 [1:43:19<16:46, 1.63it/s] {'loss': 0.1724, 'grad_norm': 0.6039929986000061, 'learning_rate': 6.013042622061987e-07, 'epoch': 2.57}
86%|████████▌ | 9890/11526 [1:43:19<16:46, 1.63it/s] 86%|████████▌ | 9891/11526 [1:43:19<16:46, 1.62it/s] {'loss': 0.197, 'grad_norm': 0.738967776298523, 'learning_rate': 6.005844740091943e-07, 'epoch': 2.57}
86%|████████▌ | 9891/11526 [1:43:19<16:46, 1.62it/s] 86%|████████▌ | 9892/11526 [1:43:20<16:45, 1.63it/s] {'loss': 0.158, 'grad_norm': 0.603967547416687, 'learning_rate': 5.998650893516122e-07, 'epoch': 2.57}
86%|████████▌ | 9892/11526 [1:43:20<16:45, 1.63it/s] 86%|████████▌ | 9893/11526 [1:43:20<16:45, 1.62it/s] {'loss': 0.13, 'grad_norm': 0.5657825469970703, 'learning_rate': 5.991461082994404e-07, 'epoch': 2.57}
86%|████████▌ | 9893/11526 [1:43:20<16:45, 1.62it/s] 86%|████████▌ | 9894/11526 [1:43:21<16:43, 1.63it/s] {'loss': 0.1537, 'grad_norm': 0.6244953274726868, 'learning_rate': 5.984275309186266e-07, 'epoch': 2.58}
86%|████████▌ | 9894/11526 [1:43:21<16:43, 1.63it/s] 86%|████████▌ | 9895/11526 [1:43:22<16:42, 1.63it/s] {'loss': 0.1998, 'grad_norm': 0.7195156812667847, 'learning_rate': 5.977093572750842e-07, 'epoch': 2.58}
86%|████████▌ | 9895/11526 [1:43:22<16:42, 1.63it/s] 86%|████████▌ | 9896/11526 [1:43:22<16:42, 1.63it/s] {'loss': 0.1246, 'grad_norm': 0.5213212966918945, 'learning_rate': 5.969915874346882e-07, 'epoch': 2.58}
86%|████████▌ | 9896/11526 [1:43:22<16:42, 1.63it/s] 86%|████████▌ | 9897/11526 [1:43:23<16:41, 1.63it/s] {'loss': 0.1186, 'grad_norm': 0.5092576146125793, 'learning_rate': 5.962742214632749e-07, 'epoch': 2.58}
86%|████████▌ | 9897/11526 [1:43:23<16:41, 1.63it/s] 86%|████████▌ | 9898/11526 [1:43:23<16:41, 1.63it/s] {'loss': 0.1486, 'grad_norm': 0.553085207939148, 'learning_rate': 5.955572594266468e-07, 'epoch': 2.58}
86%|████████▌ | 9898/11526 [1:43:24<16:41, 1.63it/s] 86%|████████▌ | 9899/11526 [1:43:24<16:40, 1.63it/s] {'loss': 0.1323, 'grad_norm': 0.5527507066726685, 'learning_rate': 5.948407013905666e-07, 'epoch': 2.58}
86%|████████▌ | 9899/11526 [1:43:24<16:40, 1.63it/s] 86%|████████▌ | 9900/11526 [1:43:25<16:41, 1.62it/s] {'loss': 0.1393, 'grad_norm': 0.5563964247703552, 'learning_rate': 5.941245474207613e-07, 'epoch': 2.58}
86%|████████▌ | 9900/11526 [1:43:25<16:41, 1.62it/s] 86%|████████▌ | 9901/11526 [1:43:25<16:39, 1.63it/s] {'loss': 0.1255, 'grad_norm': 0.47788140177726746, 'learning_rate': 5.934087975829211e-07, 'epoch': 2.58}
86%|████████▌ | 9901/11526 [1:43:25<16:39, 1.63it/s] 86%|████████▌ | 9902/11526 [1:43:26<16:38, 1.63it/s] {'loss': 0.108, 'grad_norm': 0.4427991211414337, 'learning_rate': 5.926934519426996e-07, 'epoch': 2.58}
86%|████████▌ | 9902/11526 [1:43:26<16:38, 1.63it/s] 86%|████████▌ | 9903/11526 [1:43:27<16:38, 1.62it/s] {'loss': 0.1294, 'grad_norm': 0.6029877066612244, 'learning_rate': 5.919785105657095e-07, 'epoch': 2.58}
86%|████████▌ | 9903/11526 [1:43:27<16:38, 1.62it/s] 86%|████████▌ | 9904/11526 [1:43:27<16:37, 1.63it/s] {'loss': 0.1618, 'grad_norm': 0.6042262315750122, 'learning_rate': 5.912639735175324e-07, 'epoch': 2.58}
86%|████████▌ | 9904/11526 [1:43:27<16:37, 1.63it/s] 86%|████████▌ | 9905/11526 [1:43:28<16:36, 1.63it/s] {'loss': 0.1423, 'grad_norm': 0.6475872993469238, 'learning_rate': 5.905498408637078e-07, 'epoch': 2.58}
86%|████████▌ | 9905/11526 [1:43:28<16:36, 1.63it/s] 86%|████████▌ | 9906/11526 [1:43:28<16:35, 1.63it/s] {'loss': 0.1436, 'grad_norm': 0.5938950777053833, 'learning_rate': 5.898361126697399e-07, 'epoch': 2.58}
86%|████████▌ | 9906/11526 [1:43:28<16:35, 1.63it/s] 86%|████████▌ | 9907/11526 [1:43:29<16:34, 1.63it/s] {'loss': 0.1546, 'grad_norm': 0.5683493614196777, 'learning_rate': 5.89122789001098e-07, 'epoch': 2.58}
86%|████████▌ | 9907/11526 [1:43:29<16:34, 1.63it/s] 86%|████████▌ | 9908/11526 [1:43:30<16:37, 1.62it/s] {'loss': 0.1701, 'grad_norm': 0.6042677760124207, 'learning_rate': 5.884098699232088e-07, 'epoch': 2.58}
86%|████████▌ | 9908/11526 [1:43:30<16:37, 1.62it/s] 86%|████████▌ | 9909/11526 [1:43:30<16:35, 1.62it/s] {'loss': 0.1543, 'grad_norm': 0.6122987866401672, 'learning_rate': 5.876973555014686e-07, 'epoch': 2.58}
86%|████████▌ | 9909/11526 [1:43:30<16:35, 1.62it/s] 86%|████████▌ | 9910/11526 [1:43:31<16:33, 1.63it/s] {'loss': 0.14, 'grad_norm': 0.5719583630561829, 'learning_rate': 5.869852458012321e-07, 'epoch': 2.58}
86%|████████▌ | 9910/11526 [1:43:31<16:33, 1.63it/s] 86%|████████▌ | 9911/11526 [1:43:31<16:33, 1.63it/s] {'loss': 0.1396, 'grad_norm': 0.5785710215568542, 'learning_rate': 5.862735408878173e-07, 'epoch': 2.58}
86%|████████▌ | 9911/11526 [1:43:32<16:33, 1.63it/s] 86%|████████▌ | 9912/11526 [1:43:32<16:32, 1.63it/s] {'loss': 0.1498, 'grad_norm': 0.6303358674049377, 'learning_rate': 5.855622408265066e-07, 'epoch': 2.58}
86%|████████▌ | 9912/11526 [1:43:32<16:32, 1.63it/s] 86%|████████▌ | 9913/11526 [1:43:33<16:32, 1.62it/s] {'loss': 0.1619, 'grad_norm': 0.5588133335113525, 'learning_rate': 5.848513456825439e-07, 'epoch': 2.58}
86%|████████▌ | 9913/11526 [1:43:33<16:32, 1.62it/s] 86%|████████▌ | 9914/11526 [1:43:33<16:31, 1.63it/s] {'loss': 0.1261, 'grad_norm': 0.5251164436340332, 'learning_rate': 5.841408555211369e-07, 'epoch': 2.58}
86%|████████▌ | 9914/11526 [1:43:33<16:31, 1.63it/s] 86%|████████▌ | 9915/11526 [1:43:34<16:31, 1.63it/s] {'loss': 0.1685, 'grad_norm': 0.6064478754997253, 'learning_rate': 5.834307704074571e-07, 'epoch': 2.58}
86%|████████▌ | 9915/11526 [1:43:34<16:31, 1.63it/s] 86%|████████▌ | 9916/11526 [1:43:34<16:30, 1.63it/s] {'loss': 0.1671, 'grad_norm': 0.6533164381980896, 'learning_rate': 5.827210904066344e-07, 'epoch': 2.58}
86%|████████▌ | 9916/11526 [1:43:35<16:30, 1.63it/s] 86%|████████▌ | 9917/11526 [1:43:35<16:29, 1.63it/s] {'loss': 0.1409, 'grad_norm': 0.561743438243866, 'learning_rate': 5.820118155837672e-07, 'epoch': 2.58}
86%|████████▌ | 9917/11526 [1:43:35<16:29, 1.63it/s] 86%|████████▌ | 9918/11526 [1:43:36<16:29, 1.63it/s] {'loss': 0.1449, 'grad_norm': 0.5874237418174744, 'learning_rate': 5.813029460039149e-07, 'epoch': 2.58}
86%|████████▌ | 9918/11526 [1:43:36<16:29, 1.63it/s] 86%|████████▌ | 9919/11526 [1:43:36<16:28, 1.63it/s] {'loss': 0.1968, 'grad_norm': 0.7040976285934448, 'learning_rate': 5.805944817320969e-07, 'epoch': 2.58}
86%|████████▌ | 9919/11526 [1:43:36<16:28, 1.63it/s] 86%|████████▌ | 9920/11526 [1:43:37<16:27, 1.63it/s] {'loss': 0.1904, 'grad_norm': 0.7043195366859436, 'learning_rate': 5.798864228332979e-07, 'epoch': 2.58}
86%|████████▌ | 9920/11526 [1:43:37<16:27, 1.63it/s] 86%|████████▌ | 9921/11526 [1:43:38<16:29, 1.62it/s] {'loss': 0.1598, 'grad_norm': 0.6243916153907776, 'learning_rate': 5.79178769372466e-07, 'epoch': 2.58}
86%|████████▌ | 9921/11526 [1:43:38<16:29, 1.62it/s] 86%|████████▌ | 9922/11526 [1:43:38<16:28, 1.62it/s] {'loss': 0.1447, 'grad_norm': 0.5364164113998413, 'learning_rate': 5.784715214145104e-07, 'epoch': 2.58}
86%|████████▌ | 9922/11526 [1:43:38<16:28, 1.62it/s] 86%|████████▌ | 9923/11526 [1:43:39<16:32, 1.61it/s] {'loss': 0.1992, 'grad_norm': 0.7454203963279724, 'learning_rate': 5.777646790243058e-07, 'epoch': 2.58}
86%|████████▌ | 9923/11526 [1:43:39<16:32, 1.61it/s] 86%|████████▌ | 9924/11526 [1:43:39<16:30, 1.62it/s] {'loss': 0.1266, 'grad_norm': 0.5154111385345459, 'learning_rate': 5.770582422666848e-07, 'epoch': 2.58}
86%|████████▌ | 9924/11526 [1:43:40<16:30, 1.62it/s] 86%|████████▌ | 9925/11526 [1:43:40<16:28, 1.62it/s] {'loss': 0.1293, 'grad_norm': 0.6187331676483154, 'learning_rate': 5.763522112064463e-07, 'epoch': 2.58}
86%|████████▌ | 9925/11526 [1:43:40<16:28, 1.62it/s] 86%|████████▌ | 9926/11526 [1:43:41<16:30, 1.62it/s] {'loss': 0.0928, 'grad_norm': 0.3975358307361603, 'learning_rate': 5.756465859083549e-07, 'epoch': 2.58}
86%|████████▌ | 9926/11526 [1:43:41<16:30, 1.62it/s] 86%|████████▌ | 9927/11526 [1:43:41<16:27, 1.62it/s] {'loss': 0.159, 'grad_norm': 0.6481603384017944, 'learning_rate': 5.749413664371312e-07, 'epoch': 2.58}
86%|████████▌ | 9927/11526 [1:43:41<16:27, 1.62it/s] 86%|████████▌ | 9928/11526 [1:43:42<16:26, 1.62it/s] {'loss': 0.1675, 'grad_norm': 0.620785653591156, 'learning_rate': 5.742365528574629e-07, 'epoch': 2.58}
86%|████████▌ | 9928/11526 [1:43:42<16:26, 1.62it/s] 86%|████████▌ | 9929/11526 [1:43:43<16:23, 1.62it/s] {'loss': 0.1377, 'grad_norm': 0.5733993649482727, 'learning_rate': 5.735321452340004e-07, 'epoch': 2.58}
86%|████████▌ | 9929/11526 [1:43:43<16:23, 1.62it/s] 86%|████████▌ | 9930/11526 [1:43:43<16:22, 1.62it/s] {'loss': 0.1674, 'grad_norm': 0.7697559595108032, 'learning_rate': 5.728281436313532e-07, 'epoch': 2.58}
86%|████████▌ | 9930/11526 [1:43:43<16:22, 1.62it/s] 86%|████████▌ | 9931/11526 [1:43:44<16:21, 1.62it/s] {'loss': 0.1419, 'grad_norm': 0.5534998774528503, 'learning_rate': 5.721245481140991e-07, 'epoch': 2.58}
86%|████████▌ | 9931/11526 [1:43:44<16:21, 1.62it/s] 86%|████████▌ | 9932/11526 [1:43:44<16:20, 1.63it/s] {'loss': 0.1547, 'grad_norm': 0.6287319660186768, 'learning_rate': 5.714213587467759e-07, 'epoch': 2.59}
86%|████████▌ | 9932/11526 [1:43:44<16:20, 1.63it/s] 86%|████████▌ | 9933/11526 [1:43:45<16:20, 1.62it/s] {'loss': 0.1543, 'grad_norm': 0.5667932629585266, 'learning_rate': 5.707185755938822e-07, 'epoch': 2.59}
86%|████████▌ | 9933/11526 [1:43:45<16:20, 1.62it/s] 86%|████████▌ | 9934/11526 [1:43:46<16:19, 1.63it/s] {'loss': 0.144, 'grad_norm': 0.5575786232948303, 'learning_rate': 5.700161987198827e-07, 'epoch': 2.59}
86%|████████▌ | 9934/11526 [1:43:46<16:19, 1.63it/s] 86%|████████▌ | 9935/11526 [1:43:46<16:17, 1.63it/s] {'loss': 0.1743, 'grad_norm': 0.6634852886199951, 'learning_rate': 5.693142281892027e-07, 'epoch': 2.59}
86%|████████▌ | 9935/11526 [1:43:46<16:17, 1.63it/s] 86%|████████▌ | 9936/11526 [1:43:47<16:17, 1.63it/s] {'loss': 0.1238, 'grad_norm': 0.5190602540969849, 'learning_rate': 5.686126640662316e-07, 'epoch': 2.59}
86%|████████▌ | 9936/11526 [1:43:47<16:17, 1.63it/s] 86%|████████▌ | 9937/11526 [1:43:47<16:17, 1.63it/s] {'loss': 0.1824, 'grad_norm': 0.6033498048782349, 'learning_rate': 5.679115064153212e-07, 'epoch': 2.59}
86%|████████▌ | 9937/11526 [1:43:48<16:17, 1.63it/s] 86%|████████▌ | 9938/11526 [1:43:48<16:16, 1.63it/s] {'loss': 0.1401, 'grad_norm': 0.603104829788208, 'learning_rate': 5.672107553007838e-07, 'epoch': 2.59}
86%|████████▌ | 9938/11526 [1:43:48<16:16, 1.63it/s] 86%|████████▌ | 9939/11526 [1:43:49<16:16, 1.63it/s] {'loss': 0.1776, 'grad_norm': 0.6000518202781677, 'learning_rate': 5.665104107868969e-07, 'epoch': 2.59}
86%|████████▌ | 9939/11526 [1:43:49<16:16, 1.63it/s] 86%|████████▌ | 9940/11526 [1:43:49<16:15, 1.63it/s] {'loss': 0.1611, 'grad_norm': 0.6227283477783203, 'learning_rate': 5.658104729379027e-07, 'epoch': 2.59}
86%|████████▌ | 9940/11526 [1:43:49<16:15, 1.63it/s] 86%|████████▌ | 9941/11526 [1:43:50<16:19, 1.62it/s] {'loss': 0.1315, 'grad_norm': 0.5333222150802612, 'learning_rate': 5.651109418180001e-07, 'epoch': 2.59}
86%|████████▌ | 9941/11526 [1:43:50<16:19, 1.62it/s] 86%|████████▋ | 9942/11526 [1:43:51<16:16, 1.62it/s] {'loss': 0.1396, 'grad_norm': 0.47970306873321533, 'learning_rate': 5.644118174913554e-07, 'epoch': 2.59}
86%|████████▋ | 9942/11526 [1:43:51<16:16, 1.62it/s] 86%|████████▋ | 9943/11526 [1:43:51<16:16, 1.62it/s] {'loss': 0.1645, 'grad_norm': 0.6361129283905029, 'learning_rate': 5.637131000220963e-07, 'epoch': 2.59}
86%|████████▋ | 9943/11526 [1:43:51<16:16, 1.62it/s] 86%|████████▋ | 9944/11526 [1:43:52<16:14, 1.62it/s] {'loss': 0.1384, 'grad_norm': 0.6212122440338135, 'learning_rate': 5.630147894743132e-07, 'epoch': 2.59}
86%|████████▋ | 9944/11526 [1:43:52<16:14, 1.62it/s] 86%|████████▋ | 9945/11526 [1:43:52<16:13, 1.62it/s] {'loss': 0.1226, 'grad_norm': 0.44704845547676086, 'learning_rate': 5.623168859120598e-07, 'epoch': 2.59}
86%|████████▋ | 9945/11526 [1:43:52<16:13, 1.62it/s] 86%|████████▋ | 9946/11526 [1:43:53<16:12, 1.62it/s] {'loss': 0.2082, 'grad_norm': 0.819840133190155, 'learning_rate': 5.616193893993499e-07, 'epoch': 2.59}
86%|████████▋ | 9946/11526 [1:43:53<16:12, 1.62it/s] 86%|████████▋ | 9947/11526 [1:43:54<16:11, 1.63it/s] {'loss': 0.1632, 'grad_norm': 0.5957873463630676, 'learning_rate': 5.609223000001635e-07, 'epoch': 2.59}
86%|████████▋ | 9947/11526 [1:43:54<16:11, 1.63it/s] 86%|████████▋ | 9948/11526 [1:43:54<16:10, 1.63it/s] {'loss': 0.1374, 'grad_norm': 0.49919092655181885, 'learning_rate': 5.602256177784404e-07, 'epoch': 2.59}
86%|████████▋ | 9948/11526 [1:43:54<16:10, 1.63it/s] 86%|████████▋ | 9949/11526 [1:43:55<16:10, 1.63it/s] {'loss': 0.1669, 'grad_norm': 0.5877353549003601, 'learning_rate': 5.595293427980853e-07, 'epoch': 2.59}
86%|████████▋ | 9949/11526 [1:43:55<16:10, 1.63it/s] 86%|████████▋ | 9950/11526 [1:43:55<16:09, 1.63it/s] {'loss': 0.1674, 'grad_norm': 0.6947341561317444, 'learning_rate': 5.58833475122964e-07, 'epoch': 2.59}
86%|████████▋ | 9950/11526 [1:43:56<16:09, 1.63it/s] 86%|████████▋ | 9951/11526 [1:43:56<16:08, 1.63it/s] {'loss': 0.1401, 'grad_norm': 0.5442219972610474, 'learning_rate': 5.581380148169069e-07, 'epoch': 2.59}
86%|████████▋ | 9951/11526 [1:43:56<16:08, 1.63it/s] 86%|████████▋ | 9952/11526 [1:43:57<16:07, 1.63it/s] {'loss': 0.1408, 'grad_norm': 0.6039650440216064, 'learning_rate': 5.574429619437016e-07, 'epoch': 2.59}
86%|████████▋ | 9952/11526 [1:43:57<16:07, 1.63it/s] 86%|████████▋ | 9953/11526 [1:43:57<16:06, 1.63it/s] {'loss': 0.1663, 'grad_norm': 0.643884003162384, 'learning_rate': 5.567483165671079e-07, 'epoch': 2.59}
86%|████████▋ | 9953/11526 [1:43:57<16:06, 1.63it/s] 86%|████████▋ | 9954/11526 [1:43:58<16:05, 1.63it/s] {'loss': 0.1545, 'grad_norm': 0.5996653437614441, 'learning_rate': 5.560540787508379e-07, 'epoch': 2.59}
86%|████████▋ | 9954/11526 [1:43:58<16:05, 1.63it/s] 86%|████████▋ | 9955/11526 [1:43:59<16:04, 1.63it/s] {'loss': 0.1381, 'grad_norm': 0.5358332395553589, 'learning_rate': 5.553602485585729e-07, 'epoch': 2.59}
86%|████████▋ | 9955/11526 [1:43:59<16:04, 1.63it/s] 86%|████████▋ | 9956/11526 [1:43:59<16:04, 1.63it/s] {'loss': 0.1425, 'grad_norm': 0.5609913468360901, 'learning_rate': 5.546668260539556e-07, 'epoch': 2.59}
86%|████████▋ | 9956/11526 [1:43:59<16:04, 1.63it/s] 86%|████████▋ | 9957/11526 [1:44:00<16:03, 1.63it/s] {'loss': 0.156, 'grad_norm': 0.6231555342674255, 'learning_rate': 5.539738113005883e-07, 'epoch': 2.59}
86%|████████▋ | 9957/11526 [1:44:00<16:03, 1.63it/s] 86%|████████▋ | 9958/11526 [1:44:00<16:02, 1.63it/s] {'loss': 0.1377, 'grad_norm': 0.5278592705726624, 'learning_rate': 5.532812043620406e-07, 'epoch': 2.59}
86%|████████▋ | 9958/11526 [1:44:00<16:02, 1.63it/s] 86%|████████▋ | 9959/11526 [1:44:01<16:02, 1.63it/s] {'loss': 0.1555, 'grad_norm': 0.6639684438705444, 'learning_rate': 5.525890053018429e-07, 'epoch': 2.59}
86%|████████▋ | 9959/11526 [1:44:01<16:02, 1.63it/s] 86%|████████▋ | 9960/11526 [1:44:02<16:02, 1.63it/s] {'loss': 0.128, 'grad_norm': 0.465690940618515, 'learning_rate': 5.518972141834856e-07, 'epoch': 2.59}
86%|████████▋ | 9960/11526 [1:44:02<16:02, 1.63it/s] 86%|████████▋ | 9961/11526 [1:44:02<16:01, 1.63it/s] {'loss': 0.1491, 'grad_norm': 0.5271123051643372, 'learning_rate': 5.512058310704238e-07, 'epoch': 2.59}
86%|████████▋ | 9961/11526 [1:44:02<16:01, 1.63it/s] 86%|████████▋ | 9962/11526 [1:44:03<16:01, 1.63it/s] {'loss': 0.136, 'grad_norm': 0.5518385171890259, 'learning_rate': 5.505148560260765e-07, 'epoch': 2.59}
86%|████████▋ | 9962/11526 [1:44:03<16:01, 1.63it/s] 86%|████████▋ | 9963/11526 [1:44:03<15:59, 1.63it/s] {'loss': 0.1421, 'grad_norm': 0.6396676301956177, 'learning_rate': 5.498242891138228e-07, 'epoch': 2.59}
86%|████████▋ | 9963/11526 [1:44:04<15:59, 1.63it/s] 86%|████████▋ | 9964/11526 [1:44:04<15:59, 1.63it/s] {'loss': 0.1981, 'grad_norm': 0.7003134489059448, 'learning_rate': 5.491341303970066e-07, 'epoch': 2.59}
86%|████████▋ | 9964/11526 [1:44:04<15:59, 1.63it/s] 86%|████████▋ | 9965/11526 [1:44:05<15:59, 1.63it/s] {'loss': 0.1431, 'grad_norm': 0.6256856918334961, 'learning_rate': 5.484443799389305e-07, 'epoch': 2.59}
86%|████████▋ | 9965/11526 [1:44:05<15:59, 1.63it/s] 86%|████████▋ | 9966/11526 [1:44:05<15:58, 1.63it/s] {'loss': 0.1193, 'grad_norm': 0.5110124945640564, 'learning_rate': 5.477550378028651e-07, 'epoch': 2.59}
86%|████████▋ | 9966/11526 [1:44:05<15:58, 1.63it/s] 86%|████████▋ | 9967/11526 [1:44:06<15:58, 1.63it/s] {'loss': 0.1653, 'grad_norm': 0.6347026824951172, 'learning_rate': 5.470661040520408e-07, 'epoch': 2.59}
86%|████████▋ | 9967/11526 [1:44:06<15:58, 1.63it/s] 86%|████████▋ | 9968/11526 [1:44:06<15:57, 1.63it/s] {'loss': 0.1679, 'grad_norm': 0.613913357257843, 'learning_rate': 5.463775787496484e-07, 'epoch': 2.59}
86%|████████▋ | 9968/11526 [1:44:07<15:57, 1.63it/s] 86%|████████▋ | 9969/11526 [1:44:07<15:56, 1.63it/s] {'loss': 0.1344, 'grad_norm': 0.5282475352287292, 'learning_rate': 5.456894619588449e-07, 'epoch': 2.59}
86%|████████▋ | 9969/11526 [1:44:07<15:56, 1.63it/s] 87%|████████▋ | 9970/11526 [1:44:08<15:55, 1.63it/s] {'loss': 0.1305, 'grad_norm': 0.5569395422935486, 'learning_rate': 5.450017537427477e-07, 'epoch': 2.6}
87%|████████▋ | 9970/11526 [1:44:08<15:55, 1.63it/s] 87%|████████▋ | 9971/11526 [1:44:08<15:55, 1.63it/s] {'loss': 0.126, 'grad_norm': 0.5819594264030457, 'learning_rate': 5.443144541644379e-07, 'epoch': 2.6}
87%|████████▋ | 9971/11526 [1:44:08<15:55, 1.63it/s] 87%|████████▋ | 9972/11526 [1:44:09<15:54, 1.63it/s] {'loss': 0.1482, 'grad_norm': 0.6525315642356873, 'learning_rate': 5.436275632869592e-07, 'epoch': 2.6}
87%|████████▋ | 9972/11526 [1:44:09<15:54, 1.63it/s] 87%|████████▋ | 9973/11526 [1:44:10<15:53, 1.63it/s] {'loss': 0.17, 'grad_norm': 0.575621485710144, 'learning_rate': 5.429410811733143e-07, 'epoch': 2.6}
87%|████████▋ | 9973/11526 [1:44:10<15:53, 1.63it/s] 87%|████████▋ | 9974/11526 [1:44:10<15:54, 1.63it/s] {'loss': 0.1821, 'grad_norm': 0.8595403432846069, 'learning_rate': 5.422550078864731e-07, 'epoch': 2.6}
87%|████████▋ | 9974/11526 [1:44:10<15:54, 1.63it/s] 87%|████████▋ | 9975/11526 [1:44:11<15:53, 1.63it/s] {'loss': 0.1728, 'grad_norm': 0.6323190331459045, 'learning_rate': 5.415693434893677e-07, 'epoch': 2.6}
87%|████████▋ | 9975/11526 [1:44:11<15:53, 1.63it/s] 87%|████████▋ | 9976/11526 [1:44:11<15:52, 1.63it/s] {'loss': 0.188, 'grad_norm': 0.6392450928688049, 'learning_rate': 5.408840880448879e-07, 'epoch': 2.6}
87%|████████▋ | 9976/11526 [1:44:12<15:52, 1.63it/s] 87%|████████▋ | 9977/11526 [1:44:12<15:51, 1.63it/s] {'loss': 0.1773, 'grad_norm': 0.6600472331047058, 'learning_rate': 5.401992416158919e-07, 'epoch': 2.6}
87%|████████▋ | 9977/11526 [1:44:12<15:51, 1.63it/s] 87%|████████▋ | 9978/11526 [1:44:13<15:51, 1.63it/s] {'loss': 0.136, 'grad_norm': 0.6075108051300049, 'learning_rate': 5.395148042651971e-07, 'epoch': 2.6}
87%|████████▋ | 9978/11526 [1:44:13<15:51, 1.63it/s] 87%|████████▋ | 9979/11526 [1:44:13<15:50, 1.63it/s] {'loss': 0.1557, 'grad_norm': 0.5888816118240356, 'learning_rate': 5.388307760555816e-07, 'epoch': 2.6}
87%|████████▋ | 9979/11526 [1:44:13<15:50, 1.63it/s] 87%|████████▋ | 9980/11526 [1:44:14<15:49, 1.63it/s] {'loss': 0.1338, 'grad_norm': 0.575520396232605, 'learning_rate': 5.38147157049792e-07, 'epoch': 2.6}
87%|████████▋ | 9980/11526 [1:44:14<15:49, 1.63it/s] 87%|████████▋ | 9981/11526 [1:44:14<15:48, 1.63it/s] {'loss': 0.1607, 'grad_norm': 0.6586061120033264, 'learning_rate': 5.374639473105319e-07, 'epoch': 2.6}
87%|████████▋ | 9981/11526 [1:44:15<15:48, 1.63it/s] 87%|████████▋ | 9982/11526 [1:44:15<15:48, 1.63it/s] {'loss': 0.163, 'grad_norm': 0.6642298102378845, 'learning_rate': 5.367811469004685e-07, 'epoch': 2.6}
87%|████████▋ | 9982/11526 [1:44:15<15:48, 1.63it/s] 87%|████████▋ | 9983/11526 [1:44:16<15:47, 1.63it/s] {'loss': 0.1743, 'grad_norm': 0.5948525071144104, 'learning_rate': 5.360987558822334e-07, 'epoch': 2.6}
87%|████████▋ | 9983/11526 [1:44:16<15:47, 1.63it/s] 87%|████████▋ | 9984/11526 [1:44:16<15:46, 1.63it/s] {'loss': 0.1267, 'grad_norm': 0.5581363439559937, 'learning_rate': 5.354167743184191e-07, 'epoch': 2.6}
87%|████████▋ | 9984/11526 [1:44:16<15:46, 1.63it/s] 87%|████████▋ | 9985/11526 [1:44:17<15:46, 1.63it/s] {'loss': 0.1529, 'grad_norm': 0.5944643020629883, 'learning_rate': 5.347352022715802e-07, 'epoch': 2.6}
87%|████████▋ | 9985/11526 [1:44:17<15:46, 1.63it/s] 87%|████████▋ | 9986/11526 [1:44:18<15:45, 1.63it/s] {'loss': 0.1108, 'grad_norm': 0.43768689036369324, 'learning_rate': 5.340540398042365e-07, 'epoch': 2.6}
87%|████████▋ | 9986/11526 [1:44:18<15:45, 1.63it/s] 87%|████████▋ | 9987/11526 [1:44:18<15:45, 1.63it/s] {'loss': 0.1498, 'grad_norm': 0.5624591708183289, 'learning_rate': 5.33373286978865e-07, 'epoch': 2.6}
87%|████████▋ | 9987/11526 [1:44:18<15:45, 1.63it/s] 87%|████████▋ | 9988/11526 [1:44:19<15:44, 1.63it/s] {'loss': 0.1255, 'grad_norm': 0.5055280327796936, 'learning_rate': 5.326929438579098e-07, 'epoch': 2.6}
87%|████████▋ | 9988/11526 [1:44:19<15:44, 1.63it/s] 87%|████████▋ | 9989/11526 [1:44:19<15:43, 1.63it/s] {'loss': 0.1716, 'grad_norm': 0.6717715263366699, 'learning_rate': 5.320130105037757e-07, 'epoch': 2.6}
87%|████████▋ | 9989/11526 [1:44:20<15:43, 1.63it/s] 87%|████████▋ | 9990/11526 [1:44:20<15:43, 1.63it/s] {'loss': 0.1227, 'grad_norm': 0.5374771356582642, 'learning_rate': 5.3133348697883e-07, 'epoch': 2.6}
87%|████████▋ | 9990/11526 [1:44:20<15:43, 1.63it/s] 87%|████████▋ | 9991/11526 [1:44:21<15:43, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.5746934413909912, 'learning_rate': 5.306543733454028e-07, 'epoch': 2.6}
87%|████████▋ | 9991/11526 [1:44:21<15:43, 1.63it/s] 87%|████████▋ | 9992/11526 [1:44:21<15:42, 1.63it/s] {'loss': 0.118, 'grad_norm': 0.49904748797416687, 'learning_rate': 5.299756696657854e-07, 'epoch': 2.6}
87%|████████▋ | 9992/11526 [1:44:21<15:42, 1.63it/s] 87%|████████▋ | 9993/11526 [1:44:22<15:41, 1.63it/s] {'loss': 0.144, 'grad_norm': 0.5467403531074524, 'learning_rate': 5.292973760022335e-07, 'epoch': 2.6}
87%|████████▋ | 9993/11526 [1:44:22<15:41, 1.63it/s] 87%|████████▋ | 9994/11526 [1:44:22<15:40, 1.63it/s] {'loss': 0.1065, 'grad_norm': 0.4563208520412445, 'learning_rate': 5.286194924169647e-07, 'epoch': 2.6}
87%|████████▋ | 9994/11526 [1:44:23<15:40, 1.63it/s] 87%|████████▋ | 9995/11526 [1:44:23<15:40, 1.63it/s] {'loss': 0.1763, 'grad_norm': 0.705650806427002, 'learning_rate': 5.279420189721557e-07, 'epoch': 2.6}
87%|████████▋ | 9995/11526 [1:44:23<15:40, 1.63it/s] 87%|████████▋ | 9996/11526 [1:44:24<15:39, 1.63it/s] {'loss': 0.1388, 'grad_norm': 0.6301333904266357, 'learning_rate': 5.272649557299498e-07, 'epoch': 2.6}
87%|████████▋ | 9996/11526 [1:44:24<15:39, 1.63it/s] 87%|████████▋ | 9997/11526 [1:44:24<15:39, 1.63it/s] {'loss': 0.1135, 'grad_norm': 0.5541066527366638, 'learning_rate': 5.26588302752451e-07, 'epoch': 2.6}
87%|████████▋ | 9997/11526 [1:44:24<15:39, 1.63it/s] 87%|████████▋ | 9998/11526 [1:44:25<15:38, 1.63it/s] {'loss': 0.1703, 'grad_norm': 0.640378475189209, 'learning_rate': 5.259120601017253e-07, 'epoch': 2.6}
87%|████████▋ | 9998/11526 [1:44:25<15:38, 1.63it/s] 87%|████████▋ | 9999/11526 [1:44:26<15:37, 1.63it/s] {'loss': 0.1395, 'grad_norm': 0.5447047352790833, 'learning_rate': 5.252362278398027e-07, 'epoch': 2.6}
87%|████████▋ | 9999/11526 [1:44:26<15:37, 1.63it/s] 87%|████████▋ | 10000/11526 [1:44:26<15:40, 1.62it/s] {'loss': 0.1677, 'grad_norm': 0.644564688205719, 'learning_rate': 5.245608060286744e-07, 'epoch': 2.6}
87%|████████▋ | 10000/11526 [1:44:26<15:40, 1.62it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.23it/s]
31%|███ | 4/13 [00:00<00:01, 8.37it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.76it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.40it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.16it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.00it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5436583757400513, 'eval_runtime': 1.9569, 'eval_samples_per_second': 102.203, 'eval_steps_per_second': 6.643, 'epoch': 2.6}
87%|████████▋ | 10000/11526 [1:44:28<15:40, 1.62it/s]
 87%|████████▋ | 10001/11526 [1:44:44<2:29:33, 5.88s/it] {'loss': 0.1768, 'grad_norm': 0.7560064196586609, 'learning_rate': 5.238857947302911e-07, 'epoch': 2.6}
87%|████████▋ | 10001/11526 [1:44:44<2:29:33, 5.88s/it] 87%|████████▋ | 10002/11526 [1:44:45<1:49:17, 4.30s/it] {'loss': 0.1368, 'grad_norm': 0.5181815028190613, 'learning_rate': 5.232111940065726e-07, 'epoch': 2.6}
87%|████████▋ | 10002/11526 [1:44:45<1:49:17, 4.30s/it] 87%|████████▋ | 10003/11526 [1:44:46<1:21:08, 3.20s/it] {'loss': 0.1278, 'grad_norm': 0.5709829330444336, 'learning_rate': 5.225370039193944e-07, 'epoch': 2.6}
87%|████████▋ | 10003/11526 [1:44:46<1:21:08, 3.20s/it] 87%|████████▋ | 10004/11526 [1:44:46<1:01:25, 2.42s/it] {'loss': 0.1407, 'grad_norm': 0.5411698818206787, 'learning_rate': 5.218632245305982e-07, 'epoch': 2.6}
87%|████████▋ | 10004/11526 [1:44:46<1:01:25, 2.42s/it] 87%|████████▋ | 10005/11526 [1:44:47<47:38, 1.88s/it] {'loss': 0.1944, 'grad_norm': 0.6547865271568298, 'learning_rate': 5.211898559019873e-07, 'epoch': 2.6}
87%|████████▋ | 10005/11526 [1:44:47<47:38, 1.88s/it] 87%|████████▋ | 10006/11526 [1:44:47<38:01, 1.50s/it] {'loss': 0.1093, 'grad_norm': 0.4714069068431854, 'learning_rate': 5.205168980953241e-07, 'epoch': 2.6}
87%|████████▋ | 10006/11526 [1:44:48<38:01, 1.50s/it] 87%|████████▋ | 10007/11526 [1:44:48<31:16, 1.24s/it] {'loss': 0.1476, 'grad_norm': 0.6637877225875854, 'learning_rate': 5.198443511723395e-07, 'epoch': 2.6}
87%|████████▋ | 10007/11526 [1:44:48<31:16, 1.24s/it] 87%|████████▋ | 10008/11526 [1:44:49<26:32, 1.05s/it] {'loss': 0.1756, 'grad_norm': 0.6164794564247131, 'learning_rate': 5.191722151947227e-07, 'epoch': 2.6}
87%|████████▋ | 10008/11526 [1:44:49<26:32, 1.05s/it] 87%|████████▋ | 10009/11526 [1:44:49<23:13, 1.09it/s] {'loss': 0.1291, 'grad_norm': 0.5119339823722839, 'learning_rate': 5.185004902241241e-07, 'epoch': 2.61}
87%|████████▋ | 10009/11526 [1:44:49<23:13, 1.09it/s] 87%|████████▋ | 10010/11526 [1:44:50<20:54, 1.21it/s] {'loss': 0.1724, 'grad_norm': 0.6492407917976379, 'learning_rate': 5.178291763221593e-07, 'epoch': 2.61}
87%|████████▋ | 10010/11526 [1:44:50<20:54, 1.21it/s] 87%|████████▋ | 10011/11526 [1:44:50<19:18, 1.31it/s] {'loss': 0.1671, 'grad_norm': 0.6558176875114441, 'learning_rate': 5.171582735504054e-07, 'epoch': 2.61}
87%|████████▋ | 10011/11526 [1:44:51<19:18, 1.31it/s] 87%|████████▋ | 10012/11526 [1:44:51<18:09, 1.39it/s] {'loss': 0.1636, 'grad_norm': 0.6715126037597656, 'learning_rate': 5.164877819704006e-07, 'epoch': 2.61}
87%|████████▋ | 10012/11526 [1:44:51<18:09, 1.39it/s] 87%|████████▋ | 10013/11526 [1:44:52<17:21, 1.45it/s] {'loss': 0.1166, 'grad_norm': 0.49093785881996155, 'learning_rate': 5.158177016436478e-07, 'epoch': 2.61}
87%|████████▋ | 10013/11526 [1:44:52<17:21, 1.45it/s] 87%|████████▋ | 10014/11526 [1:44:52<16:46, 1.50it/s] {'loss': 0.1619, 'grad_norm': 0.6236550211906433, 'learning_rate': 5.151480326316072e-07, 'epoch': 2.61}
87%|████████▋ | 10014/11526 [1:44:52<16:46, 1.50it/s] 87%|████████▋ | 10015/11526 [1:44:53<16:22, 1.54it/s] {'loss': 0.1807, 'grad_norm': 0.6722577214241028, 'learning_rate': 5.144787749957075e-07, 'epoch': 2.61}
87%|████████▋ | 10015/11526 [1:44:53<16:22, 1.54it/s] 87%|████████▋ | 10016/11526 [1:44:54<16:08, 1.56it/s] {'loss': 0.1296, 'grad_norm': 0.6742727160453796, 'learning_rate': 5.138099287973375e-07, 'epoch': 2.61}
87%|████████▋ | 10016/11526 [1:44:54<16:08, 1.56it/s] 87%|████████▋ | 10017/11526 [1:44:54<15:55, 1.58it/s] {'loss': 0.1847, 'grad_norm': 0.6796334385871887, 'learning_rate': 5.131414940978452e-07, 'epoch': 2.61}
87%|████████▋ | 10017/11526 [1:44:54<15:55, 1.58it/s] 87%|████████▋ | 10018/11526 [1:44:55<15:46, 1.59it/s] {'loss': 0.1427, 'grad_norm': 0.5993272066116333, 'learning_rate': 5.124734709585439e-07, 'epoch': 2.61}
87%|████████▋ | 10018/11526 [1:44:55<15:46, 1.59it/s] 87%|████████▋ | 10019/11526 [1:44:55<15:39, 1.60it/s] {'loss': 0.1349, 'grad_norm': 0.6190671324729919, 'learning_rate': 5.118058594407093e-07, 'epoch': 2.61}
87%|████████▋ | 10019/11526 [1:44:56<15:39, 1.60it/s] 87%|████████▋ | 10020/11526 [1:44:56<15:34, 1.61it/s] {'loss': 0.1295, 'grad_norm': 0.541154682636261, 'learning_rate': 5.111386596055779e-07, 'epoch': 2.61}
87%|████████▋ | 10020/11526 [1:44:56<15:34, 1.61it/s] 87%|████████▋ | 10021/11526 [1:44:57<15:32, 1.61it/s] {'loss': 0.1565, 'grad_norm': 0.7618451714515686, 'learning_rate': 5.104718715143497e-07, 'epoch': 2.61}
87%|████████▋ | 10021/11526 [1:44:57<15:32, 1.61it/s] 87%|████████▋ | 10022/11526 [1:44:57<15:29, 1.62it/s] {'loss': 0.1899, 'grad_norm': 0.6855851411819458, 'learning_rate': 5.098054952281856e-07, 'epoch': 2.61}
87%|████████▋ | 10022/11526 [1:44:57<15:29, 1.62it/s] 87%|████████▋ | 10023/11526 [1:44:58<15:27, 1.62it/s] {'loss': 0.213, 'grad_norm': 0.6819066405296326, 'learning_rate': 5.091395308082081e-07, 'epoch': 2.61}
87%|████████▋ | 10023/11526 [1:44:58<15:27, 1.62it/s] 87%|████████▋ | 10024/11526 [1:44:58<15:26, 1.62it/s] {'loss': 0.148, 'grad_norm': 0.5302940607070923, 'learning_rate': 5.084739783155068e-07, 'epoch': 2.61}
87%|████████▋ | 10024/11526 [1:44:59<15:26, 1.62it/s] 87%|████████▋ | 10025/11526 [1:44:59<15:25, 1.62it/s] {'loss': 0.1608, 'grad_norm': 0.676024317741394, 'learning_rate': 5.078088378111273e-07, 'epoch': 2.61}
87%|████████▋ | 10025/11526 [1:44:59<15:25, 1.62it/s] 87%|████████▋ | 10026/11526 [1:45:00<15:28, 1.62it/s] {'loss': 0.1724, 'grad_norm': 0.6819131374359131, 'learning_rate': 5.071441093560803e-07, 'epoch': 2.61}
87%|████████▋ | 10026/11526 [1:45:00<15:28, 1.62it/s] 87%|████████▋ | 10027/11526 [1:45:00<15:26, 1.62it/s] {'loss': 0.1273, 'grad_norm': 0.5131519436836243, 'learning_rate': 5.064797930113402e-07, 'epoch': 2.61}
87%|████████▋ | 10027/11526 [1:45:00<15:26, 1.62it/s] 87%|████████▋ | 10028/11526 [1:45:01<15:24, 1.62it/s] {'loss': 0.1537, 'grad_norm': 0.6238626837730408, 'learning_rate': 5.058158888378384e-07, 'epoch': 2.61}
87%|████████▋ | 10028/11526 [1:45:01<15:24, 1.62it/s] 87%|████████▋ | 10029/11526 [1:45:02<15:22, 1.62it/s] {'loss': 0.1684, 'grad_norm': 0.640746533870697, 'learning_rate': 5.051523968964761e-07, 'epoch': 2.61}
87%|████████▋ | 10029/11526 [1:45:02<15:22, 1.62it/s] 87%|████████▋ | 10030/11526 [1:45:02<15:20, 1.62it/s] {'loss': 0.1507, 'grad_norm': 0.5591544508934021, 'learning_rate': 5.044893172481097e-07, 'epoch': 2.61}
87%|████████▋ | 10030/11526 [1:45:02<15:20, 1.62it/s] 87%|████████▋ | 10031/11526 [1:45:03<15:21, 1.62it/s] {'loss': 0.1416, 'grad_norm': 0.7503154873847961, 'learning_rate': 5.03826649953561e-07, 'epoch': 2.61}
87%|████████▋ | 10031/11526 [1:45:03<15:21, 1.62it/s] 87%|████████▋ | 10032/11526 [1:45:03<15:19, 1.62it/s] {'loss': 0.1522, 'grad_norm': 0.5465666055679321, 'learning_rate': 5.031643950736148e-07, 'epoch': 2.61}
87%|████████▋ | 10032/11526 [1:45:04<15:19, 1.62it/s] 87%|████████▋ | 10033/11526 [1:45:04<15:18, 1.63it/s] {'loss': 0.1397, 'grad_norm': 0.5202392935752869, 'learning_rate': 5.025025526690158e-07, 'epoch': 2.61}
87%|████████▋ | 10033/11526 [1:45:04<15:18, 1.63it/s] 87%|████████▋ | 10034/11526 [1:45:05<15:17, 1.63it/s] {'loss': 0.1254, 'grad_norm': 0.5965646505355835, 'learning_rate': 5.018411228004727e-07, 'epoch': 2.61}
87%|████████▋ | 10034/11526 [1:45:05<15:17, 1.63it/s] 87%|████████▋ | 10035/11526 [1:45:05<15:17, 1.62it/s] {'loss': 0.1283, 'grad_norm': 0.5489080548286438, 'learning_rate': 5.011801055286563e-07, 'epoch': 2.61}
87%|████████▋ | 10035/11526 [1:45:05<15:17, 1.62it/s] 87%|████████▋ | 10036/11526 [1:45:06<15:16, 1.62it/s] {'loss': 0.1516, 'grad_norm': 0.5852118134498596, 'learning_rate': 5.005195009141966e-07, 'epoch': 2.61}
87%|████████▋ | 10036/11526 [1:45:06<15:16, 1.62it/s] 87%|████████▋ | 10037/11526 [1:45:06<15:15, 1.63it/s] {'loss': 0.1684, 'grad_norm': 0.7132872939109802, 'learning_rate': 4.998593090176895e-07, 'epoch': 2.61}
87%|████████▋ | 10037/11526 [1:45:07<15:15, 1.63it/s] 87%|████████▋ | 10038/11526 [1:45:07<15:14, 1.63it/s] {'loss': 0.1569, 'grad_norm': 0.5835288763046265, 'learning_rate': 4.99199529899691e-07, 'epoch': 2.61}
87%|████████▋ | 10038/11526 [1:45:07<15:14, 1.63it/s] 87%|████████▋ | 10039/11526 [1:45:08<15:14, 1.63it/s] {'loss': 0.1967, 'grad_norm': 0.7417382597923279, 'learning_rate': 4.985401636207204e-07, 'epoch': 2.61}
87%|████████▋ | 10039/11526 [1:45:08<15:14, 1.63it/s] 87%|████████▋ | 10040/11526 [1:45:08<15:13, 1.63it/s] {'loss': 0.171, 'grad_norm': 0.6250292062759399, 'learning_rate': 4.978812102412589e-07, 'epoch': 2.61}
87%|████████▋ | 10040/11526 [1:45:08<15:13, 1.63it/s] 87%|████████▋ | 10041/11526 [1:45:09<15:14, 1.62it/s] {'loss': 0.1429, 'grad_norm': 0.5790244936943054, 'learning_rate': 4.972226698217475e-07, 'epoch': 2.61}
87%|████████▋ | 10041/11526 [1:45:09<15:14, 1.62it/s] 87%|████████▋ | 10042/11526 [1:45:10<15:13, 1.63it/s] {'loss': 0.1321, 'grad_norm': 0.4696795642375946, 'learning_rate': 4.965645424225928e-07, 'epoch': 2.61}
87%|████████▋ | 10042/11526 [1:45:10<15:13, 1.63it/s] 87%|████████▋ | 10043/11526 [1:45:10<15:12, 1.63it/s] {'loss': 0.1771, 'grad_norm': 0.786300003528595, 'learning_rate': 4.959068281041629e-07, 'epoch': 2.61}
87%|████████▋ | 10043/11526 [1:45:10<15:12, 1.63it/s] 87%|████████▋ | 10044/11526 [1:45:11<15:11, 1.63it/s] {'loss': 0.2249, 'grad_norm': 0.9997169375419617, 'learning_rate': 4.952495269267855e-07, 'epoch': 2.61}
87%|████████▋ | 10044/11526 [1:45:11<15:11, 1.63it/s] 87%|████████▋ | 10045/11526 [1:45:11<15:10, 1.63it/s] {'loss': 0.1563, 'grad_norm': 0.7291524410247803, 'learning_rate': 4.945926389507522e-07, 'epoch': 2.61}
87%|████████▋ | 10045/11526 [1:45:12<15:10, 1.63it/s] 87%|████████▋ | 10046/11526 [1:45:12<15:15, 1.62it/s] {'loss': 0.1737, 'grad_norm': 0.6559407711029053, 'learning_rate': 4.939361642363166e-07, 'epoch': 2.61}
87%|████████▋ | 10046/11526 [1:45:12<15:15, 1.62it/s] 87%|████████▋ | 10047/11526 [1:45:13<15:12, 1.62it/s] {'loss': 0.1388, 'grad_norm': 0.5750887989997864, 'learning_rate': 4.932801028436945e-07, 'epoch': 2.62}
87%|████████▋ | 10047/11526 [1:45:13<15:12, 1.62it/s] 87%|████████▋ | 10048/11526 [1:45:13<15:10, 1.62it/s] {'loss': 0.1522, 'grad_norm': 0.5569429397583008, 'learning_rate': 4.926244548330645e-07, 'epoch': 2.62}
87%|████████▋ | 10048/11526 [1:45:13<15:10, 1.62it/s] 87%|████████▋ | 10049/11526 [1:45:14<15:08, 1.63it/s] {'loss': 0.1642, 'grad_norm': 0.5681940317153931, 'learning_rate': 4.919692202645642e-07, 'epoch': 2.62}
87%|████████▋ | 10049/11526 [1:45:14<15:08, 1.63it/s] 87%|████████▋ | 10050/11526 [1:45:14<15:07, 1.63it/s] {'loss': 0.1847, 'grad_norm': 0.7129523158073425, 'learning_rate': 4.91314399198296e-07, 'epoch': 2.62}
87%|████████▋ | 10050/11526 [1:45:15<15:07, 1.63it/s] 87%|████████▋ | 10051/11526 [1:45:15<15:07, 1.63it/s] {'loss': 0.1232, 'grad_norm': 0.5348122119903564, 'learning_rate': 4.906599916943266e-07, 'epoch': 2.62}
87%|████████▋ | 10051/11526 [1:45:15<15:07, 1.63it/s] 87%|████████▋ | 10052/11526 [1:45:16<15:06, 1.63it/s] {'loss': 0.1307, 'grad_norm': 0.5128976702690125, 'learning_rate': 4.900059978126787e-07, 'epoch': 2.62}
87%|████████▋ | 10052/11526 [1:45:16<15:06, 1.63it/s] 87%|████████▋ | 10053/11526 [1:45:16<15:05, 1.63it/s] {'loss': 0.1465, 'grad_norm': 0.5597420930862427, 'learning_rate': 4.893524176133413e-07, 'epoch': 2.62}
87%|████████▋ | 10053/11526 [1:45:16<15:05, 1.63it/s] 87%|████████▋ | 10054/11526 [1:45:17<15:04, 1.63it/s] {'loss': 0.1434, 'grad_norm': 0.5531116724014282, 'learning_rate': 4.886992511562655e-07, 'epoch': 2.62}
87%|████████▋ | 10054/11526 [1:45:17<15:04, 1.63it/s] 87%|████████▋ | 10055/11526 [1:45:18<15:04, 1.63it/s] {'loss': 0.1968, 'grad_norm': 0.6751054525375366, 'learning_rate': 4.880464985013611e-07, 'epoch': 2.62}
87%|████████▋ | 10055/11526 [1:45:18<15:04, 1.63it/s] 87%|████████▋ | 10056/11526 [1:45:18<15:04, 1.62it/s] {'loss': 0.1366, 'grad_norm': 0.5515098571777344, 'learning_rate': 4.873941597085058e-07, 'epoch': 2.62}
87%|████████▋ | 10056/11526 [1:45:18<15:04, 1.62it/s] 87%|████████▋ | 10057/11526 [1:45:19<15:03, 1.63it/s] {'loss': 0.1412, 'grad_norm': 0.545378565788269, 'learning_rate': 4.867422348375328e-07, 'epoch': 2.62}
87%|████████▋ | 10057/11526 [1:45:19<15:03, 1.63it/s] 87%|████████▋ | 10058/11526 [1:45:19<15:01, 1.63it/s] {'loss': 0.1289, 'grad_norm': 0.5554912686347961, 'learning_rate': 4.86090723948241e-07, 'epoch': 2.62}
87%|████████▋ | 10058/11526 [1:45:20<15:01, 1.63it/s] 87%|████████▋ | 10059/11526 [1:45:20<15:01, 1.63it/s] {'loss': 0.1203, 'grad_norm': 0.5542355179786682, 'learning_rate': 4.854396271003914e-07, 'epoch': 2.62}
87%|████████▋ | 10059/11526 [1:45:20<15:01, 1.63it/s] 87%|████████▋ | 10060/11526 [1:45:21<15:00, 1.63it/s] {'loss': 0.1419, 'grad_norm': 0.601361870765686, 'learning_rate': 4.847889443537057e-07, 'epoch': 2.62}
87%|████████▋ | 10060/11526 [1:45:21<15:00, 1.63it/s] 87%|████████▋ | 10061/11526 [1:45:21<15:05, 1.62it/s] {'loss': 0.1394, 'grad_norm': 0.6468157768249512, 'learning_rate': 4.84138675767869e-07, 'epoch': 2.62}
87%|████████▋ | 10061/11526 [1:45:21<15:05, 1.62it/s] 87%|████████▋ | 10062/11526 [1:45:22<15:03, 1.62it/s] {'loss': 0.1408, 'grad_norm': 0.5689977407455444, 'learning_rate': 4.83488821402528e-07, 'epoch': 2.62}
87%|████████▋ | 10062/11526 [1:45:22<15:03, 1.62it/s] 87%|████████▋ | 10063/11526 [1:45:22<15:01, 1.62it/s] {'loss': 0.1228, 'grad_norm': 0.4866063594818115, 'learning_rate': 4.828393813172883e-07, 'epoch': 2.62}
87%|████████▋ | 10063/11526 [1:45:23<15:01, 1.62it/s] 87%|████████▋ | 10064/11526 [1:45:23<14:59, 1.62it/s] {'loss': 0.1612, 'grad_norm': 0.6707834005355835, 'learning_rate': 4.821903555717239e-07, 'epoch': 2.62}
87%|████████▋ | 10064/11526 [1:45:23<14:59, 1.62it/s] 87%|████████▋ | 10065/11526 [1:45:24<14:58, 1.63it/s] {'loss': 0.1712, 'grad_norm': 0.6738183498382568, 'learning_rate': 4.815417442253639e-07, 'epoch': 2.62}
87%|████████▋ | 10065/11526 [1:45:24<14:58, 1.63it/s] 87%|████████▋ | 10066/11526 [1:45:24<14:59, 1.62it/s] {'loss': 0.207, 'grad_norm': 0.7209003567695618, 'learning_rate': 4.808935473377046e-07, 'epoch': 2.62}
87%|████████▋ | 10066/11526 [1:45:24<14:59, 1.62it/s] 87%|████████▋ | 10067/11526 [1:45:25<14:58, 1.62it/s] {'loss': 0.1534, 'grad_norm': 0.6035622358322144, 'learning_rate': 4.802457649682018e-07, 'epoch': 2.62}
87%|████████▋ | 10067/11526 [1:45:25<14:58, 1.62it/s] 87%|████████▋ | 10068/11526 [1:45:26<14:57, 1.62it/s] {'loss': 0.1584, 'grad_norm': 0.575991690158844, 'learning_rate': 4.795983971762741e-07, 'epoch': 2.62}
87%|████████▋ | 10068/11526 [1:45:26<14:57, 1.62it/s] 87%|████████▋ | 10069/11526 [1:45:26<14:56, 1.63it/s] {'loss': 0.1483, 'grad_norm': 0.5809279084205627, 'learning_rate': 4.789514440213011e-07, 'epoch': 2.62}
87%|████████▋ | 10069/11526 [1:45:26<14:56, 1.63it/s] 87%|████████▋ | 10070/11526 [1:45:27<14:54, 1.63it/s] {'loss': 0.1979, 'grad_norm': 0.8196195960044861, 'learning_rate': 4.78304905562626e-07, 'epoch': 2.62}
87%|████████▋ | 10070/11526 [1:45:27<14:54, 1.63it/s] 87%|████████▋ | 10071/11526 [1:45:27<14:54, 1.63it/s] {'loss': 0.183, 'grad_norm': 0.672692060470581, 'learning_rate': 4.776587818595519e-07, 'epoch': 2.62}
87%|████████▋ | 10071/11526 [1:45:28<14:54, 1.63it/s] 87%|████████▋ | 10072/11526 [1:45:28<14:54, 1.63it/s] {'loss': 0.1242, 'grad_norm': 0.5220645070075989, 'learning_rate': 4.770130729713451e-07, 'epoch': 2.62}
87%|████████▋ | 10072/11526 [1:45:28<14:54, 1.63it/s] 87%|████████▋ | 10073/11526 [1:45:29<14:53, 1.63it/s] {'loss': 0.1288, 'grad_norm': 0.5216031670570374, 'learning_rate': 4.7636777895723406e-07, 'epoch': 2.62}
87%|████████▋ | 10073/11526 [1:45:29<14:53, 1.63it/s] 87%|████████▋ | 10074/11526 [1:45:29<14:52, 1.63it/s] {'loss': 0.1361, 'grad_norm': 0.6173685789108276, 'learning_rate': 4.7572289987640853e-07, 'epoch': 2.62}
87%|████████▋ | 10074/11526 [1:45:29<14:52, 1.63it/s] 87%|████████▋ | 10075/11526 [1:45:30<14:52, 1.63it/s] {'loss': 0.1185, 'grad_norm': 0.5139328837394714, 'learning_rate': 4.7507843578802114e-07, 'epoch': 2.62}
87%|████████▋ | 10075/11526 [1:45:30<14:52, 1.63it/s] 87%|████████▋ | 10076/11526 [1:45:30<14:52, 1.62it/s] {'loss': 0.1495, 'grad_norm': 0.6462364792823792, 'learning_rate': 4.744343867511858e-07, 'epoch': 2.62}
87%|████████▋ | 10076/11526 [1:45:31<14:52, 1.62it/s] 87%|████████▋ | 10077/11526 [1:45:31<14:51, 1.63it/s] {'loss': 0.2016, 'grad_norm': 0.8228598833084106, 'learning_rate': 4.7379075282497635e-07, 'epoch': 2.62}
87%|████████▋ | 10077/11526 [1:45:31<14:51, 1.63it/s] 87%|████████▋ | 10078/11526 [1:45:32<14:51, 1.62it/s] {'loss': 0.1713, 'grad_norm': 0.6673421263694763, 'learning_rate': 4.731475340684344e-07, 'epoch': 2.62}
87%|████████▋ | 10078/11526 [1:45:32<14:51, 1.62it/s] 87%|████████▋ | 10079/11526 [1:45:32<14:50, 1.63it/s] {'loss': 0.1654, 'grad_norm': 0.6394652128219604, 'learning_rate': 4.7250473054055555e-07, 'epoch': 2.62}
87%|████████▋ | 10079/11526 [1:45:32<14:50, 1.63it/s] 87%|████████▋ | 10080/11526 [1:45:33<14:48, 1.63it/s] {'loss': 0.1939, 'grad_norm': 0.7280358672142029, 'learning_rate': 4.718623423003038e-07, 'epoch': 2.62}
87%|████████▋ | 10080/11526 [1:45:33<14:48, 1.63it/s] 87%|████████▋ | 10081/11526 [1:45:34<14:49, 1.62it/s] {'loss': 0.1636, 'grad_norm': 0.645695149898529, 'learning_rate': 4.71220369406602e-07, 'epoch': 2.62}
87%|████████▋ | 10081/11526 [1:45:34<14:49, 1.62it/s] 87%|████████▋ | 10082/11526 [1:45:34<14:48, 1.63it/s] {'loss': 0.1088, 'grad_norm': 0.46256640553474426, 'learning_rate': 4.705788119183352e-07, 'epoch': 2.62}
87%|████████▋ | 10082/11526 [1:45:34<14:48, 1.63it/s] 87%|████████▋ | 10083/11526 [1:45:35<14:47, 1.63it/s] {'loss': 0.1614, 'grad_norm': 0.701056957244873, 'learning_rate': 4.6993766989435144e-07, 'epoch': 2.62}
87%|████████▋ | 10083/11526 [1:45:35<14:47, 1.63it/s] 87%|████████▋ | 10084/11526 [1:45:35<14:47, 1.63it/s] {'loss': 0.1392, 'grad_norm': 0.6329087615013123, 'learning_rate': 4.6929694339346076e-07, 'epoch': 2.62}
87%|████████▋ | 10084/11526 [1:45:36<14:47, 1.63it/s] 87%|████████▋ | 10085/11526 [1:45:36<14:45, 1.63it/s] {'loss': 0.1192, 'grad_norm': 0.4698686897754669, 'learning_rate': 4.686566324744318e-07, 'epoch': 2.62}
87%|████████▋ | 10085/11526 [1:45:36<14:45, 1.63it/s] 88%|████████▊ | 10086/11526 [1:45:37<14:46, 1.62it/s] {'loss': 0.1168, 'grad_norm': 0.46002063155174255, 'learning_rate': 4.680167371959987e-07, 'epoch': 2.63}
88%|████████▊ | 10086/11526 [1:45:37<14:46, 1.62it/s] 88%|████████▊ | 10087/11526 [1:45:37<14:44, 1.63it/s] {'loss': 0.1786, 'grad_norm': 0.6839166879653931, 'learning_rate': 4.673772576168567e-07, 'epoch': 2.63}
88%|████████▊ | 10087/11526 [1:45:37<14:44, 1.63it/s] 88%|████████▊ | 10088/11526 [1:45:38<14:45, 1.62it/s] {'loss': 0.1424, 'grad_norm': 0.5850992202758789, 'learning_rate': 4.6673819379566164e-07, 'epoch': 2.63}
88%|████████▊ | 10088/11526 [1:45:38<14:45, 1.62it/s] 88%|████████▊ | 10089/11526 [1:45:38<14:44, 1.62it/s] {'loss': 0.1587, 'grad_norm': 0.6720302104949951, 'learning_rate': 4.6609954579103387e-07, 'epoch': 2.63}
88%|████████▊ | 10089/11526 [1:45:39<14:44, 1.62it/s] 88%|████████▊ | 10090/11526 [1:45:39<14:43, 1.62it/s] {'loss': 0.1254, 'grad_norm': 0.478282630443573, 'learning_rate': 4.65461313661551e-07, 'epoch': 2.63}
88%|████████▊ | 10090/11526 [1:45:39<14:43, 1.62it/s] 88%|████████▊ | 10091/11526 [1:45:40<14:43, 1.62it/s] {'loss': 0.1389, 'grad_norm': 0.8324794173240662, 'learning_rate': 4.6482349746575783e-07, 'epoch': 2.63}
88%|████████▊ | 10091/11526 [1:45:40<14:43, 1.62it/s] 88%|████████▊ | 10092/11526 [1:45:40<14:41, 1.63it/s] {'loss': 0.1791, 'grad_norm': 0.8360331654548645, 'learning_rate': 4.6418609726215813e-07, 'epoch': 2.63}
88%|████████▊ | 10092/11526 [1:45:40<14:41, 1.63it/s] 88%|████████▊ | 10093/11526 [1:45:41<14:41, 1.63it/s] {'loss': 0.1475, 'grad_norm': 0.5654125809669495, 'learning_rate': 4.635491131092162e-07, 'epoch': 2.63}
88%|████████▊ | 10093/11526 [1:45:41<14:41, 1.63it/s] 88%|████████▊ | 10094/11526 [1:45:42<14:40, 1.63it/s] {'loss': 0.1463, 'grad_norm': 0.6117399334907532, 'learning_rate': 4.6291254506536154e-07, 'epoch': 2.63}
88%|████████▊ | 10094/11526 [1:45:42<14:40, 1.63it/s] 88%|████████▊ | 10095/11526 [1:45:42<14:39, 1.63it/s] {'loss': 0.1409, 'grad_norm': 0.5695201754570007, 'learning_rate': 4.6227639318898287e-07, 'epoch': 2.63}
88%|████████▊ | 10095/11526 [1:45:42<14:39, 1.63it/s] 88%|████████▊ | 10096/11526 [1:45:43<14:39, 1.63it/s] {'loss': 0.1219, 'grad_norm': 0.49685773253440857, 'learning_rate': 4.616406575384325e-07, 'epoch': 2.63}
88%|████████▊ | 10096/11526 [1:45:43<14:39, 1.63it/s] 88%|████████▊ | 10097/11526 [1:45:43<14:38, 1.63it/s] {'loss': 0.1427, 'grad_norm': 0.9298902153968811, 'learning_rate': 4.6100533817202366e-07, 'epoch': 2.63}
88%|████████▊ | 10097/11526 [1:45:44<14:38, 1.63it/s] 88%|████████▊ | 10098/11526 [1:45:44<14:37, 1.63it/s] {'loss': 0.1483, 'grad_norm': 0.5522202253341675, 'learning_rate': 4.6037043514802984e-07, 'epoch': 2.63}
88%|████████▊ | 10098/11526 [1:45:44<14:37, 1.63it/s] 88%|████████▊ | 10099/11526 [1:45:45<14:36, 1.63it/s] {'loss': 0.1323, 'grad_norm': 0.5623466372489929, 'learning_rate': 4.597359485246888e-07, 'epoch': 2.63}
88%|████████▊ | 10099/11526 [1:45:45<14:36, 1.63it/s] 88%|████████▊ | 10100/11526 [1:45:45<14:35, 1.63it/s] {'loss': 0.1596, 'grad_norm': 0.6395969986915588, 'learning_rate': 4.591018783602008e-07, 'epoch': 2.63}
88%|████████▊ | 10100/11526 [1:45:45<14:35, 1.63it/s] 88%|████████▊ | 10101/11526 [1:45:46<14:36, 1.63it/s] {'loss': 0.1385, 'grad_norm': 0.4853271543979645, 'learning_rate': 4.584682247127248e-07, 'epoch': 2.63}
88%|████████▊ | 10101/11526 [1:45:46<14:36, 1.63it/s] 88%|████████▊ | 10102/11526 [1:45:46<14:35, 1.63it/s] {'loss': 0.1562, 'grad_norm': 0.7610353231430054, 'learning_rate': 4.5783498764038317e-07, 'epoch': 2.63}
88%|████████▊ | 10102/11526 [1:45:47<14:35, 1.63it/s] 88%|████████▊ | 10103/11526 [1:45:47<14:34, 1.63it/s] {'loss': 0.1449, 'grad_norm': 0.5933364033699036, 'learning_rate': 4.5720216720126055e-07, 'epoch': 2.63}
88%|████████▊ | 10103/11526 [1:45:47<14:34, 1.63it/s] 88%|████████▊ | 10104/11526 [1:45:48<14:33, 1.63it/s] {'loss': 0.1753, 'grad_norm': 0.6841654777526855, 'learning_rate': 4.5656976345340277e-07, 'epoch': 2.63}
88%|████████▊ | 10104/11526 [1:45:48<14:33, 1.63it/s] 88%|████████▊ | 10105/11526 [1:45:48<14:33, 1.63it/s] {'loss': 0.148, 'grad_norm': 0.6065412759780884, 'learning_rate': 4.5593777645481786e-07, 'epoch': 2.63}
88%|████████▊ | 10105/11526 [1:45:48<14:33, 1.63it/s] 88%|████████▊ | 10106/11526 [1:45:49<14:33, 1.63it/s] {'loss': 0.1253, 'grad_norm': 0.5082294940948486, 'learning_rate': 4.553062062634739e-07, 'epoch': 2.63}
88%|████████▊ | 10106/11526 [1:45:49<14:33, 1.63it/s] 88%|████████▊ | 10107/11526 [1:45:50<14:32, 1.63it/s] {'loss': 0.1835, 'grad_norm': 0.6114698052406311, 'learning_rate': 4.5467505293730283e-07, 'epoch': 2.63}
88%|████████▊ | 10107/11526 [1:45:50<14:32, 1.63it/s] 88%|████████▊ | 10108/11526 [1:45:50<14:32, 1.63it/s] {'loss': 0.1789, 'grad_norm': 0.6001296639442444, 'learning_rate': 4.540443165341979e-07, 'epoch': 2.63}
88%|████████▊ | 10108/11526 [1:45:50<14:32, 1.63it/s] 88%|████████▊ | 10109/11526 [1:45:51<14:31, 1.63it/s] {'loss': 0.1422, 'grad_norm': 0.5362748503684998, 'learning_rate': 4.5341399711201384e-07, 'epoch': 2.63}
88%|████████▊ | 10109/11526 [1:45:51<14:31, 1.63it/s] 88%|████████▊ | 10110/11526 [1:45:51<14:30, 1.63it/s] {'loss': 0.1428, 'grad_norm': 0.5685347318649292, 'learning_rate': 4.5278409472856664e-07, 'epoch': 2.63}
88%|████████▊ | 10110/11526 [1:45:52<14:30, 1.63it/s] 88%|████████▊ | 10111/11526 [1:45:52<14:31, 1.62it/s] {'loss': 0.1231, 'grad_norm': 0.5221661329269409, 'learning_rate': 4.5215460944163623e-07, 'epoch': 2.63}
88%|████████▊ | 10111/11526 [1:45:52<14:31, 1.62it/s] 88%|████████▊ | 10112/11526 [1:45:53<14:30, 1.62it/s] {'loss': 0.1626, 'grad_norm': 0.6945394277572632, 'learning_rate': 4.515255413089592e-07, 'epoch': 2.63}
88%|████████▊ | 10112/11526 [1:45:53<14:30, 1.62it/s] 88%|████████▊ | 10113/11526 [1:45:53<14:29, 1.63it/s] {'loss': 0.1385, 'grad_norm': 0.5659116506576538, 'learning_rate': 4.5089689038824103e-07, 'epoch': 2.63}
88%|████████▊ | 10113/11526 [1:45:53<14:29, 1.63it/s] 88%|████████▊ | 10114/11526 [1:45:54<14:28, 1.62it/s] {'loss': 0.1571, 'grad_norm': 0.6006914973258972, 'learning_rate': 4.5026865673714224e-07, 'epoch': 2.63}
88%|████████▊ | 10114/11526 [1:45:54<14:28, 1.62it/s] 88%|████████▊ | 10115/11526 [1:45:54<14:27, 1.63it/s] {'loss': 0.1541, 'grad_norm': 0.653684675693512, 'learning_rate': 4.496408404132896e-07, 'epoch': 2.63}
88%|████████▊ | 10115/11526 [1:45:55<14:27, 1.63it/s] 88%|████████▊ | 10116/11526 [1:45:55<14:31, 1.62it/s] {'loss': 0.1416, 'grad_norm': 0.6285542249679565, 'learning_rate': 4.4901344147426975e-07, 'epoch': 2.63}
88%|████████▊ | 10116/11526 [1:45:55<14:31, 1.62it/s] 88%|████████▊ | 10117/11526 [1:45:56<14:29, 1.62it/s] {'loss': 0.1389, 'grad_norm': 0.5674165487289429, 'learning_rate': 4.4838645997763e-07, 'epoch': 2.63}
88%|████████▊ | 10117/11526 [1:45:56<14:29, 1.62it/s] 88%|████████▊ | 10118/11526 [1:45:56<14:27, 1.62it/s] {'loss': 0.1484, 'grad_norm': 0.6642251014709473, 'learning_rate': 4.4775989598088265e-07, 'epoch': 2.63}
88%|████████▊ | 10118/11526 [1:45:56<14:27, 1.62it/s] 88%|████████▊ | 10119/11526 [1:45:57<14:25, 1.63it/s] {'loss': 0.1415, 'grad_norm': 0.6085973381996155, 'learning_rate': 4.47133749541499e-07, 'epoch': 2.63}
88%|████████▊ | 10119/11526 [1:45:57<14:25, 1.63it/s] 88%|████████▊ | 10120/11526 [1:45:58<14:25, 1.63it/s] {'loss': 0.1121, 'grad_norm': 0.4855416715145111, 'learning_rate': 4.4650802071691255e-07, 'epoch': 2.63}
88%|████████▊ | 10120/11526 [1:45:58<14:25, 1.63it/s] 88%|████████▊ | 10121/11526 [1:45:58<14:25, 1.62it/s] {'loss': 0.1618, 'grad_norm': 0.6298828721046448, 'learning_rate': 4.458827095645185e-07, 'epoch': 2.63}
88%|████████▊ | 10121/11526 [1:45:58<14:25, 1.62it/s] 88%|████████▊ | 10122/11526 [1:45:59<14:23, 1.63it/s] {'loss': 0.1616, 'grad_norm': 0.5532139539718628, 'learning_rate': 4.4525781614167375e-07, 'epoch': 2.63}
88%|████████▊ | 10122/11526 [1:45:59<14:23, 1.63it/s] 88%|████████▊ | 10123/11526 [1:45:59<14:23, 1.63it/s] {'loss': 0.1736, 'grad_norm': 0.6255818009376526, 'learning_rate': 4.4463334050569796e-07, 'epoch': 2.63}
88%|████████▊ | 10123/11526 [1:46:00<14:23, 1.63it/s] 88%|████████▊ | 10124/11526 [1:46:00<14:22, 1.63it/s] {'loss': 0.128, 'grad_norm': 0.5058354139328003, 'learning_rate': 4.44009282713872e-07, 'epoch': 2.64}
88%|████████▊ | 10124/11526 [1:46:00<14:22, 1.63it/s] 88%|████████▊ | 10125/11526 [1:46:01<14:21, 1.63it/s] {'loss': 0.1484, 'grad_norm': 0.6198517680168152, 'learning_rate': 4.433856428234362e-07, 'epoch': 2.64}
88%|████████▊ | 10125/11526 [1:46:01<14:21, 1.63it/s] 88%|████████▊ | 10126/11526 [1:46:01<14:21, 1.62it/s] {'loss': 0.134, 'grad_norm': 0.5307363271713257, 'learning_rate': 4.4276242089159426e-07, 'epoch': 2.64}
88%|████████▊ | 10126/11526 [1:46:01<14:21, 1.62it/s] 88%|████████▊ | 10127/11526 [1:46:02<14:20, 1.63it/s] {'loss': 0.1636, 'grad_norm': 0.6500470638275146, 'learning_rate': 4.4213961697551434e-07, 'epoch': 2.64}
88%|████████▊ | 10127/11526 [1:46:02<14:20, 1.63it/s] 88%|████████▊ | 10128/11526 [1:46:02<14:19, 1.63it/s] {'loss': 0.1518, 'grad_norm': 0.541702389717102, 'learning_rate': 4.415172311323213e-07, 'epoch': 2.64}
88%|████████▊ | 10128/11526 [1:46:03<14:19, 1.63it/s] 88%|████████▊ | 10129/11526 [1:46:03<14:18, 1.63it/s] {'loss': 0.1258, 'grad_norm': 0.48311614990234375, 'learning_rate': 4.408952634191044e-07, 'epoch': 2.64}
88%|████████▊ | 10129/11526 [1:46:03<14:18, 1.63it/s] 88%|████████▊ | 10130/11526 [1:46:04<14:17, 1.63it/s] {'loss': 0.1599, 'grad_norm': 0.6052907109260559, 'learning_rate': 4.4027371389291416e-07, 'epoch': 2.64}
88%|████████▊ | 10130/11526 [1:46:04<14:17, 1.63it/s] 88%|████████▊ | 10131/11526 [1:46:04<14:17, 1.63it/s] {'loss': 0.1493, 'grad_norm': 0.6083870530128479, 'learning_rate': 4.3965258261076284e-07, 'epoch': 2.64}
88%|████████▊ | 10131/11526 [1:46:04<14:17, 1.63it/s] 88%|████████▊ | 10132/11526 [1:46:05<14:17, 1.63it/s] {'loss': 0.153, 'grad_norm': 0.5870916843414307, 'learning_rate': 4.390318696296247e-07, 'epoch': 2.64}
88%|████████▊ | 10132/11526 [1:46:05<14:17, 1.63it/s] 88%|████████▊ | 10133/11526 [1:46:06<14:16, 1.63it/s] {'loss': 0.1755, 'grad_norm': 0.6798765659332275, 'learning_rate': 4.384115750064333e-07, 'epoch': 2.64}
88%|████████▊ | 10133/11526 [1:46:06<14:16, 1.63it/s] 88%|████████▊ | 10134/11526 [1:46:06<14:15, 1.63it/s] {'loss': 0.1838, 'grad_norm': 0.622603178024292, 'learning_rate': 4.377916987980868e-07, 'epoch': 2.64}
88%|████████▊ | 10134/11526 [1:46:06<14:15, 1.63it/s] 88%|████████▊ | 10135/11526 [1:46:07<14:14, 1.63it/s] {'loss': 0.1369, 'grad_norm': 0.5531359314918518, 'learning_rate': 4.3717224106144374e-07, 'epoch': 2.64}
88%|████████▊ | 10135/11526 [1:46:07<14:14, 1.63it/s] 88%|████████▊ | 10136/11526 [1:46:07<14:15, 1.62it/s] {'loss': 0.2016, 'grad_norm': 0.6748229265213013, 'learning_rate': 4.3655320185332417e-07, 'epoch': 2.64}
88%|████████▊ | 10136/11526 [1:46:08<14:15, 1.62it/s] 88%|████████▊ | 10137/11526 [1:46:08<14:14, 1.63it/s] {'loss': 0.1577, 'grad_norm': 0.565331220626831, 'learning_rate': 4.359345812305094e-07, 'epoch': 2.64}
88%|████████▊ | 10137/11526 [1:46:08<14:14, 1.63it/s] 88%|████████▊ | 10138/11526 [1:46:09<14:12, 1.63it/s] {'loss': 0.1512, 'grad_norm': 0.5901281833648682, 'learning_rate': 4.3531637924974444e-07, 'epoch': 2.64}
88%|████████▊ | 10138/11526 [1:46:09<14:12, 1.63it/s] 88%|████████▊ | 10139/11526 [1:46:09<14:12, 1.63it/s] {'loss': 0.1547, 'grad_norm': 0.5979196429252625, 'learning_rate': 4.3469859596773136e-07, 'epoch': 2.64}
88%|████████▊ | 10139/11526 [1:46:09<14:12, 1.63it/s] 88%|████████▊ | 10140/11526 [1:46:10<14:11, 1.63it/s] {'loss': 0.1299, 'grad_norm': 0.5265081524848938, 'learning_rate': 4.340812314411397e-07, 'epoch': 2.64}
88%|████████▊ | 10140/11526 [1:46:10<14:11, 1.63it/s] 88%|████████▊ | 10141/11526 [1:46:10<14:12, 1.63it/s] {'loss': 0.129, 'grad_norm': 0.5370705723762512, 'learning_rate': 4.3346428572659693e-07, 'epoch': 2.64}
88%|████████▊ | 10141/11526 [1:46:11<14:12, 1.63it/s] 88%|████████▊ | 10142/11526 [1:46:11<14:11, 1.63it/s] {'loss': 0.1935, 'grad_norm': 0.7494415044784546, 'learning_rate': 4.3284775888069174e-07, 'epoch': 2.64}
88%|████████▊ | 10142/11526 [1:46:11<14:11, 1.63it/s] 88%|████████▊ | 10143/11526 [1:46:12<14:10, 1.63it/s] {'loss': 0.1723, 'grad_norm': 0.6799545884132385, 'learning_rate': 4.322316509599761e-07, 'epoch': 2.64}
88%|████████▊ | 10143/11526 [1:46:12<14:10, 1.63it/s] 88%|████████▊ | 10144/11526 [1:46:12<14:08, 1.63it/s] {'loss': 0.1297, 'grad_norm': 0.5269397497177124, 'learning_rate': 4.316159620209631e-07, 'epoch': 2.64}
88%|████████▊ | 10144/11526 [1:46:12<14:08, 1.63it/s] 88%|████████▊ | 10145/11526 [1:46:13<14:07, 1.63it/s] {'loss': 0.188, 'grad_norm': 0.7005610466003418, 'learning_rate': 4.3100069212012653e-07, 'epoch': 2.64}
88%|████████▊ | 10145/11526 [1:46:13<14:07, 1.63it/s] 88%|████████▊ | 10146/11526 [1:46:14<14:10, 1.62it/s] {'loss': 0.1579, 'grad_norm': 0.5919159054756165, 'learning_rate': 4.3038584131390446e-07, 'epoch': 2.64}
88%|████████▊ | 10146/11526 [1:46:14<14:10, 1.62it/s] 88%|████████▊ | 10147/11526 [1:46:14<14:09, 1.62it/s] {'loss': 0.1674, 'grad_norm': 0.6715533137321472, 'learning_rate': 4.297714096586919e-07, 'epoch': 2.64}
88%|████████▊ | 10147/11526 [1:46:14<14:09, 1.62it/s] 88%|████████▊ | 10148/11526 [1:46:15<14:08, 1.62it/s] {'loss': 0.1359, 'grad_norm': 0.5611398220062256, 'learning_rate': 4.2915739721084873e-07, 'epoch': 2.64}
88%|████████▊ | 10148/11526 [1:46:15<14:08, 1.62it/s] 88%|████████▊ | 10149/11526 [1:46:15<14:06, 1.63it/s] {'loss': 0.1171, 'grad_norm': 0.49009260535240173, 'learning_rate': 4.285438040266976e-07, 'epoch': 2.64}
88%|████████▊ | 10149/11526 [1:46:16<14:06, 1.63it/s] 88%|████████▊ | 10150/11526 [1:46:16<14:06, 1.63it/s] {'loss': 0.179, 'grad_norm': 0.6221065521240234, 'learning_rate': 4.27930630162518e-07, 'epoch': 2.64}
88%|████████▊ | 10150/11526 [1:46:16<14:06, 1.63it/s] 88%|████████▊ | 10151/11526 [1:46:17<14:06, 1.63it/s] {'loss': 0.1329, 'grad_norm': 0.5148258805274963, 'learning_rate': 4.273178756745555e-07, 'epoch': 2.64}
88%|████████▊ | 10151/11526 [1:46:17<14:06, 1.63it/s] 88%|████████▊ | 10152/11526 [1:46:17<14:05, 1.63it/s] {'loss': 0.1407, 'grad_norm': 0.505533754825592, 'learning_rate': 4.2670554061901513e-07, 'epoch': 2.64}
88%|████████▊ | 10152/11526 [1:46:17<14:05, 1.63it/s] 88%|████████▊ | 10153/11526 [1:46:18<14:04, 1.63it/s] {'loss': 0.1268, 'grad_norm': 0.7555171251296997, 'learning_rate': 4.260936250520631e-07, 'epoch': 2.64}
88%|████████▊ | 10153/11526 [1:46:18<14:04, 1.63it/s] 88%|████████▊ | 10154/11526 [1:46:18<14:03, 1.63it/s] {'loss': 0.1662, 'grad_norm': 0.6167718768119812, 'learning_rate': 4.2548212902982886e-07, 'epoch': 2.64}
88%|████████▊ | 10154/11526 [1:46:19<14:03, 1.63it/s] 88%|████████▊ | 10155/11526 [1:46:19<14:02, 1.63it/s] {'loss': 0.1727, 'grad_norm': 0.6134907007217407, 'learning_rate': 4.2487105260840155e-07, 'epoch': 2.64}
88%|████████▊ | 10155/11526 [1:46:19<14:02, 1.63it/s] 88%|████████▊ | 10156/11526 [1:46:20<14:06, 1.62it/s] {'loss': 0.147, 'grad_norm': 0.5541855096817017, 'learning_rate': 4.242603958438324e-07, 'epoch': 2.64}
88%|████████▊ | 10156/11526 [1:46:20<14:06, 1.62it/s] 88%|████████▊ | 10157/11526 [1:46:20<14:04, 1.62it/s] {'loss': 0.1377, 'grad_norm': 0.540647029876709, 'learning_rate': 4.2365015879213436e-07, 'epoch': 2.64}
88%|████████▊ | 10157/11526 [1:46:20<14:04, 1.62it/s] 88%|████████▊ | 10158/11526 [1:46:21<14:02, 1.62it/s] {'loss': 0.1354, 'grad_norm': 0.5652116537094116, 'learning_rate': 4.230403415092821e-07, 'epoch': 2.64}
88%|████████▊ | 10158/11526 [1:46:21<14:02, 1.62it/s] 88%|████████▊ | 10159/11526 [1:46:22<14:00, 1.63it/s] {'loss': 0.1467, 'grad_norm': 0.5857281684875488, 'learning_rate': 4.2243094405121197e-07, 'epoch': 2.64}
88%|████████▊ | 10159/11526 [1:46:22<14:00, 1.63it/s] 88%|████████▊ | 10160/11526 [1:46:22<13:59, 1.63it/s] {'loss': 0.1093, 'grad_norm': 0.4367840886116028, 'learning_rate': 4.218219664738216e-07, 'epoch': 2.64}
88%|████████▊ | 10160/11526 [1:46:22<13:59, 1.63it/s] 88%|████████▊ | 10161/11526 [1:46:23<14:04, 1.62it/s] {'loss': 0.1388, 'grad_norm': 0.5507321357727051, 'learning_rate': 4.2121340883296725e-07, 'epoch': 2.64}
88%|████████▊ | 10161/11526 [1:46:23<14:04, 1.62it/s] 88%|████████▊ | 10162/11526 [1:46:23<14:01, 1.62it/s] {'loss': 0.1559, 'grad_norm': 0.7307253479957581, 'learning_rate': 4.2060527118447324e-07, 'epoch': 2.64}
88%|████████▊ | 10162/11526 [1:46:24<14:01, 1.62it/s] 88%|████████▊ | 10163/11526 [1:46:24<14:00, 1.62it/s] {'loss': 0.1608, 'grad_norm': 0.5614312887191772, 'learning_rate': 4.1999755358411833e-07, 'epoch': 2.65}
88%|████████▊ | 10163/11526 [1:46:24<14:00, 1.62it/s] 88%|████████▊ | 10164/11526 [1:46:25<13:59, 1.62it/s] {'loss': 0.1405, 'grad_norm': 0.5389180779457092, 'learning_rate': 4.193902560876473e-07, 'epoch': 2.65}
88%|████████▊ | 10164/11526 [1:46:25<13:59, 1.62it/s] 88%|████████▊ | 10165/11526 [1:46:25<13:57, 1.63it/s] {'loss': 0.1521, 'grad_norm': 0.6624463796615601, 'learning_rate': 4.1878337875076556e-07, 'epoch': 2.65}
88%|████████▊ | 10165/11526 [1:46:25<13:57, 1.63it/s] 88%|████████▊ | 10166/11526 [1:46:26<14:00, 1.62it/s] {'loss': 0.1417, 'grad_norm': 0.5760588645935059, 'learning_rate': 4.1817692162913595e-07, 'epoch': 2.65}
88%|████████▊ | 10166/11526 [1:46:26<14:00, 1.62it/s] 88%|████████▊ | 10167/11526 [1:46:26<13:58, 1.62it/s] {'loss': 0.1476, 'grad_norm': 0.5699082016944885, 'learning_rate': 4.175708847783899e-07, 'epoch': 2.65}
88%|████████▊ | 10167/11526 [1:46:27<13:58, 1.62it/s] 88%|████████▊ | 10168/11526 [1:46:27<13:56, 1.62it/s] {'loss': 0.1817, 'grad_norm': 0.6175193190574646, 'learning_rate': 4.1696526825411644e-07, 'epoch': 2.65}
88%|████████▊ | 10168/11526 [1:46:27<13:56, 1.62it/s] 88%|████████▊ | 10169/11526 [1:46:28<13:55, 1.62it/s] {'loss': 0.1404, 'grad_norm': 0.5545672178268433, 'learning_rate': 4.163600721118638e-07, 'epoch': 2.65}
88%|████████▊ | 10169/11526 [1:46:28<13:55, 1.62it/s] 88%|████████▊ | 10170/11526 [1:46:28<13:54, 1.62it/s] {'loss': 0.1219, 'grad_norm': 0.4818403720855713, 'learning_rate': 4.1575529640714486e-07, 'epoch': 2.65}
88%|████████▊ | 10170/11526 [1:46:28<13:54, 1.62it/s] 88%|████████▊ | 10171/11526 [1:46:29<13:54, 1.62it/s] {'loss': 0.1446, 'grad_norm': 0.5669809579849243, 'learning_rate': 4.151509411954335e-07, 'epoch': 2.65}
88%|████████▊ | 10171/11526 [1:46:29<13:54, 1.62it/s] 88%|████████▊ | 10172/11526 [1:46:30<13:54, 1.62it/s] {'loss': 0.2021, 'grad_norm': 0.7411043047904968, 'learning_rate': 4.14547006532165e-07, 'epoch': 2.65}
88%|████████▊ | 10172/11526 [1:46:30<13:54, 1.62it/s] 88%|████████▊ | 10173/11526 [1:46:30<13:53, 1.62it/s] {'loss': 0.1536, 'grad_norm': 0.600202739238739, 'learning_rate': 4.139434924727359e-07, 'epoch': 2.65}
88%|████████▊ | 10173/11526 [1:46:30<13:53, 1.62it/s] 88%|████████▊ | 10174/11526 [1:46:31<13:52, 1.62it/s] {'loss': 0.1678, 'grad_norm': 0.7008655071258545, 'learning_rate': 4.1334039907250214e-07, 'epoch': 2.65}
88%|████████▊ | 10174/11526 [1:46:31<13:52, 1.62it/s] 88%|████████▊ | 10175/11526 [1:46:31<13:51, 1.63it/s] {'loss': 0.1562, 'grad_norm': 0.6560593247413635, 'learning_rate': 4.1273772638678323e-07, 'epoch': 2.65}
88%|████████▊ | 10175/11526 [1:46:32<13:51, 1.63it/s] 88%|████████▊ | 10176/11526 [1:46:32<13:51, 1.62it/s] {'loss': 0.153, 'grad_norm': 0.6231926679611206, 'learning_rate': 4.121354744708622e-07, 'epoch': 2.65}
88%|████████▊ | 10176/11526 [1:46:32<13:51, 1.62it/s] 88%|████████▊ | 10177/11526 [1:46:33<13:49, 1.63it/s] {'loss': 0.1657, 'grad_norm': 0.6694580912590027, 'learning_rate': 4.1153364337997825e-07, 'epoch': 2.65}
88%|████████▊ | 10177/11526 [1:46:33<13:49, 1.63it/s] 88%|████████▊ | 10178/11526 [1:46:33<13:48, 1.63it/s] {'loss': 0.0981, 'grad_norm': 0.44302457571029663, 'learning_rate': 4.1093223316933606e-07, 'epoch': 2.65}
88%|████████▊ | 10178/11526 [1:46:33<13:48, 1.63it/s] 88%|████████▊ | 10179/11526 [1:46:34<13:48, 1.63it/s] {'loss': 0.1612, 'grad_norm': 0.8093985915184021, 'learning_rate': 4.103312438941004e-07, 'epoch': 2.65}
88%|████████▊ | 10179/11526 [1:46:34<13:48, 1.63it/s] 88%|████████▊ | 10180/11526 [1:46:34<13:47, 1.63it/s] {'loss': 0.1392, 'grad_norm': 0.5728739500045776, 'learning_rate': 4.097306756093972e-07, 'epoch': 2.65}
88%|████████▊ | 10180/11526 [1:46:35<13:47, 1.63it/s] 88%|████████▊ | 10181/11526 [1:46:35<13:47, 1.63it/s] {'loss': 0.1168, 'grad_norm': 0.46188589930534363, 'learning_rate': 4.091305283703145e-07, 'epoch': 2.65}
88%|████████▊ | 10181/11526 [1:46:35<13:47, 1.63it/s] 88%|████████▊ | 10182/11526 [1:46:36<13:46, 1.63it/s] {'loss': 0.1137, 'grad_norm': 0.5337797999382019, 'learning_rate': 4.085308022319001e-07, 'epoch': 2.65}
88%|████████▊ | 10182/11526 [1:46:36<13:46, 1.63it/s] 88%|████████▊ | 10183/11526 [1:46:36<13:45, 1.63it/s] {'loss': 0.1441, 'grad_norm': 0.5939705967903137, 'learning_rate': 4.079314972491644e-07, 'epoch': 2.65}
88%|████████▊ | 10183/11526 [1:46:36<13:45, 1.63it/s] 88%|████████▊ | 10184/11526 [1:46:37<13:44, 1.63it/s] {'loss': 0.155, 'grad_norm': 0.5641891956329346, 'learning_rate': 4.0733261347707996e-07, 'epoch': 2.65}
88%|████████▊ | 10184/11526 [1:46:37<13:44, 1.63it/s] 88%|████████▊ | 10185/11526 [1:46:38<13:44, 1.63it/s] {'loss': 0.1906, 'grad_norm': 0.6680192351341248, 'learning_rate': 4.0673415097057965e-07, 'epoch': 2.65}
88%|████████▊ | 10185/11526 [1:46:38<13:44, 1.63it/s] 88%|████████▊ | 10186/11526 [1:46:38<13:44, 1.63it/s] {'loss': 0.1195, 'grad_norm': 0.4788076877593994, 'learning_rate': 4.061361097845573e-07, 'epoch': 2.65}
88%|████████▊ | 10186/11526 [1:46:38<13:44, 1.63it/s] 88%|████████▊ | 10187/11526 [1:46:39<13:43, 1.63it/s] {'loss': 0.1712, 'grad_norm': 0.6024636030197144, 'learning_rate': 4.0553848997386956e-07, 'epoch': 2.65}
88%|████████▊ | 10187/11526 [1:46:39<13:43, 1.63it/s] 88%|████████▊ | 10188/11526 [1:46:39<13:42, 1.63it/s] {'loss': 0.1295, 'grad_norm': 0.517299473285675, 'learning_rate': 4.049412915933315e-07, 'epoch': 2.65}
88%|████████▊ | 10188/11526 [1:46:40<13:42, 1.63it/s] 88%|████████▊ | 10189/11526 [1:46:40<13:41, 1.63it/s] {'loss': 0.1543, 'grad_norm': 0.5582489967346191, 'learning_rate': 4.0434451469772486e-07, 'epoch': 2.65}
88%|████████▊ | 10189/11526 [1:46:40<13:41, 1.63it/s] 88%|████████▊ | 10190/11526 [1:46:41<13:40, 1.63it/s] {'loss': 0.147, 'grad_norm': 0.5692045092582703, 'learning_rate': 4.037481593417869e-07, 'epoch': 2.65}
88%|████████▊ | 10190/11526 [1:46:41<13:40, 1.63it/s] 88%|████████▊ | 10191/11526 [1:46:41<13:41, 1.63it/s] {'loss': 0.1364, 'grad_norm': 0.5230595469474792, 'learning_rate': 4.031522255802189e-07, 'epoch': 2.65}
88%|████████▊ | 10191/11526 [1:46:41<13:41, 1.63it/s] 88%|████████▊ | 10192/11526 [1:46:42<13:40, 1.63it/s] {'loss': 0.1905, 'grad_norm': 0.7933785915374756, 'learning_rate': 4.0255671346768444e-07, 'epoch': 2.65}
88%|████████▊ | 10192/11526 [1:46:42<13:40, 1.63it/s] 88%|████████▊ | 10193/11526 [1:46:42<13:39, 1.63it/s] {'loss': 0.1984, 'grad_norm': 0.6975765228271484, 'learning_rate': 4.0196162305880525e-07, 'epoch': 2.65}
88%|████████▊ | 10193/11526 [1:46:43<13:39, 1.63it/s] 88%|████████▊ | 10194/11526 [1:46:43<13:38, 1.63it/s] {'loss': 0.1833, 'grad_norm': 0.684385359287262, 'learning_rate': 4.013669544081683e-07, 'epoch': 2.65}
88%|████████▊ | 10194/11526 [1:46:43<13:38, 1.63it/s] 88%|████████▊ | 10195/11526 [1:46:44<13:38, 1.63it/s] {'loss': 0.1465, 'grad_norm': 0.5903049111366272, 'learning_rate': 4.007727075703205e-07, 'epoch': 2.65}
88%|████████▊ | 10195/11526 [1:46:44<13:38, 1.63it/s] 88%|████████▊ | 10196/11526 [1:46:44<13:38, 1.62it/s] {'loss': 0.1441, 'grad_norm': 0.6134189963340759, 'learning_rate': 4.0017888259976767e-07, 'epoch': 2.65}
88%|████████▊ | 10196/11526 [1:46:44<13:38, 1.62it/s] 88%|████████▊ | 10197/11526 [1:46:45<13:38, 1.62it/s] {'loss': 0.171, 'grad_norm': 0.6716485023498535, 'learning_rate': 3.995854795509796e-07, 'epoch': 2.65}
88%|████████▊ | 10197/11526 [1:46:45<13:38, 1.62it/s] 88%|████████▊ | 10198/11526 [1:46:46<13:37, 1.62it/s] {'loss': 0.1586, 'grad_norm': 0.690006673336029, 'learning_rate': 3.989924984783866e-07, 'epoch': 2.65}
88%|████████▊ | 10198/11526 [1:46:46<13:37, 1.62it/s] 88%|████████▊ | 10199/11526 [1:46:46<13:36, 1.63it/s] {'loss': 0.154, 'grad_norm': 0.6399322152137756, 'learning_rate': 3.983999394363802e-07, 'epoch': 2.65}
88%|████████▊ | 10199/11526 [1:46:46<13:36, 1.63it/s] 88%|████████▊ | 10200/11526 [1:46:47<13:35, 1.63it/s] {'loss': 0.1653, 'grad_norm': 0.6279355883598328, 'learning_rate': 3.978078024793136e-07, 'epoch': 2.65}
88%|████████▊ | 10200/11526 [1:46:47<13:35, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.31it/s]
31%|███ | 4/13 [00:00<00:01, 8.38it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.40it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.16it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.00it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.79it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.75it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.72it/s]
100%|██████████| 13/13 [00:01<00:00, 6.73it/s]
{'eval_loss': 0.5422788858413696, 'eval_runtime': 1.9606, 'eval_samples_per_second': 102.008, 'eval_steps_per_second': 6.631, 'epoch': 2.65}
88%|████████▊ | 10200/11526 [1:46:49<13:35, 1.63it/s]
89%|████████▊ | 10201/11526 [1:46:49<26:37, 1.21s/it] {'loss': 0.1418, 'grad_norm': 0.5790537595748901, 'learning_rate': 3.9721608766150054e-07, 'epoch': 2.66}
89%|████████▊ | 10201/11526 [1:46:49<26:37, 1.21s/it] 89%|████████▊ | 10202/11526 [1:46:50<22:46, 1.03s/it] {'loss': 0.1621, 'grad_norm': 0.6223523616790771, 'learning_rate': 3.966247950372171e-07, 'epoch': 2.66}
89%|████████▊ | 10202/11526 [1:46:50<22:46, 1.03s/it] 89%|████████▊ | 10203/11526 [1:46:51<19:59, 1.10it/s] {'loss': 0.125, 'grad_norm': 0.49099811911582947, 'learning_rate': 3.960339246606998e-07, 'epoch': 2.66}
89%|████████▊ | 10203/11526 [1:46:51<19:59, 1.10it/s] 89%|████████▊ | 10204/11526 [1:46:51<18:03, 1.22it/s] {'loss': 0.1968, 'grad_norm': 0.6942924857139587, 'learning_rate': 3.954434765861459e-07, 'epoch': 2.66}
89%|████████▊ | 10204/11526 [1:46:51<18:03, 1.22it/s] 89%|████████▊ | 10205/11526 [1:46:52<16:40, 1.32it/s] {'loss': 0.1276, 'grad_norm': 0.5688415169715881, 'learning_rate': 3.948534508677154e-07, 'epoch': 2.66}
89%|████████▊ | 10205/11526 [1:46:52<16:40, 1.32it/s] 89%|████████▊ | 10206/11526 [1:46:52<15:43, 1.40it/s] {'loss': 0.1598, 'grad_norm': 0.5788910388946533, 'learning_rate': 3.942638475595284e-07, 'epoch': 2.66}
89%|████████▊ | 10206/11526 [1:46:53<15:43, 1.40it/s] 89%|████████▊ | 10207/11526 [1:46:53<15:02, 1.46it/s] {'loss': 0.1945, 'grad_norm': 0.6700693368911743, 'learning_rate': 3.9367466671566654e-07, 'epoch': 2.66}
89%|████████▊ | 10207/11526 [1:46:53<15:02, 1.46it/s] 89%|████████▊ | 10208/11526 [1:46:54<14:35, 1.51it/s] {'loss': 0.1482, 'grad_norm': 0.47797125577926636, 'learning_rate': 3.9308590839017335e-07, 'epoch': 2.66}
89%|████████▊ | 10208/11526 [1:46:54<14:35, 1.51it/s] 89%|████████▊ | 10209/11526 [1:46:54<14:15, 1.54it/s] {'loss': 0.1423, 'grad_norm': 0.6112256646156311, 'learning_rate': 3.924975726370539e-07, 'epoch': 2.66}
89%|████████▊ | 10209/11526 [1:46:54<14:15, 1.54it/s] 89%|████████▊ | 10210/11526 [1:46:55<14:00, 1.56it/s] {'loss': 0.1359, 'grad_norm': 0.5064730644226074, 'learning_rate': 3.919096595102706e-07, 'epoch': 2.66}
89%|████████▊ | 10210/11526 [1:46:55<14:00, 1.56it/s] 89%|████████▊ | 10211/11526 [1:46:56<13:52, 1.58it/s] {'loss': 0.1312, 'grad_norm': 0.5838262438774109, 'learning_rate': 3.9132216906375476e-07, 'epoch': 2.66}
89%|████████▊ | 10211/11526 [1:46:56<13:52, 1.58it/s] 89%|████████▊ | 10212/11526 [1:46:56<13:44, 1.59it/s] {'loss': 0.1298, 'grad_norm': 0.5254925489425659, 'learning_rate': 3.907351013513905e-07, 'epoch': 2.66}
89%|████████▊ | 10212/11526 [1:46:56<13:44, 1.59it/s] 89%|████████▊ | 10213/11526 [1:46:57<13:38, 1.60it/s] {'loss': 0.163, 'grad_norm': 0.663987398147583, 'learning_rate': 3.9014845642702805e-07, 'epoch': 2.66}
89%|████████▊ | 10213/11526 [1:46:57<13:38, 1.60it/s] 89%|████████▊ | 10214/11526 [1:46:57<13:34, 1.61it/s] {'loss': 0.1796, 'grad_norm': 0.7661430835723877, 'learning_rate': 3.8956223434447936e-07, 'epoch': 2.66}
89%|████████▊ | 10214/11526 [1:46:57<13:34, 1.61it/s] 89%|████████▊ | 10215/11526 [1:46:58<13:31, 1.62it/s] {'loss': 0.1507, 'grad_norm': 0.574590265750885, 'learning_rate': 3.8897643515751315e-07, 'epoch': 2.66}
89%|████████▊ | 10215/11526 [1:46:58<13:31, 1.62it/s] 89%|████████▊ | 10216/11526 [1:46:59<13:30, 1.62it/s] {'loss': 0.1331, 'grad_norm': 0.5947352051734924, 'learning_rate': 3.8839105891986473e-07, 'epoch': 2.66}
89%|████████▊ | 10216/11526 [1:46:59<13:30, 1.62it/s] 89%|████████▊ | 10217/11526 [1:46:59<13:27, 1.62it/s] {'loss': 0.1222, 'grad_norm': 0.4531765580177307, 'learning_rate': 3.8780610568522836e-07, 'epoch': 2.66}
89%|████████▊ | 10217/11526 [1:46:59<13:27, 1.62it/s] 89%|████████▊ | 10218/11526 [1:47:00<13:26, 1.62it/s] {'loss': 0.1753, 'grad_norm': 0.6693168878555298, 'learning_rate': 3.8722157550725727e-07, 'epoch': 2.66}
89%|████████▊ | 10218/11526 [1:47:00<13:26, 1.62it/s] 89%|████████▊ | 10219/11526 [1:47:00<13:24, 1.62it/s] {'loss': 0.1461, 'grad_norm': 0.6177449822425842, 'learning_rate': 3.8663746843956907e-07, 'epoch': 2.66}
89%|████████▊ | 10219/11526 [1:47:01<13:24, 1.62it/s] 89%|████████▊ | 10220/11526 [1:47:01<13:23, 1.63it/s] {'loss': 0.1651, 'grad_norm': 0.6314576268196106, 'learning_rate': 3.8605378453574094e-07, 'epoch': 2.66}
89%|████████▊ | 10220/11526 [1:47:01<13:23, 1.63it/s] 89%|████████▊ | 10221/11526 [1:47:02<13:23, 1.62it/s] {'loss': 0.1571, 'grad_norm': 0.6202382445335388, 'learning_rate': 3.854705238493123e-07, 'epoch': 2.66}
89%|████████▊ | 10221/11526 [1:47:02<13:23, 1.62it/s] 89%|████████▊ | 10222/11526 [1:47:02<13:22, 1.62it/s] {'loss': 0.1527, 'grad_norm': 0.5890020728111267, 'learning_rate': 3.848876864337836e-07, 'epoch': 2.66}
89%|████████▊ | 10222/11526 [1:47:02<13:22, 1.62it/s] 89%|████████▊ | 10223/11526 [1:47:03<13:21, 1.63it/s] {'loss': 0.1956, 'grad_norm': 0.7775847315788269, 'learning_rate': 3.843052723426144e-07, 'epoch': 2.66}
89%|████████▊ | 10223/11526 [1:47:03<13:21, 1.63it/s] 89%|████████▊ | 10224/11526 [1:47:04<13:20, 1.63it/s] {'loss': 0.1598, 'grad_norm': 0.6235383749008179, 'learning_rate': 3.8372328162922635e-07, 'epoch': 2.66}
89%|████████▊ | 10224/11526 [1:47:04<13:20, 1.63it/s] 89%|████████▊ | 10225/11526 [1:47:04<13:19, 1.63it/s] {'loss': 0.131, 'grad_norm': 0.4743465781211853, 'learning_rate': 3.8314171434700674e-07, 'epoch': 2.66}
89%|████████▊ | 10225/11526 [1:47:04<13:19, 1.63it/s] 89%|████████▊ | 10226/11526 [1:47:05<13:19, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.5331733226776123, 'learning_rate': 3.8256057054929685e-07, 'epoch': 2.66}
89%|████████▊ | 10226/11526 [1:47:05<13:19, 1.63it/s] 89%|████████▊ | 10227/11526 [1:47:05<13:18, 1.63it/s] {'loss': 0.1393, 'grad_norm': 0.6027427315711975, 'learning_rate': 3.8197985028940397e-07, 'epoch': 2.66}
89%|████████▊ | 10227/11526 [1:47:05<13:18, 1.63it/s] 89%|████████▊ | 10228/11526 [1:47:06<13:17, 1.63it/s] {'loss': 0.1705, 'grad_norm': 0.7138509154319763, 'learning_rate': 3.8139955362059445e-07, 'epoch': 2.66}
89%|████████▊ | 10228/11526 [1:47:06<13:17, 1.63it/s] 89%|████████▊ | 10229/11526 [1:47:07<13:16, 1.63it/s] {'loss': 0.1493, 'grad_norm': 0.6033118963241577, 'learning_rate': 3.808196805960962e-07, 'epoch': 2.66}
89%|████████▊ | 10229/11526 [1:47:07<13:16, 1.63it/s] 89%|████████▉ | 10230/11526 [1:47:07<13:16, 1.63it/s] {'loss': 0.189, 'grad_norm': 0.6328792572021484, 'learning_rate': 3.8024023126910005e-07, 'epoch': 2.66}
89%|████████▉ | 10230/11526 [1:47:07<13:16, 1.63it/s] 89%|████████▉ | 10231/11526 [1:47:08<13:18, 1.62it/s] {'loss': 0.1141, 'grad_norm': 0.47567299008369446, 'learning_rate': 3.7966120569275455e-07, 'epoch': 2.66}
89%|████████▉ | 10231/11526 [1:47:08<13:18, 1.62it/s] 89%|████████▉ | 10232/11526 [1:47:08<13:17, 1.62it/s] {'loss': 0.1391, 'grad_norm': 0.5770949125289917, 'learning_rate': 3.790826039201717e-07, 'epoch': 2.66}
89%|████████▉ | 10232/11526 [1:47:09<13:17, 1.62it/s] 89%|████████▉ | 10233/11526 [1:47:09<13:16, 1.62it/s] {'loss': 0.1445, 'grad_norm': 0.5634247660636902, 'learning_rate': 3.785044260044246e-07, 'epoch': 2.66}
89%|████████▉ | 10233/11526 [1:47:09<13:16, 1.62it/s] 89%|████████▉ | 10234/11526 [1:47:10<13:14, 1.63it/s] {'loss': 0.1669, 'grad_norm': 0.6478899121284485, 'learning_rate': 3.779266719985469e-07, 'epoch': 2.66}
89%|████████▉ | 10234/11526 [1:47:10<13:14, 1.63it/s] 89%|████████▉ | 10235/11526 [1:47:10<13:14, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.5985693335533142, 'learning_rate': 3.7734934195553296e-07, 'epoch': 2.66}
89%|████████▉ | 10235/11526 [1:47:10<13:14, 1.63it/s] 89%|████████▉ | 10236/11526 [1:47:11<13:14, 1.62it/s] {'loss': 0.1471, 'grad_norm': 0.6041471362113953, 'learning_rate': 3.767724359283398e-07, 'epoch': 2.66}
89%|████████▉ | 10236/11526 [1:47:11<13:14, 1.62it/s] 89%|████████▉ | 10237/11526 [1:47:12<13:13, 1.63it/s] {'loss': 0.1717, 'grad_norm': 0.6094442009925842, 'learning_rate': 3.761959539698828e-07, 'epoch': 2.66}
89%|████████▉ | 10237/11526 [1:47:12<13:13, 1.63it/s] 89%|████████▉ | 10238/11526 [1:47:12<13:12, 1.63it/s] {'loss': 0.1188, 'grad_norm': 0.4741837680339813, 'learning_rate': 3.7561989613304264e-07, 'epoch': 2.66}
89%|████████▉ | 10238/11526 [1:47:12<13:12, 1.63it/s] 89%|████████▉ | 10239/11526 [1:47:13<13:12, 1.62it/s] {'loss': 0.1296, 'grad_norm': 0.5567692518234253, 'learning_rate': 3.750442624706563e-07, 'epoch': 2.67}
89%|████████▉ | 10239/11526 [1:47:13<13:12, 1.62it/s] 89%|████████▉ | 10240/11526 [1:47:13<13:10, 1.63it/s] {'loss': 0.1389, 'grad_norm': 0.5233206152915955, 'learning_rate': 3.744690530355255e-07, 'epoch': 2.67}
89%|████████▉ | 10240/11526 [1:47:13<13:10, 1.63it/s] 89%|████████▉ | 10241/11526 [1:47:14<13:10, 1.62it/s] {'loss': 0.1803, 'grad_norm': 0.7437691688537598, 'learning_rate': 3.7389426788041196e-07, 'epoch': 2.67}
89%|████████▉ | 10241/11526 [1:47:14<13:10, 1.62it/s] 89%|████████▉ | 10242/11526 [1:47:15<13:09, 1.63it/s] {'loss': 0.1435, 'grad_norm': 0.514503538608551, 'learning_rate': 3.7331990705803567e-07, 'epoch': 2.67}
89%|████████▉ | 10242/11526 [1:47:15<13:09, 1.63it/s] 89%|████████▉ | 10243/11526 [1:47:15<13:09, 1.63it/s] {'loss': 0.1367, 'grad_norm': 0.5220460295677185, 'learning_rate': 3.727459706210834e-07, 'epoch': 2.67}
89%|████████▉ | 10243/11526 [1:47:15<13:09, 1.63it/s] 89%|████████▉ | 10244/11526 [1:47:16<13:07, 1.63it/s] {'loss': 0.1451, 'grad_norm': 0.5617389678955078, 'learning_rate': 3.7217245862219975e-07, 'epoch': 2.67}
89%|████████▉ | 10244/11526 [1:47:16<13:07, 1.63it/s] 89%|████████▉ | 10245/11526 [1:47:16<13:06, 1.63it/s] {'loss': 0.1448, 'grad_norm': 0.5512149333953857, 'learning_rate': 3.7159937111398815e-07, 'epoch': 2.67}
89%|████████▉ | 10245/11526 [1:47:17<13:06, 1.63it/s] 89%|████████▉ | 10246/11526 [1:47:17<13:07, 1.63it/s] {'loss': 0.1746, 'grad_norm': 0.66811603307724, 'learning_rate': 3.710267081490171e-07, 'epoch': 2.67}
89%|████████▉ | 10246/11526 [1:47:17<13:07, 1.63it/s] 89%|████████▉ | 10247/11526 [1:47:18<13:06, 1.63it/s] {'loss': 0.1368, 'grad_norm': 0.5757827162742615, 'learning_rate': 3.7045446977981347e-07, 'epoch': 2.67}
89%|████████▉ | 10247/11526 [1:47:18<13:06, 1.63it/s] 89%|████████▉ | 10248/11526 [1:47:18<13:05, 1.63it/s] {'loss': 0.1504, 'grad_norm': 0.7727900743484497, 'learning_rate': 3.6988265605886755e-07, 'epoch': 2.67}
89%|████████▉ | 10248/11526 [1:47:18<13:05, 1.63it/s] 89%|████████▉ | 10249/11526 [1:47:19<13:05, 1.63it/s] {'loss': 0.1345, 'grad_norm': 0.5144652128219604, 'learning_rate': 3.693112670386295e-07, 'epoch': 2.67}
89%|████████▉ | 10249/11526 [1:47:19<13:05, 1.63it/s] 89%|████████▉ | 10250/11526 [1:47:20<13:07, 1.62it/s] {'loss': 0.1211, 'grad_norm': 0.503502368927002, 'learning_rate': 3.687403027715075e-07, 'epoch': 2.67}
89%|████████▉ | 10250/11526 [1:47:20<13:07, 1.62it/s] 89%|████████▉ | 10251/11526 [1:47:20<13:05, 1.62it/s] {'loss': 0.141, 'grad_norm': 0.5250975489616394, 'learning_rate': 3.681697633098769e-07, 'epoch': 2.67}
89%|████████▉ | 10251/11526 [1:47:20<13:05, 1.62it/s] 89%|████████▉ | 10252/11526 [1:47:21<13:05, 1.62it/s] {'loss': 0.1591, 'grad_norm': 0.621990442276001, 'learning_rate': 3.6759964870607065e-07, 'epoch': 2.67}
89%|████████▉ | 10252/11526 [1:47:21<13:05, 1.62it/s] 89%|████████▉ | 10253/11526 [1:47:21<13:03, 1.63it/s] {'loss': 0.1101, 'grad_norm': 0.5108038783073425, 'learning_rate': 3.670299590123805e-07, 'epoch': 2.67}
89%|████████▉ | 10253/11526 [1:47:21<13:03, 1.63it/s] 89%|████████▉ | 10254/11526 [1:47:22<13:01, 1.63it/s] {'loss': 0.1845, 'grad_norm': 0.6938790678977966, 'learning_rate': 3.6646069428106336e-07, 'epoch': 2.67}
89%|████████▉ | 10254/11526 [1:47:22<13:01, 1.63it/s] 89%|████████▉ | 10255/11526 [1:47:23<13:01, 1.63it/s] {'loss': 0.1719, 'grad_norm': 0.6686766743659973, 'learning_rate': 3.658918545643353e-07, 'epoch': 2.67}
89%|████████▉ | 10255/11526 [1:47:23<13:01, 1.63it/s] 89%|████████▉ | 10256/11526 [1:47:23<13:01, 1.62it/s] {'loss': 0.1476, 'grad_norm': 0.599540114402771, 'learning_rate': 3.653234399143729e-07, 'epoch': 2.67}
89%|████████▉ | 10256/11526 [1:47:23<13:01, 1.62it/s] 89%|████████▉ | 10257/11526 [1:47:24<13:00, 1.63it/s] {'loss': 0.1554, 'grad_norm': 0.5818125605583191, 'learning_rate': 3.647554503833156e-07, 'epoch': 2.67}
89%|████████▉ | 10257/11526 [1:47:24<13:00, 1.63it/s] 89%|████████▉ | 10258/11526 [1:47:24<13:00, 1.63it/s] {'loss': 0.1462, 'grad_norm': 0.5963627696037292, 'learning_rate': 3.6418788602326174e-07, 'epoch': 2.67}
89%|████████▉ | 10258/11526 [1:47:25<13:00, 1.63it/s] 89%|████████▉ | 10259/11526 [1:47:25<12:59, 1.63it/s] {'loss': 0.1187, 'grad_norm': 0.51618891954422, 'learning_rate': 3.6362074688627014e-07, 'epoch': 2.67}
89%|████████▉ | 10259/11526 [1:47:25<12:59, 1.63it/s] 89%|████████▉ | 10260/11526 [1:47:26<12:58, 1.63it/s] {'loss': 0.1544, 'grad_norm': 0.5907660126686096, 'learning_rate': 3.6305403302436536e-07, 'epoch': 2.67}
89%|████████▉ | 10260/11526 [1:47:26<12:58, 1.63it/s] 89%|████████▉ | 10261/11526 [1:47:26<12:59, 1.62it/s] {'loss': 0.1483, 'grad_norm': 0.6020315289497375, 'learning_rate': 3.6248774448952695e-07, 'epoch': 2.67}
89%|████████▉ | 10261/11526 [1:47:26<12:59, 1.62it/s] 89%|████████▉ | 10262/11526 [1:47:27<12:57, 1.63it/s] {'loss': 0.1457, 'grad_norm': 0.5960052013397217, 'learning_rate': 3.619218813336989e-07, 'epoch': 2.67}
89%|████████▉ | 10262/11526 [1:47:27<12:57, 1.63it/s] 89%|████████▉ | 10263/11526 [1:47:28<12:56, 1.63it/s] {'loss': 0.1636, 'grad_norm': 0.6139605045318604, 'learning_rate': 3.613564436087863e-07, 'epoch': 2.67}
89%|████████▉ | 10263/11526 [1:47:28<12:56, 1.63it/s] 89%|████████▉ | 10264/11526 [1:47:28<12:55, 1.63it/s] {'loss': 0.1466, 'grad_norm': 0.522113561630249, 'learning_rate': 3.607914313666522e-07, 'epoch': 2.67}
89%|████████▉ | 10264/11526 [1:47:28<12:55, 1.63it/s] 89%|████████▉ | 10265/11526 [1:47:29<12:55, 1.63it/s] {'loss': 0.1432, 'grad_norm': 0.583115816116333, 'learning_rate': 3.6022684465912507e-07, 'epoch': 2.67}
89%|████████▉ | 10265/11526 [1:47:29<12:55, 1.63it/s] 89%|████████▉ | 10266/11526 [1:47:29<12:56, 1.62it/s] {'loss': 0.1241, 'grad_norm': 0.6294339299201965, 'learning_rate': 3.596626835379907e-07, 'epoch': 2.67}
89%|████████▉ | 10266/11526 [1:47:29<12:56, 1.62it/s] 89%|████████▉ | 10267/11526 [1:47:30<12:57, 1.62it/s] {'loss': 0.154, 'grad_norm': 0.6118584871292114, 'learning_rate': 3.5909894805499724e-07, 'epoch': 2.67}
89%|████████▉ | 10267/11526 [1:47:30<12:57, 1.62it/s] 89%|████████▉ | 10268/11526 [1:47:31<12:55, 1.62it/s] {'loss': 0.1597, 'grad_norm': 0.6477206349372864, 'learning_rate': 3.5853563826185376e-07, 'epoch': 2.67}
89%|████████▉ | 10268/11526 [1:47:31<12:55, 1.62it/s] 89%|████████▉ | 10269/11526 [1:47:31<12:54, 1.62it/s] {'loss': 0.1757, 'grad_norm': 0.5575046539306641, 'learning_rate': 3.579727542102307e-07, 'epoch': 2.67}
89%|████████▉ | 10269/11526 [1:47:31<12:54, 1.62it/s] 89%|████████▉ | 10270/11526 [1:47:32<12:52, 1.63it/s] {'loss': 0.2591, 'grad_norm': 0.7439597845077515, 'learning_rate': 3.5741029595175833e-07, 'epoch': 2.67}
89%|████████▉ | 10270/11526 [1:47:32<12:52, 1.63it/s] 89%|████████▉ | 10271/11526 [1:47:32<12:52, 1.62it/s] {'loss': 0.1593, 'grad_norm': 0.5872631072998047, 'learning_rate': 3.5684826353802995e-07, 'epoch': 2.67}
89%|████████▉ | 10271/11526 [1:47:33<12:52, 1.62it/s] 89%|████████▉ | 10272/11526 [1:47:33<12:51, 1.62it/s] {'loss': 0.1165, 'grad_norm': 0.47998589277267456, 'learning_rate': 3.562866570205964e-07, 'epoch': 2.67}
89%|████████▉ | 10272/11526 [1:47:33<12:51, 1.62it/s] 89%|████████▉ | 10273/11526 [1:47:34<12:50, 1.63it/s] {'loss': 0.1352, 'grad_norm': 0.5824412703514099, 'learning_rate': 3.557254764509721e-07, 'epoch': 2.67}
89%|████████▉ | 10273/11526 [1:47:34<12:50, 1.63it/s] 89%|████████▉ | 10274/11526 [1:47:34<12:49, 1.63it/s] {'loss': 0.1256, 'grad_norm': 0.5058976411819458, 'learning_rate': 3.55164721880632e-07, 'epoch': 2.67}
89%|████████▉ | 10274/11526 [1:47:34<12:49, 1.63it/s] 89%|████████▉ | 10275/11526 [1:47:35<12:48, 1.63it/s] {'loss': 0.1465, 'grad_norm': 0.5350432395935059, 'learning_rate': 3.546043933610116e-07, 'epoch': 2.67}
89%|████████▉ | 10275/11526 [1:47:35<12:48, 1.63it/s] 89%|████████▉ | 10276/11526 [1:47:36<12:49, 1.63it/s] {'loss': 0.1106, 'grad_norm': 0.4648890197277069, 'learning_rate': 3.5404449094350756e-07, 'epoch': 2.67}
89%|████████▉ | 10276/11526 [1:47:36<12:49, 1.63it/s] 89%|████████▉ | 10277/11526 [1:47:36<12:48, 1.63it/s] {'loss': 0.1551, 'grad_norm': 0.6060974597930908, 'learning_rate': 3.5348501467947717e-07, 'epoch': 2.67}
89%|████████▉ | 10277/11526 [1:47:36<12:48, 1.63it/s] 89%|████████▉ | 10278/11526 [1:47:37<12:47, 1.63it/s] {'loss': 0.1532, 'grad_norm': 0.5903681516647339, 'learning_rate': 3.5292596462023876e-07, 'epoch': 2.68}
89%|████████▉ | 10278/11526 [1:47:37<12:47, 1.63it/s] 89%|████████▉ | 10279/11526 [1:47:37<12:47, 1.63it/s] {'loss': 0.1346, 'grad_norm': 0.47825419902801514, 'learning_rate': 3.5236734081707245e-07, 'epoch': 2.68}
89%|████████▉ | 10279/11526 [1:47:37<12:47, 1.63it/s] 89%|████████▉ | 10280/11526 [1:47:38<12:46, 1.63it/s] {'loss': 0.1331, 'grad_norm': 0.5432494282722473, 'learning_rate': 3.5180914332121674e-07, 'epoch': 2.68}
89%|████████▉ | 10280/11526 [1:47:38<12:46, 1.63it/s] 89%|████████▉ | 10281/11526 [1:47:39<12:47, 1.62it/s] {'loss': 0.1523, 'grad_norm': 0.6154611110687256, 'learning_rate': 3.5125137218387394e-07, 'epoch': 2.68}
89%|████████▉ | 10281/11526 [1:47:39<12:47, 1.62it/s] 89%|████████▉ | 10282/11526 [1:47:39<12:45, 1.62it/s] {'loss': 0.1333, 'grad_norm': 0.5288777947425842, 'learning_rate': 3.506940274562048e-07, 'epoch': 2.68}
89%|████████▉ | 10282/11526 [1:47:39<12:45, 1.62it/s] 89%|████████▉ | 10283/11526 [1:47:40<12:44, 1.63it/s] {'loss': 0.1307, 'grad_norm': 0.5885381102561951, 'learning_rate': 3.5013710918933355e-07, 'epoch': 2.68}
89%|████████▉ | 10283/11526 [1:47:40<12:44, 1.63it/s] 89%|████████▉ | 10284/11526 [1:47:40<12:43, 1.63it/s] {'loss': 0.1558, 'grad_norm': 0.543964147567749, 'learning_rate': 3.4958061743434304e-07, 'epoch': 2.68}
89%|████████▉ | 10284/11526 [1:47:41<12:43, 1.63it/s] 89%|████████▉ | 10285/11526 [1:47:41<12:43, 1.63it/s] {'loss': 0.1516, 'grad_norm': 0.5711840987205505, 'learning_rate': 3.4902455224227914e-07, 'epoch': 2.68}
89%|████████▉ | 10285/11526 [1:47:41<12:43, 1.63it/s] 89%|████████▉ | 10286/11526 [1:47:42<12:42, 1.63it/s] {'loss': 0.1654, 'grad_norm': 0.631653368473053, 'learning_rate': 3.484689136641445e-07, 'epoch': 2.68}
89%|████████▉ | 10286/11526 [1:47:42<12:42, 1.63it/s] 89%|████████▉ | 10287/11526 [1:47:42<12:42, 1.63it/s] {'loss': 0.1377, 'grad_norm': 0.5527870655059814, 'learning_rate': 3.4791370175090876e-07, 'epoch': 2.68}
89%|████████▉ | 10287/11526 [1:47:42<12:42, 1.63it/s] 89%|████████▉ | 10288/11526 [1:47:43<12:41, 1.63it/s] {'loss': 0.2329, 'grad_norm': 0.7710928320884705, 'learning_rate': 3.4735891655349685e-07, 'epoch': 2.68}
89%|████████▉ | 10288/11526 [1:47:43<12:41, 1.63it/s] 89%|████████▉ | 10289/11526 [1:47:44<12:39, 1.63it/s] {'loss': 0.194, 'grad_norm': 0.7906025648117065, 'learning_rate': 3.4680455812279746e-07, 'epoch': 2.68}
89%|████████▉ | 10289/11526 [1:47:44<12:39, 1.63it/s] 89%|████████▉ | 10290/11526 [1:47:44<12:39, 1.63it/s] {'loss': 0.137, 'grad_norm': 0.5418352484703064, 'learning_rate': 3.462506265096605e-07, 'epoch': 2.68}
89%|████████▉ | 10290/11526 [1:47:44<12:39, 1.63it/s] 89%|████████▉ | 10291/11526 [1:47:45<12:39, 1.63it/s] {'loss': 0.1562, 'grad_norm': 0.5788001418113708, 'learning_rate': 3.45697121764893e-07, 'epoch': 2.68}
89%|████████▉ | 10291/11526 [1:47:45<12:39, 1.63it/s] 89%|████████▉ | 10292/11526 [1:47:45<12:38, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.5614686608314514, 'learning_rate': 3.4514404393926836e-07, 'epoch': 2.68}
89%|████████▉ | 10292/11526 [1:47:45<12:38, 1.63it/s] 89%|████████▉ | 10293/11526 [1:47:46<12:38, 1.63it/s] {'loss': 0.1643, 'grad_norm': 0.6482561826705933, 'learning_rate': 3.445913930835176e-07, 'epoch': 2.68}
89%|████████▉ | 10293/11526 [1:47:46<12:38, 1.63it/s] 89%|████████▉ | 10294/11526 [1:47:47<12:37, 1.63it/s] {'loss': 0.1677, 'grad_norm': 0.6336949467658997, 'learning_rate': 3.440391692483319e-07, 'epoch': 2.68}
89%|████████▉ | 10294/11526 [1:47:47<12:37, 1.63it/s] 89%|████████▉ | 10295/11526 [1:47:47<12:36, 1.63it/s] {'loss': 0.1822, 'grad_norm': 0.6890717148780823, 'learning_rate': 3.4348737248436516e-07, 'epoch': 2.68}
89%|████████▉ | 10295/11526 [1:47:47<12:36, 1.63it/s] 89%|████████▉ | 10296/11526 [1:47:48<12:36, 1.63it/s] {'loss': 0.1262, 'grad_norm': 0.49161475896835327, 'learning_rate': 3.429360028422307e-07, 'epoch': 2.68}
89%|████████▉ | 10296/11526 [1:47:48<12:36, 1.63it/s] 89%|████████▉ | 10297/11526 [1:47:48<12:35, 1.63it/s] {'loss': 0.1449, 'grad_norm': 0.5390297174453735, 'learning_rate': 3.4238506037250377e-07, 'epoch': 2.68}
89%|████████▉ | 10297/11526 [1:47:49<12:35, 1.63it/s] 89%|████████▉ | 10298/11526 [1:47:49<12:34, 1.63it/s] {'loss': 0.0918, 'grad_norm': 0.45546388626098633, 'learning_rate': 3.4183454512572045e-07, 'epoch': 2.68}
89%|████████▉ | 10298/11526 [1:47:49<12:34, 1.63it/s] 89%|████████▉ | 10299/11526 [1:47:50<12:34, 1.63it/s] {'loss': 0.124, 'grad_norm': 0.5024904012680054, 'learning_rate': 3.412844571523749e-07, 'epoch': 2.68}
89%|████████▉ | 10299/11526 [1:47:50<12:34, 1.63it/s] 89%|████████▉ | 10300/11526 [1:47:50<12:33, 1.63it/s] {'loss': 0.1493, 'grad_norm': 0.6111215949058533, 'learning_rate': 3.407347965029273e-07, 'epoch': 2.68}
89%|████████▉ | 10300/11526 [1:47:50<12:33, 1.63it/s] 89%|████████▉ | 10301/11526 [1:47:51<12:33, 1.63it/s] {'loss': 0.1436, 'grad_norm': 0.5607524514198303, 'learning_rate': 3.401855632277945e-07, 'epoch': 2.68}
89%|████████▉ | 10301/11526 [1:47:51<12:33, 1.63it/s] 89%|████████▉ | 10302/11526 [1:47:52<12:32, 1.63it/s] {'loss': 0.1371, 'grad_norm': 0.5275169014930725, 'learning_rate': 3.396367573773546e-07, 'epoch': 2.68}
89%|████████▉ | 10302/11526 [1:47:52<12:32, 1.63it/s] 89%|████████▉ | 10303/11526 [1:47:52<12:31, 1.63it/s] {'loss': 0.1181, 'grad_norm': 0.4104422628879547, 'learning_rate': 3.390883790019478e-07, 'epoch': 2.68}
89%|████████▉ | 10303/11526 [1:47:52<12:31, 1.63it/s] 89%|████████▉ | 10304/11526 [1:47:53<12:31, 1.63it/s] {'loss': 0.1351, 'grad_norm': 0.5682392120361328, 'learning_rate': 3.3854042815187394e-07, 'epoch': 2.68}
89%|████████▉ | 10304/11526 [1:47:53<12:31, 1.63it/s] 89%|████████▉ | 10305/11526 [1:47:53<12:30, 1.63it/s] {'loss': 0.155, 'grad_norm': 0.5905227065086365, 'learning_rate': 3.37992904877395e-07, 'epoch': 2.68}
89%|████████▉ | 10305/11526 [1:47:53<12:30, 1.63it/s] 89%|████████▉ | 10306/11526 [1:47:54<12:31, 1.62it/s] {'loss': 0.1283, 'grad_norm': 0.5522199273109436, 'learning_rate': 3.3744580922873303e-07, 'epoch': 2.68}
89%|████████▉ | 10306/11526 [1:47:54<12:31, 1.62it/s] 89%|████████▉ | 10307/11526 [1:47:55<12:29, 1.63it/s] {'loss': 0.1323, 'grad_norm': 0.5282945036888123, 'learning_rate': 3.368991412560696e-07, 'epoch': 2.68}
89%|████████▉ | 10307/11526 [1:47:55<12:29, 1.63it/s] 89%|████████▉ | 10308/11526 [1:47:55<12:28, 1.63it/s] {'loss': 0.136, 'grad_norm': 0.5570231676101685, 'learning_rate': 3.3635290100954787e-07, 'epoch': 2.68}
89%|████████▉ | 10308/11526 [1:47:55<12:28, 1.63it/s] 89%|████████▉ | 10309/11526 [1:47:56<12:28, 1.63it/s] {'loss': 0.1794, 'grad_norm': 0.5950287580490112, 'learning_rate': 3.35807088539275e-07, 'epoch': 2.68}
89%|████████▉ | 10309/11526 [1:47:56<12:28, 1.63it/s] 89%|████████▉ | 10310/11526 [1:47:56<12:27, 1.63it/s] {'loss': 0.1476, 'grad_norm': 0.6515927314758301, 'learning_rate': 3.3526170389531376e-07, 'epoch': 2.68}
89%|████████▉ | 10310/11526 [1:47:57<12:27, 1.63it/s] 89%|████████▉ | 10311/11526 [1:47:57<12:27, 1.63it/s] {'loss': 0.131, 'grad_norm': 0.48808783292770386, 'learning_rate': 3.3471674712768966e-07, 'epoch': 2.68}
89%|████████▉ | 10311/11526 [1:47:57<12:27, 1.63it/s] 89%|████████▉ | 10312/11526 [1:47:58<12:26, 1.63it/s] {'loss': 0.1678, 'grad_norm': 0.6713954210281372, 'learning_rate': 3.3417221828639103e-07, 'epoch': 2.68}
89%|████████▉ | 10312/11526 [1:47:58<12:26, 1.63it/s] 89%|████████▉ | 10313/11526 [1:47:58<12:25, 1.63it/s] {'loss': 0.1745, 'grad_norm': 0.7061079740524292, 'learning_rate': 3.336281174213618e-07, 'epoch': 2.68}
89%|████████▉ | 10313/11526 [1:47:58<12:25, 1.63it/s] 89%|████████▉ | 10314/11526 [1:47:59<12:24, 1.63it/s] {'loss': 0.1467, 'grad_norm': 0.5620784759521484, 'learning_rate': 3.3308444458251434e-07, 'epoch': 2.68}
89%|████████▉ | 10314/11526 [1:47:59<12:24, 1.63it/s] 89%|████████▉ | 10315/11526 [1:47:59<12:23, 1.63it/s] {'loss': 0.1501, 'grad_norm': 0.7583000659942627, 'learning_rate': 3.325411998197142e-07, 'epoch': 2.68}
89%|████████▉ | 10315/11526 [1:48:00<12:23, 1.63it/s] 90%|████████▉ | 10316/11526 [1:48:00<12:23, 1.63it/s] {'loss': 0.159, 'grad_norm': 0.6866222620010376, 'learning_rate': 3.319983831827922e-07, 'epoch': 2.69}
90%|████████▉ | 10316/11526 [1:48:00<12:23, 1.63it/s] 90%|████████▉ | 10317/11526 [1:48:01<12:22, 1.63it/s] {'loss': 0.1598, 'grad_norm': 0.7082087397575378, 'learning_rate': 3.3145599472153846e-07, 'epoch': 2.69}
90%|████████▉ | 10317/11526 [1:48:01<12:22, 1.63it/s] 90%|████████▉ | 10318/11526 [1:48:01<12:22, 1.63it/s] {'loss': 0.1922, 'grad_norm': 0.5224623680114746, 'learning_rate': 3.309140344857037e-07, 'epoch': 2.69}
90%|████████▉ | 10318/11526 [1:48:01<12:22, 1.63it/s] 90%|████████▉ | 10319/11526 [1:48:02<12:21, 1.63it/s] {'loss': 0.1315, 'grad_norm': 0.4983137845993042, 'learning_rate': 3.303725025249993e-07, 'epoch': 2.69}
90%|████████▉ | 10319/11526 [1:48:02<12:21, 1.63it/s] 90%|████████▉ | 10320/11526 [1:48:03<12:21, 1.63it/s] {'loss': 0.1539, 'grad_norm': 0.5488505363464355, 'learning_rate': 3.2983139888909943e-07, 'epoch': 2.69}
90%|████████▉ | 10320/11526 [1:48:03<12:21, 1.63it/s] 90%|████████▉ | 10321/11526 [1:48:03<12:22, 1.62it/s] {'loss': 0.1721, 'grad_norm': 0.6762606501579285, 'learning_rate': 3.2929072362763435e-07, 'epoch': 2.69}
90%|████████▉ | 10321/11526 [1:48:03<12:22, 1.62it/s] 90%|████████▉ | 10322/11526 [1:48:04<12:21, 1.62it/s] {'loss': 0.134, 'grad_norm': 0.7152507305145264, 'learning_rate': 3.287504767902e-07, 'epoch': 2.69}
90%|████████▉ | 10322/11526 [1:48:04<12:21, 1.62it/s] 90%|████████▉ | 10323/11526 [1:48:04<12:20, 1.63it/s] {'loss': 0.17, 'grad_norm': 0.6483590006828308, 'learning_rate': 3.2821065842635e-07, 'epoch': 2.69}
90%|████████▉ | 10323/11526 [1:48:05<12:20, 1.63it/s] 90%|████████▉ | 10324/11526 [1:48:05<12:21, 1.62it/s] {'loss': 0.1472, 'grad_norm': 0.5658242106437683, 'learning_rate': 3.2767126858559984e-07, 'epoch': 2.69}
90%|████████▉ | 10324/11526 [1:48:05<12:21, 1.62it/s] 90%|████████▉ | 10325/11526 [1:48:06<12:20, 1.62it/s] {'loss': 0.1288, 'grad_norm': 0.5408459901809692, 'learning_rate': 3.2713230731742596e-07, 'epoch': 2.69}
90%|████████▉ | 10325/11526 [1:48:06<12:20, 1.62it/s] 90%|████████▉ | 10326/11526 [1:48:06<12:19, 1.62it/s] {'loss': 0.1638, 'grad_norm': 0.646609365940094, 'learning_rate': 3.2659377467126275e-07, 'epoch': 2.69}
90%|████████▉ | 10326/11526 [1:48:06<12:19, 1.62it/s] 90%|████████▉ | 10327/11526 [1:48:07<12:18, 1.62it/s] {'loss': 0.1354, 'grad_norm': 0.6016151905059814, 'learning_rate': 3.260556706965101e-07, 'epoch': 2.69}
90%|████████▉ | 10327/11526 [1:48:07<12:18, 1.62it/s] 90%|████████▉ | 10328/11526 [1:48:07<12:17, 1.63it/s] {'loss': 0.1743, 'grad_norm': 0.6804863810539246, 'learning_rate': 3.2551799544252585e-07, 'epoch': 2.69}
90%|████████▉ | 10328/11526 [1:48:08<12:17, 1.63it/s] 90%|████████▉ | 10329/11526 [1:48:08<12:16, 1.63it/s] {'loss': 0.1537, 'grad_norm': 0.6043528914451599, 'learning_rate': 3.249807489586265e-07, 'epoch': 2.69}
90%|████████▉ | 10329/11526 [1:48:08<12:16, 1.63it/s] 90%|████████▉ | 10330/11526 [1:48:09<12:15, 1.63it/s] {'loss': 0.1596, 'grad_norm': 0.635328471660614, 'learning_rate': 3.2444393129409336e-07, 'epoch': 2.69}
90%|████████▉ | 10330/11526 [1:48:09<12:15, 1.63it/s] 90%|████████▉ | 10331/11526 [1:48:09<12:15, 1.62it/s] {'loss': 0.1197, 'grad_norm': 0.48950180411338806, 'learning_rate': 3.2390754249816525e-07, 'epoch': 2.69}
90%|████████▉ | 10331/11526 [1:48:09<12:15, 1.62it/s] 90%|████████▉ | 10332/11526 [1:48:10<12:14, 1.63it/s] {'loss': 0.1243, 'grad_norm': 0.4677511751651764, 'learning_rate': 3.2337158262004287e-07, 'epoch': 2.69}
90%|████████▉ | 10332/11526 [1:48:10<12:14, 1.63it/s] 90%|████████▉ | 10333/11526 [1:48:11<12:13, 1.63it/s] {'loss': 0.1477, 'grad_norm': 0.579864501953125, 'learning_rate': 3.228360517088891e-07, 'epoch': 2.69}
90%|████████▉ | 10333/11526 [1:48:11<12:13, 1.63it/s] 90%|████████▉ | 10334/11526 [1:48:11<12:12, 1.63it/s] {'loss': 0.1578, 'grad_norm': 0.6517623662948608, 'learning_rate': 3.223009498138241e-07, 'epoch': 2.69}
90%|████████▉ | 10334/11526 [1:48:11<12:12, 1.63it/s] 90%|████████▉ | 10335/11526 [1:48:12<12:11, 1.63it/s] {'loss': 0.1399, 'grad_norm': 0.6559487581253052, 'learning_rate': 3.217662769839297e-07, 'epoch': 2.69}
90%|████████▉ | 10335/11526 [1:48:12<12:11, 1.63it/s] 90%|████████▉ | 10336/11526 [1:48:12<12:12, 1.62it/s] {'loss': 0.1346, 'grad_norm': 0.5182831883430481, 'learning_rate': 3.2123203326825225e-07, 'epoch': 2.69}
90%|████████▉ | 10336/11526 [1:48:13<12:12, 1.62it/s] 90%|████████▉ | 10337/11526 [1:48:13<12:11, 1.63it/s] {'loss': 0.1233, 'grad_norm': 0.5634897947311401, 'learning_rate': 3.2069821871579255e-07, 'epoch': 2.69}
90%|████████▉ | 10337/11526 [1:48:13<12:11, 1.63it/s] 90%|████████▉ | 10338/11526 [1:48:14<12:10, 1.63it/s] {'loss': 0.166, 'grad_norm': 0.6649752259254456, 'learning_rate': 3.2016483337551695e-07, 'epoch': 2.69}
90%|████████▉ | 10338/11526 [1:48:14<12:10, 1.63it/s] 90%|████████▉ | 10339/11526 [1:48:14<12:09, 1.63it/s] {'loss': 0.1678, 'grad_norm': 0.5920435786247253, 'learning_rate': 3.1963187729635024e-07, 'epoch': 2.69}
90%|████████▉ | 10339/11526 [1:48:14<12:09, 1.63it/s] 90%|████████▉ | 10340/11526 [1:48:15<12:08, 1.63it/s] {'loss': 0.1657, 'grad_norm': 0.6407426595687866, 'learning_rate': 3.1909935052717665e-07, 'epoch': 2.69}
90%|████████▉ | 10340/11526 [1:48:15<12:08, 1.63it/s] 90%|████████▉ | 10341/11526 [1:48:15<12:11, 1.62it/s] {'loss': 0.1444, 'grad_norm': 0.6089361310005188, 'learning_rate': 3.1856725311684546e-07, 'epoch': 2.69}
90%|████████▉ | 10341/11526 [1:48:16<12:11, 1.62it/s] 90%|████████▉ | 10342/11526 [1:48:16<12:09, 1.62it/s] {'loss': 0.1403, 'grad_norm': 0.5799223780632019, 'learning_rate': 3.1803558511416144e-07, 'epoch': 2.69}
90%|████████▉ | 10342/11526 [1:48:16<12:09, 1.62it/s] 90%|████████▉ | 10343/11526 [1:48:17<12:08, 1.62it/s] {'loss': 0.1273, 'grad_norm': 0.5845237374305725, 'learning_rate': 3.175043465678929e-07, 'epoch': 2.69}
90%|████████▉ | 10343/11526 [1:48:17<12:08, 1.62it/s] 90%|████████▉ | 10344/11526 [1:48:17<12:07, 1.63it/s] {'loss': 0.1625, 'grad_norm': 0.8088277578353882, 'learning_rate': 3.169735375267674e-07, 'epoch': 2.69}
90%|████████▉ | 10344/11526 [1:48:17<12:07, 1.63it/s] 90%|████████▉ | 10345/11526 [1:48:18<12:06, 1.63it/s] {'loss': 0.1447, 'grad_norm': 0.599246621131897, 'learning_rate': 3.1644315803947503e-07, 'epoch': 2.69}
90%|████████▉ | 10345/11526 [1:48:18<12:06, 1.63it/s] 90%|████████▉ | 10346/11526 [1:48:19<12:06, 1.62it/s] {'loss': 0.1404, 'grad_norm': 0.6777896285057068, 'learning_rate': 3.15913208154664e-07, 'epoch': 2.69}
90%|████████▉ | 10346/11526 [1:48:19<12:06, 1.62it/s] 90%|████████▉ | 10347/11526 [1:48:19<12:05, 1.63it/s] {'loss': 0.2175, 'grad_norm': 0.7585781812667847, 'learning_rate': 3.15383687920946e-07, 'epoch': 2.69}
90%|████████▉ | 10347/11526 [1:48:19<12:05, 1.63it/s] 90%|████████▉ | 10348/11526 [1:48:20<12:03, 1.63it/s] {'loss': 0.1721, 'grad_norm': 0.68019700050354, 'learning_rate': 3.1485459738688885e-07, 'epoch': 2.69}
90%|████████▉ | 10348/11526 [1:48:20<12:03, 1.63it/s] 90%|████████▉ | 10349/11526 [1:48:20<12:03, 1.63it/s] {'loss': 0.1286, 'grad_norm': 0.5568451285362244, 'learning_rate': 3.1432593660102595e-07, 'epoch': 2.69}
90%|████████▉ | 10349/11526 [1:48:21<12:03, 1.63it/s] 90%|████████▉ | 10350/11526 [1:48:21<12:02, 1.63it/s] {'loss': 0.118, 'grad_norm': 0.5374578833580017, 'learning_rate': 3.1379770561184965e-07, 'epoch': 2.69}
90%|████████▉ | 10350/11526 [1:48:21<12:02, 1.63it/s] 90%|████████▉ | 10351/11526 [1:48:22<12:01, 1.63it/s] {'loss': 0.1793, 'grad_norm': 0.68685382604599, 'learning_rate': 3.1326990446781e-07, 'epoch': 2.69}
90%|████████▉ | 10351/11526 [1:48:22<12:01, 1.63it/s] 90%|████████▉ | 10352/11526 [1:48:22<12:01, 1.63it/s] {'loss': 0.185, 'grad_norm': 0.6852129101753235, 'learning_rate': 3.127425332173217e-07, 'epoch': 2.69}
90%|████████▉ | 10352/11526 [1:48:22<12:01, 1.63it/s] 90%|████████▉ | 10353/11526 [1:48:23<12:00, 1.63it/s] {'loss': 0.1479, 'grad_norm': 0.5764837265014648, 'learning_rate': 3.1221559190875716e-07, 'epoch': 2.69}
90%|████████▉ | 10353/11526 [1:48:23<12:00, 1.63it/s] 90%|████████▉ | 10354/11526 [1:48:23<12:00, 1.63it/s] {'loss': 0.1443, 'grad_norm': 0.5303879976272583, 'learning_rate': 3.11689080590451e-07, 'epoch': 2.69}
90%|████████▉ | 10354/11526 [1:48:24<12:00, 1.63it/s] 90%|████████▉ | 10355/11526 [1:48:24<11:59, 1.63it/s] {'loss': 0.1541, 'grad_norm': 0.5979905724525452, 'learning_rate': 3.111629993106985e-07, 'epoch': 2.7}
90%|████████▉ | 10355/11526 [1:48:24<11:59, 1.63it/s] 90%|████████▉ | 10356/11526 [1:48:25<11:58, 1.63it/s] {'loss': 0.1587, 'grad_norm': 0.6037413477897644, 'learning_rate': 3.1063734811775383e-07, 'epoch': 2.7}
90%|████████▉ | 10356/11526 [1:48:25<11:58, 1.63it/s] 90%|████████▉ | 10357/11526 [1:48:25<11:58, 1.63it/s] {'loss': 0.1111, 'grad_norm': 0.46414828300476074, 'learning_rate': 3.101121270598317e-07, 'epoch': 2.7}
90%|████████▉ | 10357/11526 [1:48:25<11:58, 1.63it/s] 90%|████████▉ | 10358/11526 [1:48:26<11:57, 1.63it/s] {'loss': 0.1865, 'grad_norm': 0.7734266519546509, 'learning_rate': 3.0958733618511093e-07, 'epoch': 2.7}
90%|████████▉ | 10358/11526 [1:48:26<11:57, 1.63it/s] 90%|████████▉ | 10359/11526 [1:48:27<11:56, 1.63it/s] {'loss': 0.1255, 'grad_norm': 0.6606439352035522, 'learning_rate': 3.0906297554172684e-07, 'epoch': 2.7}
90%|████████▉ | 10359/11526 [1:48:27<11:56, 1.63it/s] 90%|████████▉ | 10360/11526 [1:48:27<11:56, 1.63it/s] {'loss': 0.1341, 'grad_norm': 0.5437430739402771, 'learning_rate': 3.0853904517777645e-07, 'epoch': 2.7}
90%|████████▉ | 10360/11526 [1:48:27<11:56, 1.63it/s] 90%|████████▉ | 10361/11526 [1:48:28<11:55, 1.63it/s] {'loss': 0.1285, 'grad_norm': 0.5107088685035706, 'learning_rate': 3.0801554514131915e-07, 'epoch': 2.7}
90%|████████▉ | 10361/11526 [1:48:28<11:55, 1.63it/s] 90%|████████▉ | 10362/11526 [1:48:28<11:54, 1.63it/s] {'loss': 0.179, 'grad_norm': 0.6632230281829834, 'learning_rate': 3.0749247548037095e-07, 'epoch': 2.7}
90%|████████▉ | 10362/11526 [1:48:29<11:54, 1.63it/s] 90%|████████▉ | 10363/11526 [1:48:29<11:54, 1.63it/s] {'loss': 0.1446, 'grad_norm': 0.5930254459381104, 'learning_rate': 3.0696983624291353e-07, 'epoch': 2.7}
90%|████████▉ | 10363/11526 [1:48:29<11:54, 1.63it/s] 90%|████████▉ | 10364/11526 [1:48:30<11:53, 1.63it/s] {'loss': 0.1564, 'grad_norm': 0.5939241051673889, 'learning_rate': 3.064476274768846e-07, 'epoch': 2.7}
90%|████████▉ | 10364/11526 [1:48:30<11:53, 1.63it/s] 90%|████████▉ | 10365/11526 [1:48:30<11:53, 1.63it/s] {'loss': 0.1683, 'grad_norm': 0.6718155145645142, 'learning_rate': 3.059258492301842e-07, 'epoch': 2.7}
90%|████████▉ | 10365/11526 [1:48:30<11:53, 1.63it/s] 90%|████████▉ | 10366/11526 [1:48:31<11:53, 1.63it/s] {'loss': 0.1721, 'grad_norm': 0.6590719819068909, 'learning_rate': 3.054045015506729e-07, 'epoch': 2.7}
90%|████████▉ | 10366/11526 [1:48:31<11:53, 1.63it/s] 90%|████████▉ | 10367/11526 [1:48:31<11:52, 1.63it/s] {'loss': 0.2406, 'grad_norm': 0.6895049810409546, 'learning_rate': 3.0488358448617197e-07, 'epoch': 2.7}
90%|████████▉ | 10367/11526 [1:48:32<11:52, 1.63it/s] 90%|████████▉ | 10368/11526 [1:48:32<11:51, 1.63it/s] {'loss': 0.1263, 'grad_norm': 0.5436030030250549, 'learning_rate': 3.043630980844625e-07, 'epoch': 2.7}
90%|████████▉ | 10368/11526 [1:48:32<11:51, 1.63it/s] 90%|████████▉ | 10369/11526 [1:48:33<11:51, 1.63it/s] {'loss': 0.1526, 'grad_norm': 0.5491880774497986, 'learning_rate': 3.038430423932881e-07, 'epoch': 2.7}
90%|████████▉ | 10369/11526 [1:48:33<11:51, 1.63it/s] 90%|████████▉ | 10370/11526 [1:48:33<11:50, 1.63it/s] {'loss': 0.1705, 'grad_norm': 0.6293357014656067, 'learning_rate': 3.0332341746034885e-07, 'epoch': 2.7}
90%|████████▉ | 10370/11526 [1:48:33<11:50, 1.63it/s] 90%|████████▉ | 10371/11526 [1:48:34<11:49, 1.63it/s] {'loss': 0.1234, 'grad_norm': 0.6031721830368042, 'learning_rate': 3.0280422333330837e-07, 'epoch': 2.7}
90%|████████▉ | 10371/11526 [1:48:34<11:49, 1.63it/s] 90%|████████▉ | 10372/11526 [1:48:35<11:48, 1.63it/s] {'loss': 0.1699, 'grad_norm': 0.6066119074821472, 'learning_rate': 3.022854600597913e-07, 'epoch': 2.7}
90%|████████▉ | 10372/11526 [1:48:35<11:48, 1.63it/s] 90%|████████▉ | 10373/11526 [1:48:35<11:48, 1.63it/s] {'loss': 0.2028, 'grad_norm': 0.7241258025169373, 'learning_rate': 3.0176712768738014e-07, 'epoch': 2.7}
90%|████████▉ | 10373/11526 [1:48:35<11:48, 1.63it/s] 90%|█████████ | 10374/11526 [1:48:36<11:47, 1.63it/s] {'loss': 0.1369, 'grad_norm': 0.5689952969551086, 'learning_rate': 3.0124922626362077e-07, 'epoch': 2.7}
90%|█████████ | 10374/11526 [1:48:36<11:47, 1.63it/s] 90%|█████████ | 10375/11526 [1:48:36<11:46, 1.63it/s] {'loss': 0.1108, 'grad_norm': 0.4875647723674774, 'learning_rate': 3.007317558360157e-07, 'epoch': 2.7}
90%|█████████ | 10375/11526 [1:48:37<11:46, 1.63it/s] 90%|█████████ | 10376/11526 [1:48:37<11:45, 1.63it/s] {'loss': 0.1527, 'grad_norm': 0.6056769490242004, 'learning_rate': 3.002147164520319e-07, 'epoch': 2.7}
90%|█████████ | 10376/11526 [1:48:37<11:45, 1.63it/s] 90%|█████████ | 10377/11526 [1:48:38<11:45, 1.63it/s] {'loss': 0.1646, 'grad_norm': 0.5760502815246582, 'learning_rate': 2.996981081590966e-07, 'epoch': 2.7}
90%|█████████ | 10377/11526 [1:48:38<11:45, 1.63it/s] 90%|█████████ | 10378/11526 [1:48:38<11:44, 1.63it/s] {'loss': 0.1517, 'grad_norm': 0.5660081505775452, 'learning_rate': 2.991819310045929e-07, 'epoch': 2.7}
90%|█████████ | 10378/11526 [1:48:38<11:44, 1.63it/s] 90%|█████████ | 10379/11526 [1:48:39<11:44, 1.63it/s] {'loss': 0.1212, 'grad_norm': 0.4789009988307953, 'learning_rate': 2.986661850358696e-07, 'epoch': 2.7}
90%|█████████ | 10379/11526 [1:48:39<11:44, 1.63it/s] 90%|█████████ | 10380/11526 [1:48:39<11:43, 1.63it/s] {'loss': 0.1773, 'grad_norm': 0.7460885047912598, 'learning_rate': 2.9815087030023336e-07, 'epoch': 2.7}
90%|█████████ | 10380/11526 [1:48:40<11:43, 1.63it/s] 90%|█████████ | 10381/11526 [1:48:40<11:42, 1.63it/s] {'loss': 0.1684, 'grad_norm': 0.588597297668457, 'learning_rate': 2.976359868449513e-07, 'epoch': 2.7}
90%|█████████ | 10381/11526 [1:48:40<11:42, 1.63it/s] 90%|█████████ | 10382/11526 [1:48:41<11:42, 1.63it/s] {'loss': 0.1345, 'grad_norm': 0.56998211145401, 'learning_rate': 2.971215347172529e-07, 'epoch': 2.7}
90%|█████████ | 10382/11526 [1:48:41<11:42, 1.63it/s] 90%|█████████ | 10383/11526 [1:48:41<11:41, 1.63it/s] {'loss': 0.1568, 'grad_norm': 0.6744027137756348, 'learning_rate': 2.9660751396432487e-07, 'epoch': 2.7}
90%|█████████ | 10383/11526 [1:48:41<11:41, 1.63it/s] 90%|█████████ | 10384/11526 [1:48:42<11:41, 1.63it/s] {'loss': 0.1364, 'grad_norm': 0.5603498220443726, 'learning_rate': 2.9609392463331557e-07, 'epoch': 2.7}
90%|█████████ | 10384/11526 [1:48:42<11:41, 1.63it/s] 90%|█████████ | 10385/11526 [1:48:43<11:40, 1.63it/s] {'loss': 0.1708, 'grad_norm': 0.6177731156349182, 'learning_rate': 2.955807667713378e-07, 'epoch': 2.7}
90%|█████████ | 10385/11526 [1:48:43<11:40, 1.63it/s] 90%|█████████ | 10386/11526 [1:48:43<11:39, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.5919156074523926, 'learning_rate': 2.95068040425458e-07, 'epoch': 2.7}
90%|█████████ | 10386/11526 [1:48:43<11:39, 1.63it/s] 90%|█████████ | 10387/11526 [1:48:44<11:39, 1.63it/s] {'loss': 0.1287, 'grad_norm': 0.5605263113975525, 'learning_rate': 2.9455574564270783e-07, 'epoch': 2.7}
90%|█████████ | 10387/11526 [1:48:44<11:39, 1.63it/s] 90%|█████████ | 10388/11526 [1:48:44<11:38, 1.63it/s] {'loss': 0.1278, 'grad_norm': 0.5533790588378906, 'learning_rate': 2.940438824700781e-07, 'epoch': 2.7}
90%|█████████ | 10388/11526 [1:48:44<11:38, 1.63it/s] 90%|█████████ | 10389/11526 [1:48:45<11:38, 1.63it/s] {'loss': 0.1528, 'grad_norm': 0.5897548794746399, 'learning_rate': 2.935324509545179e-07, 'epoch': 2.7}
90%|█████████ | 10389/11526 [1:48:45<11:38, 1.63it/s] 90%|█████████ | 10390/11526 [1:48:46<11:37, 1.63it/s] {'loss': 0.1646, 'grad_norm': 0.6483033299446106, 'learning_rate': 2.9302145114294134e-07, 'epoch': 2.7}
90%|█████████ | 10390/11526 [1:48:46<11:37, 1.63it/s] 90%|█████████ | 10391/11526 [1:48:46<11:37, 1.63it/s] {'loss': 0.1588, 'grad_norm': 0.6966068148612976, 'learning_rate': 2.9251088308221767e-07, 'epoch': 2.7}
90%|█████████ | 10391/11526 [1:48:46<11:37, 1.63it/s] 90%|█████████ | 10392/11526 [1:48:47<11:36, 1.63it/s] {'loss': 0.151, 'grad_norm': 0.6486279964447021, 'learning_rate': 2.92000746819181e-07, 'epoch': 2.7}
90%|█████████ | 10392/11526 [1:48:47<11:36, 1.63it/s] 90%|█████████ | 10393/11526 [1:48:47<11:35, 1.63it/s] {'loss': 0.1353, 'grad_norm': 0.5697183609008789, 'learning_rate': 2.9149104240062286e-07, 'epoch': 2.71}
90%|█████████ | 10393/11526 [1:48:48<11:35, 1.63it/s] 90%|█████████ | 10394/11526 [1:48:48<11:35, 1.63it/s] {'loss': 0.1899, 'grad_norm': 0.5344036221504211, 'learning_rate': 2.9098176987329694e-07, 'epoch': 2.71}
90%|█████████ | 10394/11526 [1:48:48<11:35, 1.63it/s] 90%|█████████ | 10395/11526 [1:48:49<11:34, 1.63it/s] {'loss': 0.1411, 'grad_norm': 0.6001323461532593, 'learning_rate': 2.9047292928391593e-07, 'epoch': 2.71}
90%|█████████ | 10395/11526 [1:48:49<11:34, 1.63it/s] 90%|█████████ | 10396/11526 [1:48:49<11:34, 1.63it/s] {'loss': 0.1889, 'grad_norm': 0.7715939879417419, 'learning_rate': 2.8996452067915517e-07, 'epoch': 2.71}
90%|█████████ | 10396/11526 [1:48:49<11:34, 1.63it/s] 90%|█████████ | 10397/11526 [1:48:50<11:33, 1.63it/s] {'loss': 0.2178, 'grad_norm': 0.6678779721260071, 'learning_rate': 2.894565441056457e-07, 'epoch': 2.71}
90%|█████████ | 10397/11526 [1:48:50<11:33, 1.63it/s] 90%|█████████ | 10398/11526 [1:48:51<11:32, 1.63it/s] {'loss': 0.1549, 'grad_norm': 0.6359946131706238, 'learning_rate': 2.8894899960998535e-07, 'epoch': 2.71}
90%|█████████ | 10398/11526 [1:48:51<11:32, 1.63it/s] 90%|█████████ | 10399/11526 [1:48:51<11:32, 1.63it/s] {'loss': 0.1384, 'grad_norm': 0.5435355305671692, 'learning_rate': 2.8844188723872737e-07, 'epoch': 2.71}
90%|█████████ | 10399/11526 [1:48:51<11:32, 1.63it/s] 90%|█████████ | 10400/11526 [1:48:52<11:31, 1.63it/s] {'loss': 0.1448, 'grad_norm': 0.6158000230789185, 'learning_rate': 2.8793520703838616e-07, 'epoch': 2.71}
90%|█████████ | 10400/11526 [1:48:52<11:31, 1.63it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5426958799362183, 'eval_runtime': 1.9562, 'eval_samples_per_second': 102.241, 'eval_steps_per_second': 6.646, 'epoch': 2.71}
90%|█████████ | 10400/11526 [1:48:54<11:31, 1.63it/s]
90%|█████████ | 10401/11526 [1:48:54<22:32, 1.20s/it] {'loss': 0.1464, 'grad_norm': 0.6986168026924133, 'learning_rate': 2.8742895905543964e-07, 'epoch': 2.71}
90%|█████████ | 10401/11526 [1:48:54<22:32, 1.20s/it] 90%|█████████ | 10402/11526 [1:48:55<19:13, 1.03s/it] {'loss': 0.194, 'grad_norm': 0.7435123324394226, 'learning_rate': 2.869231433363212e-07, 'epoch': 2.71}
90%|█████████ | 10402/11526 [1:48:55<19:13, 1.03s/it] 90%|█████████ | 10403/11526 [1:48:56<16:53, 1.11it/s] {'loss': 0.1652, 'grad_norm': 0.723354697227478, 'learning_rate': 2.864177599274287e-07, 'epoch': 2.71}
90%|█████████ | 10403/11526 [1:48:56<16:53, 1.11it/s] 90%|█████████ | 10404/11526 [1:48:56<15:15, 1.23it/s] {'loss': 0.1577, 'grad_norm': 0.6342421174049377, 'learning_rate': 2.8591280887511894e-07, 'epoch': 2.71}
90%|█████████ | 10404/11526 [1:48:56<15:15, 1.23it/s] 90%|█████████ | 10405/11526 [1:48:57<14:07, 1.32it/s] {'loss': 0.1532, 'grad_norm': 0.6012107133865356, 'learning_rate': 2.8540829022570824e-07, 'epoch': 2.71}
90%|█████████ | 10405/11526 [1:48:57<14:07, 1.32it/s] 90%|█████████ | 10406/11526 [1:48:57<13:19, 1.40it/s] {'loss': 0.1983, 'grad_norm': 0.7726612091064453, 'learning_rate': 2.84904204025474e-07, 'epoch': 2.71}
90%|█████████ | 10406/11526 [1:48:58<13:19, 1.40it/s] 90%|█████████ | 10407/11526 [1:48:58<12:45, 1.46it/s] {'loss': 0.1637, 'grad_norm': 0.6853024959564209, 'learning_rate': 2.844005503206537e-07, 'epoch': 2.71}
90%|█████████ | 10407/11526 [1:48:58<12:45, 1.46it/s] 90%|█████████ | 10408/11526 [1:48:59<12:21, 1.51it/s] {'loss': 0.1352, 'grad_norm': 0.5097507238388062, 'learning_rate': 2.8389732915744537e-07, 'epoch': 2.71}
90%|█████████ | 10408/11526 [1:48:59<12:21, 1.51it/s] 90%|█████████ | 10409/11526 [1:48:59<12:04, 1.54it/s] {'loss': 0.1836, 'grad_norm': 0.6262570023536682, 'learning_rate': 2.833945405820082e-07, 'epoch': 2.71}
90%|█████████ | 10409/11526 [1:48:59<12:04, 1.54it/s] 90%|█████████ | 10410/11526 [1:49:00<11:52, 1.57it/s] {'loss': 0.1518, 'grad_norm': 0.6001260876655579, 'learning_rate': 2.828921846404603e-07, 'epoch': 2.71}
90%|█████████ | 10410/11526 [1:49:00<11:52, 1.57it/s] 90%|█████████ | 10411/11526 [1:49:00<11:43, 1.58it/s] {'loss': 0.1428, 'grad_norm': 0.7384201288223267, 'learning_rate': 2.8239026137887924e-07, 'epoch': 2.71}
90%|█████████ | 10411/11526 [1:49:01<11:43, 1.58it/s] 90%|█████████ | 10412/11526 [1:49:01<11:37, 1.60it/s] {'loss': 0.1313, 'grad_norm': 0.5678769946098328, 'learning_rate': 2.8188877084330656e-07, 'epoch': 2.71}
90%|█████████ | 10412/11526 [1:49:01<11:37, 1.60it/s] 90%|█████████ | 10413/11526 [1:49:02<11:32, 1.61it/s] {'loss': 0.1654, 'grad_norm': 0.6963567137718201, 'learning_rate': 2.8138771307974045e-07, 'epoch': 2.71}
90%|█████████ | 10413/11526 [1:49:02<11:32, 1.61it/s] 90%|█████████ | 10414/11526 [1:49:02<11:29, 1.61it/s] {'loss': 0.1362, 'grad_norm': 0.5220360159873962, 'learning_rate': 2.808870881341413e-07, 'epoch': 2.71}
90%|█████████ | 10414/11526 [1:49:02<11:29, 1.61it/s] 90%|█████████ | 10415/11526 [1:49:03<11:27, 1.62it/s] {'loss': 0.1198, 'grad_norm': 0.4587233364582062, 'learning_rate': 2.8038689605242864e-07, 'epoch': 2.71}
90%|█████████ | 10415/11526 [1:49:03<11:27, 1.62it/s] 90%|█████████ | 10416/11526 [1:49:04<11:29, 1.61it/s] {'loss': 0.1654, 'grad_norm': 0.7025076150894165, 'learning_rate': 2.7988713688048343e-07, 'epoch': 2.71}
90%|█████████ | 10416/11526 [1:49:04<11:29, 1.61it/s] 90%|█████████ | 10417/11526 [1:49:04<11:25, 1.62it/s] {'loss': 0.2061, 'grad_norm': 0.837498128414154, 'learning_rate': 2.793878106641462e-07, 'epoch': 2.71}
90%|█████████ | 10417/11526 [1:49:04<11:25, 1.62it/s] 90%|█████████ | 10418/11526 [1:49:05<11:23, 1.62it/s] {'loss': 0.1508, 'grad_norm': 0.6176120042800903, 'learning_rate': 2.788889174492193e-07, 'epoch': 2.71}
90%|█████████ | 10418/11526 [1:49:05<11:23, 1.62it/s] 90%|█████████ | 10419/11526 [1:49:05<11:22, 1.62it/s] {'loss': 0.1741, 'grad_norm': 0.6882163286209106, 'learning_rate': 2.783904572814622e-07, 'epoch': 2.71}
90%|█████████ | 10419/11526 [1:49:06<11:22, 1.62it/s] 90%|█████████ | 10420/11526 [1:49:06<11:20, 1.62it/s] {'loss': 0.1733, 'grad_norm': 0.6522196531295776, 'learning_rate': 2.778924302065966e-07, 'epoch': 2.71}
90%|█████████ | 10420/11526 [1:49:06<11:20, 1.62it/s] 90%|█████████ | 10421/11526 [1:49:07<11:20, 1.62it/s] {'loss': 0.1699, 'grad_norm': 0.6501739621162415, 'learning_rate': 2.773948362703055e-07, 'epoch': 2.71}
90%|█████████ | 10421/11526 [1:49:07<11:20, 1.62it/s] 90%|█████████ | 10422/11526 [1:49:07<11:19, 1.62it/s] {'loss': 0.126, 'grad_norm': 0.4900639057159424, 'learning_rate': 2.7689767551823067e-07, 'epoch': 2.71}
90%|█████████ | 10422/11526 [1:49:07<11:19, 1.62it/s] 90%|█████████ | 10423/11526 [1:49:08<11:18, 1.63it/s] {'loss': 0.151, 'grad_norm': 0.5894381403923035, 'learning_rate': 2.764009479959745e-07, 'epoch': 2.71}
90%|█████████ | 10423/11526 [1:49:08<11:18, 1.63it/s] 90%|█████████ | 10424/11526 [1:49:08<11:17, 1.63it/s] {'loss': 0.1426, 'grad_norm': 0.5159060955047607, 'learning_rate': 2.759046537490984e-07, 'epoch': 2.71}
90%|█████████ | 10424/11526 [1:49:09<11:17, 1.63it/s] 90%|█████████ | 10425/11526 [1:49:09<11:16, 1.63it/s] {'loss': 0.1443, 'grad_norm': 0.6065476536750793, 'learning_rate': 2.7540879282312747e-07, 'epoch': 2.71}
90%|█████████ | 10425/11526 [1:49:09<11:16, 1.63it/s] 90%|█████████ | 10426/11526 [1:49:10<11:16, 1.63it/s] {'loss': 0.1386, 'grad_norm': 0.5098985433578491, 'learning_rate': 2.749133652635444e-07, 'epoch': 2.71}
90%|█████████ | 10426/11526 [1:49:10<11:16, 1.63it/s] 90%|█████████ | 10427/11526 [1:49:10<11:15, 1.63it/s] {'loss': 0.201, 'grad_norm': 0.7653838396072388, 'learning_rate': 2.7441837111579106e-07, 'epoch': 2.71}
90%|█████████ | 10427/11526 [1:49:10<11:15, 1.63it/s] 90%|█████████ | 10428/11526 [1:49:11<11:14, 1.63it/s] {'loss': 0.1851, 'grad_norm': 0.7468769550323486, 'learning_rate': 2.739238104252728e-07, 'epoch': 2.71}
90%|█████████ | 10428/11526 [1:49:11<11:14, 1.63it/s] 90%|█████████ | 10429/11526 [1:49:12<11:14, 1.63it/s] {'loss': 0.1524, 'grad_norm': 0.6459237337112427, 'learning_rate': 2.734296832373523e-07, 'epoch': 2.71}
90%|█████████ | 10429/11526 [1:49:12<11:14, 1.63it/s] 90%|█████████ | 10430/11526 [1:49:12<11:13, 1.63it/s] {'loss': 0.172, 'grad_norm': 0.7006499767303467, 'learning_rate': 2.729359895973549e-07, 'epoch': 2.71}
90%|█████████ | 10430/11526 [1:49:12<11:13, 1.63it/s] 90%|█████████ | 10431/11526 [1:49:13<11:13, 1.63it/s] {'loss': 0.1432, 'grad_norm': 0.5779584646224976, 'learning_rate': 2.724427295505644e-07, 'epoch': 2.71}
90%|█████████ | 10431/11526 [1:49:13<11:13, 1.63it/s] 91%|█████████ | 10432/11526 [1:49:13<11:12, 1.63it/s] {'loss': 0.1304, 'grad_norm': 0.49699580669403076, 'learning_rate': 2.7194990314222516e-07, 'epoch': 2.72}
91%|█████████ | 10432/11526 [1:49:14<11:12, 1.63it/s] 91%|█████████ | 10433/11526 [1:49:14<11:11, 1.63it/s] {'loss': 0.1121, 'grad_norm': 0.48134511709213257, 'learning_rate': 2.7145751041754155e-07, 'epoch': 2.72}
91%|█████████ | 10433/11526 [1:49:14<11:11, 1.63it/s] 91%|█████████ | 10434/11526 [1:49:15<11:10, 1.63it/s] {'loss': 0.1805, 'grad_norm': 0.6960548758506775, 'learning_rate': 2.709655514216808e-07, 'epoch': 2.72}
91%|█████████ | 10434/11526 [1:49:15<11:10, 1.63it/s] 91%|█████████ | 10435/11526 [1:49:15<11:10, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.5994903445243835, 'learning_rate': 2.7047402619976627e-07, 'epoch': 2.72}
91%|█████████ | 10435/11526 [1:49:15<11:10, 1.63it/s] 91%|█████████ | 10436/11526 [1:49:16<11:10, 1.63it/s] {'loss': 0.1955, 'grad_norm': 0.6860584616661072, 'learning_rate': 2.6998293479688356e-07, 'epoch': 2.72}
91%|█████████ | 10436/11526 [1:49:16<11:10, 1.63it/s] 91%|█████████ | 10437/11526 [1:49:16<11:09, 1.63it/s] {'loss': 0.1545, 'grad_norm': 0.8423088788986206, 'learning_rate': 2.694922772580799e-07, 'epoch': 2.72}
91%|█████████ | 10437/11526 [1:49:17<11:09, 1.63it/s] 91%|█████████ | 10438/11526 [1:49:17<11:08, 1.63it/s] {'loss': 0.1535, 'grad_norm': 0.5827860236167908, 'learning_rate': 2.690020536283583e-07, 'epoch': 2.72}
91%|█████████ | 10438/11526 [1:49:17<11:08, 1.63it/s] 91%|█████████ | 10439/11526 [1:49:18<11:08, 1.63it/s] {'loss': 0.1772, 'grad_norm': 0.6848771572113037, 'learning_rate': 2.6851226395268825e-07, 'epoch': 2.72}
91%|█████████ | 10439/11526 [1:49:18<11:08, 1.63it/s] 91%|█████████ | 10440/11526 [1:49:18<11:08, 1.62it/s] {'loss': 0.159, 'grad_norm': 0.5997376441955566, 'learning_rate': 2.680229082759933e-07, 'epoch': 2.72}
91%|█████████ | 10440/11526 [1:49:18<11:08, 1.62it/s] 91%|█████████ | 10441/11526 [1:49:19<11:08, 1.62it/s] {'loss': 0.1097, 'grad_norm': 0.5275095701217651, 'learning_rate': 2.6753398664316145e-07, 'epoch': 2.72}
91%|█████████ | 10441/11526 [1:49:19<11:08, 1.62it/s] 91%|█████████ | 10442/11526 [1:49:20<11:07, 1.62it/s] {'loss': 0.153, 'grad_norm': 0.6483624577522278, 'learning_rate': 2.67045499099039e-07, 'epoch': 2.72}
91%|█████████ | 10442/11526 [1:49:20<11:07, 1.62it/s] 91%|█████████ | 10443/11526 [1:49:20<11:06, 1.63it/s] {'loss': 0.1413, 'grad_norm': 0.5219951272010803, 'learning_rate': 2.6655744568843235e-07, 'epoch': 2.72}
91%|█████████ | 10443/11526 [1:49:20<11:06, 1.63it/s] 91%|█████████ | 10444/11526 [1:49:21<11:05, 1.63it/s] {'loss': 0.1551, 'grad_norm': 0.6827651858329773, 'learning_rate': 2.6606982645610903e-07, 'epoch': 2.72}
91%|█████████ | 10444/11526 [1:49:21<11:05, 1.63it/s] 91%|█████████ | 10445/11526 [1:49:21<11:04, 1.63it/s] {'loss': 0.1361, 'grad_norm': 0.7202551364898682, 'learning_rate': 2.6558264144679715e-07, 'epoch': 2.72}
91%|█████████ | 10445/11526 [1:49:21<11:04, 1.63it/s] 91%|█████████ | 10446/11526 [1:49:22<11:04, 1.63it/s] {'loss': 0.1333, 'grad_norm': 0.5499066114425659, 'learning_rate': 2.650958907051815e-07, 'epoch': 2.72}
91%|█████████ | 10446/11526 [1:49:22<11:04, 1.63it/s] 91%|█████████ | 10447/11526 [1:49:23<11:02, 1.63it/s] {'loss': 0.1669, 'grad_norm': 0.6217098236083984, 'learning_rate': 2.6460957427591307e-07, 'epoch': 2.72}
91%|█████████ | 10447/11526 [1:49:23<11:02, 1.63it/s] 91%|█████████ | 10448/11526 [1:49:23<11:02, 1.63it/s] {'loss': 0.1378, 'grad_norm': 0.5249376893043518, 'learning_rate': 2.6412369220359666e-07, 'epoch': 2.72}
91%|█████████ | 10448/11526 [1:49:23<11:02, 1.63it/s] 91%|█████████ | 10449/11526 [1:49:24<11:03, 1.62it/s] {'loss': 0.1645, 'grad_norm': 0.6727590560913086, 'learning_rate': 2.636382445328012e-07, 'epoch': 2.72}
91%|█████████ | 10449/11526 [1:49:24<11:03, 1.62it/s] 91%|█████████ | 10450/11526 [1:49:24<11:02, 1.62it/s] {'loss': 0.1586, 'grad_norm': 0.6653781533241272, 'learning_rate': 2.631532313080554e-07, 'epoch': 2.72}
91%|█████████ | 10450/11526 [1:49:25<11:02, 1.62it/s] 91%|█████████ | 10451/11526 [1:49:25<11:02, 1.62it/s] {'loss': 0.1476, 'grad_norm': 0.5913458466529846, 'learning_rate': 2.6266865257384586e-07, 'epoch': 2.72}
91%|█████████ | 10451/11526 [1:49:25<11:02, 1.62it/s] 91%|█████████ | 10452/11526 [1:49:26<11:00, 1.63it/s] {'loss': 0.1683, 'grad_norm': 0.6260626316070557, 'learning_rate': 2.6218450837462216e-07, 'epoch': 2.72}
91%|█████████ | 10452/11526 [1:49:26<11:00, 1.63it/s] 91%|█████████ | 10453/11526 [1:49:26<10:59, 1.63it/s] {'loss': 0.1621, 'grad_norm': 0.5962703824043274, 'learning_rate': 2.6170079875479313e-07, 'epoch': 2.72}
91%|█████████ | 10453/11526 [1:49:26<10:59, 1.63it/s] 91%|█████████ | 10454/11526 [1:49:27<10:58, 1.63it/s] {'loss': 0.1452, 'grad_norm': 0.5865595936775208, 'learning_rate': 2.6121752375872555e-07, 'epoch': 2.72}
91%|█████████ | 10454/11526 [1:49:27<10:58, 1.63it/s] 91%|█████████ | 10455/11526 [1:49:28<10:57, 1.63it/s] {'loss': 0.1723, 'grad_norm': 0.5842443704605103, 'learning_rate': 2.6073468343075006e-07, 'epoch': 2.72}
91%|█████████ | 10455/11526 [1:49:28<10:57, 1.63it/s] 91%|█████████ | 10456/11526 [1:49:28<10:57, 1.63it/s] {'loss': 0.1197, 'grad_norm': 0.47757911682128906, 'learning_rate': 2.6025227781515393e-07, 'epoch': 2.72}
91%|█████████ | 10456/11526 [1:49:28<10:57, 1.63it/s] 91%|█████████ | 10457/11526 [1:49:29<10:56, 1.63it/s] {'loss': 0.172, 'grad_norm': 0.6563155055046082, 'learning_rate': 2.5977030695618744e-07, 'epoch': 2.72}
91%|█████████ | 10457/11526 [1:49:29<10:56, 1.63it/s] 91%|█████████ | 10458/11526 [1:49:29<10:56, 1.63it/s] {'loss': 0.1484, 'grad_norm': 0.5693402886390686, 'learning_rate': 2.592887708980596e-07, 'epoch': 2.72}
91%|█████████ | 10458/11526 [1:49:29<10:56, 1.63it/s] 91%|█████████ | 10459/11526 [1:49:30<10:55, 1.63it/s] {'loss': 0.1812, 'grad_norm': 0.6156362891197205, 'learning_rate': 2.58807669684939e-07, 'epoch': 2.72}
91%|█████████ | 10459/11526 [1:49:30<10:55, 1.63it/s] 91%|█████████ | 10460/11526 [1:49:31<11:48, 1.50it/s] {'loss': 0.1613, 'grad_norm': 0.6175692081451416, 'learning_rate': 2.583270033609536e-07, 'epoch': 2.72}
91%|█████████ | 10460/11526 [1:49:31<11:48, 1.50it/s] 91%|█████████ | 10461/11526 [1:49:31<11:32, 1.54it/s] {'loss': 0.1424, 'grad_norm': 0.5810580849647522, 'learning_rate': 2.578467719701966e-07, 'epoch': 2.72}
91%|█████████ | 10461/11526 [1:49:32<11:32, 1.54it/s] 91%|█████████ | 10462/11526 [1:49:32<11:20, 1.56it/s] {'loss': 0.1773, 'grad_norm': 0.6246156692504883, 'learning_rate': 2.5736697555671486e-07, 'epoch': 2.72}
91%|█████████ | 10462/11526 [1:49:32<11:20, 1.56it/s] 91%|█████████ | 10463/11526 [1:49:33<11:11, 1.58it/s] {'loss': 0.1041, 'grad_norm': 0.4394382834434509, 'learning_rate': 2.5688761416451825e-07, 'epoch': 2.72}
91%|█████████ | 10463/11526 [1:49:33<11:11, 1.58it/s] 91%|█████████ | 10464/11526 [1:49:33<11:06, 1.59it/s] {'loss': 0.1439, 'grad_norm': 0.606609046459198, 'learning_rate': 2.5640868783757657e-07, 'epoch': 2.72}
91%|█████████ | 10464/11526 [1:49:33<11:06, 1.59it/s] 91%|█████████ | 10465/11526 [1:49:34<11:01, 1.60it/s] {'loss': 0.1387, 'grad_norm': 0.5643331408500671, 'learning_rate': 2.5593019661982024e-07, 'epoch': 2.72}
91%|█████████ | 10465/11526 [1:49:34<11:01, 1.60it/s] 91%|█████████ | 10466/11526 [1:49:34<10:58, 1.61it/s] {'loss': 0.1535, 'grad_norm': 0.5455701351165771, 'learning_rate': 2.554521405551397e-07, 'epoch': 2.72}
91%|█████████ | 10466/11526 [1:49:35<10:58, 1.61it/s] 91%|█████████ | 10467/11526 [1:49:35<10:55, 1.61it/s] {'loss': 0.1264, 'grad_norm': 0.46739378571510315, 'learning_rate': 2.5497451968738317e-07, 'epoch': 2.72}
91%|█████████ | 10467/11526 [1:49:35<10:55, 1.61it/s] 91%|█████████ | 10468/11526 [1:49:36<10:53, 1.62it/s] {'loss': 0.1359, 'grad_norm': 0.6282541751861572, 'learning_rate': 2.5449733406036235e-07, 'epoch': 2.72}
91%|█████████ | 10468/11526 [1:49:36<10:53, 1.62it/s] 91%|█████████ | 10469/11526 [1:49:36<10:52, 1.62it/s] {'loss': 0.1669, 'grad_norm': 0.6982527375221252, 'learning_rate': 2.5402058371784665e-07, 'epoch': 2.72}
91%|█████████ | 10469/11526 [1:49:36<10:52, 1.62it/s] 91%|█████████ | 10470/11526 [1:49:37<10:51, 1.62it/s] {'loss': 0.1738, 'grad_norm': 0.6035037636756897, 'learning_rate': 2.5354426870356606e-07, 'epoch': 2.73}
91%|█████████ | 10470/11526 [1:49:37<10:51, 1.62it/s] 91%|█████████ | 10471/11526 [1:49:38<10:50, 1.62it/s] {'loss': 0.1418, 'grad_norm': 0.5336907505989075, 'learning_rate': 2.530683890612118e-07, 'epoch': 2.73}
91%|█████████ | 10471/11526 [1:49:38<10:50, 1.62it/s] 91%|█████████ | 10472/11526 [1:49:38<11:05, 1.58it/s] {'loss': 0.0996, 'grad_norm': 0.4185461401939392, 'learning_rate': 2.5259294483443444e-07, 'epoch': 2.73}
91%|█████████ | 10472/11526 [1:49:38<11:05, 1.58it/s] 91%|█████████ | 10473/11526 [1:49:39<11:02, 1.59it/s] {'loss': 0.1395, 'grad_norm': 0.5691930055618286, 'learning_rate': 2.5211793606684243e-07, 'epoch': 2.73}
91%|█████████ | 10473/11526 [1:49:39<11:02, 1.59it/s] 91%|█████████ | 10474/11526 [1:49:39<10:59, 1.60it/s] {'loss': 0.1608, 'grad_norm': 0.6188823580741882, 'learning_rate': 2.516433628020093e-07, 'epoch': 2.73}
91%|█████████ | 10474/11526 [1:49:40<10:59, 1.60it/s] 91%|█████████ | 10475/11526 [1:49:40<10:54, 1.61it/s] {'loss': 0.1113, 'grad_norm': 0.44557079672813416, 'learning_rate': 2.5116922508346296e-07, 'epoch': 2.73}
91%|█████████ | 10475/11526 [1:49:40<10:54, 1.61it/s] 91%|█████████ | 10476/11526 [1:49:41<10:51, 1.61it/s] {'loss': 0.1431, 'grad_norm': 0.5675065517425537, 'learning_rate': 2.506955229546948e-07, 'epoch': 2.73}
91%|█████████ | 10476/11526 [1:49:41<10:51, 1.61it/s] 91%|█████████ | 10477/11526 [1:49:41<10:49, 1.62it/s] {'loss': 0.1202, 'grad_norm': 0.5060227513313293, 'learning_rate': 2.502222564591561e-07, 'epoch': 2.73}
91%|█████████ | 10477/11526 [1:49:41<10:49, 1.62it/s] 91%|█████████ | 10478/11526 [1:49:42<10:46, 1.62it/s] {'loss': 0.1421, 'grad_norm': 0.6174195408821106, 'learning_rate': 2.4974942564025717e-07, 'epoch': 2.73}
91%|█████████ | 10478/11526 [1:49:42<10:46, 1.62it/s] 91%|█████████ | 10479/11526 [1:49:43<10:46, 1.62it/s] {'loss': 0.1267, 'grad_norm': 0.5818251967430115, 'learning_rate': 2.492770305413683e-07, 'epoch': 2.73}
91%|█████████ | 10479/11526 [1:49:43<10:46, 1.62it/s] 91%|█████████ | 10480/11526 [1:49:43<10:44, 1.62it/s] {'loss': 0.185, 'grad_norm': 0.701143741607666, 'learning_rate': 2.488050712058215e-07, 'epoch': 2.73}
91%|█████████ | 10480/11526 [1:49:43<10:44, 1.62it/s] 91%|█████████ | 10481/11526 [1:49:44<10:43, 1.62it/s] {'loss': 0.1734, 'grad_norm': 0.6368730068206787, 'learning_rate': 2.4833354767690665e-07, 'epoch': 2.73}
91%|█████████ | 10481/11526 [1:49:44<10:43, 1.62it/s] 91%|█████████ | 10482/11526 [1:49:44<10:42, 1.63it/s] {'loss': 0.1359, 'grad_norm': 0.6156913638114929, 'learning_rate': 2.478624599978741e-07, 'epoch': 2.73}
91%|█████████ | 10482/11526 [1:49:44<10:42, 1.63it/s] 91%|█████████ | 10483/11526 [1:49:45<10:41, 1.63it/s] {'loss': 0.1259, 'grad_norm': 0.4867783188819885, 'learning_rate': 2.473918082119353e-07, 'epoch': 2.73}
91%|█████████ | 10483/11526 [1:49:45<10:41, 1.63it/s] 91%|█████████ | 10484/11526 [1:49:46<10:40, 1.63it/s] {'loss': 0.148, 'grad_norm': 0.5875318646430969, 'learning_rate': 2.4692159236226143e-07, 'epoch': 2.73}
91%|█████████ | 10484/11526 [1:49:46<10:40, 1.63it/s] 91%|█████████ | 10485/11526 [1:49:46<10:39, 1.63it/s] {'loss': 0.174, 'grad_norm': 0.6642328500747681, 'learning_rate': 2.4645181249198235e-07, 'epoch': 2.73}
91%|█████████ | 10485/11526 [1:49:46<10:39, 1.63it/s] 91%|█████████ | 10486/11526 [1:49:47<10:39, 1.63it/s] {'loss': 0.1225, 'grad_norm': 0.49179232120513916, 'learning_rate': 2.4598246864419083e-07, 'epoch': 2.73}
91%|█████████ | 10486/11526 [1:49:47<10:39, 1.63it/s] 91%|█████████ | 10487/11526 [1:49:47<10:38, 1.63it/s] {'loss': 0.1154, 'grad_norm': 0.4528590440750122, 'learning_rate': 2.4551356086193525e-07, 'epoch': 2.73}
91%|█████████ | 10487/11526 [1:49:48<10:38, 1.63it/s] 91%|█████████ | 10488/11526 [1:49:48<10:37, 1.63it/s] {'loss': 0.1544, 'grad_norm': 0.615398108959198, 'learning_rate': 2.450450891882289e-07, 'epoch': 2.73}
91%|█████████ | 10488/11526 [1:49:48<10:37, 1.63it/s] 91%|█████████ | 10489/11526 [1:49:49<10:36, 1.63it/s] {'loss': 0.1787, 'grad_norm': 0.7462763786315918, 'learning_rate': 2.445770536660408e-07, 'epoch': 2.73}
91%|█████████ | 10489/11526 [1:49:49<10:36, 1.63it/s] 91%|█████████ | 10490/11526 [1:49:49<10:36, 1.63it/s] {'loss': 0.1693, 'grad_norm': 0.6371182799339294, 'learning_rate': 2.441094543383027e-07, 'epoch': 2.73}
91%|█████████ | 10490/11526 [1:49:49<10:36, 1.63it/s] 91%|█████████ | 10491/11526 [1:49:50<10:36, 1.63it/s] {'loss': 0.1348, 'grad_norm': 0.5474529266357422, 'learning_rate': 2.436422912479053e-07, 'epoch': 2.73}
91%|█████████ | 10491/11526 [1:49:50<10:36, 1.63it/s] 91%|█████████ | 10492/11526 [1:49:50<10:35, 1.63it/s] {'loss': 0.1722, 'grad_norm': 0.6256284713745117, 'learning_rate': 2.431755644376993e-07, 'epoch': 2.73}
91%|█████████ | 10492/11526 [1:49:51<10:35, 1.63it/s] 91%|█████████ | 10493/11526 [1:49:51<10:35, 1.63it/s] {'loss': 0.207, 'grad_norm': 0.7666471004486084, 'learning_rate': 2.4270927395049605e-07, 'epoch': 2.73}
91%|█████████ | 10493/11526 [1:49:51<10:35, 1.63it/s] 91%|█████████ | 10494/11526 [1:49:52<10:36, 1.62it/s] {'loss': 0.1457, 'grad_norm': 0.6159775257110596, 'learning_rate': 2.4224341982906684e-07, 'epoch': 2.73}
91%|█████████ | 10494/11526 [1:49:52<10:36, 1.62it/s] 91%|█████████ | 10495/11526 [1:49:52<10:34, 1.62it/s] {'loss': 0.2108, 'grad_norm': 0.686845064163208, 'learning_rate': 2.417780021161398e-07, 'epoch': 2.73}
91%|█████████ | 10495/11526 [1:49:52<10:34, 1.62it/s] 91%|█████████ | 10496/11526 [1:49:53<10:34, 1.62it/s] {'loss': 0.1521, 'grad_norm': 0.5817832350730896, 'learning_rate': 2.413130208544096e-07, 'epoch': 2.73}
91%|█████████ | 10496/11526 [1:49:53<10:34, 1.62it/s] 91%|█████████ | 10497/11526 [1:49:54<10:32, 1.63it/s] {'loss': 0.1669, 'grad_norm': 0.6944968700408936, 'learning_rate': 2.4084847608652375e-07, 'epoch': 2.73}
91%|█████████ | 10497/11526 [1:49:54<10:32, 1.63it/s] 91%|█████████ | 10498/11526 [1:49:54<10:31, 1.63it/s] {'loss': 0.1441, 'grad_norm': 0.6724067330360413, 'learning_rate': 2.403843678550943e-07, 'epoch': 2.73}
91%|█████████ | 10498/11526 [1:49:54<10:31, 1.63it/s] 91%|█████████ | 10499/11526 [1:49:55<10:34, 1.62it/s] {'loss': 0.1382, 'grad_norm': 0.5718895792961121, 'learning_rate': 2.399206962026923e-07, 'epoch': 2.73}
91%|█████████ | 10499/11526 [1:49:55<10:34, 1.62it/s] 91%|█████████ | 10500/11526 [1:49:55<10:32, 1.62it/s] {'loss': 0.1594, 'grad_norm': 0.6311414837837219, 'learning_rate': 2.394574611718464e-07, 'epoch': 2.73}
91%|█████████ | 10500/11526 [1:49:56<10:32, 1.62it/s] 91%|█████████ | 10501/11526 [1:49:56<10:31, 1.62it/s] {'loss': 0.1358, 'grad_norm': 0.5729871392250061, 'learning_rate': 2.3899466280504936e-07, 'epoch': 2.73}
91%|█████████ | 10501/11526 [1:49:56<10:31, 1.62it/s] 91%|█████████ | 10502/11526 [1:49:57<10:30, 1.62it/s] {'loss': 0.1628, 'grad_norm': 0.6140561699867249, 'learning_rate': 2.3853230114475157e-07, 'epoch': 2.73}
91%|█████████ | 10502/11526 [1:49:57<10:30, 1.62it/s] 91%|█████████ | 10503/11526 [1:49:57<10:29, 1.63it/s] {'loss': 0.1319, 'grad_norm': 0.5579816699028015, 'learning_rate': 2.3807037623336195e-07, 'epoch': 2.73}
91%|█████████ | 10503/11526 [1:49:57<10:29, 1.63it/s] 91%|█████████ | 10504/11526 [1:49:58<10:28, 1.63it/s] {'loss': 0.1682, 'grad_norm': 0.6371904015541077, 'learning_rate': 2.376088881132521e-07, 'epoch': 2.73}
91%|█████████ | 10504/11526 [1:49:58<10:28, 1.63it/s] 91%|█████████ | 10505/11526 [1:49:59<10:27, 1.63it/s] {'loss': 0.1788, 'grad_norm': 0.7399778962135315, 'learning_rate': 2.3714783682675158e-07, 'epoch': 2.73}
91%|█████████ | 10505/11526 [1:49:59<10:27, 1.63it/s] 91%|█████████ | 10506/11526 [1:49:59<10:27, 1.63it/s] {'loss': 0.1558, 'grad_norm': 0.658094584941864, 'learning_rate': 2.3668722241615093e-07, 'epoch': 2.73}
91%|█████████ | 10506/11526 [1:49:59<10:27, 1.63it/s] 91%|█████████ | 10507/11526 [1:50:00<10:26, 1.63it/s] {'loss': 0.1606, 'grad_norm': 0.6451096534729004, 'learning_rate': 2.3622704492370085e-07, 'epoch': 2.73}
91%|█████████ | 10507/11526 [1:50:00<10:26, 1.63it/s] 91%|█████████ | 10508/11526 [1:50:00<10:25, 1.63it/s] {'loss': 0.187, 'grad_norm': 0.6443097591400146, 'learning_rate': 2.3576730439161032e-07, 'epoch': 2.74}
91%|█████████ | 10508/11526 [1:50:00<10:25, 1.63it/s] 91%|█████████ | 10509/11526 [1:50:01<10:25, 1.63it/s] {'loss': 0.1583, 'grad_norm': 0.6488222479820251, 'learning_rate': 2.3530800086204952e-07, 'epoch': 2.74}
91%|█████████ | 10509/11526 [1:50:01<10:25, 1.63it/s] 91%|█████████ | 10510/11526 [1:50:02<10:24, 1.63it/s] {'loss': 0.156, 'grad_norm': 0.5678635835647583, 'learning_rate': 2.3484913437715028e-07, 'epoch': 2.74}
91%|█████████ | 10510/11526 [1:50:02<10:24, 1.63it/s] 91%|█████████ | 10511/11526 [1:50:02<10:25, 1.62it/s] {'loss': 0.208, 'grad_norm': 0.7483847737312317, 'learning_rate': 2.3439070497900008e-07, 'epoch': 2.74}
91%|█████████ | 10511/11526 [1:50:02<10:25, 1.62it/s] 91%|█████████ | 10512/11526 [1:50:03<10:24, 1.62it/s] {'loss': 0.137, 'grad_norm': 0.6025428175926208, 'learning_rate': 2.3393271270965023e-07, 'epoch': 2.74}
91%|█████████ | 10512/11526 [1:50:03<10:24, 1.62it/s] 91%|█████████ | 10513/11526 [1:50:03<10:22, 1.63it/s] {'loss': 0.163, 'grad_norm': 0.6082755327224731, 'learning_rate': 2.3347515761110884e-07, 'epoch': 2.74}
91%|█████████ | 10513/11526 [1:50:04<10:22, 1.63it/s] 91%|█████████ | 10514/11526 [1:50:04<10:21, 1.63it/s] {'loss': 0.2163, 'grad_norm': 0.7977464199066162, 'learning_rate': 2.330180397253473e-07, 'epoch': 2.74}
91%|█████████ | 10514/11526 [1:50:04<10:21, 1.63it/s] 91%|█████████ | 10515/11526 [1:50:05<10:21, 1.63it/s] {'loss': 0.1327, 'grad_norm': 0.6020174622535706, 'learning_rate': 2.3256135909429435e-07, 'epoch': 2.74}
91%|█████████ | 10515/11526 [1:50:05<10:21, 1.63it/s] 91%|█████████ | 10516/11526 [1:50:05<10:21, 1.63it/s] {'loss': 0.1477, 'grad_norm': 0.5509282350540161, 'learning_rate': 2.3210511575983863e-07, 'epoch': 2.74}
91%|█████████ | 10516/11526 [1:50:05<10:21, 1.63it/s] 91%|█████████ | 10517/11526 [1:50:06<10:20, 1.63it/s] {'loss': 0.1374, 'grad_norm': 0.5654510259628296, 'learning_rate': 2.3164930976382948e-07, 'epoch': 2.74}
91%|█████████ | 10517/11526 [1:50:06<10:20, 1.63it/s] 91%|█████████▏| 10518/11526 [1:50:06<10:19, 1.63it/s] {'loss': 0.1737, 'grad_norm': 0.6696981191635132, 'learning_rate': 2.3119394114807681e-07, 'epoch': 2.74}
91%|█████████▏| 10518/11526 [1:50:07<10:19, 1.63it/s] 91%|█████████▏| 10519/11526 [1:50:07<10:19, 1.63it/s] {'loss': 0.1263, 'grad_norm': 0.5015872120857239, 'learning_rate': 2.307390099543494e-07, 'epoch': 2.74}
91%|█████████▏| 10519/11526 [1:50:07<10:19, 1.63it/s] 91%|█████████▏| 10520/11526 [1:50:08<10:18, 1.63it/s] {'loss': 0.1373, 'grad_norm': 0.5641701221466064, 'learning_rate': 2.3028451622437553e-07, 'epoch': 2.74}
91%|█████████▏| 10520/11526 [1:50:08<10:18, 1.63it/s] 91%|█████████▏| 10521/11526 [1:50:08<10:18, 1.63it/s] {'loss': 0.1436, 'grad_norm': 0.6412124633789062, 'learning_rate': 2.2983045999984578e-07, 'epoch': 2.74}
91%|█████████▏| 10521/11526 [1:50:08<10:18, 1.63it/s] 91%|█████████▏| 10522/11526 [1:50:09<10:17, 1.63it/s] {'loss': 0.1311, 'grad_norm': 0.5793457627296448, 'learning_rate': 2.2937684132240513e-07, 'epoch': 2.74}
91%|█████████▏| 10522/11526 [1:50:09<10:17, 1.63it/s] 91%|█████████▏| 10523/11526 [1:50:10<10:16, 1.63it/s] {'loss': 0.1464, 'grad_norm': 0.5420159697532654, 'learning_rate': 2.289236602336664e-07, 'epoch': 2.74}
91%|█████████▏| 10523/11526 [1:50:10<10:16, 1.63it/s] 91%|█████████▏| 10524/11526 [1:50:10<10:16, 1.63it/s] {'loss': 0.1154, 'grad_norm': 0.4907280504703522, 'learning_rate': 2.2847091677519473e-07, 'epoch': 2.74}
91%|█████████▏| 10524/11526 [1:50:10<10:16, 1.63it/s] 91%|█████████▏| 10525/11526 [1:50:11<10:16, 1.62it/s] {'loss': 0.129, 'grad_norm': 0.5012423992156982, 'learning_rate': 2.2801861098851962e-07, 'epoch': 2.74}
91%|█████████▏| 10525/11526 [1:50:11<10:16, 1.62it/s] 91%|█████████▏| 10526/11526 [1:50:11<10:15, 1.62it/s] {'loss': 0.1531, 'grad_norm': 0.579293429851532, 'learning_rate': 2.2756674291512958e-07, 'epoch': 2.74}
91%|█████████▏| 10526/11526 [1:50:12<10:15, 1.62it/s] 91%|█████████▏| 10527/11526 [1:50:12<10:14, 1.63it/s] {'loss': 0.1607, 'grad_norm': 0.642203152179718, 'learning_rate': 2.271153125964709e-07, 'epoch': 2.74}
91%|█████████▏| 10527/11526 [1:50:12<10:14, 1.63it/s] 91%|█████████▏| 10528/11526 [1:50:13<10:13, 1.63it/s] {'loss': 0.15, 'grad_norm': 0.6494803428649902, 'learning_rate': 2.2666432007395267e-07, 'epoch': 2.74}
91%|█████████▏| 10528/11526 [1:50:13<10:13, 1.63it/s] 91%|█████████▏| 10529/11526 [1:50:13<10:13, 1.63it/s] {'loss': 0.173, 'grad_norm': 0.5922171473503113, 'learning_rate': 2.262137653889429e-07, 'epoch': 2.74}
91%|█████████▏| 10529/11526 [1:50:13<10:13, 1.63it/s] 91%|█████████▏| 10530/11526 [1:50:14<10:12, 1.63it/s] {'loss': 0.1486, 'grad_norm': 0.578151524066925, 'learning_rate': 2.2576364858276746e-07, 'epoch': 2.74}
91%|█████████▏| 10530/11526 [1:50:14<10:12, 1.63it/s] 91%|█████████▏| 10531/11526 [1:50:14<10:12, 1.63it/s] {'loss': 0.1344, 'grad_norm': 0.5257462859153748, 'learning_rate': 2.2531396969671493e-07, 'epoch': 2.74}
91%|█████████▏| 10531/11526 [1:50:15<10:12, 1.63it/s] 91%|█████████▏| 10532/11526 [1:50:15<10:11, 1.63it/s] {'loss': 0.1486, 'grad_norm': 0.598823606967926, 'learning_rate': 2.2486472877203124e-07, 'epoch': 2.74}
91%|█████████▏| 10532/11526 [1:50:15<10:11, 1.63it/s] 91%|█████████▏| 10533/11526 [1:50:16<10:10, 1.63it/s] {'loss': 0.1119, 'grad_norm': 0.5722116827964783, 'learning_rate': 2.2441592584992454e-07, 'epoch': 2.74}
91%|█████████▏| 10533/11526 [1:50:16<10:10, 1.63it/s] 91%|█████████▏| 10534/11526 [1:50:16<10:10, 1.62it/s] {'loss': 0.1676, 'grad_norm': 0.609976589679718, 'learning_rate': 2.2396756097156135e-07, 'epoch': 2.74}
91%|█████████▏| 10534/11526 [1:50:16<10:10, 1.62it/s] 91%|█████████▏| 10535/11526 [1:50:17<10:10, 1.62it/s] {'loss': 0.1547, 'grad_norm': 0.5452983975410461, 'learning_rate': 2.2351963417806766e-07, 'epoch': 2.74}
91%|█████████▏| 10535/11526 [1:50:17<10:10, 1.62it/s] 91%|█████████▏| 10536/11526 [1:50:18<10:09, 1.62it/s] {'loss': 0.1435, 'grad_norm': 0.5888309478759766, 'learning_rate': 2.2307214551052947e-07, 'epoch': 2.74}
91%|█████████▏| 10536/11526 [1:50:18<10:09, 1.62it/s] 91%|█████████▏| 10537/11526 [1:50:18<10:08, 1.62it/s] {'loss': 0.1985, 'grad_norm': 0.6760725378990173, 'learning_rate': 2.2262509500999507e-07, 'epoch': 2.74}
91%|█████████▏| 10537/11526 [1:50:18<10:08, 1.62it/s] 91%|█████████▏| 10538/11526 [1:50:19<10:08, 1.62it/s] {'loss': 0.2207, 'grad_norm': 0.6458944082260132, 'learning_rate': 2.2217848271746835e-07, 'epoch': 2.74}
91%|█████████▏| 10538/11526 [1:50:19<10:08, 1.62it/s] 91%|█████████▏| 10539/11526 [1:50:19<10:10, 1.62it/s] {'loss': 0.1519, 'grad_norm': 0.6054496765136719, 'learning_rate': 2.2173230867391593e-07, 'epoch': 2.74}
91%|█████████▏| 10539/11526 [1:50:20<10:10, 1.62it/s] 91%|█████████▏| 10540/11526 [1:50:20<10:08, 1.62it/s] {'loss': 0.1484, 'grad_norm': 0.6091070771217346, 'learning_rate': 2.212865729202629e-07, 'epoch': 2.74}
91%|█████████▏| 10540/11526 [1:50:20<10:08, 1.62it/s] 91%|█████████▏| 10541/11526 [1:50:21<10:10, 1.61it/s] {'loss': 0.1689, 'grad_norm': 0.6600084900856018, 'learning_rate': 2.20841275497396e-07, 'epoch': 2.74}
91%|█████████▏| 10541/11526 [1:50:21<10:10, 1.61it/s] 91%|█████████▏| 10542/11526 [1:50:21<10:09, 1.62it/s] {'loss': 0.1621, 'grad_norm': 0.6152448058128357, 'learning_rate': 2.203964164461597e-07, 'epoch': 2.74}
91%|█████████▏| 10542/11526 [1:50:21<10:09, 1.62it/s] 91%|█████████▏| 10543/11526 [1:50:22<10:07, 1.62it/s] {'loss': 0.1345, 'grad_norm': 0.6330574154853821, 'learning_rate': 2.1995199580735815e-07, 'epoch': 2.74}
91%|█████████▏| 10543/11526 [1:50:22<10:07, 1.62it/s] 91%|█████████▏| 10544/11526 [1:50:23<10:06, 1.62it/s] {'loss': 0.1759, 'grad_norm': 0.6842169761657715, 'learning_rate': 2.1950801362175644e-07, 'epoch': 2.74}
91%|█████████▏| 10544/11526 [1:50:23<10:06, 1.62it/s] 91%|█████████▏| 10545/11526 [1:50:23<10:04, 1.62it/s] {'loss': 0.1526, 'grad_norm': 0.5652374029159546, 'learning_rate': 2.190644699300809e-07, 'epoch': 2.74}
91%|█████████▏| 10545/11526 [1:50:23<10:04, 1.62it/s] 91%|█████████▏| 10546/11526 [1:50:24<10:06, 1.62it/s] {'loss': 0.1476, 'grad_norm': 0.5743699073791504, 'learning_rate': 2.18621364773014e-07, 'epoch': 2.74}
91%|█████████▏| 10546/11526 [1:50:24<10:06, 1.62it/s] 92%|█████████▏| 10547/11526 [1:50:24<10:04, 1.62it/s] {'loss': 0.1508, 'grad_norm': 0.6853923201560974, 'learning_rate': 2.181786981912004e-07, 'epoch': 2.75}
92%|█████████▏| 10547/11526 [1:50:25<10:04, 1.62it/s] 92%|█████████▏| 10548/11526 [1:50:25<10:03, 1.62it/s] {'loss': 0.1319, 'grad_norm': 0.5077623128890991, 'learning_rate': 2.177364702252449e-07, 'epoch': 2.75}
92%|█████████▏| 10548/11526 [1:50:25<10:03, 1.62it/s] 92%|█████████▏| 10549/11526 [1:50:26<10:02, 1.62it/s] {'loss': 0.1723, 'grad_norm': 0.6218135952949524, 'learning_rate': 2.1729468091570894e-07, 'epoch': 2.75}
92%|█████████▏| 10549/11526 [1:50:26<10:02, 1.62it/s] 92%|█████████▏| 10550/11526 [1:50:26<10:01, 1.62it/s] {'loss': 0.1811, 'grad_norm': 0.6689593195915222, 'learning_rate': 2.1685333030311839e-07, 'epoch': 2.75}
92%|█████████▏| 10550/11526 [1:50:26<10:01, 1.62it/s] 92%|█████████▏| 10551/11526 [1:50:27<10:00, 1.62it/s] {'loss': 0.1527, 'grad_norm': 0.5972919464111328, 'learning_rate': 2.164124184279548e-07, 'epoch': 2.75}
92%|█████████▏| 10551/11526 [1:50:27<10:00, 1.62it/s] 92%|█████████▏| 10552/11526 [1:50:27<09:59, 1.63it/s] {'loss': 0.1524, 'grad_norm': 0.6242448687553406, 'learning_rate': 2.1597194533066134e-07, 'epoch': 2.75}
92%|█████████▏| 10552/11526 [1:50:28<09:59, 1.63it/s] 92%|█████████▏| 10553/11526 [1:50:28<09:58, 1.63it/s] {'loss': 0.1801, 'grad_norm': 0.7216396331787109, 'learning_rate': 2.155319110516413e-07, 'epoch': 2.75}
92%|█████████▏| 10553/11526 [1:50:28<09:58, 1.63it/s] 92%|█████████▏| 10554/11526 [1:50:29<09:58, 1.63it/s] {'loss': 0.1341, 'grad_norm': 0.5895179510116577, 'learning_rate': 2.1509231563125732e-07, 'epoch': 2.75}
92%|█████████▏| 10554/11526 [1:50:29<09:58, 1.63it/s] 92%|█████████▏| 10555/11526 [1:50:29<09:56, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.6173248887062073, 'learning_rate': 2.1465315910983053e-07, 'epoch': 2.75}
92%|█████████▏| 10555/11526 [1:50:29<09:56, 1.63it/s] 92%|█████████▏| 10556/11526 [1:50:30<09:56, 1.63it/s] {'loss': 0.134, 'grad_norm': 0.5282201766967773, 'learning_rate': 2.1421444152764425e-07, 'epoch': 2.75}
92%|█████████▏| 10556/11526 [1:50:30<09:56, 1.63it/s] 92%|█████████▏| 10557/11526 [1:50:31<09:55, 1.63it/s] {'loss': 0.1514, 'grad_norm': 0.6117585897445679, 'learning_rate': 2.1377616292493907e-07, 'epoch': 2.75}
92%|█████████▏| 10557/11526 [1:50:31<09:55, 1.63it/s] 92%|█████████▏| 10558/11526 [1:50:31<09:54, 1.63it/s] {'loss': 0.172, 'grad_norm': 0.651683509349823, 'learning_rate': 2.1333832334191674e-07, 'epoch': 2.75}
92%|█████████▏| 10558/11526 [1:50:31<09:54, 1.63it/s] 92%|█████████▏| 10559/11526 [1:50:32<09:55, 1.63it/s] {'loss': 0.1654, 'grad_norm': 0.6439728736877441, 'learning_rate': 2.1290092281873785e-07, 'epoch': 2.75}
92%|█████████▏| 10559/11526 [1:50:32<09:55, 1.63it/s] 92%|█████████▏| 10560/11526 [1:50:32<09:54, 1.63it/s] {'loss': 0.1502, 'grad_norm': 0.6117112040519714, 'learning_rate': 2.1246396139552426e-07, 'epoch': 2.75}
92%|█████████▏| 10560/11526 [1:50:32<09:54, 1.63it/s] 92%|█████████▏| 10561/11526 [1:50:33<09:53, 1.62it/s] {'loss': 0.1418, 'grad_norm': 0.6651291847229004, 'learning_rate': 2.1202743911235612e-07, 'epoch': 2.75}
92%|█████████▏| 10561/11526 [1:50:33<09:53, 1.62it/s] 92%|█████████▏| 10562/11526 [1:50:34<09:52, 1.63it/s] {'loss': 0.279, 'grad_norm': 0.5706053972244263, 'learning_rate': 2.1159135600927305e-07, 'epoch': 2.75}
92%|█████████▏| 10562/11526 [1:50:34<09:52, 1.63it/s] 92%|█████████▏| 10563/11526 [1:50:34<09:52, 1.63it/s] {'loss': 0.1605, 'grad_norm': 0.7099830508232117, 'learning_rate': 2.111557121262764e-07, 'epoch': 2.75}
92%|█████████▏| 10563/11526 [1:50:34<09:52, 1.63it/s] 92%|█████████▏| 10564/11526 [1:50:35<09:52, 1.62it/s] {'loss': 0.159, 'grad_norm': 0.5653893947601318, 'learning_rate': 2.1072050750332583e-07, 'epoch': 2.75}
92%|█████████▏| 10564/11526 [1:50:35<09:52, 1.62it/s] 92%|█████████▏| 10565/11526 [1:50:35<09:50, 1.63it/s] {'loss': 0.1709, 'grad_norm': 0.64861661195755, 'learning_rate': 2.1028574218033948e-07, 'epoch': 2.75}
92%|█████████▏| 10565/11526 [1:50:36<09:50, 1.63it/s] 92%|█████████▏| 10566/11526 [1:50:36<09:50, 1.63it/s] {'loss': 0.1542, 'grad_norm': 0.5509538054466248, 'learning_rate': 2.0985141619719707e-07, 'epoch': 2.75}
92%|█████████▏| 10566/11526 [1:50:36<09:50, 1.63it/s] 92%|█████████▏| 10567/11526 [1:50:37<09:49, 1.63it/s] {'loss': 0.1497, 'grad_norm': 0.6483556628227234, 'learning_rate': 2.094175295937373e-07, 'epoch': 2.75}
92%|█████████▏| 10567/11526 [1:50:37<09:49, 1.63it/s] 92%|█████████▏| 10568/11526 [1:50:37<09:49, 1.63it/s] {'loss': 0.1482, 'grad_norm': 0.556835412979126, 'learning_rate': 2.0898408240975943e-07, 'epoch': 2.75}
92%|█████████▏| 10568/11526 [1:50:37<09:49, 1.63it/s] 92%|█████████▏| 10569/11526 [1:50:38<09:49, 1.62it/s] {'loss': 0.169, 'grad_norm': 0.6566305160522461, 'learning_rate': 2.0855107468502112e-07, 'epoch': 2.75}
92%|█████████▏| 10569/11526 [1:50:38<09:49, 1.62it/s] 92%|█████████▏| 10570/11526 [1:50:39<09:48, 1.63it/s] {'loss': 0.17, 'grad_norm': 0.6780317425727844, 'learning_rate': 2.0811850645924114e-07, 'epoch': 2.75}
92%|█████████▏| 10570/11526 [1:50:39<09:48, 1.63it/s] 92%|█████████▏| 10571/11526 [1:50:39<09:47, 1.63it/s] {'loss': 0.2103, 'grad_norm': 0.6708818674087524, 'learning_rate': 2.07686377772095e-07, 'epoch': 2.75}
92%|█████████▏| 10571/11526 [1:50:39<09:47, 1.63it/s] 92%|█████████▏| 10572/11526 [1:50:40<09:46, 1.63it/s] {'loss': 0.1977, 'grad_norm': 0.5611969828605652, 'learning_rate': 2.072546886632232e-07, 'epoch': 2.75}
92%|█████████▏| 10572/11526 [1:50:40<09:46, 1.63it/s] 92%|█████████▏| 10573/11526 [1:50:40<09:45, 1.63it/s] {'loss': 0.1397, 'grad_norm': 0.5640397071838379, 'learning_rate': 2.0682343917222014e-07, 'epoch': 2.75}
92%|█████████▏| 10573/11526 [1:50:40<09:45, 1.63it/s] 92%|█████████▏| 10574/11526 [1:50:41<09:46, 1.62it/s] {'loss': 0.1394, 'grad_norm': 0.5708218812942505, 'learning_rate': 2.0639262933864313e-07, 'epoch': 2.75}
92%|█████████▏| 10574/11526 [1:50:41<09:46, 1.62it/s] 92%|█████████▏| 10575/11526 [1:50:42<09:45, 1.62it/s] {'loss': 0.1521, 'grad_norm': 0.6917902231216431, 'learning_rate': 2.0596225920200885e-07, 'epoch': 2.75}
92%|█████████▏| 10575/11526 [1:50:42<09:45, 1.62it/s] 92%|█████████▏| 10576/11526 [1:50:42<09:44, 1.63it/s] {'loss': 0.1778, 'grad_norm': 0.6269663572311401, 'learning_rate': 2.055323288017924e-07, 'epoch': 2.75}
92%|█████████▏| 10576/11526 [1:50:42<09:44, 1.63it/s] 92%|█████████▏| 10577/11526 [1:50:43<09:43, 1.63it/s] {'loss': 0.1478, 'grad_norm': 0.5902937650680542, 'learning_rate': 2.051028381774306e-07, 'epoch': 2.75}
92%|█████████▏| 10577/11526 [1:50:43<09:43, 1.63it/s] 92%|█████████▏| 10578/11526 [1:50:43<09:43, 1.63it/s] {'loss': 0.119, 'grad_norm': 0.5442484021186829, 'learning_rate': 2.0467378736831911e-07, 'epoch': 2.75}
92%|█████████▏| 10578/11526 [1:50:44<09:43, 1.63it/s] 92%|█████████▏| 10579/11526 [1:50:44<09:42, 1.63it/s] {'loss': 0.1375, 'grad_norm': 0.5817205905914307, 'learning_rate': 2.0424517641381148e-07, 'epoch': 2.75}
92%|█████████▏| 10579/11526 [1:50:44<09:42, 1.63it/s] 92%|█████████▏| 10580/11526 [1:50:45<09:41, 1.63it/s] {'loss': 0.1566, 'grad_norm': 0.6501981019973755, 'learning_rate': 2.038170053532229e-07, 'epoch': 2.75}
92%|█████████▏| 10580/11526 [1:50:45<09:41, 1.63it/s] 92%|█████████▏| 10581/11526 [1:50:45<09:41, 1.62it/s] {'loss': 0.143, 'grad_norm': 0.5665038824081421, 'learning_rate': 2.0338927422582754e-07, 'epoch': 2.75}
92%|█████████▏| 10581/11526 [1:50:45<09:41, 1.62it/s] 92%|█████████▏| 10582/11526 [1:50:46<09:40, 1.63it/s] {'loss': 0.1879, 'grad_norm': 0.7378363609313965, 'learning_rate': 2.0296198307086014e-07, 'epoch': 2.75}
92%|█████████▏| 10582/11526 [1:50:46<09:40, 1.63it/s] 92%|█████████▏| 10583/11526 [1:50:47<09:39, 1.63it/s] {'loss': 0.1738, 'grad_norm': 0.686044454574585, 'learning_rate': 2.0253513192751374e-07, 'epoch': 2.75}
92%|█████████▏| 10583/11526 [1:50:47<09:39, 1.63it/s] 92%|█████████▏| 10584/11526 [1:50:47<09:39, 1.63it/s] {'loss': 0.1383, 'grad_norm': 0.6860018968582153, 'learning_rate': 2.0210872083494093e-07, 'epoch': 2.75}
92%|█████████▏| 10584/11526 [1:50:47<09:39, 1.63it/s] 92%|█████████▏| 10585/11526 [1:50:48<09:38, 1.63it/s] {'loss': 0.1732, 'grad_norm': 0.6708439588546753, 'learning_rate': 2.0168274983225432e-07, 'epoch': 2.76}
92%|█████████▏| 10585/11526 [1:50:48<09:38, 1.63it/s] 92%|█████████▏| 10586/11526 [1:50:48<09:38, 1.63it/s] {'loss': 0.1296, 'grad_norm': 0.5733746886253357, 'learning_rate': 2.0125721895852878e-07, 'epoch': 2.76}
92%|█████████▏| 10586/11526 [1:50:48<09:38, 1.63it/s] 92%|█████████▏| 10587/11526 [1:50:49<09:37, 1.63it/s] {'loss': 0.182, 'grad_norm': 0.6731439232826233, 'learning_rate': 2.008321282527942e-07, 'epoch': 2.76}
92%|█████████▏| 10587/11526 [1:50:49<09:37, 1.63it/s] 92%|█████████▏| 10588/11526 [1:50:50<09:36, 1.63it/s] {'loss': 0.1595, 'grad_norm': 0.6478714346885681, 'learning_rate': 2.0040747775404322e-07, 'epoch': 2.76}
92%|█████████▏| 10588/11526 [1:50:50<09:36, 1.63it/s] 92%|█████████▏| 10589/11526 [1:50:50<09:35, 1.63it/s] {'loss': 0.117, 'grad_norm': 0.5060409307479858, 'learning_rate': 1.9998326750122644e-07, 'epoch': 2.76}
92%|█████████▏| 10589/11526 [1:50:50<09:35, 1.63it/s] 92%|█████████▏| 10590/11526 [1:50:51<09:35, 1.63it/s] {'loss': 0.1361, 'grad_norm': 0.5546028017997742, 'learning_rate': 1.9955949753325544e-07, 'epoch': 2.76}
92%|█████████▏| 10590/11526 [1:50:51<09:35, 1.63it/s] 92%|█████████▏| 10591/11526 [1:50:51<09:34, 1.63it/s] {'loss': 0.1251, 'grad_norm': 0.5289822816848755, 'learning_rate': 1.9913616788900135e-07, 'epoch': 2.76}
92%|█████████▏| 10591/11526 [1:50:52<09:34, 1.63it/s] 92%|█████████▏| 10592/11526 [1:50:52<09:33, 1.63it/s] {'loss': 0.141, 'grad_norm': 0.5659897923469543, 'learning_rate': 1.9871327860729318e-07, 'epoch': 2.76}
92%|█████████▏| 10592/11526 [1:50:52<09:33, 1.63it/s] 92%|█████████▏| 10593/11526 [1:50:53<09:33, 1.63it/s] {'loss': 0.1278, 'grad_norm': 0.5588262677192688, 'learning_rate': 1.9829082972692037e-07, 'epoch': 2.76}
92%|█████████▏| 10593/11526 [1:50:53<09:33, 1.63it/s] 92%|█████████▏| 10594/11526 [1:50:53<09:33, 1.63it/s] {'loss': 0.1293, 'grad_norm': 0.5107084512710571, 'learning_rate': 1.9786882128663475e-07, 'epoch': 2.76}
92%|█████████▏| 10594/11526 [1:50:53<09:33, 1.63it/s] 92%|█████████▏| 10595/11526 [1:50:54<09:32, 1.63it/s] {'loss': 0.1171, 'grad_norm': 0.4870617091655731, 'learning_rate': 1.974472533251437e-07, 'epoch': 2.76}
92%|█████████▏| 10595/11526 [1:50:54<09:32, 1.63it/s] 92%|█████████▏| 10596/11526 [1:50:55<09:31, 1.63it/s] {'loss': 0.1333, 'grad_norm': 0.5255463719367981, 'learning_rate': 1.9702612588111513e-07, 'epoch': 2.76}
92%|█████████▏| 10596/11526 [1:50:55<09:31, 1.63it/s] 92%|█████████▏| 10597/11526 [1:50:55<09:30, 1.63it/s] {'loss': 0.1544, 'grad_norm': 0.647911548614502, 'learning_rate': 1.966054389931793e-07, 'epoch': 2.76}
92%|█████████▏| 10597/11526 [1:50:55<09:30, 1.63it/s] 92%|█████████▏| 10598/11526 [1:50:56<09:29, 1.63it/s] {'loss': 0.1233, 'grad_norm': 0.558558464050293, 'learning_rate': 1.9618519269992142e-07, 'epoch': 2.76}
92%|█████████▏| 10598/11526 [1:50:56<09:29, 1.63it/s] 92%|█████████▏| 10599/11526 [1:50:56<09:29, 1.63it/s] {'loss': 0.15, 'grad_norm': 0.544099748134613, 'learning_rate': 1.9576538703989124e-07, 'epoch': 2.76}
92%|█████████▏| 10599/11526 [1:50:56<09:29, 1.63it/s] 92%|█████████▏| 10600/11526 [1:50:57<09:29, 1.63it/s] {'loss': 0.1451, 'grad_norm': 0.5477178692817688, 'learning_rate': 1.9534602205159403e-07, 'epoch': 2.76}
92%|█████████▏| 10600/11526 [1:50:57<09:29, 1.63it/s]
100%|██████████| 13/13 [00:01<00:00, 6.76it/s]
{'eval_loss': 0.5425940155982971, 'eval_runtime': 1.954, 'eval_samples_per_second': 102.356, 'eval_steps_per_second': 6.653, 'epoch': 2.76}
 92%|█████████▏| 10601/11526 [1:51:00<18:32, 1.20s/it] {'loss': 0.2059, 'grad_norm': 0.6924582719802856, 'learning_rate': 1.9492709777349738e-07, 'epoch': 2.76}
92%|█████████▏| 10601/11526 [1:51:00<18:32, 1.20s/it] 92%|█████████▏| 10602/11526 [1:51:00<15:47, 1.03s/it] {'loss': 0.1687, 'grad_norm': 0.5950027704238892, 'learning_rate': 1.9450861424402722e-07, 'epoch': 2.76}
92%|█████████▏| 10602/11526 [1:51:00<15:47, 1.03s/it] 92%|█████████▏| 10603/11526 [1:51:01<13:52, 1.11it/s] {'loss': 0.1911, 'grad_norm': 0.7332208156585693, 'learning_rate': 1.940905715015684e-07, 'epoch': 2.76}
92%|█████████▏| 10603/11526 [1:51:01<13:52, 1.11it/s] 92%|█████████▏| 10604/11526 [1:51:01<12:32, 1.23it/s] {'loss': 0.1353, 'grad_norm': 0.5554003119468689, 'learning_rate': 1.9367296958446746e-07, 'epoch': 2.76}
92%|█████████▏| 10604/11526 [1:51:02<12:32, 1.23it/s] 92%|█████████▏| 10605/11526 [1:51:02<11:36, 1.32it/s] {'loss': 0.1537, 'grad_norm': 0.6192076206207275, 'learning_rate': 1.9325580853102876e-07, 'epoch': 2.76}
92%|█████████▏| 10605/11526 [1:51:02<11:36, 1.32it/s] 92%|█████████▏| 10606/11526 [1:51:03<10:56, 1.40it/s] {'loss': 0.1595, 'grad_norm': 0.5925410389900208, 'learning_rate': 1.928390883795167e-07, 'epoch': 2.76}
92%|█████████▏| 10606/11526 [1:51:03<10:56, 1.40it/s] 92%|█████████▏| 10607/11526 [1:51:03<10:28, 1.46it/s] {'loss': 0.1334, 'grad_norm': 0.5538254976272583, 'learning_rate': 1.924228091681546e-07, 'epoch': 2.76}
92%|█████████▏| 10607/11526 [1:51:03<10:28, 1.46it/s] 92%|█████████▏| 10608/11526 [1:51:04<10:08, 1.51it/s] {'loss': 0.1158, 'grad_norm': 0.5612982511520386, 'learning_rate': 1.9200697093512634e-07, 'epoch': 2.76}
92%|█████████▏| 10608/11526 [1:51:04<10:08, 1.51it/s] 92%|█████████▏| 10609/11526 [1:51:04<09:54, 1.54it/s] {'loss': 0.1441, 'grad_norm': 0.7986922860145569, 'learning_rate': 1.9159157371857474e-07, 'epoch': 2.76}
92%|█████████▏| 10609/11526 [1:51:05<09:54, 1.54it/s] 92%|█████████▏| 10610/11526 [1:51:05<09:44, 1.57it/s] {'loss': 0.1536, 'grad_norm': 0.5424771904945374, 'learning_rate': 1.9117661755660378e-07, 'epoch': 2.76}
92%|█████████▏| 10610/11526 [1:51:05<09:44, 1.57it/s] 92%|█████████▏| 10611/11526 [1:51:06<09:37, 1.58it/s] {'loss': 0.1373, 'grad_norm': 0.5825876593589783, 'learning_rate': 1.9076210248727357e-07, 'epoch': 2.76}
92%|█████████▏| 10611/11526 [1:51:06<09:37, 1.58it/s] 92%|█████████▏| 10612/11526 [1:51:06<09:32, 1.60it/s] {'loss': 0.1128, 'grad_norm': 0.450454980134964, 'learning_rate': 1.9034802854860701e-07, 'epoch': 2.76}
92%|█████████▏| 10612/11526 [1:51:06<09:32, 1.60it/s] 92%|█████████▏| 10613/11526 [1:51:07<09:28, 1.61it/s] {'loss': 0.1755, 'grad_norm': 0.6726895570755005, 'learning_rate': 1.8993439577858542e-07, 'epoch': 2.76}
92%|█████████▏| 10613/11526 [1:51:07<09:28, 1.61it/s] 92%|█████████▏| 10614/11526 [1:51:08<09:26, 1.61it/s] {'loss': 0.1469, 'grad_norm': 0.6359692215919495, 'learning_rate': 1.8952120421514898e-07, 'epoch': 2.76}
92%|█████████▏| 10614/11526 [1:51:08<09:26, 1.61it/s] 92%|█████████▏| 10615/11526 [1:51:08<09:23, 1.62it/s] {'loss': 0.1531, 'grad_norm': 0.5872797966003418, 'learning_rate': 1.8910845389619792e-07, 'epoch': 2.76}
92%|█████████▏| 10615/11526 [1:51:08<09:23, 1.62it/s] 92%|█████████▏| 10616/11526 [1:51:09<09:22, 1.62it/s] {'loss': 0.1827, 'grad_norm': 0.733494222164154, 'learning_rate': 1.8869614485959254e-07, 'epoch': 2.76}
92%|█████████▏| 10616/11526 [1:51:09<09:22, 1.62it/s] 92%|█████████▏| 10617/11526 [1:51:09<09:20, 1.62it/s] {'loss': 0.1646, 'grad_norm': 0.6324483752250671, 'learning_rate': 1.88284277143152e-07, 'epoch': 2.76}
92%|█████████▏| 10617/11526 [1:51:10<09:20, 1.62it/s] 92%|█████████▏| 10618/11526 [1:51:10<09:20, 1.62it/s] {'loss': 0.127, 'grad_norm': 0.4971674680709839, 'learning_rate': 1.8787285078465555e-07, 'epoch': 2.76}
92%|█████████▏| 10618/11526 [1:51:10<09:20, 1.62it/s] 92%|█████████▏| 10619/11526 [1:51:11<09:19, 1.62it/s] {'loss': 0.1303, 'grad_norm': 0.5130529403686523, 'learning_rate': 1.8746186582184023e-07, 'epoch': 2.76}
92%|█████████▏| 10619/11526 [1:51:11<09:19, 1.62it/s] 92%|█████████▏| 10620/11526 [1:51:11<09:18, 1.62it/s] {'loss': 0.1428, 'grad_norm': 0.6151509284973145, 'learning_rate': 1.8705132229240475e-07, 'epoch': 2.76}
92%|█████████▏| 10620/11526 [1:51:11<09:18, 1.62it/s] 92%|█████████▏| 10621/11526 [1:51:12<09:17, 1.62it/s] {'loss': 0.1125, 'grad_norm': 0.47949230670928955, 'learning_rate': 1.8664122023400732e-07, 'epoch': 2.76}
92%|█████████▏| 10621/11526 [1:51:12<09:17, 1.62it/s] 92%|█████████▏| 10622/11526 [1:51:12<09:16, 1.62it/s] {'loss': 0.1617, 'grad_norm': 0.6296475529670715, 'learning_rate': 1.8623155968426343e-07, 'epoch': 2.76}
92%|█████████▏| 10622/11526 [1:51:13<09:16, 1.62it/s] 92%|█████████▏| 10623/11526 [1:51:13<09:16, 1.62it/s] {'loss': 0.1423, 'grad_norm': 0.5995914340019226, 'learning_rate': 1.858223406807502e-07, 'epoch': 2.76}
92%|█████████▏| 10623/11526 [1:51:13<09:16, 1.62it/s] 92%|█████████▏| 10624/11526 [1:51:14<09:16, 1.62it/s] {'loss': 0.1571, 'grad_norm': 0.5958232283592224, 'learning_rate': 1.8541356326100436e-07, 'epoch': 2.77}
92%|█████████▏| 10624/11526 [1:51:14<09:16, 1.62it/s] 92%|█████████▏| 10625/11526 [1:51:14<09:14, 1.62it/s] {'loss': 0.1649, 'grad_norm': 0.810056746006012, 'learning_rate': 1.850052274625186e-07, 'epoch': 2.77}
92%|█████████▏| 10625/11526 [1:51:14<09:14, 1.62it/s] 92%|█████████▏| 10626/11526 [1:51:15<09:14, 1.62it/s] {'loss': 0.1299, 'grad_norm': 0.5287416577339172, 'learning_rate': 1.8459733332275022e-07, 'epoch': 2.77}
92%|█████████▏| 10626/11526 [1:51:15<09:14, 1.62it/s] 92%|█████████▏| 10627/11526 [1:51:16<09:13, 1.63it/s] {'loss': 0.1337, 'grad_norm': 0.4935009479522705, 'learning_rate': 1.841898808791137e-07, 'epoch': 2.77}
92%|█████████▏| 10627/11526 [1:51:16<09:13, 1.63it/s] 92%|█████████▏| 10628/11526 [1:51:16<09:12, 1.63it/s] {'loss': 0.2021, 'grad_norm': 0.7131914496421814, 'learning_rate': 1.83782870168982e-07, 'epoch': 2.77}
92%|█████████▏| 10628/11526 [1:51:16<09:12, 1.63it/s] 92%|█████████▏| 10629/11526 [1:51:17<09:12, 1.62it/s] {'loss': 0.1526, 'grad_norm': 0.6946574449539185, 'learning_rate': 1.8337630122968796e-07, 'epoch': 2.77}
92%|█████████▏| 10629/11526 [1:51:17<09:12, 1.62it/s] 92%|█████████▏| 10630/11526 [1:51:17<09:11, 1.63it/s] {'loss': 0.1204, 'grad_norm': 0.48460665345191956, 'learning_rate': 1.8297017409852512e-07, 'epoch': 2.77}
92%|█████████▏| 10630/11526 [1:51:18<09:11, 1.63it/s] 92%|█████████▏| 10631/11526 [1:51:18<09:11, 1.62it/s] {'loss': 0.1437, 'grad_norm': 0.5447143912315369, 'learning_rate': 1.8256448881274592e-07, 'epoch': 2.77}
92%|█████████▏| 10631/11526 [1:51:18<09:11, 1.62it/s] 92%|█████████▏| 10632/11526 [1:51:19<09:10, 1.62it/s] {'loss': 0.139, 'grad_norm': 0.556313157081604, 'learning_rate': 1.8215924540956221e-07, 'epoch': 2.77}
92%|█████████▏| 10632/11526 [1:51:19<09:10, 1.62it/s] 92%|█████████▏| 10633/11526 [1:51:19<09:09, 1.63it/s] {'loss': 0.1892, 'grad_norm': 0.6763558983802795, 'learning_rate': 1.8175444392614484e-07, 'epoch': 2.77}
92%|█████████▏| 10633/11526 [1:51:19<09:09, 1.63it/s] 92%|█████████▏| 10634/11526 [1:51:20<09:09, 1.62it/s] {'loss': 0.1392, 'grad_norm': 0.5421690344810486, 'learning_rate': 1.8135008439962353e-07, 'epoch': 2.77}
92%|█████████▏| 10634/11526 [1:51:20<09:09, 1.62it/s] 92%|█████████▏| 10635/11526 [1:51:20<09:08, 1.63it/s] {'loss': 0.1684, 'grad_norm': 0.5904104113578796, 'learning_rate': 1.8094616686709032e-07, 'epoch': 2.77}
92%|█████████▏| 10635/11526 [1:51:21<09:08, 1.63it/s] 92%|█████████▏| 10636/11526 [1:51:21<09:07, 1.63it/s] {'loss': 0.1225, 'grad_norm': 0.5761419534683228, 'learning_rate': 1.8054269136559387e-07, 'epoch': 2.77}
92%|█████████▏| 10636/11526 [1:51:21<09:07, 1.63it/s] 92%|█████████▏| 10637/11526 [1:51:22<09:06, 1.63it/s] {'loss': 0.2147, 'grad_norm': 0.7317141890525818, 'learning_rate': 1.801396579321435e-07, 'epoch': 2.77}
92%|█████████▏| 10637/11526 [1:51:22<09:06, 1.63it/s] 92%|█████████▏| 10638/11526 [1:51:22<09:05, 1.63it/s] {'loss': 0.1614, 'grad_norm': 0.7756372690200806, 'learning_rate': 1.7973706660370737e-07, 'epoch': 2.77}
92%|█████████▏| 10638/11526 [1:51:22<09:05, 1.63it/s] 92%|█████████▏| 10639/11526 [1:51:23<09:07, 1.62it/s] {'loss': 0.1762, 'grad_norm': 0.5553480982780457, 'learning_rate': 1.793349174172132e-07, 'epoch': 2.77}
92%|█████████▏| 10639/11526 [1:51:23<09:07, 1.62it/s] 92%|█████████▏| 10640/11526 [1:51:24<09:06, 1.62it/s] {'loss': 0.1803, 'grad_norm': 0.7020739912986755, 'learning_rate': 1.789332104095498e-07, 'epoch': 2.77}
92%|█████████▏| 10640/11526 [1:51:24<09:06, 1.62it/s] 92%|█████████▏| 10641/11526 [1:51:24<09:05, 1.62it/s] {'loss': 0.1565, 'grad_norm': 0.5563545227050781, 'learning_rate': 1.785319456175627e-07, 'epoch': 2.77}
92%|█████████▏| 10641/11526 [1:51:24<09:05, 1.62it/s] 92%|█████████▏| 10642/11526 [1:51:25<09:04, 1.62it/s] {'loss': 0.178, 'grad_norm': 0.7136436700820923, 'learning_rate': 1.7813112307805802e-07, 'epoch': 2.77}
92%|█████████▏| 10642/11526 [1:51:25<09:04, 1.62it/s] 92%|█████████▏| 10643/11526 [1:51:25<09:03, 1.62it/s] {'loss': 0.1487, 'grad_norm': 0.5811643004417419, 'learning_rate': 1.77730742827803e-07, 'epoch': 2.77}
92%|█████████▏| 10643/11526 [1:51:26<09:03, 1.62it/s] 92%|█████████▏| 10644/11526 [1:51:26<09:03, 1.62it/s] {'loss': 0.22, 'grad_norm': 0.7218557596206665, 'learning_rate': 1.7733080490352106e-07, 'epoch': 2.77}
92%|█████████▏| 10644/11526 [1:51:26<09:03, 1.62it/s] 92%|█████████▏| 10645/11526 [1:51:27<09:02, 1.62it/s] {'loss': 0.1536, 'grad_norm': 0.6462259888648987, 'learning_rate': 1.7693130934189783e-07, 'epoch': 2.77}
92%|█████████▏| 10645/11526 [1:51:27<09:02, 1.62it/s] 92%|█████████▏| 10646/11526 [1:51:27<09:01, 1.62it/s] {'loss': 0.1383, 'grad_norm': 0.5724541544914246, 'learning_rate': 1.7653225617957737e-07, 'epoch': 2.77}
92%|█████████▏| 10646/11526 [1:51:27<09:01, 1.62it/s] 92%|█████████▏| 10647/11526 [1:51:28<09:00, 1.63it/s] {'loss': 0.1299, 'grad_norm': 0.5669786334037781, 'learning_rate': 1.7613364545316203e-07, 'epoch': 2.77}
92%|█████████▏| 10647/11526 [1:51:28<09:00, 1.63it/s] 92%|█████████▏| 10648/11526 [1:51:28<08:59, 1.63it/s] {'loss': 0.1875, 'grad_norm': 0.7989639043807983, 'learning_rate': 1.7573547719921592e-07, 'epoch': 2.77}
92%|█████████▏| 10648/11526 [1:51:29<08:59, 1.63it/s] 92%|█████████▏| 10649/11526 [1:51:29<08:59, 1.63it/s] {'loss': 0.1438, 'grad_norm': 0.6185638904571533, 'learning_rate': 1.753377514542609e-07, 'epoch': 2.77}
92%|█████████▏| 10649/11526 [1:51:29<08:59, 1.63it/s] 92%|█████████▏| 10650/11526 [1:51:30<08:58, 1.63it/s] {'loss': 0.1402, 'grad_norm': 0.5251977443695068, 'learning_rate': 1.7494046825477783e-07, 'epoch': 2.77}
92%|█████████▏| 10650/11526 [1:51:30<08:58, 1.63it/s] 92%|█████████▏| 10651/11526 [1:51:30<08:58, 1.62it/s] {'loss': 0.1528, 'grad_norm': 0.5738213658332825, 'learning_rate': 1.7454362763720867e-07, 'epoch': 2.77}
92%|█████████▏| 10651/11526 [1:51:30<08:58, 1.62it/s] 92%|█████████▏| 10652/11526 [1:51:31<08:57, 1.63it/s] {'loss': 0.1443, 'grad_norm': 0.6120294332504272, 'learning_rate': 1.7414722963795426e-07, 'epoch': 2.77}
92%|█████████▏| 10652/11526 [1:51:31<08:57, 1.63it/s] 92%|█████████▏| 10653/11526 [1:51:32<08:56, 1.63it/s] {'loss': 0.128, 'grad_norm': 0.518664538860321, 'learning_rate': 1.7375127429337335e-07, 'epoch': 2.77}
92%|█████████▏| 10653/11526 [1:51:32<08:56, 1.63it/s] 92%|█████████▏| 10654/11526 [1:51:32<08:56, 1.63it/s] {'loss': 0.1719, 'grad_norm': 0.7161161303520203, 'learning_rate': 1.7335576163978629e-07, 'epoch': 2.77}
92%|█████████▏| 10654/11526 [1:51:32<08:56, 1.63it/s] 92%|█████████▏| 10655/11526 [1:51:33<08:55, 1.63it/s] {'loss': 0.1142, 'grad_norm': 0.50251305103302, 'learning_rate': 1.7296069171347073e-07, 'epoch': 2.77}
92%|█████████▏| 10655/11526 [1:51:33<08:55, 1.63it/s] 92%|█████████▏| 10656/11526 [1:51:33<08:55, 1.62it/s] {'loss': 0.1454, 'grad_norm': 0.5392135381698608, 'learning_rate': 1.725660645506655e-07, 'epoch': 2.77}
92%|█████████▏| 10656/11526 [1:51:34<08:55, 1.62it/s] 92%|█████████▏| 10657/11526 [1:51:34<08:54, 1.63it/s] {'loss': 0.1766, 'grad_norm': 0.6341469883918762, 'learning_rate': 1.721718801875677e-07, 'epoch': 2.77}
92%|█████████▏| 10657/11526 [1:51:34<08:54, 1.63it/s] 92%|█████████▏| 10658/11526 [1:51:35<08:53, 1.63it/s] {'loss': 0.1827, 'grad_norm': 0.7147120237350464, 'learning_rate': 1.7177813866033454e-07, 'epoch': 2.77}
92%|█████████▏| 10658/11526 [1:51:35<08:53, 1.63it/s] 92%|█████████▏| 10659/11526 [1:51:35<08:53, 1.62it/s] {'loss': 0.1115, 'grad_norm': 0.5666691660881042, 'learning_rate': 1.7138484000508214e-07, 'epoch': 2.77}
92%|█████████▏| 10659/11526 [1:51:35<08:53, 1.62it/s] 92%|█████████▏| 10660/11526 [1:51:36<08:53, 1.62it/s] {'loss': 0.1239, 'grad_norm': 0.569027841091156, 'learning_rate': 1.70991984257885e-07, 'epoch': 2.77}
92%|█████████▏| 10660/11526 [1:51:36<08:53, 1.62it/s] 92%|█████████▏| 10661/11526 [1:51:36<08:52, 1.62it/s] {'loss': 0.1487, 'grad_norm': 0.5739498734474182, 'learning_rate': 1.7059957145477923e-07, 'epoch': 2.77}
92%|█████████▏| 10661/11526 [1:51:37<08:52, 1.62it/s] 93%|█████████▎| 10662/11526 [1:51:37<08:52, 1.62it/s] {'loss': 0.1574, 'grad_norm': 0.6177845001220703, 'learning_rate': 1.702076016317594e-07, 'epoch': 2.78}
93%|█████████▎| 10662/11526 [1:51:37<08:52, 1.62it/s] 93%|█████████▎| 10663/11526 [1:51:38<08:51, 1.62it/s] {'loss': 0.1846, 'grad_norm': 0.6810739040374756, 'learning_rate': 1.6981607482477847e-07, 'epoch': 2.78}
93%|█████████▎| 10663/11526 [1:51:38<08:51, 1.62it/s] 93%|█████████▎| 10664/11526 [1:51:38<08:51, 1.62it/s] {'loss': 0.1664, 'grad_norm': 0.6250445246696472, 'learning_rate': 1.6942499106974985e-07, 'epoch': 2.78}
93%|█████████▎| 10664/11526 [1:51:38<08:51, 1.62it/s] 93%|█████████▎| 10665/11526 [1:51:39<08:49, 1.62it/s] {'loss': 0.2034, 'grad_norm': 0.6593348979949951, 'learning_rate': 1.6903435040254545e-07, 'epoch': 2.78}
93%|█████████▎| 10665/11526 [1:51:39<08:49, 1.62it/s] 93%|█████████▎| 10666/11526 [1:51:40<08:49, 1.62it/s] {'loss': 0.1238, 'grad_norm': 0.5179917216300964, 'learning_rate': 1.6864415285899827e-07, 'epoch': 2.78}
93%|█████████▎| 10666/11526 [1:51:40<08:49, 1.62it/s] 93%|█████████▎| 10667/11526 [1:51:40<08:48, 1.63it/s] {'loss': 0.132, 'grad_norm': 0.5461750030517578, 'learning_rate': 1.6825439847489856e-07, 'epoch': 2.78}
93%|█████████▎| 10667/11526 [1:51:40<08:48, 1.63it/s] 93%|█████████▎| 10668/11526 [1:51:41<08:47, 1.63it/s] {'loss': 0.182, 'grad_norm': 0.6468380689620972, 'learning_rate': 1.678650872859966e-07, 'epoch': 2.78}
93%|█████████▎| 10668/11526 [1:51:41<08:47, 1.63it/s] 93%|█████████▎| 10669/11526 [1:51:41<08:47, 1.62it/s] {'loss': 0.1319, 'grad_norm': 0.5417889952659607, 'learning_rate': 1.6747621932800163e-07, 'epoch': 2.78}
93%|█████████▎| 10669/11526 [1:51:42<08:47, 1.62it/s] 93%|█████████▎| 10670/11526 [1:51:42<08:46, 1.63it/s] {'loss': 0.1473, 'grad_norm': 0.5414963960647583, 'learning_rate': 1.6708779463658564e-07, 'epoch': 2.78}
93%|█████████▎| 10670/11526 [1:51:42<08:46, 1.63it/s] 93%|█████████▎| 10671/11526 [1:51:43<08:46, 1.62it/s] {'loss': 0.1293, 'grad_norm': 0.5816164016723633, 'learning_rate': 1.6669981324737405e-07, 'epoch': 2.78}
93%|█████████▎| 10671/11526 [1:51:43<08:46, 1.62it/s] 93%|█████████▎| 10672/11526 [1:51:43<08:45, 1.63it/s] {'loss': 0.23, 'grad_norm': 0.7411584258079529, 'learning_rate': 1.6631227519595615e-07, 'epoch': 2.78}
93%|█████████▎| 10672/11526 [1:51:43<08:45, 1.63it/s] 93%|█████████▎| 10673/11526 [1:51:44<08:44, 1.63it/s] {'loss': 0.1523, 'grad_norm': 0.5778077244758606, 'learning_rate': 1.6592518051787964e-07, 'epoch': 2.78}
93%|█████████▎| 10673/11526 [1:51:44<08:44, 1.63it/s] 93%|█████████▎| 10674/11526 [1:51:44<08:44, 1.62it/s] {'loss': 0.1197, 'grad_norm': 0.5083779692649841, 'learning_rate': 1.6553852924864887e-07, 'epoch': 2.78}
93%|█████████▎| 10674/11526 [1:51:45<08:44, 1.62it/s] 93%|█████████▎| 10675/11526 [1:51:45<08:43, 1.63it/s] {'loss': 0.1407, 'grad_norm': 0.6076626181602478, 'learning_rate': 1.6515232142373217e-07, 'epoch': 2.78}
93%|█████████▎| 10675/11526 [1:51:45<08:43, 1.63it/s] 93%|█████████▎| 10676/11526 [1:51:46<08:43, 1.62it/s] {'loss': 0.1653, 'grad_norm': 0.6930595636367798, 'learning_rate': 1.6476655707855338e-07, 'epoch': 2.78}
93%|█████████▎| 10676/11526 [1:51:46<08:43, 1.62it/s] 93%|█████████▎| 10677/11526 [1:51:46<08:42, 1.62it/s] {'loss': 0.1885, 'grad_norm': 0.7009357810020447, 'learning_rate': 1.6438123624849755e-07, 'epoch': 2.78}
93%|█████████▎| 10677/11526 [1:51:46<08:42, 1.62it/s] 93%|█████████▎| 10678/11526 [1:51:47<08:41, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.5905389785766602, 'learning_rate': 1.6399635896890808e-07, 'epoch': 2.78}
93%|█████████▎| 10678/11526 [1:51:47<08:41, 1.63it/s] 93%|█████████▎| 10679/11526 [1:51:48<08:41, 1.62it/s] {'loss': 0.1575, 'grad_norm': 0.5355885624885559, 'learning_rate': 1.6361192527508784e-07, 'epoch': 2.78}
93%|█████████▎| 10679/11526 [1:51:48<08:41, 1.62it/s] 93%|█████████▎| 10680/11526 [1:51:48<08:40, 1.63it/s] {'loss': 0.1286, 'grad_norm': 0.5171935558319092, 'learning_rate': 1.6322793520230028e-07, 'epoch': 2.78}
93%|█████████▎| 10680/11526 [1:51:48<08:40, 1.63it/s] 93%|█████████▎| 10681/11526 [1:51:49<08:42, 1.62it/s] {'loss': 0.2051, 'grad_norm': 0.8244121670722961, 'learning_rate': 1.6284438878576724e-07, 'epoch': 2.78}
93%|█████████▎| 10681/11526 [1:51:49<08:42, 1.62it/s] 93%|█████████▎| 10682/11526 [1:51:49<08:40, 1.62it/s] {'loss': 0.1433, 'grad_norm': 0.6154755353927612, 'learning_rate': 1.624612860606678e-07, 'epoch': 2.78}
93%|█████████▎| 10682/11526 [1:51:50<08:40, 1.62it/s] 93%|█████████▎| 10683/11526 [1:51:50<08:39, 1.62it/s] {'loss': 0.1484, 'grad_norm': 0.7687065005302429, 'learning_rate': 1.620786270621444e-07, 'epoch': 2.78}
93%|█████████▎| 10683/11526 [1:51:50<08:39, 1.62it/s] 93%|█████████▎| 10684/11526 [1:51:51<08:39, 1.62it/s] {'loss': 0.1704, 'grad_norm': 0.7115780115127563, 'learning_rate': 1.6169641182529615e-07, 'epoch': 2.78}
93%|█████████▎| 10684/11526 [1:51:51<08:39, 1.62it/s] 93%|█████████▎| 10685/11526 [1:51:51<08:37, 1.62it/s] {'loss': 0.1307, 'grad_norm': 0.5546984076499939, 'learning_rate': 1.613146403851812e-07, 'epoch': 2.78}
93%|█████████▎| 10685/11526 [1:51:51<08:37, 1.62it/s] 93%|█████████▎| 10686/11526 [1:51:52<08:37, 1.62it/s] {'loss': 0.1909, 'grad_norm': 0.8883015513420105, 'learning_rate': 1.6093331277681868e-07, 'epoch': 2.78}
93%|█████████▎| 10686/11526 [1:51:52<08:37, 1.62it/s] 93%|█████████▎| 10687/11526 [1:51:52<08:36, 1.62it/s] {'loss': 0.1521, 'grad_norm': 0.6004066467285156, 'learning_rate': 1.6055242903518563e-07, 'epoch': 2.78}
93%|█████████▎| 10687/11526 [1:51:53<08:36, 1.62it/s] 93%|█████████▎| 10688/11526 [1:51:53<08:35, 1.63it/s] {'loss': 0.1399, 'grad_norm': 0.5635852813720703, 'learning_rate': 1.601719891952197e-07, 'epoch': 2.78}
93%|█████████▎| 10688/11526 [1:51:53<08:35, 1.63it/s] 93%|█████████▎| 10689/11526 [1:51:54<08:35, 1.62it/s] {'loss': 0.1364, 'grad_norm': 0.5435221791267395, 'learning_rate': 1.5979199329181628e-07, 'epoch': 2.78}
93%|█████████▎| 10689/11526 [1:51:54<08:35, 1.62it/s] 93%|█████████▎| 10690/11526 [1:51:54<08:34, 1.63it/s] {'loss': 0.1659, 'grad_norm': 0.579088568687439, 'learning_rate': 1.594124413598308e-07, 'epoch': 2.78}
93%|█████████▎| 10690/11526 [1:51:54<08:34, 1.63it/s] 93%|█████████▎| 10691/11526 [1:51:55<08:35, 1.62it/s] {'loss': 0.1501, 'grad_norm': 0.5432403683662415, 'learning_rate': 1.5903333343407769e-07, 'epoch': 2.78}
93%|█████████▎| 10691/11526 [1:51:55<08:35, 1.62it/s] 93%|█████████▎| 10692/11526 [1:51:56<08:34, 1.62it/s] {'loss': 0.2272, 'grad_norm': 0.8891834616661072, 'learning_rate': 1.5865466954933128e-07, 'epoch': 2.78}
93%|█████████▎| 10692/11526 [1:51:56<08:34, 1.62it/s] 93%|█████████▎| 10693/11526 [1:51:56<08:32, 1.62it/s] {'loss': 0.1573, 'grad_norm': 0.6016741991043091, 'learning_rate': 1.5827644974032498e-07, 'epoch': 2.78}
93%|█████████▎| 10693/11526 [1:51:56<08:32, 1.62it/s] 93%|█████████▎| 10694/11526 [1:51:57<08:32, 1.62it/s] {'loss': 0.1139, 'grad_norm': 0.46450725197792053, 'learning_rate': 1.578986740417504e-07, 'epoch': 2.78}
93%|█████████▎| 10694/11526 [1:51:57<08:32, 1.62it/s] 93%|█████████▎| 10695/11526 [1:51:57<08:31, 1.63it/s] {'loss': 0.1646, 'grad_norm': 0.5790985226631165, 'learning_rate': 1.5752134248826102e-07, 'epoch': 2.78}
93%|█████████▎| 10695/11526 [1:51:58<08:31, 1.63it/s] 93%|█████████▎| 10696/11526 [1:51:58<08:31, 1.62it/s] {'loss': 0.1346, 'grad_norm': 0.5872274041175842, 'learning_rate': 1.5714445511446518e-07, 'epoch': 2.78}
93%|█████████▎| 10696/11526 [1:51:58<08:31, 1.62it/s] 93%|█████████▎| 10697/11526 [1:51:59<08:30, 1.63it/s] {'loss': 0.1348, 'grad_norm': 0.5515801310539246, 'learning_rate': 1.5676801195493584e-07, 'epoch': 2.78}
93%|█████████▎| 10697/11526 [1:51:59<08:30, 1.63it/s] 93%|█████████▎| 10698/11526 [1:51:59<08:29, 1.63it/s] {'loss': 0.1641, 'grad_norm': 0.6997490525245667, 'learning_rate': 1.5639201304420094e-07, 'epoch': 2.78}
93%|█████████▎| 10698/11526 [1:51:59<08:29, 1.63it/s] 93%|█████████▎| 10699/11526 [1:52:00<08:28, 1.63it/s] {'loss': 0.1502, 'grad_norm': 0.6572221517562866, 'learning_rate': 1.560164584167495e-07, 'epoch': 2.78}
93%|█████████▎| 10699/11526 [1:52:00<08:28, 1.63it/s] 93%|█████████▎| 10700/11526 [1:52:00<08:27, 1.63it/s] {'loss': 0.1539, 'grad_norm': 0.6018549799919128, 'learning_rate': 1.5564134810702958e-07, 'epoch': 2.79}
93%|█████████▎| 10700/11526 [1:52:01<08:27, 1.63it/s] 93%|█████████▎| 10701/11526 [1:52:01<08:27, 1.63it/s] {'loss': 0.1737, 'grad_norm': 0.6140480637550354, 'learning_rate': 1.5526668214944808e-07, 'epoch': 2.79}
93%|█████████▎| 10701/11526 [1:52:01<08:27, 1.63it/s] 93%|█████████▎| 10702/11526 [1:52:02<08:26, 1.63it/s] {'loss': 0.1683, 'grad_norm': 0.6255085468292236, 'learning_rate': 1.5489246057837248e-07, 'epoch': 2.79}
93%|█████████▎| 10702/11526 [1:52:02<08:26, 1.63it/s] 93%|█████████▎| 10703/11526 [1:52:02<08:26, 1.63it/s] {'loss': 0.1285, 'grad_norm': 0.5252294540405273, 'learning_rate': 1.5451868342812814e-07, 'epoch': 2.79}
93%|█████████▎| 10703/11526 [1:52:02<08:26, 1.63it/s] 93%|█████████▎| 10704/11526 [1:52:03<08:25, 1.62it/s] {'loss': 0.1053, 'grad_norm': 0.4479507505893707, 'learning_rate': 1.541453507329993e-07, 'epoch': 2.79}
93%|█████████▎| 10704/11526 [1:52:03<08:25, 1.62it/s] 93%|█████████▎| 10705/11526 [1:52:04<08:25, 1.62it/s] {'loss': 0.1374, 'grad_norm': 0.5655518770217896, 'learning_rate': 1.5377246252723078e-07, 'epoch': 2.79}
93%|█████████▎| 10705/11526 [1:52:04<08:25, 1.62it/s] 93%|█████████▎| 10706/11526 [1:52:04<08:25, 1.62it/s] {'loss': 0.1348, 'grad_norm': 0.5315779447555542, 'learning_rate': 1.5340001884502577e-07, 'epoch': 2.79}
93%|█████████▎| 10706/11526 [1:52:04<08:25, 1.62it/s] 93%|█████████▎| 10707/11526 [1:52:05<08:24, 1.62it/s] {'loss': 0.1537, 'grad_norm': 0.5406439304351807, 'learning_rate': 1.5302801972054748e-07, 'epoch': 2.79}
93%|█████████▎| 10707/11526 [1:52:05<08:24, 1.62it/s] 93%|█████████▎| 10708/11526 [1:52:05<08:23, 1.63it/s] {'loss': 0.1851, 'grad_norm': 0.7370311617851257, 'learning_rate': 1.5265646518791755e-07, 'epoch': 2.79}
93%|█████████▎| 10708/11526 [1:52:06<08:23, 1.63it/s] 93%|█████████▎| 10709/11526 [1:52:06<08:23, 1.62it/s] {'loss': 0.1604, 'grad_norm': 0.6234551668167114, 'learning_rate': 1.5228535528121534e-07, 'epoch': 2.79}
93%|█████████▎| 10709/11526 [1:52:06<08:23, 1.62it/s] 93%|█████████▎| 10710/11526 [1:52:07<08:22, 1.63it/s] {'loss': 0.1314, 'grad_norm': 0.5827344059944153, 'learning_rate': 1.5191469003448367e-07, 'epoch': 2.79}
93%|█████████▎| 10710/11526 [1:52:07<08:22, 1.63it/s] 93%|█████████▎| 10711/11526 [1:52:07<08:21, 1.63it/s] {'loss': 0.1436, 'grad_norm': 0.7880621552467346, 'learning_rate': 1.5154446948172196e-07, 'epoch': 2.79}
93%|█████████▎| 10711/11526 [1:52:07<08:21, 1.63it/s] 93%|█████████▎| 10712/11526 [1:52:08<08:20, 1.63it/s] {'loss': 0.1827, 'grad_norm': 0.6773557662963867, 'learning_rate': 1.5117469365688697e-07, 'epoch': 2.79}
93%|█████████▎| 10712/11526 [1:52:08<08:20, 1.63it/s] 93%|█████████▎| 10713/11526 [1:52:08<08:19, 1.63it/s] {'loss': 0.1594, 'grad_norm': 0.900326669216156, 'learning_rate': 1.5080536259389823e-07, 'epoch': 2.79}
93%|█████████▎| 10713/11526 [1:52:09<08:19, 1.63it/s] 93%|█████████▎| 10714/11526 [1:52:09<08:19, 1.62it/s] {'loss': 0.133, 'grad_norm': 0.5427837371826172, 'learning_rate': 1.5043647632663194e-07, 'epoch': 2.79}
93%|█████████▎| 10714/11526 [1:52:09<08:19, 1.62it/s] 93%|█████████▎| 10715/11526 [1:52:10<08:18, 1.63it/s] {'loss': 0.2236, 'grad_norm': 0.7089585065841675, 'learning_rate': 1.500680348889255e-07, 'epoch': 2.79}
93%|█████████▎| 10715/11526 [1:52:10<08:18, 1.63it/s] 93%|█████████▎| 10716/11526 [1:52:10<08:18, 1.62it/s] {'loss': 0.1606, 'grad_norm': 0.6808013916015625, 'learning_rate': 1.4970003831457414e-07, 'epoch': 2.79}
93%|█████████▎| 10716/11526 [1:52:10<08:18, 1.62it/s] 93%|█████████▎| 10717/11526 [1:52:11<08:17, 1.63it/s] {'loss': 0.1696, 'grad_norm': 0.6425439119338989, 'learning_rate': 1.4933248663733135e-07, 'epoch': 2.79}
93%|█████████▎| 10717/11526 [1:52:11<08:17, 1.63it/s] 93%|█████████▎| 10718/11526 [1:52:12<08:16, 1.63it/s] {'loss': 0.162, 'grad_norm': 0.6040393114089966, 'learning_rate': 1.4896537989091186e-07, 'epoch': 2.79}
93%|█████████▎| 10718/11526 [1:52:12<08:16, 1.63it/s] 93%|█████████▎| 10719/11526 [1:52:12<08:17, 1.62it/s] {'loss': 0.1803, 'grad_norm': 0.7555728554725647, 'learning_rate': 1.485987181089904e-07, 'epoch': 2.79}
93%|█████████▎| 10719/11526 [1:52:12<08:17, 1.62it/s] 93%|█████████▎| 10720/11526 [1:52:13<08:15, 1.63it/s] {'loss': 0.1436, 'grad_norm': 0.5563342571258545, 'learning_rate': 1.4823250132519672e-07, 'epoch': 2.79}
93%|█████████▎| 10720/11526 [1:52:13<08:15, 1.63it/s] 93%|█████████▎| 10721/11526 [1:52:13<08:18, 1.62it/s] {'loss': 0.1216, 'grad_norm': 0.5565229058265686, 'learning_rate': 1.47866729573124e-07, 'epoch': 2.79}
93%|█████████▎| 10721/11526 [1:52:14<08:18, 1.62it/s] 93%|█████████▎| 10722/11526 [1:52:14<08:16, 1.62it/s] {'loss': 0.1214, 'grad_norm': 0.5367710590362549, 'learning_rate': 1.4750140288632254e-07, 'epoch': 2.79}
93%|█████████▎| 10722/11526 [1:52:14<08:16, 1.62it/s] 93%|█████████▎| 10723/11526 [1:52:15<08:15, 1.62it/s] {'loss': 0.1494, 'grad_norm': 0.5937352776527405, 'learning_rate': 1.4713652129830058e-07, 'epoch': 2.79}
93%|█████████▎| 10723/11526 [1:52:15<08:15, 1.62it/s] 93%|█████████▎| 10724/11526 [1:52:15<08:16, 1.62it/s] {'loss': 0.1485, 'grad_norm': 0.6249909400939941, 'learning_rate': 1.4677208484253025e-07, 'epoch': 2.79}
93%|█████████▎| 10724/11526 [1:52:15<08:16, 1.62it/s] 93%|█████████▎| 10725/11526 [1:52:16<08:14, 1.62it/s] {'loss': 0.1563, 'grad_norm': 0.6153819561004639, 'learning_rate': 1.4640809355243702e-07, 'epoch': 2.79}
93%|█████████▎| 10725/11526 [1:52:16<08:14, 1.62it/s] 93%|█████████▎| 10726/11526 [1:52:16<08:13, 1.62it/s] {'loss': 0.139, 'grad_norm': 0.5621808767318726, 'learning_rate': 1.4604454746140972e-07, 'epoch': 2.79}
93%|█████████▎| 10726/11526 [1:52:17<08:13, 1.62it/s] 93%|█████████▎| 10727/11526 [1:52:17<08:12, 1.62it/s] {'loss': 0.1713, 'grad_norm': 0.6348364949226379, 'learning_rate': 1.4568144660279393e-07, 'epoch': 2.79}
93%|█████████▎| 10727/11526 [1:52:17<08:12, 1.62it/s] 93%|█████████▎| 10728/11526 [1:52:18<08:11, 1.62it/s] {'loss': 0.147, 'grad_norm': 0.5369569063186646, 'learning_rate': 1.4531879100989632e-07, 'epoch': 2.79}
93%|█████████▎| 10728/11526 [1:52:18<08:11, 1.62it/s] 93%|█████████▎| 10729/11526 [1:52:18<08:10, 1.62it/s] {'loss': 0.1368, 'grad_norm': 0.6862889528274536, 'learning_rate': 1.4495658071598083e-07, 'epoch': 2.79}
93%|█████████▎| 10729/11526 [1:52:18<08:10, 1.62it/s] 93%|█████████▎| 10730/11526 [1:52:19<08:09, 1.63it/s] {'loss': 0.1607, 'grad_norm': 0.635157585144043, 'learning_rate': 1.4459481575427204e-07, 'epoch': 2.79}
93%|█████████▎| 10730/11526 [1:52:19<08:09, 1.63it/s] 93%|█████████▎| 10731/11526 [1:52:20<08:09, 1.62it/s] {'loss': 0.1675, 'grad_norm': 0.5988638401031494, 'learning_rate': 1.4423349615795223e-07, 'epoch': 2.79}
93%|█████████▎| 10731/11526 [1:52:20<08:09, 1.62it/s] 93%|█████████▎| 10732/11526 [1:52:20<08:08, 1.63it/s] {'loss': 0.129, 'grad_norm': 0.5312473177909851, 'learning_rate': 1.4387262196016548e-07, 'epoch': 2.79}
93%|█████████▎| 10732/11526 [1:52:20<08:08, 1.63it/s] 93%|█████████▎| 10733/11526 [1:52:21<08:07, 1.63it/s] {'loss': 0.1256, 'grad_norm': 0.5345710515975952, 'learning_rate': 1.4351219319401145e-07, 'epoch': 2.79}
93%|█████████▎| 10733/11526 [1:52:21<08:07, 1.63it/s] 93%|█████████▎| 10734/11526 [1:52:21<08:06, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.6380838751792908, 'learning_rate': 1.4315220989255086e-07, 'epoch': 2.79}
93%|█████████▎| 10734/11526 [1:52:22<08:06, 1.63it/s] 93%|█████████▎| 10735/11526 [1:52:22<08:06, 1.63it/s] {'loss': 0.1713, 'grad_norm': 0.6611700057983398, 'learning_rate': 1.427926720888051e-07, 'epoch': 2.79}
93%|█████████▎| 10735/11526 [1:52:22<08:06, 1.63it/s] 93%|█████████▎| 10736/11526 [1:52:23<08:05, 1.63it/s] {'loss': 0.125, 'grad_norm': 0.4811897873878479, 'learning_rate': 1.4243357981575056e-07, 'epoch': 2.79}
93%|█████████▎| 10736/11526 [1:52:23<08:05, 1.63it/s] 93%|█████████▎| 10737/11526 [1:52:23<08:04, 1.63it/s] {'loss': 0.1192, 'grad_norm': 0.4768718481063843, 'learning_rate': 1.4207493310632649e-07, 'epoch': 2.79}
93%|█████████▎| 10737/11526 [1:52:23<08:04, 1.63it/s] 93%|█████████▎| 10738/11526 [1:52:24<08:04, 1.63it/s] {'loss': 0.1413, 'grad_norm': 0.6671316623687744, 'learning_rate': 1.4171673199343095e-07, 'epoch': 2.79}
93%|█████████▎| 10738/11526 [1:52:24<08:04, 1.63it/s] 93%|█████████▎| 10739/11526 [1:52:24<08:03, 1.63it/s] {'loss': 0.163, 'grad_norm': 0.5599462985992432, 'learning_rate': 1.4135897650991882e-07, 'epoch': 2.8}
93%|█████████▎| 10739/11526 [1:52:25<08:03, 1.63it/s] 93%|█████████▎| 10740/11526 [1:52:25<08:02, 1.63it/s] {'loss': 0.1663, 'grad_norm': 0.5331938862800598, 'learning_rate': 1.4100166668860604e-07, 'epoch': 2.8}
93%|█████████▎| 10740/11526 [1:52:25<08:02, 1.63it/s] 93%|█████████▎| 10741/11526 [1:52:26<08:02, 1.63it/s] {'loss': 0.1435, 'grad_norm': 0.6020914912223816, 'learning_rate': 1.4064480256226642e-07, 'epoch': 2.8}
93%|█████████▎| 10741/11526 [1:52:26<08:02, 1.63it/s] 93%|█████████▎| 10742/11526 [1:52:26<08:01, 1.63it/s] {'loss': 0.1526, 'grad_norm': 0.7432588934898376, 'learning_rate': 1.402883841636349e-07, 'epoch': 2.8}
93%|█████████▎| 10742/11526 [1:52:26<08:01, 1.63it/s] 93%|█████████▎| 10743/11526 [1:52:27<08:00, 1.63it/s] {'loss': 0.1645, 'grad_norm': 0.6382383108139038, 'learning_rate': 1.3993241152540304e-07, 'epoch': 2.8}
93%|█████████▎| 10743/11526 [1:52:27<08:00, 1.63it/s] 93%|█████████▎| 10744/11526 [1:52:28<08:00, 1.63it/s] {'loss': 0.1538, 'grad_norm': 0.6181519627571106, 'learning_rate': 1.3957688468022313e-07, 'epoch': 2.8}
93%|█████████▎| 10744/11526 [1:52:28<08:00, 1.63it/s] 93%|█████████▎| 10745/11526 [1:52:28<07:59, 1.63it/s] {'loss': 0.1458, 'grad_norm': 0.5757668018341064, 'learning_rate': 1.3922180366070516e-07, 'epoch': 2.8}
93%|█████████▎| 10745/11526 [1:52:28<07:59, 1.63it/s] 93%|█████████▎| 10746/11526 [1:52:29<07:59, 1.63it/s] {'loss': 0.1345, 'grad_norm': 0.5159108638763428, 'learning_rate': 1.388671684994214e-07, 'epoch': 2.8}
93%|█████████▎| 10746/11526 [1:52:29<07:59, 1.63it/s] 93%|█████████▎| 10747/11526 [1:52:29<07:59, 1.62it/s] {'loss': 0.1449, 'grad_norm': 0.640453577041626, 'learning_rate': 1.385129792288986e-07, 'epoch': 2.8}
93%|█████████▎| 10747/11526 [1:52:30<07:59, 1.62it/s] 93%|█████████▎| 10748/11526 [1:52:30<07:58, 1.63it/s] {'loss': 0.1561, 'grad_norm': 0.6239847540855408, 'learning_rate': 1.3815923588162637e-07, 'epoch': 2.8}
93%|█████████▎| 10748/11526 [1:52:30<07:58, 1.63it/s] 93%|█████████▎| 10749/11526 [1:52:31<07:58, 1.63it/s] {'loss': 0.1732, 'grad_norm': 0.5649290084838867, 'learning_rate': 1.3780593849005207e-07, 'epoch': 2.8}
93%|█████████▎| 10749/11526 [1:52:31<07:58, 1.63it/s] 93%|█████████▎| 10750/11526 [1:52:31<07:57, 1.63it/s] {'loss': 0.1395, 'grad_norm': 0.4995115399360657, 'learning_rate': 1.3745308708658144e-07, 'epoch': 2.8}
93%|█████████▎| 10750/11526 [1:52:31<07:57, 1.63it/s] 93%|█████████▎| 10751/11526 [1:52:32<07:56, 1.63it/s] {'loss': 0.1451, 'grad_norm': 0.5559002161026001, 'learning_rate': 1.371006817035808e-07, 'epoch': 2.8}
93%|█████████▎| 10751/11526 [1:52:32<07:56, 1.63it/s] 93%|█████████▎| 10752/11526 [1:52:32<07:55, 1.63it/s] {'loss': 0.2018, 'grad_norm': 0.6808789372444153, 'learning_rate': 1.3674872237337432e-07, 'epoch': 2.8}
93%|█████████▎| 10752/11526 [1:52:33<07:55, 1.63it/s] 93%|█████████▎| 10753/11526 [1:52:33<07:55, 1.63it/s] {'loss': 0.1427, 'grad_norm': 0.5627663135528564, 'learning_rate': 1.3639720912824562e-07, 'epoch': 2.8}
93%|█████████▎| 10753/11526 [1:52:33<07:55, 1.63it/s] 93%|█████████▎| 10754/11526 [1:52:34<07:55, 1.63it/s] {'loss': 0.1065, 'grad_norm': 0.520996630191803, 'learning_rate': 1.3604614200043774e-07, 'epoch': 2.8}
93%|█████████▎| 10754/11526 [1:52:34<07:55, 1.63it/s] 93%|█████████▎| 10755/11526 [1:52:34<07:54, 1.63it/s] {'loss': 0.1666, 'grad_norm': 0.6434426307678223, 'learning_rate': 1.3569552102215222e-07, 'epoch': 2.8}
93%|█████████▎| 10755/11526 [1:52:34<07:54, 1.63it/s] 93%|█████████▎| 10756/11526 [1:52:35<07:53, 1.63it/s] {'loss': 0.1619, 'grad_norm': 0.6958709955215454, 'learning_rate': 1.353453462255505e-07, 'epoch': 2.8}
93%|█████████▎| 10756/11526 [1:52:35<07:53, 1.63it/s] 93%|█████████▎| 10757/11526 [1:52:36<07:52, 1.63it/s] {'loss': 0.1461, 'grad_norm': 0.616338849067688, 'learning_rate': 1.349956176427525e-07, 'epoch': 2.8}
93%|█████████▎| 10757/11526 [1:52:36<07:52, 1.63it/s] 93%|█████████▎| 10758/11526 [1:52:36<07:51, 1.63it/s] {'loss': 0.1425, 'grad_norm': 0.575032114982605, 'learning_rate': 1.3464633530583638e-07, 'epoch': 2.8}
93%|█████████▎| 10758/11526 [1:52:36<07:51, 1.63it/s] 93%|█████████▎| 10759/11526 [1:52:37<07:51, 1.63it/s] {'loss': 0.1401, 'grad_norm': 0.6315423250198364, 'learning_rate': 1.342974992468421e-07, 'epoch': 2.8}
93%|█████████▎| 10759/11526 [1:52:37<07:51, 1.63it/s] 93%|█████████▎| 10760/11526 [1:52:37<07:50, 1.63it/s] {'loss': 0.1437, 'grad_norm': 0.6175947785377502, 'learning_rate': 1.3394910949776574e-07, 'epoch': 2.8}
93%|█████████▎| 10760/11526 [1:52:38<07:50, 1.63it/s] 93%|█████████▎| 10761/11526 [1:52:38<07:52, 1.62it/s] {'loss': 0.1353, 'grad_norm': 0.5660531520843506, 'learning_rate': 1.3360116609056339e-07, 'epoch': 2.8}
93%|█████████▎| 10761/11526 [1:52:38<07:52, 1.62it/s] 93%|█████████▎| 10762/11526 [1:52:39<07:51, 1.62it/s] {'loss': 0.1364, 'grad_norm': 0.5376220941543579, 'learning_rate': 1.3325366905715064e-07, 'epoch': 2.8}
93%|█████████▎| 10762/11526 [1:52:39<07:51, 1.62it/s] 93%|█████████▎| 10763/11526 [1:52:39<07:49, 1.62it/s] {'loss': 0.1176, 'grad_norm': 0.5289281010627747, 'learning_rate': 1.32906618429402e-07, 'epoch': 2.8}
93%|█████████▎| 10763/11526 [1:52:39<07:49, 1.62it/s] 93%|█████████▎| 10764/11526 [1:52:40<07:48, 1.62it/s] {'loss': 0.2227, 'grad_norm': 1.6912111043930054, 'learning_rate': 1.3256001423915143e-07, 'epoch': 2.8}
93%|█████████▎| 10764/11526 [1:52:40<07:48, 1.62it/s] 93%|█████████▎| 10765/11526 [1:52:40<07:48, 1.63it/s] {'loss': 0.1601, 'grad_norm': 0.6143153309822083, 'learning_rate': 1.322138565181913e-07, 'epoch': 2.8}
93%|█████████▎| 10765/11526 [1:52:41<07:48, 1.63it/s] 93%|█████████▎| 10766/11526 [1:52:41<07:47, 1.62it/s] {'loss': 0.1742, 'grad_norm': 0.7953528761863708, 'learning_rate': 1.3186814529827174e-07, 'epoch': 2.8}
93%|█████████▎| 10766/11526 [1:52:41<07:47, 1.62it/s] 93%|█████████▎| 10767/11526 [1:52:42<07:47, 1.63it/s] {'loss': 0.1497, 'grad_norm': 0.61693274974823, 'learning_rate': 1.3152288061110518e-07, 'epoch': 2.8}
93%|█████████▎| 10767/11526 [1:52:42<07:47, 1.63it/s] 93%|█████████▎| 10768/11526 [1:52:42<07:46, 1.63it/s] {'loss': 0.1441, 'grad_norm': 0.7006922960281372, 'learning_rate': 1.3117806248836018e-07, 'epoch': 2.8}
93%|█████████▎| 10768/11526 [1:52:42<07:46, 1.63it/s] 93%|█████████▎| 10769/11526 [1:52:43<07:45, 1.63it/s] {'loss': 0.1587, 'grad_norm': 0.6175943613052368, 'learning_rate': 1.308336909616653e-07, 'epoch': 2.8}
93%|█████████▎| 10769/11526 [1:52:43<07:45, 1.63it/s] 93%|█████████▎| 10770/11526 [1:52:44<07:45, 1.63it/s] {'loss': 0.1105, 'grad_norm': 0.4276878237724304, 'learning_rate': 1.304897660626092e-07, 'epoch': 2.8}
93%|█████████▎| 10770/11526 [1:52:44<07:45, 1.63it/s] 93%|█████████▎| 10771/11526 [1:52:44<07:45, 1.62it/s] {'loss': 0.1776, 'grad_norm': 0.6664550304412842, 'learning_rate': 1.3014628782273887e-07, 'epoch': 2.8}
93%|█████████▎| 10771/11526 [1:52:44<07:45, 1.62it/s] 93%|█████████▎| 10772/11526 [1:52:45<07:44, 1.62it/s] {'loss': 0.1432, 'grad_norm': 0.6467651724815369, 'learning_rate': 1.2980325627355794e-07, 'epoch': 2.8}
93%|█████████▎| 10772/11526 [1:52:45<07:44, 1.62it/s] 93%|█████████▎| 10773/11526 [1:52:45<07:43, 1.63it/s] {'loss': 0.1394, 'grad_norm': 0.6120660901069641, 'learning_rate': 1.294606714465335e-07, 'epoch': 2.8}
93%|█████████▎| 10773/11526 [1:52:46<07:43, 1.63it/s] 93%|█████████▎| 10774/11526 [1:52:46<07:42, 1.63it/s] {'loss': 0.1653, 'grad_norm': 0.5980198383331299, 'learning_rate': 1.2911853337308821e-07, 'epoch': 2.8}
93%|█████████▎| 10774/11526 [1:52:46<07:42, 1.63it/s] 93%|█████████▎| 10775/11526 [1:52:47<07:41, 1.63it/s] {'loss': 0.156, 'grad_norm': 0.615196943283081, 'learning_rate': 1.2877684208460528e-07, 'epoch': 2.8}
93%|█████████▎| 10775/11526 [1:52:47<07:41, 1.63it/s] 93%|█████████▎| 10776/11526 [1:52:47<07:41, 1.63it/s] {'loss': 0.1157, 'grad_norm': 0.49834001064300537, 'learning_rate': 1.284355976124263e-07, 'epoch': 2.8}
93%|█████████▎| 10776/11526 [1:52:47<07:41, 1.63it/s] 94%|█████████▎| 10777/11526 [1:52:48<07:40, 1.63it/s] {'loss': 0.178, 'grad_norm': 0.685225784778595, 'learning_rate': 1.2809479998785236e-07, 'epoch': 2.81}
94%|█████████▎| 10777/11526 [1:52:48<07:40, 1.63it/s] 94%|█████████▎| 10778/11526 [1:52:48<07:39, 1.63it/s] {'loss': 0.1869, 'grad_norm': 0.6376809477806091, 'learning_rate': 1.277544492421434e-07, 'epoch': 2.81}
94%|█████████▎| 10778/11526 [1:52:49<07:39, 1.63it/s] 94%|█████████▎| 10779/11526 [1:52:49<07:39, 1.63it/s] {'loss': 0.1591, 'grad_norm': 0.6051841378211975, 'learning_rate': 1.2741454540651898e-07, 'epoch': 2.81}
94%|█████████▎| 10779/11526 [1:52:49<07:39, 1.63it/s] 94%|█████████▎| 10780/11526 [1:52:50<07:38, 1.63it/s] {'loss': 0.1356, 'grad_norm': 0.6002342700958252, 'learning_rate': 1.270750885121552e-07, 'epoch': 2.81}
94%|█████████▎| 10780/11526 [1:52:50<07:38, 1.63it/s] 94%|█████████▎| 10781/11526 [1:52:50<07:38, 1.63it/s] {'loss': 0.1564, 'grad_norm': 0.7493525743484497, 'learning_rate': 1.267360785901911e-07, 'epoch': 2.81}
94%|█████████▎| 10781/11526 [1:52:50<07:38, 1.63it/s] 94%|█████████▎| 10782/11526 [1:52:51<07:37, 1.63it/s] {'loss': 0.1404, 'grad_norm': 0.5716196298599243, 'learning_rate': 1.263975156717212e-07, 'epoch': 2.81}
94%|█████████▎| 10782/11526 [1:52:51<07:37, 1.63it/s] 94%|█████████▎| 10783/11526 [1:52:52<07:36, 1.63it/s] {'loss': 0.1585, 'grad_norm': 0.7545385360717773, 'learning_rate': 1.260593997878007e-07, 'epoch': 2.81}
94%|█████████▎| 10783/11526 [1:52:52<07:36, 1.63it/s] 94%|█████████▎| 10784/11526 [1:52:52<07:35, 1.63it/s] {'loss': 0.1519, 'grad_norm': 0.6145988702774048, 'learning_rate': 1.2572173096944418e-07, 'epoch': 2.81}
94%|█████████▎| 10784/11526 [1:52:52<07:35, 1.63it/s] 94%|█████████▎| 10785/11526 [1:52:53<07:35, 1.63it/s] {'loss': 0.1773, 'grad_norm': 0.6891763210296631, 'learning_rate': 1.2538450924762301e-07, 'epoch': 2.81}
94%|█████████▎| 10785/11526 [1:52:53<07:35, 1.63it/s] 94%|█████████▎| 10786/11526 [1:52:53<07:34, 1.63it/s] {'loss': 0.1397, 'grad_norm': 0.5620798468589783, 'learning_rate': 1.2504773465327025e-07, 'epoch': 2.81}
94%|█████████▎| 10786/11526 [1:52:54<07:34, 1.63it/s] 94%|█████████▎| 10787/11526 [1:52:54<07:34, 1.63it/s] {'loss': 0.1445, 'grad_norm': 0.5376279354095459, 'learning_rate': 1.2471140721727727e-07, 'epoch': 2.81}
94%|█████████▎| 10787/11526 [1:52:54<07:34, 1.63it/s] 94%|█████████▎| 10788/11526 [1:52:55<07:33, 1.63it/s] {'loss': 0.1338, 'grad_norm': 0.6590264439582825, 'learning_rate': 1.2437552697049327e-07, 'epoch': 2.81}
94%|█████████▎| 10788/11526 [1:52:55<07:33, 1.63it/s] 94%|█████████▎| 10789/11526 [1:52:55<07:32, 1.63it/s] {'loss': 0.1921, 'grad_norm': 0.6979506015777588, 'learning_rate': 1.240400939437264e-07, 'epoch': 2.81}
94%|█████████▎| 10789/11526 [1:52:55<07:32, 1.63it/s] 94%|█████████▎| 10790/11526 [1:52:56<07:32, 1.63it/s] {'loss': 0.1426, 'grad_norm': 0.5881776213645935, 'learning_rate': 1.237051081677454e-07, 'epoch': 2.81}
94%|█████████▎| 10790/11526 [1:52:56<07:32, 1.63it/s] 94%|█████████▎| 10791/11526 [1:52:56<07:31, 1.63it/s] {'loss': 0.1416, 'grad_norm': 0.5634825229644775, 'learning_rate': 1.2337056967327732e-07, 'epoch': 2.81}
94%|█████████▎| 10791/11526 [1:52:57<07:31, 1.63it/s] 94%|█████████▎| 10792/11526 [1:52:57<07:31, 1.63it/s] {'loss': 0.1387, 'grad_norm': 0.5966026782989502, 'learning_rate': 1.2303647849100764e-07, 'epoch': 2.81}
94%|█████████▎| 10792/11526 [1:52:57<07:31, 1.63it/s] 94%|█████████▎| 10793/11526 [1:52:58<07:30, 1.63it/s] {'loss': 0.1877, 'grad_norm': 0.5687988996505737, 'learning_rate': 1.2270283465158016e-07, 'epoch': 2.81}
94%|█████████▎| 10793/11526 [1:52:58<07:30, 1.63it/s] 94%|█████████▎| 10794/11526 [1:52:58<07:30, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.6322779655456543, 'learning_rate': 1.2236963818559878e-07, 'epoch': 2.81}
94%|█████████▎| 10794/11526 [1:52:58<07:30, 1.63it/s] 94%|█████████▎| 10795/11526 [1:52:59<07:29, 1.63it/s] {'loss': 0.1393, 'grad_norm': 0.5206835865974426, 'learning_rate': 1.2203688912362842e-07, 'epoch': 2.81}
94%|█████████▎| 10795/11526 [1:52:59<07:29, 1.63it/s] 94%|█████████▎| 10796/11526 [1:53:00<07:29, 1.62it/s] {'loss': 0.1663, 'grad_norm': 0.691916823387146, 'learning_rate': 1.2170458749618806e-07, 'epoch': 2.81}
94%|█████████▎| 10796/11526 [1:53:00<07:29, 1.62it/s] 94%|█████████▎| 10797/11526 [1:53:00<07:28, 1.63it/s] {'loss': 0.131, 'grad_norm': 0.5419245958328247, 'learning_rate': 1.2137273333375943e-07, 'epoch': 2.81}
94%|█████████▎| 10797/11526 [1:53:00<07:28, 1.63it/s] 94%|█████████▎| 10798/11526 [1:53:01<07:27, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.6338812708854675, 'learning_rate': 1.2104132666678203e-07, 'epoch': 2.81}
94%|█████████▎| 10798/11526 [1:53:01<07:27, 1.63it/s] 94%|█████████▎| 10799/11526 [1:53:01<07:26, 1.63it/s] {'loss': 0.1428, 'grad_norm': 0.5676834583282471, 'learning_rate': 1.207103675256538e-07, 'epoch': 2.81}
94%|█████████▎| 10799/11526 [1:53:01<07:26, 1.63it/s] 94%|█████████▎| 10800/11526 [1:53:02<07:25, 1.63it/s] {'loss': 0.1597, 'grad_norm': 0.5340803265571594, 'learning_rate': 1.2037985594073377e-07, 'epoch': 2.81}
94%|█████████▎| 10800/11526 [1:53:02<07:25, 1.63it/s]
100%|██████████| 13/13 [00:01<00:00, 6.76it/s]
{'eval_loss': 0.5422919988632202, 'eval_runtime': 1.9571, 'eval_samples_per_second': 102.19, 'eval_steps_per_second': 6.642, 'epoch': 2.81}
94%|█████████▎| 10800/11526 [1:53:04<07:25, 1.63it/s]
 94%|█████████▎| 10801/11526 [1:53:05<14:32, 1.20s/it] {'loss': 0.134, 'grad_norm': 0.5631111860275269, 'learning_rate': 1.2004979194233657e-07, 'epoch': 2.81}
94%|█████████▎| 10801/11526 [1:53:05<14:32, 1.20s/it] 94%|█████████▎| 10802/11526 [1:53:05<12:23, 1.03s/it] {'loss': 0.1471, 'grad_norm': 0.5617059469223022, 'learning_rate': 1.19720175560738e-07, 'epoch': 2.81}
94%|█████████▎| 10802/11526 [1:53:05<12:23, 1.03s/it] 94%|█████████▎| 10803/11526 [1:53:06<10:52, 1.11it/s] {'loss': 0.1517, 'grad_norm': 0.6553589105606079, 'learning_rate': 1.1939100682617278e-07, 'epoch': 2.81}
94%|█████████▎| 10803/11526 [1:53:06<10:52, 1.11it/s] 94%|█████████▎| 10804/11526 [1:53:06<09:49, 1.23it/s] {'loss': 0.1795, 'grad_norm': 0.6683325171470642, 'learning_rate': 1.1906228576883394e-07, 'epoch': 2.81}
94%|█████████▎| 10804/11526 [1:53:07<09:49, 1.23it/s] 94%|█████████▎| 10805/11526 [1:53:07<09:04, 1.32it/s] {'loss': 0.1311, 'grad_norm': 0.5124597549438477, 'learning_rate': 1.187340124188735e-07, 'epoch': 2.81}
94%|█████████▎| 10805/11526 [1:53:07<09:04, 1.32it/s] 94%|█████████▍| 10806/11526 [1:53:08<08:32, 1.40it/s] {'loss': 0.1081, 'grad_norm': 0.4999142587184906, 'learning_rate': 1.1840618680640348e-07, 'epoch': 2.81}
94%|█████████▍| 10806/11526 [1:53:08<08:32, 1.40it/s] 94%|█████████▍| 10807/11526 [1:53:08<08:11, 1.46it/s] {'loss': 0.1613, 'grad_norm': 0.7417898774147034, 'learning_rate': 1.1807880896149149e-07, 'epoch': 2.81}
94%|█████████▍| 10807/11526 [1:53:08<08:11, 1.46it/s] 94%|█████████▍| 10808/11526 [1:53:09<07:55, 1.51it/s] {'loss': 0.1469, 'grad_norm': 0.540634036064148, 'learning_rate': 1.1775187891416961e-07, 'epoch': 2.81}
94%|█████████▍| 10808/11526 [1:53:09<07:55, 1.51it/s] 94%|█████████▍| 10809/11526 [1:53:09<07:44, 1.54it/s] {'loss': 0.1478, 'grad_norm': 0.6177617311477661, 'learning_rate': 1.1742539669442388e-07, 'epoch': 2.81}
94%|█████████▍| 10809/11526 [1:53:10<07:44, 1.54it/s] 94%|█████████▍| 10810/11526 [1:53:10<07:36, 1.57it/s] {'loss': 0.1671, 'grad_norm': 0.6138095855712891, 'learning_rate': 1.1709936233220142e-07, 'epoch': 2.81}
94%|█████████▍| 10810/11526 [1:53:10<07:36, 1.57it/s] 94%|█████████▍| 10811/11526 [1:53:11<07:30, 1.59it/s] {'loss': 0.1579, 'grad_norm': 0.6823468208312988, 'learning_rate': 1.1677377585740835e-07, 'epoch': 2.81}
94%|█████████▍| 10811/11526 [1:53:11<07:30, 1.59it/s] 94%|█████████▍| 10812/11526 [1:53:11<07:26, 1.60it/s] {'loss': 0.1322, 'grad_norm': 0.5979026556015015, 'learning_rate': 1.1644863729990797e-07, 'epoch': 2.81}
94%|█████████▍| 10812/11526 [1:53:11<07:26, 1.60it/s] 94%|█████████▍| 10813/11526 [1:53:12<07:23, 1.61it/s] {'loss': 0.1737, 'grad_norm': 0.6684292554855347, 'learning_rate': 1.1612394668952531e-07, 'epoch': 2.81}
94%|█████████▍| 10813/11526 [1:53:12<07:23, 1.61it/s] 94%|█████████▍| 10814/11526 [1:53:13<07:21, 1.61it/s] {'loss': 0.1283, 'grad_norm': 0.49793103337287903, 'learning_rate': 1.1579970405604268e-07, 'epoch': 2.81}
94%|█████████▍| 10814/11526 [1:53:13<07:21, 1.61it/s] 94%|█████████▍| 10815/11526 [1:53:13<07:19, 1.62it/s] {'loss': 0.1453, 'grad_norm': 0.5971542000770569, 'learning_rate': 1.1547590942920128e-07, 'epoch': 2.81}
94%|█████████▍| 10815/11526 [1:53:13<07:19, 1.62it/s] 94%|█████████▍| 10816/11526 [1:53:14<07:17, 1.62it/s] {'loss': 0.1433, 'grad_norm': 0.5691225528717041, 'learning_rate': 1.1515256283870068e-07, 'epoch': 2.82}
94%|█████████▍| 10816/11526 [1:53:14<07:17, 1.62it/s] 94%|█████████▍| 10817/11526 [1:53:14<07:16, 1.62it/s] {'loss': 0.1262, 'grad_norm': 0.5761244297027588, 'learning_rate': 1.1482966431420051e-07, 'epoch': 2.82}
94%|█████████▍| 10817/11526 [1:53:15<07:16, 1.62it/s] 94%|█████████▍| 10818/11526 [1:53:15<07:15, 1.63it/s] {'loss': 0.1632, 'grad_norm': 0.6386290192604065, 'learning_rate': 1.1450721388531983e-07, 'epoch': 2.82}
94%|█████████▍| 10818/11526 [1:53:15<07:15, 1.63it/s] 94%|█████████▍| 10819/11526 [1:53:16<07:14, 1.63it/s] {'loss': 0.1258, 'grad_norm': 0.5243549346923828, 'learning_rate': 1.1418521158163443e-07, 'epoch': 2.82}
94%|█████████▍| 10819/11526 [1:53:16<07:14, 1.63it/s] 94%|█████████▍| 10820/11526 [1:53:16<07:14, 1.63it/s] {'loss': 0.1228, 'grad_norm': 0.5399309396743774, 'learning_rate': 1.1386365743268069e-07, 'epoch': 2.82}
94%|█████████▍| 10820/11526 [1:53:16<07:14, 1.63it/s] 94%|█████████▍| 10821/11526 [1:53:17<07:13, 1.63it/s] {'loss': 0.1388, 'grad_norm': 0.5769402384757996, 'learning_rate': 1.1354255146795223e-07, 'epoch': 2.82}
94%|█████████▍| 10821/11526 [1:53:17<07:13, 1.63it/s] 94%|█████████▍| 10822/11526 [1:53:17<07:12, 1.63it/s] {'loss': 0.1491, 'grad_norm': 0.5378438830375671, 'learning_rate': 1.1322189371690495e-07, 'epoch': 2.82}
94%|█████████▍| 10822/11526 [1:53:18<07:12, 1.63it/s] 94%|█████████▍| 10823/11526 [1:53:18<07:11, 1.63it/s] {'loss': 0.1384, 'grad_norm': 0.5544551610946655, 'learning_rate': 1.1290168420894976e-07, 'epoch': 2.82}
94%|█████████▍| 10823/11526 [1:53:18<07:11, 1.63it/s] 94%|█████████▍| 10824/11526 [1:53:19<07:11, 1.63it/s] {'loss': 0.1723, 'grad_norm': 0.6642981767654419, 'learning_rate': 1.125819229734587e-07, 'epoch': 2.82}
94%|█████████▍| 10824/11526 [1:53:19<07:11, 1.63it/s] 94%|█████████▍| 10825/11526 [1:53:19<07:12, 1.62it/s] {'loss': 0.1286, 'grad_norm': 0.5915255546569824, 'learning_rate': 1.1226261003976169e-07, 'epoch': 2.82}
94%|█████████▍| 10825/11526 [1:53:19<07:12, 1.62it/s] 94%|█████████▍| 10826/11526 [1:53:20<07:11, 1.62it/s] {'loss': 0.1532, 'grad_norm': 0.6184898614883423, 'learning_rate': 1.1194374543714803e-07, 'epoch': 2.82}
94%|█████████▍| 10826/11526 [1:53:20<07:11, 1.62it/s] 94%|█████████▍| 10827/11526 [1:53:21<07:10, 1.62it/s] {'loss': 0.162, 'grad_norm': 0.6293431520462036, 'learning_rate': 1.1162532919486713e-07, 'epoch': 2.82}
94%|█████████▍| 10827/11526 [1:53:21<07:10, 1.62it/s] 94%|█████████▍| 10828/11526 [1:53:21<07:09, 1.62it/s] {'loss': 0.1635, 'grad_norm': 0.5602840781211853, 'learning_rate': 1.1130736134212339e-07, 'epoch': 2.82}
94%|█████████▍| 10828/11526 [1:53:21<07:09, 1.62it/s] 94%|█████████▍| 10829/11526 [1:53:22<07:09, 1.62it/s] {'loss': 0.1019, 'grad_norm': 0.4536702632904053, 'learning_rate': 1.1098984190808403e-07, 'epoch': 2.82}
94%|█████████▍| 10829/11526 [1:53:22<07:09, 1.62it/s] 94%|█████████▍| 10830/11526 [1:53:22<07:08, 1.62it/s] {'loss': 0.1288, 'grad_norm': 0.5010046362876892, 'learning_rate': 1.1067277092187412e-07, 'epoch': 2.82}
94%|█████████▍| 10830/11526 [1:53:23<07:08, 1.62it/s] 94%|█████████▍| 10831/11526 [1:53:23<07:08, 1.62it/s] {'loss': 0.152, 'grad_norm': 0.6394920349121094, 'learning_rate': 1.1035614841257702e-07, 'epoch': 2.82}
94%|█████████▍| 10831/11526 [1:53:23<07:08, 1.62it/s] 94%|█████████▍| 10832/11526 [1:53:24<07:07, 1.62it/s] {'loss': 0.1789, 'grad_norm': 0.7833324670791626, 'learning_rate': 1.1003997440923453e-07, 'epoch': 2.82}
94%|█████████▍| 10832/11526 [1:53:24<07:07, 1.62it/s] 94%|█████████▍| 10833/11526 [1:53:24<07:06, 1.62it/s] {'loss': 0.1779, 'grad_norm': 0.7543907165527344, 'learning_rate': 1.0972424894084899e-07, 'epoch': 2.82}
94%|█████████▍| 10833/11526 [1:53:24<07:06, 1.62it/s] 94%|█████████▍| 10834/11526 [1:53:25<07:06, 1.62it/s] {'loss': 0.1232, 'grad_norm': 0.5441473126411438, 'learning_rate': 1.0940897203637835e-07, 'epoch': 2.82}
94%|█████████▍| 10834/11526 [1:53:25<07:06, 1.62it/s] 94%|█████████▍| 10835/11526 [1:53:25<07:05, 1.62it/s] {'loss': 0.1552, 'grad_norm': 0.6305564641952515, 'learning_rate': 1.0909414372474392e-07, 'epoch': 2.82}
94%|█████████▍| 10835/11526 [1:53:26<07:05, 1.62it/s] 94%|█████████▍| 10836/11526 [1:53:26<07:04, 1.62it/s] {'loss': 0.155, 'grad_norm': 0.6153624057769775, 'learning_rate': 1.0877976403482316e-07, 'epoch': 2.82}
94%|█████████▍| 10836/11526 [1:53:26<07:04, 1.62it/s] 94%|█████████▍| 10837/11526 [1:53:27<07:03, 1.63it/s] {'loss': 0.1279, 'grad_norm': 0.6065656542778015, 'learning_rate': 1.0846583299545244e-07, 'epoch': 2.82}
94%|█████████▍| 10837/11526 [1:53:27<07:03, 1.63it/s] 94%|█████████▍| 10838/11526 [1:53:27<07:03, 1.63it/s] {'loss': 0.1605, 'grad_norm': 0.6389105319976807, 'learning_rate': 1.081523506354265e-07, 'epoch': 2.82}
94%|█████████▍| 10838/11526 [1:53:27<07:03, 1.63it/s] 94%|█████████▍| 10839/11526 [1:53:28<07:03, 1.62it/s] {'loss': 0.165, 'grad_norm': 0.5476582050323486, 'learning_rate': 1.078393169835007e-07, 'epoch': 2.82}
94%|█████████▍| 10839/11526 [1:53:28<07:03, 1.62it/s] 94%|█████████▍| 10840/11526 [1:53:29<07:02, 1.62it/s] {'loss': 0.1531, 'grad_norm': 0.6058326959609985, 'learning_rate': 1.075267320683876e-07, 'epoch': 2.82}
94%|█████████▍| 10840/11526 [1:53:29<07:02, 1.62it/s] 94%|█████████▍| 10841/11526 [1:53:29<07:01, 1.62it/s] {'loss': 0.1345, 'grad_norm': 0.5845629572868347, 'learning_rate': 1.0721459591876038e-07, 'epoch': 2.82}
94%|█████████▍| 10841/11526 [1:53:29<07:01, 1.62it/s] 94%|█████████▍| 10842/11526 [1:53:30<07:00, 1.63it/s] {'loss': 0.1389, 'grad_norm': 0.5487120151519775, 'learning_rate': 1.0690290856324892e-07, 'epoch': 2.82}
94%|█████████▍| 10842/11526 [1:53:30<07:00, 1.63it/s] 94%|█████████▍| 10843/11526 [1:53:30<06:59, 1.63it/s] {'loss': 0.1322, 'grad_norm': 0.508861243724823, 'learning_rate': 1.0659167003044313e-07, 'epoch': 2.82}
94%|█████████▍| 10843/11526 [1:53:31<06:59, 1.63it/s] 94%|█████████▍| 10844/11526 [1:53:31<06:59, 1.63it/s] {'loss': 0.1469, 'grad_norm': 0.5791577696800232, 'learning_rate': 1.0628088034889184e-07, 'epoch': 2.82}
94%|█████████▍| 10844/11526 [1:53:31<06:59, 1.63it/s] 94%|█████████▍| 10845/11526 [1:53:32<06:58, 1.63it/s] {'loss': 0.153, 'grad_norm': 0.6767048239707947, 'learning_rate': 1.0597053954710223e-07, 'epoch': 2.82}
94%|█████████▍| 10845/11526 [1:53:32<06:58, 1.63it/s] 94%|█████████▍| 10846/11526 [1:53:32<06:58, 1.63it/s] {'loss': 0.1303, 'grad_norm': 0.5159775614738464, 'learning_rate': 1.0566064765354045e-07, 'epoch': 2.82}
94%|█████████▍| 10846/11526 [1:53:32<06:58, 1.63it/s] 94%|█████████▍| 10847/11526 [1:53:33<06:57, 1.63it/s] {'loss': 0.1542, 'grad_norm': 0.5768151879310608, 'learning_rate': 1.0535120469663207e-07, 'epoch': 2.82}
94%|█████████▍| 10847/11526 [1:53:33<06:57, 1.63it/s] 94%|█████████▍| 10848/11526 [1:53:33<06:56, 1.63it/s] {'loss': 0.1447, 'grad_norm': 0.5556932687759399, 'learning_rate': 1.050422107047605e-07, 'epoch': 2.82}
94%|█████████▍| 10848/11526 [1:53:34<06:56, 1.63it/s] 94%|█████████▍| 10849/11526 [1:53:34<06:56, 1.63it/s] {'loss': 0.1363, 'grad_norm': 0.6312822699546814, 'learning_rate': 1.0473366570626864e-07, 'epoch': 2.82}
94%|█████████▍| 10849/11526 [1:53:34<06:56, 1.63it/s] 94%|█████████▍| 10850/11526 [1:53:35<06:55, 1.63it/s] {'loss': 0.1604, 'grad_norm': 0.5951486229896545, 'learning_rate': 1.0442556972945772e-07, 'epoch': 2.82}
94%|█████████▍| 10850/11526 [1:53:35<06:55, 1.63it/s] 94%|█████████▍| 10851/11526 [1:53:35<06:55, 1.63it/s] {'loss': 0.1448, 'grad_norm': 0.5586808323860168, 'learning_rate': 1.0411792280258793e-07, 'epoch': 2.82}
94%|█████████▍| 10851/11526 [1:53:35<06:55, 1.63it/s] 94%|█████████▍| 10852/11526 [1:53:36<06:54, 1.63it/s] {'loss': 0.1284, 'grad_norm': 0.5407603979110718, 'learning_rate': 1.038107249538789e-07, 'epoch': 2.82}
94%|█████████▍| 10852/11526 [1:53:36<06:54, 1.63it/s] 94%|█████████▍| 10853/11526 [1:53:37<06:53, 1.63it/s] {'loss': 0.142, 'grad_norm': 0.6255103945732117, 'learning_rate': 1.0350397621150864e-07, 'epoch': 2.82}
94%|█████████▍| 10853/11526 [1:53:37<06:53, 1.63it/s] 94%|█████████▍| 10854/11526 [1:53:37<06:53, 1.63it/s] {'loss': 0.138, 'grad_norm': 0.5796244144439697, 'learning_rate': 1.0319767660361301e-07, 'epoch': 2.83}
94%|█████████▍| 10854/11526 [1:53:37<06:53, 1.63it/s] 94%|█████████▍| 10855/11526 [1:53:38<06:52, 1.63it/s] {'loss': 0.1238, 'grad_norm': 0.4947834014892578, 'learning_rate': 1.0289182615828896e-07, 'epoch': 2.83}
94%|█████████▍| 10855/11526 [1:53:38<06:52, 1.63it/s] 94%|█████████▍| 10856/11526 [1:53:38<06:52, 1.63it/s] {'loss': 0.1291, 'grad_norm': 0.5273683071136475, 'learning_rate': 1.0258642490358905e-07, 'epoch': 2.83}
94%|█████████▍| 10856/11526 [1:53:39<06:52, 1.63it/s] 94%|█████████▍| 10857/11526 [1:53:39<06:51, 1.63it/s] {'loss': 0.1675, 'grad_norm': 0.6862003803253174, 'learning_rate': 1.0228147286752809e-07, 'epoch': 2.83}
94%|█████████▍| 10857/11526 [1:53:39<06:51, 1.63it/s] 94%|█████████▍| 10858/11526 [1:53:40<06:50, 1.63it/s] {'loss': 0.1389, 'grad_norm': 0.581601619720459, 'learning_rate': 1.0197697007807706e-07, 'epoch': 2.83}
94%|█████████▍| 10858/11526 [1:53:40<06:50, 1.63it/s] 94%|█████████▍| 10859/11526 [1:53:40<06:49, 1.63it/s] {'loss': 0.1523, 'grad_norm': 0.6315754055976868, 'learning_rate': 1.0167291656316691e-07, 'epoch': 2.83}
94%|█████████▍| 10859/11526 [1:53:40<06:49, 1.63it/s] 94%|█████████▍| 10860/11526 [1:53:41<06:49, 1.63it/s] {'loss': 0.1413, 'grad_norm': 0.6282773613929749, 'learning_rate': 1.0136931235068758e-07, 'epoch': 2.83}
94%|█████████▍| 10860/11526 [1:53:41<06:49, 1.63it/s] 94%|█████████▍| 10861/11526 [1:53:41<06:48, 1.63it/s] {'loss': 0.1661, 'grad_norm': 0.7410260438919067, 'learning_rate': 1.0106615746848625e-07, 'epoch': 2.83}
94%|█████████▍| 10861/11526 [1:53:42<06:48, 1.63it/s] 94%|█████████▍| 10862/11526 [1:53:42<06:48, 1.63it/s] {'loss': 0.1148, 'grad_norm': 0.5622544288635254, 'learning_rate': 1.007634519443712e-07, 'epoch': 2.83}
94%|█████████▍| 10862/11526 [1:53:42<06:48, 1.63it/s] 94%|█████████▍| 10863/11526 [1:53:43<06:47, 1.63it/s] {'loss': 0.1471, 'grad_norm': 0.7026391625404358, 'learning_rate': 1.0046119580610803e-07, 'epoch': 2.83}
94%|█████████▍| 10863/11526 [1:53:43<06:47, 1.63it/s] 94%|█████████▍| 10864/11526 [1:53:43<06:46, 1.63it/s] {'loss': 0.1544, 'grad_norm': 0.5592085719108582, 'learning_rate': 1.0015938908142064e-07, 'epoch': 2.83}
94%|█████████▍| 10864/11526 [1:53:43<06:46, 1.63it/s] 94%|█████████▍| 10865/11526 [1:53:44<06:46, 1.63it/s] {'loss': 0.1383, 'grad_norm': 0.5623502135276794, 'learning_rate': 9.985803179799358e-08, 'epoch': 2.83}
94%|█████████▍| 10865/11526 [1:53:44<06:46, 1.63it/s] 94%|█████████▍| 10866/11526 [1:53:45<06:47, 1.62it/s] {'loss': 0.1341, 'grad_norm': 0.5833946466445923, 'learning_rate': 9.955712398346806e-08, 'epoch': 2.83}
94%|█████████▍| 10866/11526 [1:53:45<06:47, 1.62it/s] 94%|█████████▍| 10867/11526 [1:53:45<06:46, 1.62it/s] {'loss': 0.2073, 'grad_norm': 0.7450925707817078, 'learning_rate': 9.925666566544534e-08, 'epoch': 2.83}
94%|█████████▍| 10867/11526 [1:53:45<06:46, 1.62it/s] 94%|█████████▍| 10868/11526 [1:53:46<06:45, 1.62it/s] {'loss': 0.1477, 'grad_norm': 0.5429943203926086, 'learning_rate': 9.895665687148614e-08, 'epoch': 2.83}
94%|█████████▍| 10868/11526 [1:53:46<06:45, 1.62it/s] 94%|█████████▍| 10869/11526 [1:53:46<06:45, 1.62it/s] {'loss': 0.1252, 'grad_norm': 0.5521016120910645, 'learning_rate': 9.865709762910792e-08, 'epoch': 2.83}
94%|█████████▍| 10869/11526 [1:53:47<06:45, 1.62it/s] 94%|█████████▍| 10870/11526 [1:53:47<06:43, 1.62it/s] {'loss': 0.1165, 'grad_norm': 0.48632171750068665, 'learning_rate': 9.835798796578755e-08, 'epoch': 2.83}
94%|█████████▍| 10870/11526 [1:53:47<06:43, 1.62it/s] 94%|█████████▍| 10871/11526 [1:53:48<06:43, 1.62it/s] {'loss': 0.1412, 'grad_norm': 0.5386618971824646, 'learning_rate': 9.8059327908962e-08, 'epoch': 2.83}
94%|█████████▍| 10871/11526 [1:53:48<06:43, 1.62it/s] 94%|█████████▍| 10872/11526 [1:53:48<06:42, 1.62it/s] {'loss': 0.1203, 'grad_norm': 0.5081507563591003, 'learning_rate': 9.776111748602601e-08, 'epoch': 2.83}
94%|█████████▍| 10872/11526 [1:53:48<06:42, 1.62it/s] 94%|█████████▍| 10873/11526 [1:53:49<06:41, 1.63it/s] {'loss': 0.154, 'grad_norm': 0.5909155011177063, 'learning_rate': 9.746335672433272e-08, 'epoch': 2.83}
94%|█████████▍| 10873/11526 [1:53:49<06:41, 1.63it/s] 94%|█████████▍| 10874/11526 [1:53:49<06:41, 1.62it/s] {'loss': 0.1182, 'grad_norm': 0.5181498527526855, 'learning_rate': 9.71660456511947e-08, 'epoch': 2.83}
94%|█████████▍| 10874/11526 [1:53:50<06:41, 1.62it/s] 94%|█████████▍| 10875/11526 [1:53:50<06:40, 1.62it/s] {'loss': 0.1689, 'grad_norm': 0.5919679999351501, 'learning_rate': 9.686918429388292e-08, 'epoch': 2.83}
94%|█████████▍| 10875/11526 [1:53:50<06:40, 1.62it/s] 94%|█████████▍| 10876/11526 [1:53:51<06:40, 1.62it/s] {'loss': 0.1382, 'grad_norm': 0.6032755970954895, 'learning_rate': 9.657277267962784e-08, 'epoch': 2.83}
94%|█████████▍| 10876/11526 [1:53:51<06:40, 1.62it/s] 94%|█████████▍| 10877/11526 [1:53:51<06:39, 1.62it/s] {'loss': 0.14, 'grad_norm': 0.5701914429664612, 'learning_rate': 9.627681083561658e-08, 'epoch': 2.83}
94%|█████████▍| 10877/11526 [1:53:51<06:39, 1.62it/s] 94%|█████████▍| 10878/11526 [1:53:52<06:38, 1.62it/s] {'loss': 0.1634, 'grad_norm': 0.549430787563324, 'learning_rate': 9.598129878899687e-08, 'epoch': 2.83}
94%|█████████▍| 10878/11526 [1:53:52<06:38, 1.62it/s] 94%|█████████▍| 10879/11526 [1:53:53<06:40, 1.62it/s] {'loss': 0.1238, 'grad_norm': 0.5205738544464111, 'learning_rate': 9.568623656687648e-08, 'epoch': 2.83}
94%|█████████▍| 10879/11526 [1:53:53<06:40, 1.62it/s] 94%|█████████▍| 10880/11526 [1:53:53<06:38, 1.62it/s] {'loss': 0.1416, 'grad_norm': 0.5559220910072327, 'learning_rate': 9.539162419631764e-08, 'epoch': 2.83}
94%|█████████▍| 10880/11526 [1:53:53<06:38, 1.62it/s] 94%|█████████▍| 10881/11526 [1:53:54<06:37, 1.62it/s] {'loss': 0.1822, 'grad_norm': 0.6751832365989685, 'learning_rate': 9.509746170434597e-08, 'epoch': 2.83}
94%|█████████▍| 10881/11526 [1:53:54<06:37, 1.62it/s] 94%|█████████▍| 10882/11526 [1:53:54<06:36, 1.62it/s] {'loss': 0.1409, 'grad_norm': 0.5102043747901917, 'learning_rate': 9.480374911794265e-08, 'epoch': 2.83}
94%|█████████▍| 10882/11526 [1:53:55<06:36, 1.62it/s] 94%|█████████▍| 10883/11526 [1:53:55<06:35, 1.62it/s] {'loss': 0.1679, 'grad_norm': 0.6481784582138062, 'learning_rate': 9.451048646404837e-08, 'epoch': 2.83}
94%|█████████▍| 10883/11526 [1:53:55<06:35, 1.62it/s] 94%|█████████▍| 10884/11526 [1:53:56<06:35, 1.62it/s] {'loss': 0.1947, 'grad_norm': 0.7208816409111023, 'learning_rate': 9.421767376956381e-08, 'epoch': 2.83}
94%|█████████▍| 10884/11526 [1:53:56<06:35, 1.62it/s] 94%|█████████▍| 10885/11526 [1:53:56<06:34, 1.63it/s] {'loss': 0.18, 'grad_norm': 0.6918247938156128, 'learning_rate': 9.392531106134695e-08, 'epoch': 2.83}
94%|█████████▍| 10885/11526 [1:53:56<06:34, 1.63it/s] 94%|█████████▍| 10886/11526 [1:53:57<06:33, 1.62it/s] {'loss': 0.1942, 'grad_norm': 0.7411441802978516, 'learning_rate': 9.363339836621466e-08, 'epoch': 2.83}
94%|█████████▍| 10886/11526 [1:53:57<06:33, 1.62it/s] 94%|█████████▍| 10887/11526 [1:53:57<06:33, 1.63it/s] {'loss': 0.1552, 'grad_norm': 0.6113187074661255, 'learning_rate': 9.334193571094329e-08, 'epoch': 2.83}
94%|█████████▍| 10887/11526 [1:53:58<06:33, 1.63it/s] 94%|█████████▍| 10888/11526 [1:53:58<06:32, 1.63it/s] {'loss': 0.1324, 'grad_norm': 0.5684190392494202, 'learning_rate': 9.30509231222676e-08, 'epoch': 2.83}
94%|█████████▍| 10888/11526 [1:53:58<06:32, 1.63it/s] 94%|█████████▍| 10889/11526 [1:53:59<06:31, 1.63it/s] {'loss': 0.1113, 'grad_norm': 0.47534140944480896, 'learning_rate': 9.27603606268801e-08, 'epoch': 2.83}
94%|█████████▍| 10889/11526 [1:53:59<06:31, 1.63it/s] 94%|█████████▍| 10890/11526 [1:53:59<06:31, 1.63it/s] {'loss': 0.1588, 'grad_norm': 0.6028677821159363, 'learning_rate': 9.247024825143447e-08, 'epoch': 2.83}
94%|█████████▍| 10890/11526 [1:53:59<06:31, 1.63it/s] 94%|█████████▍| 10891/11526 [1:54:00<06:30, 1.62it/s] {'loss': 0.1596, 'grad_norm': 0.5910957455635071, 'learning_rate': 9.218058602253943e-08, 'epoch': 2.83}
94%|█████████▍| 10891/11526 [1:54:00<06:30, 1.62it/s] 94%|█████████▍| 10892/11526 [1:54:01<06:29, 1.63it/s] {'loss': 0.1608, 'grad_norm': 0.6714658737182617, 'learning_rate': 9.18913739667654e-08, 'epoch': 2.83}
94%|█████████▍| 10892/11526 [1:54:01<06:29, 1.63it/s] 95%|█████████▍| 10893/11526 [1:54:01<06:29, 1.63it/s] {'loss': 0.1543, 'grad_norm': 0.5719507932662964, 'learning_rate': 9.16026121106406e-08, 'epoch': 2.84}
95%|█████████▍| 10893/11526 [1:54:01<06:29, 1.63it/s] 95%|█████████▍| 10894/11526 [1:54:02<06:28, 1.63it/s] {'loss': 0.1524, 'grad_norm': 0.5460296273231506, 'learning_rate': 9.13143004806516e-08, 'epoch': 2.84}
95%|█████████▍| 10894/11526 [1:54:02<06:28, 1.63it/s] 95%|█████████▍| 10895/11526 [1:54:02<06:27, 1.63it/s] {'loss': 0.1611, 'grad_norm': 0.6730301976203918, 'learning_rate': 9.102643910324449e-08, 'epoch': 2.84}
95%|█████████▍| 10895/11526 [1:54:03<06:27, 1.63it/s] 95%|█████████▍| 10896/11526 [1:54:03<06:27, 1.63it/s] {'loss': 0.1317, 'grad_norm': 0.5379338264465332, 'learning_rate': 9.07390280048237e-08, 'epoch': 2.84}
95%|█████████▍| 10896/11526 [1:54:03<06:27, 1.63it/s] 95%|█████████▍| 10897/11526 [1:54:04<06:26, 1.63it/s] {'loss': 0.148, 'grad_norm': 0.5601884126663208, 'learning_rate': 9.045206721175149e-08, 'epoch': 2.84}
95%|█████████▍| 10897/11526 [1:54:04<06:26, 1.63it/s] 95%|█████████▍| 10898/11526 [1:54:04<06:25, 1.63it/s] {'loss': 0.1749, 'grad_norm': 0.6267274022102356, 'learning_rate': 9.01655567503501e-08, 'epoch': 2.84}
95%|█████████▍| 10898/11526 [1:54:04<06:25, 1.63it/s] 95%|█████████▍| 10899/11526 [1:54:05<06:25, 1.63it/s] {'loss': 0.1304, 'grad_norm': 0.5409154891967773, 'learning_rate': 8.98794966468991e-08, 'epoch': 2.84}
95%|█████████▍| 10899/11526 [1:54:05<06:25, 1.63it/s] 95%|█████████▍| 10900/11526 [1:54:05<06:24, 1.63it/s] {'loss': 0.1652, 'grad_norm': 0.6449373364448547, 'learning_rate': 8.959388692763804e-08, 'epoch': 2.84}
95%|█████████▍| 10900/11526 [1:54:06<06:24, 1.63it/s] 95%|█████████▍| 10901/11526 [1:54:06<06:24, 1.63it/s] {'loss': 0.1128, 'grad_norm': 0.5216627717018127, 'learning_rate': 8.930872761876541e-08, 'epoch': 2.84}
95%|█████████▍| 10901/11526 [1:54:06<06:24, 1.63it/s] 95%|█████████▍| 10902/11526 [1:54:07<06:23, 1.63it/s] {'loss': 0.1365, 'grad_norm': 0.5130006670951843, 'learning_rate': 8.90240187464364e-08, 'epoch': 2.84}
95%|█████████▍| 10902/11526 [1:54:07<06:23, 1.63it/s] 95%|█████████▍| 10903/11526 [1:54:07<06:23, 1.63it/s] {'loss': 0.1566, 'grad_norm': 0.6113899350166321, 'learning_rate': 8.873976033676679e-08, 'epoch': 2.84}
95%|█████████▍| 10903/11526 [1:54:07<06:23, 1.63it/s] 95%|█████████▍| 10904/11526 [1:54:08<06:22, 1.62it/s] {'loss': 0.1649, 'grad_norm': 0.6675362586975098, 'learning_rate': 8.845595241583071e-08, 'epoch': 2.84}
95%|█████████▍| 10904/11526 [1:54:08<06:22, 1.62it/s] 95%|█████████▍| 10905/11526 [1:54:09<06:22, 1.63it/s] {'loss': 0.1382, 'grad_norm': 0.5032355189323425, 'learning_rate': 8.817259500965958e-08, 'epoch': 2.84}
95%|█████████▍| 10905/11526 [1:54:09<06:22, 1.63it/s] 95%|█████████▍| 10906/11526 [1:54:09<06:21, 1.63it/s] {'loss': 0.1155, 'grad_norm': 0.4904010593891144, 'learning_rate': 8.788968814424536e-08, 'epoch': 2.84}
95%|█████████▍| 10906/11526 [1:54:09<06:21, 1.63it/s] 95%|█████████▍| 10907/11526 [1:54:10<06:20, 1.63it/s] {'loss': 0.1426, 'grad_norm': 0.59110027551651, 'learning_rate': 8.760723184553787e-08, 'epoch': 2.84}
95%|█████████▍| 10907/11526 [1:54:10<06:20, 1.63it/s] 95%|█████████▍| 10908/11526 [1:54:10<06:19, 1.63it/s] {'loss': 0.1205, 'grad_norm': 0.45960932970046997, 'learning_rate': 8.732522613944527e-08, 'epoch': 2.84}
95%|█████████▍| 10908/11526 [1:54:11<06:19, 1.63it/s] 95%|█████████▍| 10909/11526 [1:54:11<06:19, 1.63it/s] {'loss': 0.132, 'grad_norm': 0.6111765503883362, 'learning_rate': 8.704367105183575e-08, 'epoch': 2.84}
95%|█████████▍| 10909/11526 [1:54:11<06:19, 1.63it/s] 95%|█████████▍| 10910/11526 [1:54:12<06:18, 1.63it/s] {'loss': 0.1451, 'grad_norm': 0.5323822498321533, 'learning_rate': 8.676256660853311e-08, 'epoch': 2.84}
95%|█████████▍| 10910/11526 [1:54:12<06:18, 1.63it/s] 95%|█████████▍| 10911/11526 [1:54:12<06:18, 1.63it/s] {'loss': 0.1393, 'grad_norm': 0.6572916507720947, 'learning_rate': 8.648191283532337e-08, 'epoch': 2.84}
95%|█████████▍| 10911/11526 [1:54:12<06:18, 1.63it/s] 95%|█████████▍| 10912/11526 [1:54:13<06:17, 1.63it/s] {'loss': 0.1756, 'grad_norm': 0.6805550456047058, 'learning_rate': 8.620170975794928e-08, 'epoch': 2.84}
95%|█████████▍| 10912/11526 [1:54:13<06:17, 1.63it/s] 95%|█████████▍| 10913/11526 [1:54:13<06:16, 1.63it/s] {'loss': 0.1576, 'grad_norm': 0.6887146234512329, 'learning_rate': 8.592195740211306e-08, 'epoch': 2.84}
95%|█████████▍| 10913/11526 [1:54:14<06:16, 1.63it/s] 95%|█████████▍| 10914/11526 [1:54:14<06:16, 1.63it/s] {'loss': 0.1059, 'grad_norm': 0.48743942379951477, 'learning_rate': 8.564265579347475e-08, 'epoch': 2.84}
95%|█████████▍| 10914/11526 [1:54:14<06:16, 1.63it/s] 95%|█████████▍| 10915/11526 [1:54:15<06:15, 1.63it/s] {'loss': 0.1269, 'grad_norm': 0.5509501695632935, 'learning_rate': 8.536380495765328e-08, 'epoch': 2.84}
95%|█████████▍| 10915/11526 [1:54:15<06:15, 1.63it/s] 95%|█████████▍| 10916/11526 [1:54:15<06:15, 1.62it/s] {'loss': 0.1543, 'grad_norm': 0.6594672799110413, 'learning_rate': 8.50854049202271e-08, 'epoch': 2.84}
95%|█████████▍| 10916/11526 [1:54:15<06:15, 1.62it/s] 95%|█████████▍| 10917/11526 [1:54:16<06:14, 1.62it/s] {'loss': 0.1433, 'grad_norm': 0.5593491792678833, 'learning_rate': 8.480745570673243e-08, 'epoch': 2.84}
95%|█████████▍| 10917/11526 [1:54:16<06:14, 1.62it/s] 95%|█████████▍| 10918/11526 [1:54:17<06:14, 1.63it/s] {'loss': 0.1497, 'grad_norm': 0.6169701814651489, 'learning_rate': 8.452995734266389e-08, 'epoch': 2.84}
95%|█████████▍| 10918/11526 [1:54:17<06:14, 1.63it/s] 95%|█████████▍| 10919/11526 [1:54:17<06:13, 1.63it/s] {'loss': 0.1549, 'grad_norm': 0.6742778420448303, 'learning_rate': 8.425290985347501e-08, 'epoch': 2.84}
95%|█████████▍| 10919/11526 [1:54:17<06:13, 1.63it/s] 95%|█████████▍| 10920/11526 [1:54:18<06:12, 1.63it/s] {'loss': 0.1371, 'grad_norm': 0.5691714882850647, 'learning_rate': 8.39763132645799e-08, 'epoch': 2.84}
95%|█████████▍| 10920/11526 [1:54:18<06:12, 1.63it/s] 95%|█████████▍| 10921/11526 [1:54:18<06:12, 1.62it/s] {'loss': 0.146, 'grad_norm': 0.6012393236160278, 'learning_rate': 8.370016760134769e-08, 'epoch': 2.84}
95%|█████████▍| 10921/11526 [1:54:19<06:12, 1.62it/s] 95%|█████████▍| 10922/11526 [1:54:19<06:11, 1.63it/s] {'loss': 0.1449, 'grad_norm': 0.5850521922111511, 'learning_rate': 8.342447288910871e-08, 'epoch': 2.84}
95%|█████████▍| 10922/11526 [1:54:19<06:11, 1.63it/s] 95%|█████████▍| 10923/11526 [1:54:20<06:10, 1.63it/s] {'loss': 0.1437, 'grad_norm': 0.6107122302055359, 'learning_rate': 8.314922915315104e-08, 'epoch': 2.84}
95%|█████████▍| 10923/11526 [1:54:20<06:10, 1.63it/s] 95%|█████████▍| 10924/11526 [1:54:20<06:11, 1.62it/s] {'loss': 0.1244, 'grad_norm': 0.48838675022125244, 'learning_rate': 8.287443641872172e-08, 'epoch': 2.84}
95%|█████████▍| 10924/11526 [1:54:20<06:11, 1.62it/s] 95%|█████████▍| 10925/11526 [1:54:21<06:10, 1.62it/s] {'loss': 0.1231, 'grad_norm': 0.5203971266746521, 'learning_rate': 8.260009471102726e-08, 'epoch': 2.84}
95%|█████████▍| 10925/11526 [1:54:21<06:10, 1.62it/s] 95%|█████████▍| 10926/11526 [1:54:21<06:10, 1.62it/s] {'loss': 0.1238, 'grad_norm': 0.5398504137992859, 'learning_rate': 8.232620405523028e-08, 'epoch': 2.84}
95%|█████████▍| 10926/11526 [1:54:22<06:10, 1.62it/s] 95%|█████████▍| 10927/11526 [1:54:22<06:08, 1.62it/s] {'loss': 0.1626, 'grad_norm': 0.6197635531425476, 'learning_rate': 8.205276447645405e-08, 'epoch': 2.84}
95%|█████████▍| 10927/11526 [1:54:22<06:08, 1.62it/s] 95%|█████████▍| 10928/11526 [1:54:23<06:07, 1.63it/s] {'loss': 0.2398, 'grad_norm': 0.8636438846588135, 'learning_rate': 8.177977599978071e-08, 'epoch': 2.84}
95%|█████████▍| 10928/11526 [1:54:23<06:07, 1.63it/s] 95%|█████████▍| 10929/11526 [1:54:23<06:07, 1.62it/s] {'loss': 0.2017, 'grad_norm': 0.7191216945648193, 'learning_rate': 8.150723865024968e-08, 'epoch': 2.84}
95%|█████████▍| 10929/11526 [1:54:23<06:07, 1.62it/s] 95%|█████████▍| 10930/11526 [1:54:24<06:06, 1.63it/s] {'loss': 0.1345, 'grad_norm': 0.5702309012413025, 'learning_rate': 8.123515245285985e-08, 'epoch': 2.84}
95%|█████████▍| 10930/11526 [1:54:24<06:06, 1.63it/s] 95%|█████████▍| 10931/11526 [1:54:25<06:06, 1.62it/s] {'loss': 0.1524, 'grad_norm': 0.6314740180969238, 'learning_rate': 8.096351743256847e-08, 'epoch': 2.85}
95%|█████████▍| 10931/11526 [1:54:25<06:06, 1.62it/s] 95%|█████████▍| 10932/11526 [1:54:25<06:05, 1.63it/s] {'loss': 0.1601, 'grad_norm': 0.6225153207778931, 'learning_rate': 8.069233361429119e-08, 'epoch': 2.85}
95%|█████████▍| 10932/11526 [1:54:25<06:05, 1.63it/s] 95%|█████████▍| 10933/11526 [1:54:26<06:04, 1.63it/s] {'loss': 0.1897, 'grad_norm': 0.6092753410339355, 'learning_rate': 8.042160102290308e-08, 'epoch': 2.85}
95%|█████████▍| 10933/11526 [1:54:26<06:04, 1.63it/s] 95%|█████████▍| 10934/11526 [1:54:26<06:03, 1.63it/s] {'loss': 0.1554, 'grad_norm': 0.6083731651306152, 'learning_rate': 8.015131968323708e-08, 'epoch': 2.85}
95%|█████████▍| 10934/11526 [1:54:27<06:03, 1.63it/s] 95%|█████████▍| 10935/11526 [1:54:27<06:03, 1.63it/s] {'loss': 0.1829, 'grad_norm': 0.6713926792144775, 'learning_rate': 7.988148962008446e-08, 'epoch': 2.85}
95%|█████████▍| 10935/11526 [1:54:27<06:03, 1.63it/s] 95%|█████████▍| 10936/11526 [1:54:28<06:02, 1.63it/s] {'loss': 0.1497, 'grad_norm': 0.5746599435806274, 'learning_rate': 7.961211085819598e-08, 'epoch': 2.85}
95%|█████████▍| 10936/11526 [1:54:28<06:02, 1.63it/s] 95%|█████████▍| 10937/11526 [1:54:28<06:02, 1.63it/s] {'loss': 0.1589, 'grad_norm': 0.6701363325119019, 'learning_rate': 7.93431834222802e-08, 'epoch': 2.85}
95%|█████████▍| 10937/11526 [1:54:28<06:02, 1.63it/s] 95%|█████████▍| 10938/11526 [1:54:29<06:01, 1.63it/s] {'loss': 0.1392, 'grad_norm': 0.5180304050445557, 'learning_rate': 7.907470733700572e-08, 'epoch': 2.85}
95%|█████████▍| 10938/11526 [1:54:29<06:01, 1.63it/s] 95%|█████████▍| 10939/11526 [1:54:29<06:01, 1.63it/s] {'loss': 0.1392, 'grad_norm': 0.5332214832305908, 'learning_rate': 7.880668262699787e-08, 'epoch': 2.85}
95%|█████████▍| 10939/11526 [1:54:30<06:01, 1.63it/s] 95%|█████████▍| 10940/11526 [1:54:30<06:00, 1.63it/s] {'loss': 0.1558, 'grad_norm': 0.5889539122581482, 'learning_rate': 7.85391093168414e-08, 'epoch': 2.85}
95%|█████████▍| 10940/11526 [1:54:30<06:00, 1.63it/s] 95%|█████████▍| 10941/11526 [1:54:31<05:59, 1.63it/s] {'loss': 0.1121, 'grad_norm': 0.46292856335639954, 'learning_rate': 7.827198743107944e-08, 'epoch': 2.85}
95%|█████████▍| 10941/11526 [1:54:31<05:59, 1.63it/s] 95%|█████████▍| 10942/11526 [1:54:31<05:59, 1.63it/s] {'loss': 0.1311, 'grad_norm': 0.5652855634689331, 'learning_rate': 7.800531699421465e-08, 'epoch': 2.85}
95%|█████████▍| 10942/11526 [1:54:31<05:59, 1.63it/s] 95%|█████████▍| 10943/11526 [1:54:32<05:58, 1.63it/s] {'loss': 0.1535, 'grad_norm': 0.6290999054908752, 'learning_rate': 7.77390980307069e-08, 'epoch': 2.85}
95%|█████████▍| 10943/11526 [1:54:32<05:58, 1.63it/s] 95%|█████████▍| 10944/11526 [1:54:33<05:58, 1.62it/s] {'loss': 0.1645, 'grad_norm': 0.6942859292030334, 'learning_rate': 7.747333056497609e-08, 'epoch': 2.85}
95%|█████████▍| 10944/11526 [1:54:33<05:58, 1.62it/s] 95%|█████████▍| 10945/11526 [1:54:33<05:57, 1.63it/s] {'loss': 0.1715, 'grad_norm': 0.727211058139801, 'learning_rate': 7.720801462139883e-08, 'epoch': 2.85}
95%|█████████▍| 10945/11526 [1:54:33<05:57, 1.63it/s] 95%|█████████▍| 10946/11526 [1:54:34<05:56, 1.63it/s] {'loss': 0.1402, 'grad_norm': 0.55376136302948, 'learning_rate': 7.694315022431231e-08, 'epoch': 2.85}
95%|█████████▍| 10946/11526 [1:54:34<05:56, 1.63it/s] 95%|█████████▍| 10947/11526 [1:54:34<05:56, 1.63it/s] {'loss': 0.1121, 'grad_norm': 0.4977491497993469, 'learning_rate': 7.667873739801102e-08, 'epoch': 2.85}
95%|█████████▍| 10947/11526 [1:54:34<05:56, 1.63it/s] 95%|█████████▍| 10948/11526 [1:54:35<05:55, 1.63it/s] {'loss': 0.1571, 'grad_norm': 0.6504272222518921, 'learning_rate': 7.641477616674886e-08, 'epoch': 2.85}
95%|█████████▍| 10948/11526 [1:54:35<05:55, 1.63it/s] 95%|█████████▍| 10949/11526 [1:54:36<05:55, 1.62it/s] {'loss': 0.1536, 'grad_norm': 0.5357666015625, 'learning_rate': 7.615126655473703e-08, 'epoch': 2.85}
95%|█████████▍| 10949/11526 [1:54:36<05:55, 1.62it/s] 95%|█████████▌| 10950/11526 [1:54:36<05:54, 1.62it/s] {'loss': 0.1501, 'grad_norm': 0.5689911842346191, 'learning_rate': 7.58882085861462e-08, 'epoch': 2.85}
95%|█████████▌| 10950/11526 [1:54:36<05:54, 1.62it/s] 95%|█████████▌| 10951/11526 [1:54:37<05:53, 1.63it/s] {'loss': 0.1364, 'grad_norm': 0.6556687951087952, 'learning_rate': 7.562560228510652e-08, 'epoch': 2.85}
95%|█████████▌| 10951/11526 [1:54:37<05:53, 1.63it/s] 95%|█████████▌| 10952/11526 [1:54:37<05:53, 1.63it/s] {'loss': 0.1655, 'grad_norm': 0.6165751218795776, 'learning_rate': 7.536344767570536e-08, 'epoch': 2.85}
95%|█████████▌| 10952/11526 [1:54:38<05:53, 1.63it/s] 95%|█████████▌| 10953/11526 [1:54:38<05:52, 1.63it/s] {'loss': 0.1609, 'grad_norm': 0.6478879451751709, 'learning_rate': 7.510174478198851e-08, 'epoch': 2.85}
95%|█████████▌| 10953/11526 [1:54:38<05:52, 1.63it/s] 95%|█████████▌| 10954/11526 [1:54:39<05:52, 1.62it/s] {'loss': 0.1245, 'grad_norm': 0.5156077742576599, 'learning_rate': 7.484049362796064e-08, 'epoch': 2.85}
95%|█████████▌| 10954/11526 [1:54:39<05:52, 1.62it/s] 95%|█████████▌| 10955/11526 [1:54:39<05:51, 1.63it/s] {'loss': 0.1349, 'grad_norm': 0.5121318697929382, 'learning_rate': 7.457969423758648e-08, 'epoch': 2.85}
95%|█████████▌| 10955/11526 [1:54:39<05:51, 1.63it/s] 95%|█████████▌| 10956/11526 [1:54:40<05:50, 1.63it/s] {'loss': 0.1726, 'grad_norm': 0.946419894695282, 'learning_rate': 7.43193466347869e-08, 'epoch': 2.85}
95%|█████████▌| 10956/11526 [1:54:40<05:50, 1.63it/s] 95%|█████████▌| 10957/11526 [1:54:41<05:49, 1.63it/s] {'loss': 0.1618, 'grad_norm': 0.5798136591911316, 'learning_rate': 7.405945084344279e-08, 'epoch': 2.85}
95%|█████████▌| 10957/11526 [1:54:41<05:49, 1.63it/s] 95%|█████████▌| 10958/11526 [1:54:41<05:48, 1.63it/s] {'loss': 0.1619, 'grad_norm': 0.6035678386688232, 'learning_rate': 7.380000688739341e-08, 'epoch': 2.85}
95%|█████████▌| 10958/11526 [1:54:41<05:48, 1.63it/s] 95%|█████████▌| 10959/11526 [1:54:42<05:49, 1.62it/s] {'loss': 0.1471, 'grad_norm': 0.5717296004295349, 'learning_rate': 7.354101479043585e-08, 'epoch': 2.85}
95%|█████████▌| 10959/11526 [1:54:42<05:49, 1.62it/s] 95%|█████████▌| 10960/11526 [1:54:42<05:48, 1.62it/s] {'loss': 0.1466, 'grad_norm': 0.721751868724823, 'learning_rate': 7.328247457632776e-08, 'epoch': 2.85}
95%|█████████▌| 10960/11526 [1:54:42<05:48, 1.62it/s] 95%|█████████▌| 10961/11526 [1:54:43<05:47, 1.63it/s] {'loss': 0.1621, 'grad_norm': 0.6264686584472656, 'learning_rate': 7.302438626878183e-08, 'epoch': 2.85}
95%|█████████▌| 10961/11526 [1:54:43<05:47, 1.63it/s] 95%|█████████▌| 10962/11526 [1:54:44<05:46, 1.63it/s] {'loss': 0.1168, 'grad_norm': 0.5009835362434387, 'learning_rate': 7.276674989147303e-08, 'epoch': 2.85}
95%|█████████▌| 10962/11526 [1:54:44<05:46, 1.63it/s] 95%|█████████▌| 10963/11526 [1:54:44<05:46, 1.63it/s] {'loss': 0.1819, 'grad_norm': 0.7467767000198364, 'learning_rate': 7.250956546803301e-08, 'epoch': 2.85}
95%|█████████▌| 10963/11526 [1:54:44<05:46, 1.63it/s] 95%|█████████▌| 10964/11526 [1:54:45<05:45, 1.62it/s] {'loss': 0.1838, 'grad_norm': 0.7167349457740784, 'learning_rate': 7.225283302205177e-08, 'epoch': 2.85}
95%|█████████▌| 10964/11526 [1:54:45<05:45, 1.62it/s] 95%|█████████▌| 10965/11526 [1:54:45<05:45, 1.63it/s] {'loss': 0.1834, 'grad_norm': 0.8163217902183533, 'learning_rate': 7.199655257707828e-08, 'epoch': 2.85}
95%|█████████▌| 10965/11526 [1:54:46<05:45, 1.63it/s] 95%|█████████▌| 10966/11526 [1:54:46<05:44, 1.63it/s] {'loss': 0.2372, 'grad_norm': 0.7830700874328613, 'learning_rate': 7.174072415662036e-08, 'epoch': 2.85}
95%|█████████▌| 10966/11526 [1:54:46<05:44, 1.63it/s] 95%|█████████▌| 10967/11526 [1:54:47<05:43, 1.63it/s] {'loss': 0.106, 'grad_norm': 0.4387628138065338, 'learning_rate': 7.148534778414374e-08, 'epoch': 2.85}
95%|█████████▌| 10967/11526 [1:54:47<05:43, 1.63it/s] 95%|█████████▌| 10968/11526 [1:54:47<05:43, 1.63it/s] {'loss': 0.1153, 'grad_norm': 0.5642468333244324, 'learning_rate': 7.123042348307296e-08, 'epoch': 2.85}
95%|█████████▌| 10968/11526 [1:54:47<05:43, 1.63it/s] 95%|█████████▌| 10969/11526 [1:54:48<05:42, 1.62it/s] {'loss': 0.1532, 'grad_norm': 0.6306464672088623, 'learning_rate': 7.097595127679103e-08, 'epoch': 2.86}
95%|█████████▌| 10969/11526 [1:54:48<05:42, 1.62it/s] 95%|█████████▌| 10970/11526 [1:54:49<05:41, 1.63it/s] {'loss': 0.1289, 'grad_norm': 0.5434660911560059, 'learning_rate': 7.072193118864034e-08, 'epoch': 2.86}
95%|█████████▌| 10970/11526 [1:54:49<05:41, 1.63it/s] 95%|█████████▌| 10971/11526 [1:54:49<05:41, 1.62it/s] {'loss': 0.1344, 'grad_norm': 0.5599526166915894, 'learning_rate': 7.046836324192063e-08, 'epoch': 2.86}
95%|█████████▌| 10971/11526 [1:54:49<05:41, 1.62it/s] 95%|█████████▌| 10972/11526 [1:54:50<05:40, 1.63it/s] {'loss': 0.1563, 'grad_norm': 0.575204610824585, 'learning_rate': 7.021524745988994e-08, 'epoch': 2.86}
95%|█████████▌| 10972/11526 [1:54:50<05:40, 1.63it/s] 95%|█████████▌| 10973/11526 [1:54:50<05:39, 1.63it/s] {'loss': 0.1343, 'grad_norm': 0.53369140625, 'learning_rate': 6.996258386576638e-08, 'epoch': 2.86}
95%|█████████▌| 10973/11526 [1:54:50<05:39, 1.63it/s] 95%|█████████▌| 10974/11526 [1:54:51<05:39, 1.62it/s] {'loss': 0.1489, 'grad_norm': 0.5797464847564697, 'learning_rate': 6.971037248272583e-08, 'epoch': 2.86}
95%|█████████▌| 10974/11526 [1:54:51<05:39, 1.62it/s] 95%|█████████▌| 10975/11526 [1:54:52<05:39, 1.63it/s] {'loss': 0.1506, 'grad_norm': 0.5620077252388, 'learning_rate': 6.945861333390147e-08, 'epoch': 2.86}
95%|█████████▌| 10975/11526 [1:54:52<05:39, 1.63it/s] 95%|█████████▌| 10976/11526 [1:54:52<05:38, 1.62it/s] {'loss': 0.1685, 'grad_norm': 0.668645441532135, 'learning_rate': 6.920730644238705e-08, 'epoch': 2.86}
95%|█████████▌| 10976/11526 [1:54:52<05:38, 1.62it/s] 95%|█████████▌| 10977/11526 [1:54:53<05:37, 1.63it/s] {'loss': 0.1202, 'grad_norm': 0.5179408192634583, 'learning_rate': 6.895645183123357e-08, 'epoch': 2.86}
95%|█████████▌| 10977/11526 [1:54:53<05:37, 1.63it/s] 95%|█████████▌| 10978/11526 [1:54:53<05:36, 1.63it/s] {'loss': 0.1634, 'grad_norm': 0.6795694828033447, 'learning_rate': 6.870604952345039e-08, 'epoch': 2.86}
95%|█████████▌| 10978/11526 [1:54:54<05:36, 1.63it/s] 95%|█████████▌| 10979/11526 [1:54:54<05:36, 1.63it/s] {'loss': 0.1761, 'grad_norm': 0.749793291091919, 'learning_rate': 6.845609954200694e-08, 'epoch': 2.86}
95%|█████████▌| 10979/11526 [1:54:54<05:36, 1.63it/s] 95%|█████████▌| 10980/11526 [1:54:55<05:35, 1.63it/s] {'loss': 0.1416, 'grad_norm': 0.5526391267776489, 'learning_rate': 6.82066019098293e-08, 'epoch': 2.86}
95%|█████████▌| 10980/11526 [1:54:55<05:35, 1.63it/s] 95%|█████████▌| 10981/11526 [1:54:55<05:35, 1.63it/s] {'loss': 0.1393, 'grad_norm': 0.5212991237640381, 'learning_rate': 6.795755664980197e-08, 'epoch': 2.86}
95%|█████████▌| 10981/11526 [1:54:55<05:35, 1.63it/s] 95%|█████████▌| 10982/11526 [1:54:56<05:34, 1.63it/s] {'loss': 0.1061, 'grad_norm': 0.4774209260940552, 'learning_rate': 6.770896378477055e-08, 'epoch': 2.86}
95%|█████████▌| 10982/11526 [1:54:56<05:34, 1.63it/s] 95%|█████████▌| 10983/11526 [1:54:57<05:33, 1.63it/s] {'loss': 0.1552, 'grad_norm': 0.5763002038002014, 'learning_rate': 6.746082333753623e-08, 'epoch': 2.86}
95%|█████████▌| 10983/11526 [1:54:57<05:33, 1.63it/s] 95%|█████████▌| 10984/11526 [1:54:57<05:33, 1.62it/s] {'loss': 0.1539, 'grad_norm': 0.6410741806030273, 'learning_rate': 6.721313533086028e-08, 'epoch': 2.86}
95%|█████████▌| 10984/11526 [1:54:57<05:33, 1.62it/s] 95%|█████████▌| 10985/11526 [1:54:58<05:32, 1.63it/s] {'loss': 0.1343, 'grad_norm': 0.5213571190834045, 'learning_rate': 6.696589978746171e-08, 'epoch': 2.86}
95%|█████████▌| 10985/11526 [1:54:58<05:32, 1.63it/s] 95%|█████████▌| 10986/11526 [1:54:58<05:32, 1.63it/s] {'loss': 0.1739, 'grad_norm': 0.6440485715866089, 'learning_rate': 6.671911673001908e-08, 'epoch': 2.86}
95%|█████████▌| 10986/11526 [1:54:58<05:32, 1.63it/s] 95%|█████████▌| 10987/11526 [1:54:59<05:31, 1.63it/s] {'loss': 0.1163, 'grad_norm': 0.4988017976284027, 'learning_rate': 6.64727861811676e-08, 'epoch': 2.86}
95%|█████████▌| 10987/11526 [1:54:59<05:31, 1.63it/s] 95%|█████████▌| 10988/11526 [1:55:00<05:30, 1.63it/s] {'loss': 0.1203, 'grad_norm': 0.5834193229675293, 'learning_rate': 6.622690816350363e-08, 'epoch': 2.86}
95%|█████████▌| 10988/11526 [1:55:00<05:30, 1.63it/s] 95%|█████████▌| 10989/11526 [1:55:00<05:30, 1.62it/s] {'loss': 0.1341, 'grad_norm': 0.5318042039871216, 'learning_rate': 6.598148269957916e-08, 'epoch': 2.86}
95%|█████████▌| 10989/11526 [1:55:00<05:30, 1.62it/s] 95%|█████████▌| 10990/11526 [1:55:01<05:29, 1.63it/s] {'loss': 0.1424, 'grad_norm': 0.6040578484535217, 'learning_rate': 6.573650981190671e-08, 'epoch': 2.86}
95%|█████████▌| 10990/11526 [1:55:01<05:29, 1.63it/s] 95%|█████████▌| 10991/11526 [1:55:01<05:29, 1.62it/s] {'loss': 0.1652, 'grad_norm': 0.5992820858955383, 'learning_rate': 6.54919895229561e-08, 'epoch': 2.86}
95%|█████████▌| 10991/11526 [1:55:02<05:29, 1.62it/s] 95%|█████████▌| 10992/11526 [1:55:02<05:28, 1.63it/s] {'loss': 0.1668, 'grad_norm': 0.6113911867141724, 'learning_rate': 6.524792185515661e-08, 'epoch': 2.86}
95%|█████████▌| 10992/11526 [1:55:02<05:28, 1.63it/s] 95%|█████████▌| 10993/11526 [1:55:03<05:28, 1.62it/s] {'loss': 0.1757, 'grad_norm': 0.7253512144088745, 'learning_rate': 6.500430683089532e-08, 'epoch': 2.86}
95%|█████████▌| 10993/11526 [1:55:03<05:28, 1.62it/s] 95%|█████████▌| 10994/11526 [1:55:03<05:27, 1.62it/s] {'loss': 0.1516, 'grad_norm': 0.5734214782714844, 'learning_rate': 6.476114447251769e-08, 'epoch': 2.86}
95%|█████████▌| 10994/11526 [1:55:03<05:27, 1.62it/s] 95%|█████████▌| 10995/11526 [1:55:04<05:26, 1.62it/s] {'loss': 0.1387, 'grad_norm': 0.6191969513893127, 'learning_rate': 6.45184348023281e-08, 'epoch': 2.86}
95%|█████████▌| 10995/11526 [1:55:04<05:26, 1.62it/s] 95%|█████████▌| 10996/11526 [1:55:05<05:26, 1.62it/s] {'loss': 0.1531, 'grad_norm': 0.6863036751747131, 'learning_rate': 6.427617784259044e-08, 'epoch': 2.86}
95%|█████████▌| 10996/11526 [1:55:05<05:26, 1.62it/s] 95%|█████████▌| 10997/11526 [1:55:05<05:25, 1.62it/s] {'loss': 0.2002, 'grad_norm': 0.6960019469261169, 'learning_rate': 6.403437361552412e-08, 'epoch': 2.86}
95%|█████████▌| 10997/11526 [1:55:05<05:25, 1.62it/s] 95%|█████████▌| 10998/11526 [1:55:06<05:24, 1.63it/s] {'loss': 0.1099, 'grad_norm': 0.4443988502025604, 'learning_rate': 6.379302214330918e-08, 'epoch': 2.86}
95%|█████████▌| 10998/11526 [1:55:06<05:24, 1.63it/s] 95%|█████████▌| 10999/11526 [1:55:06<05:24, 1.62it/s] {'loss': 0.2354, 'grad_norm': 0.9386343955993652, 'learning_rate': 6.355212344808459e-08, 'epoch': 2.86}
95%|█████████▌| 10999/11526 [1:55:06<05:24, 1.62it/s] 95%|█████████▌| 11000/11526 [1:55:07<05:23, 1.62it/s] {'loss': 0.1665, 'grad_norm': 0.7195026874542236, 'learning_rate': 6.331167755194656e-08, 'epoch': 2.86}
95%|█████████▌| 11000/11526 [1:55:07<05:23, 1.62it/s]
100%|██████████| 13/13 [00:01<00:00, 6.74it/s]
{'eval_loss': 0.5423033833503723, 'eval_runtime': 1.9582, 'eval_samples_per_second': 102.136, 'eval_steps_per_second': 6.639, 'epoch': 2.86}
95%|█████████▌| 11000/11526 [1:55:09<05:23, 1.62it/s]
95%|█████████▌| 11001/11526 [1:55:10<10:32, 1.20s/it] {'loss': 0.1449, 'grad_norm': 0.5643436908721924, 'learning_rate': 6.307168447694967e-08, 'epoch': 2.86}
95%|█████████▌| 11001/11526 [1:55:10<10:32, 1.20s/it] 95%|█████████▌| 11002/11526 [1:55:10<08:58, 1.03s/it] {'loss': 0.1602, 'grad_norm': 0.6877837777137756, 'learning_rate': 6.283214424510853e-08, 'epoch': 2.86}
95%|█████████▌| 11002/11526 [1:55:10<08:58, 1.03s/it] 95%|█████████▌| 11003/11526 [1:55:11<07:52, 1.11it/s] {'loss': 0.1417, 'grad_norm': 0.5607078075408936, 'learning_rate': 6.259305687839334e-08, 'epoch': 2.86}
95%|█████████▌| 11003/11526 [1:55:11<07:52, 1.11it/s] 95%|█████████▌| 11004/11526 [1:55:11<07:07, 1.22it/s] {'loss': 0.1827, 'grad_norm': 0.705260694026947, 'learning_rate': 6.235442239873656e-08, 'epoch': 2.86}
95%|█████████▌| 11004/11526 [1:55:12<07:07, 1.22it/s] 95%|█████████▌| 11005/11526 [1:55:12<06:34, 1.32it/s] {'loss': 0.1501, 'grad_norm': 0.5919515490531921, 'learning_rate': 6.21162408280257e-08, 'epoch': 2.86}
95%|█████████▌| 11005/11526 [1:55:12<06:34, 1.32it/s] 95%|█████████▌| 11006/11526 [1:55:13<06:11, 1.40it/s] {'loss': 0.1381, 'grad_norm': 0.5460925102233887, 'learning_rate': 6.187851218810881e-08, 'epoch': 2.86}
95%|█████████▌| 11006/11526 [1:55:13<06:11, 1.40it/s] 95%|█████████▌| 11007/11526 [1:55:13<05:55, 1.46it/s] {'loss': 0.1558, 'grad_norm': 0.5399622917175293, 'learning_rate': 6.164123650079179e-08, 'epoch': 2.86}
95%|█████████▌| 11007/11526 [1:55:13<05:55, 1.46it/s] 96%|█████████▌| 11008/11526 [1:55:14<05:43, 1.51it/s] {'loss': 0.1343, 'grad_norm': 0.5697486400604248, 'learning_rate': 6.140441378783834e-08, 'epoch': 2.87}
96%|█████████▌| 11008/11526 [1:55:14<05:43, 1.51it/s] 96%|█████████▌| 11009/11526 [1:55:14<05:35, 1.54it/s] {'loss': 0.1549, 'grad_norm': 0.5770275592803955, 'learning_rate': 6.116804407097166e-08, 'epoch': 2.87}
96%|█████████▌| 11009/11526 [1:55:15<05:35, 1.54it/s] 96%|█████████▌| 11010/11526 [1:55:15<05:29, 1.57it/s] {'loss': 0.1336, 'grad_norm': 0.4902442693710327, 'learning_rate': 6.09321273718727e-08, 'epoch': 2.87}
96%|█████████▌| 11010/11526 [1:55:15<05:29, 1.57it/s] 96%|█████████▌| 11011/11526 [1:55:16<05:25, 1.58it/s] {'loss': 0.1536, 'grad_norm': 0.6186672449111938, 'learning_rate': 6.069666371218141e-08, 'epoch': 2.87}
96%|█████████▌| 11011/11526 [1:55:16<05:25, 1.58it/s] 96%|█████████▌| 11012/11526 [1:55:16<05:21, 1.60it/s] {'loss': 0.1462, 'grad_norm': 0.6529629230499268, 'learning_rate': 6.04616531134955e-08, 'epoch': 2.87}
96%|█████████▌| 11012/11526 [1:55:16<05:21, 1.60it/s] 96%|█████████▌| 11013/11526 [1:55:17<05:19, 1.61it/s] {'loss': 0.1318, 'grad_norm': 0.4949430525302887, 'learning_rate': 6.022709559737106e-08, 'epoch': 2.87}
96%|█████████▌| 11013/11526 [1:55:17<05:19, 1.61it/s] 96%|█████████▌| 11014/11526 [1:55:18<05:17, 1.61it/s] {'loss': 0.159, 'grad_norm': 0.606173038482666, 'learning_rate': 5.99929911853242e-08, 'epoch': 2.87}
96%|█████████▌| 11014/11526 [1:55:18<05:17, 1.61it/s] 96%|█████████▌| 11015/11526 [1:55:18<05:16, 1.62it/s] {'loss': 0.1553, 'grad_norm': 0.6844934225082397, 'learning_rate': 5.975933989882777e-08, 'epoch': 2.87}
96%|█████████▌| 11015/11526 [1:55:18<05:16, 1.62it/s] 96%|█████████▌| 11016/11526 [1:55:19<05:15, 1.62it/s] {'loss': 0.1309, 'grad_norm': 0.5549585223197937, 'learning_rate': 5.9526141759312927e-08, 'epoch': 2.87}
96%|█████████▌| 11016/11526 [1:55:19<05:15, 1.62it/s] 96%|█████████▌| 11017/11526 [1:55:19<05:13, 1.62it/s] {'loss': 0.1866, 'grad_norm': 0.6579276323318481, 'learning_rate': 5.9293396788170365e-08, 'epoch': 2.87}
96%|█████████▌| 11017/11526 [1:55:20<05:13, 1.62it/s] 96%|█████████▌| 11018/11526 [1:55:20<05:13, 1.62it/s] {'loss': 0.1661, 'grad_norm': 0.6605570912361145, 'learning_rate': 5.906110500674967e-08, 'epoch': 2.87}
96%|█████████▌| 11018/11526 [1:55:20<05:13, 1.62it/s] 96%|█████████▌| 11019/11526 [1:55:21<05:12, 1.62it/s] {'loss': 0.1588, 'grad_norm': 0.5803224444389343, 'learning_rate': 5.8829266436356556e-08, 'epoch': 2.87}
96%|█████████▌| 11019/11526 [1:55:21<05:12, 1.62it/s] 96%|█████████▌| 11020/11526 [1:55:21<05:11, 1.62it/s] {'loss': 0.1591, 'grad_norm': 0.6251804828643799, 'learning_rate': 5.8597881098257924e-08, 'epoch': 2.87}
96%|█████████▌| 11020/11526 [1:55:21<05:11, 1.62it/s] 96%|█████████▌| 11021/11526 [1:55:22<05:10, 1.62it/s] {'loss': 0.1602, 'grad_norm': 0.6804287433624268, 'learning_rate': 5.836694901367623e-08, 'epoch': 2.87}
96%|█████████▌| 11021/11526 [1:55:22<05:10, 1.62it/s] 96%|█████████▌| 11022/11526 [1:55:22<05:09, 1.63it/s] {'loss': 0.1203, 'grad_norm': 0.48972323536872864, 'learning_rate': 5.8136470203794535e-08, 'epoch': 2.87}
96%|█████████▌| 11022/11526 [1:55:23<05:09, 1.63it/s] 96%|█████████▌| 11023/11526 [1:55:23<05:09, 1.63it/s] {'loss': 0.1476, 'grad_norm': 0.6983583569526672, 'learning_rate': 5.7906444689754816e-08, 'epoch': 2.87}
96%|█████████▌| 11023/11526 [1:55:23<05:09, 1.63it/s] 96%|█████████▌| 11024/11526 [1:55:24<05:08, 1.62it/s] {'loss': 0.1523, 'grad_norm': 0.6611413955688477, 'learning_rate': 5.767687249265519e-08, 'epoch': 2.87}
96%|█████████▌| 11024/11526 [1:55:24<05:08, 1.62it/s] 96%|█████████▌| 11025/11526 [1:55:24<05:08, 1.63it/s] {'loss': 0.1333, 'grad_norm': 0.5384858250617981, 'learning_rate': 5.74477536335527e-08, 'epoch': 2.87}
96%|█████████▌| 11025/11526 [1:55:24<05:08, 1.63it/s] 96%|█████████▌| 11026/11526 [1:55:25<05:07, 1.62it/s] {'loss': 0.1388, 'grad_norm': 0.5713870525360107, 'learning_rate': 5.7219088133464996e-08, 'epoch': 2.87}
96%|█████████▌| 11026/11526 [1:55:25<05:07, 1.62it/s] 96%|█████████▌| 11027/11526 [1:55:26<05:06, 1.63it/s] {'loss': 0.1589, 'grad_norm': 0.6016893982887268, 'learning_rate': 5.6990876013365284e-08, 'epoch': 2.87}
96%|█████████▌| 11027/11526 [1:55:26<05:06, 1.63it/s] 96%|█████████▌| 11028/11526 [1:55:26<05:06, 1.63it/s] {'loss': 0.1482, 'grad_norm': 0.5380858182907104, 'learning_rate': 5.676311729418738e-08, 'epoch': 2.87}
96%|█████████▌| 11028/11526 [1:55:26<05:06, 1.63it/s] 96%|█████████▌| 11029/11526 [1:55:27<05:05, 1.63it/s] {'loss': 0.1319, 'grad_norm': 0.6975628733634949, 'learning_rate': 5.653581199682179e-08, 'epoch': 2.87}
96%|█████████▌| 11029/11526 [1:55:27<05:05, 1.63it/s] 96%|█████████▌| 11030/11526 [1:55:27<05:04, 1.63it/s] {'loss': 0.1475, 'grad_norm': 0.5982299447059631, 'learning_rate': 5.6308960142119064e-08, 'epoch': 2.87}
96%|█████████▌| 11030/11526 [1:55:28<05:04, 1.63it/s] 96%|█████████▌| 11031/11526 [1:55:28<05:04, 1.63it/s] {'loss': 0.1395, 'grad_norm': 0.580478310585022, 'learning_rate': 5.6082561750887e-08, 'epoch': 2.87}
96%|█████████▌| 11031/11526 [1:55:28<05:04, 1.63it/s] 96%|█████████▌| 11032/11526 [1:55:29<05:03, 1.63it/s] {'loss': 0.1631, 'grad_norm': 0.5807511806488037, 'learning_rate': 5.5856616843891764e-08, 'epoch': 2.87}
96%|█████████▌| 11032/11526 [1:55:29<05:03, 1.63it/s] 96%|█████████▌| 11033/11526 [1:55:29<05:03, 1.63it/s] {'loss': 0.15, 'grad_norm': 0.5558671951293945, 'learning_rate': 5.563112544185845e-08, 'epoch': 2.87}
96%|█████████▌| 11033/11526 [1:55:29<05:03, 1.63it/s] 96%|█████████▌| 11034/11526 [1:55:30<05:04, 1.62it/s] {'loss': 0.1357, 'grad_norm': 0.6708522439002991, 'learning_rate': 5.5406087565471054e-08, 'epoch': 2.87}
96%|█████████▌| 11034/11526 [1:55:30<05:04, 1.62it/s] 96%|█████████▌| 11035/11526 [1:55:30<05:03, 1.62it/s] {'loss': 0.1441, 'grad_norm': 0.573715329170227, 'learning_rate': 5.5181503235370856e-08, 'epoch': 2.87}
96%|█████████▌| 11035/11526 [1:55:31<05:03, 1.62it/s] 96%|█████████▌| 11036/11526 [1:55:31<05:02, 1.62it/s] {'loss': 0.1517, 'grad_norm': 0.567571759223938, 'learning_rate': 5.495737247215804e-08, 'epoch': 2.87}
96%|█████████▌| 11036/11526 [1:55:31<05:02, 1.62it/s] 96%|█████████▌| 11037/11526 [1:55:32<05:01, 1.62it/s] {'loss': 0.1516, 'grad_norm': 0.6211439967155457, 'learning_rate': 5.4733695296391166e-08, 'epoch': 2.87}
96%|█████████▌| 11037/11526 [1:55:32<05:01, 1.62it/s] 96%|█████████▌| 11038/11526 [1:55:32<05:00, 1.62it/s] {'loss': 0.1351, 'grad_norm': 0.5345041155815125, 'learning_rate': 5.4510471728587146e-08, 'epoch': 2.87}
96%|█████████▌| 11038/11526 [1:55:32<05:00, 1.62it/s] 96%|█████████▌| 11039/11526 [1:55:33<05:00, 1.62it/s] {'loss': 0.1284, 'grad_norm': 0.47626304626464844, 'learning_rate': 5.4287701789220715e-08, 'epoch': 2.87}
96%|█████████▌| 11039/11526 [1:55:33<05:00, 1.62it/s] 96%|█████████▌| 11040/11526 [1:55:34<04:59, 1.62it/s] {'loss': 0.1598, 'grad_norm': 0.659138560295105, 'learning_rate': 5.40653854987272e-08, 'epoch': 2.87}
96%|█████████▌| 11040/11526 [1:55:34<04:59, 1.62it/s] 96%|█████████▌| 11041/11526 [1:55:34<04:58, 1.62it/s] {'loss': 0.2102, 'grad_norm': 0.787924587726593, 'learning_rate': 5.3843522877496946e-08, 'epoch': 2.87}
96%|█████████▌| 11041/11526 [1:55:34<04:58, 1.62it/s] 96%|█████████▌| 11042/11526 [1:55:35<04:57, 1.62it/s] {'loss': 0.1634, 'grad_norm': 0.6297173500061035, 'learning_rate': 5.362211394588202e-08, 'epoch': 2.87}
96%|█████████▌| 11042/11526 [1:55:35<04:57, 1.62it/s] 96%|█████████▌| 11043/11526 [1:55:35<04:57, 1.62it/s] {'loss': 0.15, 'grad_norm': 0.5708057880401611, 'learning_rate': 5.340115872418949e-08, 'epoch': 2.87}
96%|█████████▌| 11043/11526 [1:55:36<04:57, 1.62it/s] 96%|█████████▌| 11044/11526 [1:55:36<04:56, 1.62it/s] {'loss': 0.1646, 'grad_norm': 0.5473392605781555, 'learning_rate': 5.3180657232688174e-08, 'epoch': 2.87}
96%|█████████▌| 11044/11526 [1:55:36<04:56, 1.62it/s] 96%|█████████▌| 11045/11526 [1:55:37<04:55, 1.63it/s] {'loss': 0.1347, 'grad_norm': 0.5184270143508911, 'learning_rate': 5.2960609491603534e-08, 'epoch': 2.87}
96%|█████████▌| 11045/11526 [1:55:37<04:55, 1.63it/s] 96%|█████████▌| 11046/11526 [1:55:37<04:56, 1.62it/s] {'loss': 0.1386, 'grad_norm': 0.6405014991760254, 'learning_rate': 5.274101552111943e-08, 'epoch': 2.88}
96%|█████████▌| 11046/11526 [1:55:37<04:56, 1.62it/s] 96%|█████████▌| 11047/11526 [1:55:38<04:55, 1.62it/s] {'loss': 0.2396, 'grad_norm': 0.7262946963310242, 'learning_rate': 5.252187534137754e-08, 'epoch': 2.88}
96%|█████████▌| 11047/11526 [1:55:38<04:55, 1.62it/s] 96%|█████████▌| 11048/11526 [1:55:38<04:54, 1.62it/s] {'loss': 0.1785, 'grad_norm': 0.7321373224258423, 'learning_rate': 5.230318897247955e-08, 'epoch': 2.88}
96%|█████████▌| 11048/11526 [1:55:39<04:54, 1.62it/s] 96%|█████████▌| 11049/11526 [1:55:39<04:53, 1.62it/s] {'loss': 0.2555, 'grad_norm': 0.7457536458969116, 'learning_rate': 5.2084956434483855e-08, 'epoch': 2.88}
96%|█████████▌| 11049/11526 [1:55:39<04:53, 1.62it/s] 96%|█████████▌| 11050/11526 [1:55:40<04:53, 1.62it/s] {'loss': 0.1293, 'grad_norm': 0.5173109769821167, 'learning_rate': 5.1867177747408906e-08, 'epoch': 2.88}
96%|█████████▌| 11050/11526 [1:55:40<04:53, 1.62it/s] 96%|█████████▌| 11051/11526 [1:55:40<04:53, 1.62it/s] {'loss': 0.2218, 'grad_norm': 0.5614221096038818, 'learning_rate': 5.164985293122982e-08, 'epoch': 2.88}
96%|█████████▌| 11051/11526 [1:55:40<04:53, 1.62it/s] 96%|█████████▌| 11052/11526 [1:55:41<04:52, 1.62it/s] {'loss': 0.1403, 'grad_norm': 0.6225721836090088, 'learning_rate': 5.1432982005881203e-08, 'epoch': 2.88}
96%|█████████▌| 11052/11526 [1:55:41<04:52, 1.62it/s] 96%|█████████▌| 11053/11526 [1:55:42<04:51, 1.62it/s] {'loss': 0.1752, 'grad_norm': 0.6316826939582825, 'learning_rate': 5.1216564991256044e-08, 'epoch': 2.88}
96%|█████████▌| 11053/11526 [1:55:42<04:51, 1.62it/s] 96%|█████████▌| 11054/11526 [1:55:42<04:50, 1.62it/s] {'loss': 0.1574, 'grad_norm': 0.616841733455658, 'learning_rate': 5.100060190720457e-08, 'epoch': 2.88}
96%|█████████▌| 11054/11526 [1:55:42<04:50, 1.62it/s] 96%|█████████▌| 11055/11526 [1:55:43<04:49, 1.63it/s] {'loss': 0.1752, 'grad_norm': 0.5077406167984009, 'learning_rate': 5.078509277353705e-08, 'epoch': 2.88}
96%|█████████▌| 11055/11526 [1:55:43<04:49, 1.63it/s] 96%|█████████▌| 11056/11526 [1:55:43<04:49, 1.62it/s] {'loss': 0.1379, 'grad_norm': 0.5508193373680115, 'learning_rate': 5.057003761002044e-08, 'epoch': 2.88}
96%|█████████▌| 11056/11526 [1:55:44<04:49, 1.62it/s] 96%|█████████▌| 11057/11526 [1:55:44<04:48, 1.62it/s] {'loss': 0.1578, 'grad_norm': 0.6425543427467346, 'learning_rate': 5.035543643638063e-08, 'epoch': 2.88}
96%|█████████▌| 11057/11526 [1:55:44<04:48, 1.62it/s] 96%|█████████▌| 11058/11526 [1:55:45<04:47, 1.63it/s] {'loss': 0.137, 'grad_norm': 0.5215144157409668, 'learning_rate': 5.0141289272302995e-08, 'epoch': 2.88}
96%|█████████▌| 11058/11526 [1:55:45<04:47, 1.63it/s] 96%|█████████▌| 11059/11526 [1:55:45<04:47, 1.62it/s] {'loss': 0.1497, 'grad_norm': 0.6539726257324219, 'learning_rate': 4.992759613742959e-08, 'epoch': 2.88}
96%|█████████▌| 11059/11526 [1:55:45<04:47, 1.62it/s] 96%|█████████▌| 11060/11526 [1:55:46<04:46, 1.63it/s] {'loss': 0.1111, 'grad_norm': 0.4263342618942261, 'learning_rate': 4.971435705136196e-08, 'epoch': 2.88}
96%|█████████▌| 11060/11526 [1:55:46<04:46, 1.63it/s] 96%|█████████▌| 11061/11526 [1:55:46<04:47, 1.62it/s] {'loss': 0.1443, 'grad_norm': 0.4981379806995392, 'learning_rate': 4.950157203365946e-08, 'epoch': 2.88}
96%|█████████▌| 11061/11526 [1:55:47<04:47, 1.62it/s] 96%|█████████▌| 11062/11526 [1:55:47<04:45, 1.62it/s] {'loss': 0.1393, 'grad_norm': 0.5701273083686829, 'learning_rate': 4.9289241103839813e-08, 'epoch': 2.88}
96%|█████████▌| 11062/11526 [1:55:47<04:45, 1.62it/s] 96%|█████████▌| 11063/11526 [1:55:48<04:45, 1.62it/s] {'loss': 0.1589, 'grad_norm': 0.622278094291687, 'learning_rate': 4.9077364281379104e-08, 'epoch': 2.88}
96%|█████████▌| 11063/11526 [1:55:48<04:45, 1.62it/s] 96%|█████████▌| 11064/11526 [1:55:48<04:44, 1.62it/s] {'loss': 0.1253, 'grad_norm': 0.4555061161518097, 'learning_rate': 4.8865941585712894e-08, 'epoch': 2.88}
96%|█████████▌| 11064/11526 [1:55:48<04:44, 1.62it/s] 96%|█████████▌| 11065/11526 [1:55:49<04:43, 1.63it/s] {'loss': 0.1915, 'grad_norm': 0.6225709915161133, 'learning_rate': 4.86549730362329e-08, 'epoch': 2.88}
96%|█████████▌| 11065/11526 [1:55:49<04:43, 1.63it/s] 96%|█████████▌| 11066/11526 [1:55:50<04:43, 1.62it/s] {'loss': 0.1512, 'grad_norm': 0.6442245841026306, 'learning_rate': 4.8444458652290306e-08, 'epoch': 2.88}
96%|█████████▌| 11066/11526 [1:55:50<04:43, 1.62it/s] 96%|█████████▌| 11067/11526 [1:55:50<04:42, 1.63it/s] {'loss': 0.1225, 'grad_norm': 0.502111554145813, 'learning_rate': 4.8234398453195775e-08, 'epoch': 2.88}
96%|█████████▌| 11067/11526 [1:55:50<04:42, 1.63it/s] 96%|█████████▌| 11068/11526 [1:55:51<04:41, 1.63it/s] {'loss': 0.1169, 'grad_norm': 0.4875654876232147, 'learning_rate': 4.802479245821612e-08, 'epoch': 2.88}
96%|█████████▌| 11068/11526 [1:55:51<04:41, 1.63it/s] 96%|█████████▌| 11069/11526 [1:55:51<04:41, 1.62it/s] {'loss': 0.166, 'grad_norm': 0.7243351340293884, 'learning_rate': 4.7815640686578736e-08, 'epoch': 2.88}
96%|█████████▌| 11069/11526 [1:55:52<04:41, 1.62it/s] 96%|█████████▌| 11070/11526 [1:55:52<04:40, 1.62it/s] {'loss': 0.1902, 'grad_norm': 0.6825723052024841, 'learning_rate': 4.7606943157467166e-08, 'epoch': 2.88}
96%|█████████▌| 11070/11526 [1:55:52<04:40, 1.62it/s] 96%|█████████▌| 11071/11526 [1:55:53<04:40, 1.62it/s] {'loss': 0.1652, 'grad_norm': 0.5833405256271362, 'learning_rate': 4.7398699890024435e-08, 'epoch': 2.88}
96%|█████████▌| 11071/11526 [1:55:53<04:40, 1.62it/s] 96%|█████████▌| 11072/11526 [1:55:53<04:39, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.5374189019203186, 'learning_rate': 4.7190910903352485e-08, 'epoch': 2.88}
96%|█████████▌| 11072/11526 [1:55:53<04:39, 1.63it/s] 96%|█████████▌| 11073/11526 [1:55:54<04:38, 1.63it/s] {'loss': 0.1538, 'grad_norm': 0.583662211894989, 'learning_rate': 4.6983576216510505e-08, 'epoch': 2.88}
96%|█████████▌| 11073/11526 [1:55:54<04:38, 1.63it/s] 96%|█████████▌| 11074/11526 [1:55:54<04:37, 1.63it/s] {'loss': 0.201, 'grad_norm': 0.5707346796989441, 'learning_rate': 4.6776695848516626e-08, 'epoch': 2.88}
96%|█████████▌| 11074/11526 [1:55:55<04:37, 1.63it/s] 96%|█████████▌| 11075/11526 [1:55:55<04:37, 1.63it/s] {'loss': 0.1571, 'grad_norm': 0.6258001923561096, 'learning_rate': 4.657026981834623e-08, 'epoch': 2.88}
96%|█████████▌| 11075/11526 [1:55:55<04:37, 1.63it/s] 96%|█████████▌| 11076/11526 [1:55:56<04:36, 1.63it/s] {'loss': 0.1389, 'grad_norm': 0.5346430540084839, 'learning_rate': 4.63642981449347e-08, 'epoch': 2.88}
96%|█████████▌| 11076/11526 [1:55:56<04:36, 1.63it/s] 96%|█████████▌| 11077/11526 [1:55:56<04:35, 1.63it/s] {'loss': 0.1791, 'grad_norm': 0.5968036651611328, 'learning_rate': 4.615878084717529e-08, 'epoch': 2.88}
96%|█████████▌| 11077/11526 [1:55:56<04:35, 1.63it/s] 96%|█████████▌| 11078/11526 [1:55:57<04:35, 1.63it/s] {'loss': 0.1481, 'grad_norm': 0.5537636280059814, 'learning_rate': 4.595371794391845e-08, 'epoch': 2.88}
96%|█████████▌| 11078/11526 [1:55:57<04:35, 1.63it/s] 96%|█████████▌| 11079/11526 [1:55:58<04:34, 1.63it/s] {'loss': 0.1821, 'grad_norm': 0.625562310218811, 'learning_rate': 4.5749109453973595e-08, 'epoch': 2.88}
96%|█████████▌| 11079/11526 [1:55:58<04:34, 1.63it/s] 96%|█████████▌| 11080/11526 [1:55:58<04:33, 1.63it/s] {'loss': 0.1466, 'grad_norm': 0.5598582625389099, 'learning_rate': 4.554495539610959e-08, 'epoch': 2.88}
96%|█████████▌| 11080/11526 [1:55:58<04:33, 1.63it/s] 96%|█████████▌| 11081/11526 [1:55:59<04:33, 1.63it/s] {'loss': 0.1221, 'grad_norm': 0.6664264798164368, 'learning_rate': 4.5341255789051465e-08, 'epoch': 2.88}
96%|█████████▌| 11081/11526 [1:55:59<04:33, 1.63it/s] 96%|█████████▌| 11082/11526 [1:55:59<04:32, 1.63it/s] {'loss': 0.1371, 'grad_norm': 0.6065846085548401, 'learning_rate': 4.513801065148427e-08, 'epoch': 2.88}
96%|█████████▌| 11082/11526 [1:56:00<04:32, 1.63it/s] 96%|█████████▌| 11083/11526 [1:56:00<04:32, 1.63it/s] {'loss': 0.1142, 'grad_norm': 0.5104752779006958, 'learning_rate': 4.493522000205086e-08, 'epoch': 2.88}
96%|█████████▌| 11083/11526 [1:56:00<04:32, 1.63it/s] 96%|█████████▌| 11084/11526 [1:56:01<04:31, 1.63it/s] {'loss': 0.1394, 'grad_norm': 0.6209031343460083, 'learning_rate': 4.473288385935193e-08, 'epoch': 2.88}
96%|█████████▌| 11084/11526 [1:56:01<04:31, 1.63it/s] 96%|█████████▌| 11085/11526 [1:56:01<04:31, 1.63it/s] {'loss': 0.1894, 'grad_norm': 0.7361853122711182, 'learning_rate': 4.453100224194762e-08, 'epoch': 2.89}
96%|█████████▌| 11085/11526 [1:56:01<04:31, 1.63it/s] 96%|█████████▌| 11086/11526 [1:56:02<04:31, 1.62it/s] {'loss': 0.1475, 'grad_norm': 0.5499058961868286, 'learning_rate': 4.432957516835534e-08, 'epoch': 2.89}
96%|█████████▌| 11086/11526 [1:56:02<04:31, 1.62it/s] 96%|█████████▌| 11087/11526 [1:56:02<04:30, 1.62it/s] {'loss': 0.1473, 'grad_norm': 0.6134096384048462, 'learning_rate': 4.412860265705032e-08, 'epoch': 2.89}
96%|█████████▌| 11087/11526 [1:56:03<04:30, 1.62it/s] 96%|█████████▌| 11088/11526 [1:56:03<04:29, 1.62it/s] {'loss': 0.1412, 'grad_norm': 0.6085135340690613, 'learning_rate': 4.392808472646837e-08, 'epoch': 2.89}
96%|█████████▌| 11088/11526 [1:56:03<04:29, 1.62it/s] 96%|█████████▌| 11089/11526 [1:56:04<04:29, 1.62it/s] {'loss': 0.1549, 'grad_norm': 0.6036361455917358, 'learning_rate': 4.3728021395000874e-08, 'epoch': 2.89}
96%|█████████▌| 11089/11526 [1:56:04<04:29, 1.62it/s] 96%|█████████▌| 11090/11526 [1:56:04<04:28, 1.62it/s] {'loss': 0.1202, 'grad_norm': 0.551740825176239, 'learning_rate': 4.352841268099928e-08, 'epoch': 2.89}
96%|█████████▌| 11090/11526 [1:56:04<04:28, 1.62it/s] 96%|█████████▌| 11091/11526 [1:56:05<04:28, 1.62it/s] {'loss': 0.1617, 'grad_norm': 0.6130101084709167, 'learning_rate': 4.3329258602773375e-08, 'epoch': 2.89}
96%|█████████▌| 11091/11526 [1:56:05<04:28, 1.62it/s] 96%|█████████▌| 11092/11526 [1:56:06<04:27, 1.62it/s] {'loss': 0.1076, 'grad_norm': 0.46232372522354126, 'learning_rate': 4.313055917858911e-08, 'epoch': 2.89}
96%|█████████▌| 11092/11526 [1:56:06<04:27, 1.62it/s] 96%|█████████▌| 11093/11526 [1:56:06<04:26, 1.62it/s] {'loss': 0.1411, 'grad_norm': 0.554460883140564, 'learning_rate': 4.2932314426674673e-08, 'epoch': 2.89}
96%|█████████▌| 11093/11526 [1:56:06<04:26, 1.62it/s] 96%|█████████▋| 11094/11526 [1:56:07<04:25, 1.63it/s] {'loss': 0.1263, 'grad_norm': 0.49549826979637146, 'learning_rate': 4.273452436521219e-08, 'epoch': 2.89}
96%|█████████▋| 11094/11526 [1:56:07<04:25, 1.63it/s] 96%|█████████▋| 11095/11526 [1:56:07<04:25, 1.63it/s] {'loss': 0.1649, 'grad_norm': 0.6079532504081726, 'learning_rate': 4.253718901234494e-08, 'epoch': 2.89}
96%|█████████▋| 11095/11526 [1:56:08<04:25, 1.63it/s] 96%|█████████▋| 11096/11526 [1:56:08<04:24, 1.63it/s] {'loss': 0.1391, 'grad_norm': 0.5676825046539307, 'learning_rate': 4.2340308386173424e-08, 'epoch': 2.89}
96%|█████████▋| 11096/11526 [1:56:08<04:24, 1.63it/s] 96%|█████████▋| 11097/11526 [1:56:09<04:23, 1.63it/s] {'loss': 0.1196, 'grad_norm': 0.49375611543655396, 'learning_rate': 4.214388250475654e-08, 'epoch': 2.89}
96%|█████████▋| 11097/11526 [1:56:09<04:23, 1.63it/s] 96%|█████████▋| 11098/11526 [1:56:09<04:23, 1.63it/s] {'loss': 0.1329, 'grad_norm': 0.551080048084259, 'learning_rate': 4.1947911386112095e-08, 'epoch': 2.89}
96%|█████████▋| 11098/11526 [1:56:09<04:23, 1.63it/s] 96%|█████████▋| 11099/11526 [1:56:10<04:22, 1.63it/s] {'loss': 0.1613, 'grad_norm': 0.6597144603729248, 'learning_rate': 4.175239504821515e-08, 'epoch': 2.89}
96%|█████████▋| 11099/11526 [1:56:10<04:22, 1.63it/s] 96%|█████████▋| 11100/11526 [1:56:10<04:21, 1.63it/s] {'loss': 0.1688, 'grad_norm': 0.7770548462867737, 'learning_rate': 4.1557333509000266e-08, 'epoch': 2.89}
96%|█████████▋| 11100/11526 [1:56:11<04:21, 1.63it/s] 96%|█████████▋| 11101/11526 [1:56:11<04:21, 1.63it/s] {'loss': 0.1324, 'grad_norm': 0.5120864510536194, 'learning_rate': 4.1362726786358664e-08, 'epoch': 2.89}
96%|█████████▋| 11101/11526 [1:56:11<04:21, 1.63it/s] 96%|█████████▋| 11102/11526 [1:56:12<04:20, 1.63it/s] {'loss': 0.206, 'grad_norm': 0.698864758014679, 'learning_rate': 4.116857489814163e-08, 'epoch': 2.89}
96%|█████████▋| 11102/11526 [1:56:12<04:20, 1.63it/s] 96%|█████████▋| 11103/11526 [1:56:12<04:20, 1.63it/s] {'loss': 0.1269, 'grad_norm': 0.49968063831329346, 'learning_rate': 4.097487786215715e-08, 'epoch': 2.89}
96%|█████████▋| 11103/11526 [1:56:12<04:20, 1.63it/s] 96%|█████████▋| 11104/11526 [1:56:13<04:19, 1.63it/s] {'loss': 0.1392, 'grad_norm': 0.5309438705444336, 'learning_rate': 4.078163569617266e-08, 'epoch': 2.89}
96%|█████████▋| 11104/11526 [1:56:13<04:19, 1.63it/s] 96%|█████████▋| 11105/11526 [1:56:14<04:18, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.5680993795394897, 'learning_rate': 4.058884841791344e-08, 'epoch': 2.89}
96%|█████████▋| 11105/11526 [1:56:14<04:18, 1.63it/s] 96%|█████████▋| 11106/11526 [1:56:14<04:18, 1.62it/s] {'loss': 0.1209, 'grad_norm': 0.5369157195091248, 'learning_rate': 4.039651604506256e-08, 'epoch': 2.89}
96%|█████████▋| 11106/11526 [1:56:14<04:18, 1.62it/s] 96%|█████████▋| 11107/11526 [1:56:15<04:17, 1.63it/s] {'loss': 0.1693, 'grad_norm': 0.6306920647621155, 'learning_rate': 4.020463859526258e-08, 'epoch': 2.89}
96%|█████████▋| 11107/11526 [1:56:15<04:17, 1.63it/s] 96%|█████████▋| 11108/11526 [1:56:15<04:17, 1.63it/s] {'loss': 0.1601, 'grad_norm': 0.6103684306144714, 'learning_rate': 4.0013216086113305e-08, 'epoch': 2.89}
96%|█████████▋| 11108/11526 [1:56:16<04:17, 1.63it/s] 96%|█████████▋| 11109/11526 [1:56:16<04:16, 1.63it/s] {'loss': 0.2182, 'grad_norm': 0.5857287645339966, 'learning_rate': 3.982224853517236e-08, 'epoch': 2.89}
96%|█████████▋| 11109/11526 [1:56:16<04:16, 1.63it/s] 96%|█████████▋| 11110/11526 [1:56:17<04:15, 1.63it/s] {'loss': 0.1139, 'grad_norm': 0.5319454073905945, 'learning_rate': 3.963173595995795e-08, 'epoch': 2.89}
96%|█████████▋| 11110/11526 [1:56:17<04:15, 1.63it/s] 96%|█████████▋| 11111/11526 [1:56:17<04:15, 1.62it/s] {'loss': 0.1962, 'grad_norm': 0.7579399943351746, 'learning_rate': 3.944167837794333e-08, 'epoch': 2.89}
96%|█████████▋| 11111/11526 [1:56:17<04:15, 1.62it/s] 96%|█████████▋| 11112/11526 [1:56:18<04:14, 1.63it/s] {'loss': 0.1404, 'grad_norm': 0.5721977949142456, 'learning_rate': 3.9252075806562316e-08, 'epoch': 2.89}
96%|█████████▋| 11112/11526 [1:56:18<04:14, 1.63it/s] 96%|█████████▋| 11113/11526 [1:56:18<04:13, 1.63it/s] {'loss': 0.1393, 'grad_norm': 0.582958996295929, 'learning_rate': 3.9062928263207125e-08, 'epoch': 2.89}
96%|█████████▋| 11113/11526 [1:56:19<04:13, 1.63it/s] 96%|█████████▋| 11114/11526 [1:56:19<04:13, 1.63it/s] {'loss': 0.1477, 'grad_norm': 0.603283166885376, 'learning_rate': 3.887423576522609e-08, 'epoch': 2.89}
96%|█████████▋| 11114/11526 [1:56:19<04:13, 1.63it/s] 96%|█████████▋| 11115/11526 [1:56:20<04:12, 1.63it/s] {'loss': 0.1755, 'grad_norm': 0.6194013357162476, 'learning_rate': 3.868599832992814e-08, 'epoch': 2.89}
96%|█████████▋| 11115/11526 [1:56:20<04:12, 1.63it/s] 96%|█████████▋| 11116/11526 [1:56:20<04:12, 1.63it/s] {'loss': 0.1345, 'grad_norm': 0.5717779994010925, 'learning_rate': 3.849821597457892e-08, 'epoch': 2.89}
96%|█████████▋| 11116/11526 [1:56:20<04:12, 1.63it/s] 96%|█████████▋| 11117/11526 [1:56:21<04:11, 1.63it/s] {'loss': 0.141, 'grad_norm': 0.5919985175132751, 'learning_rate': 3.8310888716403535e-08, 'epoch': 2.89}
96%|█████████▋| 11117/11526 [1:56:21<04:11, 1.63it/s] 96%|█████████▋| 11118/11526 [1:56:22<04:10, 1.63it/s] {'loss': 0.1568, 'grad_norm': 0.5975980162620544, 'learning_rate': 3.81240165725838e-08, 'epoch': 2.89}
96%|█████████▋| 11118/11526 [1:56:22<04:10, 1.63it/s] 96%|█████████▋| 11119/11526 [1:56:22<04:10, 1.63it/s] {'loss': 0.1759, 'grad_norm': 0.5830062031745911, 'learning_rate': 3.7937599560260996e-08, 'epoch': 2.89}
96%|█████████▋| 11119/11526 [1:56:22<04:10, 1.63it/s] 96%|█████████▋| 11120/11526 [1:56:23<04:09, 1.63it/s] {'loss': 0.1625, 'grad_norm': 0.8493350744247437, 'learning_rate': 3.775163769653478e-08, 'epoch': 2.89}
96%|█████████▋| 11120/11526 [1:56:23<04:09, 1.63it/s] 96%|█████████▋| 11121/11526 [1:56:23<04:09, 1.63it/s] {'loss': 0.1639, 'grad_norm': 0.6353427171707153, 'learning_rate': 3.756613099846262e-08, 'epoch': 2.89}
96%|█████████▋| 11121/11526 [1:56:24<04:09, 1.63it/s] 96%|█████████▋| 11122/11526 [1:56:24<04:08, 1.63it/s] {'loss': 0.1784, 'grad_norm': 0.6296930313110352, 'learning_rate': 3.7381079483059804e-08, 'epoch': 2.89}
96%|█████████▋| 11122/11526 [1:56:24<04:08, 1.63it/s] 97%|█████████▋| 11123/11526 [1:56:25<04:07, 1.63it/s] {'loss': 0.1705, 'grad_norm': 0.6522207260131836, 'learning_rate': 3.719648316730051e-08, 'epoch': 2.9}
97%|█████████▋| 11123/11526 [1:56:25<04:07, 1.63it/s] 97%|█████████▋| 11124/11526 [1:56:25<04:07, 1.63it/s] {'loss': 0.1249, 'grad_norm': 0.5440175533294678, 'learning_rate': 3.701234206811732e-08, 'epoch': 2.9}
97%|█████████▋| 11124/11526 [1:56:25<04:07, 1.63it/s] 97%|█████████▋| 11125/11526 [1:56:26<04:06, 1.63it/s] {'loss': 0.1465, 'grad_norm': 0.5670942068099976, 'learning_rate': 3.682865620240006e-08, 'epoch': 2.9}
97%|█████████▋| 11125/11526 [1:56:26<04:06, 1.63it/s] 97%|█████████▋| 11126/11526 [1:56:26<04:05, 1.63it/s] {'loss': 0.1716, 'grad_norm': 0.6275484561920166, 'learning_rate': 3.6645425586998016e-08, 'epoch': 2.9}
97%|█████████▋| 11126/11526 [1:56:27<04:05, 1.63it/s] 97%|█████████▋| 11127/11526 [1:56:27<04:05, 1.63it/s] {'loss': 0.1665, 'grad_norm': 0.6259291768074036, 'learning_rate': 3.64626502387172e-08, 'epoch': 2.9}
97%|█████████▋| 11127/11526 [1:56:27<04:05, 1.63it/s] 97%|█████████▋| 11128/11526 [1:56:28<04:04, 1.63it/s] {'loss': 0.1411, 'grad_norm': 0.5272864699363708, 'learning_rate': 3.628033017432364e-08, 'epoch': 2.9}
97%|█████████▋| 11128/11526 [1:56:28<04:04, 1.63it/s] 97%|█████████▋| 11129/11526 [1:56:28<04:18, 1.54it/s] {'loss': 0.1544, 'grad_norm': 0.5889256596565247, 'learning_rate': 3.609846541054118e-08, 'epoch': 2.9}
97%|█████████▋| 11129/11526 [1:56:29<04:18, 1.54it/s] 97%|█████████▋| 11130/11526 [1:56:29<04:13, 1.56it/s] {'loss': 0.159, 'grad_norm': 0.6688299775123596, 'learning_rate': 3.5917055964050374e-08, 'epoch': 2.9}
97%|█████████▋| 11130/11526 [1:56:29<04:13, 1.56it/s] 97%|█████████▋| 11131/11526 [1:56:30<04:10, 1.58it/s] {'loss': 0.1289, 'grad_norm': 0.6310927271842957, 'learning_rate': 3.573610185149179e-08, 'epoch': 2.9}
97%|█████████▋| 11131/11526 [1:56:30<04:10, 1.58it/s] 97%|█████████▋| 11132/11526 [1:56:30<04:07, 1.59it/s] {'loss': 0.1916, 'grad_norm': 0.6834300756454468, 'learning_rate': 3.555560308946326e-08, 'epoch': 2.9}
97%|█████████▋| 11132/11526 [1:56:30<04:07, 1.59it/s] 97%|█████████▋| 11133/11526 [1:56:31<04:05, 1.60it/s] {'loss': 0.1578, 'grad_norm': 0.5952564477920532, 'learning_rate': 3.5375559694521555e-08, 'epoch': 2.9}
97%|█████████▋| 11133/11526 [1:56:31<04:05, 1.60it/s] 97%|█████████▋| 11134/11526 [1:56:32<04:03, 1.61it/s] {'loss': 0.1284, 'grad_norm': 0.5487667322158813, 'learning_rate': 3.5195971683181227e-08, 'epoch': 2.9}
97%|█████████▋| 11134/11526 [1:56:32<04:03, 1.61it/s] 97%|█████████▋| 11135/11526 [1:56:32<04:02, 1.61it/s] {'loss': 0.1503, 'grad_norm': 0.6374291181564331, 'learning_rate': 3.5016839071914666e-08, 'epoch': 2.9}
97%|█████████▋| 11135/11526 [1:56:32<04:02, 1.61it/s] 97%|█████████▋| 11136/11526 [1:56:33<04:02, 1.61it/s] {'loss': 0.1537, 'grad_norm': 0.5750308036804199, 'learning_rate': 3.4838161877153167e-08, 'epoch': 2.9}
97%|█████████▋| 11136/11526 [1:56:33<04:02, 1.61it/s] 97%|█████████▋| 11137/11526 [1:56:33<04:00, 1.61it/s] {'loss': 0.1251, 'grad_norm': 0.5630502700805664, 'learning_rate': 3.4659940115286393e-08, 'epoch': 2.9}
97%|█████████▋| 11137/11526 [1:56:34<04:00, 1.61it/s] 97%|█████████▋| 11138/11526 [1:56:34<03:59, 1.62it/s] {'loss': 0.1422, 'grad_norm': 0.5478718280792236, 'learning_rate': 3.448217380266128e-08, 'epoch': 2.9}
97%|█████████▋| 11138/11526 [1:56:34<03:59, 1.62it/s] 97%|█████████▋| 11139/11526 [1:56:35<03:58, 1.62it/s] {'loss': 0.1635, 'grad_norm': 0.6421257257461548, 'learning_rate': 3.4304862955583665e-08, 'epoch': 2.9}
97%|█████████▋| 11139/11526 [1:56:35<03:58, 1.62it/s] 97%|█████████▋| 11140/11526 [1:56:35<03:57, 1.62it/s] {'loss': 0.1348, 'grad_norm': 0.574113667011261, 'learning_rate': 3.412800759031776e-08, 'epoch': 2.9}
97%|█████████▋| 11140/11526 [1:56:35<03:57, 1.62it/s] 97%|█████████▋| 11141/11526 [1:56:36<03:57, 1.62it/s] {'loss': 0.1572, 'grad_norm': 0.6280604600906372, 'learning_rate': 3.3951607723085586e-08, 'epoch': 2.9}
97%|█████████▋| 11141/11526 [1:56:36<03:57, 1.62it/s] 97%|█████████▋| 11142/11526 [1:56:36<03:56, 1.62it/s] {'loss': 0.1463, 'grad_norm': 0.5660091638565063, 'learning_rate': 3.37756633700681e-08, 'epoch': 2.9}
97%|█████████▋| 11142/11526 [1:56:37<03:56, 1.62it/s] 97%|█████████▋| 11143/11526 [1:56:37<03:55, 1.62it/s] {'loss': 0.1642, 'grad_norm': 0.6578890681266785, 'learning_rate': 3.360017454740294e-08, 'epoch': 2.9}
97%|█████████▋| 11143/11526 [1:56:37<03:55, 1.62it/s] 97%|█████████▋| 11144/11526 [1:56:38<03:55, 1.62it/s] {'loss': 0.136, 'grad_norm': 0.5833616852760315, 'learning_rate': 3.342514127118779e-08, 'epoch': 2.9}
97%|█████████▋| 11144/11526 [1:56:38<03:55, 1.62it/s] 97%|█████████▋| 11145/11526 [1:56:38<03:54, 1.62it/s] {'loss': 0.1726, 'grad_norm': 0.6278811693191528, 'learning_rate': 3.3250563557477025e-08, 'epoch': 2.9}
97%|█████████▋| 11145/11526 [1:56:38<03:54, 1.62it/s] 97%|█████████▋| 11146/11526 [1:56:39<03:54, 1.62it/s] {'loss': 0.1667, 'grad_norm': 0.686174213886261, 'learning_rate': 3.3076441422283944e-08, 'epoch': 2.9}
97%|█████████▋| 11146/11526 [1:56:39<03:54, 1.62it/s] 97%|█████████▋| 11147/11526 [1:56:40<03:53, 1.62it/s] {'loss': 0.1361, 'grad_norm': 0.5755410194396973, 'learning_rate': 3.290277488158078e-08, 'epoch': 2.9}
97%|█████████▋| 11147/11526 [1:56:40<03:53, 1.62it/s] 97%|█████████▋| 11148/11526 [1:56:40<03:52, 1.63it/s] {'loss': 0.1457, 'grad_norm': 0.6248868107795715, 'learning_rate': 3.272956395129645e-08, 'epoch': 2.9}
97%|█████████▋| 11148/11526 [1:56:40<03:52, 1.63it/s] 97%|█████████▋| 11149/11526 [1:56:41<03:51, 1.63it/s] {'loss': 0.1596, 'grad_norm': 0.650974452495575, 'learning_rate': 3.255680864731936e-08, 'epoch': 2.9}
97%|█████████▋| 11149/11526 [1:56:41<03:51, 1.63it/s] 97%|█████████▋| 11150/11526 [1:56:41<03:51, 1.63it/s] {'loss': 0.1408, 'grad_norm': 0.5935438275337219, 'learning_rate': 3.2384508985495164e-08, 'epoch': 2.9}
97%|█████████▋| 11150/11526 [1:56:42<03:51, 1.63it/s] 97%|█████████▋| 11151/11526 [1:56:42<03:50, 1.63it/s] {'loss': 0.14, 'grad_norm': 0.5809087753295898, 'learning_rate': 3.221266498162845e-08, 'epoch': 2.9}
97%|█████████▋| 11151/11526 [1:56:42<03:50, 1.63it/s] 97%|█████████▋| 11152/11526 [1:56:43<03:50, 1.63it/s] {'loss': 0.1629, 'grad_norm': 0.5704101324081421, 'learning_rate': 3.20412766514816e-08, 'epoch': 2.9}
97%|█████████▋| 11152/11526 [1:56:43<03:50, 1.63it/s] 97%|█████████▋| 11153/11526 [1:56:43<03:49, 1.63it/s] {'loss': 0.1317, 'grad_norm': 0.5443863272666931, 'learning_rate': 3.187034401077538e-08, 'epoch': 2.9}
97%|█████████▋| 11153/11526 [1:56:43<03:49, 1.63it/s] 97%|█████████▋| 11154/11526 [1:56:44<03:49, 1.62it/s] {'loss': 0.1569, 'grad_norm': 0.6439080238342285, 'learning_rate': 3.1699867075188904e-08, 'epoch': 2.9}
97%|█████████▋| 11154/11526 [1:56:44<03:49, 1.62it/s] 97%|█████████▋| 11155/11526 [1:56:44<03:48, 1.63it/s] {'loss': 0.126, 'grad_norm': 0.5077948570251465, 'learning_rate': 3.152984586035857e-08, 'epoch': 2.9}
97%|█████████▋| 11155/11526 [1:56:45<03:48, 1.63it/s] 97%|█████████▋| 11156/11526 [1:56:45<03:47, 1.63it/s] {'loss': 0.1372, 'grad_norm': 0.5254815816879272, 'learning_rate': 3.136028038188077e-08, 'epoch': 2.9}
97%|█████████▋| 11156/11526 [1:56:45<03:47, 1.63it/s] 97%|█████████▋| 11157/11526 [1:56:46<03:47, 1.63it/s] {'loss': 0.1323, 'grad_norm': 0.5205565690994263, 'learning_rate': 3.119117065530808e-08, 'epoch': 2.9}
97%|█████████▋| 11157/11526 [1:56:46<03:47, 1.63it/s] 97%|█████████▋| 11158/11526 [1:56:46<03:46, 1.63it/s] {'loss': 0.1419, 'grad_norm': 0.6172041296958923, 'learning_rate': 3.102251669615253e-08, 'epoch': 2.9}
97%|█████████▋| 11158/11526 [1:56:46<03:46, 1.63it/s] 97%|█████████▋| 11159/11526 [1:56:47<03:45, 1.63it/s] {'loss': 0.1277, 'grad_norm': 0.5143864154815674, 'learning_rate': 3.085431851988452e-08, 'epoch': 2.9}
97%|█████████▋| 11159/11526 [1:56:47<03:45, 1.63it/s] 97%|█████████▋| 11160/11526 [1:56:48<03:45, 1.63it/s] {'loss': 0.1158, 'grad_norm': 0.4937044084072113, 'learning_rate': 3.0686576141931156e-08, 'epoch': 2.9}
97%|█████████▋| 11160/11526 [1:56:48<03:45, 1.63it/s] 97%|█████████▋| 11161/11526 [1:56:48<03:44, 1.63it/s] {'loss': 0.1426, 'grad_norm': 0.612160861492157, 'learning_rate': 3.051928957767958e-08, 'epoch': 2.9}
97%|█████████▋| 11161/11526 [1:56:48<03:44, 1.63it/s] 97%|█████████▋| 11162/11526 [1:56:49<03:43, 1.63it/s] {'loss': 0.1687, 'grad_norm': 0.6750708818435669, 'learning_rate': 3.0352458842474176e-08, 'epoch': 2.91}
97%|█████████▋| 11162/11526 [1:56:49<03:43, 1.63it/s] 97%|█████████▋| 11163/11526 [1:56:49<03:42, 1.63it/s] {'loss': 0.1506, 'grad_norm': 0.6063752770423889, 'learning_rate': 3.0186083951616595e-08, 'epoch': 2.91}
97%|█████████▋| 11163/11526 [1:56:49<03:42, 1.63it/s] 97%|█████████▋| 11164/11526 [1:56:50<03:42, 1.63it/s] {'loss': 0.1351, 'grad_norm': 0.5463297963142395, 'learning_rate': 3.002016492036908e-08, 'epoch': 2.91}
97%|█████████▋| 11164/11526 [1:56:50<03:42, 1.63it/s] 97%|█████████▋| 11165/11526 [1:56:51<03:47, 1.58it/s] {'loss': 0.1814, 'grad_norm': 0.6291292905807495, 'learning_rate': 2.985470176395e-08, 'epoch': 2.91}
97%|█████████▋| 11165/11526 [1:56:51<03:47, 1.58it/s] 97%|█████████▋| 11166/11526 [1:56:51<03:46, 1.59it/s] {'loss': 0.1778, 'grad_norm': 0.6228814125061035, 'learning_rate': 2.968969449753667e-08, 'epoch': 2.91}
97%|█████████▋| 11166/11526 [1:56:51<03:46, 1.59it/s] 97%|█████████▋| 11167/11526 [1:56:52<03:44, 1.60it/s] {'loss': 0.1546, 'grad_norm': 0.7046887874603271, 'learning_rate': 2.95251431362642e-08, 'epoch': 2.91}
97%|█████████▋| 11167/11526 [1:56:52<03:44, 1.60it/s] 97%|█████████▋| 11168/11526 [1:56:53<03:42, 1.61it/s] {'loss': 0.1937, 'grad_norm': 0.627465009689331, 'learning_rate': 2.936104769522663e-08, 'epoch': 2.91}
97%|█████████▋| 11168/11526 [1:56:53<03:42, 1.61it/s] 97%|█████████▋| 11169/11526 [1:56:53<03:41, 1.61it/s] {'loss': 0.1957, 'grad_norm': 0.7231285572052002, 'learning_rate': 2.9197408189475253e-08, 'epoch': 2.91}
97%|█████████▋| 11169/11526 [1:56:53<03:41, 1.61it/s] 97%|█████████▋| 11170/11526 [1:56:54<03:40, 1.62it/s] {'loss': 0.1764, 'grad_norm': 0.6793641448020935, 'learning_rate': 2.9034224634020836e-08, 'epoch': 2.91}
97%|█████████▋| 11170/11526 [1:56:54<03:40, 1.62it/s] 97%|█████████▋| 11171/11526 [1:56:54<03:39, 1.62it/s] {'loss': 0.1203, 'grad_norm': 0.5161430239677429, 'learning_rate': 2.88714970438303e-08, 'epoch': 2.91}
97%|█████████▋| 11171/11526 [1:56:54<03:39, 1.62it/s] 97%|█████████▋| 11172/11526 [1:56:55<03:38, 1.62it/s] {'loss': 0.134, 'grad_norm': 0.644498884677887, 'learning_rate': 2.8709225433831146e-08, 'epoch': 2.91}
97%|█████████▋| 11172/11526 [1:56:55<03:38, 1.62it/s] 97%|█████████▋| 11173/11526 [1:56:56<03:37, 1.62it/s] {'loss': 0.1249, 'grad_norm': 0.5932208299636841, 'learning_rate': 2.8547409818907025e-08, 'epoch': 2.91}
97%|█████████▋| 11173/11526 [1:56:56<03:37, 1.62it/s] 97%|█████████▋| 11174/11526 [1:56:56<03:37, 1.62it/s] {'loss': 0.1277, 'grad_norm': 0.5214039087295532, 'learning_rate': 2.838605021390106e-08, 'epoch': 2.91}
97%|█████████▋| 11174/11526 [1:56:56<03:37, 1.62it/s] 97%|█████████▋| 11175/11526 [1:56:57<03:36, 1.62it/s] {'loss': 0.1637, 'grad_norm': 0.7387245893478394, 'learning_rate': 2.8225146633613642e-08, 'epoch': 2.91}
97%|█████████▋| 11175/11526 [1:56:57<03:36, 1.62it/s] 97%|█████████▋| 11176/11526 [1:56:57<03:35, 1.62it/s] {'loss': 0.1536, 'grad_norm': 0.5652535557746887, 'learning_rate': 2.806469909280407e-08, 'epoch': 2.91}
97%|█████████▋| 11176/11526 [1:56:58<03:35, 1.62it/s] 97%|█████████▋| 11177/11526 [1:56:58<03:40, 1.58it/s] {'loss': 0.1437, 'grad_norm': 0.5775116086006165, 'learning_rate': 2.7904707606189462e-08, 'epoch': 2.91}
97%|█████████▋| 11177/11526 [1:56:58<03:40, 1.58it/s] 97%|█████████▋| 11178/11526 [1:56:59<03:39, 1.59it/s] {'loss': 0.1195, 'grad_norm': 0.5391250848770142, 'learning_rate': 2.7745172188445303e-08, 'epoch': 2.91}
97%|█████████▋| 11178/11526 [1:56:59<03:39, 1.59it/s] 97%|█████████▋| 11179/11526 [1:56:59<03:37, 1.60it/s] {'loss': 0.1411, 'grad_norm': 0.6559170484542847, 'learning_rate': 2.7586092854204884e-08, 'epoch': 2.91}
97%|█████████▋| 11179/11526 [1:56:59<03:37, 1.60it/s] 97%|█████████▋| 11180/11526 [1:57:00<03:35, 1.61it/s] {'loss': 0.1619, 'grad_norm': 0.5718007683753967, 'learning_rate': 2.7427469618059866e-08, 'epoch': 2.91}
97%|█████████▋| 11180/11526 [1:57:00<03:35, 1.61it/s] 97%|█████████▋| 11181/11526 [1:57:01<03:34, 1.61it/s] {'loss': 0.1091, 'grad_norm': 0.41851112246513367, 'learning_rate': 2.7269302494559725e-08, 'epoch': 2.91}
97%|█████████▋| 11181/11526 [1:57:01<03:34, 1.61it/s] 97%|█████████▋| 11182/11526 [1:57:01<03:33, 1.61it/s] {'loss': 0.1776, 'grad_norm': 0.6764783263206482, 'learning_rate': 2.7111591498212852e-08, 'epoch': 2.91}
97%|█████████▋| 11182/11526 [1:57:01<03:33, 1.61it/s] 97%|█████████▋| 11183/11526 [1:57:02<03:32, 1.62it/s] {'loss': 0.148, 'grad_norm': 0.566128671169281, 'learning_rate': 2.6954336643486012e-08, 'epoch': 2.91}
97%|█████████▋| 11183/11526 [1:57:02<03:32, 1.62it/s] 97%|█████████▋| 11184/11526 [1:57:02<03:31, 1.62it/s] {'loss': 0.1558, 'grad_norm': 0.5945770144462585, 'learning_rate': 2.6797537944802666e-08, 'epoch': 2.91}
97%|█████████▋| 11184/11526 [1:57:03<03:31, 1.62it/s] 97%|█████████▋| 11185/11526 [1:57:03<03:30, 1.62it/s] {'loss': 0.1857, 'grad_norm': 0.6711657643318176, 'learning_rate': 2.6641195416545197e-08, 'epoch': 2.91}
97%|█████████▋| 11185/11526 [1:57:03<03:30, 1.62it/s] 97%|█████████▋| 11186/11526 [1:57:04<03:29, 1.62it/s] {'loss': 0.1848, 'grad_norm': 0.6784254908561707, 'learning_rate': 2.648530907305491e-08, 'epoch': 2.91}
97%|█████████▋| 11186/11526 [1:57:04<03:29, 1.62it/s] 97%|█████████▋| 11187/11526 [1:57:04<03:28, 1.62it/s] {'loss': 0.1837, 'grad_norm': 0.6849716901779175, 'learning_rate': 2.6329878928629814e-08, 'epoch': 2.91}
97%|█████████▋| 11187/11526 [1:57:04<03:28, 1.62it/s] 97%|█████████▋| 11188/11526 [1:57:05<03:28, 1.62it/s] {'loss': 0.1508, 'grad_norm': 0.5576862096786499, 'learning_rate': 2.6174904997527396e-08, 'epoch': 2.91}
97%|█████████▋| 11188/11526 [1:57:05<03:28, 1.62it/s] 97%|█████████▋| 11189/11526 [1:57:06<03:27, 1.62it/s] {'loss': 0.1454, 'grad_norm': 0.6025176644325256, 'learning_rate': 2.6020387293962945e-08, 'epoch': 2.91}
97%|█████████▋| 11189/11526 [1:57:06<03:27, 1.62it/s] 97%|█████████▋| 11190/11526 [1:57:06<03:26, 1.63it/s] {'loss': 0.1543, 'grad_norm': 0.6254362463951111, 'learning_rate': 2.5866325832108464e-08, 'epoch': 2.91}
97%|█████████▋| 11190/11526 [1:57:06<03:26, 1.63it/s] 97%|█████████▋| 11191/11526 [1:57:07<03:26, 1.62it/s] {'loss': 0.1421, 'grad_norm': 0.5862125754356384, 'learning_rate': 2.571272062609709e-08, 'epoch': 2.91}
97%|█████████▋| 11191/11526 [1:57:07<03:26, 1.62it/s] 97%|█████████▋| 11192/11526 [1:57:07<03:25, 1.63it/s] {'loss': 0.1237, 'grad_norm': 0.5578036308288574, 'learning_rate': 2.5559571690016995e-08, 'epoch': 2.91}
97%|█████████▋| 11192/11526 [1:57:07<03:25, 1.63it/s] 97%|█████████▋| 11193/11526 [1:57:08<03:24, 1.63it/s] {'loss': 0.1716, 'grad_norm': 0.5841283202171326, 'learning_rate': 2.540687903791639e-08, 'epoch': 2.91}
97%|█████████▋| 11193/11526 [1:57:08<03:24, 1.63it/s] 97%|█████████▋| 11194/11526 [1:57:09<03:24, 1.63it/s] {'loss': 0.1451, 'grad_norm': 0.5283409357070923, 'learning_rate': 2.5254642683801845e-08, 'epoch': 2.91}
97%|█████████▋| 11194/11526 [1:57:09<03:24, 1.63it/s] 97%|█████████▋| 11195/11526 [1:57:09<03:23, 1.63it/s] {'loss': 0.1396, 'grad_norm': 0.5816594958305359, 'learning_rate': 2.510286264163553e-08, 'epoch': 2.91}
97%|█████████▋| 11195/11526 [1:57:09<03:23, 1.63it/s] 97%|█████████▋| 11196/11526 [1:57:10<03:22, 1.63it/s] {'loss': 0.1481, 'grad_norm': 0.5983207821846008, 'learning_rate': 2.49515389253413e-08, 'epoch': 2.91}
97%|█████████▋| 11196/11526 [1:57:10<03:22, 1.63it/s] 97%|█████████▋| 11197/11526 [1:57:10<03:21, 1.63it/s] {'loss': 0.113, 'grad_norm': 0.45485544204711914, 'learning_rate': 2.4800671548798615e-08, 'epoch': 2.91}
97%|█████████▋| 11197/11526 [1:57:11<03:21, 1.63it/s] 97%|█████████▋| 11198/11526 [1:57:11<03:21, 1.63it/s] {'loss': 0.1317, 'grad_norm': 0.565839409828186, 'learning_rate': 2.4650260525846404e-08, 'epoch': 2.91}
97%|█████████▋| 11198/11526 [1:57:11<03:21, 1.63it/s] 97%|█████████▋| 11199/11526 [1:57:12<03:20, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.5887275338172913, 'learning_rate': 2.4500305870280295e-08, 'epoch': 2.91}
97%|█████████▋| 11199/11526 [1:57:12<03:20, 1.63it/s] 97%|█████████▋| 11200/11526 [1:57:12<03:20, 1.63it/s] {'loss': 0.1479, 'grad_norm': 0.56968754529953, 'learning_rate': 2.4350807595855953e-08, 'epoch': 2.92}
97%|█████████▋| 11200/11526 [1:57:12<03:20, 1.63it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.34it/s]
31%|███ | 4/13 [00:00<00:01, 8.39it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.79it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.42it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.01it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.90it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.83it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.74it/s]
100%|██████████| 13/13 [00:01<00:00, 6.75it/s]
{'eval_loss': 0.5424527525901794, 'eval_runtime': 1.9545, 'eval_samples_per_second': 102.33, 'eval_steps_per_second': 6.651, 'epoch': 2.92}
97%|█████████▋| 11200/11526 [1:57:14<03:20, 1.63it/s]
 97%|█████████▋| 11201/11526 [1:57:15<06:30, 1.20s/it] {'loss': 0.1511, 'grad_norm': 0.6250172853469849, 'learning_rate': 2.4201765716285185e-08, 'epoch': 2.92}
97%|█████████▋| 11201/11526 [1:57:15<06:30, 1.20s/it] 97%|█████████▋| 11202/11526 [1:57:15<05:32, 1.03s/it] {'loss': 0.1517, 'grad_norm': 0.6072071194648743, 'learning_rate': 2.4053180245240392e-08, 'epoch': 2.92}
97%|█████████▋| 11202/11526 [1:57:16<05:32, 1.03s/it] 97%|█████████▋| 11203/11526 [1:57:16<04:51, 1.11it/s] {'loss': 0.1495, 'grad_norm': 0.5831158757209778, 'learning_rate': 2.390505119634956e-08, 'epoch': 2.92}
97%|█████████▋| 11203/11526 [1:57:16<04:51, 1.11it/s] 97%|█████████▋| 11204/11526 [1:57:17<04:22, 1.22it/s] {'loss': 0.2166, 'grad_norm': 0.7850402593612671, 'learning_rate': 2.375737858320015e-08, 'epoch': 2.92}
97%|█████████▋| 11204/11526 [1:57:17<04:22, 1.22it/s] 97%|█████████▋| 11205/11526 [1:57:17<04:02, 1.32it/s] {'loss': 0.1378, 'grad_norm': 0.573187530040741, 'learning_rate': 2.3610162419338e-08, 'epoch': 2.92}
97%|█████████▋| 11205/11526 [1:57:17<04:02, 1.32it/s] 97%|█████████▋| 11206/11526 [1:57:18<03:48, 1.40it/s] {'loss': 0.124, 'grad_norm': 0.5443989038467407, 'learning_rate': 2.3463402718265638e-08, 'epoch': 2.92}
97%|█████████▋| 11206/11526 [1:57:18<03:48, 1.40it/s] 97%|█████████▋| 11207/11526 [1:57:19<03:38, 1.46it/s] {'loss': 0.1565, 'grad_norm': 0.6332186460494995, 'learning_rate': 2.3317099493445626e-08, 'epoch': 2.92}
97%|█████████▋| 11207/11526 [1:57:19<03:38, 1.46it/s] 97%|█████████▋| 11208/11526 [1:57:19<03:30, 1.51it/s] {'loss': 0.156, 'grad_norm': 0.5824265480041504, 'learning_rate': 2.3171252758297237e-08, 'epoch': 2.92}
97%|█████████▋| 11208/11526 [1:57:19<03:30, 1.51it/s] 97%|█████████▋| 11209/11526 [1:57:20<03:25, 1.54it/s] {'loss': 0.1478, 'grad_norm': 0.558551013469696, 'learning_rate': 2.302586252619865e-08, 'epoch': 2.92}
97%|█████████▋| 11209/11526 [1:57:20<03:25, 1.54it/s] 97%|█████████▋| 11210/11526 [1:57:20<03:21, 1.57it/s] {'loss': 0.135, 'grad_norm': 0.5463122725486755, 'learning_rate': 2.2880928810486425e-08, 'epoch': 2.92}
97%|█████████▋| 11210/11526 [1:57:20<03:21, 1.57it/s] 97%|█████████▋| 11211/11526 [1:57:21<03:18, 1.58it/s] {'loss': 0.1659, 'grad_norm': 0.5733034014701843, 'learning_rate': 2.273645162445326e-08, 'epoch': 2.92}
97%|█████████▋| 11211/11526 [1:57:21<03:18, 1.58it/s] 97%|█████████▋| 11212/11526 [1:57:22<03:16, 1.60it/s] {'loss': 0.1765, 'grad_norm': 0.87806636095047, 'learning_rate': 2.2592430981352444e-08, 'epoch': 2.92}
97%|█████████▋| 11212/11526 [1:57:22<03:16, 1.60it/s] 97%|█████████▋| 11213/11526 [1:57:22<03:14, 1.61it/s] {'loss': 0.1463, 'grad_norm': 0.6572072505950928, 'learning_rate': 2.2448866894393962e-08, 'epoch': 2.92}
97%|█████████▋| 11213/11526 [1:57:22<03:14, 1.61it/s] 97%|█████████▋| 11214/11526 [1:57:23<03:13, 1.61it/s] {'loss': 0.1964, 'grad_norm': 0.9587633013725281, 'learning_rate': 2.2305759376746174e-08, 'epoch': 2.92}
97%|█████████▋| 11214/11526 [1:57:23<03:13, 1.61it/s] 97%|█████████▋| 11215/11526 [1:57:23<03:12, 1.61it/s] {'loss': 0.128, 'grad_norm': 0.5056919455528259, 'learning_rate': 2.2163108441536907e-08, 'epoch': 2.92}
97%|█████████▋| 11215/11526 [1:57:24<03:12, 1.61it/s] 97%|█████████▋| 11216/11526 [1:57:24<03:11, 1.62it/s] {'loss': 0.1353, 'grad_norm': 0.5239769816398621, 'learning_rate': 2.202091410184959e-08, 'epoch': 2.92}
97%|█████████▋| 11216/11526 [1:57:24<03:11, 1.62it/s] 97%|█████████▋| 11217/11526 [1:57:25<03:10, 1.62it/s] {'loss': 0.1633, 'grad_norm': 0.6074208617210388, 'learning_rate': 2.1879176370727673e-08, 'epoch': 2.92}
97%|█████████▋| 11217/11526 [1:57:25<03:10, 1.62it/s] 97%|█████████▋| 11218/11526 [1:57:25<03:09, 1.62it/s] {'loss': 0.179, 'grad_norm': 0.5965689420700073, 'learning_rate': 2.173789526117187e-08, 'epoch': 2.92}
97%|█████████▋| 11218/11526 [1:57:25<03:09, 1.62it/s] 97%|█████████▋| 11219/11526 [1:57:26<03:09, 1.62it/s] {'loss': 0.1568, 'grad_norm': 0.6236845254898071, 'learning_rate': 2.1597070786141817e-08, 'epoch': 2.92}
97%|█████████▋| 11219/11526 [1:57:26<03:09, 1.62it/s] 97%|█████████▋| 11220/11526 [1:57:27<03:08, 1.62it/s] {'loss': 0.1601, 'grad_norm': 0.5811443328857422, 'learning_rate': 2.1456702958553844e-08, 'epoch': 2.92}
97%|█████████▋| 11220/11526 [1:57:27<03:08, 1.62it/s] 97%|█████████▋| 11221/11526 [1:57:27<03:07, 1.62it/s] {'loss': 0.1372, 'grad_norm': 0.5626583695411682, 'learning_rate': 2.131679179128432e-08, 'epoch': 2.92}
97%|█████████▋| 11221/11526 [1:57:27<03:07, 1.62it/s] 97%|█████████▋| 11222/11526 [1:57:28<03:07, 1.63it/s] {'loss': 0.1268, 'grad_norm': 0.5248302817344666, 'learning_rate': 2.1177337297165757e-08, 'epoch': 2.92}
97%|█████████▋| 11222/11526 [1:57:28<03:07, 1.63it/s] 97%|█████████▋| 11223/11526 [1:57:28<03:06, 1.63it/s] {'loss': 0.1534, 'grad_norm': 0.6116054654121399, 'learning_rate': 2.1038339488990144e-08, 'epoch': 2.92}
97%|█████████▋| 11223/11526 [1:57:28<03:06, 1.63it/s] 97%|█████████▋| 11224/11526 [1:57:29<03:05, 1.62it/s] {'loss': 0.1494, 'grad_norm': 0.6221910119056702, 'learning_rate': 2.089979837950784e-08, 'epoch': 2.92}
97%|█████████▋| 11224/11526 [1:57:29<03:05, 1.62it/s] 97%|█████████▋| 11225/11526 [1:57:30<03:05, 1.63it/s] {'loss': 0.1382, 'grad_norm': 0.5661308169364929, 'learning_rate': 2.0761713981425347e-08, 'epoch': 2.92}
97%|█████████▋| 11225/11526 [1:57:30<03:05, 1.63it/s] 97%|█████████▋| 11226/11526 [1:57:30<03:04, 1.62it/s] {'loss': 0.1619, 'grad_norm': 0.6100101470947266, 'learning_rate': 2.0624086307409198e-08, 'epoch': 2.92}
97%|█████████▋| 11226/11526 [1:57:30<03:04, 1.62it/s] 97%|█████████▋| 11227/11526 [1:57:31<03:04, 1.62it/s] {'loss': 0.1166, 'grad_norm': 0.5549701452255249, 'learning_rate': 2.0486915370083183e-08, 'epoch': 2.92}
97%|█████████▋| 11227/11526 [1:57:31<03:04, 1.62it/s] 97%|█████████▋| 11228/11526 [1:57:31<03:03, 1.63it/s] {'loss': 0.1676, 'grad_norm': 0.6925092339515686, 'learning_rate': 2.0350201182030016e-08, 'epoch': 2.92}
97%|█████████▋| 11228/11526 [1:57:32<03:03, 1.63it/s] 97%|█████████▋| 11229/11526 [1:57:32<03:02, 1.62it/s] {'loss': 0.1638, 'grad_norm': 0.7411305904388428, 'learning_rate': 2.0213943755789112e-08, 'epoch': 2.92}
97%|█████████▋| 11229/11526 [1:57:32<03:02, 1.62it/s] 97%|█████████▋| 11230/11526 [1:57:33<03:02, 1.63it/s] {'loss': 0.1385, 'grad_norm': 0.6301344037055969, 'learning_rate': 2.0078143103858805e-08, 'epoch': 2.92}
97%|█████████▋| 11230/11526 [1:57:33<03:02, 1.63it/s] 97%|█████████▋| 11231/11526 [1:57:33<03:01, 1.62it/s] {'loss': 0.1689, 'grad_norm': 0.7216562032699585, 'learning_rate': 1.9942799238696354e-08, 'epoch': 2.92}
97%|█████████▋| 11231/11526 [1:57:33<03:01, 1.62it/s] 97%|█████████▋| 11232/11526 [1:57:34<03:00, 1.63it/s] {'loss': 0.1298, 'grad_norm': 0.5120865702629089, 'learning_rate': 1.9807912172715715e-08, 'epoch': 2.92}
97%|█████████▋| 11232/11526 [1:57:34<03:00, 1.63it/s] 97%|█████████▋| 11233/11526 [1:57:35<03:00, 1.63it/s] {'loss': 0.1558, 'grad_norm': 0.6182675361633301, 'learning_rate': 1.9673481918289218e-08, 'epoch': 2.92}
97%|█████████▋| 11233/11526 [1:57:35<03:00, 1.63it/s] 97%|█████████▋| 11234/11526 [1:57:35<02:59, 1.62it/s] {'loss': 0.1141, 'grad_norm': 0.5346097946166992, 'learning_rate': 1.9539508487748104e-08, 'epoch': 2.92}
97%|█████████▋| 11234/11526 [1:57:35<02:59, 1.62it/s] 97%|█████████▋| 11235/11526 [1:57:36<02:59, 1.62it/s] {'loss': 0.1306, 'grad_norm': 0.61301189661026, 'learning_rate': 1.9405991893380326e-08, 'epoch': 2.92}
97%|█████████▋| 11235/11526 [1:57:36<02:59, 1.62it/s] 97%|█████████▋| 11236/11526 [1:57:36<02:58, 1.62it/s] {'loss': 0.1573, 'grad_norm': 0.683090329170227, 'learning_rate': 1.9272932147433865e-08, 'epoch': 2.92}
97%|█████████▋| 11236/11526 [1:57:36<02:58, 1.62it/s] 97%|█████████▋| 11237/11526 [1:57:37<02:57, 1.63it/s] {'loss': 0.1708, 'grad_norm': 0.5916993618011475, 'learning_rate': 1.91403292621134e-08, 'epoch': 2.92}
97%|█████████▋| 11237/11526 [1:57:37<02:57, 1.63it/s] 98%|█████████▊| 11238/11526 [1:57:38<02:57, 1.63it/s] {'loss': 0.1829, 'grad_norm': 0.639055073261261, 'learning_rate': 1.900818324958198e-08, 'epoch': 2.93}
98%|█████████▊| 11238/11526 [1:57:38<02:57, 1.63it/s] 98%|█████████▊| 11239/11526 [1:57:38<02:56, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.6739124059677124, 'learning_rate': 1.8876494121959908e-08, 'epoch': 2.93}
98%|█████████▊| 11239/11526 [1:57:38<02:56, 1.63it/s] 98%|█████████▊| 11240/11526 [1:57:39<02:56, 1.62it/s] {'loss': 0.1471, 'grad_norm': 0.5895493030548096, 'learning_rate': 1.8745261891327527e-08, 'epoch': 2.93}
98%|█████████▊| 11240/11526 [1:57:39<02:56, 1.62it/s] 98%|█████████▊| 11241/11526 [1:57:39<02:55, 1.62it/s] {'loss': 0.148, 'grad_norm': 0.6115237474441528, 'learning_rate': 1.8614486569722424e-08, 'epoch': 2.93}
98%|█████████▊| 11241/11526 [1:57:40<02:55, 1.62it/s] 98%|█████████▊| 11242/11526 [1:57:40<02:54, 1.63it/s] {'loss': 0.1219, 'grad_norm': 0.5178911089897156, 'learning_rate': 1.8484168169138893e-08, 'epoch': 2.93}
98%|█████████▊| 11242/11526 [1:57:40<02:54, 1.63it/s] 98%|█████████▊| 11243/11526 [1:57:41<02:53, 1.63it/s] {'loss': 0.135, 'grad_norm': 0.5451470613479614, 'learning_rate': 1.8354306701531267e-08, 'epoch': 2.93}
98%|█████████▊| 11243/11526 [1:57:41<02:53, 1.63it/s] 98%|█████████▊| 11244/11526 [1:57:41<02:53, 1.63it/s] {'loss': 0.149, 'grad_norm': 0.594086229801178, 'learning_rate': 1.822490217881112e-08, 'epoch': 2.93}
98%|█████████▊| 11244/11526 [1:57:41<02:53, 1.63it/s] 98%|█████████▊| 11245/11526 [1:57:42<02:52, 1.62it/s] {'loss': 0.1495, 'grad_norm': 0.5905158519744873, 'learning_rate': 1.8095954612847856e-08, 'epoch': 2.93}
98%|█████████▊| 11245/11526 [1:57:42<02:52, 1.62it/s] 98%|█████████▊| 11246/11526 [1:57:43<02:52, 1.63it/s] {'loss': 0.1644, 'grad_norm': 0.6121170520782471, 'learning_rate': 1.7967464015469783e-08, 'epoch': 2.93}
98%|█████████▊| 11246/11526 [1:57:43<02:52, 1.63it/s] 98%|█████████▊| 11247/11526 [1:57:43<02:51, 1.63it/s] {'loss': 0.1954, 'grad_norm': 0.7172077298164368, 'learning_rate': 1.7839430398462476e-08, 'epoch': 2.93}
98%|█████████▊| 11247/11526 [1:57:43<02:51, 1.63it/s] 98%|█████████▊| 11248/11526 [1:57:44<02:50, 1.63it/s] {'loss': 0.1616, 'grad_norm': 0.6122432351112366, 'learning_rate': 1.7711853773569875e-08, 'epoch': 2.93}
98%|█████████▊| 11248/11526 [1:57:44<02:50, 1.63it/s] 98%|█████████▊| 11249/11526 [1:57:44<02:50, 1.62it/s] {'loss': 0.1337, 'grad_norm': 0.5224506855010986, 'learning_rate': 1.7584734152493733e-08, 'epoch': 2.93}
98%|█████████▊| 11249/11526 [1:57:44<02:50, 1.62it/s] 98%|█████████▊| 11250/11526 [1:57:45<02:49, 1.63it/s] {'loss': 0.151, 'grad_norm': 0.5755152106285095, 'learning_rate': 1.7458071546895272e-08, 'epoch': 2.93}
98%|█████████▊| 11250/11526 [1:57:45<02:49, 1.63it/s] 98%|█████████▊| 11251/11526 [1:57:46<02:49, 1.63it/s] {'loss': 0.1949, 'grad_norm': 0.6705301403999329, 'learning_rate': 1.7331865968391314e-08, 'epoch': 2.93}
98%|█████████▊| 11251/11526 [1:57:46<02:49, 1.63it/s] 98%|█████████▊| 11252/11526 [1:57:46<02:48, 1.63it/s] {'loss': 0.1815, 'grad_norm': 0.6492193341255188, 'learning_rate': 1.7206117428559267e-08, 'epoch': 2.93}
98%|█████████▊| 11252/11526 [1:57:46<02:48, 1.63it/s] 98%|█████████▊| 11253/11526 [1:57:47<02:47, 1.63it/s] {'loss': 0.1487, 'grad_norm': 0.6112135052680969, 'learning_rate': 1.7080825938932677e-08, 'epoch': 2.93}
98%|█████████▊| 11253/11526 [1:57:47<02:47, 1.63it/s] 98%|█████████▊| 11254/11526 [1:57:47<02:47, 1.62it/s] {'loss': 0.1444, 'grad_norm': 0.5197839736938477, 'learning_rate': 1.6955991511004578e-08, 'epoch': 2.93}
98%|█████████▊| 11254/11526 [1:57:48<02:47, 1.62it/s] 98%|█████████▊| 11255/11526 [1:57:48<02:47, 1.62it/s] {'loss': 0.1469, 'grad_norm': 0.6282867193222046, 'learning_rate': 1.6831614156225252e-08, 'epoch': 2.93}
98%|█████████▊| 11255/11526 [1:57:48<02:47, 1.62it/s] 98%|█████████▊| 11256/11526 [1:57:49<02:46, 1.62it/s] {'loss': 0.1539, 'grad_norm': 0.5750123858451843, 'learning_rate': 1.670769388600335e-08, 'epoch': 2.93}
98%|█████████▊| 11256/11526 [1:57:49<02:46, 1.62it/s] 98%|█████████▊| 11257/11526 [1:57:49<02:45, 1.62it/s] {'loss': 0.1383, 'grad_norm': 0.8973411917686462, 'learning_rate': 1.6584230711705897e-08, 'epoch': 2.93}
98%|█████████▊| 11257/11526 [1:57:49<02:45, 1.62it/s] 98%|█████████▊| 11258/11526 [1:57:50<02:45, 1.62it/s] {'loss': 0.1556, 'grad_norm': 0.6090607643127441, 'learning_rate': 1.6461224644656602e-08, 'epoch': 2.93}
98%|█████████▊| 11258/11526 [1:57:50<02:45, 1.62it/s] 98%|█████████▊| 11259/11526 [1:57:51<02:44, 1.62it/s] {'loss': 0.1577, 'grad_norm': 0.6407963633537292, 'learning_rate': 1.6338675696139782e-08, 'epoch': 2.93}
98%|█████████▊| 11259/11526 [1:57:51<02:44, 1.62it/s] 98%|█████████▊| 11260/11526 [1:57:51<02:43, 1.62it/s] {'loss': 0.1563, 'grad_norm': 0.5215564966201782, 'learning_rate': 1.6216583877394775e-08, 'epoch': 2.93}
98%|█████████▊| 11260/11526 [1:57:51<02:43, 1.62it/s] 98%|█████████▊| 11261/11526 [1:57:52<02:43, 1.62it/s] {'loss': 0.1381, 'grad_norm': 0.4940922260284424, 'learning_rate': 1.6094949199621512e-08, 'epoch': 2.93}
98%|█████████▊| 11261/11526 [1:57:52<02:43, 1.62it/s] 98%|█████████▊| 11262/11526 [1:57:52<02:42, 1.62it/s] {'loss': 0.157, 'grad_norm': 0.5904722809791565, 'learning_rate': 1.5973771673976623e-08, 'epoch': 2.93}
98%|█████████▊| 11262/11526 [1:57:52<02:42, 1.62it/s] 98%|█████████▊| 11263/11526 [1:57:53<02:41, 1.63it/s] {'loss': 0.1365, 'grad_norm': 0.5064545273780823, 'learning_rate': 1.5853051311575107e-08, 'epoch': 2.93}
98%|█████████▊| 11263/11526 [1:57:53<02:41, 1.63it/s] 98%|█████████▊| 11264/11526 [1:57:54<02:41, 1.63it/s] {'loss': 0.1725, 'grad_norm': 0.6725881099700928, 'learning_rate': 1.5732788123490883e-08, 'epoch': 2.93}
98%|█████████▊| 11264/11526 [1:57:54<02:41, 1.63it/s] 98%|█████████▊| 11265/11526 [1:57:54<02:41, 1.62it/s] {'loss': 0.1195, 'grad_norm': 0.5238219499588013, 'learning_rate': 1.5612982120754572e-08, 'epoch': 2.93}
98%|█████████▊| 11265/11526 [1:57:54<02:41, 1.62it/s] 98%|█████████▊| 11266/11526 [1:57:55<02:40, 1.62it/s] {'loss': 0.1499, 'grad_norm': 0.7012555003166199, 'learning_rate': 1.549363331435516e-08, 'epoch': 2.93}
98%|█████████▊| 11266/11526 [1:57:55<02:40, 1.62it/s] 98%|█████████▊| 11267/11526 [1:57:55<02:39, 1.62it/s] {'loss': 0.1305, 'grad_norm': 0.5465860962867737, 'learning_rate': 1.537474171524056e-08, 'epoch': 2.93}
98%|█████████▊| 11267/11526 [1:57:56<02:39, 1.62it/s] 98%|█████████▊| 11268/11526 [1:57:56<02:38, 1.62it/s] {'loss': 0.1649, 'grad_norm': 0.6489547491073608, 'learning_rate': 1.5256307334315933e-08, 'epoch': 2.93}
98%|█████████▊| 11268/11526 [1:57:56<02:38, 1.62it/s] 98%|█████████▊| 11269/11526 [1:57:57<02:38, 1.62it/s] {'loss': 0.1329, 'grad_norm': 0.5428279638290405, 'learning_rate': 1.513833018244537e-08, 'epoch': 2.93}
98%|█████████▊| 11269/11526 [1:57:57<02:38, 1.62it/s] 98%|█████████▊| 11270/11526 [1:57:57<02:38, 1.62it/s] {'loss': 0.1061, 'grad_norm': 0.43167033791542053, 'learning_rate': 1.5020810270449105e-08, 'epoch': 2.93}
98%|█████████▊| 11270/11526 [1:57:57<02:38, 1.62it/s] 98%|█████████▊| 11271/11526 [1:57:58<02:37, 1.62it/s] {'loss': 0.1705, 'grad_norm': 0.6328820586204529, 'learning_rate': 1.4903747609107955e-08, 'epoch': 2.93}
98%|█████████▊| 11271/11526 [1:57:58<02:37, 1.62it/s] 98%|█████████▊| 11272/11526 [1:57:59<02:36, 1.62it/s] {'loss': 0.1211, 'grad_norm': 0.49132218956947327, 'learning_rate': 1.4787142209158889e-08, 'epoch': 2.93}
98%|█████████▊| 11272/11526 [1:57:59<02:36, 1.62it/s] 98%|█████████▊| 11273/11526 [1:57:59<02:35, 1.62it/s] {'loss': 0.196, 'grad_norm': 0.84690260887146, 'learning_rate': 1.4670994081297796e-08, 'epoch': 2.93}
98%|█████████▊| 11273/11526 [1:57:59<02:35, 1.62it/s] 98%|█████████▊| 11274/11526 [1:58:00<02:35, 1.62it/s] {'loss': 0.1541, 'grad_norm': 0.6555868983268738, 'learning_rate': 1.4555303236178375e-08, 'epoch': 2.93}
98%|█████████▊| 11274/11526 [1:58:00<02:35, 1.62it/s] 98%|█████████▊| 11275/11526 [1:58:00<02:34, 1.62it/s] {'loss': 0.1274, 'grad_norm': 0.4699592888355255, 'learning_rate': 1.4440069684412694e-08, 'epoch': 2.93}
98%|█████████▊| 11275/11526 [1:58:01<02:34, 1.62it/s] 98%|█████████▊| 11276/11526 [1:58:01<02:34, 1.62it/s] {'loss': 0.1423, 'grad_norm': 0.5768755674362183, 'learning_rate': 1.4325293436570631e-08, 'epoch': 2.93}
98%|█████████▊| 11276/11526 [1:58:01<02:34, 1.62it/s] 98%|█████████▊| 11277/11526 [1:58:02<02:33, 1.62it/s] {'loss': 0.172, 'grad_norm': 1.0008844137191772, 'learning_rate': 1.4210974503179875e-08, 'epoch': 2.94}
98%|█████████▊| 11277/11526 [1:58:02<02:33, 1.62it/s] 98%|█████████▊| 11278/11526 [1:58:02<02:32, 1.62it/s] {'loss': 0.1551, 'grad_norm': 0.6461756825447083, 'learning_rate': 1.4097112894726484e-08, 'epoch': 2.94}
98%|█████████▊| 11278/11526 [1:58:02<02:32, 1.62it/s] 98%|█████████▊| 11279/11526 [1:58:03<02:32, 1.62it/s] {'loss': 0.1544, 'grad_norm': 0.6002910137176514, 'learning_rate': 1.3983708621654324e-08, 'epoch': 2.94}
98%|█████████▊| 11279/11526 [1:58:03<02:32, 1.62it/s] 98%|█████████▊| 11280/11526 [1:58:03<02:31, 1.62it/s] {'loss': 0.1135, 'grad_norm': 0.4765825867652893, 'learning_rate': 1.387076169436563e-08, 'epoch': 2.94}
98%|█████████▊| 11280/11526 [1:58:04<02:31, 1.62it/s] 98%|█████████▊| 11281/11526 [1:58:04<02:30, 1.62it/s] {'loss': 0.1439, 'grad_norm': 0.5470151901245117, 'learning_rate': 1.3758272123221005e-08, 'epoch': 2.94}
98%|█████████▊| 11281/11526 [1:58:04<02:30, 1.62it/s] 98%|█████████▊| 11282/11526 [1:58:05<02:30, 1.63it/s] {'loss': 0.1505, 'grad_norm': 0.591168224811554, 'learning_rate': 1.3646239918538306e-08, 'epoch': 2.94}
98%|█████████▊| 11282/11526 [1:58:05<02:30, 1.63it/s] 98%|█████████▊| 11283/11526 [1:58:05<02:29, 1.63it/s] {'loss': 0.1302, 'grad_norm': 0.5745025277137756, 'learning_rate': 1.3534665090593202e-08, 'epoch': 2.94}
98%|█████████▊| 11283/11526 [1:58:05<02:29, 1.63it/s] 98%|█████████▊| 11284/11526 [1:58:06<02:28, 1.63it/s] {'loss': 0.1895, 'grad_norm': 0.6906489133834839, 'learning_rate': 1.3423547649620839e-08, 'epoch': 2.94}
98%|█████████▊| 11284/11526 [1:58:06<02:28, 1.63it/s] 98%|█████████▊| 11285/11526 [1:58:07<02:28, 1.62it/s] {'loss': 0.098, 'grad_norm': 0.503831684589386, 'learning_rate': 1.3312887605812508e-08, 'epoch': 2.94}
98%|█████████▊| 11285/11526 [1:58:07<02:28, 1.62it/s] 98%|█████████▊| 11286/11526 [1:58:07<02:27, 1.62it/s] {'loss': 0.1564, 'grad_norm': 0.5401507616043091, 'learning_rate': 1.3202684969319535e-08, 'epoch': 2.94}
98%|█████████▊| 11286/11526 [1:58:07<02:27, 1.62it/s] 98%|█████████▊| 11287/11526 [1:58:08<02:27, 1.62it/s] {'loss': 0.1867, 'grad_norm': 0.6726077795028687, 'learning_rate': 1.30929397502505e-08, 'epoch': 2.94}
98%|█████████▊| 11287/11526 [1:58:08<02:27, 1.62it/s] 98%|█████████▊| 11288/11526 [1:58:08<02:26, 1.63it/s] {'loss': 0.1347, 'grad_norm': 0.5328016877174377, 'learning_rate': 1.2983651958670684e-08, 'epoch': 2.94}
98%|█████████▊| 11288/11526 [1:58:09<02:26, 1.63it/s] 98%|█████████▊| 11289/11526 [1:58:09<02:25, 1.63it/s] {'loss': 0.1947, 'grad_norm': 0.7754055261611938, 'learning_rate': 1.2874821604605403e-08, 'epoch': 2.94}
98%|█████████▊| 11289/11526 [1:58:09<02:25, 1.63it/s] 98%|█████████▊| 11290/11526 [1:58:10<02:25, 1.62it/s] {'loss': 0.1473, 'grad_norm': 0.5462754368782043, 'learning_rate': 1.2766448698037225e-08, 'epoch': 2.94}
98%|█████████▊| 11290/11526 [1:58:10<02:25, 1.62it/s] 98%|█████████▊| 11291/11526 [1:58:10<02:24, 1.62it/s] {'loss': 0.1584, 'grad_norm': 0.6377400159835815, 'learning_rate': 1.2658533248906535e-08, 'epoch': 2.94}
98%|█████████▊| 11291/11526 [1:58:10<02:24, 1.62it/s] 98%|█████████▊| 11292/11526 [1:58:11<02:23, 1.63it/s] {'loss': 0.143, 'grad_norm': 0.5149046778678894, 'learning_rate': 1.2551075267112078e-08, 'epoch': 2.94}
98%|█████████▊| 11292/11526 [1:58:11<02:23, 1.63it/s] 98%|█████████▊| 11293/11526 [1:58:11<02:23, 1.63it/s] {'loss': 0.1204, 'grad_norm': 0.4810383915901184, 'learning_rate': 1.2444074762509861e-08, 'epoch': 2.94}
98%|█████████▊| 11293/11526 [1:58:12<02:23, 1.63it/s] 98%|█████████▊| 11294/11526 [1:58:12<02:22, 1.63it/s] {'loss': 0.1275, 'grad_norm': 0.5621935129165649, 'learning_rate': 1.2337531744915921e-08, 'epoch': 2.94}
98%|█████████▊| 11294/11526 [1:58:12<02:22, 1.63it/s] 98%|█████████▊| 11295/11526 [1:58:13<02:22, 1.62it/s] {'loss': 0.141, 'grad_norm': 0.584064245223999, 'learning_rate': 1.2231446224101329e-08, 'epoch': 2.94}
98%|█████████▊| 11295/11526 [1:58:13<02:22, 1.62it/s] 98%|█████████▊| 11296/11526 [1:58:13<02:21, 1.62it/s] {'loss': 0.1465, 'grad_norm': 0.6003232598304749, 'learning_rate': 1.21258182097983e-08, 'epoch': 2.94}
98%|█████████▊| 11296/11526 [1:58:13<02:21, 1.62it/s] 98%|█████████▊| 11297/11526 [1:58:14<02:21, 1.62it/s] {'loss': 0.1336, 'grad_norm': 0.5135493874549866, 'learning_rate': 1.20206477116952e-08, 'epoch': 2.94}
98%|█████████▊| 11297/11526 [1:58:14<02:21, 1.62it/s] 98%|█████████▊| 11298/11526 [1:58:15<02:20, 1.62it/s] {'loss': 0.1232, 'grad_norm': 0.4624166786670685, 'learning_rate': 1.1915934739438194e-08, 'epoch': 2.94}
98%|█████████▊| 11298/11526 [1:58:15<02:20, 1.62it/s] 98%|█████████▊| 11299/11526 [1:58:15<02:20, 1.62it/s] {'loss': 0.1436, 'grad_norm': 0.6410014033317566, 'learning_rate': 1.181167930263294e-08, 'epoch': 2.94}
98%|█████████▊| 11299/11526 [1:58:15<02:20, 1.62it/s] 98%|█████████▊| 11300/11526 [1:58:16<02:19, 1.62it/s] {'loss': 0.1992, 'grad_norm': 0.6574990153312683, 'learning_rate': 1.1707881410842336e-08, 'epoch': 2.94}
98%|█████████▊| 11300/11526 [1:58:16<02:19, 1.62it/s] 98%|█████████▊| 11301/11526 [1:58:16<02:18, 1.62it/s] {'loss': 0.1791, 'grad_norm': 0.6944190263748169, 'learning_rate': 1.1604541073586551e-08, 'epoch': 2.94}
98%|█████████▊| 11301/11526 [1:58:17<02:18, 1.62it/s] 98%|█████████▊| 11302/11526 [1:58:17<02:18, 1.62it/s] {'loss': 0.1308, 'grad_norm': 0.5437262058258057, 'learning_rate': 1.150165830034522e-08, 'epoch': 2.94}
98%|█████████▊| 11302/11526 [1:58:17<02:18, 1.62it/s] 98%|█████████▊| 11303/11526 [1:58:18<02:17, 1.62it/s] {'loss': 0.1935, 'grad_norm': 0.6964549422264099, 'learning_rate': 1.139923310055524e-08, 'epoch': 2.94}
98%|█████████▊| 11303/11526 [1:58:18<02:17, 1.62it/s] 98%|█████████▊| 11304/11526 [1:58:18<02:16, 1.62it/s] {'loss': 0.1524, 'grad_norm': 0.563166618347168, 'learning_rate': 1.1297265483611875e-08, 'epoch': 2.94}
98%|█████████▊| 11304/11526 [1:58:18<02:16, 1.62it/s] 98%|█████████▊| 11305/11526 [1:58:19<02:16, 1.62it/s] {'loss': 0.1423, 'grad_norm': 0.5223636031150818, 'learning_rate': 1.1195755458867641e-08, 'epoch': 2.94}
98%|█████████▊| 11305/11526 [1:58:19<02:16, 1.62it/s] 98%|█████████▊| 11306/11526 [1:58:19<02:15, 1.62it/s] {'loss': 0.1233, 'grad_norm': 0.5518066883087158, 'learning_rate': 1.1094703035633424e-08, 'epoch': 2.94}
98%|█████████▊| 11306/11526 [1:58:20<02:15, 1.62it/s] 98%|█████████▊| 11307/11526 [1:58:20<02:14, 1.62it/s] {'loss': 0.1405, 'grad_norm': 0.5169944763183594, 'learning_rate': 1.0994108223179034e-08, 'epoch': 2.94}
98%|█████████▊| 11307/11526 [1:58:20<02:14, 1.62it/s] 98%|█████████▊| 11308/11526 [1:58:21<02:14, 1.62it/s] {'loss': 0.1878, 'grad_norm': 0.6791489124298096, 'learning_rate': 1.0893971030731532e-08, 'epoch': 2.94}
98%|█████████▊| 11308/11526 [1:58:21<02:14, 1.62it/s] 98%|█████████▊| 11309/11526 [1:58:21<02:14, 1.62it/s] {'loss': 0.1421, 'grad_norm': 0.5807570219039917, 'learning_rate': 1.0794291467475792e-08, 'epoch': 2.94}
98%|█████████▊| 11309/11526 [1:58:21<02:14, 1.62it/s] 98%|█████████▊| 11310/11526 [1:58:22<02:13, 1.62it/s] {'loss': 0.145, 'grad_norm': 0.6149040460586548, 'learning_rate': 1.0695069542555059e-08, 'epoch': 2.94}
98%|█████████▊| 11310/11526 [1:58:22<02:13, 1.62it/s] 98%|█████████▊| 11311/11526 [1:58:23<02:12, 1.62it/s] {'loss': 0.1972, 'grad_norm': 0.7662574648857117, 'learning_rate': 1.0596305265070384e-08, 'epoch': 2.94}
98%|█████████▊| 11311/11526 [1:58:23<02:12, 1.62it/s] 98%|█████████▊| 11312/11526 [1:58:23<02:11, 1.62it/s] {'loss': 0.1465, 'grad_norm': 0.5519954562187195, 'learning_rate': 1.0497998644081186e-08, 'epoch': 2.94}
98%|█████████▊| 11312/11526 [1:58:23<02:11, 1.62it/s] 98%|█████████▊| 11313/11526 [1:58:24<02:11, 1.62it/s] {'loss': 0.1984, 'grad_norm': 0.7010862827301025, 'learning_rate': 1.0400149688604699e-08, 'epoch': 2.94}
98%|█████████▊| 11313/11526 [1:58:24<02:11, 1.62it/s] 98%|█████████▊| 11314/11526 [1:58:24<02:10, 1.62it/s] {'loss': 0.1286, 'grad_norm': 0.5400516986846924, 'learning_rate': 1.0302758407615965e-08, 'epoch': 2.94}
98%|█████████▊| 11314/11526 [1:58:25<02:10, 1.62it/s] 98%|█████████▊| 11315/11526 [1:58:25<02:09, 1.62it/s] {'loss': 0.1348, 'grad_norm': 0.4881977438926697, 'learning_rate': 1.0205824810048392e-08, 'epoch': 2.95}
98%|█████████▊| 11315/11526 [1:58:25<02:09, 1.62it/s] 98%|█████████▊| 11316/11526 [1:58:26<02:10, 1.62it/s] {'loss': 0.1581, 'grad_norm': 0.6128383874893188, 'learning_rate': 1.0109348904793758e-08, 'epoch': 2.95}
98%|█████████▊| 11316/11526 [1:58:26<02:10, 1.62it/s] 98%|█████████▊| 11317/11526 [1:58:26<02:09, 1.62it/s] {'loss': 0.1342, 'grad_norm': 0.5729012489318848, 'learning_rate': 1.001333070070054e-08, 'epoch': 2.95}
98%|█████████▊| 11317/11526 [1:58:26<02:09, 1.62it/s] 98%|█████████▊| 11318/11526 [1:58:27<02:08, 1.62it/s] {'loss': 0.1559, 'grad_norm': 0.5320900082588196, 'learning_rate': 9.917770206576694e-09, 'epoch': 2.95}
98%|█████████▊| 11318/11526 [1:58:27<02:08, 1.62it/s] 98%|█████████▊| 11319/11526 [1:58:28<02:07, 1.62it/s] {'loss': 0.1351, 'grad_norm': 0.5551177859306335, 'learning_rate': 9.822667431186873e-09, 'epoch': 2.95}
98%|█████████▊| 11319/11526 [1:58:28<02:07, 1.62it/s] 98%|█████████▊| 11320/11526 [1:58:28<02:06, 1.62it/s] {'loss': 0.1857, 'grad_norm': 0.7838797569274902, 'learning_rate': 9.728022383255209e-09, 'epoch': 2.95}
98%|█████████▊| 11320/11526 [1:58:28<02:06, 1.62it/s] 98%|█████████▊| 11321/11526 [1:58:29<02:06, 1.62it/s] {'loss': 0.1822, 'grad_norm': 0.7087050676345825, 'learning_rate': 9.633835071463094e-09, 'epoch': 2.95}
98%|█████████▊| 11321/11526 [1:58:29<02:06, 1.62it/s] 98%|█████████▊| 11322/11526 [1:58:29<02:05, 1.62it/s] {'loss': 0.1206, 'grad_norm': 0.5373194813728333, 'learning_rate': 9.540105504449726e-09, 'epoch': 2.95}
98%|█████████▊| 11322/11526 [1:58:29<02:05, 1.62it/s] 98%|█████████▊| 11323/11526 [1:58:30<02:04, 1.63it/s] {'loss': 0.1316, 'grad_norm': 0.5868030190467834, 'learning_rate': 9.446833690812118e-09, 'epoch': 2.95}
98%|█████████▊| 11323/11526 [1:58:30<02:04, 1.63it/s] 98%|█████████▊| 11324/11526 [1:58:31<02:04, 1.63it/s] {'loss': 0.1519, 'grad_norm': 0.610967755317688, 'learning_rate': 9.354019639105649e-09, 'epoch': 2.95}
98%|█████████▊| 11324/11526 [1:58:31<02:04, 1.63it/s] 98%|█████████▊| 11325/11526 [1:58:31<02:03, 1.62it/s] {'loss': 0.1546, 'grad_norm': 0.6662547588348389, 'learning_rate': 9.261663357844618e-09, 'epoch': 2.95}
98%|█████████▊| 11325/11526 [1:58:31<02:03, 1.62it/s] 98%|█████████▊| 11326/11526 [1:58:32<02:03, 1.62it/s] {'loss': 0.1416, 'grad_norm': 0.5664051175117493, 'learning_rate': 9.169764855500029e-09, 'epoch': 2.95}
98%|█████████▊| 11326/11526 [1:58:32<02:03, 1.62it/s] 98%|█████████▊| 11327/11526 [1:58:32<02:02, 1.63it/s] {'loss': 0.1292, 'grad_norm': 0.5371645092964172, 'learning_rate': 9.078324140501249e-09, 'epoch': 2.95}
98%|█████████▊| 11327/11526 [1:58:33<02:02, 1.63it/s] 98%|█████████▊| 11328/11526 [1:58:33<02:01, 1.63it/s] {'loss': 0.1584, 'grad_norm': 0.5707628726959229, 'learning_rate': 8.987341221235458e-09, 'epoch': 2.95}
98%|█████████▊| 11328/11526 [1:58:33<02:01, 1.63it/s] 98%|█████████▊| 11329/11526 [1:58:34<02:01, 1.63it/s] {'loss': 0.1723, 'grad_norm': 0.8159041404724121, 'learning_rate': 8.8968161060482e-09, 'epoch': 2.95}
98%|█████████▊| 11329/11526 [1:58:34<02:01, 1.63it/s] 98%|█████████▊| 11330/11526 [1:58:34<02:00, 1.63it/s] {'loss': 0.1501, 'grad_norm': 0.6970759630203247, 'learning_rate': 8.806748803243392e-09, 'epoch': 2.95}
98%|█████████▊| 11330/11526 [1:58:34<02:00, 1.63it/s] 98%|█████████▊| 11331/11526 [1:58:35<01:59, 1.63it/s] {'loss': 0.1779, 'grad_norm': 0.6370256543159485, 'learning_rate': 8.717139321082757e-09, 'epoch': 2.95}
98%|█████████▊| 11331/11526 [1:58:35<01:59, 1.63it/s] 98%|█████████▊| 11332/11526 [1:58:35<01:59, 1.63it/s] {'loss': 0.1364, 'grad_norm': 0.6092800498008728, 'learning_rate': 8.62798766778472e-09, 'epoch': 2.95}
98%|█████████▊| 11332/11526 [1:58:36<01:59, 1.63it/s] 98%|█████████▊| 11333/11526 [1:58:36<01:58, 1.63it/s] {'loss': 0.1276, 'grad_norm': 0.579016923904419, 'learning_rate': 8.539293851527741e-09, 'epoch': 2.95}
98%|█████████▊| 11333/11526 [1:58:36<01:58, 1.63it/s] 98%|█████████▊| 11334/11526 [1:58:37<01:57, 1.63it/s] {'loss': 0.1539, 'grad_norm': 0.5995076894760132, 'learning_rate': 8.451057880446978e-09, 'epoch': 2.95}
98%|█████████▊| 11334/11526 [1:58:37<01:57, 1.63it/s] 98%|█████████▊| 11335/11526 [1:58:37<01:57, 1.63it/s] {'loss': 0.1448, 'grad_norm': 0.5852020382881165, 'learning_rate': 8.363279762635956e-09, 'epoch': 2.95}
98%|█████████▊| 11335/11526 [1:58:37<01:57, 1.63it/s] 98%|█████████▊| 11336/11526 [1:58:38<01:56, 1.63it/s] {'loss': 0.1342, 'grad_norm': 0.5465186238288879, 'learning_rate': 8.275959506146014e-09, 'epoch': 2.95}
98%|█████████▊| 11336/11526 [1:58:38<01:56, 1.63it/s] 98%|█████████▊| 11337/11526 [1:58:39<01:56, 1.63it/s] {'loss': 0.1442, 'grad_norm': 0.6058158278465271, 'learning_rate': 8.1890971189863e-09, 'epoch': 2.95}
98%|█████████▊| 11337/11526 [1:58:39<01:56, 1.63it/s] 98%|█████████▊| 11338/11526 [1:58:39<01:55, 1.63it/s] {'loss': 0.1724, 'grad_norm': 0.729080080986023, 'learning_rate': 8.10269260912544e-09, 'epoch': 2.95}
98%|█████████▊| 11338/11526 [1:58:39<01:55, 1.63it/s] 98%|█████████▊| 11339/11526 [1:58:40<01:55, 1.63it/s] {'loss': 0.1414, 'grad_norm': 0.5992945432662964, 'learning_rate': 8.016745984488206e-09, 'epoch': 2.95}
98%|█████████▊| 11339/11526 [1:58:40<01:55, 1.63it/s] 98%|█████████▊| 11340/11526 [1:58:40<01:54, 1.62it/s] {'loss': 0.176, 'grad_norm': 0.6940473914146423, 'learning_rate': 7.93125725295829e-09, 'epoch': 2.95}
98%|█████████▊| 11340/11526 [1:58:41<01:54, 1.62it/s] 98%|█████████▊| 11341/11526 [1:58:41<01:53, 1.62it/s] {'loss': 0.1515, 'grad_norm': 0.6139853000640869, 'learning_rate': 7.846226422377202e-09, 'epoch': 2.95}
98%|█████████▊| 11341/11526 [1:58:41<01:53, 1.62it/s] 98%|█████████▊| 11342/11526 [1:58:42<01:53, 1.62it/s] {'loss': 0.1553, 'grad_norm': 0.6349841356277466, 'learning_rate': 7.761653500544252e-09, 'epoch': 2.95}
98%|█████████▊| 11342/11526 [1:58:42<01:53, 1.62it/s] 98%|█████████▊| 11343/11526 [1:58:42<01:52, 1.63it/s] {'loss': 0.1219, 'grad_norm': 0.499685138463974, 'learning_rate': 7.677538495217685e-09, 'epoch': 2.95}
98%|█████████▊| 11343/11526 [1:58:42<01:52, 1.63it/s] 98%|█████████▊| 11344/11526 [1:58:43<01:51, 1.63it/s] {'loss': 0.1532, 'grad_norm': 0.6199285984039307, 'learning_rate': 7.593881414111882e-09, 'epoch': 2.95}
98%|█████████▊| 11344/11526 [1:58:43<01:51, 1.63it/s] 98%|█████████▊| 11345/11526 [1:58:43<01:51, 1.62it/s] {'loss': 0.1603, 'grad_norm': 0.6589479446411133, 'learning_rate': 7.510682264900703e-09, 'epoch': 2.95}
98%|█████████▊| 11345/11526 [1:58:44<01:51, 1.62it/s] 98%|█████████▊| 11346/11526 [1:58:44<01:50, 1.62it/s] {'loss': 0.2004, 'grad_norm': 0.7089987397193909, 'learning_rate': 7.427941055215826e-09, 'epoch': 2.95}
98%|█████████▊| 11346/11526 [1:58:44<01:50, 1.62it/s] 98%|█████████▊| 11347/11526 [1:58:45<01:50, 1.62it/s] {'loss': 0.1461, 'grad_norm': 0.5288246273994446, 'learning_rate': 7.345657792647287e-09, 'epoch': 2.95}
98%|█████████▊| 11347/11526 [1:58:45<01:50, 1.62it/s] 98%|█████████▊| 11348/11526 [1:58:45<01:49, 1.63it/s] {'loss': 0.1607, 'grad_norm': 0.711068868637085, 'learning_rate': 7.263832484741828e-09, 'epoch': 2.95}
98%|█████████▊| 11348/11526 [1:58:45<01:49, 1.63it/s] 98%|█████████▊| 11349/11526 [1:58:46<01:48, 1.63it/s] {'loss': 0.1106, 'grad_norm': 0.47705355286598206, 'learning_rate': 7.1824651390051124e-09, 'epoch': 2.95}
98%|█████████▊| 11349/11526 [1:58:46<01:48, 1.63it/s] 98%|█████████▊| 11350/11526 [1:58:47<01:48, 1.63it/s] {'loss': 0.1274, 'grad_norm': 0.5821136832237244, 'learning_rate': 7.101555762900614e-09, 'epoch': 2.95}
98%|█████████▊| 11350/11526 [1:58:47<01:48, 1.63it/s] 98%|█████████▊| 11351/11526 [1:58:47<01:47, 1.62it/s] {'loss': 0.1529, 'grad_norm': 0.5850624442100525, 'learning_rate': 7.021104363849618e-09, 'epoch': 2.95}
98%|█████████▊| 11351/11526 [1:58:47<01:47, 1.62it/s] 98%|█████████▊| 11352/11526 [1:58:48<01:46, 1.63it/s] {'loss': 0.1692, 'grad_norm': 0.7710253000259399, 'learning_rate': 6.941110949232332e-09, 'epoch': 2.95}
98%|█████████▊| 11352/11526 [1:58:48<01:46, 1.63it/s] 98%|█████████▊| 11353/11526 [1:58:48<01:46, 1.63it/s] {'loss': 0.1651, 'grad_norm': 0.6064337491989136, 'learning_rate': 6.861575526385111e-09, 'epoch': 2.95}
98%|█████████▊| 11353/11526 [1:58:49<01:46, 1.63it/s] 99%|█████████▊| 11354/11526 [1:58:49<01:45, 1.63it/s] {'loss': 0.1221, 'grad_norm': 0.523252546787262, 'learning_rate': 6.78249810260434e-09, 'epoch': 2.96}
99%|█████████▊| 11354/11526 [1:58:49<01:45, 1.63it/s] 99%|█████████▊| 11355/11526 [1:58:50<01:45, 1.63it/s] {'loss': 0.1456, 'grad_norm': 0.5737572908401489, 'learning_rate': 6.703878685143106e-09, 'epoch': 2.96}
99%|█████████▊| 11355/11526 [1:58:50<01:45, 1.63it/s] 99%|█████████▊| 11356/11526 [1:58:50<01:44, 1.63it/s] {'loss': 0.1573, 'grad_norm': 0.6081668138504028, 'learning_rate': 6.625717281212862e-09, 'epoch': 2.96}
99%|█████████▊| 11356/11526 [1:58:50<01:44, 1.63it/s] 99%|█████████▊| 11357/11526 [1:58:51<01:43, 1.63it/s] {'loss': 0.1639, 'grad_norm': 0.6501994729042053, 'learning_rate': 6.548013897982874e-09, 'epoch': 2.96}
99%|█████████▊| 11357/11526 [1:58:51<01:43, 1.63it/s] 99%|█████████▊| 11358/11526 [1:58:51<01:43, 1.63it/s] {'loss': 0.1665, 'grad_norm': 0.6312968134880066, 'learning_rate': 6.470768542580219e-09, 'epoch': 2.96}
99%|█████████▊| 11358/11526 [1:58:52<01:43, 1.63it/s] 99%|█████████▊| 11359/11526 [1:58:52<01:42, 1.63it/s] {'loss': 0.2105, 'grad_norm': 0.6685278415679932, 'learning_rate': 6.3939812220908946e-09, 'epoch': 2.96}
99%|█████████▊| 11359/11526 [1:58:52<01:42, 1.63it/s] 99%|█████████▊| 11360/11526 [1:58:53<01:42, 1.63it/s] {'loss': 0.1523, 'grad_norm': 0.5713217854499817, 'learning_rate': 6.317651943558156e-09, 'epoch': 2.96}
99%|█████████▊| 11360/11526 [1:58:53<01:42, 1.63it/s] 99%|█████████▊| 11361/11526 [1:58:53<01:41, 1.63it/s] {'loss': 0.1279, 'grad_norm': 0.571735680103302, 'learning_rate': 6.241780713983625e-09, 'epoch': 2.96}
99%|█████████▊| 11361/11526 [1:58:53<01:41, 1.63it/s] 99%|█████████▊| 11362/11526 [1:58:54<01:40, 1.63it/s] {'loss': 0.1486, 'grad_norm': 0.6209164261817932, 'learning_rate': 6.166367540325624e-09, 'epoch': 2.96}
99%|█████████▊| 11362/11526 [1:58:54<01:40, 1.63it/s] 99%|█████████▊| 11363/11526 [1:58:55<01:40, 1.63it/s] {'loss': 0.1897, 'grad_norm': 0.7413982152938843, 'learning_rate': 6.091412429502508e-09, 'epoch': 2.96}
99%|█████████▊| 11363/11526 [1:58:55<01:40, 1.63it/s] 99%|█████████▊| 11364/11526 [1:58:55<01:39, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.5876460671424866, 'learning_rate': 6.016915388389333e-09, 'epoch': 2.96}
99%|█████████▊| 11364/11526 [1:58:55<01:39, 1.63it/s] 99%|█████████▊| 11365/11526 [1:58:56<01:39, 1.63it/s] {'loss': 0.141, 'grad_norm': 0.591728925704956, 'learning_rate': 5.942876423818966e-09, 'epoch': 2.96}
99%|█████████▊| 11365/11526 [1:58:56<01:39, 1.63it/s] 99%|█████████▊| 11366/11526 [1:58:56<01:38, 1.63it/s] {'loss': 0.1798, 'grad_norm': 0.6155518889427185, 'learning_rate': 5.869295542583198e-09, 'epoch': 2.96}
99%|█████████▊| 11366/11526 [1:58:57<01:38, 1.63it/s] 99%|█████████▊| 11367/11526 [1:58:57<01:37, 1.63it/s] {'loss': 0.1541, 'grad_norm': 0.6139171123504639, 'learning_rate': 5.796172751431073e-09, 'epoch': 2.96}
99%|█████████▊| 11367/11526 [1:58:57<01:37, 1.63it/s] 99%|█████████▊| 11368/11526 [1:58:58<01:37, 1.63it/s] {'loss': 0.1973, 'grad_norm': 0.6908715963363647, 'learning_rate': 5.723508057069449e-09, 'epoch': 2.96}
99%|█████████▊| 11368/11526 [1:58:58<01:37, 1.63it/s] 99%|█████████▊| 11369/11526 [1:58:58<01:36, 1.63it/s] {'loss': 0.1318, 'grad_norm': 0.5743290185928345, 'learning_rate': 5.651301466164105e-09, 'epoch': 2.96}
99%|█████████▊| 11369/11526 [1:58:58<01:36, 1.63it/s] 99%|█████████▊| 11370/11526 [1:58:59<01:36, 1.62it/s] {'loss': 0.1256, 'grad_norm': 0.503889799118042, 'learning_rate': 5.579552985338077e-09, 'epoch': 2.96}
99%|█████████▊| 11370/11526 [1:58:59<01:36, 1.62it/s] 99%|█████████▊| 11371/11526 [1:58:59<01:35, 1.62it/s] {'loss': 0.154, 'grad_norm': 0.6669270396232605, 'learning_rate': 5.508262621172766e-09, 'epoch': 2.96}
99%|█████████▊| 11371/11526 [1:59:00<01:35, 1.62it/s] 99%|█████████▊| 11372/11526 [1:59:00<01:34, 1.63it/s] {'loss': 0.1603, 'grad_norm': 0.635861873626709, 'learning_rate': 5.437430380206832e-09, 'epoch': 2.96}
99%|█████████▊| 11372/11526 [1:59:00<01:34, 1.63it/s] 99%|█████████▊| 11373/11526 [1:59:01<01:34, 1.63it/s] {'loss': 0.1427, 'grad_norm': 0.5158104300498962, 'learning_rate': 5.367056268937853e-09, 'epoch': 2.96}
99%|█████████▊| 11373/11526 [1:59:01<01:34, 1.63it/s] 99%|█████████▊| 11374/11526 [1:59:01<01:33, 1.63it/s] {'loss': 0.1407, 'grad_norm': 0.6199572682380676, 'learning_rate': 5.297140293820668e-09, 'epoch': 2.96}
99%|█████████▊| 11374/11526 [1:59:01<01:33, 1.63it/s] 99%|█████████▊| 11375/11526 [1:59:02<01:32, 1.63it/s] {'loss': 0.1069, 'grad_norm': 0.47514456510543823, 'learning_rate': 5.22768246126848e-09, 'epoch': 2.96}
99%|█████████▊| 11375/11526 [1:59:02<01:32, 1.63it/s] 99%|█████████▊| 11376/11526 [1:59:03<01:32, 1.63it/s] {'loss': 0.1916, 'grad_norm': 0.5725969672203064, 'learning_rate': 5.158682777652302e-09, 'epoch': 2.96}
99%|█████████▊| 11376/11526 [1:59:03<01:32, 1.63it/s] 99%|█████████▊| 11377/11526 [1:59:03<01:31, 1.63it/s] {'loss': 0.1442, 'grad_norm': 0.6622864603996277, 'learning_rate': 5.090141249300962e-09, 'epoch': 2.96}
99%|█████████▊| 11377/11526 [1:59:03<01:31, 1.63it/s] 99%|█████████▊| 11378/11526 [1:59:04<01:30, 1.63it/s] {'loss': 0.0914, 'grad_norm': 0.43364372849464417, 'learning_rate': 5.022057882502207e-09, 'epoch': 2.96}
99%|█████████▊| 11378/11526 [1:59:04<01:30, 1.63it/s] 99%|█████████▊| 11379/11526 [1:59:04<01:30, 1.63it/s] {'loss': 0.12, 'grad_norm': 0.5426481366157532, 'learning_rate': 4.954432683500487e-09, 'epoch': 2.96}
99%|█████████▊| 11379/11526 [1:59:05<01:30, 1.63it/s] 99%|█████████▊| 11380/11526 [1:59:05<01:29, 1.63it/s] {'loss': 0.1284, 'grad_norm': 0.4966309070587158, 'learning_rate': 4.887265658498619e-09, 'epoch': 2.96}
99%|█████████▊| 11380/11526 [1:59:05<01:29, 1.63it/s] 99%|█████████▊| 11381/11526 [1:59:06<01:29, 1.63it/s] {'loss': 0.1515, 'grad_norm': 0.6164245009422302, 'learning_rate': 4.820556813657784e-09, 'epoch': 2.96}
99%|█████████▊| 11381/11526 [1:59:06<01:29, 1.63it/s] 99%|█████████▉| 11382/11526 [1:59:06<01:28, 1.63it/s] {'loss': 0.1407, 'grad_norm': 0.6248553395271301, 'learning_rate': 4.754306155096977e-09, 'epoch': 2.96}
99%|█████████▉| 11382/11526 [1:59:06<01:28, 1.63it/s] 99%|█████████▉| 11383/11526 [1:59:07<01:27, 1.63it/s] {'loss': 0.1477, 'grad_norm': 0.583594024181366, 'learning_rate': 4.688513688893004e-09, 'epoch': 2.96}
99%|█████████▉| 11383/11526 [1:59:07<01:27, 1.63it/s] 99%|█████████▉| 11384/11526 [1:59:07<01:27, 1.63it/s] {'loss': 0.1836, 'grad_norm': 0.627795934677124, 'learning_rate': 4.6231794210804816e-09, 'epoch': 2.96}
99%|█████████▉| 11384/11526 [1:59:08<01:27, 1.63it/s] 99%|█████████▉| 11385/11526 [1:59:08<01:26, 1.63it/s] {'loss': 0.1371, 'grad_norm': 0.5155497789382935, 'learning_rate': 4.5583033576529486e-09, 'epoch': 2.96}
99%|█████████▉| 11385/11526 [1:59:08<01:26, 1.63it/s] 99%|█████████▉| 11386/11526 [1:59:09<01:26, 1.63it/s] {'loss': 0.1115, 'grad_norm': 0.5552707314491272, 'learning_rate': 4.4938855045600915e-09, 'epoch': 2.96}
99%|█████████▉| 11386/11526 [1:59:09<01:26, 1.63it/s] 99%|█████████▉| 11387/11526 [1:59:09<01:25, 1.63it/s] {'loss': 0.1631, 'grad_norm': 0.5898129940032959, 'learning_rate': 4.429925867711626e-09, 'epoch': 2.96}
99%|█████████▉| 11387/11526 [1:59:09<01:25, 1.63it/s] 99%|█████████▉| 11388/11526 [1:59:10<01:24, 1.63it/s] {'loss': 0.125, 'grad_norm': 0.4963190257549286, 'learning_rate': 4.366424452973972e-09, 'epoch': 2.96}
99%|█████████▉| 11388/11526 [1:59:10<01:24, 1.63it/s] 99%|█████████▉| 11389/11526 [1:59:11<01:24, 1.63it/s] {'loss': 0.1628, 'grad_norm': 0.6717686057090759, 'learning_rate': 4.303381266171913e-09, 'epoch': 2.96}
99%|█████████▉| 11389/11526 [1:59:11<01:24, 1.63it/s] 99%|█████████▉| 11390/11526 [1:59:11<01:23, 1.63it/s] {'loss': 0.145, 'grad_norm': 0.6157485842704773, 'learning_rate': 4.240796313088047e-09, 'epoch': 2.96}
99%|█████████▉| 11390/11526 [1:59:11<01:23, 1.63it/s] 99%|█████████▉| 11391/11526 [1:59:12<01:23, 1.63it/s] {'loss': 0.1474, 'grad_norm': 0.6127716898918152, 'learning_rate': 4.178669599462781e-09, 'epoch': 2.96}
99%|█████████▉| 11391/11526 [1:59:12<01:23, 1.63it/s] 99%|█████████▉| 11392/11526 [1:59:12<01:22, 1.63it/s] {'loss': 0.1533, 'grad_norm': 0.6435173749923706, 'learning_rate': 4.117001130995446e-09, 'epoch': 2.97}
99%|█████████▉| 11392/11526 [1:59:13<01:22, 1.63it/s] 99%|█████████▉| 11393/11526 [1:59:13<01:21, 1.63it/s] {'loss': 0.1398, 'grad_norm': 0.5472415089607239, 'learning_rate': 4.055790913342072e-09, 'epoch': 2.97}
99%|█████████▉| 11393/11526 [1:59:13<01:21, 1.63it/s] 99%|█████████▉| 11394/11526 [1:59:14<01:21, 1.63it/s] {'loss': 0.1519, 'grad_norm': 0.5706294178962708, 'learning_rate': 3.995038952117058e-09, 'epoch': 2.97}
99%|█████████▉| 11394/11526 [1:59:14<01:21, 1.63it/s] 99%|█████████▉| 11395/11526 [1:59:14<01:20, 1.63it/s] {'loss': 0.122, 'grad_norm': 0.5117437243461609, 'learning_rate': 3.9347452528931684e-09, 'epoch': 2.97}
99%|█████████▉| 11395/11526 [1:59:14<01:20, 1.63it/s] 99%|█████████▉| 11396/11526 [1:59:15<01:20, 1.62it/s] {'loss': 0.1399, 'grad_norm': 0.5093418955802917, 'learning_rate': 3.874909821200978e-09, 'epoch': 2.97}
99%|█████████▉| 11396/11526 [1:59:15<01:20, 1.62it/s] 99%|█████████▉| 11397/11526 [1:59:15<01:19, 1.62it/s] {'loss': 0.1283, 'grad_norm': 0.501716136932373, 'learning_rate': 3.815532662528875e-09, 'epoch': 2.97}
99%|█████████▉| 11397/11526 [1:59:16<01:19, 1.62it/s] 99%|█████████▉| 11398/11526 [1:59:16<01:18, 1.62it/s] {'loss': 0.1314, 'grad_norm': 0.621728777885437, 'learning_rate': 3.75661378232306e-09, 'epoch': 2.97}
99%|█████████▉| 11398/11526 [1:59:16<01:18, 1.62it/s] 99%|█████████▉| 11399/11526 [1:59:17<01:18, 1.63it/s] {'loss': 0.1749, 'grad_norm': 0.5635420083999634, 'learning_rate': 3.698153185988651e-09, 'epoch': 2.97}
99%|█████████▉| 11399/11526 [1:59:17<01:18, 1.63it/s] 99%|█████████▉| 11400/11526 [1:59:17<01:17, 1.62it/s] {'loss': 0.1495, 'grad_norm': 0.6737257242202759, 'learning_rate': 3.6401508788869167e-09, 'epoch': 2.97}
99%|█████████▉| 11400/11526 [1:59:17<01:17, 1.62it/s]
0%| | 0/13 [00:00<?, ?it/s]
15%|█▌ | 2/13 [00:00<00:00, 13.19it/s]
31%|███ | 4/13 [00:00<00:01, 8.37it/s]
38%|███▊ | 5/13 [00:00<00:01, 7.78it/s]
46%|████▌ | 6/13 [00:00<00:00, 7.41it/s]
54%|█████▍ | 7/13 [00:00<00:00, 7.17it/s]
62%|██████▏ | 8/13 [00:01<00:00, 7.00it/s]
69%|██████▉ | 9/13 [00:01<00:00, 6.89it/s]
77%|███████▋ | 10/13 [00:01<00:00, 6.82it/s]
85%|████████▍ | 11/13 [00:01<00:00, 6.77it/s]
92%|█████████▏| 12/13 [00:01<00:00, 6.73it/s]
100%|██████████| 13/13 [00:01<00:00, 6.74it/s]
{'eval_loss': 0.5424109101295471, 'eval_runtime': 1.9577, 'eval_samples_per_second': 102.159, 'eval_steps_per_second': 6.64, 'epoch': 2.97}
99%|█████████▉| 11400/11526 [1:59:19<01:17, 1.62it/s]
 99%|█████████▉| 11401/11526 [1:59:20<02:30, 1.20s/it] {'loss': 0.1443, 'grad_norm': 0.7185093760490417, 'learning_rate': 3.5826068663386006e-09, 'epoch': 2.97}
99%|█████████▉| 11401/11526 [1:59:20<02:30, 1.20s/it] 99%|█████████▉| 11402/11526 [1:59:21<02:07, 1.03s/it] {'loss': 0.1669, 'grad_norm': 0.7059828639030457, 'learning_rate': 3.525521153622258e-09, 'epoch': 2.97}
99%|█████████▉| 11402/11526 [1:59:21<02:07, 1.03s/it] 99%|█████████▉| 11403/11526 [1:59:21<01:51, 1.11it/s] {'loss': 0.1794, 'grad_norm': 0.6493867635726929, 'learning_rate': 3.4688937459737004e-09, 'epoch': 2.97}
99%|█████████▉| 11403/11526 [1:59:21<01:51, 1.11it/s] 99%|█████████▉| 11404/11526 [1:59:22<01:39, 1.23it/s] {'loss': 0.1428, 'grad_norm': 0.6364675760269165, 'learning_rate': 3.412724648587107e-09, 'epoch': 2.97}
99%|█████████▉| 11404/11526 [1:59:22<01:39, 1.23it/s] 99%|█████████▉| 11405/11526 [1:59:22<01:31, 1.32it/s] {'loss': 0.1045, 'grad_norm': 0.42115747928619385, 'learning_rate': 3.357013866615022e-09, 'epoch': 2.97}
99%|█████████▉| 11405/11526 [1:59:22<01:31, 1.32it/s] 99%|█████████▉| 11406/11526 [1:59:23<01:25, 1.40it/s] {'loss': 0.1261, 'grad_norm': 0.5442671775817871, 'learning_rate': 3.3017614051678026e-09, 'epoch': 2.97}
99%|█████████▉| 11406/11526 [1:59:23<01:25, 1.40it/s] 99%|█████████▉| 11407/11526 [1:59:24<01:21, 1.46it/s] {'loss': 0.1898, 'grad_norm': 0.6838012933731079, 'learning_rate': 3.2469672693125067e-09, 'epoch': 2.97}
99%|█████████▉| 11407/11526 [1:59:24<01:21, 1.46it/s] 99%|█████████▉| 11408/11526 [1:59:24<01:18, 1.51it/s] {'loss': 0.163, 'grad_norm': 0.5598217248916626, 'learning_rate': 3.1926314640756684e-09, 'epoch': 2.97}
99%|█████████▉| 11408/11526 [1:59:24<01:18, 1.51it/s] 99%|█████████▉| 11409/11526 [1:59:25<01:15, 1.54it/s] {'loss': 0.1294, 'grad_norm': 0.566270112991333, 'learning_rate': 3.1387539944416347e-09, 'epoch': 2.97}
99%|█████████▉| 11409/11526 [1:59:25<01:15, 1.54it/s] 99%|█████████▉| 11410/11526 [1:59:25<01:13, 1.57it/s] {'loss': 0.1242, 'grad_norm': 0.4976848363876343, 'learning_rate': 3.0853348653520077e-09, 'epoch': 2.97}
99%|█████████▉| 11410/11526 [1:59:26<01:13, 1.57it/s] 99%|█████████▉| 11411/11526 [1:59:26<01:12, 1.58it/s] {'loss': 0.1378, 'grad_norm': 0.6319066286087036, 'learning_rate': 3.0323740817067572e-09, 'epoch': 2.97}
99%|█████████▉| 11411/11526 [1:59:26<01:12, 1.58it/s] 99%|█████████▉| 11412/11526 [1:59:27<01:11, 1.60it/s] {'loss': 0.1207, 'grad_norm': 0.5464583039283752, 'learning_rate': 2.9798716483636635e-09, 'epoch': 2.97}
99%|█████████▉| 11412/11526 [1:59:27<01:11, 1.60it/s] 99%|█████████▉| 11413/11526 [1:59:27<01:10, 1.61it/s] {'loss': 0.1704, 'grad_norm': 0.7407466173171997, 'learning_rate': 2.92782757013832e-09, 'epoch': 2.97}
99%|█████████▉| 11413/11526 [1:59:27<01:10, 1.61it/s] 99%|█████████▉| 11414/11526 [1:59:28<01:09, 1.61it/s] {'loss': 0.154, 'grad_norm': 0.5685294270515442, 'learning_rate': 2.876241851805239e-09, 'epoch': 2.97}
99%|█████████▉| 11414/11526 [1:59:28<01:09, 1.61it/s] 99%|█████████▉| 11415/11526 [1:59:28<01:08, 1.62it/s] {'loss': 0.1345, 'grad_norm': 0.5435574054718018, 'learning_rate': 2.825114498095638e-09, 'epoch': 2.97}
99%|█████████▉| 11415/11526 [1:59:29<01:08, 1.62it/s] 99%|█████████▉| 11416/11526 [1:59:29<01:07, 1.62it/s] {'loss': 0.1559, 'grad_norm': 0.5958377122879028, 'learning_rate': 2.7744455136990976e-09, 'epoch': 2.97}
99%|█████████▉| 11416/11526 [1:59:29<01:07, 1.62it/s] 99%|█████████▉| 11417/11526 [1:59:30<01:07, 1.62it/s] {'loss': 0.1783, 'grad_norm': 0.6609383225440979, 'learning_rate': 2.724234903263567e-09, 'epoch': 2.97}
99%|█████████▉| 11417/11526 [1:59:30<01:07, 1.62it/s] 99%|█████████▉| 11418/11526 [1:59:30<01:06, 1.62it/s] {'loss': 0.151, 'grad_norm': 0.5967975854873657, 'learning_rate': 2.674482671394807e-09, 'epoch': 2.97}
99%|█████████▉| 11418/11526 [1:59:30<01:06, 1.62it/s] 99%|█████████▉| 11419/11526 [1:59:31<01:05, 1.62it/s] {'loss': 0.1573, 'grad_norm': 0.6410059332847595, 'learning_rate': 2.625188822655833e-09, 'epoch': 2.97}
99%|█████████▉| 11419/11526 [1:59:31<01:05, 1.62it/s] 99%|█████████▉| 11420/11526 [1:59:32<01:05, 1.62it/s] {'loss': 0.1399, 'grad_norm': 0.5403884053230286, 'learning_rate': 2.5763533615685844e-09, 'epoch': 2.97}
99%|█████████▉| 11420/11526 [1:59:32<01:05, 1.62it/s] 99%|█████████▉| 11421/11526 [1:59:32<01:04, 1.62it/s] {'loss': 0.1952, 'grad_norm': 0.7870845794677734, 'learning_rate': 2.5279762926122554e-09, 'epoch': 2.97}
99%|█████████▉| 11421/11526 [1:59:32<01:04, 1.62it/s] 99%|█████████▉| 11422/11526 [1:59:33<01:03, 1.63it/s] {'loss': 0.1582, 'grad_norm': 0.5475623607635498, 'learning_rate': 2.4800576202238524e-09, 'epoch': 2.97}
99%|█████████▉| 11422/11526 [1:59:33<01:03, 1.63it/s] 99%|█████████▉| 11423/11526 [1:59:33<01:03, 1.63it/s] {'loss': 0.1372, 'grad_norm': 0.5708506107330322, 'learning_rate': 2.432597348799859e-09, 'epoch': 2.97}
99%|█████████▉| 11423/11526 [1:59:34<01:03, 1.63it/s] 99%|█████████▉| 11424/11526 [1:59:34<01:02, 1.63it/s] {'loss': 0.1292, 'grad_norm': 0.5492356419563293, 'learning_rate': 2.385595482692904e-09, 'epoch': 2.97}
99%|█████████▉| 11424/11526 [1:59:34<01:02, 1.63it/s] 99%|█████████▉| 11425/11526 [1:59:35<01:02, 1.62it/s] {'loss': 0.1585, 'grad_norm': 0.619196891784668, 'learning_rate': 2.339052026214539e-09, 'epoch': 2.97}
99%|█████████▉| 11425/11526 [1:59:35<01:02, 1.62it/s] 99%|█████████▉| 11426/11526 [1:59:35<01:01, 1.62it/s] {'loss': 0.1531, 'grad_norm': 0.6456537842750549, 'learning_rate': 2.292966983633571e-09, 'epoch': 2.97}
99%|█████████▉| 11426/11526 [1:59:35<01:01, 1.62it/s] 99%|█████████▉| 11427/11526 [1:59:36<01:01, 1.62it/s] {'loss': 0.1679, 'grad_norm': 0.6270414590835571, 'learning_rate': 2.2473403591777297e-09, 'epoch': 2.97}
99%|█████████▉| 11427/11526 [1:59:36<01:01, 1.62it/s] 99%|█████████▉| 11428/11526 [1:59:37<01:00, 1.62it/s] {'loss': 0.146, 'grad_norm': 0.5685920119285583, 'learning_rate': 2.202172157032001e-09, 'epoch': 2.97}
99%|█████████▉| 11428/11526 [1:59:37<01:00, 1.62it/s] 99%|█████████▉| 11429/11526 [1:59:37<00:59, 1.62it/s] {'loss': 0.1445, 'grad_norm': 0.5671069025993347, 'learning_rate': 2.1574623813391814e-09, 'epoch': 2.97}
99%|█████████▉| 11429/11526 [1:59:37<00:59, 1.62it/s] 99%|█████████▉| 11430/11526 [1:59:38<00:59, 1.62it/s] {'loss': 0.167, 'grad_norm': 0.6410182118415833, 'learning_rate': 2.1132110361998802e-09, 'epoch': 2.98}
99%|█████████▉| 11430/11526 [1:59:38<00:59, 1.62it/s] 99%|█████████▉| 11431/11526 [1:59:38<00:58, 1.62it/s] {'loss': 0.1276, 'grad_norm': 0.5474736094474792, 'learning_rate': 2.069418125674183e-09, 'epoch': 2.98}
99%|█████████▉| 11431/11526 [1:59:38<00:58, 1.62it/s] 99%|█████████▉| 11432/11526 [1:59:39<00:57, 1.62it/s] {'loss': 0.1503, 'grad_norm': 0.6460645794868469, 'learning_rate': 2.026083653778321e-09, 'epoch': 2.98}
99%|█████████▉| 11432/11526 [1:59:39<00:57, 1.62it/s] 99%|█████████▉| 11433/11526 [1:59:40<00:57, 1.63it/s] {'loss': 0.1687, 'grad_norm': 0.6892236471176147, 'learning_rate': 1.983207624488004e-09, 'epoch': 2.98}
99%|█████████▉| 11433/11526 [1:59:40<00:57, 1.63it/s] 99%|█████████▉| 11434/11526 [1:59:40<00:56, 1.62it/s] {'loss': 0.1491, 'grad_norm': 0.5772979855537415, 'learning_rate': 1.9407900417345304e-09, 'epoch': 2.98}
99%|█████████▉| 11434/11526 [1:59:40<00:56, 1.62it/s] 99%|█████████▉| 11435/11526 [1:59:41<00:56, 1.62it/s] {'loss': 0.1759, 'grad_norm': 0.5903696417808533, 'learning_rate': 1.898830909410343e-09, 'epoch': 2.98}
99%|█████████▉| 11435/11526 [1:59:41<00:56, 1.62it/s] 99%|█████████▉| 11436/11526 [1:59:41<00:55, 1.63it/s] {'loss': 0.1644, 'grad_norm': 0.5872316360473633, 'learning_rate': 1.8573302313629193e-09, 'epoch': 2.98}
99%|█████████▉| 11436/11526 [1:59:42<00:55, 1.63it/s] 99%|█████████▉| 11437/11526 [1:59:42<00:54, 1.63it/s] {'loss': 0.1489, 'grad_norm': 0.5859439969062805, 'learning_rate': 1.8162880113992142e-09, 'epoch': 2.98}
99%|█████████▉| 11437/11526 [1:59:42<00:54, 1.63it/s] 99%|█████████▉| 11438/11526 [1:59:43<00:54, 1.63it/s] {'loss': 0.1291, 'grad_norm': 0.5087520480155945, 'learning_rate': 1.7757042532845492e-09, 'epoch': 2.98}
99%|█████████▉| 11438/11526 [1:59:43<00:54, 1.63it/s] 99%|█████████▉| 11439/11526 [1:59:43<00:53, 1.63it/s] {'loss': 0.1467, 'grad_norm': 0.6201165914535522, 'learning_rate': 1.7355789607409469e-09, 'epoch': 2.98}
99%|█████████▉| 11439/11526 [1:59:43<00:53, 1.63it/s] 99%|█████████▉| 11440/11526 [1:59:44<00:52, 1.62it/s] {'loss': 0.1307, 'grad_norm': 0.5336241126060486, 'learning_rate': 1.6959121374487963e-09, 'epoch': 2.98}
99%|█████████▉| 11440/11526 [1:59:44<00:52, 1.62it/s] 99%|█████████▉| 11441/11526 [1:59:45<00:52, 1.63it/s] {'loss': 0.1467, 'grad_norm': 0.5131561160087585, 'learning_rate': 1.656703787046854e-09, 'epoch': 2.98}
99%|█████████▉| 11441/11526 [1:59:45<00:52, 1.63it/s] 99%|█████████▉| 11442/11526 [1:59:45<00:51, 1.63it/s] {'loss': 0.1375, 'grad_norm': 0.6338952779769897, 'learning_rate': 1.6179539131316868e-09, 'epoch': 2.98}
99%|█████████▉| 11442/11526 [1:59:45<00:51, 1.63it/s] 99%|█████████▉| 11443/11526 [1:59:46<00:51, 1.63it/s] {'loss': 0.1731, 'grad_norm': 0.6345216035842896, 'learning_rate': 1.5796625192571192e-09, 'epoch': 2.98}
99%|█████████▉| 11443/11526 [1:59:46<00:51, 1.63it/s] 99%|█████████▉| 11444/11526 [1:59:46<00:50, 1.63it/s] {'loss': 0.1924, 'grad_norm': 0.6947850584983826, 'learning_rate': 1.5418296089358964e-09, 'epoch': 2.98}
99%|█████████▉| 11444/11526 [1:59:46<00:50, 1.63it/s] 99%|█████████▉| 11445/11526 [1:59:47<00:49, 1.62it/s] {'loss': 0.2006, 'grad_norm': 0.6766084432601929, 'learning_rate': 1.5044551856380207e-09, 'epoch': 2.98}
99%|█████████▉| 11445/11526 [1:59:47<00:49, 1.62it/s] 99%|█████████▉| 11446/11526 [1:59:48<00:49, 1.63it/s] {'loss': 0.1323, 'grad_norm': 0.48744624853134155, 'learning_rate': 1.4675392527918608e-09, 'epoch': 2.98}
99%|█████████▉| 11446/11526 [1:59:48<00:49, 1.63it/s] 99%|█████████▉| 11447/11526 [1:59:48<00:48, 1.63it/s] {'loss': 0.158, 'grad_norm': 0.6428831219673157, 'learning_rate': 1.4310818137830419e-09, 'epoch': 2.98}
99%|█████████▉| 11447/11526 [1:59:48<00:48, 1.63it/s] 99%|█████████▉| 11448/11526 [1:59:49<00:47, 1.63it/s] {'loss': 0.14, 'grad_norm': 0.5428009033203125, 'learning_rate': 1.3950828719566656e-09, 'epoch': 2.98}
99%|█████████▉| 11448/11526 [1:59:49<00:47, 1.63it/s] 99%|█████████▉| 11449/11526 [1:59:49<00:47, 1.62it/s] {'loss': 0.1239, 'grad_norm': 0.5145309567451477, 'learning_rate': 1.3595424306139803e-09, 'epoch': 2.98}
99%|█████████▉| 11449/11526 [1:59:50<00:47, 1.62it/s] 99%|█████████▉| 11450/11526 [1:59:50<00:46, 1.62it/s] {'loss': 0.1489, 'grad_norm': 0.5729084014892578, 'learning_rate': 1.3244604930151562e-09, 'epoch': 2.98}
99%|█████████▉| 11450/11526 [1:59:50<00:46, 1.62it/s] 99%|█████████▉| 11451/11526 [1:59:51<00:46, 1.61it/s] {'loss': 0.1501, 'grad_norm': 0.5610694885253906, 'learning_rate': 1.289837062377619e-09, 'epoch': 2.98}
99%|█████████▉| 11451/11526 [1:59:51<00:46, 1.61it/s] 99%|█████████▉| 11452/11526 [1:59:51<00:45, 1.62it/s] {'loss': 0.1408, 'grad_norm': 0.588186502456665, 'learning_rate': 1.2556721418782725e-09, 'epoch': 2.98}
99%|█████████▉| 11452/11526 [1:59:51<00:45, 1.62it/s] 99%|█████████▉| 11453/11526 [1:59:52<00:45, 1.62it/s] {'loss': 0.1414, 'grad_norm': 0.6011470556259155, 'learning_rate': 1.2219657346501657e-09, 'epoch': 2.98}
99%|█████████▉| 11453/11526 [1:59:52<00:45, 1.62it/s] 99%|█████████▉| 11454/11526 [1:59:53<00:44, 1.62it/s] {'loss': 0.1468, 'grad_norm': 0.5858942866325378, 'learning_rate': 1.1887178437852698e-09, 'epoch': 2.98}
99%|█████████▉| 11454/11526 [1:59:53<00:44, 1.62it/s] 99%|█████████▉| 11455/11526 [1:59:53<00:43, 1.62it/s] {'loss': 0.1761, 'grad_norm': 0.6736816167831421, 'learning_rate': 1.1559284723333675e-09, 'epoch': 2.98}
99%|█████████▉| 11455/11526 [1:59:53<00:43, 1.62it/s] 99%|█████████▉| 11456/11526 [1:59:54<00:43, 1.62it/s] {'loss': 0.1363, 'grad_norm': 0.5413865447044373, 'learning_rate': 1.123597623302053e-09, 'epoch': 2.98}
99%|█████████▉| 11456/11526 [1:59:54<00:43, 1.62it/s] 99%|█████████▉| 11457/11526 [1:59:54<00:42, 1.62it/s] {'loss': 0.166, 'grad_norm': 0.6700513362884521, 'learning_rate': 1.091725299656732e-09, 'epoch': 2.98}
99%|█████████▉| 11457/11526 [1:59:54<00:42, 1.62it/s] 99%|█████████▉| 11458/11526 [1:59:55<00:41, 1.63it/s] {'loss': 0.1487, 'grad_norm': 0.6524859070777893, 'learning_rate': 1.0603115043206213e-09, 'epoch': 2.98}
99%|█████████▉| 11458/11526 [1:59:55<00:41, 1.63it/s] 99%|█████████▉| 11459/11526 [1:59:56<00:41, 1.63it/s] {'loss': 0.1581, 'grad_norm': 0.8047919869422913, 'learning_rate': 1.0293562401758605e-09, 'epoch': 2.98}
99%|█████████▉| 11459/11526 [1:59:56<00:41, 1.63it/s] 99%|█████████▉| 11460/11526 [1:59:56<00:40, 1.62it/s] {'loss': 0.1321, 'grad_norm': 0.5759990215301514, 'learning_rate': 9.988595100612896e-10, 'epoch': 2.98}
99%|█████████▉| 11460/11526 [1:59:56<00:40, 1.62it/s] 99%|█████████▉| 11461/11526 [1:59:57<00:40, 1.62it/s] {'loss': 0.1571, 'grad_norm': 0.5865100622177124, 'learning_rate': 9.688213167746706e-10, 'epoch': 2.98}
99%|█████████▉| 11461/11526 [1:59:57<00:40, 1.62it/s] 99%|█████████▉| 11462/11526 [1:59:57<00:39, 1.63it/s] {'loss': 0.169, 'grad_norm': 0.6157423257827759, 'learning_rate': 9.39241663071022e-10, 'epoch': 2.98}
99%|█████████▉| 11462/11526 [1:59:58<00:39, 1.63it/s] 99%|█████████▉| 11463/11526 [1:59:58<00:38, 1.63it/s] {'loss': 0.1264, 'grad_norm': 0.4850750267505646, 'learning_rate': 9.101205516637291e-10, 'epoch': 2.98}
99%|█████████▉| 11463/11526 [1:59:58<00:38, 1.63it/s] 99%|█████████▉| 11464/11526 [1:59:59<00:38, 1.62it/s] {'loss': 0.1398, 'grad_norm': 0.5201992392539978, 'learning_rate': 8.814579852234328e-10, 'epoch': 2.98}
99%|█████████▉| 11464/11526 [1:59:59<00:38, 1.62it/s] 99%|█████████▉| 11465/11526 [1:59:59<00:37, 1.62it/s] {'loss': 0.143, 'grad_norm': 0.5964159369468689, 'learning_rate': 8.532539663796968e-10, 'epoch': 2.98}
99%|█████████▉| 11465/11526 [1:59:59<00:37, 1.62it/s] 99%|█████████▉| 11466/11526 [2:00:00<00:36, 1.62it/s] {'loss': 0.1681, 'grad_norm': 0.6532237529754639, 'learning_rate': 8.255084977198957e-10, 'epoch': 2.98}
99%|█████████▉| 11466/11526 [2:00:00<00:36, 1.62it/s] 99%|█████████▉| 11467/11526 [2:00:01<00:36, 1.62it/s] {'loss': 0.2252, 'grad_norm': 0.7311219573020935, 'learning_rate': 7.982215817881056e-10, 'epoch': 2.98}
99%|█████████▉| 11467/11526 [2:00:01<00:36, 1.62it/s] 99%|█████████▉| 11468/11526 [2:00:01<00:35, 1.63it/s] {'loss': 0.1707, 'grad_norm': 0.6453463435173035, 'learning_rate': 7.713932210884345e-10, 'epoch': 2.98}
99%|█████████▉| 11468/11526 [2:00:01<00:35, 1.63it/s] 100%|█████████▉| 11469/11526 [2:00:02<00:35, 1.62it/s] {'loss': 0.1359, 'grad_norm': 0.5668453574180603, 'learning_rate': 7.450234180800264e-10, 'epoch': 2.99}
100%|█████████▉| 11469/11526 [2:00:02<00:35, 1.62it/s] 100%|█████████▉| 11470/11526 [2:00:02<00:34, 1.62it/s] {'loss': 0.134, 'grad_norm': 0.7122533917427063, 'learning_rate': 7.191121751831675e-10, 'epoch': 2.99}
100%|█████████▉| 11470/11526 [2:00:03<00:34, 1.62it/s] 100%|█████████▉| 11471/11526 [2:00:03<00:33, 1.62it/s] {'loss': 0.1646, 'grad_norm': 0.5300202965736389, 'learning_rate': 6.936594947742903e-10, 'epoch': 2.99}
100%|█████████▉| 11471/11526 [2:00:03<00:33, 1.62it/s] 100%|█████████▉| 11472/11526 [2:00:04<00:33, 1.62it/s] {'loss': 0.1792, 'grad_norm': 0.6572644114494324, 'learning_rate': 6.686653791876386e-10, 'epoch': 2.99}
100%|█████████▉| 11472/11526 [2:00:04<00:33, 1.62it/s] 100%|█████████▉| 11473/11526 [2:00:04<00:32, 1.63it/s] {'loss': 0.1704, 'grad_norm': 0.603084146976471, 'learning_rate': 6.441298307158228e-10, 'epoch': 2.99}
100%|█████████▉| 11473/11526 [2:00:04<00:32, 1.63it/s] 100%|█████████▉| 11474/11526 [2:00:05<00:31, 1.63it/s] {'loss': 0.1784, 'grad_norm': 0.758007287979126, 'learning_rate': 6.200528516098204e-10, 'epoch': 2.99}
100%|█████████▉| 11474/11526 [2:00:05<00:31, 1.63it/s] 100%|█████████▉| 11475/11526 [2:00:05<00:31, 1.62it/s] {'loss': 0.1969, 'grad_norm': 0.6595556139945984, 'learning_rate': 5.964344440778647e-10, 'epoch': 2.99}
100%|█████████▉| 11475/11526 [2:00:06<00:31, 1.62it/s] 100%|█████████▉| 11476/11526 [2:00:06<00:30, 1.62it/s] {'loss': 0.1417, 'grad_norm': 0.531283438205719, 'learning_rate': 5.73274610286556e-10, 'epoch': 2.99}
100%|█████████▉| 11476/11526 [2:00:06<00:30, 1.62it/s] 100%|█████████▉| 11477/11526 [2:00:07<00:30, 1.63it/s] {'loss': 0.1599, 'grad_norm': 0.5592915415763855, 'learning_rate': 5.505733523597512e-10, 'epoch': 2.99}
100%|█████████▉| 11477/11526 [2:00:07<00:30, 1.63it/s] 100%|█████████▉| 11478/11526 [2:00:07<00:29, 1.63it/s] {'loss': 0.1867, 'grad_norm': 0.6660749316215515, 'learning_rate': 5.283306723802284e-10, 'epoch': 2.99}
100%|█████████▉| 11478/11526 [2:00:07<00:29, 1.63it/s] 100%|█████████▉| 11479/11526 [2:00:08<00:28, 1.63it/s] {'loss': 0.1854, 'grad_norm': 0.6415063738822937, 'learning_rate': 5.065465723874674e-10, 'epoch': 2.99}
100%|█████████▉| 11479/11526 [2:00:08<00:28, 1.63it/s] 100%|█████████▉| 11480/11526 [2:00:09<00:28, 1.63it/s] {'loss': 0.1466, 'grad_norm': 0.6156145930290222, 'learning_rate': 4.852210543809799e-10, 'epoch': 2.99}
100%|█████████▉| 11480/11526 [2:00:09<00:28, 1.63it/s] 100%|█████████▉| 11481/11526 [2:00:09<00:27, 1.62it/s] {'loss': 0.1729, 'grad_norm': 0.6700217127799988, 'learning_rate': 4.643541203158686e-10, 'epoch': 2.99}
100%|█████████▉| 11481/11526 [2:00:09<00:27, 1.62it/s] 100%|█████████▉| 11482/11526 [2:00:10<00:27, 1.63it/s] {'loss': 0.1091, 'grad_norm': 0.5034483671188354, 'learning_rate': 4.4394577210615795e-10, 'epoch': 2.99}
100%|█████████▉| 11482/11526 [2:00:10<00:27, 1.63it/s] 100%|█████████▉| 11483/11526 [2:00:10<00:26, 1.63it/s] {'loss': 0.1445, 'grad_norm': 0.5793138742446899, 'learning_rate': 4.239960116242392e-10, 'epoch': 2.99}
100%|█████████▉| 11483/11526 [2:00:10<00:26, 1.63it/s] 100%|█████████▉| 11484/11526 [2:00:11<00:25, 1.63it/s] {'loss': 0.1647, 'grad_norm': 0.7162911295890808, 'learning_rate': 4.045048406997598e-10, 'epoch': 2.99}
100%|█████████▉| 11484/11526 [2:00:11<00:25, 1.63it/s] 100%|█████████▉| 11485/11526 [2:00:12<00:25, 1.62it/s] {'loss': 0.22, 'grad_norm': 0.7727014422416687, 'learning_rate': 3.854722611201789e-10, 'epoch': 2.99}
100%|█████████▉| 11485/11526 [2:00:12<00:25, 1.62it/s] 100%|█████████▉| 11486/11526 [2:00:12<00:24, 1.62it/s] {'loss': 0.1357, 'grad_norm': 0.5251969695091248, 'learning_rate': 3.6689827463243233e-10, 'epoch': 2.99}
100%|█████████▉| 11486/11526 [2:00:12<00:24, 1.62it/s] 100%|█████████▉| 11487/11526 [2:00:13<00:24, 1.62it/s] {'loss': 0.1477, 'grad_norm': 0.5830000042915344, 'learning_rate': 3.487828829390472e-10, 'epoch': 2.99}
100%|█████████▉| 11487/11526 [2:00:13<00:24, 1.62it/s] 100%|█████████▉| 11488/11526 [2:00:13<00:23, 1.62it/s] {'loss': 0.1325, 'grad_norm': 0.49182483553886414, 'learning_rate': 3.311260877025824e-10, 'epoch': 2.99}
100%|█████████▉| 11488/11526 [2:00:14<00:23, 1.62it/s] 100%|█████████▉| 11489/11526 [2:00:14<00:22, 1.62it/s] {'loss': 0.14, 'grad_norm': 0.5262997150421143, 'learning_rate': 3.139278905417431e-10, 'epoch': 2.99}
100%|█████████▉| 11489/11526 [2:00:14<00:22, 1.62it/s] 100%|█████████▉| 11490/11526 [2:00:15<00:22, 1.63it/s] {'loss': 0.1691, 'grad_norm': 0.6258643865585327, 'learning_rate': 2.971882930347114e-10, 'epoch': 2.99}
100%|█████████▉| 11490/11526 [2:00:15<00:22, 1.63it/s] 100%|█████████▉| 11491/11526 [2:00:15<00:21, 1.62it/s] {'loss': 0.1848, 'grad_norm': 0.6625111699104309, 'learning_rate': 2.809072967163706e-10, 'epoch': 2.99}
100%|█████████▉| 11491/11526 [2:00:15<00:21, 1.62it/s] 100%|█████████▉| 11492/11526 [2:00:16<00:20, 1.62it/s] {'loss': 0.1512, 'grad_norm': 0.597245991230011, 'learning_rate': 2.650849030805258e-10, 'epoch': 2.99}
100%|█████████▉| 11492/11526 [2:00:16<00:20, 1.62it/s] 100%|█████████▉| 11493/11526 [2:00:17<00:20, 1.63it/s] {'loss': 0.1993, 'grad_norm': 0.7132617831230164, 'learning_rate': 2.497211135787936e-10, 'epoch': 2.99}
100%|█████████▉| 11493/11526 [2:00:17<00:20, 1.63it/s] 100%|█████████▉| 11494/11526 [2:00:17<00:19, 1.63it/s] {'loss': 0.1179, 'grad_norm': 0.5011604428291321, 'learning_rate': 2.34815929620047e-10, 'epoch': 2.99}
100%|█████████▉| 11494/11526 [2:00:17<00:19, 1.63it/s] 100%|█████████▉| 11495/11526 [2:00:18<00:19, 1.63it/s] {'loss': 0.1501, 'grad_norm': 0.5391255617141724, 'learning_rate': 2.2036935257097046e-10, 'epoch': 2.99}
100%|█████████▉| 11495/11526 [2:00:18<00:19, 1.63it/s] 100%|█████████▉| 11496/11526 [2:00:18<00:18, 1.63it/s] {'loss': 0.152, 'grad_norm': 0.559592068195343, 'learning_rate': 2.0638138375772553e-10, 'epoch': 2.99}
100%|█████████▉| 11496/11526 [2:00:18<00:18, 1.63it/s] 100%|█████████▉| 11497/11526 [2:00:19<00:17, 1.63it/s] {'loss': 0.1842, 'grad_norm': 0.7114229202270508, 'learning_rate': 1.9285202446261975e-10, 'epoch': 2.99}
100%|█████████▉| 11497/11526 [2:00:19<00:17, 1.63it/s] 100%|█████████▉| 11498/11526 [2:00:20<00:17, 1.63it/s] {'loss': 0.1208, 'grad_norm': 0.4871519207954407, 'learning_rate': 1.7978127592632733e-10, 'epoch': 2.99}
100%|█████████▉| 11498/11526 [2:00:20<00:17, 1.63it/s] 100%|█████████▉| 11499/11526 [2:00:20<00:16, 1.63it/s] {'loss': 0.1359, 'grad_norm': 0.5488213896751404, 'learning_rate': 1.6716913934899937e-10, 'epoch': 2.99}
100%|█████████▉| 11499/11526 [2:00:20<00:16, 1.63it/s] 100%|█████████▉| 11500/11526 [2:00:21<00:16, 1.62it/s] {'loss': 0.1924, 'grad_norm': 0.6756308674812317, 'learning_rate': 1.5501561588637803e-10, 'epoch': 2.99}
100%|█████████▉| 11500/11526 [2:00:21<00:16, 1.62it/s] 100%|█████████▉| 11501/11526 [2:00:21<00:15, 1.62it/s] {'loss': 0.1156, 'grad_norm': 0.515956461429596, 'learning_rate': 1.4332070665423747e-10, 'epoch': 2.99}
100%|█████████▉| 11501/11526 [2:00:22<00:15, 1.62it/s] 100%|█████████▉| 11502/11526 [2:00:22<00:14, 1.62it/s] {'loss': 0.1462, 'grad_norm': 0.6260185837745667, 'learning_rate': 1.320844127239429e-10, 'epoch': 2.99}
100%|█████████▉| 11502/11526 [2:00:22<00:14, 1.62it/s] 100%|█████████▉| 11503/11526 [2:00:23<00:14, 1.62it/s] {'loss': 0.1262, 'grad_norm': 0.5720354318618774, 'learning_rate': 1.2130673512744661e-10, 'epoch': 2.99}
100%|█████████▉| 11503/11526 [2:00:23<00:14, 1.62it/s] 100%|█████████▉| 11504/11526 [2:00:23<00:13, 1.63it/s] {'loss': 0.1527, 'grad_norm': 0.620479941368103, 'learning_rate': 1.1098767485284712e-10, 'epoch': 2.99}
100%|█████████▉| 11504/11526 [2:00:23<00:13, 1.63it/s] 100%|█████████▉| 11505/11526 [2:00:24<00:12, 1.62it/s] {'loss': 0.1399, 'grad_norm': 0.5706897377967834, 'learning_rate': 1.0112723284660953e-10, 'epoch': 2.99}
100%|█████████▉| 11505/11526 [2:00:24<00:12, 1.62it/s] 100%|█████████▉| 11506/11526 [2:00:25<00:12, 1.62it/s] {'loss': 0.1494, 'grad_norm': 0.5779110789299011, 'learning_rate': 9.17254100130105e-11, 'epoch': 2.99}
100%|█████████▉| 11506/11526 [2:00:25<00:12, 1.62it/s] 100%|█████████▉| 11507/11526 [2:00:25<00:11, 1.62it/s] {'loss': 0.1618, 'grad_norm': 0.6469229459762573, 'learning_rate': 8.278220721469332e-11, 'epoch': 3.0}
100%|█████████▉| 11508/11526 [2:00:26<00:11, 1.63it/s] {'loss': 0.1264, 'grad_norm': 0.6042360663414001, 'learning_rate': 7.429762527211282e-11, 'epoch': 3.0}
100%|█████████▉| 11509/11526 [2:00:26<00:10, 1.62it/s] {'loss': 0.1614, 'grad_norm': 0.6123148202896118, 'learning_rate': 6.627166496353533e-11, 'epoch': 3.0}
100%|█████████▉| 11510/11526 [2:00:27<00:09, 1.62it/s] {'loss': 0.1602, 'grad_norm': 0.6222004294395447, 'learning_rate': 5.87043270250387e-11, 'epoch': 3.0}
100%|█████████▉| 11511/11526 [2:00:28<00:09, 1.62it/s] {'loss': 0.1425, 'grad_norm': 0.5687487721443176, 'learning_rate': 5.159561214995723e-11, 'epoch': 3.0}
100%|█████████▉| 11512/11526 [2:00:28<00:08, 1.62it/s] {'loss': 0.1504, 'grad_norm': 0.6656650900840759, 'learning_rate': 4.494552099165716e-11, 'epoch': 3.0}
100%|█████████▉| 11513/11526 [2:00:29<00:08, 1.62it/s] {'loss': 0.1887, 'grad_norm': 0.5998349189758301, 'learning_rate': 3.875405415909583e-11, 'epoch': 3.0}
100%|█████████▉| 11514/11526 [2:00:29<00:07, 1.62it/s] {'loss': 0.162, 'grad_norm': 0.598003089427948, 'learning_rate': 3.302121222126253e-11, 'epoch': 3.0}
100%|█████████▉| 11515/11526 [2:00:30<00:06, 1.62it/s] {'loss': 0.1307, 'grad_norm': 0.5674147605895996, 'learning_rate': 2.7746995702737643e-11, 'epoch': 3.0}
100%|█████████▉| 11516/11526 [2:00:31<00:06, 1.62it/s] {'loss': 0.1468, 'grad_norm': 0.5455756187438965, 'learning_rate': 2.2931405088133518e-11, 'epoch': 3.0}
100%|█████████▉| 11517/11526 [2:00:31<00:05, 1.62it/s] {'loss': 0.1741, 'grad_norm': 0.596160888671875, 'learning_rate': 1.8574440819318918e-11, 'epoch': 3.0}
100%|█████████▉| 11518/11526 [2:00:32<00:04, 1.63it/s] {'loss': 0.1459, 'grad_norm': 0.6387988924980164, 'learning_rate': 1.4676103294863907e-11, 'epoch': 3.0}
100%|█████████▉| 11519/11526 [2:00:33<00:04, 1.62it/s] {'loss': 0.1856, 'grad_norm': 0.7051814794540405, 'learning_rate': 1.1236392873370528e-11, 'epoch': 3.0}
100%|█████████▉| 11520/11526 [2:00:33<00:03, 1.62it/s] {'loss': 0.1848, 'grad_norm': 0.6285011172294617, 'learning_rate': 8.255309870142114e-12, 'epoch': 3.0}
100%|█████████▉| 11521/11526 [2:00:34<00:03, 1.62it/s] {'loss': 0.1383, 'grad_norm': 0.5607538819313049, 'learning_rate': 5.732854558848644e-12, 'epoch': 3.0}
100%|█████████▉| 11522/11526 [2:00:34<00:02, 1.62it/s] {'loss': 0.1324, 'grad_norm': 0.5816987156867981, 'learning_rate': 3.669027169861394e-12, 'epoch': 3.0}
100%|█████████▉| 11523/11526 [2:00:35<00:01, 1.62it/s] {'loss': 0.1592, 'grad_norm': 0.6229835748672485, 'learning_rate': 2.0638278935836142e-12, 'epoch': 3.0}
100%|█████████▉| 11524/11526 [2:00:36<00:01, 1.63it/s] {'loss': 0.1288, 'grad_norm': 0.5024412274360657, 'learning_rate': 9.172568765647427e-13, 'epoch': 3.0}
100%|█████████▉| 11525/11526 [2:00:36<00:00, 1.62it/s] {'loss': 0.1519, 'grad_norm': 0.7073336243629456, 'learning_rate': 2.293142242759672e-13, 'epoch': 3.0}
100%|██████████| 11526/11526 [2:00:37<00:00, 1.62it/s] {'loss': 0.1441, 'grad_norm': 0.5505287647247314, 'learning_rate': 0.0, 'epoch': 3.0}
There were missing keys in the checkpoint model loaded: ['lm_head.weight'].
{'train_runtime': 7255.879, 'train_samples_per_second': 25.41, 'train_steps_per_second': 1.589, 'train_loss': 0.23182308609281774, 'epoch': 3.0}
100%|██████████| 11526/11526 [2:00:55<00:00, 1.59it/s]
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /data02/users/lz/code/game/cpm/repos2/11/training/wandb/offline-run-20241111_125651-umlr61gt
wandb: Find logs at: wandb/offline-run-20241111_125651-umlr61gt/logs