mlabonne committed on
Commit d2512d4
1 Parent(s): b6c2fce

Create laserRMT.log.

Files changed (1): laserRMT.log. (+128 -0)

Downloading shards: 100% 3/3 [00:41<00:00, 13.87s/it]
Loading checkpoint shards: 100% 3/3 [00:07<00:00, 2.53s/it]
generation_config.json: 100% 115/115 [00:00<00:00, 575kB/s]
tokenizer_config.json: 100% 1.60k/1.60k [00:00<00:00, 8.48MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 22.9MB/s]
tokenizer.json: 100% 1.80M/1.80M [00:00<00:00, 7.43MB/s]
added_tokens.json: 100% 51.0/51.0 [00:00<00:00, 283kB/s]
special_tokens_map.json: 100% 420/420 [00:00<00:00, 1.74MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Layer mlp.down_proj_25 has already been modified. Skipping.
Restored original weights for layer: model.layers.25.mlp.down_proj
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Restored original weights for layer: model.layers.25.mlp.down_proj
['.31.', '.30.', '.29.', '.28.', '.27.', '.26.', '.25.', '.24.', '.23.', '.22.', '.21.', '.20.', '.19.', '.18.', '.17.', '.16.', '.15.', '.14.', '.13.', '.12.', '.11.', '.10.', '.9.', '.8.', '.7.', '.6.', '.5.', '.4.', '.3.', '.2.', '.1.', '.0.']
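
The "Reduced from torch.Size([...]) to N" lines report how many singular values of a layer's weight matrix survive truncation, and the bracketed list is the top-down order in which layers are scanned. A minimal sketch of this kind of rank reduction, using NumPy for illustration — `reduce_rank` and its simple largest-singular-value cutoff are stand-ins; laserRMT derives its cutoff from random matrix theory (a Marchenko-Pastur-style threshold), which this sketch does not reproduce:

```python
import numpy as np

def reduce_rank(W, noise_ratio=0.9):
    """Zero out small singular values of a weight matrix.

    Hypothetical illustration: keeps singular values above a fraction
    of the largest one, then rebuilds the low-rank approximation.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    keep = S > S.max() * (1.0 - noise_ratio)  # S is sorted descending
    k = int(keep.sum())                       # number of values kept
    W_approx = (U[:, :k] * S[:k]) @ Vt[:k, :]
    return W_approx, k

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_hat, k = reduce_rank(W)
print(f"Reduced from {W.shape[0]} to {k}")
```

The log message "Reduced from torch.Size([4096]) to 3607" corresponds to the same idea at model scale: 4096 singular values before truncation, 3607 kept.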
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 2.1474520114478235: 100% 871/871 [00:46<00:00, 18.55it/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 9.703152929898351: 100% 256/256 [00:13<00:00, 18.83it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 13.355979550516967: 100% 264/264 [00:14<00:00, 18.66it/s]
==================================================
The initial perplexity of the model is 12.614558219909668
==================================================
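
The three `avg_loss` bars are cross-entropy losses over three calibration datasets, and the banner reports a single baseline perplexity. In general, perplexity is the exponential of the mean negative log-likelihood; how laserRMT weights the three datasets into the one number it prints is not shown in this log, so the sketch below only illustrates the generic relation:

```python
import math

def perplexity(nll_values):
    """Perplexity = exp(mean negative log-likelihood).

    Illustrative only: the weighting laserRMT applies across its
    calibration datasets is not visible in the log output.
    """
    avg_nll = sum(nll_values) / len(nll_values)
    return math.exp(avg_nll)

print(perplexity([2.0, 3.0, 2.5]))
```

A lower average loss therefore maps directly to a lower perplexity, which is why the search below compares perplexities rather than raw losses.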
Reconstructing layer: model.layers.31.mlp.down_proj
Reduced from torch.Size([4096]) to 3753
avg_loss = 2.150142833641832: 100% 871/871 [00:46<00:00, 18.75it/s]
avg_loss = 9.714343913365155: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.374103391260812: 100% 264/264 [00:14<00:00, 18.43it/s]
Restored original weights for layer: model.layers.31.mlp.down_proj
Reconstructing layer: model.layers.31.mlp.up_proj
Reduced from torch.Size([4096]) to 3717
avg_loss = 2.1734046262660063: 100% 871/871 [00:46<00:00, 18.57it/s]
avg_loss = 9.82143080001697: 100% 256/256 [00:13<00:00, 18.57it/s]
avg_loss = 13.477815985228077: 100% 264/264 [00:14<00:00, 18.20it/s]
Restored original weights for layer: model.layers.31.mlp.up_proj
Reconstructing layer: model.layers.31.self_attn.q_proj
Reduced from torch.Size([4096]) to 818
avg_loss = 2.148138916040808: 100% 871/871 [00:46<00:00, 18.53it/s]
avg_loss = 9.705221582669765: 100% 256/256 [00:13<00:00, 18.62it/s]
avg_loss = 13.35540055280382: 100% 264/264 [00:14<00:00, 18.71it/s]
**************************************************
Improved perplexity found: 12.613171577453613 for layer self_attn.q_proj .31.. Total modifications is 1
**************************************************
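
The pattern the log records from here on is a greedy scan: each candidate layer is rank-reduced, perplexity is re-measured, and the change is kept only when perplexity drops ("Improved perplexity found"); otherwise the layer is reverted ("Restored original weights"). A hedged sketch of that loop — `measure_perplexity` and `reduce_rank` are stand-ins, not laserRMT's actual API:

```python
def greedy_laser_scan(layer_names, weights, measure_perplexity, reduce_rank):
    """Greedy accept/revert scan over candidate layers.

    Keeps a rank-reduced layer only if it lowers perplexity,
    mirroring the accept/restore messages in the log above.
    """
    best_ppl = measure_perplexity(weights)
    modifications = 0
    for name in layer_names:
        original = weights[name]
        weights[name] = reduce_rank(original)   # try the reduced layer
        ppl = measure_perplexity(weights)
        if ppl < best_ppl:                      # improvement: keep it
            best_ppl = ppl
            modifications += 1
            print(f"Improved perplexity found: {ppl} for layer {name}. "
                  f"Total modifications is {modifications}")
        else:                                   # regression: revert
            weights[name] = original
            print(f"Restored original weights for layer: {name}")
    return best_ppl, modifications

# Toy run with scalar "weights" and stand-in metric/reduction functions.
toy_weights = {"a": 10.0, "b": 5.0}
final_ppl, mods = greedy_laser_scan(
    ["a", "b"], toy_weights,
    measure_perplexity=lambda w: sum(w.values()),
    reduce_rank=lambda x: x * 0.5,
)
```

Because every rejected candidate is fully restored before the next one is tried, the running count ("Total modifications is N") only increments on accepted layers.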
49
+ Reconstructing layer: model.layers.31.self_attn.k_proj
50
+ Reduced from torch.Size([1024]) to 524
51
+ avg_loss = 2.1553964071514686: 100% 871/871 [00:46<00:00, 18.71it/s]
52
+ avg_loss = 9.734999645967036: 100% 256/256 [00:13<00:00, 18.84it/s]
53
+ avg_loss = 13.383289175954731: 100% 264/264 [00:14<00:00, 18.51it/s]
54
+ Restored original weights for layer: model.layers.31.self_attn.k_proj
55
+ Reconstructing layer: model.layers.31.self_attn.v_proj
56
+ Reduced from torch.Size([1024]) to 846
57
+ avg_loss = 2.1430855287339465: 100% 871/871 [00:46<00:00, 18.78it/s]
58
+ avg_loss = 9.666598222218454: 100% 256/256 [00:13<00:00, 18.74it/s]
59
+ avg_loss = 13.313674368641593: 100% 264/264 [00:14<00:00, 18.69it/s]
60
+ **************************************************
61
+ Improved perplexity found: 12.513681411743164 for layer self_attn.v_proj .31.. Total modifications is 2
62
+ **************************************************
63
+ Reconstructing layer: model.layers.31.self_attn.o_proj
64
+ Reduced from torch.Size([4096]) to 834
65
+ avg_loss = 2.1483869746960402: 100% 871/871 [00:47<00:00, 18.46it/s]
66
+ avg_loss = 9.686229056213051: 100% 256/256 [00:13<00:00, 18.78it/s]
67
+ avg_loss = 13.344844787861362: 100% 264/264 [00:14<00:00, 18.56it/s]
68
+ Restored original weights for layer: model.layers.31.self_attn.o_proj
69
+ Reconstructing layer: model.layers.30.mlp.down_proj
70
+ Reduced from torch.Size([4096]) to 3770
71
+ avg_loss = 2.1505854418576105: 100% 871/871 [00:47<00:00, 18.34it/s]
72
+ avg_loss = 9.6962159560062: 100% 256/256 [00:13<00:00, 18.63it/s]
73
+ avg_loss = 13.353956826256983: 100% 264/264 [00:14<00:00, 18.49it/s]
74
+ Restored original weights for layer: model.layers.30.mlp.down_proj
75
+ Reconstructing layer: model.layers.30.mlp.up_proj
76
+ Reduced from torch.Size([4096]) to 3787
77
+ avg_loss = 2.148582770547965: 100% 871/871 [00:47<00:00, 18.34it/s]
78
+ avg_loss = 9.686316559556872: 100% 256/256 [00:13<00:00, 18.59it/s]
79
+ avg_loss = 13.34067751738158: 100% 264/264 [00:14<00:00, 18.81it/s]
80
+ Restored original weights for layer: model.layers.30.mlp.up_proj
81
+ Reconstructing layer: model.layers.30.self_attn.q_proj
82
+ Reduced from torch.Size([4096]) to 819
83
+ avg_loss = 2.1425534111760927: 100% 871/871 [00:47<00:00, 18.40it/s]
84
+ avg_loss = 9.664284548722208: 100% 256/256 [00:13<00:00, 18.49it/s]
85
+ avg_loss = 13.309857179721197: 100% 264/264 [00:14<00:00, 18.63it/s]
86
+ **************************************************
87
+ Improved perplexity found: 12.504617691040039 for layer self_attn.q_proj .30.. Total modifications is 3
88
+ **************************************************
89
+ Reconstructing layer: model.layers.30.self_attn.k_proj
90
+ Reduced from torch.Size([1024]) to 524
91
+ avg_loss = 2.1449567824088884: 100% 871/871 [00:47<00:00, 18.51it/s]
92
+ avg_loss = 9.675114367622882: 100% 256/256 [00:13<00:00, 18.56it/s]
93
+ avg_loss = 13.32237600783507: 100% 264/264 [00:14<00:00, 18.72it/s]
94
+ Restored original weights for layer: model.layers.30.self_attn.k_proj
95
+ Reconstructing layer: model.layers.30.self_attn.v_proj
96
+ Reduced from torch.Size([1024]) to 812
97
+ avg_loss = 2.155356107294628: 100% 871/871 [00:47<00:00, 18.48it/s]
98
+ avg_loss = 9.7138080005534: 100% 256/256 [00:13<00:00, 18.37it/s]
99
+ avg_loss = 13.366635067444859: 100% 264/264 [00:14<00:00, 18.33it/s]
100
+ Restored original weights for layer: model.layers.30.self_attn.v_proj
101
+ Reconstructing layer: model.layers.30.self_attn.o_proj
102
+ Reduced from torch.Size([4096]) to 859
103
+ avg_loss = 2.146158002821641: 100% 871/871 [00:47<00:00, 18.33it/s]
104
+ avg_loss = 9.676836102735251: 100% 256/256 [00:13<00:00, 18.43it/s]
105
+ avg_loss = 13.318221795287998: 100% 264/264 [00:14<00:00, 18.33it/s]
106
+ Restored original weights for layer: model.layers.30.self_attn.o_proj
107
+ Reconstructing layer: model.layers.29.mlp.down_proj
108
+ Reduced from torch.Size([4096]) to 3763
109
+ avg_loss = 2.1450509054652587: 100% 871/871 [00:47<00:00, 18.35it/s]
110
+ avg_loss = 9.6743658403866: 100% 256/256 [00:14<00:00, 18.21it/s]
111
+ avg_loss = 13.321742536895202: 100% 264/264 [00:14<00:00, 18.19it/s]
112
+ Restored original weights for layer: model.layers.29.mlp.down_proj
113
+ Reconstructing layer: model.layers.29.mlp.up_proj
114
+ Reduced from torch.Size([4096]) to 3828
115
+ avg_loss = 2.1408350525165125: 100% 871/871 [00:47<00:00, 18.21it/s]
116
+ avg_loss = 9.65894997306168: 100% 256/256 [00:14<00:00, 18.26it/s]
117
+ avg_loss = 13.306687997146087: 100% 264/264 [00:14<00:00, 18.31it/s]
118
+ **************************************************
119
+ Improved perplexity found: 12.497097969055176 for layer mlp.up_proj .29.. Total modifications is 4
120
+ **************************************************
121
+ Reconstructing layer: model.layers.29.self_attn.q_proj
122
+ Reduced from torch.Size([4096]) to 803
123
+ avg_loss = 2.1367383972238043: 100% 871/871 [00:47<00:00, 18.18it/s]
124
+ avg_loss = 9.641230288892984: 100% 256/256 [00:13<00:00, 18.36it/s]
125
+ avg_loss = 13.289274643767964: 100% 264/264 [00:14<00:00, 18.47it/s]
126
+ **************************************************
127
+ Improved perplexity found: 12.455863952636719 for layer self_attn.q_proj .29.. Total modifications is 5
128
+ **************************************************