Downloading shards: 100% 3/3 [00:41<00:00, 13.87s/it]
Loading checkpoint shards: 100% 3/3 [00:07<00:00,  2.53s/it]
generation_config.json: 100% 115/115 [00:00<00:00, 575kB/s]
tokenizer_config.json: 100% 1.60k/1.60k [00:00<00:00, 8.48MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 22.9MB/s]
tokenizer.json: 100% 1.80M/1.80M [00:00<00:00, 7.43MB/s]
added_tokens.json: 100% 51.0/51.0 [00:00<00:00, 283kB/s]
special_tokens_map.json: 100% 420/420 [00:00<00:00, 1.74MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Layer mlp.down_proj_25 has already been modified. Skipping.
Restored original weights for layer: model.layers.25.mlp.down_proj
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Restored original weights for layer: model.layers.25.mlp.down_proj
['.31.', '.30.', '.29.', '.28.', '.27.', '.26.', '.25.', '.24.', '.23.', '.22.', '.21.', '.20.', '.19.', '.18.', '.17.', '.16.', '.15.', '.14.', '.13.', '.12.', '.11.', '.10.', '.9.', '.8.', '.7.', '.6.', '.5.', '.4.', '.3.', '.2.', '.1.', '.0.']
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 2.1474520114478235: 100% 871/871 [00:46<00:00, 18.55it/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 9.703152929898351: 100% 256/256 [00:13<00:00, 18.83it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 13.355979550516967: 100% 264/264 [00:14<00:00, 18.66it/s]
==================================================
The initial perplexity of the model is 12.614558219909668
==================================================
Reconstructing layer: model.layers.31.mlp.down_proj
Reduced from torch.Size([4096]) to 3753
avg_loss = 2.150142833641832: 100% 871/871 [00:46<00:00, 18.75it/s]
avg_loss = 9.714343913365155: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.374103391260812: 100% 264/264 [00:14<00:00, 18.43it/s]
Restored original weights for layer: model.layers.31.mlp.down_proj
Reconstructing layer: model.layers.31.mlp.up_proj
Reduced from torch.Size([4096]) to 3717
avg_loss = 2.1734046262660063: 100% 871/871 [00:46<00:00, 18.57it/s]
avg_loss = 9.82143080001697: 100% 256/256 [00:13<00:00, 18.57it/s]
avg_loss = 13.477815985228077: 100% 264/264 [00:14<00:00, 18.20it/s]
Restored original weights for layer: model.layers.31.mlp.up_proj
Reconstructing layer: model.layers.31.self_attn.q_proj
Reduced from torch.Size([4096]) to 818
avg_loss = 2.148138916040808: 100% 871/871 [00:46<00:00, 18.53it/s]
avg_loss = 9.705221582669765: 100% 256/256 [00:13<00:00, 18.62it/s]
avg_loss = 13.35540055280382: 100% 264/264 [00:14<00:00, 18.71it/s]
**************************************************
Improved perplexity found: 12.613171577453613 for layer self_attn.q_proj .31.. Total modifications is 1
**************************************************
Reconstructing layer: model.layers.31.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1553964071514686: 100% 871/871 [00:46<00:00, 18.71it/s]
avg_loss = 9.734999645967036: 100% 256/256 [00:13<00:00, 18.84it/s]
avg_loss = 13.383289175954731: 100% 264/264 [00:14<00:00, 18.51it/s]
Restored original weights for layer: model.layers.31.self_attn.k_proj
Reconstructing layer: model.layers.31.self_attn.v_proj
Reduced from torch.Size([1024]) to 846
avg_loss = 2.1430855287339465: 100% 871/871 [00:46<00:00, 18.78it/s]
avg_loss = 9.666598222218454: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.313674368641593: 100% 264/264 [00:14<00:00, 18.69it/s]
**************************************************
Improved perplexity found: 12.513681411743164 for layer self_attn.v_proj .31.. Total modifications is 2
**************************************************
Reconstructing layer: model.layers.31.self_attn.o_proj
Reduced from torch.Size([4096]) to 834
avg_loss = 2.1483869746960402: 100% 871/871 [00:47<00:00, 18.46it/s]
avg_loss = 9.686229056213051: 100% 256/256 [00:13<00:00, 18.78it/s]
avg_loss = 13.344844787861362: 100% 264/264 [00:14<00:00, 18.56it/s]
Restored original weights for layer: model.layers.31.self_attn.o_proj
Reconstructing layer: model.layers.30.mlp.down_proj
Reduced from torch.Size([4096]) to 3770
avg_loss = 2.1505854418576105: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.6962159560062: 100% 256/256 [00:13<00:00, 18.63it/s]
avg_loss = 13.353956826256983: 100% 264/264 [00:14<00:00, 18.49it/s]
Restored original weights for layer: model.layers.30.mlp.down_proj
Reconstructing layer: model.layers.30.mlp.up_proj
Reduced from torch.Size([4096]) to 3787
avg_loss = 2.148582770547965: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.686316559556872: 100% 256/256 [00:13<00:00, 18.59it/s]
avg_loss = 13.34067751738158: 100% 264/264 [00:14<00:00, 18.81it/s]
Restored original weights for layer: model.layers.30.mlp.up_proj
Reconstructing layer: model.layers.30.self_attn.q_proj
Reduced from torch.Size([4096]) to 819
avg_loss = 2.1425534111760927: 100% 871/871 [00:47<00:00, 18.40it/s]
avg_loss = 9.664284548722208: 100% 256/256 [00:13<00:00, 18.49it/s]
avg_loss = 13.309857179721197: 100% 264/264 [00:14<00:00, 18.63it/s]
**************************************************
Improved perplexity found: 12.504617691040039 for layer self_attn.q_proj .30.. Total modifications is 3
**************************************************
Reconstructing layer: model.layers.30.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1449567824088884: 100% 871/871 [00:47<00:00, 18.51it/s]
avg_loss = 9.675114367622882: 100% 256/256 [00:13<00:00, 18.56it/s]
avg_loss = 13.32237600783507: 100% 264/264 [00:14<00:00, 18.72it/s]
Restored original weights for layer: model.layers.30.self_attn.k_proj
Reconstructing layer: model.layers.30.self_attn.v_proj
Reduced from torch.Size([1024]) to 812
avg_loss = 2.155356107294628: 100% 871/871 [00:47<00:00, 18.48it/s]
avg_loss = 9.7138080005534: 100% 256/256 [00:13<00:00, 18.37it/s]
avg_loss = 13.366635067444859: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.v_proj
Reconstructing layer: model.layers.30.self_attn.o_proj
Reduced from torch.Size([4096]) to 859
avg_loss = 2.146158002821641: 100% 871/871 [00:47<00:00, 18.33it/s]
avg_loss = 9.676836102735251: 100% 256/256 [00:13<00:00, 18.43it/s]
avg_loss = 13.318221795287998: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.o_proj
Reconstructing layer: model.layers.29.mlp.down_proj
Reduced from torch.Size([4096]) to 3763
avg_loss = 2.1450509054652587: 100% 871/871 [00:47<00:00, 18.35it/s]
avg_loss = 9.6743658403866: 100% 256/256 [00:14<00:00, 18.21it/s]
avg_loss = 13.321742536895202: 100% 264/264 [00:14<00:00, 18.19it/s]
Restored original weights for layer: model.layers.29.mlp.down_proj
Reconstructing layer: model.layers.29.mlp.up_proj
Reduced from torch.Size([4096]) to 3828
avg_loss = 2.1408350525165125: 100% 871/871 [00:47<00:00, 18.21it/s]
avg_loss = 9.65894997306168: 100% 256/256 [00:14<00:00, 18.26it/s]
avg_loss = 13.306687997146087: 100% 264/264 [00:14<00:00, 18.31it/s]
**************************************************
Improved perplexity found: 12.497097969055176 for layer mlp.up_proj .29.. Total modifications is 4
**************************************************
Reconstructing layer: model.layers.29.self_attn.q_proj
Reduced from torch.Size([4096]) to 803
avg_loss = 2.1367383972238043: 100% 871/871 [00:47<00:00, 18.18it/s]
avg_loss = 9.641230288892984: 100% 256/256 [00:13<00:00, 18.36it/s]
avg_loss = 13.289274643767964: 100% 264/264 [00:14<00:00, 18.47it/s]
**************************************************
Improved perplexity found: 12.455863952636719 for layer self_attn.q_proj .29.. Total modifications is 5
**************************************************
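The loop driving this log can be sketched as follows. This is a minimal toy sketch, not the actual implementation: the real run reconstructs each projection from a reduced number of components (the "Reduced from torch.Size([...]) to N" lines) and re-evaluates perplexity on three held-out datasets; here `modify` and `perplexity` are hypothetical stand-ins, and the greedy keep-or-restore logic is the part being illustrated.

```python
import copy

# Toy stand-ins: a "model" is a dict mapping layer names to a scalar,
# and "perplexity" is just their sum. In the real run, modify() would
# rebuild the weight matrix from its top singular components, and
# perplexity() would average losses over the evaluation datasets.
def modify(value):
    return value * 0.9 if value > 1.0 else value * 1.1

def perplexity(model):
    return sum(model.values())

def greedy_layer_search(model, layer_names):
    """Mirror of the printed log: visit layers in order (the log walks
    blocks from .31. down to .0.), trial-modify each one, keep the
    change only if perplexity improves, otherwise restore the
    snapshotted original weights."""
    best = perplexity(model)
    print(f"The initial perplexity of the model is {best}")
    mods = 0
    for name in layer_names:
        saved = copy.deepcopy(model[name])   # snapshot original weights
        print(f"Reconstructing layer: {name}")
        model[name] = modify(model[name])    # trial rank reduction
        ppl = perplexity(model)
        if ppl < best:
            best = ppl
            mods += 1
            print(f"Improved perplexity found: {ppl} for layer {name}. "
                  f"Total modifications is {mods}")
        else:
            model[name] = saved              # revert the trial change
            print(f"Restored original weights for layer: {name}")
    return best, mods

model = {
    "model.layers.1.self_attn.q_proj": 2.0,
    "model.layers.1.mlp.down_proj": 0.5,
}
best, mods = greedy_layer_search(model, list(model))
```

With these toy values the q_proj trial lowers the score and is kept, while the down_proj trial raises it and is rolled back, reproducing the accept/restore pattern seen throughout the log above.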