AswanthCManoj committed
Commit a6df6cf
1 Parent(s): e18d521

azma-deepseek-coder-1.3b-instruct-structured-output

Files changed (2)
  1. README.md +24 -18
  2. adapter_model.safetensors +1 -1
README.md CHANGED
@@ -17,8 +17,6 @@ should probably proofread and complete it, then remove this comment. -->
  # results
  
  This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.1485
  
  ## Model description
  
@@ -37,7 +35,7 @@ More information needed
  ### Training hyperparameters
  
  The following hyperparameters were used during training:
- - learning_rate: 0.0001
+ - learning_rate: 0.0002
  - train_batch_size: 4
  - eval_batch_size: 4
  - seed: 42
@@ -46,28 +44,36 @@ The following hyperparameters were used during training:
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.03
- - lr_scheduler_warmup_steps: 50
- - training_steps: 200
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 0.5
  - mixed_precision_training: Native AMP
  
  ### Training results
  
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | 1.3759 | 0.02 | 25 | 1.3449 |
- | 0.5848 | 0.03 | 50 | 1.2507 |
- | 1.0184 | 0.05 | 75 | 1.1688 |
- | 0.5275 | 0.07 | 100 | 1.1849 |
- | 0.9792 | 0.08 | 125 | 1.1529 |
- | 0.5695 | 0.1 | 150 | 1.1572 |
- | 0.8567 | 0.11 | 175 | 1.1495 |
- | 0.5234 | 0.13 | 200 | 1.1485 |
  
  
  ### Framework versions
  
- - PEFT 0.7.2.dev0
- - Transformers 4.37.0.dev0
+ - Transformers 4.36.2
  - Pytorch 2.1.0+cu121
  - Datasets 2.16.1
- - Tokenizers 0.15.0
+ - Tokenizers 0.15.0
+ ## Training procedure
+
+
+ The following `bitsandbytes` quantization config was used during training:
+ - quant_method: bitsandbytes
+ - load_in_8bit: False
+ - load_in_4bit: True
+ - llm_int8_threshold: 6.0
+ - llm_int8_skip_modules: None
+ - llm_int8_enable_fp32_cpu_offload: False
+ - llm_int8_has_fp16_weight: False
+ - bnb_4bit_quant_type: nf4
+ - bnb_4bit_use_double_quant: True
+ - bnb_4bit_compute_dtype: bfloat16
+
+ ### Framework versions
+
+
+ - PEFT 0.6.2
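For context on the settings recorded in the updated card, here is a minimal sketch (not part of the commit) of how the listed 4-bit `bitsandbytes` quantization config could be reproduced and how the PEFT adapter stored in this repo might be attached to the base model for inference. The adapter repo id is an assumption inferred from the commit message, and the snippet assumes a CUDA GPU with `bitsandbytes`, `transformers`, and `peft` installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "deepseek-ai/deepseek-coder-1.3b-instruct"
# Assumed adapter repo id, based on the commit message; adjust to the actual repo.
ADAPTER_REPO = "AswanthCManoj/azma-deepseek-coder-1.3b-instruct-structured-output"

# Mirrors the quantization settings listed in the model card:
# load_in_4bit=True, nf4 quant type, double quantization, bfloat16 compute dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter weights (adapter_model.safetensors) on top of the base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_REPO)

prompt = "Return a JSON object with keys 'name' and 'age'."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```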
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0478dfa6bba88fcd883af6305b3d233433cfa8b8158038961318fcc6215a0141
+ oid sha256:79acdcc1b8e57db310c2b04aab07db81dd60615a67b42d7a105b2b40f171d1e8
  size 409720
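The change above only rewrites the Git LFS pointer: the `oid` is the SHA-256 of the new adapter file and `size` is its byte length. A small sketch (an illustration, not part of the commit) of checking a downloaded copy against the new pointer:

```python
import hashlib
import os

EXPECTED_OID = "79acdcc1b8e57db310c2b04aab07db81dd60615a67b42d7a105b2b40f171d1e8"
EXPECTED_SIZE = 409720
path = "adapter_model.safetensors"  # local copy of the adapter weights

# Hash the file contents and compare against the LFS pointer fields.
with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert os.path.getsize(path) == EXPECTED_SIZE, "size does not match the LFS pointer"
assert digest == EXPECTED_OID, "sha256 does not match the LFS pointer"
print("adapter_model.safetensors matches the LFS pointer")
```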