---
license: apache-2.0
language:
- es
---

<div style="text-align:center;width:350px;height:350px;">
<img src="https://huggingface.co/hackathon-somos-nlp-2023/bertin-gpt-j-6B-es-finetuned-salpaca/resolve/main/Alpaca.png" alt="SAlpaca logo">
</div>

# SAlpaca: Spanish + Alpaca (WIP)

## Adapter Description
This adapter was created with the [PEFT](https://github.com/huggingface/peft) library and fine-tunes the base model *bertin-project/bertin-gpt-j-6B* on the *Spanish Alpaca Dataset* using the *LoRA* method.
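
The low-rank idea behind *LoRA* can be shown in a small self-contained sketch. This is a toy NumPy illustration only, not the actual PEFT implementation, and the dimensions (`d`, `r`, `alpha`) are made-up values chosen for readability:

```python
import numpy as np

# Toy illustration of LoRA: instead of updating a full weight matrix W (d x d),
# LoRA freezes W and learns two small matrices A (r x d) and B (d x r),
# using W + (alpha / r) * B @ A as the adapted weight.
d, r, alpha = 64, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))   # frozen base weight
A = rng.standard_normal((r, d))   # trainable low-rank factor
B = np.zeros((d, r))              # initialized to zero, so the adapter starts as a no-op

def lora_forward(x):
    return x @ (W + (alpha / r) * B @ A).T

x = rng.standard_normal(d)
# With B = 0 the adapted layer matches the frozen base layer exactly
assert np.allclose(lora_forward(x), x @ W.T)

full_params = d * d               # parameters a full fine-tune would touch
lora_params = A.size + B.size     # parameters LoRA actually trains
print(f"trainable params: {lora_params} vs full fine-tuning: {full_params}")
```

The payoff is the last two lines: the adapter trains `2 * d * r` parameters instead of `d * d`, which is what makes fine-tuning a 6B-parameter model feasible on a single GPU.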

## How to use
```py
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "hackathon-somos-nlp-2023/bertin-gpt-j-6B-es-finetuned-salpaca"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 8-bit so it fits on a single GPU
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map="auto"
)
# Use the tokenizer shipped with the adapter
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)

def gen_conversation(text):
    text = "<SC>instruction: " + text + "\n "
    batch = tokenizer(text, return_tensors="pt")
    with torch.cuda.amp.autocast():
        output_tokens = model.generate(
            **batch,
            max_new_tokens=256,
            eos_token_id=50258,
            early_stopping=True,
            do_sample=True,  # sampling must be enabled for temperature to take effect
            temperature=0.9,
        )
    print("\n\n", tokenizer.decode(output_tokens[0], skip_special_tokens=False))

gen_conversation("hola")
```
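
The prompt format used by `gen_conversation` above can be factored into pure-Python helpers, which also makes it easy to test without loading the model. `extract_response` is a hypothetical helper, not part of this model card; it assumes the model echoes the prompt before its answer, so everything after the first newline is treated as the response:

```python
def build_prompt(instruction: str) -> str:
    # Same "<SC>instruction: ...\n " format used by gen_conversation above
    return "<SC>instruction: " + instruction + "\n "

def extract_response(decoded: str) -> str:
    # Assumption: the decoded output echoes the prompt, so the generated
    # response is everything after the first newline.
    return decoded.split("\n", 1)[-1].strip()

print(build_prompt("hola"))
print(extract_response("<SC>instruction: hola\n ¡Hola! ¿Cómo estás?"))
```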

## Resources used
Google Colab machine with the following specifications:
<div style="text-align:center;width:550px;height:550px;">
<img src="https://huggingface.co/hackathon-somos-nlp-2023/bertin-gpt-j-6B-es-finetuned-salpaca/resolve/main/resource.jpeg" alt="Resource logo">
</div>

## Citation
```
@misc{hackathon-somos-nlp-2023,
  author    = {Edison Bejarano and Leonardo Bolaños and Alberto Ceballos and Santiago Pineda and Nicolay Potes},
  title     = {SAlpaca},
  year      = {2023},
  url       = {https://huggingface.co/hackathon-somos-nlp-2023/bertin-gpt-j-6B-es-finetuned-salpaca},
  publisher = {Hugging Face}
}
```