Afrizal Hasbi Azizy committed 607ee93 (parent: a00ec2c): Update README.md

README.md CHANGED
@@ -41,7 +41,7 @@ Selamat datang!
 41
 42  I am ultra-overjoyed to introduce you... the 🦌 Kancil! It's a fine-tuned version of Llama 3 8B trained on TumpengQA, an instruction dataset of 6.7 million words. Both the model and the dataset are openly available on Hugging Face.
 43
-44  The dataset was synthetically generated from Llama 3 70B. A big problem with existing Indonesian instruction datasets is that they're
+44  The dataset was synthetically generated from Llama 3 70B. A big problem with existing Indonesian instruction datasets is that they're in reality not-very-good translations of English datasets. Llama 3 70B can generate fluent Indonesian! (with minor caveats)
 45
 46  🦌 This follows previous efforts to collect open, fine-tuned Indonesian models, like Merak and Cendol. However, Kancil solely leverages synthetic data in a very creative way, which makes it a unique contribution!
 47