Model Card for ravialdy/llama2-javanese-chat

This model is a fine-tuned version of NousResearch's Llama-2-7b-chat-hf, adapted for the Javanese language (Basa Jawa). It is trained to function as a chatbot, responding fluently and accurately in Javanese. The model was fine-tuned on a machine-translated Javanese dataset, with the aim of strengthening the presence of Javanese in language models and chatbot technology.

Training procedure

The model was fine-tuned on a dataset machine-translated into Javanese with the NLLB model. The translated data includes texts from OASST1 and OASST2, covering a wide range of conversational contexts. Training ran on a multi-GPU setup with DeepSpeed, TRL, and LoRA (PEFT) for efficient, fast fine-tuning.

The following bitsandbytes quantization config was used during training:

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: fp4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: float32

Framework versions

  • PyTorch 2.1.0
  • DeepSpeed (version used for training)
  • PEFT 0.6.2
  • Transformers (version used for training)

Model Usage

The model is designed for use as a conversational chatbot in Javanese. It can be deployed in applications that require natural language understanding and generation in Javanese, and can be queried through the standard Hugging Face Transformers text-generation pipeline.
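As a sketch of that workflow: since this repository is a LoRA adapter, it is loaded on top of the base model with PEFT. The card does not state the prompt template, so the LLaMA-2 chat format used in `build_prompt` below is an assumption; the base-model repo name is taken from the description above:

```python
def build_prompt(user_message: str,
                 system: str = "Sampeyan asisten sing migunani.") -> str:
    """Wrap a user message in the LLaMA-2 chat format (assumed template)."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_message} [/INST]"

def load_javanese_chat():
    """Load the base LLaMA-2 model and apply this adapter (large download)."""
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    base = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
    model = PeftModel.from_pretrained(base, "ravialdy/llama2-javanese-chat")
    tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
    return pipeline("text-generation", model=model, tokenizer=tokenizer)

# Example usage (downloads the 7B base model on first run):
#   chat = load_javanese_chat()
#   out = chat(build_prompt("Apa kabare?"), max_new_tokens=128)
#   print(out[0]["generated_text"])
```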

