Model Card for ravialdy/llama2-javanese-chat

This model is a fine-tuned version of NousResearch's Llama-2-7b-chat-hf, adapted for the Javanese language (Basa Jawa). It is trained to function as a chatbot, responding fluently and accurately in Javanese. The model was fine-tuned on a machine-translated Javanese dataset, with the aim of strengthening the presence of Javanese in language models and chatbot technology.

Training procedure

The model was fine-tuned on a dataset machine-translated into Javanese with the NLLB model. The translated data includes texts from OASST1 and OASST2, covering a wide range of conversational contexts. Training ran on a multi-GPU setup with DeepSpeed, TRL, and LoRA (PEFT) for efficient, fast fine-tuning.

The following bitsandbytes quantization config was used during training:

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: fp4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: float32

Framework versions

  • PyTorch 2.1.0
  • DeepSpeed (version used for training)
  • PEFT 0.6.2
  • Transformers (version used for training)

Model Usage

The model is designed for use as a conversational chatbot in Javanese. It can be deployed in applications that require natural language understanding and generation in Javanese, and can be queried through the standard Hugging Face Transformers text-generation pipeline.
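As a sketch of that workflow: since this repository is a LoRA adapter, it is loaded on top of the base model with PEFT. The card does not state the prompt template, so the LLaMA-2 chat format used in `build_prompt` below is an assumption; the base-model repo name is taken from the description above:

```python
def build_prompt(user_message: str,
                 system: str = "Sampeyan asisten sing migunani.") -> str:
    """Wrap a user message in the LLaMA-2 chat format (assumed template)."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_message} [/INST]"

def load_javanese_chat():
    """Load the base LLaMA-2 model and apply this adapter (large download)."""
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    base = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
    model = PeftModel.from_pretrained(base, "ravialdy/llama2-javanese-chat")
    tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
    return pipeline("text-generation", model=model, tokenizer=tokenizer)

# Example usage (downloads the 7B base model on first run):
#   chat = load_javanese_chat()
#   out = chat(build_prompt("Apa kabare?"), max_new_tokens=128)
#   print(out[0]["generated_text"])
```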

