---
language:
  - pt
license: llama2
library_name: transformers
tags:
  - code
  - analytics
  - analise-dados
  - portugues-BR
datasets:
  - semantixai/Test-Dataset-Lloro
base_model: codellama/CodeLlama-7b-Instruct-hf
model-index:
  - name: LloroV2
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: ENEM Challenge (No Images)
          type: eduagarcia/enem_challenge
          split: train
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 26.03
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=semantixai/LloroV2
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BLUEX (No Images)
          type: eduagarcia-temp/BLUEX_without_images
          split: train
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 29.07
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=semantixai/LloroV2
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: OAB Exams
          type: eduagarcia/oab_exams
          split: train
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 32.53
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=semantixai/LloroV2
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Assin2 RTE
          type: assin2
          split: test
          args:
            num_few_shot: 15
        metrics:
          - type: f1_macro
            value: 57.19
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=semantixai/LloroV2
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Assin2 STS
          type: eduagarcia/portuguese_benchmark
          split: test
          args:
            num_few_shot: 15
        metrics:
          - type: pearson
            value: 26.81
            name: pearson
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=semantixai/LloroV2
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: FaQuAD NLI
          type: ruanchaves/faquad-nli
          split: test
          args:
            num_few_shot: 15
        metrics:
          - type: f1_macro
            value: 43.77
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=semantixai/LloroV2
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HateBR Binary
          type: ruanchaves/hatebr
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: f1_macro
            value: 68.02
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=semantixai/LloroV2
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: PT Hate Speech Binary
          type: hate_speech_portuguese
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: f1_macro
            value: 38.53
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=semantixai/LloroV2
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: tweetSentBR
          type: eduagarcia-temp/tweetsentbr
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: f1_macro
            value: 35.21
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=semantixai/LloroV2
          name: Open Portuguese LLM Leaderboard
---

# Lloro 7B

*(Lloro-7b logo)*

Lloro, developed by Semantix Research Labs, is a language model trained to perform data analysis in Python for Portuguese-language requests. It is a fine-tuned version of codellama/CodeLlama-7b-Instruct-hf, trained on synthetic datasets. The fine-tuning was performed with the QLoRA methodology on a V100 GPU with 16 GB of RAM.

## Model description

- **Model type:** A 7B-parameter model fine-tuned on synthetic datasets.
- **Language(s) (NLP):** Primarily Portuguese, but the model is capable of understanding English as well.
- **Finetuned from model:** codellama/CodeLlama-7b-Instruct-hf

## What are Lloro's intended uses?

Lloro is built for data analysis in Portuguese contexts.

- **Input:** Text
- **Output:** Text (code)

## Usage

### Using Transformers

```python
# Import required libraries
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

# Load the model
model_name = "semantixai/LloroV2"
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Define the prompt (in English: "Develop a Python algorithm to compute the
# mean and median of sales prices by product material type.")
user_prompt = "Desenvolva um algoritmo em Python para calcular a média e a mediana dos preços de vendas por tipo de material do produto."
system = "Provide answers in Python without explanations, only the code"
prompt_template = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_prompt}[/INST]"

# Tokenize and move the inputs to the model's device
input_ids = tokenizer([prompt_template], return_tensors="pt")["input_ids"].to(base_model.device)

# Call the model
outputs = base_model.generate(
    input_ids,
    do_sample=True,
    top_p=0.95,
    max_new_tokens=1024,
    temperature=0.1,
)

# Decode and print the output
output_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(output_text[0])
```
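If your transformers version includes chat-template support, the same Llama-2 prompt can be built from the tokenizer instead of by hand. A minimal sketch, assuming the tokenizer ships a Llama-2-style chat template (reusing `system`, `user_prompt`, and `base_model` from above):

```python
# Build the [INST] <<SYS>> ... prompt from the tokenizer's chat template
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user_prompt},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(base_model.device)
outputs = base_model.generate(input_ids, do_sample=True, top_p=0.95, max_new_tokens=1024, temperature=0.1)
```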

### Using an OpenAI-compatible inference server (such as vLLM)

```python
from openai import OpenAI

# Point the client at a local OpenAI-compatible endpoint
client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)

user_prompt = "Desenvolva um algoritmo em Python para calcular a média e a mediana dos preços de vendas por tipo de material do produto."

completion = client.chat.completions.create(
    model="semantixai/LloroV2",
    temperature=0.1,
    frequency_penalty=0.1,
    messages=[
        {"role": "system", "content": "Provide answers in Python without explanations, only the code"},
        {"role": "user", "content": user_prompt},
    ],
)
```
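This assumes a server is already running and serving the model; with vLLM it is typically started with something like `python -m vllm.entrypoints.openai.api_server --model semantixai/LloroV2` (the exact command depends on your vLLM version). The generated code is then available on the completion object:

```python
# Print the generated code returned by the server
print(completion.choices[0].message.content)
```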

## Training parameters

| Params | Training Data | Examples | Tokens | LR |
|--------|---------------|----------|--------|----|
| 7B | Synthetic instruction/code pairs | 28,907 | 3,031,188 | 1e-5 |

## Model Sources

- **Test Dataset Repository:** https://huggingface.co/datasets/semantixai/Test-Dataset-Lloro
- **Model dates:** Lloro was trained between November 2023 and January 2024.

## Performance

| Model | LLM as Judge | CodeBLEU | ROUGE-L | CodeBERT-Precision | CodeBERT-Recall | CodeBERT-F1 | CodeBERT-F3 |
|-------|--------------|----------|---------|--------------------|-----------------|-------------|-------------|
| GPT-3.5 | 91.22% | 0.2745 | 0.2189 | 0.7502 | 0.7146 | 0.7303 | 0.7175 |
| Instruct-Base | 97.40% | 0.2487 | 0.1146 | 0.6997 | 0.6473 | 0.6713 | 0.6518 |
| Instruct-FT | 97.76% | 0.3264 | 0.3602 | 0.7942 | 0.8178 | 0.8042 | 0.8147 |
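The text-overlap metrics above (ROUGE-L and the like) can be recomputed with the `evaluate` library. A minimal sketch, assuming `predictions` and `references` are parallel lists of generated and reference code strings from the test dataset (the strings below are illustrative, not taken from the dataset):

```python
# pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")

predictions = ["df.groupby('material')['preco'].mean()"]  # model outputs (illustrative)
references = ["df.groupby('material')['preco'].mean()"]   # gold answers (illustrative)

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rougeL"])  # LCS-based F-measure, as reported in the table
```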

**Training details:** The following hyperparameters were used during training:

| Parameter | Value |
|-----------|-------|
| learning_rate | 1e-5 |
| weight_decay | 0.0001 |
| train_batch_size | 1 |
| eval_batch_size | 1 |
| seed | 42 |
| optimizer | Adam (paged_adamw_32bit) |
| lr_scheduler_type | cosine |
| lr_scheduler_warmup_ratio | 0.03 |
| num_epochs | 5.0 |

**QLoRA hyperparameters:** The following parameters related to Quantized Low-Rank Adaptation and quantization were used during training:

| Parameter | Value |
|-----------|-------|
| lora_r | 16 |
| lora_alpha | 64 |
| lora_dropout | 0.1 |
| storage_dtype | "nf4" |
| compute_dtype | "float16" |
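Together, the two tables map onto a standard PEFT + bitsandbytes setup. A minimal sketch of how these values could be wired up; the target modules and any argument not listed above are assumptions, not taken from the actual training code:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 storage with float16 compute, per the QLoRA table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-Instruct-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter with r=16, alpha=64, dropout=0.1, per the table above
lora_config = LoraConfig(
    r=16,
    lora_alpha=64,
    lora_dropout=0.1,
    bias="none",            # assumption: not listed in the table
    task_type="CAUSAL_LM",  # target_modules left to peft's defaults (assumption)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The paged AdamW optimizer from the training table corresponds to `optim="paged_adamw_32bit"` when using the `Trainer` API.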

## Experiments

| Model | Epochs | Overfitting | Final Epochs | Training Hours | CO2 Emission (kg) |
|-------|--------|-------------|--------------|----------------|-------------------|
| Code Llama Instruct | 1 | No | 1 | 8.1 | 1.337 |
| Code Llama Instruct | 5 | Yes | 3 | 45.6 | 9.12 |

## Framework versions

| Library | Version |
|---------|---------|
| bitsandbytes | 0.40.2 |
| Datasets | 2.14.3 |
| PyTorch | 2.0.1 |
| Tokenizers | 0.14.1 |
| Transformers | 4.34.0 |

## Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=semantixai/LloroV2).

| Metric | Value |
|--------|-------|
| **Average** | **39.68** |
| ENEM Challenge (No Images) | 26.03 |
| BLUEX (No Images) | 29.07 |
| OAB Exams | 32.53 |
| Assin2 RTE | 57.19 |
| Assin2 STS | 26.81 |
| FaQuAD NLI | 43.77 |
| HateBR Binary | 68.02 |
| PT Hate Speech Binary | 38.53 |
| tweetSentBR | 35.21 |