Advanced RAG: Fine-Tune Embeddings from HuggingFace for RAG

Community Article · Published July 5, 2024

What is BeyondLLM?

BeyondLLM is a user-friendly library that prioritizes flexibility for data scientists. It simplifies the construction of complex RAG pipelines with minimal coding and streamlines evaluation with comprehensive benchmarks such as Context Relevance, Answer Relevance, Groundedness, and Ground Truth. These metrics assess everything from the retriever's ability to fetch relevant information to the factual accuracy of the LLM's responses, all within a framework that also automates quick experimentation.

Large Language Models have demonstrated remarkable potential across a wide range of applications. The rapid growth of the open-source LLM ecosystem has enabled tools for RAG applications, LLM evaluation, fine-tuning, observability, and more. BeyondLLM is designed to streamline the development of RAG and LLM applications, complete with evaluations, in as few as 5–7 lines of code.
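To make that concrete, here is a minimal sketch of such a pipeline, assuming BeyondLLM's default embeddings and the same components used throughout this guide (the file name and question are placeholders):

# Minimal BeyondLLM RAG sketch: load a document, build a retriever with the
# default embeddings, and generate an answer with Gemini.
from beyondllm import source, retrieve, llms, generator

data = source.fit(path="my_cv.pdf", dtype="pdf", chunk_size=1024, chunk_overlap=0)
retriever = retrieve.auto_retriever(data, type="normal", top_k=4)
llm = llms.GeminiModel()
pipeline = generator.Generate(question="Summarize this document.", retriever=retriever, llm=llm)
print(pipeline.call())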

Building Advanced RAG Pipelines with HuggingFace

Step-by-Step Guide to Fine-Tuning Embeddings

Before we dive into the code, make sure you have the necessary libraries installed. You can install them using the following commands:

!pip install beyondllm
!pip install llama-index-finetuning
!pip install llama-index-embeddings-huggingface

1. Import Required Libraries

First, import the necessary libraries and set up the environment.

from beyondllm import source, retrieve, llms, generator
from beyondllm.embeddings import FineTuneEmbeddings
import os
from getpass import getpass

os.environ['GOOGLE_API_KEY'] = getpass("API key:")

2. Load and Prepare Data

Load your data from a specified path and prepare it for processing. This example uses a PDF, which source.fit splits into chunks according to the chunk_size and chunk_overlap settings.

data = source.fit(path="my_cv.pdf", dtype="pdf", chunk_size=1024, chunk_overlap=0)

3. Initialize the Language Model

Initialize the language model from beyondllm. Here, we are using the GeminiModel.

llm = llms.GeminiModel()

4. Fine-Tune the Embeddings

Why Fine-Tuning Embeddings?

Fine-tuning embeddings with your specific dataset enhances the contextual understanding of the language model, making it better suited for retrieving and generating relevant information. This process adapts the embeddings to the nuances and domain-specific language present in your data, thereby improving the accuracy and relevance of RAG pipelines.

Fine-tune the embeddings using the FineTuneEmbeddings class. This involves training the embeddings on your specific dataset to better capture the nuances of your data.

To fine-tune the embeddings, the LLM generates a dataset of question–context pairs from the given data; these pairs serve as the training examples. The base model used here is BAAI/bge-small-en-v1.5, an open-source embedding model from HuggingFace.

fine_tuned_model = FineTuneEmbeddings()
embed_model = fine_tuned_model.train(
    ["my_cv.pdf"], 
    "BAAI/bge-small-en-v1.5", 
    llm, 
    "finetune"
)

Note: The training function takes a path where the fine-tuned model will be saved. In our case, this path is set to finetune. The same path is later used to load the fine-tuned embedding model.
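For intuition about what happens under the hood, here is a hedged sketch of the pair-generation step using the llama-index finetuning utilities installed earlier. BeyondLLM's FineTuneEmbeddings wraps a similar workflow, but the exact internals may differ, and the Gemini import here assumes the llama-index-llms-gemini package is available.

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.finetuning import generate_qa_embedding_pairs
from llama_index.llms.gemini import Gemini  # assumes llama-index-llms-gemini is installed

# Chunk the document into nodes, mirroring the chunking used earlier.
docs = SimpleDirectoryReader(input_files=["my_cv.pdf"]).load_data()
nodes = SentenceSplitter(chunk_size=1024, chunk_overlap=0).get_nodes_from_documents(docs)

# The LLM writes synthetic questions for each chunk, yielding
# (query, relevant-chunk) pairs that the embedding model is trained on.
train_dataset = generate_qa_embedding_pairs(nodes=nodes, llm=Gemini())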

5. Load the Fine-Tuned Model

The fine-tuned HuggingFace embeddings are now saved, so we load them for use in the retrieval and generation pipeline.

embed_model = fine_tuned_model.load_model("finetune")

6. Set Up the Retriever

Set up the retriever using the fine-tuned embeddings. The retriever is responsible for fetching relevant documents based on the query.

retriever = retrieve.auto_retriever(data, embed_model, type="normal", top_k=4)
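Optionally, you can benchmark the retriever itself before moving on to generation. A hedged sketch, assuming BeyondLLM's retriever evaluation API, which reports the Hit Rate and MRR metrics mentioned later in this article:

# Benchmark the retriever: the LLM synthesizes test questions from the data,
# and Hit Rate / MRR measure how often the right chunk is retrieved.
results = retriever.evaluate(llm)
print(results)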

7. Retrieve Information

You can now use the retriever to fetch relevant information. For example, to retrieve information about "Tarun's role at AI Planet":

query = "what is Tarun's role in AI planet"
result = retriever.retrieve(query)[0].text
print(result)

8. Generate Responses with RAG

Finally, set up the generation pipeline using the retriever and the language model. This pipeline will use the retrieved information to generate a coherent response.

pipeline = generator.Generate(
    question=query,
    retriever=retriever,
    llm=llm
)

# Generate the response
response = pipeline.call()
print(response)

Evaluating RAG Performance

The RAG triad evaluation assesses the effectiveness of your pipeline across Context Relevance, Answer Relevance, and Groundedness, with Ground Truth available as an additional fourth metric. Together, these measure how well the system retrieves, understands, and generates responses that are accurate, factual, and contextually appropriate.

# Get evaluation metrics for the RAG triad
evals = pipeline.get_rag_triad_evals()
print(evals)

Why BeyondLLM?

  • Easily Build a Model with Minimal Code
    BeyondLLM allows quick experimentation with RAG applications by automating most integration tasks. With components like source and auto_retriever, the framework simplifies the development process.

  • Flexible Evaluation of Embeddings and LLMs
    BeyondLLM supports multiple LLMs as evaluators for both embeddings and LLM outputs, providing metrics like Hit Rate and MRR (Mean Reciprocal Rank) for embeddings and four evaluation metrics for LLM responses.

  • Advanced Techniques to Reduce LLM Hallucinations
    BeyondLLM incorporates features like a Markdown splitter, chunking strategies, re-ranking, and hybrid search to minimize hallucinations, making RAG applications more reliable (a hybrid-retriever sketch follows this list).
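
As an example of one such technique, switching the retriever to hybrid search is a one-line change. A hedged sketch, assuming the retriever type options described in the BeyondLLM docs:

# Hybrid search combines vector similarity with keyword matching;
# re-ranking can likewise be enabled through the retriever type.
hybrid_retriever = retrieve.auto_retriever(data, embed_model, type="hybrid", top_k=4)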

Conclusion

By following these steps, you can fine-tune embeddings from HuggingFace to create a powerful RAG pipeline tailored to your specific dataset. This approach enhances the retrieval and generation capabilities of your models, leading to more accurate and contextually relevant responses. Whether you are working on a complex AI project or just exploring the capabilities of RAG, this guide provides a solid foundation for advanced applications.


Co-Author: Shivaya Pandey

Get Started with BeyondLLM
For more details and to get started, check out the documentation: BeyondLLM Documentation
Quickstart Colab Notebook: Quickstart Guide

Open your PR here: BeyondLLM GitHub Repository

Don’t forget to ⭐️ and fork the repository!