I was excited to explore Llama 3.2, but as a simple 🇪🇺 EU guy, I don't have access to Meta's multimodal models 😿
🤔 So I thought: why not challenge the small 3B text model with Agentic RAG?
🎯 The plan:
- Build a system that tries to answer questions using a knowledge base.
- If the documents don't contain the answer, use web search for additional context.
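The fallback logic can be sketched in a few lines. Everything here (`KNOWLEDGE_BASE`, `retrieve`, `web_search`) is an illustrative stub, not a real library API; in the actual system the "documents are insufficient" decision is made by the LLM itself, not by a dictionary lookup:

```python
from typing import Optional

# Sketch of the agentic RAG control flow with stub components.
KNOWLEDGE_BASE = {
    "what is spectrum?": "Spectrum selects informative layers for fine-tuning.",
}

def retrieve(question: str) -> Optional[str]:
    """Look the question up in the local knowledge base (stub)."""
    return KNOWLEDGE_BASE.get(question.lower())

def web_search(question: str) -> str:
    """Stub standing in for a real web-search tool."""
    return f"[top web snippets for: {question}]"

def answer(question: str) -> str:
    context = retrieve(question)
    if context is None:
        # The documents don't contain the answer: fall back to web search.
        context = web_search(question)
    return f"Answer grounded in: {context}"
```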
Looking to fine-tune Language Models efficiently and save on computational resources?
One popular method is QLoRA, which quantizes the original model and trains low-rank adapters on top. It's quite effective and uses less GPU memory than full fine-tuning.
However, QLoRa applies Low-Rank Adaptation uniformly across the entire model.
What if we could identify the most informative layers and only fine-tune those? 🤔
This is exactly what Spectrum does! 🚀
🔬 Spectrum analyzes the weight matrices of all layers in a Language Model and computes a Signal-to-Noise Ratio (SNR) for each one. (It uses Random Matrix Theory and the Marchenko-Pastur distribution to distinguish signal from noise.)
🎯 Based on a chosen percentage (say, 25%), Spectrum selects the most informative layers of each type (mlp.down_proj, self_attn.o_proj, etc.).
You can then ❄️ freeze the rest of the model and focus your 🏋️‍♀️ training on the chosen layers.
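A toy version of the idea: for an n × m matrix of i.i.d. noise with standard deviation σ, the largest singular value concentrates near σ(√n + √m), the edge of the Marchenko-Pastur bulk, so singular values clearly above that edge can be treated as signal. This sketch assumes σ is known; Spectrum's actual estimator is more careful (e.g. it estimates the noise scale from the spectrum itself):

```python
import numpy as np

def signal_count(weight: np.ndarray, sigma: float = 1.0) -> int:
    """Count singular values above the Marchenko-Pastur bulk edge (toy version)."""
    n, m = weight.shape
    edge = sigma * (np.sqrt(n) + np.sqrt(m))  # asymptotic largest noise singular value
    s = np.linalg.svd(weight, compute_uv=False)
    return int((s > edge).sum())

def select_layers(layer_weights: dict, fraction: float = 0.25) -> list:
    """Rank layers by signal content and keep the top fraction."""
    scores = {name: signal_count(w) for name, w in layer_weights.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]
```

A matrix with a planted low-rank component will score higher than pure noise, which is the property the layer ranking relies on.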
📊 Results/Evaluation
- Spectrum is competitive with full fine-tuning and beats QLoRA on benchmarks.
- While QLoRA is more memory-efficient on a single GPU, Spectrum shines in distributed training setups.
- Great models trained with Spectrum: the Dolphin models, Llama 3.1 Storm, numerous models by VAGO Solutions...
---
For a practical guide, check out the article above.
🎯 Targeted training with Spectrum
I used Spectrum, a relatively new technique for parameter-efficient fine-tuning. The idea is to train only the layers of the model with a high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest. I trained the top 30% of model layers.
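Once the layers are chosen, the freezing step itself is a one-liner over the model's parameters. A minimal sketch on a toy `torch` model (the marker strings would come from Spectrum's selection; here they are just substrings of parameter names):

```python
import torch.nn as nn

def freeze_except(model: nn.Module, trainable_markers: list) -> None:
    """Freeze every parameter whose name matches none of the markers."""
    for name, param in model.named_parameters():
        param.requires_grad = any(marker in name for marker in trainable_markers)

# Toy model standing in for an LLM: train only the last "layer".
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 4))
freeze_except(model, ["2."])
```

The optimizer then only updates parameters with `requires_grad=True`, which is what keeps the memory and compute budget down.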
Lately, I've become interested in the mechanistic interpretability of LLMs.
💡 A recent paper, "Refusal in Language Models Is Mediated by a Single Direction," showed how to find the refusal direction in the activation space of Chat Language Models and either erase or amplify it. A clever jailbreak method for open-weights models.
Then, @failspy took it a step further by modifying the models to amplify different traits, such as making a model seem grumpy or irritable.
𝐇𝐨𝐰 𝐝𝐢𝐝 𝐈 𝐜𝐫𝐞𝐚𝐭𝐞 𝐲𝐨-𝐋𝐥𝐚𝐦𝐚? (📓 notebook in the HF repository, heavily inspired by Failspy's work)
1️⃣ Load the Llama-3-8B-Instruct model.
2️⃣ Load 1024 examples from Alpaca (instruction dataset).
3️⃣ Prepare a system prompt to make the original model act like a rapper.
4️⃣ Run inference on the examples, with and without the system prompt, and cache the activations.
5️⃣ Compute the rap feature directions (one for each layer) from the activations.
6️⃣ Apply the feature directions one by one, checking the results on some examples.
7️⃣ Pick the best-performing feature direction.
8️⃣ Apply this feature direction and voilà! yo-Llama-3-8B-Instruct is born! 🥳🎶
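On synthetic activations, the core of steps 5 and 8 — a difference-of-means direction, then adding it back at inference — looks roughly like this. Real code hooks into the model's residual stream at each layer; nothing below is Failspy's actual implementation:

```python
import numpy as np

def feature_direction(acts_persona: np.ndarray, acts_plain: np.ndarray) -> np.ndarray:
    """Difference of mean activations (persona prompt vs. no persona), unit norm."""
    d = acts_persona.mean(axis=0) - acts_plain.mean(axis=0)
    return d / np.linalg.norm(d)

def amplify(activations: np.ndarray, direction: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Push activations along the feature direction (subtracting its projection would erase it)."""
    return activations + alpha * direction
```

The scale `alpha` is exactly the kind of knob step 6 tunes by eyeballing generations.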
What if 🤔... Homer Simpson met Spider-Man and they went on a quest for donuts? 🍩 Or if Fred Astaire and Corporal Hicks teamed up to fight xenomorphs? 👾
In the words of Karpathy, LLMs are dream machines... they seem specially made to simulate these wild scenarios!
𝐄𝐱𝐩𝐞𝐫𝐢𝐦𝐞𝐧𝐭𝐢𝐧𝐠 𝐰𝐢𝐭𝐡 𝐭𝐡𝐢𝐬 𝐢𝐝𝐞𝐚
Nous Research / @teknium recently released NousResearch/CharacterCodex: a massive dataset with information on 16k characters, both fictional and real. I couldn't wait to play with it...
After a few attempts, I found that combining the information in this dataset with a good model (like meta-llama/Meta-Llama-3-8B-Instruct) opens the door to a myriad of chat adventures.
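The basic recipe is just turning a dataset entry into a role-play system prompt. A minimal sketch — the field names mirror the kind of entry CharacterCodex contains, but treat them (and the sample entry) as assumptions:

```python
def persona_prompt(entry: dict) -> str:
    """Build a role-play system prompt from a character entry (illustrative fields)."""
    return (
        f"You are {entry['character_name']}. "
        f"Description: {entry['description']} "
        "Stay in character at all times."
    )

# A made-up entry in the dataset's style, for illustration only.
entry = {
    "character_name": "Homer Simpson",
    "description": "A donut-loving family man from Springfield.",
}
```

The same prompt, fed as the system message to the chat model, is all it takes to start one of these crossover conversations.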
🛠️ Stack:
🔹 Haystack for orchestration
🔹 llamafile 🦙 to run our model locally
👉 Check out the notebook: https://t.ly/y6jrZ (includes a bonus 🕵️ Mystery Character Quiz)
When evaluating LLMs' responses, 𝐩𝐫𝐨𝐩𝐫𝐢𝐞𝐭𝐚𝐫𝐲 𝐦𝐨𝐝𝐞𝐥𝐬 like GPT-4 are commonly used due to their strong performance. However, relying on closed models presents challenges related to data privacy, transparency, controllability, and cost 💸.
On the other hand, 𝐨𝐩𝐞𝐧 𝐦𝐨𝐝𝐞𝐥𝐬 typically do not correlate well with human judgments and lack flexibility.
🔥 Prometheus 2 is a new family of open-source models designed to address these gaps:
🔹 two variants: prometheus-eval/prometheus-7b-v2.0 and prometheus-eval/prometheus-8x7b-v2.0
🔹 trained on open-source data
🔹 high correlation with human evaluations and proprietary models
🔹 highly flexible: capable of performing direct assessments and pairwise rankings, and allowing the definition of custom evaluation criteria
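In practice, a direct assessment means handing the evaluator model the instruction, the response, and your custom rubric in one prompt. An illustrative prompt builder — this is not the exact Prometheus template (see the model cards for that), just the shape of the input:

```python
def direct_assessment_prompt(instruction: str, response: str, rubric: str) -> str:
    """Assemble a rubric-based evaluation prompt (illustrative format)."""
    return (
        "Evaluate the response below against the rubric, "
        "then output feedback and a score from 1 to 5.\n"
        f"### Instruction:\n{instruction}\n"
        f"### Response:\n{response}\n"
        f"### Rubric:\n{rubric}\n"
    )
```

Pairwise ranking works the same way, except the prompt contains two responses and asks which one better satisfies the rubric.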
See my experiments with RAG evaluation in the links above.
When building applications with LLMs, writing effective prompts is a long process of trial and error. Often, if you switch models, you also have to change the prompt. 😩 What if you could automate this process?
💡 That's where DSPy comes in - a framework designed to algorithmically optimize prompts for Language Models. By applying classical machine learning concepts (training and evaluation data, metrics, optimization), DSPy generates better prompts for a given model and task.
Recently, I explored combining DSPy with the robustness of Haystack Pipelines.
Here's how it works:
▶️ Start from a Haystack RAG pipeline with a basic prompt
🎯 Define a goal (in this case, get correct and concise answers)
📊 Create a DSPy program, define data and metrics
✨ Optimize and evaluate -> improved prompt
🚀 Build a refined Haystack RAG pipeline using the optimized prompt
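DSPy's real API is richer (signatures, modules, teleprompters/optimizers), but the underlying loop — score candidate prompts on labeled data with a metric and keep the best — can be sketched without it. `run_llm`, the candidate prompts, and the metric are all stand-ins:

```python
def exact_match(prediction: str, gold: str) -> float:
    """Toy metric: 1.0 if the normalized answers coincide, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())

def optimize_prompt(candidates, dataset, run_llm, metric=exact_match):
    """Pick the candidate prompt with the best average metric on the dataset.

    `dataset` is a list of (question, gold_answer) pairs;
    `run_llm(prompt, question)` returns the model's answer as a string.
    """
    def avg_score(prompt):
        return sum(metric(run_llm(prompt, q), a) for q, a in dataset) / len(dataset)
    return max(candidates, key=avg_score)
```

Real optimizers also rewrite and bootstrap the candidates instead of just picking among fixed ones, which is where most of DSPy's value comes from.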
𝐇𝐨𝐰 𝐢𝐭 𝐰𝐨𝐫𝐤𝐬
You provide a URL -> a multiple-choice quiz is instantly generated.
🔹 You can play the quiz yourself.
🔹 You can let the LLM play in two different ways:
📕 Closed book: the LLM answers knowing only the general topic, relying on its parametric knowledge and reasoning abilities.
🌐🔍 Web RAG: for each question, a Google search is performed and the top 3 snippets are included in the prompt for the LLM.
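The Web-RAG mode boils down to stuffing the top search snippets into the question prompt. A sketch with a stubbed search function (the real app calls an actual search API; every name here is illustrative):

```python
def search(query: str, top_k: int = 3) -> list:
    """Stub standing in for a Google search; returns fake snippets."""
    return [f"snippet {i + 1} about {query}" for i in range(top_k)]

def web_rag_prompt(question: str, choices: list) -> str:
    """Build the Web-RAG prompt: top snippets + question + choices."""
    snippets = "\n".join(search(question))
    options = "\n".join(f"- {c}" for c in choices)
    return (
        f"Context from the web:\n{snippets}\n\n"
        f"Question: {question}\nChoices:\n{options}\n"
        "Answer with the correct choice."
    )
```

The closed-book mode is the same prompt minus the context block, which makes the two modes easy to compare head-to-head.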