Alvaro Bartolome
alvarobartt's activity
In this post, we showcase how to deploy https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI.
Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we’re not going to stop here – stay tuned as we enable more experiences to build AI with open models on Google Cloud!
Read the full post at https://huggingface.co/blog/llama31-on-vertex-ai
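For a quick taste, here's a minimal sketch of the deployment with the google-cloud-aiplatform SDK. The container image URI, project, and environment variables below are illustrative placeholders; the blog post has the exact values:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder project

# Upload the model using the Hugging Face DLC for TGI as the serving container
# (the image URI below is illustrative; see the blog post for the exact one)
model = aiplatform.Model.upload(
    display_name="llama-3-1-405b-instruct-fp8",
    serving_container_image_uri="us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121",
    serving_container_environment_variables={
        "MODEL_ID": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
        "NUM_SHARD": "8",  # shard the model across the 8 H100s
        "HUGGING_FACE_HUB_TOKEN": "hf_...",  # gated model, token required
    },
)

# Deploy on an A3 instance with 8 x H100 80GB GPUs
endpoint = model.deploy(
    machine_type="a3-highgpu-8g",
    accelerator_type="NVIDIA_H100_80GB",
    accelerator_count=8,
)
```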
prometheus-eval/prometheus-7b-v2.0
prometheus-eval/prometheus-8x7b-v2.0
🌬️Fine-tuned on top of mistralai/Mistral-7B-Instruct-v0.2 and mistralai/Mixtral-8x7B-Instruct-v0.1
🗂️The datasets used for fine-tuning have been publicly released, i.e. prometheus-eval/Feedback-Collection and prometheus-eval/Preference-Collection
🤝🏻Unified LM evaluator for both absolute grading (a single prompt-completion pair) and relative grading (two completions for a given prompt), thanks to model merging
❌No longer requires a reference / golden answer, though one can still be provided optionally
🔝Surpasses the former version of Prometheus, and has a high correlation with human, GPT-4, and Claude 3 Opus scores when evaluating LMs
📝Apache 2.0 license
Long story short: an amazing job from KAIST AI, bridging the gap between open LLM evaluators and bigger, proprietary models!
This week at Argilla, we decided to add a new task to use Prometheus 2 as an LLM evaluator within distilabel, so we implemented PrometheusEval. 😱

Using PrometheusEval to run their 7B variant with vLLM on a single L40 on top of HuggingFaceH4/instruction-dataset, we got the 327 existing prompt-completion pairs evaluated and pushed to the Hub in less than 2 minutes!

Find the generated dataset and the code at distilabel-internal-testing/instruction-dataset-prometheus
A new synthetic preference dataset built using distilabel on top of the awesome LDJnr/Capybara from @LDJnr!
The current dataset combines the already generated alternative completions from argilla/distilabel-capybara-dpo-7k-binarized, while also adding the remaining ones using the same approach!
Here are some key details on how we built it:
- 🧹 Duplicate removal, keeping the conversation except for the last assistant response, and some slight pre-processing
- 🤖 Generation of alternative completions for the existing conversations (last turn only) with: mlabonne/NeuralBeagle14-7B, argilla/notus-7b-v1, and teknium/OpenHermes-2.5-Mistral-7B
- 👨🏻🏫 Running UltraFeedback via GPT-4 to generate the critique, i.e. the ratings and rationales, for the last assistant responses
- 🎉 Finally, we selected the chosen and rejected responses based on their UltraFeedback score, and applied some slight post-processing (see the sketch below)!
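For illustration, here's a minimal sketch of that last selection step. The repo id and the column names ("generations", "ratings") are hypothetical stand-ins for the actual dataset schema:

```python
from datasets import load_dataset

def binarize(example: dict) -> dict:
    """Pick the highest-rated completion as chosen and the lowest as rejected."""
    ranked = sorted(
        zip(example["generations"], example["ratings"]),
        key=lambda pair: pair[1],
        reverse=True,
    )
    example["chosen"] = ranked[0][0]
    example["rejected"] = ranked[-1][0]
    return example

# hypothetical repo id, for illustration only
dataset = load_dataset("argilla/capybara-preferences-raw", split="train")
dataset = dataset.map(binarize)
```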
Sounds simple, right? Start building your own synthetic datasets with https://github.com/argilla-io/distilabel already!
Also worth mentioning that a v2 of the dataset is on the way, and anyone can contribute via https://huggingface.co/spaces/DIBT/prompt-collective by signing in with their Hugging Face Hub account! 🤗
P.S. Thanks everyone for the great community effort already done!
Fair enough! Maybe you can merge the PEFT adapter into the base model, as otherwise you may be hitting those OOMs; anyway, feel free to report that in the community tab.
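In case it helps, merging the adapter is straightforward with peft (the paths below are placeholders):

```python
from peft import AutoPeftModelForCausalLM

# load the base model with the adapter on top, then merge the adapter
# weights into the base model so no PEFT overhead remains at inference
model = AutoPeftModelForCausalLM.from_pretrained("path/to/adapter")
merged_model = model.merge_and_unload()
merged_model.save_pretrained("path/to/merged-model")
```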
Also, w.r.t. prompting: the original model was fine-tuned for instruction following, but concatenating the messages also works for chat applications. So yes, after the system and user prompts the assistant's response goes there, and then the upcoming turns are appended to that string using the same [INST] notation.
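To avoid formatting mistakes, the safest option is to let the tokenizer build that string via its chat template; a quick sketch, using Mistral-7B-Instruct-v0.2 as an example:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "And its population?"},
]

# renders the conversation with the [INST] ... [/INST] notation, ready for generation
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```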
Indeed, we're serving this via TGI, so that doesn't seem to be the issue. By Inference Endpoints you mean Hugging Face's service, right? Could you let me know which GPUs you are using?
Hey, sorry to hear that! Is it due to the resources required to run it? Maybe you can ask Hugging Face for a quota increase so that you can allocate the required resources 🤗 Let me know if there's anything I can help you with!
Could that be related to https://huggingface.co/spaces/social-post-explorers/README/discussions/15? Thanks for the quick update BTW
Hi @abhishek, I think the link you pasted in the post is broken! Did you mean https://huggingface.co/blog/stefan-it/autotrain-flair-mobie? (it's the only blog post I saw published on the 🤗 Hub by @stefan-it)
In fact, to add more context, the authors mentioned that they will release some more content in the upcoming revision of the paper, which is nice because it would mean anyone could run a faithful reproduction of their synthetic data generation process. See the reply from the authors at https://huggingface.co/papers/2401.00368#65978d195f689f3f0b2caeb9.
Also worth mentioning that @andersonbcdefg ran both stages:
- Task definition generation at https://huggingface.co/datasets/andersonbcdefg/synthetic_retrieval_tasks
- Query-pos-neg triplets generation at https://huggingface.co/datasets/andersonbcdefg/synthetic_tuples_gpt35_turbo
(I'm unsure whether the reproduction of the second stage is faithful to the original, but I asked them at https://twitter.com/alvarobartt/status/1742839431881490717; anyway, I think we may need to wait for the authors to share the full details on the prompting strategies used for generation.)
https://huggingface.co/spaces/argilla/notux-chat-ui
Kudos to @gabrielmbmb !
MT-Bench is on par with mistralai/Mixtral-8x7B-Instruct-v0.1, which means ~8.3; we didn't run AlpacaEval yet.
We may need to discuss that internally, but it could be something we consider for the start of 2024 🤗
Yes, indeed! For anyone wondering, in a bit more detail: Mixtral-8x7B-Instruct-v0.1 was fine-tuned using SFT + DPO (read more about it at https://mistral.ai/news/mixtral-of-experts/), and we ran a further DPO fine-tuning on top of it using data from UltraFeedback, in particular argilla/ultrafeedback-binarized-preferences-cleaned, which uses a different binarization approach and some data cleaning, essentially following the same approach as @HuggingFaceH4 did with zephyr-7b-beta.
So DPO ^ 2 is fair! 😄
Also thanks to @osanseviero for granting me access to the posts private beta! 🦙
From Argilla, we recently fine-tuned Mixtral 8x7B Instruct from Mistral AI using DPO and a binarized, curated version of UltraFeedback, and found that it outperforms every other MoE-based model on the Hub.
- argilla/notux-8x7b-v1
- argilla/ultrafeedback-binarized-preferences-cleaned
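For anyone curious about the mechanics, here's a minimal sketch of a DPO fine-tuning run with TRL's DPOTrainer, assuming a trl release from that era (~0.7) where beta and tokenizer are passed directly, and assuming the dataset has been pre-processed into plain-string "prompt", "chosen", and "rejected" columns. The hyperparameters are illustrative, not the ones used for notux-8x7b-v1:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# preference pairs with "prompt", "chosen", and "rejected" columns
# (assumed here to be pre-processed into plain strings)
dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train"
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with None, a frozen copy of the model acts as reference
    beta=0.1,  # strength of the KL penalty; illustrative value only
    args=TrainingArguments(output_dir="notux-8x7b-v1", per_device_train_batch_size=1),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```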