Daniel Vila
AI & ML interests
Articles
Organizations
dvilasuero's activity
Excited to share this space where the community can explore a tiny subset of FinePersonas
argilla/finepersonas
Dataset built with distilabel and free serverless endpoints
This is just a first step towards more interesting experiments with FinePersonas. For example, can we use it to assess biases in text-to-image models?
If you have ideas I'd love to hear them in the comments!
Just found this other tweet!
Involving the one and only
@davanstrien
We'll keep you posted in both channels but I think it will make sense to shift our efforts towards the HF discord very soon
Thanks @clem !!
That's a very good question! We're releasing Argilla 2.0 in the coming weeks and are thinking about a separate package for monitoring and metrics. We'd love to talk about multimodal metrics!
We're embracing a larger mission, becoming part of a brilliant and kind team, and sharing a vision about the future of AI.
Over the past year, we've been collaborating with Hugging Face on countless projects: partnering on the launch of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr's learnings, the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets.
After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we're now the same team.
To those of you who've been following us, this won't be a huge surprise, but it will be a big deal in the coming months. This acquisition means we'll double down on empowering the community to build and collaborate on high-quality datasets, we'll bring full support for multimodal datasets, and we'll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.
As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.
Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.
Would love to answer any questions you have so feel free to add them below!
Great stuff, I love to see how this is evolving!
Congrats on leading this!!
very cool!
Let's go!!
A recipe to replicate SPIN (Self-Play Fine Tuning) with 30x less data:
> 50K samples vs. 1.8K prompts curated by the 350+ amazing DIBT contributors
> Distillation of Mistral Large instead of OpenAI
> Open data & code with distilabel
SPIN Paper:
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (2401.01335)
SPIN DIBT Collection with datasets and models:
argilla/dibt-prompt-collective-spin-65ef59062518776024395fc3
Repo:
https://github.com/argilla-io/distilabel-spin-dibt
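Conceptually, each SPIN iteration builds preference pairs where the ground-truth completion is "chosen" and the current model's own generation is "rejected". A minimal sketch of that idea in plain Python (function names and fields here are illustrative assumptions, not the repo's actual code):

```python
# Illustrative sketch of how SPIN-style training pairs are constructed:
# the real (ground-truth) completion becomes "chosen" and the current
# model's own generation "rejected". `generate` stands in for any LLM call.

from typing import Callable

def build_spin_pairs(
    dataset: list[dict],             # rows with "prompt" and "real_completion"
    generate: Callable[[str], str],  # current model's generation function
) -> list[dict]:
    """One SPIN iteration's preference pairs: real vs. self-generated."""
    return [
        {
            "prompt": row["prompt"],
            "chosen": row["real_completion"],     # ground-truth answer
            "rejected": generate(row["prompt"]),  # model's own answer
        }
        for row in dataset
    ]
```

Each DPO round then pushes the model's distribution away from its own previous generations and towards the curated responses.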
Joint work with the amazing DIBT community:
@aashish1904 , @flozi00 , @sayhan , @munish0838 , @0-hero , @dvilasuero , @eren23 , @davanstrien , @ahnz , @BlackKakapo , @kitano-o , @mmhamdy , @sdiazlor , @Stopwolf , @gabrielmbmb , @tculler91 , @plaguss , @ignacioct , @Hugi-R , @davidberenstein1957 , @Korla , @alvarobartt , @Hugs4Llamas , @Sumandora , @nataliaElv , @jfcalvo , @Averill , @steventrouble , @vasilis , @aeros93 , @kayyshf , @thomasgauthier , @jeromebas , @Ameeeee , @ayoubelmhamdi , @TuringsSolutions , @efels , @Haleyok , @abrazador , @emessy , @Nindaleth , @burtenshaw , @vicgalle , @CortexPE , @casey-martin , @Leire-aguirre-eguiluz , @mrfakename , @Portias600kNeurons , @nathaliepett , @Filippo
> Using LLMs to improve other LLMs, at scale!
Built in collaboration with the Hugging Face H4 team, it's a 1M-preference dataset on top of the amazing @teknium's dataset.
Dataset:
argilla/OpenHermesPreferences
The dataset is another example of open collaboration:
> The H4 team created responses with Mixtral using llm-swarm
> Argilla created responses with NousResearch Hermes-2-Yi-34B using distilabel
> The H4 team ranked these responses plus the original response with PairRM from AllenAI, the University of Southern California, and Zhejiang University ( @yuchenlin @DongfuTingle and colleagues)
We hope this dataset will help the community's research efforts towards understanding the role of AI feedback for LLM alignment.
We're particularly excited about the ability to filter specific subsets to improve LLM skills like math or reasoning.
Here's how easy it is to filter by subset:
from datasets import load_dataset

ds = load_dataset("argilla/OpenHermesPreferences", split="train")

# Get the categories of the source dataset
# ['airoboros2.2', 'CamelAI', 'caseus_custom', ...]
sources = ds.unique("source")

# Filter for a subset
ds_filtered = ds.filter(
    lambda x: x["source"] in ["metamath", "EvolInstruct_70k"],
    num_proc=6,
)
As usual, all the scripts to reproduce this work are available and open to the community!
argilla/OpenHermesPreferences
Such a fun collab between @vwxyzjn , @plaguss , @kashif , @philschmid & @lewtun !
Open Source AI FTW!
Data is essential for training good AI systems. We believe that the amazing community built around open machine learning can also work on developing amazing datasets together.
To explore how this can be done, Argilla and Hugging Face are thrilled to announce a collaborative project where we're asking Hugging Face community members to collectively build a dataset of LLM prompts.
What are we doing?
Using an instance of Argilla, a powerful open-source data collaboration tool, hosted on the Hugging Face Hub, we are collecting ratings of prompts based on their quality.
How Can You Contribute?
It's super simple to start contributing:
1. Sign up if you don't have a Hugging Face account
2. Go to this Argilla Space and sign in: DIBT/prompt-collective
3. Read the guidelines and start rating prompts!
You can also join the #data-is-better-together channel in the Hugging Face Discord.
Finally, to track the community progress we'll be updating this Gradio dashboard:
DIBT/prompt-collective-dashboard
Welcome Distilabel Capybara DPO, a multi-turn, high-quality preference dataset.
argilla/distilabel-capybara-dpo-7k-binarized
Why?
The best closed chat models are built on top of multi-turn dialogue preference data, but the OSS community lacks these datasets. This dataset is the first in a series aimed at closing this gap.
Is this dataset useful?
To test this dataset, we've built our virtual launching partner:
Welcome CapybaraHermes, a preference-tuned OpenHermes with improved second-turn capabilities on MT-Bench:
argilla/CapybaraHermes-2.5-Mistral-7B
As usual, the models are the least important part to us; we like to focus on the data. Our mission is to build and share high-quality datasets, sharing our methods in the open so the community can improve upon them.
That's why we took some time to describe the full methodology on the dataset card. Check it out and give us feedback! Data and methods are never perfect!
Finally, this is just a preview version, and we'd love to collaborate with you on adding more benchmark results, figuring out which hyperparameters work for DPO'ing models, which mix of datasets, etc.
Expect some more datasets in the coming weeks. Let's build the best data for AI, together.
my first name at argilla.io
Thanks @BramVanroy , you just pointed out a clear limitation of all the DPO/RM work we see lately: multilinguality and non-English methods!
We'd love to help there if you are able to share a sample of the dataset
This approach to generating DPO pairs seems to lead to reward hacking, as it becomes easy for the model to quickly exploit patterns in the chosen vs. rejected responses (even the first words). It happens with the original Orca pairs too, where the model overfits very quickly (see Argilla's version). Besides all the recommendations above, I'd try working on the dataset: you can try PairRM, which is cheap to compute, and see if it helps you rerank the pairs, unless you're using a pretty bad model for the rejected responses (which I would discourage).
That's a very cool idea @gblazex , and certainly something one could try to tackle by combining Argilla and distilabel!
This could be decomposed into the following steps:
1. Finding the candidate responses for rewriting: thanks for sharing some examples via screenshots. One could use the Argilla UI to flag these candidates, as we've done for UltraFeedback. Another option is to rewrite only low-rated responses, but I checked and some of the examples you shared got a high score. A third option is simply attempting to improve every response.
2. Improving the response: this is very easy now that we have the critique text in the dataset. With distilabel, one can define a custom text generation task that receives the instruction, the original response, and the critique, and asks the LLM to provide an improved response. It would be a few lines of code.
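The core of that step is just assembling the improvement prompt from fields already in the dataset. A rough sketch in plain Python (the template wording and function name are illustrative assumptions, not distilabel's actual API; a real pipeline would wrap this in a distilabel task):

```python
# Hypothetical sketch: build an "improve this response" prompt from the
# fields already present in the dataset (instruction, response, critique).
# The template wording is illustrative, not distilabel's actual task.

IMPROVE_TEMPLATE = """You are given an instruction, a model response, and a critique.
Rewrite the response so it fully addresses the critique.

Instruction: {instruction}
Original response: {response}
Critique: {critique}

Improved response:"""

def build_improvement_prompt(instruction: str, response: str, critique: str) -> str:
    """Return the prompt an LLM would receive to rewrite a weak response."""
    return IMPROVE_TEMPLATE.format(
        instruction=instruction, response=response, critique=critique
    )
```

Feeding each row through this prompt with a strong LLM would yield the rewritten responses.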
happy to discuss this further!
Dropping our first open dataset and LLM of the year:
Meet distilabel Orca Pairs DPO, an improved version of the now-famous dataset from Intel:
argilla/distilabel-intel-orca-dpo-pairs
And a new OpenHermes fine-tune outperforming baselines with 54% fewer DPO pairs:
https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B
You can use this new dataset for your DPO tuning, just like this:
from datasets import load_dataset

# Instead of this:
# dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
# use this:
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# Keep only decisive, high-quality pairs not leaked from the GSM8K train set
dataset = dataset.filter(
    lambda r: r["status"] != "tie"
    and r["chosen_score"] >= 8
    and not r["in_gsm8k_train"]
)
This will reduce the size of the original dataset by 54% while giving you better-quality preferences!
What should we build next?
If you want to build something similar, here's an end-to-end colab:
https://colab.research.google.com/drive/1rO1-OlLFPBC0KPuXQOeMpZOeajiwNoMy?usp=sharing
This is my very first post.
I'll use it to share some old news: a math preference dataset for DPO!
I created this dataset some time ago while we were developing distilabel (https://github.com/argilla-io/distilabel).
A few days ago we found out people are actually using it! So I'll use this post to explain how I built it, in case it's useful for the community.
1. I used distilabel's SelfInstruct-inspired task to generate instructions about different math topics. I curated the instructions with Argilla (on Spaces!).
2. Then I used a distilabel Pipeline to build a preference dataset, using GPT-3.5 as the generator and GPT-4 as the labeller. If I recall correctly, I used our JudgeLM implementation (see https://distilabel.argilla.io/latest/technical-reference/tasks/#judgelmtask)
(see the screenshot with the dataset in the Argilla UI)
3. Then I just binarized it into chosen/rejected pairs, and voilà:
argilla/distilabel-math-preference-dpo
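For the curious, the binarization step can be sketched in plain Python (field names here are illustrative, not the dataset's actual schema): per prompt, the highest-rated generation becomes "chosen" and the lowest-rated one "rejected".

```python
# Illustrative sketch of binarizing rated generations into DPO pairs:
# for each prompt, the top-rated response becomes "chosen" and the
# bottom-rated one "rejected". Field names are hypothetical.

def binarize(example: dict) -> dict:
    """Turn {'prompt', 'generations', 'ratings'} into a chosen/rejected pair."""
    ranked = sorted(
        zip(example["generations"], example["ratings"]),
        key=lambda pair: pair[1],  # sort by rating
        reverse=True,
    )
    return {
        "prompt": example["prompt"],
        "chosen": ranked[0][0],     # highest-rated generation
        "rejected": ranked[-1][0],  # lowest-rated generation
    }
```

Mapping this over the labelled dataset yields the binarized pairs ready for DPO.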
The funny thing is that I used this dataset to do a second DPO run over Notus-7B. I hoped to see an improvement in math/reasoning skills, but it actually improved in STEM and Humanities and did worse on Math.
In conclusion, this dataset was only a quick experiment. I'm happy to see the community found it useful. Data for DPO and fine-tuning is still a mystery; let's unveil these mysteries in 2024 together!
Follow me for the most exciting datasets for LLMs (and maybe some great, small, efficient models). I plan to announce all Argilla open-source work here!
Awesome space!