LLM Compiler Collection Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. • 4 items • Updated 12 days ago • 137
Probably DPO datasets Collection A collection of datasets that probably support DPO • 146 items • Updated 13 days ago • 8
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 67
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations Paper • 2312.08935 • Published Dec 14, 2023 • 4
Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints May 1 • 59
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities Paper • 2406.09406 • Published 26 days ago • 12
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models Paper • 2402.10986 • Published Feb 16 • 74
SciRIFF Collection Data and models to enhance instruction-following for scientific literature understanding. • 9 items • Updated 26 days ago • 4
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published 27 days ago • 48
AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets Paper • 2404.05623 • Published Apr 8 • 3
K2 Collection K2-65B is a fully reproducible LLM outperforming Llama 2 70B using 35% less compute. • 7 items • Updated 27 days ago • 6
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published May 24 • 52
MAmmoTH2 Collection Scaling up instruction data from the web to build better LLMs • 11 items • Updated May 26 • 7
To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency Paper • 2304.02721 • Published Apr 5, 2023 • 3
CommonCanvas Collection Collection of models trained on the CommonCatalogue datasets • 8 items • Updated May 16 • 6
CommonCatalog Collection Common Catalog, a dataset of Creative Commons-licensed images paired with machine-generated captions • 8 items • Updated May 16 • 13
MADLAD-400 Collection Models and spaces for MADLAD-400: A Multilingual And Document-Level Large Audited Dataset • 8 items • Updated Nov 14, 2023 • 5
Chronos Models & Datasets Collection Chronos: Pretrained (language) models for time series forecasting based on the T5 architecture. • 8 items • Updated 12 days ago • 26
Speaker Diarization Datasets Collection A collection of speaker diarization datasets compatible with Diarizers. • 6 items • Updated May 29 • 1
End-to-end speaker segmentation for overlap-aware resegmentation Paper • 2104.04045 • Published Apr 8, 2021 • 1
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation Paper • 2210.13248 • Published Oct 24, 2022 • 1
Article Train custom AI models with the trainer API and adapt them to 🤗 By not-lain • 10 days ago • 29
Granite Code Models Collection A series of code models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 20 items • Updated 10 days ago • 145
llama 3 self-align experiments Collection Replicating the pipeline for StarCoder-2 Instruct on Llama-3-8B with some tweaks https://huggingface.co/blog/sc2-instruct • 4 items • Updated May 9 • 6
Community Tools Collection Cool HF tools that I and others at HF work on that I regularly use • 4 items • Updated May 21 • 3
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval Paper • 2311.05800 • Published Nov 10, 2023 • 3
🦢SWIM-IR Dataset Collection 29 million Synthetic Wikipedia-based Multilingual Retrieval Training Pairs. • 4 items • Updated Apr 28 • 7
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits Paper • 2305.02547 • Published May 4, 2023 • 7
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 58
📀 Dataset comparison models Collection 1.8B models trained on 350BT to compare different pretraining datasets • 8 items • Updated 27 days ago • 26
Generalizable Face Landmarking Guided by Conditional Face Warping Paper • 2404.12322 • Published Apr 18 • 1
Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning Paper • 2404.12897 • Published Apr 19 • 1
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 26
Antidote Project Collection Data and models generated within the Antidote Project (https://univ-cotedazur.eu/antidote) • 20 items • Updated May 6 • 5
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing Paper • 2206.15076 • Published Jun 30, 2022 • 3
Arcee's MergeKit: A Toolkit for Merging Large Language Models Paper • 2403.13257 • Published Mar 20 • 18