Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29, 2024 • 67
SpaceByte: Towards Deleting Tokenization from Large Language Modeling Paper • 2404.14408 • Published Apr 22, 2024 • 6
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Paper • 2402.09844 • Published Feb 15, 2024 • 20
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18, 2024 • 52
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Paper • 2404.10719 • Published Apr 16, 2024 • 3
From r to Q^*: Your Language Model is Secretly a Q-Function Paper • 2404.12358 • Published Apr 18, 2024 • 2
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 59
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 123
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset Paper • 2402.14804 • Published Feb 22, 2024 • 2
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 37
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers Paper • 2402.19255 • Published Feb 29, 2024 • 1
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation Paper • 2402.18334 • Published Feb 28, 2024 • 12
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap Paper • 2402.19450 • Published Feb 29, 2024 • 3
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024 • 6
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning Paper • 2312.01552 • Published Dec 4, 2023 • 27
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization Paper • 2402.09320 • Published Feb 14, 2024 • 6
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning Paper • 2402.04833 • Published Feb 7, 2024 • 6
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15, 2024 • 33
A Minimaximalist Approach to Reinforcement Learning from Human Feedback Paper • 2401.04056 • Published Jan 8, 2024 • 2
Possible Meissner effect near room temperature in copper-substituted lead apatite Paper • 2401.00999 • Published Jan 2, 2024 • 5
R-Tuning: Teaching Large Language Models to Refuse Unknown Questions Paper • 2311.09677 • Published Nov 16, 2023 • 3
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations Paper • 2312.08935 • Published Dec 14, 2023 • 4
Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss Paper • 2312.16682 • Published Dec 27, 2023 • 5
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning Paper • 2312.15685 • Published Dec 25, 2023 • 16
Model Merging Collection Model merging is a very popular technique for LLMs nowadays. Here is a chronological list of papers on the topic that will help you get started with it! • 30 items • Updated 26 days ago • 194
A General Theoretical Paradigm to Understand Learning from Human Preferences Paper • 2310.12036 • Published Oct 18, 2023 • 11
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models Paper • 2311.16079 • Published Nov 27, 2023 • 19
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 Paper • 2311.10702 • Published Nov 17, 2023 • 17
zephyr story Collection Sources mentioned in hf.co/thomwolf's tweet: x.com/Thom_Wolf/status/1720503998518640703 • 8 items • Updated Jan 24 • 15
Zephyr 7B Collection Models, datasets, and demos associated with Zephyr 7B. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 9 items • Updated Apr 12 • 142
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models Paper • 2310.08491 • Published Oct 12, 2023 • 51
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 39
Training language models to follow instructions with human feedback Paper • 2203.02155 • Published Mar 4, 2022 • 12
A General Language Assistant as a Laboratory for Alignment Paper • 2112.00861 • Published Dec 1, 2021 • 2
Awesome RLHF Collection A curated collection of datasets, models, Spaces, and papers on Reinforcement Learning from Human Feedback (RLHF). • 11 items • Updated Oct 2, 2023 • 7
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 53
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only Paper • 2306.01116 • Published Jun 1, 2023 • 30