What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Noise-free Text-Image Corruption and Evaluation Paper • 2406.16320 • Published 15 days ago • 1
Article Financial Analysis with LangChain and CrewAI Agents By herooooooooo • 9 days ago • 4
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models Paper • 2406.19999 • Published 11 days ago • 3
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting Paper • 2406.00053 • Published May 28 • 1
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs Paper • 2406.20086 • Published 11 days ago • 3
From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP Paper • 2406.12618 • Published 21 days ago • 5
Multi-property Steering of Large Language Models with Dynamic Activation Composition Paper • 2406.17563 • Published 14 days ago • 4
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation Paper • 2406.13663 • Published 20 days ago • 7
Estimating Knowledge in Large Language Models Without Generating a Single Token Paper • 2406.12673 • Published 21 days ago • 7
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models Paper • 2406.09519 • Published 26 days ago • 1
Modello Italia - iGenius Collection An unofficial collection of Italian LLMs developed by iGenius. • 2 items • Updated Jun 7 • 6
IrokoBench Collection A human-translated benchmark dataset for 16 African languages covering three tasks: NLI, MMLU, and MGSM • 6 items • Updated May 31 • 15
Calibrating Reasoning in Language Models with Internal Consistency Paper • 2405.18711 • Published May 29 • 6
Emergence of a High-Dimensional Abstraction Phase in Language Transformers Paper • 2405.15471 • Published May 24 • 2
Learned feature representations are biased by complexity, learning order, position, and more Paper • 2405.05847 • Published May 9 • 2
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations Paper • 2402.17700 • Published Feb 27 • 1
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models Paper • 2405.12522 • Published May 21 • 2
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning Paper • 2405.12241 • Published May 17 • 1
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks Paper • 2405.10928 • Published May 17 • 1
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability Paper • 2405.10927 • Published May 17 • 3
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control Paper • 2405.08366 • Published May 14 • 2
IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation Paper • 2203.03759 • Published Mar 7, 2022 • 5
Wikimedia Datasets Collection Wikimedia datasets across languages and modalities, from different Wikimedia projects, on the Hub. Not all tested. • 19 items • Updated May 16 • 9
A Primer on the Inner Workings of Transformer-based Language Models Paper • 2405.00208 • Published Apr 30 • 10
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation Paper • 2404.07129 • Published Apr 10 • 3
LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models Paper • 2404.07004 • Published Apr 10 • 5
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 86
Zeroshot Classifiers Collection These are my current best zero-shot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated Apr 3 • 85
A little guide to building Large Language Models in 2024 Collection Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757 • 19 items • Updated Apr 1 • 14
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models Paper • 2403.19647 • Published Mar 28 • 3
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms Paper • 2403.17806 • Published Mar 26 • 3
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers Paper • 2310.03686 • Published Oct 5, 2023 • 3
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 14
Information Flow Routes: Automatically Interpreting Language Models at Scale Paper • 2403.00824 • Published Feb 27 • 3
AtP*: An efficient and scalable method for localizing LLM behaviour to components Paper • 2403.00745 • Published Mar 1 • 8
LiT5 Collection Linguistically-Informed T5 models from the LREC-COLING paper "Linguistic Knowledge Can Enhance Encoder-Decoder Models (If You Let It)". • 6 items • Updated Feb 28 • 2
CausalGym: Benchmarking causal interpretability methods on linguistic tasks Paper • 2402.12560 • Published Feb 19 • 3
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking Paper • 2402.14811 • Published Feb 22 • 4
Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation Paper • 2402.13331 • Published Feb 20 • 2
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space Paper • 2402.12865 • Published Feb 20 • 1
In-Context Learning Demonstration Selection via Influence Analysis Paper • 2402.11750 • Published Feb 19 • 2
⛔️🔦 Provenance, Watermarking & Deepfake Detection Collection Technical tools for more control over non-consensual synthetic content • 14 items • Updated Apr 1 • 37
Recovering the Pre-Fine-Tuning Weights of Generative Models Paper • 2402.10208 • Published Feb 15 • 7
SyntaxShap: Syntax-aware Explainability Method for Text Generation Paper • 2402.09259 • Published Feb 14 • 2