
Sourab Mangrulkar

smangrul

https://www.linkedin.com/in/sourab-m/

pacman100

AI & ML interests

Machine Learning, Deep Learning, Natural Language Processing, Natural Language Generation, Computer Vision, Reinforcement Learning

Articles

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

May 24, 2023


Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

Mar 9, 2023


🤗 PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware

Feb 10, 2023


Accelerate Large Model Training using DeepSpeed

Jun 28, 2022

Accelerate Large Model Training using PyTorch Fully Sharded Data Parallel

May 2, 2022



Posts 6

Post


Unlocking the Power of locally running Llama-3 8B Model Agents with Chat-UI! 🔥🚀✨

I'm thrilled to share my hackathon-style side project:
1. Fine-tuned Llama-3 8B for function calling using PEFT QLoRA, since the instruct Llama-3 model doesn't support it. The Colab notebook is here: https://lnkd.in/ggJMzqh2. 🛠️
2. The fine-tuned model, along with its 4-bit quants, is here: https://lnkd.in/gNpFKY6V ✨
3. Cloned Hugging Face https://lnkd.in/gKBKuUBQ and made it compatible with function calling by building on the PR https://lnkd.in/gnqFuAd4, adapting it to my model and to a local inference use case with Ollama. The learning curve was steep, and I stayed awake the whole night to get it working. 💪🏽
4. In this setup, I used SerpAPI for web browsing and the MongoDB Atlas free tier to persist conversations and assistant configs. 🔎
5. More work is needed on switching between using tools and responding directly, which is where I see the model break. 🧐

How cool is it that we are approaching a ChatGPT-like experience with a locally hosted agent model running on your laptop! 💻
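The switch between calling a tool and answering directly (item 5) can be sketched as a small router. This is only an illustration under stated assumptions: the JSON call format (`name` plus `arguments`), the `route` helper, and the stubbed `fake_web_search` tool are hypothetical and are not taken from the notebook, the PR, or Chat-UI.

```python
import json

# Hypothetical stand-in for a real search tool (the project used SerpAPI).
def fake_web_search(query: str) -> str:
    return f"(stub) results for: {query}"

# Hypothetical tool registry: tool name -> Python callable.
TOOLS = {"web_search": fake_web_search}

def route(model_output: str) -> str:
    """Run a tool when the output is a well-formed call, else return the text.

    Assumes (for illustration only) that a tool call is a JSON object of the
    form {"name": <tool name>, "arguments": {<keyword arguments>}}.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain-text answer, no tool involved
    if not isinstance(call, dict):
        return model_output
    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS or not isinstance(args, dict):
        return model_output  # malformed or unknown call: fall back to the text
    try:
        return TOOLS[name](**args)
    except TypeError:
        return model_output  # arguments did not match the tool signature

direct = route("Hello there!")
tool_result = route('{"name": "web_search", "arguments": {"query": "PEFT"}}')
```

Falling back to the raw text whenever parsing fails keeps the chat responsive, at the cost of occasionally showing a broken tool call to the user.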

Post


🤗 PEFT v0.10.0 release! 🔥🚀✨

Some highlights: 📝
1. FSDP+QLoRA and DeepSpeed Stage-3+QLoRA
2. Layer expansion + LoRA
3. DoRA support for Conv2D layers and quantized bitsandbytes layers
4. New LoftQ utility
5. Batched inference for mixed LoRA adapters.

The Answer.AI team, in collaboration with bitsandbytes and Hugging Face 🤗, open-sourced code enabling the use of FSDP+QLoRA and explained the whole process in their insightful blog post https://lnkd.in/g6jgfXyv. This is now integrated into the Hugging Face ecosystem.

For an end-to-end example of FSDP+QLoRA, please refer to https://lnkd.in/gT3yY-Rx.

For an end-to-end example of DeepSpeed Stage-3+QLoRA, please refer to https://lnkd.in/gkt-xZRE.

With the PR https://lnkd.in/g5F348MN, these changes are now upstreamed in https://lnkd.in/g5_MxYtY, thanks to Wing Lian! 🚀

Kudos to the Answer.AI team, Titus von Köller, Younes Belkada, Benjamin Bossan and Zachary Mueller for all their help, without which this wouldn't have been possible. 🤗

For efficient depthwise layer expansion, akin to the passthrough method of mergekit but without using additional memory, and for attaching LoRAs to the expanded layers, refer to the details below! 🔥 https://lnkd.in/ge95ztjA
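As a rough sketch of what depthwise layer expansion means, the helper below builds the new layer order from a list of `[start, end)` ranges over the original layers. The `expand_layers` function is hypothetical; the range convention is assumed to match PEFT's `layer_replication` option as best recalled, so check the linked details before relying on it.

```python
def expand_layers(num_layers, ranges):
    """Return the indices of the original layers in the expanded stack.

    Each (start, end) pair selects original layers start..end-1. Repeated
    indices stand for layers that reuse the same base weights, which is why
    the expansion itself needs no extra memory for those weights.
    """
    order = []
    for start, end in ranges:
        if not 0 <= start < end <= num_layers:
            raise ValueError(f"invalid range ({start}, {end}) for {num_layers} layers")
        order.extend(range(start, end))
    return order

# A 5-layer model expanded to 7 layers, with layers 2 and 3 appearing twice.
expanded = expand_layers(5, [(0, 4), (2, 5)])
print(expanded)  # [0, 1, 2, 3, 2, 3, 4]
```

Each position in the expanded stack can then carry its own LoRA weights, so the duplicated layers can diverge during fine-tuning even though they share base weights.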

DoRA is now supported for Conv2D layers as well as bitsandbytes quantized layers ✨. For more details, please refer to the thread below.
https://lnkd.in/gsJbuWPD
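For readers new to DoRA, here is a minimal pure-Python sketch of its weight decomposition for a plain linear weight: the adapted weight is a learned magnitude times the normalized direction of `W0 + B·A`. It follows the column-wise norm described in the DoRA paper; the helper names are hypothetical, the LoRA scaling factor is omitted, and this is not how PEFT implements the Conv2D or quantized cases.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def column_norms(w):
    """L2 norm of each column of w (assumes no all-zero column)."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]

def dora_weight(w0, lora_b, lora_a, magnitude):
    """Return magnitude * (W0 + B @ A) / ||W0 + B @ A||, column by column."""
    delta = matmul(lora_b, lora_a)
    v = [[w0[i][j] + delta[i][j] for j in range(len(w0[0]))]
         for i in range(len(w0))]
    norms = column_norms(v)
    return [[magnitude[j] * v[i][j] / norms[j] for j in range(len(v[0]))]
            for i in range(len(v))]

w0 = [[3.0, 0.0], [4.0, 2.0]]
lora_b = [[0.0], [0.0]]        # B starts at zero, as in LoRA
lora_a = [[0.5, -0.5]]
magnitude = column_norms(w0)   # magnitude starts at the norms of W0

merged = dora_weight(w0, lora_b, lora_a, magnitude)
```

With `B` at zero and the magnitude initialized from `W0`, the merged weight equals `W0`, so training starts from the pretrained behavior.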

You can now mix different LoRA adapters in a single batch during inference. This speeds up inference by avoiding computing the base model multiple times, which is what happens when each adapter is run separately with batch_size=1! ⚡️
Details below. https://lnkd.in/gD-pcX_B
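The saving can be illustrated with a toy linear layer: the base projection is computed for the whole batch, and each sample then adds only the low-rank update of its own adapter. This is a conceptual sketch with hypothetical helper names and no LoRA scaling factor, not PEFT's implementation; the `"__base__"` marker for "no adapter" is an assumption about the naming used there.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def mixed_batch_forward(w0, adapters, batch, adapter_names):
    """Apply W0 to every sample, then add each sample's own LoRA delta.

    adapters maps a name to a (B, A) pair; adapter_names has one entry per
    sample, with "__base__" meaning that no adapter is applied.
    """
    # Base projection for all samples (one batched matmul in a real model).
    outputs = [matvec(w0, x) for x in batch]
    for idx, name in enumerate(adapter_names):
        if name == "__base__":
            continue
        lora_b, lora_a = adapters[name]
        # Low-rank path: B @ (A @ x), cheap compared to the base projection.
        delta = matvec(lora_b, matvec(lora_a, batch[idx]))
        outputs[idx] = [o + d for o, d in zip(outputs[idx], delta)]
    return outputs

w0 = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "a": ([[1.0], [0.0]], [[1.0, 1.0]]),
    "b": ([[0.0], [2.0]], [[1.0, 0.0]]),
}
batch = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
outputs = mixed_batch_forward(w0, adapters, batch, ["a", "b", "__base__"])
print(outputs)  # [[4.0, 2.0], [1.0, 4.0], [1.0, 2.0]]
```

The three identical inputs produce three different outputs because each row is routed through a different adapter, while the base weights are applied only once per sample.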

LoftQ reduces quantization error by appropriately initializing the LoRA adapter weights. Normally, this is a two-step process. Benjamin Bossan added a new utility, replace_lora_weights_loftq, which applies LoftQ on the fly with bnb.
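The idea behind LoftQ can be sketched in a few lines: quantize the weight, then fit a low-rank term to the residual so that "quantized weight plus adapter" sits closer to the original. The sketch below is a deliberately simplified, single-step, rank-1 version with a toy rounding quantizer; the helper names are hypothetical, and it does not reproduce the actual LoftQ algorithm or the internals of `replace_lora_weights_loftq`.

```python
import math

def quantize(w, step=0.5):
    """Toy quantizer: round every entry to the nearest multiple of step."""
    return [[round(v / step) * step for v in row] for row in w]

def frobenius(m):
    return math.sqrt(sum(v * v for row in m for v in row))

def rank1_fit(r, iters=50):
    """Approximate r by u v^T (u of unit length) using power iteration."""
    rows, cols = len(r), len(r[0])
    # Start from the largest row of r so the first projection is nonzero.
    v = list(max(r, key=lambda row: sum(x * x for x in row)))
    if not any(v):
        return [0.0] * rows, [0.0] * cols  # zero residual, nothing to fit
    for _ in range(iters):
        u = [sum(r[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
        v = [sum(r[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return u, v

w = [[0.2, 0.9, -0.7], [1.3, -0.4, 0.6]]
q = quantize(w)
residual = [[w[i][j] - q[i][j] for j in range(3)] for i in range(2)]

u, v = rank1_fit(residual)  # plays the role of the LoRA factors B and A
remaining = [[residual[i][j] - u[i] * v[j] for j in range(3)] for i in range(2)]

err_before = frobenius(residual)   # error of the quantized weight alone
err_after = frobenius(remaining)   # error of quantized weight plus low-rank term
```

Here `err_after` is smaller than `err_before`, which is the effect LoftQ aims for: the adapter starts out compensating for part of the quantization error instead of starting from zero.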

For more details, refer to the release notes: https://lnkd.in/gg7-AmHA 📝

As always, make sure losses go down and be happy to watch your model train!
