pszemraj committed
Commit 6d4035b
1 Parent(s): ba3e462

Update README.md

Files changed (1)
  1. README.md +7 -10
README.md CHANGED
@@ -214,18 +214,17 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
 
 > alternative section title: how to get this monster to run inference on free colab runtimes
 
- Per [this PR](https://github.com/huggingface/transformers/pull/20341) LLM.int8 is now supported for `long-t5` models. Per **initial tests** the summarization quality seems to hold while using _significantly_ less memory! \*
+ Via [this PR](https://github.com/huggingface/transformers/pull/20341) LLM.int8 is now supported for `long-t5` models.
 
- How-to: basically make sure you have pip-installed the **latest GitHub repo main** version of `transformers`, and also the `bitsandbytes` package.
-
- install the latest `main` branch:
+ - per **initial tests** the summarization quality seems to hold while using _significantly_ less memory! \*
+ - a version of this model quantized to int8 is [already on the hub here](https://huggingface.co/pszemraj/long-t5-tglobal-xl-16384-book-summary-8bit) so if you're using the 8-bit version anyway, you can start there for a 3.5 gb download only!
 
+ First, make sure you have the latest versions of the relevant packages:
 ```bash
- pip install bitsandbytes
- pip install git+https://github.com/huggingface/transformers.git
+ pip install -U transformers bitsandbytes accelerate
 ```
 
- load in 8-bit (_voodoo magic-the good kind-completed by `bitsandbytes` behind the scenes_)
+ load in 8-bit (_magic completed by `bitsandbytes` behind the scenes_)
 
 ```python
 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
@@ -241,9 +240,7 @@ model = AutoModelForSeq2SeqLM.from_pretrained(
 )
 ```
 
- The above is already present in the Colab demo linked at the top of the model map.
-
- Do you like to ask questions? Great. But first, check out the [how LLM.int8 works blog post](https://huggingface.co/blog/hf-bitsandbytes-integration) by huggingface.
+ The above is already present in the Colab demo linked at the top of the model card.
 
 \* More rigorous metrics-based research comparing beam-search summarization with and without LLM.int8 will take place over time.
 
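For reference, since the diff elides the middle of the `from_pretrained(...)` call between the two hunks, here is a minimal sketch of the 8-bit loading pattern the updated README describes. The checkpoint name and the `load_in_8bit` / `device_map` arguments are assumptions based on the standard `transformers` + `bitsandbytes` integration, not a copy of the README's exact code:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# assumed checkpoint name, inferred from the "-8bit" repo linked in the added lines
model_name = "pszemraj/long-t5-tglobal-xl-16384-book-summary"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit=True quantizes the weights with bitsandbytes (LLM.int8) as they load;
# device_map="auto" lets accelerate place layers on the available GPU/CPU
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)

# quick smoke test with beam search (generation settings here are illustrative only)
text = "A long chapter of a book to be summarized would go here."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
summary_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The pre-quantized `-8bit` repo linked in the added lines can presumably be substituted for `model_name` to skip the full-precision download, though that swap is untested in this sketch.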