pszemraj/opt-peter-2.7B

Open In Colab

This model is a fine-tuned version of facebook/opt-2.7b on about 80k WhatsApp/text messages (mine). Please use responsibly :)

Test it out on Google Colab by clicking the button above.

Model description

  • An exploration of how well OPT handles dialogue/conversational applications.
  • It seems to do a lot better than GPT-Neo with similar training parameters.
  • You can create your own digital clone and deploy it using this repository I am working on.

Sharded checkpoint

Because the model file is 10+ GB, it can be difficult to load in low-RAM runtimes or to download over slow connections. To help with this, a sharded checkpoint of this model is available here.

The pszemraj/opt-peter-2.7B-sharded model can be used as a drop-in replacement for this one for all use cases.
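
As a rough sanity check on the file size, the sketch below estimates the weight footprint at two precisions. It assumes a nominal 2.7 billion parameters (taken from the model name, not measured) and that weights dominate the file; the 10+ GB figure is consistent with 32-bit weights under those assumptions.

```python
def checkpoint_size_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumption: nominal parameter count implied by the "2.7B" in the model name.
N_PARAMS = 2.7e9

fp32_gb = checkpoint_size_gb(N_PARAMS, 4)  # 4 bytes per 32-bit float
fp16_gb = checkpoint_size_gb(N_PARAMS, 2)  # 2 bytes per 16-bit float

print(f"fp32: {fp32_gb:.1f} GB, fp16: {fp16_gb:.1f} GB")
# prints "fp32: 10.8 GB, fp16: 5.4 GB"
```

The estimate ignores activations and any framework overhead, so actual RAM use while loading and generating will be higher.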

Intended uses & limitations

The base model has a custom license that propagates to this one. Most importantly, it cannot be used commercially. Read more here: facebook/opt-2.7b

  • The model is probably too large to use via the API here. Use it in Python with more than 12 GB of GPU RAM or CPU RAM, for example via the Colab notebook linked above.
    • Alternatively, you can message a bot on Telegram where I test LLMs for dialogue generation.
  • Any statements or claims made by this model do not reflect actual claims or statements by me. Keep in mind that it is a fine-tuned version of the base model on my data, so things from pre-training also show up in outputs.
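
For local use in Python, a minimal sketch with the standard `transformers` causal-LM classes might look like the following. The helper name `generate_reply`, the sampling settings, and the half-precision-on-GPU choice are illustrative assumptions, not the settings used in the Colab notebook, and the prompt format the model expects is not specified here.

```python
MODEL_ID = "pszemraj/opt-peter-2.7B"  # the sharded repo ID can be passed instead


def generate_reply(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 64) -> str:
    """Load the model and sample a continuation of `prompt` (hypothetical helper)."""
    # Heavy dependencies are imported here so that defining the function stays cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    use_gpu = torch.cuda.is_available()
    device = "cuda" if use_gpu else "cpu"
    # Assumption: half precision on GPU to reduce memory, full precision on CPU.
    dtype = torch.float16 if use_gpu else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,  # illustrative sampling settings
            top_p=0.95,
        )
    # The decoded text contains the prompt followed by the generated continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_reply("...")` downloads the full checkpoint on first use and reloads the model on every call, so for repeated use it would make sense to load the tokenizer and model once and reuse them.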

Training and evaluation data

WhatsApp & iMessage data were parsed using ai-msgbot and then fed as a text dataset to the HF trainer.

Training procedure

Training hyperparameters

SESSION ONE

The following hyperparameters were used during training:

  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 3
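
To illustrate what a cosine scheduler with a warmup ratio does to the learning rate, here is a small pure-Python sketch of the usual shape: linear warmup to the peak, then cosine decay to zero. The function name and the total of 1000 steps are hypothetical (the real step count is not reported here); the peak rate and warmup ratio are the session-one values.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 4e-5,
              warmup_ratio: float = 0.01) -> float:
    """Learning rate at `step` under linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; chosen only to make the shape easy to inspect
schedule = [cosine_lr(s, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
```

With these numbers the rate starts at zero, reaches 4e-05 after the warmup phase, and decays back to zero by the final step.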

SESSION TWO

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 4
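
The two sessions can be written down as keyword arguments in the style of `transformers.TrainingArguments`, which also makes the batch-size arithmetic easy to check. This is a sketch, assuming `train_batch_size` above is the per-device value; the dictionaries and the `effective_batch_size` helper are illustrative, not the original training script.

```python
# Values shared by both sessions, using TrainingArguments parameter names.
COMMON = {
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
}

SESSION_ONE = {
    **COMMON,
    "learning_rate": 4e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_accumulation_steps": 16,
    "warmup_ratio": 0.01,
    "num_train_epochs": 3,
}

SESSION_TWO = {
    **COMMON,
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "gradient_accumulation_steps": 4,
    "warmup_ratio": 0.05,
    "num_train_epochs": 4,
}


def effective_batch_size(args: dict, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step: per-device batch x accumulation x devices."""
    return (args["per_device_train_batch_size"]
            * args["gradient_accumulation_steps"]
            * num_devices)


for name, args in (("one", SESSION_ONE), ("two", SESSION_TWO)):
    print(f"session {name}: {effective_batch_size(args)}")
# prints "session one: 128" then "session two: 64"
```

The reported `total_train_batch_size` values (128 and 64) match the per-device batch size times the accumulation steps with no further multiplier.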

Framework versions

  • Transformers 4.19.2
  • PyTorch 1.10.0+cu113
  • Datasets 2.2.2
  • Tokenizers 0.12.1