---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B
tags:
  - generated_from_trainer
  - qwen
  - GGUF
  - worldmodel
  - worldbuilding
model-index:
  - name: capybara_finetuned_results3
    results: []
datasets:
  - archit11/worldbuilding
---

# capybara_finetuned_results3

This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on the [archit11/worldbuilding](https://huggingface.co/datasets/archit11/worldbuilding) dataset. It achieves the following results on the evaluation set:

- Loss: 5.6542
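
For quick testing, here is a minimal inference sketch with 🤗 Transformers. The repo id `archit11/qwen_worldmodel` is assumed from this card's path, and the prompt is only illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from this card's path; adjust if the model lives elsewhere.
model_id = "archit11/qwen_worldmodel"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative worldbuilding-style prompt.
prompt = "Describe a desert city built around a single ancient well."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```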

Video demo: (it's pretty bad)

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):

- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 5
- training_steps: 800
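
For reference, a hedged sketch of how these values could map onto `transformers.TrainingArguments`. The output directory is a placeholder, and the reported Adam betas/epsilon match the library defaults, so they are not set explicitly:

```python
from transformers import TrainingArguments

# Sketch only: output_dir is a placeholder; model/dataset wiring is omitted.
training_args = TrainingArguments(
    output_dir="capybara_finetuned_results3",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 1 x 4 = 4
    lr_scheduler_type="cosine",
    warmup_steps=5,
    max_steps=800,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the library defaults.
)
```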

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 15.5311       | 0.0230 | 50   | 14.5422         |
| 8.7477        | 0.0460 | 100  | 9.2952          |
| 7.3554        | 0.0690 | 150  | 7.1992          |
| 6.828         | 0.0920 | 200  | 6.7258          |
| 6.4694        | 0.1150 | 250  | 6.3597          |
| 6.3401        | 0.1381 | 300  | 6.1703          |
| 6.1256        | 0.1611 | 350  | 6.0395          |
| 6.0372        | 0.1841 | 400  | 5.9271          |
| 6.0221        | 0.2071 | 450  | 5.8464          |
| 5.8783        | 0.2301 | 500  | 5.7810          |
| 5.8339        | 0.2531 | 550  | 5.7335          |
| 5.8546        | 0.2761 | 600  | 5.6904          |
| 5.9169        | 0.2991 | 650  | 5.6690          |
| 5.7959        | 0.3221 | 700  | 5.6565          |
| 5.7271        | 0.3451 | 750  | 5.6543          |
| 5.8734        | 0.3682 | 800  | 5.6542          |

### Framework versions

- Transformers 4.44.2
- PyTorch 2.4.0
- Datasets 3.0.0
- Tokenizers 0.19.1