---
library_name: transformers
tags:
- vit
- cifar10
- image classification
license: apache-2.0
datasets:
- uoft-cs/cifar10
language:
- en
metrics:
- accuracy
- perplexity
pipeline_tag: image-classification
---

## Model Details

### Model Description

An adapter for the [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) ViT, trained on the CIFAR10 classification task.

## Loading guide

```py
# Loading an adapter through transformers requires the `peft` package.
from transformers import AutoModelForImageClassification

# CIFAR10 class names, indexed by label id
labels2title = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# Load the base ViT with a 10-class head and the CIFAR10 label mappings
model = AutoModelForImageClassification.from_pretrained(
    'google/vit-base-patch16-224-in21k',
    num_labels=len(labels2title),
    id2label={i: c for i, c in enumerate(labels2title)},
    label2id={c: i for i, c in enumerate(labels2title)}
)

# Attach the adapter weights from the Hub
model.load_adapter("yturkunov/cifar10_vit16_lora")
```

## Learning curves

![image/png](https://cdn-uploads.huggingface.co/production/uploads/655221be7bd4634260e032ca/Ji1ewA_8T1rJuQkdNCIXQ.png)

### Recommendations to input

The model expects an image that has gone through the following preprocessing stages:

* Scaling range: $[0, 255]\rightarrow[0, 1]$
* Normalization parameters: $\mu=(.5,.5,.5),\sigma=(.5,.5,.5)$
* Dimensions: 224x224
* Number of channels: 3

### Inference on 3x4 random sample

![image/png](https://cdn-uploads.huggingface.co/production/uploads/655221be7bd4634260e032ca/zxj9ID37gJJnkmc8Sl97A.png)
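
### Preprocessing sketch

The preprocessing stages listed in this card (scaling to $[0, 1]$, normalization with $\mu=\sigma=(.5,.5,.5)$, 224x224 size, 3 channels) could be reproduced roughly as follows. This is a minimal sketch, not the pipeline used for training: the `preprocess` helper, the nearest-neighbour resize, and the channel-first output layout are assumptions made for illustration, since the card does not specify the interpolation method or the array layout.

```python
import numpy as np

# Normalization parameters stated in the card
MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)
SIZE = 224  # target height and width


def preprocess(image: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image to a 3x224x224 float32 array.

    Hypothetical helper: nearest-neighbour resizing and the channel-first
    layout are assumptions, not something the card specifies.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB image")
    h, w, _ = image.shape

    # Nearest-neighbour resize: pick a source row/column for each target pixel
    rows = np.arange(SIZE) * h // SIZE
    cols = np.arange(SIZE) * w // SIZE
    resized = image[rows][:, cols]

    # Scale [0, 255] -> [0, 1], then normalize per channel
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - MEAN) / STD

    # HWC -> CHW
    return normalized.transpose(2, 0, 1)


# Example with a random CIFAR10-sized image (32x32x3)
dummy = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
pixel_values = preprocess(dummy)[None]  # add a batch dimension
print(pixel_values.shape)
```

With these parameters every output value lies in $[-1, 1]$. To run inference, the batch would typically be converted with `torch.from_numpy(pixel_values)` and passed to the model as `pixel_values`; a smoother interpolation (for example bilinear) may match the training setup more closely.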