Training an Object Detection Model with AutoTrain

Community Article · Published June 5, 2024

Object detection is an essential task in computer vision, enabling models to identify, localize, and classify objects within images. AutoTrain simplifies this process, allowing you to train a state-of-the-art object detection model with ease. In this blog post, we'll walk through how to prepare your data, configure your training parameters, and use both the command-line interface (CLI) and the user interface (UI) to train an effective object detection model, either locally or on Hugging Face's cloud.


Preparing Your Data

Before training your model, you need to organize your images and create a metadata file. Follow these guidelines:

Data Preparation for the UI

  1. Create a Zip Archive: Gather your images and a metadata.jsonl file into a single zip file. Your file structure should look like this:

    Archive.zip
    ├── 0001.png
    ├── 0002.png
    ├── 0003.png
    ├── ...
    └── metadata.jsonl
    
  2. Prepare the Metadata: The metadata.jsonl file contains information about each image, including the bounding boxes and categories of objects. Here's an example:

    {"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "category": [0]}}
    {"file_name": "0002.png", "objects": {"bbox": [[810.0, 100.0, 57.0, 28.0]], "category": [1]}}
    {"file_name": "0003.png", "objects": {"bbox": [[160.0, 31.0, 248.0, 616.0], [741.0, 68.0, 202.0, 401.0]], "category": [2, 2]}}
    

Ensure the bounding boxes are in COCO format [x, y, width, height]; a short snippet for generating this file follows below.
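
As a minimal sketch, here is how metadata.jsonl can be generated from in-memory annotations using only Python's standard library (the records are the hypothetical examples from above):

import json

# One record per image: COCO-format [x, y, width, height] boxes
# plus an integer category id for each box.
annotations = [
    {"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "category": [0]}},
    {"file_name": "0002.png", "objects": {"bbox": [[810.0, 100.0, 57.0, 28.0]], "category": [1]}},
]

# metadata.jsonl holds exactly one JSON object per line.
with open("metadata.jsonl", "w") as f:
    for record in annotations:
        f.write(json.dumps(record) + "\n")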

Data Preparation for the CLI

If you're not using the UI, you can organize your data in folders instead:

  1. Create Training and Validation Folders: Organize your images and metadata.jsonl into separate folders for training and validation.

    training/
    ├── 0001.png
    ├── 0002.png
    ├── 0003.png
    ├── ...
    └── metadata.jsonl
    
    validation/
    ├── 0004.png
    ├── 0005.png
    ├── ...
    └── metadata.jsonl
    
  2. Prepare the Metadata: Similar to the UI method, the metadata.jsonl file should contain bounding box and category information.

Image Requirements

  • Format: All images must be in JPEG, JPG, or PNG format.
  • Quantity: Include at least 5 images to provide sufficient examples for learning.
  • Exclusivity: The zip file should only contain images and the metadata.jsonl file. No additional files or nested folders should be included.

When train.zip is decompressed, it should create no folders: only images and the metadata.jsonl file.
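
For example, assuming your images and metadata.jsonl sit in a folder named data (the folder name is an assumption here), the -j flag of zip drops the directory path so the archive stays flat:

$ zip -j train.zip data/*.png data/metadata.jsonl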

NOTE: You can also use a dataset from the Hugging Face Hub; this is discussed later in this blog post.

Configuring Training Parameters

AutoTrain offers various parameters to customize your training process. Here are the key parameters you can configure; a short example after these lists shows how they appear in a config file.

Basic Parameters

  • --image-square-size: Resize input images to a square shape with the specified size (default is 600).
  • --batch-size: Set the training batch size.
  • --seed: Random seed for reproducibility.
  • --epochs: Number of training epochs.
  • --gradient_accumulation: Number of gradient accumulation steps.
  • --disable_gradient_checkpointing: Disable gradient checkpointing.
  • --lr: Learning rate.
  • --log: Experiment tracking options (none, wandb, tensorboard).

Advanced Parameters

  • --image-column: Specify the image column to use.
  • --target-column: Specify the target column to use.
  • --warmup-ratio: Proportion of training for a linear warmup (default is 0.1).
  • --optimizer: Choose the optimizer algorithm (adamw_torch by default).
  • --scheduler: Select the learning rate scheduler (linear by default, cosine is another option).
  • --weight-decay: Set the weight decay rate (default is 0.0).
  • --max-grad-norm: Maximum norm of the gradients for gradient clipping (default is 1.0).
  • --logging-steps: Determine the frequency of logging training progress (default is -1 for automatic determination).
  • --evaluation-strategy: Specify the evaluation frequency (no, steps, epoch).
  • --save-total-limit: Limit the number of model checkpoints to save.
  • --auto-find-batch-size: Automatically determine the batch size based on hardware capabilities.
  • --mixed-precision: Choose precision mode (fp16, bf16, or None).
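
When training is driven by a config file, as in the next section, flags written with hyphens on the command line appear as underscore-separated keys under params. For example, a subset of the flags above maps onto entries like:

params:
  image_square_size: 600
  batch_size: 8
  lr: 5e-5
  weight_decay: 1e-4
  mixed_precision: fp16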

Using the CLI for Training

To train your object detection model using the CLI, create a configuration file and run the autotrain command. Below is an example configuration file for training on the CPPE-5 dataset from the Hugging Face Hub.


Sample Configuration File

task: object_detection
base_model: facebook/detr-resnet-50
project_name: autotrain-obj-det-cppe5-2
log: tensorboard
backend: local

data:
  path: cppe-5
  train_split: train
  valid_split: test
  column_mapping:
    image_column: image
    objects_column: objects

params:
  image_square_size: 600
  epochs: 100
  batch_size: 8
  lr: 5e-5
  weight_decay: 1e-4
  optimizer: adamw_torch
  scheduler: linear
  gradient_accumulation: 1
  mixed_precision: fp16
  early_stopping_patience: 50
  early_stopping_threshold: 0.001

hub:
  username: ${HF_USERNAME}
  token: ${HF_TOKEN}
  push_to_hub: true
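
Before launching a long run, it can help to verify that the Hub dataset actually exposes the columns referenced in column_mapping. A minimal sketch using the datasets library (not part of AutoTrain itself):

from datasets import load_dataset

# Load the CPPE-5 dataset used in the config above and inspect one example.
ds = load_dataset("cppe-5", split="train")
example = ds[0]
print(example["objects"]["bbox"])      # COCO-format [x, y, width, height] boxes
print(example["objects"]["category"])  # integer class labels, one per box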

Running the Training

To start training, use the following command:

$ export HF_USERNAME=your_hugging_face_username
$ export HF_TOKEN=your_hugging_face_write_token

$ autotrain --config configfile.yml

This command will use the configuration specified in configfile.yml to train your object detection model.

Note: You only need to export your username and token if you have set push_to_hub to true.

Some datasets on the Hugging Face Hub contain multiple configs. In those cases, you can use dataset_config:split_name for train_split and valid_split. For example, the keremberke/license-plate-object-detection dataset has two configs, full and mini.

The corresponding changes to the config file are:

data:
  path: keremberke/license-plate-object-detection
  train_split: full:train
  valid_split: full:validation
  column_mapping:
    image_column: image
    objects_column: objects
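
If you're unsure which configs a dataset exposes, the datasets library can list them (depending on your datasets version, script-based datasets may additionally require trust_remote_code=True when loading):

from datasets import get_dataset_config_names

# List the available configs for the dataset used above.
print(get_dataset_config_names("keremberke/license-plate-object-detection"))
# e.g. ['full', 'mini']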

If your dataset is stored locally, update the data section of the config YAML as follows:

data:
  path: /path/to/data/folder/
  train_split: train # this folder contains images and metadata.jsonl
  valid_split: val # this folder contains images and metadata.jsonl, optional, can be set to null
  column_mapping:
    image_column: image
    objects_column: objects
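
A quick sanity check that each split folder is self-contained can save a failed run. A minimal sketch, assuming the hypothetical folder layout above:

import json
from pathlib import Path

folder = Path("/path/to/data/folder/train")  # hypothetical local split folder

images = [p for p in folder.iterdir() if p.suffix.lower() in {".png", ".jpg", ".jpeg"}]
records = [json.loads(line) for line in (folder / "metadata.jsonl").open()]

listed = {r["file_name"] for r in records}
missing = [p.name for p in images if p.name not in listed]
print(f"{len(images)} images, {len(records)} metadata rows, {len(missing)} images without metadata")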

Using the UI for Training

Locally, you can start AutoTrain UI by running:

$ pip install -U autotrain-advanced

$ autotrain app --host 127.0.0.1 --port 8000

The app will start at http://127.0.0.1:8000.


The data format for uploading in the UI is the same as described above for zip files.

If you don't have suitable hardware locally, you can also start the UI on Hugging Face Spaces; read more in the AutoTrain docs.

When using a dataset from the Hub, you must map the columns correctly. When using a local dataset (folder or zip), leave the column mapping as it is.

Conclusion

AutoTrain simplifies the complex task of training object detection models, enabling you to focus on fine-tuning your model for optimal performance. By following these guidelines and utilizing the available parameters, you can create an effective object detection model tailored to your specific needs. Whether using the UI or CLI, AutoTrain provides a streamlined process for building powerful object detection models.

P.S.: All models trained using AutoTrain are ready for deployment using the Inference API and Inference Endpoints.
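
For example, once your model has been pushed to the Hub, a hedged sketch of querying it through the huggingface_hub client looks like this (the repo id is a placeholder for your own trained model):

from huggingface_hub import InferenceClient

# Reads your HF token from the environment (HF_TOKEN) or cached login.
client = InferenceClient()

# Placeholder repo id: replace with the model your AutoTrain run pushed.
detections = client.object_detection(
    "example.jpg",
    model="your-username/autotrain-obj-det-cppe5-2",
)
for d in detections:
    print(d.label, d.score, d.box)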

If you run into issues or have feature requests, check out the GitHub repository.

Happy Training! :)