Pleias-Topic-Detection

Pleias-Topic-Detection is an encoder-decoder model specialized for topic detection. Given a document, Pleias-Topic-Detection returns a main topic that can be used for downstream tasks (annotation, embedding indexing).

Pleias-Topic-Detection is a fine-tuned version of t5-small, trained on a set of 70,000 documents and associated topics from Common Corpus. Although t5-small was reportedly trained only on English, the model shows unexpected capacity for multilingual annotation. The final corpus includes a significant amount of text in French, Spanish, Italian, Dutch and German, and the model has been shown to work to some degree in all of these languages.

Because Pleias-Topic-Detection is a relatively lightweight model (about 60 million parameters), it can be used for classification at scale on a large corpus.
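
A minimal sketch of how batch annotation over a corpus might look with the Hugging Face `transformers` library. This is not taken from the model card: the repository id passed to `load_generate_fn` is left for the reader to supply, and feeding raw document text without a task prefix, the 512-token input truncation, and the 32-token output cap are all assumptions.

```python
from typing import Callable, Iterable, Iterator, List

# A function that maps a batch of documents to a batch of topic strings.
GenerateFn = Callable[[List[str]], List[str]]


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def detect_topics(
    documents: Iterable[str], generate_fn: GenerateFn, batch_size: int = 32
) -> List[str]:
    """Return one topic string per document, in input order."""
    topics: List[str] = []
    for batch in batched(documents, batch_size):
        outputs = generate_fn(batch)
        if len(outputs) != len(batch):
            raise RuntimeError("generate_fn must return one output per input")
        topics.extend(text.strip() for text in outputs)
    return topics


def load_generate_fn(
    model_id: str, max_input_tokens: int = 512, max_new_tokens: int = 32
) -> GenerateFn:
    """Build a GenerateFn backed by a seq2seq checkpoint (assumed T5-style)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()

    def generate_fn(texts: List[str]) -> List[str]:
        # Assumption: the raw document text is the input, with no task prefix.
        inputs = tokenizer(
            texts,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_input_tokens,
        )
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.batch_decode(output_ids, skip_special_tokens=True)

    return generate_fn
```

To use it, build the function once with `load_generate_fn(model_id)`, where `model_id` is the repository id of this model, and then call `detect_topics(documents, generate_fn)`. Keeping the model behind a plain callable makes it easy to change the batch size or swap in a GPU-backed variant without touching the batching logic.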

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
  • mixed_precision_training: Native AMP
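
A rough sketch of how the values above might map onto keyword arguments of `transformers.Seq2SeqTrainingArguments`, together with the learning rate that a linear schedule would produce. Several things here are assumptions rather than facts from this card: training on a single device (so the batch sizes are per-device values), "Native AMP" corresponding to `fp16=True`, no warmup steps, no gradient accumulation, and all 70,000 documents being used for training (hence 70,000 optimizer steps for one epoch at batch size 1).

```python
from typing import Any, Dict

# Listed hyperparameters expressed as Seq2SeqTrainingArguments keyword names.
TRAINING_KWARGS: Dict[str, Any] = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 1,  # assumes single-device training
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
    "fp16": True,  # assumption: "Native AMP" means fp16 mixed precision
}


def linear_lr(
    step: int, total_steps: int, base_lr: float = 2e-5, warmup_steps: int = 0
) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 70_000  # assumption: one step per document, one epoch
    for step in (0, total // 2, total):
        print(step, linear_lr(step, total))
```

With `transformers` installed and a GPU that supports fp16, the dictionary could be passed as `Seq2SeqTrainingArguments(output_dir="out", **TRAINING_KWARGS)`, where the `output_dir` value is a placeholder.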

Model size: 60.5M params (Safetensors, tensor type F32)