---
language:
  - de
  - en
  - es
  - fr
  - it
  - nl
  - pl
  - pt
  - ru
  - zh
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:5749
  - loss:CoSENTLoss
base_model: ymelka/camembert-cosmetic-finetuned
datasets:
  - PhilipMay/stsb_multi_mt
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: >-
      Nous nous déplaçons "... par rapport au cadre de repos cosmique en
      mouvement ... à environ 371 km/s vers la constellation du Lion".
    sentences:
      - La dame a fait frire la viande panée dans de l'huile chaude.
      - Il n'y a pas d'alambic qui ne soit pas relatif à un autre objet.
      - >-
        Le joueur de basket-ball est sur le point de marquer des points pour son
        équipe.
  - source_sentence: >-
      Le professeur Burkhauser a effectué des recherches approfondies sur les
      personnes qui sont pénalisées par l'augmentation du salaire minimum.
    sentences:
      - Un adolescent parle à une fille par le biais d'une webcam.
      - Une femme est en train de couper des oignons verts.
      - >-
        Les lois sur le salaire minimum nuisent le plus aux personnes les moins
        qualifiées et les moins productives.
  - source_sentence: >-
      Bien que le terme "reine" puisse faire référence à la fois à la reine
      régente (souveraine) ou à la reine consort, le roi a toujours été le
      souverain.
    sentences:
      - Des moutons paissent dans le champ devant une rangée d'arbres.
      - >-
        Il y a une très bonne raison de ne pas appeler le conjoint de la Reine
        "Roi" - parce qu'il n'est pas le Roi.
      - Un groupe de personnes âgées pose autour d'une table à manger.
  - source_sentence: Deux pygargues à tête blanche perchés sur une branche.
    sentences:
      - Un groupe de militaires joue dans un quintette de cuivres.
      - Deux aigles sont perchés sur une branche.
      - Un homme qui joue de la guitare sous la pluie.
  - source_sentence: Un homme joue de la guitare.
    sentences:
      - >-
        Il est possible qu'un système solaire comme le nôtre existe en dehors
        d'une galaxie.
      - Un homme joue de la flûte.
      - Un homme est en train de manger une banane.
pipeline_tag: sentence-similarity
model-index:
  - name: SentenceTransformer based on ymelka/camembert-cosmetic-finetuned
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb fr dev
          type: stsb-fr-dev
        metrics:
          - type: pearson_cosine
            value: 0.6401461834329478
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6661576168424006
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7077411059971963
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7104395816607704
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.6183470655093759
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.6339424060254548
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.18614455072383299
            name: Pearson Dot
          - type: spearman_dot
            value: 0.21677402345623561
            name: Spearman Dot
          - type: pearson_max
            value: 0.7077411059971963
            name: Pearson Max
          - type: spearman_max
            value: 0.7104395816607704
            name: Spearman Max
          - type: pearson_cosine
            value: 0.834390325106948
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8564941342147334
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8518548236293758
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.854193303324745
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8541012365072966
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8555434573522197
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.4989804086580052
            name: Pearson Dot
          - type: spearman_dot
            value: 0.5094008186566353
            name: Spearman Dot
          - type: pearson_max
            value: 0.8541012365072966
            name: Pearson Max
          - type: spearman_max
            value: 0.8564941342147334
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb fr test
          type: stsb-fr-test
        metrics:
          - type: pearson_cosine
            value: 0.7979696368103
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8219240068315988
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8237827107867745
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8221440625680553
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8230384709547542
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8218369251066925
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.4089365107737232
            name: Pearson Dot
          - type: spearman_dot
            value: 0.4588995887587045
            name: Spearman Dot
          - type: pearson_max
            value: 0.8237827107867745
            name: Pearson Max
          - type: spearman_max
            value: 0.8221440625680553
            name: Spearman Max
---

SentenceTransformer based on ymelka/camembert-cosmetic-finetuned

This is a sentence-transformers model finetuned from ymelka/camembert-cosmetic-finetuned on the PhilipMay/stsb_multi_mt dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
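
As a concrete illustration of the paraphrase-mining use case mentioned above, here is a minimal sketch using the library's util.paraphrase_mining helper (the sentence list is hypothetical, loosely echoing the widget examples):

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("ymelka/camembert-cosmetic-similarity")
sentences = [
    "Un homme joue de la guitare.",
    "Un homme est en train de jouer de la guitare.",
    "Une femme est en train de couper des oignons verts.",
]
# paraphrase_mining returns [score, i, j] triples, highest-scoring pairs first
for score, i, j in paraphrase_mining(model, sentences):
    print(f"{score:.3f}  {sentences[i]}  <->  {sentences[j]}")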

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: ymelka/camembert-cosmetic-finetuned
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: PhilipMay/stsb_multi_mt
  • Languages: de, en, es, fr, it, nl, pl, pt, ru, zh

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: CamembertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
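
To sanity-check this architecture against a downloaded copy of the model, a short sketch using standard Sentence Transformers attributes:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ymelka/camembert-cosmetic-similarity")
print(model.max_seq_length)                      # 512, from the Transformer module
print(model.get_sentence_embedding_dimension())  # 768, from the Pooling module
print(model)                                     # prints the module list shown above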

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ymelka/camembert-cosmetic-similarity")
# Run inference
sentences = [
    'Un homme joue de la guitare.',
    'Un homme est en train de manger une banane.',
    'Un homme joue de la flûte.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
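
Beyond pairwise similarity, the same embeddings support semantic search. A minimal sketch with the library's util.semantic_search (query and corpus sentences are illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ymelka/camembert-cosmetic-similarity")
corpus = [
    "Un homme joue de la flûte.",
    "Un homme est en train de manger une banane.",
    "Une femme est en train de couper des oignons verts.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("Un homme joue de la guitare.", convert_to_tensor=True)

# Each hit is a dict with 'corpus_id' and 'score', sorted by decreasing score
for hit in util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")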

Evaluation

Metrics

Semantic Similarity (stsb-fr-dev, evaluated before this fine-tuning run; cf. step 0 in the Training Logs)

Metric             | Value
pearson_cosine     | 0.6401
spearman_cosine    | 0.6662
pearson_manhattan  | 0.7077
spearman_manhattan | 0.7104
pearson_euclidean  | 0.6183
spearman_euclidean | 0.6339
pearson_dot        | 0.1861
spearman_dot       | 0.2168
pearson_max        | 0.7077
spearman_max       | 0.7104

Semantic Similarity (stsb-fr-dev, after training)

Metric             | Value
pearson_cosine     | 0.8344
spearman_cosine    | 0.8565
pearson_manhattan  | 0.8519
spearman_manhattan | 0.8542
pearson_euclidean  | 0.8541
spearman_euclidean | 0.8555
pearson_dot        | 0.4990
spearman_dot       | 0.5094
pearson_max        | 0.8541
spearman_max       | 0.8565

Semantic Similarity (stsb-fr-test)

Metric             | Value
pearson_cosine     | 0.7980
spearman_cosine    | 0.8219
pearson_manhattan  | 0.8238
spearman_manhattan | 0.8221
pearson_euclidean  | 0.8230
spearman_euclidean | 0.8218
pearson_dot        | 0.4089
spearman_dot       | 0.4589
pearson_max        | 0.8238
spearman_max       | 0.8221
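
The test-set numbers above can be approximately reproduced with the library's EmbeddingSimilarityEvaluator. A sketch, assuming the dataset's documented column names (sentence1, sentence2, similarity_score):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("ymelka/camembert-cosmetic-similarity")
test = load_dataset("PhilipMay/stsb_multi_mt", name="fr", split="test")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=test["sentence1"],
    sentences2=test["sentence2"],
    scores=[s / 5.0 for s in test["similarity_score"]],  # rescale 0-5 gold scores to 0-1
    name="stsb-fr-test",
)
# Returns a dict of metrics, e.g. 'stsb-fr-test_spearman_cosine'
print(evaluator(model))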

Training Details

Training Dataset

PhilipMay/stsb_multi_mt

  • Dataset: PhilipMay/stsb_multi_mt at 3acaa3d
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 6, mean 11.1, max 30 tokens
    • sentence2: string; min 6, mean 11.04, max 26 tokens
    • score: float; min 0.0, mean 2.7, max 5.0
  • Samples:
    sentence1 | sentence2 | score
    Un avion est en train de décoller. | Un avion est en train de décoller. | 5.0
    Un homme joue d'une grande flûte. | Un homme joue de la flûte. | 3.8
    Un homme étale du fromage râpé sur une pizza. | Un homme étale du fromage râpé sur une pizza non cuite. | 3.8
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
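
For context, CoSENT (cited under Citation below) is a pairwise ranking objective: for any two sentence pairs whose gold similarity scores are ordered, it pushes the cosine similarity of the higher-scored pair above that of the lower-scored one. With u_i denoting the embedding of sentence i and lambda the scale parameter above, the loss is (in LaTeX):

\mathcal{L} = \log\left(1 + \sum_{\mathrm{sim}(i,j) > \mathrm{sim}(k,l)} e^{\lambda \left( \cos(u_k, u_l) - \cos(u_i, u_j) \right)}\right), \qquad \lambda = 20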
    

Evaluation Dataset

PhilipMay/stsb_multi_mt

  • Dataset: PhilipMay/stsb_multi_mt at 3acaa3d
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min 6, mean 17.45, max 52 tokens
    • sentence2: string; min 6, mean 17.35, max 48 tokens
    • score: float; min 0.0, mean 2.36, max 5.0
  • Samples:
    sentence1 | sentence2 | score
    Un homme avec un casque de sécurité est en train de danser. | Un homme portant un casque de sécurité est en train de danser. | 5.0
    Un jeune enfant monte à cheval. | Un enfant monte à cheval. | 4.75
    Un homme donne une souris à un serpent. | L'homme donne une souris au serpent. | 5.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
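
Taken together, these settings correspond roughly to the training script sketched below. This is a reconstruction from the card, not the author's exact code; in particular, renaming the dataset's similarity_score column to score is an assumption based on the column names listed under Training Dataset.

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("ymelka/camembert-cosmetic-finetuned")

# Column rename is assumed; the trainer picks up a column named "score" as the label
train = load_dataset("PhilipMay/stsb_multi_mt", name="fr", split="train").rename_column("similarity_score", "score")
dev = load_dataset("PhilipMay/stsb_multi_mt", name="fr", split="dev").rename_column("similarity_score", "score")

loss = CoSENTLoss(model)  # defaults match the listed parameters: scale=20.0, pairwise_cos_sim

args = SentenceTransformerTrainingArguments(
    output_dir="camembert-cosmetic-similarity",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    bf16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train,
    eval_dataset=dev,
    loss=loss,
)
trainer.train()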

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  | Step | Training Loss | Validation Loss | stsb-fr-dev spearman_cosine | stsb-fr-test spearman_cosine
0      | 0    | -      | -      | 0.6661 | -
0.2778 | 100  | 4.9452 | 4.4417 | 0.7733 | -
0.5556 | 200  | 4.6670 | 4.4273 | 0.7986 | -
0.8333 | 300  | 4.4904 | 4.3058 | 0.8338 | -
1.1111 | 400  | 4.1679 | 4.2723 | 0.8491 | -
1.3889 | 500  | 4.1380 | 4.3575 | 0.8464 | -
1.6667 | 600  | 4.5737 | 4.3427 | 0.8479 | -
1.9444 | 700  | 4.3086 | 4.4455 | 0.8510 | -
2.2222 | 800  | 3.8711 | 4.4135 | 0.8590 | -
2.5000 | 900  | 4.0640 | 4.4775 | 0.8567 | -
2.7778 | 1000 | 4.2255 | 4.4733 | 0.8565 | -
3.0    | 1080 | -      | -      | -      | 0.8219

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1
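
To recreate this environment, pinning the versions listed above (the torch wheel/CUDA suffix may vary by platform):

pip install sentence-transformers==3.0.1 transformers==4.41.2 torch==2.3.0 accelerate==0.31.0 datasets==2.19.2 tokenizers==0.19.1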

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}