metadata
language:
  - en
license: apache-2.0
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dataset_size:1K<n<10K
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
widget:
  - source_sentence: Our effective tax rate for 2023 was 18%.
    sentences:
      - What was the effective tax rate in fiscal 2023?
      - What are some key goals of the corporation related to climate change?
      - In which item is Note 10, discussing Legal Proceedings, included?
  - source_sentence: What kind of services does Equifax provide?
    sentences:
      - What is the primary business of Equifax Inc.?
      - What new production locations and vehicle models were active in 2023?
      - >-
        How much did AbbVie's gross margin percentage decrease in 2023 compared
        to 2022?
  - source_sentence: What was the effective tax rate in 2023?
    sentences:
      - What was the effective tax rate for fiscal year 2023?
      - How long do Enterprise Agreements last and who are they designed for?
      - What was Ellen Copaken's professional role prior to joining AMC?
  - source_sentence: What former roles has Indra K. Nooyi held?
    sentences:
      - Indra K. Nooyi | 68 | Former Chair and CEO, PepsiCo, Inc.
      - What is the valuation allowance of the company as of January 31, 2023?
      - What was the effective tax rate for fiscal 2023?
  - source_sentence: The net earnings margin in 2023 was 6.0%.
    sentences:
      - What was the net earnings margin in 2023?
      - What caused the slight decline in Workforce Solutions revenue in 2023?
      - >-
        What does it mean when an item is 'incorporated by reference' in a
        document?
pipeline_tag: sentence-similarity
model-index:
  - name: BGE base Financial Matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.7257142857142858
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8514285714285714
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8828571428571429
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9142857142857143
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7257142857142858
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.28380952380952373
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.17657142857142857
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09142857142857141
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7257142857142858
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8514285714285714
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8828571428571429
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9142857142857143
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8232947560533131
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7937823129251699
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7965741135480359
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.7257142857142858
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8542857142857143
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8757142857142857
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.91
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7257142857142858
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.28476190476190477
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.17514285714285713
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09099999999999998
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7257142857142858
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8542857142857143
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8757142857142857
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.91
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8215329948771338
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7927670068027208
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7959270152786184
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.71
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.85
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8671428571428571
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9085714285714286
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.71
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2833333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1734285714285714
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09085714285714284
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.71
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.85
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8671428571428571
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9085714285714286
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8139428654682047
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7832817460317458
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7863373038655584
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.6814285714285714
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8157142857142857
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8585714285714285
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8942857142857142
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6814285714285714
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2719047619047619
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1717142857142857
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08942857142857143
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6814285714285714
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8157142857142857
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8585714285714285
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8942857142857142
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7914768113496716
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7581626984126983
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7616459239835561
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.66
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.78
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8071428571428572
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.87
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.66
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.26
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.16142857142857142
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.087
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.66
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.78
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8071428571428572
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.87
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.763736298979858
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7301014739229026
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7342830326633573
            name: Cosine Map@100

BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
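
Functionally, this stack runs the BERT encoder, takes the [CLS] token embedding, and L2-normalizes it. A minimal sketch of the same pipeline with the plain transformers API (assuming the checkpoint loads via AutoModel, as sentence-transformers checkpoints normally do):

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "MugheesAwan11/bge-base-financial-matryoshka"
tokenizer = AutoTokenizer.from_pretrained(model_id)
bert = AutoModel.from_pretrained(model_id)

batch = tokenizer(["What was the effective tax rate in 2023?"],
                  padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**batch).last_hidden_state      # (0) Transformer
cls_embedding = token_embeddings[:, 0]                      # (1) Pooling: CLS token
sentence_embedding = F.normalize(cls_embedding, p=2, dim=1) # (2) Normalize
print(sentence_embedding.shape)  # torch.Size([1, 768])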

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("MugheesAwan11/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'The net earnings margin in 2023 was 6.0%.',
    'What was the net earnings margin in 2023?',
    'What caused the slight decline in Workforce Solutions revenue in 2023?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
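
Because the model was trained with MatryoshkaLoss, its embeddings can also be truncated to 512, 256, 128, or 64 dimensions at a modest cost in retrieval quality (see the metrics below). A minimal sketch, assuming sentence-transformers v2.7 or newer, which added the truncate_dim argument:

from sentence_transformers import SentenceTransformer

# Keep only the first 256 dimensions of every embedding.
model = SentenceTransformer("MugheesAwan11/bge-base-financial-matryoshka", truncate_dim=256)
embeddings = model.encode([
    "The net earnings margin in 2023 was 6.0%.",
    "What was the net earnings margin in 2023?",
])
print(embeddings.shape)
# (2, 256)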

Evaluation

Metrics

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.7257
cosine_accuracy@3 0.8514
cosine_accuracy@5 0.8829
cosine_accuracy@10 0.9143
cosine_precision@1 0.7257
cosine_precision@3 0.2838
cosine_precision@5 0.1766
cosine_precision@10 0.0914
cosine_recall@1 0.7257
cosine_recall@3 0.8514
cosine_recall@5 0.8829
cosine_recall@10 0.9143
cosine_ndcg@10 0.8233
cosine_mrr@10 0.7938
cosine_map@100 0.7966

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.7257
cosine_accuracy@3 0.8543
cosine_accuracy@5 0.8757
cosine_accuracy@10 0.91
cosine_precision@1 0.7257
cosine_precision@3 0.2848
cosine_precision@5 0.1751
cosine_precision@10 0.091
cosine_recall@1 0.7257
cosine_recall@3 0.8543
cosine_recall@5 0.8757
cosine_recall@10 0.91
cosine_ndcg@10 0.8215
cosine_mrr@10 0.7928
cosine_map@100 0.7959

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.71
cosine_accuracy@3 0.85
cosine_accuracy@5 0.8671
cosine_accuracy@10 0.9086
cosine_precision@1 0.71
cosine_precision@3 0.2833
cosine_precision@5 0.1734
cosine_precision@10 0.0909
cosine_recall@1 0.71
cosine_recall@3 0.85
cosine_recall@5 0.8671
cosine_recall@10 0.9086
cosine_ndcg@10 0.8139
cosine_mrr@10 0.7833
cosine_map@100 0.7863

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.6814
cosine_accuracy@3 0.8157
cosine_accuracy@5 0.8586
cosine_accuracy@10 0.8943
cosine_precision@1 0.6814
cosine_precision@3 0.2719
cosine_precision@5 0.1717
cosine_precision@10 0.0894
cosine_recall@1 0.6814
cosine_recall@3 0.8157
cosine_recall@5 0.8586
cosine_recall@10 0.8943
cosine_ndcg@10 0.7915
cosine_mrr@10 0.7582
cosine_map@100 0.7616

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.66
cosine_accuracy@3 0.78
cosine_accuracy@5 0.8071
cosine_accuracy@10 0.87
cosine_precision@1 0.66
cosine_precision@3 0.26
cosine_precision@5 0.1614
cosine_precision@10 0.087
cosine_recall@1 0.66
cosine_recall@3 0.78
cosine_recall@5 0.8071
cosine_recall@10 0.87
cosine_ndcg@10 0.7637
cosine_mrr@10 0.7301
cosine_map@100 0.7343
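
Each of the five tables above corresponds to one Matryoshka dimension (768, 512, 256, 128, 64). A sketch of how such numbers can be reproduced with InformationRetrievalEvaluator, assuming Sentence Transformers 3.x; the queries, corpus, and relevant_docs below are hypothetical toy data, not the card's held-out evaluation split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("MugheesAwan11/bge-base-financial-matryoshka")

# Hypothetical toy data; replace with the real held-out split.
queries = {"q1": "What was the net earnings margin in 2023?"}
corpus = {
    "d1": "The net earnings margin in 2023 was 6.0%.",
    "d2": "Our effective tax rate for 2023 was 18%.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries, corpus, relevant_docs,
    name="dim_256",
    truncate_dim=256,  # evaluate at one Matryoshka dimension
)
results = evaluator(model)  # dict of accuracy@k, precision@k, recall@k, NDCG@10, MRR@10, MAP@100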

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,300 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
                 positive                anchor
    type         string                  string
    details      min: 6 tokens           min: 8 tokens
                 mean: 46.61 tokens      mean: 20.58 tokens
                 max: 289 tokens         max: 45 tokens
  • Samples:
    positive: Insurance Medical Membership at December 31, 2020 for Florida includes Individual Medicare Advantage (851.3 thousand), Group Medicare Advantage (9.1 thousand), Medicare stand-alone PDP (131.9 thousand), Medicare Supplement (17.5 thousand), State-based contracts and Other (656.6 thousand), Fully-insured commercial Group (73.8 thousand), ASO (24.5 thousand), totaling 1,764.7 thousand members.
    anchor: How is Florida's total insurance medical membership detailed in the data for December 31, 2023?

    positive: For the year ended December 31, 2023, the total provision for income taxes was $836 million, which includes both current and deferred tax amounts.
    anchor: What was the total provision for income taxes at the end of 2023?

    positive: Pursuant to the IRA, under Sections 48, 48E and 25D of the Internal Revenue Code (“IRC”), standalone energy storage technology is eligible for a tax credit between 6% and 50% of qualified expenditures, regardless of the source of energy, which may be claimed by our customers for storage systems they purchase or by us for arrangements where we own the systems.
    anchor: Under what sections of the Internal Revenue Code can standalone energy storage technology receive a tax credit?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
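
In code, this configuration wraps MultipleNegativesRankingLoss in MatryoshkaLoss so the in-batch ranking objective is applied at every output dimension with equal weight. A minimal sketch of the setup:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

inner_loss = MultipleNegativesRankingLoss(model)  # in-batch negatives over (anchor, positive) pairs
train_loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],  # weights default to 1 per dimension
)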
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
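
These values map directly onto SentenceTransformerTrainingArguments (Sentence Transformers 3.x). A minimal sketch; output_dir and save_strategy are assumptions not listed above:

from sentence_transformers.training_args import SentenceTransformerTrainingArguments, BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # assumption
    num_train_epochs=2,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: checkpoints are needed for load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)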

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step  Training Loss  dim_128_cosine_map@100  dim_256_cosine_map@100  dim_512_cosine_map@100  dim_64_cosine_map@100  dim_768_cosine_map@100
0.8122   10    1.4587         -                       -                       -                       -                      -
0.9746   12    -              0.7544                  0.7722                  0.7809                  0.7118                 0.7804
1.6244   20    0.6938         -                       -                       -                       -                      -
1.9492   24    -              0.7586                  0.7790                  0.7876                  0.7197                 0.7850
0.8122   10    0.5238         -                       -                       -                       -                      -
0.9746   12    -              0.7602                  0.7815                  0.7928                  0.7285                 0.7942
1.6244   20    0.4172         -                       -                       -                       -                      -
1.9492*  24    -              0.7616                  0.7863                  0.7959                  0.7343                 0.7966

  • The row marked with * denotes the saved checkpoint; its map@100 values match the metrics reported above.

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}