metadata
base_model: BAAI/bge-base-en-v1.5
datasets: []
language:
  - en
library_name: sentence-transformers
license: apache-2.0
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:7872
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      personal information within 45 days. If personal information was sold,
      organizations must also identify and inform the consumer of the sources of
      information, its collection purpose, and the categories of third parties
      to whom the data was sold to. As per the CCPA, the following information
      must be provided in an access request: The categories of personal
      information the business has collected about the consumer in the preceding
      12 months. For each category identified, the categories of third parties
      to whom it disclosed that particular category of personal information. The
      categories of sources from which the personal information was collected.
      The business or commercial purpose for which it collected or sold the
      personal information. The categories of third parties with whom the
      business shares consumers’ Personal Information. The right to access is
      one of the toughest articles for businesses to comply with because
      organizations need to track the location of every consumer’s personal
      information in all on-premises and multicloud data systems.
    sentences:
      - >-
        What are the UCPA requirements for organizations regarding personal data
        handling, including pseudonymous and sensitive data, and data transfer
        to third parties in certain circumstances?
      - >-
        What are the benefits of implementing CCPA for businesses in terms of
        reducing costs, liabilities, and human effort while ensuring effortless
        compliance?
      - >-
        What information must organizations provide regarding the categories of
        third parties in relation to personal information under the CCPA?
  - source_sentence: >-
      on businesses that meet these criteria, regardless of their physical
      presence in Colorado. Colorado is a one-party consent state for recording
      conversations. This means that as long as one participant in the
      conversation consents to the recording, it is generally legal. However,
      it's important to understand and adhere to the specific legal requirements
      and limitations. ## Join Our Newsletter Get all the latest information,
      law updates and more delivered to your inbox ### Share Copy 41 ### More
      Stories that May Interest You View More September 21, 2023 ## Navigating
      Generative AI Privacy Challenges & Safeguarding Tips Introduction The
      emergence of Generative AI has ushered in a new era of innovation in the
      ever-evolving technological landscape that pushes the boundaries of...
      View More September 15, 2023 ## Right of Access to Personal Data: What To
      Know The wealth of data available
    sentences:
      - What solutions does Oracle offer for data security and governance?
      - >-
        What are the legal requirements for recording conversations in Colorado,
        considering consent laws and data protection regulations?
      - What are the key components of the NVIDIA computing platform?
  - source_sentence: >-
      such personal data have been collected or where such collected personal
      data are beyond the extent required, discriminatory, unfair or illegal.
      ### Right to Erasure Data subjects can request omission or erasure of the
      personal data upon cessation of the purpose for which the processing has
      been conducted, or where all justifications for maintaining such personal
      data by the organization cease to exist. ## Facts related to Qatar DPL 1
      The DPL incorporates concepts familiar from other international privacy
      frameworks to protect a consumer's personal data. 2 Under the DPL, a data
      controller is responsible for identifying all parties who process personal
      data on its behalf. 3 In Qatar, the Compliance and Data Protection
      department (the “CDP”)at MoTC is responsible for the enforcement of the
      DPL. . 4 The MoTC can also impose fines of up to QAR 5 million (US$1.4
      million)
    sentences:
      - >-
        What is Securiti's mission regarding data protection laws and
        regulations?
      - >-
        What is the role of the Nominating and Corporate Governance Committee at
        NVIDIA?
      - >-
        What is the right to erasure and how does it apply to personal data in
        Qatar under the DPL?
  - source_sentence: >-
      . It allows you to identify gaps in compliance and address the risks.
      Seamlessly expand assessment capabilities across your vendor ecosystem to
      maintain compliance against LPPD requirements. ## Map data flows Track
      data flows in your organizations by having a centralized catalogue of
      internal data process flows as well as flows for data transfer to service
      providers and other third parties. ## Manage vendor risk Articles: 8, 9,
      12 Track, manage and monitor privacy and security readiness for all your
      service providers from a single interface. Collaborate instantly with
      vendors, automate data requests, and manage all vendor contracts and
      compliance documents. ## Breach Response Notification Article: 12(5), Data
      Protection Board Decision 2019/10 Automates compliance actions and breach
      notifications to concerned stakeholders in relation to security incidents
      by leveraging a knowledge database on security incident diagnosis and
      response. ## Key data subject rights encoded within LPPD Access: Data
      subjects have the right to access, , and privacy impact assessment system,
      you can gauge your organization's posture against Qatar DPL requirements,
      identify the gaps, and address the risks. Seamlessly being able to expand
      assessment capabilities across your vendor ecosystem to maintain
      compliance against Qatar DPL requirements. ## Map data flows Articles: 23,
      24, 25 Track data flows in your organizations, trace this data, catalog,
      transfer, and document business process flows internally and to service
      providers or third parties. ## Manage vendor risk Articles: 15, 12 Keep
      track of privacy and security readiness for all your service providers
      from a single interface. Collaborate instantly with vendors, automate data
      requests and deletions, and manage all vendor contracts and compliance
      documents. ## Breach Response Notification Articles: 11(5), 14 Automates
      compliance actions and breach notifications to concerned stakeholders in
      relation to security incidents by leveraging a knowledge database on
      security incident diagnosis and response.
    sentences:
      - >-
        What is the purpose of a centralized catalogue in managing data flows,
        vendor risk, and compliance with LPPD and Qatar DPL requirements?
      - >-
        What are the security requirements for data handlers according to
        Spain's Data Protection Law?
      - What are some key rights granted to data subjects under Bahrain PDPL?
  - source_sentence: >-
      office of the ​​Federal Commissioner for Data Protection and Freedom of
      Information, with its headquarters in the city of Bonn. It is led by a
      Federal Commissioner, elected via a vote by the German Bundestag.
      Eligibility criteria include being at least 35 years old, appropriate
      qualifications in the field of data protection law gained through relevant
      professional experience. The Commissioner's term is for five years, which
      can be extended once. The Commissioner has the responsibility to act as
      the primary office responsible for enforcing the Federal Data Protection
      Act within Germany. Some of the office's key responsibilities include:
      Advising the Bundestag, the Bundesrat, and the Federal Government on
      administrative and legislative measures related to data protection within
      the country; To oversee and implement both the GDPR and Federal Data
      Protection Act within Germany; To promote awareness within the public
      related to the risks, rules, safeguards, and rights concerning the
      processing of personal data; To handle all,  within Germany. It
      supplements and aligns with the requirements of the EU GDPR. Yes, Germany
      is covered by GDPR (General Data Protection Regulation). GDPR is a
      regulation that applies uniformly across all EU member states, including
      Germany. The Federal Data Protection Act established the office of the
      ​​Federal Commissioner for Data Protection and Freedom of Information,
      with its headquarters in the city of Bonn. It is led by a Federal
      Commissioner, elected via a vote by the German Bundestag. Germany's
      interpretation is the Bundesdatenschutzgesetz (BDSG), the German Federal
      Data Protection Act. It mirrors the GDPR in all key areas while giving
      local German regulatory authorities the power to enforce it more
      efficiently nationally. ## Join Our Newsletter Get all the latest
      information, law updates and more delivered to your inbox ### Share Copy
      14 ### More Stories that May Interest You View More
    sentences:
      - What is the collection and use of personal information by businesses?
      - >-
        How does Data Mapping Automation optimize data governance and enable
        data security and protection?
      - >-
        What are the main responsibilities of the Federal Commissioner for Data
        Protection and Freedom of Information in enforcing data protection laws
        in Germany, including the GDPR and the Federal Data Protection Act?
model-index:
  - name: SentenceTransformer based on BAAI/bge-base-en-v1.5
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.6907216494845361
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8865979381443299
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9381443298969072
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9690721649484536
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6907216494845361
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.29553264604810997
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.18762886597938144
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09690721649484535
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6907216494845361
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8865979381443299
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9381443298969072
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9690721649484536
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8386189701330025
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7955735558828344
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7967787552384278
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.6907216494845361
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8762886597938144
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9278350515463918
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9690721649484536
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6907216494845361
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2920962199312715
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.18556701030927836
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09690721649484535
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6907216494845361
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8762886597938144
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9278350515463918
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9690721649484536
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8329963353635171
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7889011618393064
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7896128390908116
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.6907216494845361
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8556701030927835
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8969072164948454
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9381443298969072
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6907216494845361
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2852233676975945
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.17938144329896905
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09381443298969072
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6907216494845361
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8556701030927835
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8969072164948454
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9381443298969072
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8161733445083468
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7769595810832928
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7795708391204863
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.5979381443298969
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7731958762886598
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8247422680412371
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8865979381443299
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5979381443298969
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.25773195876288657
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.16494845360824742
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08865979381443297
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.5979381443298969
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.7731958762886598
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8247422680412371
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8865979381443299
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7462462760759706
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7009818360333826
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7046924157583041
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.5154639175257731
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.6804123711340206
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.711340206185567
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.7731958762886598
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5154639175257731
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2268041237113402
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1422680412371134
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.07731958762886597
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.5154639175257731
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.6804123711340206
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.711340206185567
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.7731958762886598
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6463393588703956
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6055105547373589
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6128426579691056
            name: Cosine Map@100

SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
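The architecture above takes the [CLS] token vector as the sentence embedding (`pooling_mode_cls_token: True`) and then applies `Normalize()`, so every embedding has unit L2 norm. The minimal NumPy sketch below reproduces those two steps on a random array that stands in for the BertModel output. It is an illustration under that assumption, not the library's code, and it also shows why cosine similarity reduces to a plain dot product for this model:

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic modules (1) and (2): keep the [CLS] vector, then L2-normalize.

    token_embeddings: shape (batch, seq_len, 768), a stand-in for the
    transformer output; position 0 is assumed to hold the [CLS] token.
    """
    cls = token_embeddings[:, 0, :]                      # pooling_mode_cls_token
    norms = np.linalg.norm(cls, axis=1, keepdims=True)   # per-row L2 norm
    return cls / np.clip(norms, 1e-12, None)             # Normalize()

rng = np.random.default_rng(0)
fake_hidden = rng.normal(size=(3, 16, 768))  # random stand-in, not real model output
emb = cls_pool_and_normalize(fake_hidden)

# With unit-length rows, cosine similarity equals the dot product.
dot = emb @ emb.T
row_norms = np.linalg.norm(emb, axis=1)
cos = dot / (row_norms[:, None] * row_norms[None, :])
print(np.allclose(dot, cos))  # True
```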

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("MugheesAwan11/bge-base-securiti-dataset-1-v16")
# Run inference
sentences = [
    "office of the \u200b\u200bFederal Commissioner for Data Protection and Freedom of Information, with its headquarters in the city of Bonn. It is led by a Federal Commissioner, elected via a vote by the German Bundestag. Eligibility criteria include being at least 35 years old, appropriate qualifications in the field of data protection law gained through relevant professional experience. The Commissioner's term is for five years, which can be extended once. The Commissioner has the responsibility to act as the primary office responsible for enforcing the Federal Data Protection Act within Germany. Some of the office's key responsibilities include: Advising the Bundestag, the Bundesrat, and the Federal Government on administrative and legislative measures related to data protection within the country; To oversee and implement both the GDPR and Federal Data Protection Act within Germany; To promote awareness within the public related to the risks, rules, safeguards, and rights concerning the processing of personal data; To handle all,  within Germany. It supplements and aligns with the requirements of the EU GDPR. Yes, Germany is covered by GDPR (General Data Protection Regulation). GDPR is a regulation that applies uniformly across all EU member states, including Germany. The Federal Data Protection Act established the office of the \u200b\u200bFederal Commissioner for Data Protection and Freedom of Information, with its headquarters in the city of Bonn. It is led by a Federal Commissioner, elected via a vote by the German Bundestag. Germany's interpretation is the Bundesdatenschutzgesetz (BDSG), the German Federal Data Protection Act. It mirrors the GDPR in all key areas while giving local German regulatory authorities the power to enforce it more efficiently nationally. ## Join Our Newsletter Get all the latest information, law updates and more delivered to your inbox ### Share Copy 14 ### More Stories that May Interest You View More",
    'What are the main responsibilities of the Federal Commissioner for Data Protection and Freedom of Information in enforcing data protection laws in Germany, including the GDPR and the Federal Data Protection Act?',
    'What is the collection and use of personal information by businesses?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
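Because the model was trained with MatryoshkaLoss over 768, 512, 256, 128 and 64 dimensions, its embeddings can be shortened by keeping only a prefix of each vector and renormalizing it before search. The NumPy sketch below illustrates that on random stand-in vectors rather than real `model.encode` output; the helper names `truncate_and_renormalize` and `top_k` are hypothetical. Recent sentence-transformers releases can also apply the truncation at load time through the `truncate_dim` argument of `SentenceTransformer`.

```python
import numpy as np

def truncate_and_renormalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and rescale rows to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

def top_k(query: np.ndarray, corpus: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k corpus rows most similar to `query` (all rows unit length)."""
    scores = corpus @ query          # dot product equals cosine similarity here
    return np.argsort(-scores)[:k]

# Random stand-ins for model.encode(...) output; real vectors would come from the model.
rng = np.random.default_rng(42)
corpus_768 = truncate_and_renormalize(rng.normal(size=(5, 768)), 768)
query_768 = corpus_768[2] + 0.01 * rng.normal(size=768)  # slightly perturbed copy of passage 2

results = {}
for dim in (768, 256, 64):
    corpus_d = truncate_and_renormalize(corpus_768, dim)
    query_d = truncate_and_renormalize(query_768[None, :], dim)[0]
    results[dim] = int(top_k(query_d, corpus_d, k=1)[0])
print(results)
```

Renormalizing after truncation keeps dot-product scores equal to cosine similarity at every size; how much retrieval quality each size retains is what the per-dimension evaluation tables report.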

Evaluation

Metrics

Information Retrieval: dim_768

Metric Value
cosine_accuracy@1 0.6907
cosine_accuracy@3 0.8866
cosine_accuracy@5 0.9381
cosine_accuracy@10 0.9691
cosine_precision@1 0.6907
cosine_precision@3 0.2955
cosine_precision@5 0.1876
cosine_precision@10 0.0969
cosine_recall@1 0.6907
cosine_recall@3 0.8866
cosine_recall@5 0.9381
cosine_recall@10 0.9691
cosine_ndcg@10 0.8386
cosine_mrr@10 0.7956
cosine_map@100 0.7968
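In every table, precision@k equals accuracy@k divided by k and recall@k equals accuracy@k (for example, 0.9691 / 10 ≈ 0.0969). That is what you would expect if each evaluation query has exactly one relevant passage. Under that assumption, the per-query metrics can be sketched as below; averaging them over all queries gives table-style numbers. The function `ir_metrics` is illustrative and is not the evaluator that produced this card:

```python
import math

def ir_metrics(ranked_ids, relevant_id, k=10):
    """Per-query retrieval metrics when exactly one corpus id is relevant.

    ranked_ids: corpus ids sorted by descending cosine similarity.
    """
    top = list(ranked_ids[:k])
    hit = relevant_id in top
    rank = top.index(relevant_id) + 1 if hit else None  # 1-based rank of the match
    return {
        f"accuracy@{k}": 1.0 if hit else 0.0,
        f"precision@{k}": (1.0 if hit else 0.0) / k,    # at most one relevant item in top-k
        f"recall@{k}": 1.0 if hit else 0.0,             # the single relevant item was found
        f"mrr@{k}": 1.0 / rank if hit else 0.0,
        # Ideal DCG is 1 / log2(2) = 1 when only one item is relevant.
        f"ndcg@{k}": 1.0 / math.log2(rank + 1) if hit else 0.0,
    }

# Hypothetical ranking for one query whose relevant passage is "p7".
m = ir_metrics(["p3", "p7", "p1"], "p7", k=3)
print(m)
```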

Information Retrieval: dim_512

Metric Value
cosine_accuracy@1 0.6907
cosine_accuracy@3 0.8763
cosine_accuracy@5 0.9278
cosine_accuracy@10 0.9691
cosine_precision@1 0.6907
cosine_precision@3 0.2921
cosine_precision@5 0.1856
cosine_precision@10 0.0969
cosine_recall@1 0.6907
cosine_recall@3 0.8763
cosine_recall@5 0.9278
cosine_recall@10 0.9691
cosine_ndcg@10 0.833
cosine_mrr@10 0.7889
cosine_map@100 0.7896

Information Retrieval: dim_256

Metric Value
cosine_accuracy@1 0.6907
cosine_accuracy@3 0.8557
cosine_accuracy@5 0.8969
cosine_accuracy@10 0.9381
cosine_precision@1 0.6907
cosine_precision@3 0.2852
cosine_precision@5 0.1794
cosine_precision@10 0.0938
cosine_recall@1 0.6907
cosine_recall@3 0.8557
cosine_recall@5 0.8969
cosine_recall@10 0.9381
cosine_ndcg@10 0.8162
cosine_mrr@10 0.777
cosine_map@100 0.7796

Information Retrieval: dim_128

Metric Value
cosine_accuracy@1 0.5979
cosine_accuracy@3 0.7732
cosine_accuracy@5 0.8247
cosine_accuracy@10 0.8866
cosine_precision@1 0.5979
cosine_precision@3 0.2577
cosine_precision@5 0.1649
cosine_precision@10 0.0887
cosine_recall@1 0.5979
cosine_recall@3 0.7732
cosine_recall@5 0.8247
cosine_recall@10 0.8866
cosine_ndcg@10 0.7462
cosine_mrr@10 0.701
cosine_map@100 0.7047

Information Retrieval: dim_64

Metric Value
cosine_accuracy@1 0.5155
cosine_accuracy@3 0.6804
cosine_accuracy@5 0.7113
cosine_accuracy@10 0.7732
cosine_precision@1 0.5155
cosine_precision@3 0.2268
cosine_precision@5 0.1423
cosine_precision@10 0.0773
cosine_recall@1 0.5155
cosine_recall@3 0.6804
cosine_recall@5 0.7113
cosine_recall@10 0.7732
cosine_ndcg@10 0.6463
cosine_mrr@10 0.6055
cosine_map@100 0.6128
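The five tables above are, in order, the dim_768, dim_512, dim_256, dim_128 and dim_64 evaluations listed in the metadata. A short calculation relating each size's nDCG@10 to the full 768-dimensional score and to per-vector storage makes the trade-off explicit; the float32 storage size is an assumption about how the vectors would be kept:

```python
# nDCG@10 per embedding size, copied from the five tables above.
ndcg_at_10 = {768: 0.8386, 512: 0.8330, 256: 0.8162, 128: 0.7462, 64: 0.6463}
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32

retention = {dim: score / ndcg_at_10[768] for dim, score in ndcg_at_10.items()}
bytes_per_vector = {dim: dim * BYTES_PER_FLOAT32 for dim in ndcg_at_10}

for dim in ndcg_at_10:
    print(f"dim={dim:>3}  nDCG@10={ndcg_at_10[dim]:.4f}  "
          f"retained={retention[dim]:.1%}  bytes/vector={bytes_per_vector[dim]}")
```

By these numbers, 256 dimensions keep about 97% of the full-size nDCG@10 at one third of the storage, while 64 dimensions keep about 77%.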

Training Details

Training Dataset

Unnamed Dataset

  • Size: 7,872 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive (string): min: 18 tokens, mean: 206.12 tokens, max: 414 tokens
    • anchor (string): min: 9 tokens, mean: 21.62 tokens, max: 102 tokens
  • Samples:
    positive anchor
    Automation PrivacyCenter.Cloud Data Mapping
    on both in terms of material and territorial scope. ### 1.1 Material Scope The Spanish data protection law affords blanket protection for all data that may have been collected on a data subject. There are only a handful of exceptions that include: Information subject to a pending legal case Information collected concerning the investigation of terrorism or organised crime Information classified as "Confidential" for matters related to Spain's national security ### 1.2 Territorial Scope The Spanish data protection law applies to all data handlers that are: Carrying out data collection activities in Spain Not established in Spain but carrying out data collection activities on Spanish territory Not established within the European Union but carrying out data collection activities on Spanish residents unless for data transit purposes only ## 2. Obligations for Organizations Under Spanish Data Protection Law The Spanish data protection law and GDPR lay out specific obligations for all data handlers. These obligations ensure, . ### 2.3 Privacy Policy Requirements Spain's data protection law requires all data handlers to inform the data subject of the following in their privacy policy: The purpose of collecting the data and the recipients of the information The obligatory or voluntary nature of the reply to the questions put to them The consequences of obtaining the data or of refusing to provide them The possibility of exercising rights of access, rectification, erasure, portability, and objection The identity and address of the controller or their local Spanish representative ### 2.4 Security Requirements Article 9 of Spain's Data Protection Law is direct and explicit in stating the responsibility of the data handler is to take adequate measures to ensure the protection of any data collected. 
It mandates all data handlers to adopt technical and organisational measures necessary to ensure the security of the personal data and prevent their alteration, loss, and unauthorised processing or access. Additionally, collection of any
    anchor: What are the requirements for organizations under the Spanish data protection law regarding privacy policies and security measures?
    before the point of collection of their personal information. ## Right to Erasure The right to erasure gives consumers the right to request deleting all their data stored by the organization. Organizations are supposed to comply within 45 days and must deliver a report to the consumer confirming the deletion of their information. ## Right to Opt-in for Minors Personal information containing minors' personal information cannot be sold by a business unless the minor (age of 13 to 16 years) or the Parent/Guardian (if the minor is aged below 13 years) opt-ins to allow this sale. Businesses can be held liable for the sale of minors' personal information if they either knew or wilfully disregarded the consumer's status as a minor and the minor or Parent/Guardian had not willingly opted in. ## Right to Continued Protection Even when consumers choose to allow a business to collect and sell their personal information, businesses' must sign written
    anchor: What are the conditions under which businesses can sell minors' personal information?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
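MatryoshkaLoss wraps MultipleNegativesRankingLoss: the base loss is computed on the first 768, 512, 256, 128 and 64 embedding dimensions and the results are summed with the listed weights (all 1 here; `n_dims_per_step: -1` means every size is used at every step). The NumPy sketch below shows that structure under two assumptions: MultipleNegativesRankingLoss is modeled as cross-entropy over scaled in-batch cosine similarities with the library's default scale of 20, and the function names are hypothetical. It is not the sentence-transformers implementation.

```python
import numpy as np

def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives: positives[i] is the target for anchors[i];
    every other positive in the batch acts as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                     # (batch, batch) scaled cosine sims
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # cross-entropy with labels 0..n-1

def matryoshka_loss(anchors, positives, dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of the base loss evaluated on each truncated prefix."""
    return sum(w * mnrl_loss(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 768))
aligned = anchors + 0.1 * rng.normal(size=(8, 768))  # positives close to their own anchors
shuffled = aligned[::-1]                             # positives paired with the wrong anchors
print(matryoshka_loss(anchors, aligned) < matryoshka_loss(anchors, shuffled))  # True
```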
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
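With `lr_scheduler_type: cosine` and `warmup_ratio: 0.1`, the learning rate rises linearly from 0 to 2e-05 and then decays along a half cosine toward 0. The sketch below assumes 246 optimizer steps per epoch (7,872 samples / batch size 32, no gradient accumulation), which agrees with step 249 being logged at epoch 1.0122 in the training logs, and rounds the warmup length up as Hugging Face `TrainingArguments` does. `lr_at` is a hypothetical helper:

```python
import math

# Assumption: 7,872 pairs / batch size 32 = 246 optimizer steps per epoch.
STEPS_PER_EPOCH = 7872 // 32
TOTAL_STEPS = STEPS_PER_EPOCH * 2              # num_train_epochs: 2
WARMUP_STEPS = math.ceil(0.1 * TOTAL_STEPS)    # warmup_ratio: 0.1, rounded up
PEAK_LR = 2e-05                                # learning_rate

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(TOTAL_STEPS, WARMUP_STEPS)  # 492 50
```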

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
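Several settings above combine into one concrete learning-rate curve: learning_rate, warmup_ratio, lr_scheduler_type, num_train_epochs and per_device_train_batch_size. The sketch below works that curve out under two assumptions:

- Training ran on a single device.
- The training set holds 7,872 pairs, a figure taken from the card's dataset_size:7872 metadata tag.

The cosine_lr helper is hypothetical. It re-implements the warmup-plus-cosine formula that transformers' get_cosine_schedule_with_warmup uses with its default num_cycles=0.5. It does not call the library.

```python
import math

# Values copied from the hyperparameter list; DATASET_SIZE is taken from the
# card's "dataset_size:7872" metadata tag (assumption: single training device).
DATASET_SIZE = 7872
BATCH_SIZE = 32        # per_device_train_batch_size
NUM_EPOCHS = 2         # num_train_epochs
PEAK_LR = 2e-05        # learning_rate
WARMUP_RATIO = 0.1     # warmup_ratio (warmup_steps is 0, so the ratio applies)

# Nominal optimizer steps: one per batch, since gradient_accumulation_steps is 1.
steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def cosine_lr(step: int) -> float:
    """Hypothetical helper: linear warmup to PEAK_LR, then half-cosine decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(steps_per_epoch, total_steps, warmup_steps)  # 246 492 50
    for step in (0, 25, 50, 271, 492):
        print(step, f"{cosine_lr(step):.2e}")
```

Where the numbers agree with the log:

- The nominal 246 steps per epoch matches the Epoch column during the first epoch (10 / 246 ≈ 0.0407).
- The 492 total steps matches the last logged step.

The first evaluation row falls at step 249, however. This suggests that the no_duplicates batch sampler yielded a few more batches than the nominal count, while the scheduler was presumably sized from the nominal total.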

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.0407 10 7.3954 - - - - -
0.0813 20 6.0944 - - - - -
0.1220 30 4.9443 - - - - -
0.1626 40 3.8606 - - - - -
0.2033 50 3.0961 - - - - -
0.2439 60 1.8788 - - - - -
0.2846 70 2.3815 - - - - -
0.3252 80 4.0698 - - - - -
0.3659 90 2.2183 - - - - -
0.4065 100 1.9142 - - - - -
0.4472 110 1.5149 - - - - -
0.4878 120 1.7036 - - - - -
0.5285 130 2.9528 - - - - -
0.5691 140 1.0596 - - - - -
0.6098 150 1.7619 - - - - -
0.6504 160 1.6529 - - - - -
0.6911 170 3.097 - - - - -
0.7317 180 1.3802 - - - - -
0.7724 190 1.9744 - - - - -
0.8130 200 5.1313 - - - - -
0.8537 210 1.405 - - - - -
0.8943 220 1.4389 - - - - -
0.9350 230 3.6439 - - - - -
0.9756 240 3.7227 - - - - -
1.0122 249 - 0.6623 0.7328 0.7549 0.5729 0.7572
1.0041 250 1.3183 - - - - -
1.0447 260 5.2631 - - - - -
1.0854 270 4.0516 - - - - -
1.1260 280 2.5487 - - - - -
1.1667 290 1.7379 - - - - -
1.2073 300 1.1724 - - - - -
1.2480 310 0.7885 - - - - -
1.2886 320 1.2341 - - - - -
1.3293 330 3.3722 - - - - -
1.3699 340 1.2227 - - - - -
1.4106 350 0.8475 - - - - -
1.4512 360 0.7605 - - - - -
1.4919 370 0.8954 - - - - -
1.5325 380 1.9712 - - - - -
1.5732 390 0.5607 - - - - -
1.6138 400 0.9671 - - - - -
1.6545 410 1.0024 - - - - -
1.6951 420 2.1374 - - - - -
1.7358 430 0.8213 - - - - -
1.7764 440 2.1253 - - - - -
1.8171 450 2.7885 - - - - -
1.8577 460 0.9053 - - - - -
1.8984 470 0.9261 - - - - -
1.9390 480 3.1218 - - - - -
1.9797 490 3.0135 - - - - -
1.9878 492 - 0.7047 0.7796 0.7896 0.6128 0.7968
  • The bold row denotes the saved checkpoint.
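Each dim_N column reports cosine MAP@100 with the embeddings cut to their first N dimensions, which is the setting MatryoshkaLoss trains for. The sketch below is a reading aid and does only plain arithmetic on the two evaluation rows copied from the table; the helper names are illustrative. It expresses each truncated score as a fraction of the full 768-dimension score and shows the gain between the two evaluations.

```python
# Cosine MAP@100 per embedding size, copied from the two evaluation rows
# (steps 249 and 492) of the training log above.
EVAL_ROWS = {
    249: {768: 0.7572, 512: 0.7549, 256: 0.7328, 128: 0.6623, 64: 0.5729},
    492: {768: 0.7968, 512: 0.7896, 256: 0.7796, 128: 0.7047, 64: 0.6128},
}


def retention(row):
    """Score at each dimension as a fraction of the largest-dimension score."""
    full = row[max(row)]
    return {dim: score / full for dim, score in row.items()}


def gain(before, after):
    """Absolute MAP@100 change per dimension between two evaluations."""
    return {dim: after[dim] - before[dim] for dim in after}


if __name__ == "__main__":
    kept = retention(EVAL_ROWS[492])
    delta = gain(EVAL_ROWS[249], EVAL_ROWS[492])
    for dim in sorted(kept, reverse=True):
        print(f"dim {dim:>3}: {kept[dim]:6.1%} of dim 768, +{delta[dim]:.4f} since step 249")
```

At step 492, the truncated embeddings keep the following share of the 768-dimension score:

- 512 dimensions: about 99.1%
- 256 dimensions: about 97.8%
- 128 dimensions: about 88.4%
- 64 dimensions: about 76.9%

Every size improves by roughly 0.035 to 0.047 between the two evaluations.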

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1
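Reproducing the numbers above is easier in a matching environment. The minimal sketch below uses only the standard library's importlib.metadata to compare installed package versions against the list. It matches by exact string, so a different build suffix counts as a mismatch; for example, a CPU build of PyTorch instead of 2.1.2+cu121 is reported. The EXPECTED mapping and the version_mismatches helper are illustrative and not part of the card.

```python
import sys
from importlib import metadata

# Versions listed under "Framework Versions" (Python is reported separately).
EXPECTED = {
    "sentence-transformers": "3.0.1",
    "transformers": "4.41.2",
    "torch": "2.1.2+cu121",
    "accelerate": "0.31.0",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected):
    """Return {package: (expected, installed or None)} for each package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(card lists 3.10.14)")
    for pkg, (wanted, got) in version_mismatches(EXPECTED).items():
        print(f"{pkg}: card lists {wanted}, found {got or 'not installed'}")
```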

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}