SentenceTransformer based on Rajan/NepaliBERT

This is a sentence-transformers model finetuned from Rajan/NepaliBERT. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Rajan/NepaliBERT
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
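
The stack above is a standard Transformer encoder followed by mean pooling over token embeddings. As a minimal sketch, assuming only the sentence_transformers.models building blocks, an equivalent model could be assembled by hand as shown below; in practice, simply load the published checkpoint as in the Usage section.

from sentence_transformers import SentenceTransformer, models

# Sketch only: rebuilds the Transformer + mean-pooling stack described above from the
# base model. Load "syubraj/sentence_similarity_nepali_v2" directly to get the
# finetuned weights instead.
word_embedding_model = models.Transformer("Rajan/NepaliBERT", max_seq_length=512)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 768
    pooling_mode_mean_tokens=True,                        # mean pooling, per the config above
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])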

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("syubraj/sentence_similarity_nepali_v2")
# Run inference
sentences = [
    'रातो, डबल डेकर बस।',
    'रातो डबल डेकर बस।',
    'दुई कालो कुकुर हिउँमा हिंड्दै।',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
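
Since semantic search is among the intended uses, here is a small hedged sketch that ranks candidate sentences against a query by cosine similarity; the query and candidates are illustrative placeholders reusing the sentences from the example above.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("syubraj/sentence_similarity_nepali_v2")

# Illustrative query and candidates (same sentences as the example above)
query = "रातो डबल डेकर बस।"
candidates = [
    "रातो, डबल डेकर बस।",
    "दुई कालो कुकुर हिउँमा हिंड्दै।",
]

query_embedding = model.encode(query)
candidate_embeddings = model.encode(candidates)

# Cosine similarity between the query and each candidate, highest first
scores = model.similarity(query_embedding, candidate_embeddings)[0]
for sentence, score in sorted(zip(candidates, scores.tolist()), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {sentence}")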

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.6971
spearman_cosine 0.6623
pearson_manhattan 0.6332
spearman_manhattan 0.6079
pearson_euclidean 0.6340
spearman_euclidean 0.6090
pearson_dot 0.4848
spearman_dot 0.5306
pearson_max 0.6971
spearman_max 0.6623
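
These Pearson/Spearman figures are the kind produced by the library's EmbeddingSimilarityEvaluator on a held-out Nepali STS split (the training logs below track stsb-dev-nepali_spearman_max). A comparable evaluation could be run as sketched here; the three pairs and gold scores are copied from the training samples further down and are only placeholders for a real dev set.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("syubraj/sentence_similarity_nepali_v2")

# Placeholder pairs and gold similarity scores (in [0, 1]); substitute a real dev split.
sentences1 = [
    "एक व्यक्ति प्याज काट्दै छ।",
    "क्यानडाको तेल रेल विस्फोटमा थप मृत्यु हुने अपेक्षा गरिएको छ",
    "एउटी महिला झिंगा माझ्दै छिन्।",
]
sentences2 = [
    "एउटा बिरालो शौचालयमा पपिङ गर्दैछ।",
    "क्यानडामा रेल दुर्घटनामा पाँच जनाको मृत्यु भएको छ",
    "एउटी महिला केही झिंगा माझ्दै।",
]
gold_scores = [0.0, 0.56, 1.0]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="stsb-dev-nepali")
results = evaluator(model)  # dict of Pearson/Spearman correlations per similarity measure
print(results)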

Training Details

Training Dataset

Unnamed Dataset

  • Size: 4,599 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 6 tokens, mean 19.5 tokens, max 81 tokens
    • sentence_1 (string): min 6 tokens, mean 19.43 tokens, max 75 tokens
    • label (float): min 0.0, mean 0.54, max 1.0
  • Samples:
    • sentence_0: एक व्यक्ति प्याज काट्दै छ। | sentence_1: एउटा बिरालो शौचालयमा पपिङ गर्दैछ। | label: 0.0
    • sentence_0: क्यानडाको तेल रेल विस्फोटमा थप मृत्यु हुने अपेक्षा गरिएको छ | sentence_1: क्यानडामा रेल दुर्घटनामा पाँच जनाको मृत्यु भएको छ | label: 0.5599999904632569
    • sentence_0: एउटी महिला झिंगा माझ्दै छिन्। | sentence_1: एउटी महिला केही झिंगा माझ्दै। | label: 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
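
A minimal sketch of how this kind of (sentence_0, sentence_1, label) data is paired with CosineSimilarityLoss, whose default loss_fct is torch.nn.MSELoss as listed above. The two rows are taken from the samples shown earlier and stand in for the full 4,599-pair training set.

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CosineSimilarityLoss

# Start from the base model; the two rows below are placeholders for the full training set.
model = SentenceTransformer("Rajan/NepaliBERT")

train_dataset = Dataset.from_dict({
    "sentence_0": ["एक व्यक्ति प्याज काट्दै छ।", "एउटी महिला झिंगा माझ्दै छिन्।"],
    "sentence_1": ["एउटा बिरालो शौचालयमा पपिङ गर्दैछ।", "एउटी महिला केही झिंगा माझ्दै।"],
    "label": [0.0, 1.0],
})

# CosineSimilarityLoss regresses the cosine similarity of the two sentence embeddings
# onto the float label; its default loss_fct is torch.nn.MSELoss.
train_loss = CosineSimilarityLoss(model)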
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 100
  • multi_dataset_batch_sampler: round_robin
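
As a sketch under the assumptions of the data/loss snippet above (model, train_dataset, train_loss), the non-default values listed here could be wired into a training run roughly as follows; output_dir is a placeholder and a proper held-out split should replace the placeholder eval_dataset.

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="outputs/sentence_similarity_nepali_v2",  # placeholder path
    num_train_epochs=100,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=model,                  # from the sketch above
    args=args,
    train_dataset=train_dataset,  # from the sketch above
    eval_dataset=train_dataset,   # placeholder; use a held-out dev split in practice
    loss=train_loss,
)
trainer.train()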

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 100
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss stsb-dev-nepali_spearman_max
1.0 288 - 0.5355
1.7361 500 0.0723 -
2.0 576 - 0.5794
3.0 864 - 0.6108
3.4722 1000 0.047 0.6147
4.0 1152 - 0.6259
5.0 1440 - 0.6356
5.2083 1500 0.034 -
6.0 1728 - 0.6329
6.9444 2000 0.0217 0.6375
7.0 2016 - 0.6382
8.0 2304 - 0.6468
8.6806 2500 0.0137 -
9.0 2592 - 0.6348
10.0 2880 - 0.6332
10.4167 3000 0.0102 0.6427
11.0 3168 - 0.6370
12.0 3456 - 0.6515
12.1528 3500 0.0084 -
13.0 3744 - 0.6546
13.8889 4000 0.0069 0.6400
14.0 4032 - 0.6610
15.0 4320 - 0.6495
15.625 4500 0.006 -
16.0 4608 - 0.6574
17.0 4896 - 0.6486
17.3611 5000 0.0053 0.6589
18.0 5184 - 0.6592
19.0 5472 - 0.6488
19.0972 5500 0.0047 -
20.0 5760 - 0.6436
20.8333 6000 0.0044 0.6576
21.0 6048 - 0.6515
22.0 6336 - 0.6541
22.5694 6500 0.0041 -
23.0 6624 - 0.6549
24.0 6912 - 0.6571
24.3056 7000 0.0037 0.6603
25.0 7200 - 0.6699
26.0 7488 - 0.6653
26.0417 7500 0.0037 -
27.0 7776 - 0.6609
27.7778 8000 0.0033 0.6578
28.0 8064 - 0.6606
29.0 8352 - 0.6614
29.5139 8500 0.0031 -
30.0 8640 - 0.6579
31.0 8928 - 0.6688
31.25 9000 0.0028 0.6650
32.0 9216 - 0.6639
32.9861 9500 0.0027 -
33.0 9504 - 0.6624
34.0 9792 - 0.6646
34.7222 10000 0.0025 0.6530
35.0 10080 - 0.6587
36.0 10368 - 0.6671
36.4583 10500 0.0025 -
37.0 10656 - 0.6614
38.0 10944 - 0.6602
38.1944 11000 0.0024 0.6576
39.0 11232 - 0.6665
39.9306 11500 0.0023 -
40.0 11520 - 0.6663
41.0 11808 - 0.6734
41.6667 12000 0.0021 0.6633
42.0 12096 - 0.6667
43.0 12384 - 0.6679
43.4028 12500 0.002 -
44.0 12672 - 0.6701
45.0 12960 - 0.6650
45.1389 13000 0.0019 0.6680
46.0 13248 - 0.6631
46.875 13500 0.0018 -
47.0 13536 - 0.6643
48.0 13824 - 0.6631
48.6111 14000 0.0017 0.6648
49.0 14112 - 0.6648
50.0 14400 - 0.6619
50.3472 14500 0.0017 -
51.0 14688 - 0.6633
52.0 14976 - 0.6622
52.0833 15000 0.0016 0.6612
53.0 15264 - 0.6670
53.8194 15500 0.0015 -
54.0 15552 - 0.6618
55.0 15840 - 0.6641
55.5556 16000 0.0015 0.6617
56.0 16128 - 0.6669
57.0 16416 - 0.6645
57.2917 16500 0.0014 -
58.0 16704 - 0.6642
59.0 16992 - 0.6579
59.0278 17000 0.0013 0.6592
60.0 17280 - 0.6589
60.7639 17500 0.0014 -
61.0 17568 - 0.6685
62.0 17856 - 0.6673
62.5 18000 0.0012 0.6669
63.0 18144 - 0.6665
64.0 18432 - 0.6626
64.2361 18500 0.0012 -
65.0 18720 - 0.6619
65.9722 19000 0.0012 0.6643
66.0 19008 - 0.6651
67.0 19296 - 0.6628
67.7083 19500 0.0011 -
68.0 19584 - 0.6658
69.0 19872 - 0.6615
69.4444 20000 0.0011 0.6627
70.0 20160 - 0.6657
71.0 20448 - 0.6663
71.1806 20500 0.0011 -
72.0 20736 - 0.6634
72.9167 21000 0.001 0.6649
73.0 21024 - 0.6632
74.0 21312 - 0.6658
74.6528 21500 0.001 -
75.0 21600 - 0.6639
76.0 21888 - 0.6601
76.3889 22000 0.001 0.6623
77.0 22176 - 0.6607
78.0 22464 - 0.6613
78.125 22500 0.0009 -
79.0 22752 - 0.6613
79.8611 23000 0.0009 0.6615
80.0 23040 - 0.6615
81.0 23328 - 0.6617
81.5972 23500 0.0008 -
82.0 23616 - 0.6604
83.0 23904 - 0.6605
83.3333 24000 0.0008 0.6602
84.0 24192 - 0.6628
85.0 24480 - 0.6603
85.0694 24500 0.0008 -
86.0 24768 - 0.6602
86.8056 25000 0.0008 0.6592
87.0 25056 - 0.6611
88.0 25344 - 0.6612
88.5417 25500 0.0008 -
89.0 25632 - 0.6607
90.0 25920 - 0.6598
90.2778 26000 0.0008 0.6607
91.0 26208 - 0.6615
92.0 26496 - 0.6615
92.0139 26500 0.0007 -
93.0 26784 - 0.6609
93.75 27000 0.0007 0.6607
94.0 27072 - 0.6612
95.0 27360 - 0.6624
95.4861 27500 0.0007 -
96.0 27648 - 0.6627
97.0 27936 - 0.6618
97.2222 28000 0.0007 0.6619
98.0 28224 - 0.6621
98.9583 28500 0.0007 -
99.0 28512 - 0.6623
100.0 28800 - 0.6623

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.2
  • PyTorch: 2.1.2
  • Accelerate: 0.30.1
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1
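
To approximate this environment, the versions above can be pinned at install time (a sketch; compatible newer releases may also work):

pip install sentence-transformers==3.0.0 transformers==4.41.2 torch==2.1.2 accelerate==0.30.1 datasets==2.19.2 tokenizers==0.19.1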

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}