SentenceTransformer based on indobenchmark/indobert-base-p2

This is a sentence-transformers model fine-tuned from indobenchmark/indobert-base-p2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: indobenchmark/indobert-base-p2
Maximum Sequence Length: 200 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

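Documentation: Sentence Transformers Documentation (https://sbert.net)
Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)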

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 200, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
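
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of its token embeddings, with padding positions left out. The following is a minimal plain-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the two-dimensional toy vectors are illustrative assumptions (the real model uses 768 dimensions).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding is skipped."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in summed]


# Toy example: three token positions, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padded position carries large values on purpose: the result is unchanged because the mask excludes it.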

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("cassador/indobert-t4")
# Run inference
sentences = [
    'Penduduk kabupaten Raja Ampat mayoritas memeluk agama Kristen.',
    'Masyarakat kabupaten Raja Ampat mayoritas memeluk agama Islam.',
    'Gereja Baptis biasanya cenderung membentuk kelompok sendiri.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
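
The Model Description lists cosine similarity as the similarity function, so each entry of the 3 x 3 matrix above compares one sentence embedding with another. As a rough illustration of what a single entry means, here is a plain-Python sketch; the helper names and the toy two-dimensional vectors are assumptions for illustration and are not part of the library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities; entry [i][j] compares item i with item j."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Toy stand-ins for real embeddings.
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
for row in similarity_matrix(toy):
    print([round(x, 4) for x in row])
```

The diagonal is 1 (each vector compared with itself), and the matrix is symmetric, which is why the shape is square.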

Evaluation

Metrics

Semantic Similarity

Dataset: sts-dev
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	-0.0979
spearman_cosine	-0.1037
pearson_manhattan	-0.0987
spearman_manhattan	-0.1005
pearson_euclidean	-0.0981
spearman_euclidean	-0.0998
pearson_dot	-0.0822
spearman_dot	-0.0821
pearson_max	-0.0822
spearman_max	-0.0821

Semantic Similarity

Dataset: sts-dev
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	-0.0278
spearman_cosine	-0.035
pearson_manhattan	-0.0355
spearman_manhattan	-0.0387
pearson_euclidean	-0.0356
spearman_euclidean	-0.0389
pearson_dot	-0.0092
spearman_dot	-0.0066
pearson_max	-0.0092
spearman_max	-0.0066
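
The pearson_* and spearman_* rows above are correlations between the model's similarity scores and the gold scores of the evaluation pairs. A value near 1 means the two agree, while values near zero or below (as in both tables) mean the scores do not track the gold ordering. The sketch below shows how such correlations can be computed in plain Python; it is not the evaluator's own code, and the score lists are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical cosine scores and gold scores for four sentence pairs.
predicted = [0.91, 0.15, 0.40, 0.77]
gold = [1.0, 0.0, 0.4, 0.8]
print(spearman(predicted, gold))  # 1.0, because both lists have the same ordering
```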

Training Details

Training Dataset

Unnamed Dataset

Size: 10,330 training samples
Columns: sentence_0, sentence_1, and label
Approximate statistics based on the first 1000 samples:

	sentence_0	sentence_1	label
type	string	string	int
details	min: 10 tokens mean: 30.59 tokens max: 128 tokens	min: 6 tokens mean: 11.93 tokens max: 37 tokens	0: ~33.50% 1: ~32.70% 2: ~33.80%

Samples:

sentence_0	sentence_1	label
`Ini adalah coup de grâce dan dorongan yang dibutuhkan oleh para pendatang untuk mendapatkan kemerdekaan mereka.`	`Pendatang tidak mendapatkan kemerdekaan.`	`2`
`Dua bayi almarhum Raja, Diana dan Suharna, diculik.`	`Jumlah bayi raja yang diculik sudah mencapai 2 bayi.`	`1`
`Sebuah penelitian menunjukkan bahwa mengkonsumsi makanan yang tinggi kadar gulanya bisa meningkatkan rasa haus.`	`Tidak ada penelitian yang bertopik makanan yang kadar gulanya tinggi.`	`2`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
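
With these parameters, the loss scores every sentence_0 in a batch against every sentence_1 in the same batch using cosine similarity multiplied by the scale (20.0), and then applies cross-entropy with the matching pair as the correct class. The plain-Python sketch below illustrates that calculation under those assumptions; it is not the library's code, and the function names and toy vectors are hypothetical.

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and the other positives in the batch act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the other positive

print(mnr_loss(anchors, aligned))  # close to 0
print(mnr_loss(anchors, swapped))  # close to 20
```

The number of in-batch negatives is the batch size minus one, so with per_device_train_batch_size set to 4 each pair is contrasted with only three other sentences.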

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 4
per_device_eval_batch_size: 4
num_train_epochs: 20
multi_dataset_batch_sampler: round_robin
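
For readers who want to reproduce a similar setup, the non-default values above could be passed to the trainer roughly as follows. This is a sketch that assumes Sentence Transformers 3.x (the version listed under Framework Versions); the output directory is a hypothetical path, and the card does not show the author's actual training script.

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

# Only the values the card lists as non-default are set here.
# "output/indobert-sketch" is a hypothetical path, not taken from the card.
args = SentenceTransformerTrainingArguments(
    output_dir="output/indobert-sketch",
    eval_strategy="steps",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=20,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```

All other arguments keep the defaults shown under All Hyperparameters.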

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 4
per_device_eval_batch_size: 4
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 20
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin

Training Logs

Epoch	Step	Training Loss	sts-dev_spearman_max
0.0998	129	-	-0.0821
0.0999	258	-	-0.0541
0.1936	500	0.0322	-
0.1998	516	-	-0.0474
0.2997	774	-	-0.0369
0.3871	1000	0.0157	-
0.3995	1032	-	-0.0371
0.4994	1290	-	-0.0388
0.5807	1500	0.0109	-
0.5993	1548	-	-0.0284
0.6992	1806	-	-0.0293
0.7743	2000	0.0112	-
0.7991	2064	-	-0.0176
0.8990	2322	-	-0.0290
0.9679	2500	0.0104	-
0.9988	2580	-	-0.0128
1.0	2583	-	-0.0123
1.0987	2838	-	-0.0200
1.1614	3000	0.0091	-
1.1986	3096	-	-0.0202
1.2985	3354	-	-0.0204
1.3550	3500	0.0052	-
1.3984	3612	-	-0.0231
1.4983	3870	-	-0.0312
1.5486	4000	0.0017	-
1.5981	4128	-	-0.0277
1.6980	4386	-	-0.0366
1.7422	4500	0.0054	-
1.7979	4644	-	-0.0192
1.8978	4902	-	-0.0224
1.9357	5000	0.0048	-
1.9977	5160	-	-0.0240
2.0	5166	-	-0.0248
2.0976	5418	-	-0.0374
2.1293	5500	0.0045	-
2.1974	5676	-	-0.0215
2.2973	5934	-	-0.0329
2.3229	6000	0.0047	-
2.3972	6192	-	-0.0284
2.4971	6450	-	-0.0370
2.5165	6500	0.0037	-
2.5970	6708	-	-0.0390
2.6969	6966	-	-0.0681
2.7100	7000	0.0128	-
2.7967	7224	-	-0.0343
2.8966	7482	-	-0.0413
2.9036	7500	0.0055	-
2.9965	7740	-	-0.0416
3.0	7749	-	-0.0373
3.0964	7998	-	-0.0630
3.0972	8000	0.0016	-
3.1963	8256	-	-0.0401
3.2907	8500	0.0018	-
3.2962	8514	-	-0.0303
3.3961	8772	-	-0.0484
3.4843	9000	0.0017	-
3.4959	9030	-	-0.0619
3.5958	9288	-	-0.0411
3.6779	9500	0.007	-
3.6957	9546	-	-0.0408
3.7956	9804	-	-0.0368
3.8715	10000	0.0029	-
3.8955	10062	-	-0.0429
3.9954	10320	-	-0.0526
4.0	10332	-	-0.0494
4.0650	10500	0.0004	-
4.0952	10578	-	-0.0385
4.1951	10836	-	-0.0467
4.2586	11000	0.0004	-
4.2950	11094	-	-0.0500
4.3949	11352	-	-0.0458
4.4522	11500	0.0011	-
4.4948	11610	-	-0.0389
4.5947	11868	-	-0.0401
4.6458	12000	0.0046	-
4.6945	12126	-	-0.0370
4.7944	12384	-	-0.0495
4.8393	12500	0.0104	-
4.8943	12642	-	-0.0504
4.9942	12900	-	-0.0377
5.0	12915	-	-0.0379
5.0329	13000	0.0005	-
5.0941	13158	-	-0.0617
5.1940	13416	-	-0.0354
5.2265	13500	0.0006	-
5.2938	13674	-	-0.0514
5.3937	13932	-	-0.0615
5.4201	14000	0.0014	-
5.4936	14190	-	-0.0574
5.5935	14448	-	-0.0503
5.6136	14500	0.0025	-
5.6934	14706	-	-0.0512
5.7933	14964	-	-0.0316
5.8072	15000	0.0029	-
5.8931	15222	-	-0.0475
5.9930	15480	-	-0.0429
6.0	15498	-	-0.0377
6.0008	15500	0.0003	-
6.0929	15738	-	-0.0486
6.1928	15996	-	-0.0512
6.1943	16000	0.0002	-
6.2927	16254	-	-0.0383
6.3879	16500	0.0017	-
6.3926	16512	-	-0.0460
6.4925	16770	-	-0.0439
6.5815	17000	0.0046	-
6.5923	17028	-	-0.0378
6.6922	17286	-	-0.0289
6.7751	17500	0.0081	-
6.7921	17544	-	-0.0415
6.8920	17802	-	-0.0451
6.9686	18000	0.0021	-
6.9919	18060	-	-0.0386
7.0	18081	-	-0.0390
7.0918	18318	-	-0.0460
7.1622	18500	0.0001	-
7.1916	18576	-	-0.0510
7.2915	18834	-	-0.0566
7.3558	19000	0.0009	-
7.3914	19092	-	-0.0479
7.4913	19350	-	-0.0456
7.5494	19500	0.0019	-
7.5912	19608	-	-0.0371
7.6911	19866	-	-0.0184
7.7429	20000	0.003	-
7.7909	20124	-	-0.0312
7.8908	20382	-	-0.0307
7.9365	20500	0.0008	-
7.9907	20640	-	-0.0291
8.0	20664	-	-0.0298
8.0906	20898	-	-0.0452
8.1301	21000	0.0001	-
8.1905	21156	-	-0.0405
8.2904	21414	-	-0.0417
8.3237	21500	0.0007	-
8.3902	21672	-	-0.0430
8.4901	21930	-	-0.0487
8.5172	22000	0.0	-
8.5900	22188	-	-0.0471
8.6899	22446	-	-0.0361
8.7108	22500	0.0037	-
8.7898	22704	-	-0.0443
8.8897	22962	-	-0.0404
8.9044	23000	0.0009	-
8.9895	23220	-	-0.0421
9.0	23247	-	-0.0425
9.0894	23478	-	-0.0451
9.0979	23500	0.0001	-
9.1893	23736	-	-0.0458
9.2892	23994	-	-0.0479
9.2915	24000	0.0	-
9.3891	24252	-	-0.0400
9.4851	24500	0.0014	-
9.4890	24510	-	-0.0374
9.5889	24768	-	-0.0454
9.6787	25000	0.0075	-
9.6887	25026	-	-0.0230
9.7886	25284	-	-0.0345
9.8722	25500	0.0007	-
9.8885	25542	-	-0.0301
9.9884	25800	-	-0.0363
10.0	25830	-	-0.0375
10.0658	26000	0.0001	-
10.0883	26058	-	-0.0381
10.1882	26316	-	-0.0386
10.2594	26500	0.0	-
10.2880	26574	-	-0.0390
10.3879	26832	-	-0.0366
10.4530	27000	0.0007	-
10.4878	27090	-	-0.0464
10.5877	27348	-	-0.0509
10.6465	27500	0.0021	-
10.6876	27606	-	-0.0292
10.7875	27864	-	-0.0514
10.8401	28000	0.0017	-
10.8873	28122	-	-0.0485
10.9872	28380	-	-0.0471
11.0	28413	-	-0.0468
11.0337	28500	0.0	-
11.0871	28638	-	-0.0460
11.1870	28896	-	-0.0450
11.2273	29000	0.0	-
11.2869	29154	-	-0.0457
11.3868	29412	-	-0.0450
11.4208	29500	0.0008	-
11.4866	29670	-	-0.0440
11.5865	29928	-	-0.0384
11.6144	30000	0.0028	-
11.6864	30186	-	-0.0066
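
The Epoch and Step columns can be cross-checked against the dataset size and batch size reported earlier. The short calculation below assumes a single device and no gradient accumulation (gradient_accumulation_steps is 1 in the hyperparameters); the variable names are illustrative.

```python
import math

train_samples = 10_330     # "Size" under Training Dataset
batch_size = 4             # per_device_train_batch_size
num_epochs = 20            # num_train_epochs
last_logged_step = 30_186  # final row of the log table above

steps_per_epoch = math.ceil(train_samples / batch_size)
planned_steps = steps_per_epoch * num_epochs
last_logged_epoch = round(last_logged_step / steps_per_epoch, 4)

print(steps_per_epoch)    # 2583, matching step 2583 at epoch 1.0
print(planned_steps)      # 51660
print(last_logged_epoch)  # 11.6864
```

The log shown here therefore ends a little past epoch 11 of the 20 configured epochs. The first row (step 129 at epoch 0.0998) does not follow this ratio, and the card does not explain why.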

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.0.1
Transformers: 4.41.2
PyTorch: 2.3.0+cu121
Accelerate: 0.31.0
Datasets: 2.19.2
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model ID: cassador/indobert-t4
