SetFit with sentence-transformers/all-MiniLM-L6-v2
This is a SetFit model for text classification. It uses sentence-transformers/all-MiniLM-L6-v2 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
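The contrastive stage can be illustrated with a minimal, dependency-free sketch. It assumes that same-label pairs are positives (target 1.0) and cross-label pairs are negatives (target 0.0), and it scores them with a CosineSimilarityLoss-style objective (the loss listed under Training Hyperparameters): the squared difference between the cosine similarity of the two embeddings and the target, averaged over pairs. The helper names are hypothetical, the embeddings are toy vectors rather than all-MiniLM-L6-v2 outputs, and SetFit's own pair sampler differs in details such as shuffling and oversampling.

```python
import math
from itertools import combinations


def make_contrastive_pairs(examples):
    """Turn (text, label) examples into (text_a, text_b, target) pairs.

    Same-label pairs get target 1.0 (positive); different-label pairs get 0.0 (negative).
    """
    positives, negatives = [], []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        if label_a == label_b:
            positives.append((text_a, text_b, 1.0))
        else:
            negatives.append((text_a, text_b, 0.0))
    return positives, negatives


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_similarity_loss(embedded_pairs):
    """Mean squared error between cos(u, v) and each pair's target."""
    errors = [(cosine(u, v) - target) ** 2 for u, v, target in embedded_pairs]
    return sum(errors) / len(errors)


examples = [
    ("How do keys get shared among team members?", "lexical"),
    ("Who reviews and signs conference documents?", "lexical"),
    ("How can AI help HR detect gender bias?", "semantic"),
]
positives, negatives = make_contrastive_pairs(examples)
print(len(positives), len(negatives))  # 1 2

# Toy 2-d "embeddings" standing in for Sentence Transformer outputs.
toy = [([1.0, 0.0], [1.0, 0.0], 1.0), ([1.0, 0.0], [0.0, 1.0], 0.0)]
print(cosine_similarity_loss(toy))  # 0.0
```

Minimizing this loss pulls same-label sentences together and pushes different-label sentences apart in embedding space; the LogisticRegression head is then fit on the resulting embeddings.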
Model Details
Model Description
Model Sources
Model Labels
| Label | Examples |
|:------|:---------|
| very_semantic | <ul><li>'What are the key considerations when proposing names for a project or initiative?'</li><li>'What are the key aspects of team life and events in a company?'</li><li>'What is being asked for or sought in this conversation?'</li></ul> |
| lexical | <ul><li>'Who is responsible for reviewing and signing documents related to conference submissions?'</li><li>'How do data architecture and management systems enable digital transformation and address its associated challenges?'</li><li>'How do keys or access credentials get shared or transferred among team members in a workplace?'</li></ul> |
| very_lexical | <ul><li>'What are some of the key challenges associated with handling and storing large amounts of genomic data?'</li><li>"What is the focus of Eurobiomed's partnership with Digital113?"</li><li>'What are the key considerations for generating well-formatted JSON instances that conform to a given schema?'</li></ul> |
| semantic | <ul><li>'How can visualizations be used to enhance documentation and collaboration in software development?'</li><li>'What are the key considerations when choosing a distance metric for a vector database?'</li><li>'How can AI be leveraged to support HR departments in detecting and addressing gender bias?'</li></ul> |
Evaluation
Metrics
| Label | Accuracy |
|:------|:---------|
| all   | 0.3077   |
Uses
Direct Use for Inference
First install the SetFit library:
```bash
pip install setfit
```
Then you can load this model and run inference.
```python
from setfit import SetFitModel

# Download the model from the Hugging Face Hub
model = SetFitModel.from_pretrained("yaniseuranova/setfit-rag-hybrid-search-query-router-test")
# Predict the label of a single query
preds = model("What is the purpose of the message posted by the CR?")
```
Training Details
Training Set Metrics
| Training set | Min | Mean | Max |
|:-------------|:----|:-----|:----|
| Word count   | 7   | 14.1913 | 24 |
| Label | Training Sample Count |
|:------|:----------------------|
| lexical | 41 |
| semantic | 24 |
| very_lexical | 17 |
| very_semantic | 33 |
Training Hyperparameters
- batch_size: (4, 4)
- num_epochs: (2, 2)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: True
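The step counts in the training results can be derived from these settings and the class counts above. The sketch below assumes that the first element of `batch_size` and `num_epochs` applies to the contrastive embedding phase, and that `oversampling` repeats the smaller of the positive and negative pair sets until both match the larger one. Whether same-sentence pairs count as positives does not change the result here, because negatives outnumber positives either way. The helper names are hypothetical. The numbers reproduce the 2398 steps per epoch and 4796 total steps in the table, as well as its Epoch column (step divided by steps per epoch).

```python
import math

# Class counts from the "Training Sample Count" table.
class_counts = {"lexical": 41, "semantic": 24, "very_lexical": 17, "very_semantic": 33}


def pair_counts(counts):
    """Count same-label (positive) and cross-label (negative) unordered pairs, excluding self-pairs."""
    n = sum(counts.values())
    positives = sum(k * (k - 1) // 2 for k in counts.values())
    negatives = n * (n - 1) // 2 - positives
    return positives, negatives


def steps_per_epoch(counts, batch_size):
    positives, negatives = pair_counts(counts)
    # Oversampling: both sides are sized to the larger one.
    pairs_per_epoch = 2 * max(positives, negatives)
    # A final partial batch still counts as a step.
    return math.ceil(pairs_per_epoch / batch_size)


steps = steps_per_epoch(class_counts, batch_size=4)
print(pair_counts(class_counts))  # (1760, 4795)
print(steps)                      # 2398
print(steps * 2)                  # 4796 total steps for num_epochs=2
print(round(50 / steps, 4))       # 0.0209, the Epoch value logged at step 50
```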
Training Results
| Epoch | Step | Training Loss | Validation Loss |
|:------|:-----|:--------------|:----------------|
| 0.0004 | 1 | 0.4883 | - |
| 0.0209 | 50 | 0.3738 | - |
| 0.0417 | 100 | 0.2192 | - |
| 0.0626 | 150 | 0.1503 | - |
| 0.0834 | 200 | 0.1514 | - |
| 0.1043 | 250 | 0.1829 | - |
| 0.1251 | 300 | 0.4191 | - |
| 0.1460 | 350 | 0.2136 | - |
| 0.1668 | 400 | 0.1847 | - |
| 0.1877 | 450 | 0.1681 | - |
| 0.2085 | 500 | 0.222 | - |
| 0.2294 | 550 | 0.0397 | - |
| 0.2502 | 600 | 0.2626 | - |
| 0.2711 | 650 | 0.1343 | - |
| 0.2919 | 700 | 0.1769 | - |
| 0.3128 | 750 | 0.1704 | - |
| 0.3336 | 800 | 0.401 | - |
| 0.3545 | 850 | 0.1405 | - |
| 0.3753 | 900 | 0.1892 | - |
| 0.3962 | 950 | 0.1444 | - |
| 0.4170 | 1000 | 0.2337 | - |
| 0.4379 | 1050 | 0.1848 | - |
| 0.4587 | 1100 | 0.0601 | - |
| 0.4796 | 1150 | 0.2467 | - |
| 0.5004 | 1200 | 0.1829 | - |
| 0.5213 | 1250 | 0.1695 | - |
| 0.5421 | 1300 | 0.3892 | - |
| 0.5630 | 1350 | 0.1408 | - |
| 0.5838 | 1400 | 0.0506 | - |
| 0.6047 | 1450 | 0.1835 | - |
| 0.6255 | 1500 | 0.3284 | - |
| 0.6464 | 1550 | 0.1797 | - |
| 0.6672 | 1600 | 0.1118 | - |
| 0.6881 | 1650 | 0.1502 | - |
| 0.7089 | 1700 | 0.112 | - |
| 0.7298 | 1750 | 0.0401 | - |
| 0.7506 | 1800 | 0.117 | - |
| 0.7715 | 1850 | 0.1287 | - |
| 0.7923 | 1900 | 0.0623 | - |
| 0.8132 | 1950 | 0.2128 | - |
| 0.8340 | 2000 | 0.1542 | - |
| 0.8549 | 2050 | 0.1774 | - |
| 0.8757 | 2100 | 0.3252 | - |
| 0.8966 | 2150 | 0.0152 | - |
| 0.9174 | 2200 | 0.0539 | - |
| 0.9383 | 2250 | 0.0047 | - |
| 0.9591 | 2300 | 0.1232 | - |
| 0.9800 | 2350 | 0.3466 | - |
| **1.0** | **2398** | **-** | **0.3644** |
| 1.0008 | 2400 | 0.0296 | - |
| 1.0217 | 2450 | 0.3459 | - |
| 1.0425 | 2500 | 0.0867 | - |
| 1.0634 | 2550 | 0.1343 | - |
| 1.0842 | 2600 | 0.2074 | - |
| 1.1051 | 2650 | 0.0052 | - |
| 1.1259 | 2700 | 0.0548 | - |
| 1.1468 | 2750 | 0.0441 | - |
| 1.1676 | 2800 | 0.0821 | - |
| 1.1885 | 2850 | 0.0546 | - |
| 1.2093 | 2900 | 0.1286 | - |
| 1.2302 | 2950 | 0.1222 | - |
| 1.2510 | 3000 | 0.0227 | - |
| 1.2719 | 3050 | 0.3011 | - |
| 1.2927 | 3100 | 0.018 | - |
| 1.3136 | 3150 | 0.0581 | - |
| 1.3344 | 3200 | 0.0485 | - |
| 1.3553 | 3250 | 0.2369 | - |
| 1.3761 | 3300 | 0.1681 | - |
| 1.3970 | 3350 | 0.1289 | - |
| 1.4178 | 3400 | 0.1664 | - |
| 1.4387 | 3450 | 0.1467 | - |
| 1.4595 | 3500 | 0.1399 | - |
| 1.4804 | 3550 | 0.3045 | - |
| 1.5013 | 3600 | 0.2155 | - |
| 1.5221 | 3650 | 0.061 | - |
| 1.5430 | 3700 | 0.0787 | - |
| 1.5638 | 3750 | 0.3649 | - |
| 1.5847 | 3800 | 0.1202 | - |
| 1.6055 | 3850 | 0.1004 | - |
| 1.6264 | 3900 | 0.154 | - |
| 1.6472 | 3950 | 0.0944 | - |
| 1.6681 | 4000 | 0.0004 | - |
| 1.6889 | 4050 | 0.1843 | - |
| 1.7098 | 4100 | 0.2233 | - |
| 1.7306 | 4150 | 0.2203 | - |
| 1.7515 | 4200 | 0.0986 | - |
| 1.7723 | 4250 | 0.2295 | - |
| 1.7932 | 4300 | 0.1763 | - |
| 1.8140 | 4350 | 0.3487 | - |
| 1.8349 | 4400 | 0.3285 | - |
| 1.8557 | 4450 | 0.0152 | - |
| 1.8766 | 4500 | 0.1108 | - |
| 1.8974 | 4550 | 0.2416 | - |
| 1.9183 | 4600 | 0.0476 | - |
| 1.9391 | 4650 | 0.2929 | - |
| 1.9600 | 4700 | 0.1006 | - |
| 1.9808 | 4750 | 0.0925 | - |
| 2.0 | 4796 | - | 0.3669 |
- The bold row denotes the saved checkpoint.
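Because `load_best_model_at_end: True` is set, the checkpoint kept after training is the evaluated one with the best score. The sketch below assumes that score is the validation loss, where lower is better. Under that assumption, choosing between the two evaluated rows reduces to taking a minimum, which selects the epoch 1.0 checkpoint at step 2398. The function name is hypothetical.

```python
# Rows from the training results that carry a validation loss: (epoch, step, validation_loss).
eval_rows = [(1.0, 2398, 0.3644), (2.0, 4796, 0.3669)]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss (lower is better for a loss)."""
    return min(rows, key=lambda row: row[2])


epoch, step, val_loss = best_checkpoint(eval_rows)
print(epoch, step, val_loss)  # 1.0 2398 0.3644
```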
Framework Versions
- Python: 3.10.12
- SetFit: 1.0.3
- Sentence Transformers: 2.6.1
- Transformers: 4.39.0
- PyTorch: 2.3.1+cu121
- Datasets: 2.18.0
- Tokenizers: 0.15.2
Citation
BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}
```