Stefan PRO
stefan-it
AI & ML interests
Flair Library, NER & PoS Tagging, LM Pretraining (mostly encoder-only), Historical Language Models
Articles
Organizations
stefan-it's activity
upvoted a paper 11 days ago
upvoted a paper 13 days ago
upvoted a paper 27 days ago
upvoted an article about 1 month ago
Announcing Occiglot-Fineweb
Article • 5
upvoted a paper 2 months ago
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Paper • 2404.14408 • Published • 6
Investigating Gender Bias in Turkish Language Models
Paper • 2404.11726 • Published • 1
Fewer Truncations Improve Language Modeling
Paper • 2404.10830 • Published • 2
Token Dropping for Efficient BERT Pretraining
Paper • 2203.13240 • Published • 2
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
Paper • 2403.19559 • Published • 1
Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding
Paper • 2404.05694 • Published • 2
BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models
Paper • 2404.04113 • Published • 3
Willkommens-Merkel, Chaos-Johnson, and Tore-Klose: Modeling the Evaluative Meaning of German Personal Name Compounds
Paper • 2404.04031 • Published • 1
Tokenizer Choice For LLM Training: Negligible or Crucial?
Paper • 2310.08754 • Published • 2
Understanding Back-Translation at Scale
Paper • 1808.09381 • Published • 1
Revisiting subword tokenization: A case study on affixal negation in large language models
Paper • 2404.02421 • Published • 1
Cross-lingual Named Entity Corpus for Slavic Languages
Paper • 2404.00482 • Published • 3
Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions
Paper • 2403.15279 • Published • 1
CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction
Paper • 2403.15322 • Published • 1
MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank
Paper • 2403.10293 • Published • 1
Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages
Paper • 2403.08693 • Published • 1
MaiBaam Annotation Guidelines
Paper • 2403.05902 • Published • 1
Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models
Paper • 2402.18397 • Published • 1
upvoted a collection 4 months ago
SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages
Paper • 2402.08638 • Published • 1
Pixel Sentence Representation Learning
Paper • 2402.08183 • Published • 2
Fractal Patterns May Unravel the Intelligence in Next-Token Prediction
Paper • 2402.01825 • Published • 2
Fine-tuning Transformer-based Encoder for Turkish Language Understanding Tasks
Paper • 2401.17396 • Published • 1
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 23
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks
Paper • 2401.16589 • Published • 1
DrBERT: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining
Paper • 2401.15861 • Published • 1
Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation
Paper • 2305.18893 • Published • 2
TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation
Paper • 2401.14373 • Published • 11
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Paper • 2401.13160 • Published • 9
LangBridge: Multilingual Reasoning Without Multilingual Supervision
Paper • 2401.10695 • Published • 4
Headless Language Models: Learning without Predicting with Contrastive Weight Tying
Paper • 2309.08351 • Published • 3
Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions
Paper • 2207.14251 • Published • 1
Cross-lingual Editing in Multilingual Language Models
Paper • 2401.10521 • Published • 2
Mission: Impossible Language Models
Paper • 2401.06416 • Published • 3
RoBERTurk: Adjusting RoBERTa for Turkish
Paper • 2401.03515 • Published • 1
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Paper • 2401.03321 • Published • 1
German Text Embedding Clustering Benchmark
Paper • 2401.02709 • Published • 5
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Paper • 2312.17482 • Published • 1
Observable Propagation: A Data-Efficient Approach to Uncover Feature Vectors in Transformers
Paper • 2312.16291 • Published • 1
Language Resources for Dutch Large Language Modelling
Paper • 2312.12852 • Published • 9
WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models
Paper • 2112.06598 • Published • 1
PromptBench: A Unified Library for Evaluation of Large Language Models
Paper • 2312.07910 • Published • 14
On Meta-Prompting
Paper • 2312.06562 • Published • 1
Aligner: One Global Token is Worth Millions of Parameters When Aligning Large Language Models
Paper • 2312.05503 • Published • 1
Gated Linear Attention Transformers with Hardware-Efficient Training
Paper • 2312.06635 • Published • 3
RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training
Paper • 2312.04032 • Published • 1
Advancing State of the Art in Language Modeling
Paper • 2312.03735 • Published • 1
SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM
Paper • 2312.03788 • Published • 1