Hungarian Abstractive Summarization BART model

For further models, scripts and details, see our repository or our demo site.

BART base model (shown in bold in the Results table):
- Pretrained on Webcorpus 2.0
- Fine-tuned on the HI corpus (hvg.hu + index.hu)
  - Segments: 559,162

Limitations

- Input text must be tokenized beforehand (tokenizer: HuSpaCy)
- max_source_length = 1024
- max_target_length = 256
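
The limits above might be applied at inference time roughly as follows. This is a minimal sketch, not the authors' script: the helper names `pretokenize` and `summarize` are hypothetical, `model_id` is a placeholder for the actual repository name, and joining HuSpaCy tokens with single spaces is an assumption about the expected input format. Only the two length values come from this card.

```python
MAX_SOURCE_LENGTH = 1024  # from this model card
MAX_TARGET_LENGTH = 256   # from this model card


def pretokenize(text, nlp):
    """Split `text` with a spaCy-style pipeline and rejoin tokens with spaces.

    Assumption: the model expects whitespace-separated tokens as input.
    `nlp` is any callable returning objects with a `.text` attribute,
    for example a loaded HuSpaCy pipeline.
    """
    return " ".join(tok.text for tok in nlp(text) if not tok.text.isspace())


def summarize(text, nlp, model_id):
    """Summarize `text`; `model_id` is a placeholder for the real model name."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Inputs longer than the source limit are truncated, not rejected.
    inputs = tokenizer(
        pretokenize(text, nlp),
        max_length=MAX_SOURCE_LENGTH,
        truncation=True,
        return_tensors="pt",
    )
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=MAX_TARGET_LENGTH,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Here `nlp` would be a Hungarian pipeline loaded through HuSpaCy. Because truncation silently drops everything past 1024 subword tokens, very long articles may need to be shortened or split before summarization.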

Results

Model	HI	NOL
BART-base-512	30.18/13.86/22.92	46.48/32.40/39.45
BART-base-1024	31.86/14.59/23.79	47.01/32.91/39.97

Citation

If you use this model, please cite the following paper:

@inproceedings {yang-bart,
    title = {{BARTerezzünk! - Messze, messze, messze a világtól, - BART kísérleti modellek magyar nyelvre}},
    booktitle = {XVIII. Magyar Számítógépes Nyelvészeti Konferencia},
    year = {2022},
    publisher = {Szegedi Tudományegyetem, Informatikai Intézet},
    address = {Szeged, Magyarország},
    author = {Yang, Zijian Győző},
    pages = {15--29}
}