Indic Datasets
List of text and voice datasets for training and fine-tuning Indic LLMs.
- ai4bharat/sangraha
- uonlp/CulturaX
- pary/hind_encorp
- PleIAs/YouTube-Commons

Alignment Datasets
Model alignment datasets in English and other languages.
- H-D-T/Buzz-8b-Large-v0.5 (Text Generation)
- allenai/WildChat-1M
- nvidia/ChatQA-Training-Data
- nvidia/ChatRAG-Bench