The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale • Paper • 2406.17557 • Published 14 days ago
BM25 for Python: Achieving high performance while simplifying dependencies with *BM25S*⚡ • Article • By xhluca • about 7 hours ago
GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks • Paper • 2406.12925 • Published 25 days ago
Beyond Document Page Classification: Design, Datasets, and Challenges • Paper • 2308.12896 • Published Aug 24, 2023