Common Corpus Collection The largest public domain dataset for training LLMs. • 27 items • Updated Jul 17 • 112