
Chroma Datasets

Making it easy to load data into Chroma since 2023

pip install chroma_datasets

Current Datasets

  • State of the Union: from chroma_datasets import StateOfTheUnion
  • Paul Graham Essay: from chroma_datasets import PaulGrahamEssay
  • Glue: from chroma_datasets import Glue
  • SciPy: from chroma_datasets import SciPy

chroma_datasets is generally backed by Hugging Face Datasets, but this is not a requirement.

How to use

The following will:

  1. Download the 2022 State of the Union
  2. Chunk it up for you
  3. Embed it using Chroma's default open-source embedding function
  4. Import it into Chroma

import chromadb
from chroma_datasets import StateOfTheUnion
from chroma_datasets.utils import import_into_chroma

# In-memory Chroma client
chroma_client = chromadb.Client()
# Download, chunk, embed, and load the dataset into a collection
collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion)
# Query the collection with a natural-language string
result = collection.query(query_texts=["The United States of America"])
print(result)
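
The printed result is a nested dictionary, which can be hard to read. Here is a minimal sketch for flattening it, assuming the usual shape of a Chroma query result: the keys "ids", "documents", and "distances" each hold one inner list per query text. The helper top_hits and the sample_result data are hypothetical and only stand in for a real collection.query(...) return value.

```python
from typing import Any, Dict, List


def top_hits(result: Dict[str, Any], query_index: int = 0) -> List[dict]:
    """Flatten the matches for one query into a list of small dicts."""
    # Each key maps to a list of lists: one inner list per query text.
    ids = result["ids"][query_index]
    documents = result["documents"][query_index]
    distances = result["distances"][query_index]
    return [
        {"id": i, "document": doc, "distance": dist}
        for i, doc, dist in zip(ids, documents, distances)
    ]


# Hypothetical stand-in for the dict returned by collection.query(...)
sample_result = {
    "ids": [["chunk-0", "chunk-1"]],
    "documents": [["first chunk of text", "second chunk of text"]],
    "distances": [[0.25, 0.5]],
}

for hit in top_hits(sample_result):
    # Print distance, id, and the first 60 characters of the chunk.
    print(f'{hit["distance"]:.2f}  {hit["id"]}  {hit["document"][:60]}')
```

With a real collection, pass the result from the query above instead of sample_result.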

To learn how to create and contribute a dataset package, see chroma-core/chroma_datasets.
