ryanrwatkins committed on
Commit 3fb326e
1 Parent(s): c6f7e29

Update notes.md

Files changed (1)
  1. notes.md +1 -0
notes.md CHANGED
@@ -6,6 +6,7 @@ I created the embedding file outside of huggingface, though you can do it here t
 
 With the embedding you determine the appropriate size of the chunks you want to divide the text into (e.g., 500, 800, or 1,000 token chunks). Chunk size is therefore an important variable: with large chunks (e.g., 2,000 tokens) you pass larger context blocks to ChatGPT, but you also use up more of the limited number of tokens available. For ChatGPT the default length is fixed at 2048 tokens, while the maximum can be set at 4096 tokens. The point being, consider what chunk-sizing strategy you want to use for each project you are working on. More on token limits can be found here: https://medium.com/@russkohn/mastering-ai-token-limits-and-memory-ce920630349a
 
+ Here is my Python script for creating the embedding (using Google Colab): https://github.com/ryanrwatkins/create-openai-embedding/blob/main/create-embedding
 
  # Templates for App:
  https://huggingface.co/spaces/anzorq/chatgpt-demo
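
For illustration, the chunking step described in the notes above could look roughly like the sketch below. This is a minimal sketch, not the author's script: it assumes the tiktoken tokenizer and a hypothetical chunk_text helper, with 500 tokens used as one of the chunk sizes mentioned.

```python
# Minimal sketch of token-based chunking (illustrative, not the author's script).
# Assumes the tiktoken package; "cl100k_base" is the encoding used by
# OpenAI's chat and embedding models.
import tiktoken

def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into chunks of roughly `chunk_size` tokens."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), chunk_size)
    ]

# Example usage with one of the chunk sizes discussed above:
# chunks = chunk_text(source_text, chunk_size=500)
```

Splitting on token counts rather than characters keeps each chunk's contribution to the 2048/4096-token budget predictable.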
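
The line added in this commit links the author's actual Colab script for creating the embedding. As a rough sketch of the general idea only (it is not the linked script), the code below assumes the openai Python client (v1+), an OPENAI_API_KEY environment variable, the text-embedding-ada-002 model, and an illustrative embeddings.csv output file.

```python
# Rough sketch of creating embeddings for each chunk (not the linked Colab script).
# Assumes the openai Python client (v1+) and an OPENAI_API_KEY environment variable;
# the model name and output file are illustrative choices.
import csv
from openai import OpenAI

client = OpenAI()

def embed_chunks(chunks: list[str], model: str = "text-embedding-ada-002") -> list[list[float]]:
    """Return one embedding vector per text chunk."""
    vectors = []
    for chunk in chunks:
        response = client.embeddings.create(model=model, input=chunk)
        vectors.append(response.data[0].embedding)
    return vectors

def save_embeddings(chunks: list[str], vectors: list[list[float]], path: str = "embeddings.csv") -> None:
    """Write chunk/embedding pairs so an app can load them later."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "embedding"])
        for chunk, vec in zip(chunks, vectors):
            writer.writerow([chunk, vec])
```

A typical flow would be `save_embeddings(chunks, embed_chunks(chunks))` once the chunks from the previous sketch are available.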