ryanrwatkins committed on
Commit 3fb326e
1 Parent(s): c6f7e29

Update notes.md

Files changed (1)
  1. notes.md +1 -0
notes.md CHANGED
@@ -6,6 +6,7 @@ I created the embedding file outside of huggingface, though you can do it here t
 
 With the embedding you determine the appropriate size of the chunks you want to divide the text into (e.g., 500, 800, or 1,000 token chunks). Chunk size is therefore an important variable: with large chunks (e.g., 2,000 tokens) you pass larger context blocks to ChatGPT, but you also use up more of the limited number of tokens available. For ChatGPT the default length is fixed at 2048 tokens, while the maximum can be set at 4096 tokens. The point being, consider what chunk-sizing strategy you want to use for each project you are working on. More on token limits can be found here: https://medium.com/@russkohn/mastering-ai-token-limits-and-memory-ce920630349a
 
+ Here is my Python script for creating the embedding (using Google Colab): https://github.com/ryanrwatkins/create-openai-embedding/blob/main/create-embedding
 
  # Templates for App:
  https://huggingface.co/spaces/anzorq/chatgpt-demo
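
For illustration, the chunking step described in the notes above could look roughly like the sketch below. This is a minimal sketch, not the author's script: it assumes the tiktoken tokenizer and a hypothetical chunk_text helper, with 500 tokens used as one of the chunk sizes mentioned.

```python
# Minimal sketch of token-based chunking (illustrative, not the author's script).
# Assumes the tiktoken package; "cl100k_base" is the encoding used by
# OpenAI's chat and embedding models.
import tiktoken

def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into chunks of roughly `chunk_size` tokens."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), chunk_size)
    ]

# Example usage with one of the chunk sizes discussed above:
# chunks = chunk_text(source_text, chunk_size=500)
```

Splitting on token counts rather than characters keeps each chunk's contribution to the 2048/4096-token budget predictable.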
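
The line added in this commit links the author's actual Colab script for creating the embedding. As a rough sketch of the general idea only (it is not the linked script), the code below assumes the openai Python client (v1+), an OPENAI_API_KEY environment variable, the text-embedding-ada-002 model, and an illustrative embeddings.csv output file.

```python
# Rough sketch of creating embeddings for each chunk (not the linked Colab script).
# Assumes the openai Python client (v1+) and an OPENAI_API_KEY environment variable;
# the model name and output file are illustrative choices.
import csv
from openai import OpenAI

client = OpenAI()

def embed_chunks(chunks: list[str], model: str = "text-embedding-ada-002") -> list[list[float]]:
    """Return one embedding vector per text chunk."""
    vectors = []
    for chunk in chunks:
        response = client.embeddings.create(model=model, input=chunk)
        vectors.append(response.data[0].embedding)
    return vectors

def save_embeddings(chunks: list[str], vectors: list[list[float]], path: str = "embeddings.csv") -> None:
    """Write chunk/embedding pairs so an app can load them later."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "embedding"])
        for chunk, vec in zip(chunks, vectors):
            writer.writerow([chunk, vec])
```

A typical flow would be `save_embeddings(chunks, embed_chunks(chunks))` once the chunks from the previous sketch are available.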