1-800-BAD-CODE committed on
Commit
3151c36
1 Parent(s): b51be78

Update README.md

Files changed (1)
  1. README.md +10 -6
README.md CHANGED
@@ -203,11 +203,15 @@ We show here the cosine similarity between the embeddings of each token:
203
 
204
  Recall that these embeddings are used to predict sentence boundaries... thus we should expect full stops to cluster.
205
 
206
- Indeed, we see that `NULL` and `COMMA` are exactly the same, because neither have an implication on sentence boundaries.
207
 
208
- Next, we see that periods and question marks are exactly the same, and exactly the opposite of NULL.
209
- This is expected since these tokens typically imply sentence boundaries, whereas NULL and commas do not.
210
 
211
- Lastly, we see that ACRONYM is quite, but not totally, similar to periods and question marks,
212
- and almost, but not totally, the opposite of NULL and commas.
213
- Intuitio suggests this is because acronyms can be full stops ("I live in the northern U.S. It's cold here.") or not ("It's 5 a.m. and I'm tired").
 
203
 
204
  Recall that these embeddings are used to predict sentence boundaries... thus we should expect full stops to cluster.
205
 
206
+ Indeed, we see that `NULL` and "`,`" are exactly the same, because neither have an implication on sentence boundaries.
207
 
208
+ Next, we see that "`.`" and "`?`" are exactly the same, because w.r.t. SBD these are exactly the same: strong full stop implications.
209
+ (Though, we may expect some difference between these tokens, given that "`.`" is predicted after abbreviations, e.g., 'Mr.', that are not full stops.)
210
+
211
+ Further, we see that "`.`" and "`?`" are exactly the opposite of `NULL`.
212
+ This is expected since these tokens typically imply sentence boundaries, whereas `NULL` and "`,`" do not.
213
+
214
+ Lastly, we see that `ACRONYM` is very, but not totally, similar to the full stops "`.`" and "`?`",
215
+ and almost, but not totally, the opposite of `NULL` and "`,`".
216
+ Intuition suggests this is because acronyms can be full stops ("I live in the northern U.S. It's cold here.") or not ("It's 5 a.m. and I'm tired.").
217
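
The updated README text reads the similarity plot in terms of "exactly the same", "exactly the opposite", and "very, but not totally, similar". As a rough illustration of what those phrases mean numerically, here is a minimal sketch of cosine similarity over a handful of token embeddings. The 2-d vectors in `toy_embeddings` are invented for this example and are not the model's actual embeddings; the helper names `cosine_similarity` and `similarity_matrix` are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 2-d embeddings, made up for illustration only.
# The first axis loosely stands for "implies a sentence boundary".
toy_embeddings = {
    "NULL": [-1.0, 0.0],
    ",": [-2.0, 0.0],
    ".": [3.0, 0.0],
    "?": [1.5, 0.0],
    "ACRONYM": [2.0, 0.5],
}


def similarity_matrix(embeddings):
    """Map every ordered pair of labels to the cosine similarity of their vectors."""
    labels = list(embeddings)
    return {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for a in labels
        for b in labels
    }


sims = similarity_matrix(toy_embeddings)
for pair in [("NULL", ","), (".", "?"), (".", "NULL"), ("ACRONYM", "."), ("ACRONYM", "NULL")]:
    print(pair, round(sims[pair], 3))
```

Because cosine similarity ignores vector length, the toy `NULL` and "`,`" vectors count as identical even though one is twice as long as the other. The toy `ACRONYM` vector has a small component off the boundary axis, so its similarity to "`.`" is close to, but below, 1, and its similarity to `NULL` is close to, but above, -1.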