caballeroch committed on
Commit
b1d1135
1 Parent(s): 3a57388

Create README.md

  1. README.md +30 -0
README.md ADDED
@@ -0,0 +1,30 @@
+ # Fake News Classifier - Finetuned: 'distilbert-base-uncased'
+
+ #### **LIAR Dataset**
+ ***
+ - This model is finetuned on the LIAR dataset, a collection of roughly 12.8K hand-labeled short statements from politifact.com's API.
+ - The data went through a series of text-cleaning stages, including:
+ 1. Lowercasing, for improved 'uncased' model performance.
+ 2. Removal of words that mix letters and digits.
+ 3. Stopword removal.
+ 4. Trimming of extra spaces.
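
The four cleaning stages listed above could be sketched roughly as follows. This is a minimal illustration, not the author's actual preprocessing code: the function name `clean_text`, the regular expression, and the tiny `STOPWORDS` set are all assumptions made for the example (the README does not say which stopword list was used).

```python
import re

# Tiny illustrative stopword list (an assumption; the original list is not stated).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and", "that"}

# Matches a whole word that contains at least one letter and at least one digit.
MIXED_WORD = re.compile(r"\b(?=\w*[a-z])(?=\w*\d)\w+\b")


def clean_text(text: str) -> str:
    """Apply the four cleaning stages to a single statement."""
    text = text.lower()                    # 1. lowercase standardization
    text = MIXED_WORD.sub(" ", text)       # 2. drop mixed letter/digit words
    words = [w for w in text.split() if w not in STOPWORDS]  # 3. stopword removal
    return " ".join(words)                 # 4. split/join collapses extra spaces


print(clean_text("The  Senator said that COVID19 cases rose in 2020 "))
```

Purely numeric words such as `2020` are kept in this sketch, since only words mixing letters and digits are removed.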
+
+ #### **DistilBERT Uncased Tokenizer**
+ ***
+ - The text is tokenized with the 'distilbert-base-uncased' HuggingFace tokenizer.
+ - For training, the text is truncated to a block size of 200.
+ - Max-length padding is used to keep the input shape consistent.
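
To make the truncation and padding behaviour concrete, here is a small library-free sketch of what happens to a list of token ids. It is an illustration only: `pad_or_truncate` is a hypothetical helper, the pad id of 0 is assumed to match the `[PAD]` token of the 'distilbert-base-uncased' vocabulary, and special-token handling is ignored. With the HuggingFace tokenizer itself, the same effect is normally requested with `padding="max_length"`, `truncation=True`, and `max_length=200`.

```python
from typing import Dict, List

BLOCK_SIZE = 200  # block size stated in the README
PAD_ID = 0        # assumed id of the [PAD] token


def pad_or_truncate(
    token_ids: List[int],
    block_size: int = BLOCK_SIZE,
    pad_id: int = PAD_ID,
) -> Dict[str, List[int]]:
    """Cut a sequence to block_size, then right-pad it to exactly block_size."""
    ids = token_ids[:block_size]        # truncation
    n_pad = block_size - len(ids)       # number of padding positions needed
    return {
        "input_ids": ids + [pad_id] * n_pad,
        # 1 marks real tokens, 0 marks padding the model should ignore
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }


example = pad_or_truncate([101, 7592, 102])
print(len(example["input_ids"]), sum(example["attention_mask"]))
```

Every example then has the same length, so batches can be stacked into fixed-shape tensors.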
+
+ #### **DistilBERT Uncased Model**
+ ***
+ - The model that is finetuned is the DistilBERT model, 'distilbert-base-uncased'.
+ - This is a small and fast text classifier, well suited to real-time inference.
+ - 40% fewer parameters than the base BERT model.
+ - 60% faster while preserving 95% of the base BERT model's performance.
+ - This model outperforms the finetuned 'distilbert-base-cased' by over 5% in average F1-score.
+ - This improvement comes mainly from the lower learning rate and improved data preprocessing.
+ - These modifications allow for a smoother training curve and convergence.
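
The README does not say how the "average F1-score" was computed. One common reading is the macro average, the unweighted mean of per-class F1 scores, which could be sketched as below. The function name `macro_f1` and the toy labels are assumptions for illustration, not the author's evaluation code.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either sequence."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, purely illustrative
print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```

Comparing two finetuned models on this metric requires that both be scored on the same held-out split.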