usama98 commited on
Commit
e1c9104
1 Parent(s): 495d139

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -0
README.md CHANGED
@@ -29,6 +29,8 @@ This is a poem generator that creates poems based on the style of the targeted p
29
 
30
  All models are available in the `HuggingFace` model page under the [usama98](https://huggingface.co/usama98/) name. Checkpoints are available in PyTorch.
31
 
 
 
32
  # Dataset
33
 
34
  The dataset consists of content scraped mainly from الموسوعة الشعرية and الديوان. After merging both, the total number of verses is 1,831,770 poetic verses. Each verse is labeled by its meter, the poet who wrote it, and the age which it was written in. There are 22 meters, 3701 poets and 11 ages: Pre-Islamic, Islamic, Umayyad, Mamluk, Abbasid, Ayyubid, Ottoman, Andalusian, era between Umayyad and Abbasid, Fatimid, and finally the modern age. We are only interested in the 16 classic meters which are attributed to Al-Farahidi, and they comprise the majority of the dataset with a total number around 1.7M verses. It is important to note that the verses diacritic states are not consistent. This means that a verse can carry full, semi diacritics, or it can carry nothing.
 
29
 
30
  All models are available in the `HuggingFace` model page under the [usama98](https://huggingface.co/usama98/) name. Checkpoints are available in PyTorch.
31
 
32
+ Our model adds a newly tried capability of NLP models where we don't just try to generate text but one that imitates a specific style. Our dataset contains poetry gathered from different poets, the data was feed to the model during training in with the aim of teaching the model how to structure arabic poetry. The additional step here was to add a poet name at the beginning of each training example. This training strategy allows the model to not only learn how to write poetry but how to the written poetry relates to that specific poet and their style.
33
+
34
  # Dataset
35
 
36
  The dataset consists of content scraped mainly from الموسوعة الشعرية and الديوان. After merging both, the total number of verses is 1,831,770 poetic verses. Each verse is labeled by its meter, the poet who wrote it, and the age which it was written in. There are 22 meters, 3701 poets and 11 ages: Pre-Islamic, Islamic, Umayyad, Mamluk, Abbasid, Ayyubid, Ottoman, Andalusian, era between Umayyad and Abbasid, Fatimid, and finally the modern age. We are only interested in the 16 classic meters which are attributed to Al-Farahidi, and they comprise the majority of the dataset with a total number around 1.7M verses. It is important to note that the verses diacritic states are not consistent. This means that a verse can carry full, semi diacritics, or it can carry nothing.