Spaces: jeremyarancio/ingredients-spellcheck


jeremyarancio committed on Jul 11

Commit b24df6e • 1 Parent(s): 17306ce

Add with torch no grad


Files changed (1)

app.py +9 -6


@@ -18,7 +18,7 @@ logging.basicConfig(
 EXAMPLES = [
     ["images/ingredients_1.jpg", "24.36% chocolat noir 63% origine non UE (cacao, sucre, beurre de cacao, émulsifiant léci - thine de colza, vanille bourbon gousse), œuf, farine de blé, beurre, sucre, miel, sucre perlé, levure chimique, zeste de citron."],
     ["images/ingredients_2.jpg", "farine de froment, œufs, lait entier pasteurisé Aprigine: France), sucre, sel, extrait de vanille naturelle Conditi( 35."],
-    ["images/ingredients_3.jpg", "tural basmati rice - cooked (98%), rice bran oil, salt"],
+    # ["images/ingredients_3.jpg", "tural basmati rice - cooked (98%), rice bran oil, salt"],
     ["images/ingredients_4.jpg", "Eau de noix de coco 93.9%, Arôme natutel de fruit"],
     ["images/ingredients_5.jpg", "Sucre, pâte de cacao, beurre de cacao, émulsifiant: léci - thines (soja). Peut contenir des traces de lait. Chocolat noir: cacao: 50% minimum. À conserver à l'abri de la chaleur et de l'humidité. Élaboré en France."],
 ]
@@ -37,6 +37,8 @@ However, it often happens the information extracted by OCR contains typos and er
 To solve this problem, we developed an 🍊 **Ingredient Spellcheck** 🍊, a model capable of correcting typos in a list of ingredients following a defined guideline.
 The model, based on Mistral-7B-v0.3, was fine-tuned on thousand of corrected lists of ingredients extracted from the database. More information in the model card.
+## Project in progress
 ## 👇 Links
 * Open Food Facts website: https://world.openfoodfacts.org/discover
@@ -83,11 +85,12 @@ def process(text: str) -> str:
         add_special_tokens=True,
         return_tensors="pt"
     ).input_ids
-    output = model.generate(
-        input_ids.to(zero.device), # GPU
-        do_sample=False,
-        max_new_tokens=512,
-    )
+    with torch.no_grad():
+        output = model.generate(
+            input_ids.to(zero.device), # GPU
+            do_sample=False,
+            max_new_tokens=512,
+        )
     return tokenizer.decode(output[0], skip_special_tokens=True)[len(prompt):].strip()
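
Reviewer note: the unchanged return line in `process` removes the prompt by slicing the decoded string at `len(prompt)`. The following is a minimal sketch of that idea in plain Python, with no model or tokenizer involved. The helper names `strip_prompt` and `strip_prompt_checked` and the sample strings are hypothetical (the real prompt template is not shown in this diff), and the guarded variant is only a suggestion for the case where the decoded text does not begin with the prompt verbatim.

```python
def strip_prompt(decoded: str, prompt: str) -> str:
    """Mirror of `decoded[len(prompt):].strip()` as used in the diff."""
    return decoded[len(prompt):].strip()


def strip_prompt_checked(decoded: str, prompt: str) -> str:
    """Same idea, but slice only when the text really starts with the prompt."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    # Assumption: if decoding altered the prompt (e.g. whitespace changes),
    # returning the whole text is safer than cutting at an arbitrary offset.
    return decoded.strip()


# Made-up strings for illustration only.
prompt = "Correct this list: Arôme natutel de fruit\nAnswer:"
decoded = prompt + " Arôme naturel de fruit"

result = strip_prompt(decoded, prompt)
checked = strip_prompt_checked(decoded, prompt)
fallback = strip_prompt_checked("  Arôme naturel de fruit ", prompt)
```

When the decoded output starts with the prompt, both helpers return the same correction; they differ only when the prefix does not match, where the plain slice would silently drop part of the answer.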