Ihor committed on
Commit
afe95e4
1 Parent(s): 1c21554

Update README.md

Files changed (1)
  1. README.md +79 -15
README.md CHANGED
@@ -58,6 +58,85 @@ for entity in entities:
58
  print(entity["text"], "=>", entity["label"])
59
  ```
60
 
61
  **How to use for open information extraction:**
62
 
63
  ```python
@@ -128,21 +207,6 @@ for summary in summaries:
128
  print(summary["text"], "=>", summary["score"])
129
  ```
130
 
131
- **How to use for relation extraction:**
132
-
133
- ```python
134
- text = """
135
- Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975 to develop and sell BASIC interpreters for the Altair 8800. During his career at Microsoft, Gates held the positions of chairman, chief executive officer, president and chief software architect, while also being the largest individual shareholder until May 2014.
136
- """
137
-
138
- labels = ["Microsoft <> founder", "Microsoft <> inception date", "Bill Gates <> held position"]
139
-
140
- entities = model.predict_entities(text, labels)
141
-
142
- for entity in entities:
143
-     print(entity["label"], "=>", entity["text"])
144
- ```
145
-
146
  ### Benchmarks:
147
 
148
  Our multitask model demonstrates performance comparable to dedicated NER models on different zero-shot benchmarks:
 
58
  print(entity["text"], "=>", entity["label"])
59
  ```
60
 
61
+ **How to use for relation extraction:**
62
+
63
+ ```python
64
+ text = """
65
+ Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975 to develop and sell BASIC interpreters for the Altair 8800. During his career at Microsoft, Gates held the positions of chairman, chief executive officer, president and chief software architect, while also being the largest individual shareholder until May 2014.
66
+ """
67
+
68
+ labels = ["Microsoft <> founder", "Microsoft <> inception date", "Bill Gates <> held position"]
69
+
70
+ entities = model.predict_entities(text, labels)
71
+
72
+ for entity in entities:
73
+     print(entity["label"], "=>", entity["text"])
74
+ ```
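
The labels in the example above follow a `"<source> <> <relation>"` convention, so each prediction can be turned into a (source, relation, target) triple. The helper below is a minimal sketch of that post-processing step, not part of GLiNER: `to_triples` is a hypothetical name, and `sample` is a hand-written stand-in for the list returned by `model.predict_entities`, not real model output.

```python
def to_triples(predictions):
    """Split each "<source> <> <relation>" label into a (source, relation, target) triple."""
    triples = []
    for pred in predictions:
        source, sep, relation = pred["label"].partition("<>")
        if not sep:
            continue  # label does not follow the "<source> <> <relation>" convention
        triples.append((source.strip(), relation.strip(), pred["text"]))
    return triples


# Hand-written stand-in for predictions (assumed shape: dicts with "label" and "text")
sample = [
    {"label": "Microsoft <> founder", "text": "Bill Gates"},
    {"label": "Microsoft <> founder", "text": "Paul Allen"},
    {"label": "Microsoft <> inception date", "text": "April 4, 1975"},
]

for triple in to_triples(sample):
    print(triple)
```

Predictions whose label does not contain `<>` are skipped, so the helper can be applied to mixed NER and relation output.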
75
+ ### Construct a relation extraction pipeline with [utca](https://github.com/Knowledgator/utca)
76
+ First, we need to import the necessary components of the library, initialize the predictor (a GLiNER model), and construct a pipeline that combines NER and relation extraction:
77
+ ```python
78
+ from utca.core import RenameAttribute
79
+ from utca.implementation.predictors import (
80
+     GLiNERPredictor,
81
+     GLiNERPredictorConfig
82
+ )
83
+ from utca.implementation.tasks import (
84
+     GLiNER,
85
+     GLiNERPreprocessor,
86
+     GLiNERRelationExtraction,
87
+     GLiNERRelationExtractionPreprocessor,
88
+ )
89
+
90
+ predictor = GLiNERPredictor( # Predictor manages the model that will be used by tasks
91
+     GLiNERPredictorConfig(
92
+         model_name = "knowledgator/gliner-multitask-large-v0.5", # Model to use
93
+         device = "cuda:0", # Device to use
94
+     )
95
+ )
96
+
97
+ pipe = (
98
+     GLiNER( # GLiNER task produces classified entities that will be at the "output" key.
99
+         predictor=predictor,
100
+         preprocess=GLiNERPreprocessor(threshold=0.7) # Entities threshold
101
+     )
102
+     | RenameAttribute("output", "entities") # Rename output entities from GLiNER task to use them as inputs in GLiNERRelationExtraction
103
+     | GLiNERRelationExtraction( # GLiNERRelationExtraction is used for relation extraction.
104
+         predictor=predictor,
105
+         preprocess=(
106
+             GLiNERPreprocessor(threshold=0.5) # Relations threshold
107
+             | GLiNERRelationExtractionPreprocessor()
108
+         )
109
+     )
110
+ )
111
+ ```
112
+
113
+ To run the pipeline, we need to specify the entity types and the relations with their parameters:
114
+
115
+ ```python
116
+ r = pipe.run({
117
+     "text": text, # Text to process
118
+     "labels": [ # Labels used by GLiNER for entity extraction
119
+         "scientist",
120
+         "university",
121
+         "city",
122
+         "research",
123
+         "journal",
124
+     ],
125
+     "relations": [{ # Relation parameters
126
+         "relation": "published at", # Relation label. Required parameter.
127
+         "pairs_filter": [("scientist", "journal")], # Optional parameter. It specifies possible members of relations by their entity labels.
128
+         # Here, "scientist" is the entity label of the source, and "journal" is the target's entity label.
129
+         # If provided, only specified pairs will be returned.
130
+     },{
131
+         "relation": "worked at",
132
+         "pairs_filter": [("scientist", "university"), ("scientist", "other")],
133
+         "distance_threshold": 100, # Optional parameter. It specifies the max distance between spans in the text (i.e., the end of the span that is closer to the start of the text and the start of the next one).
134
+     }]
135
+ })
135
+ })
136
+
137
+ print(r["output"])
138
+ ```
139
+
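
To make the two optional relation parameters concrete, the sketch below re-implements their described semantics in plain Python. It is an illustration under assumptions, not utca's actual implementation: `candidate_pairs` and `span` are hypothetical helpers, the sentence is made up, and entities are assumed to be dicts with `label`, `start`, and `end` keys (character offsets).

```python
def candidate_pairs(entities, pairs_filter=None, distance_threshold=None):
    """Return (source, target) entity pairs allowed by the two optional parameters."""
    pairs = []
    for source in entities:
        for target in entities:
            if source is target:
                continue
            # pairs_filter: keep only (source label, target label) combinations that are listed
            if pairs_filter is not None and (source["label"], target["label"]) not in pairs_filter:
                continue
            # distance: gap between the end of the earlier span and the start of the later one
            first, second = sorted((source, target), key=lambda e: e["start"])
            distance = second["start"] - first["end"]
            if distance_threshold is not None and distance > distance_threshold:
                continue
            pairs.append((source, target))
    return pairs


# Made-up sentence and hand-labelled spans, used only to exercise the helper
sentence = "Dr. Smith works at Example University and published in the Example Journal."


def span(surface, label):
    """Build an entity dict with character offsets computed from the sentence."""
    start = sentence.index(surface)
    return {"text": surface, "label": label, "start": start, "end": start + len(surface)}


entities = [
    span("Dr. Smith", "scientist"),
    span("Example University", "university"),
    span("Example Journal", "journal"),
]

for source, target in candidate_pairs(entities, [("scientist", "journal")]):
    print(source["text"], "->", target["text"])
```

Lowering `distance_threshold` below the character gap between two spans drops that pair, which is the behaviour the comment on `distance_threshold` describes.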
140
  **How to use for open information extraction:**
141
 
142
  ```python
 
207
  print(summary["text"], "=>", summary["score"])
208
  ```
209
 
210
  ### Benchmarks:
211
 
212
  Our multitask model demonstrates performance comparable to dedicated NER models on different zero-shot benchmarks: