justin13barrett

Update README.md

c653011 verified 9 months ago


	---
	license: apache-2.0
	base_model: bert-base-multilingual-cased
	model-index:
	- name: >-
	    bert-base-multilingual-cased-finetuned-openalex-topic-classification-title-abstract
	  results: []
	pipeline_tag: text-classification
	widget:
	- text: "Cleavage of Structural Proteins during the Assembly of the Head of Bacteriophage T4"
	---


	# bert-base-multilingual-cased-finetuned-openalex-topic-classification-title-abstract

	This model is a fine-tuned version of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) on a labeled dataset provided by CWTS:
	[CWTS Labeled Data]

	This is NOT the full model used to tag [OpenAlex](https://openalex.org/) works with a topic. For the full model, see the following GitHub repository:
	[OpenAlex Topic Classification](https://github.com/ourresearch/openalex-topic-classification)

	That repository will also contain information about text preprocessing, modeling, testing, and deployment.

	## Model description

	The model was trained on input in the following format, so input at inference time should follow the same format:

	"\<TITLE\> {insert-processed-title-here}\n\<ABSTRACT\> {insert-processed-abstract-here}"
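
As a minimal sketch, the input string can be assembled with a small helper. The function name `format_input` is hypothetical and not part of the model repository, and it assumes the title and abstract have already been preprocessed.

```python
def format_input(title: str, abstract: str) -> str:
    """Join a processed title and abstract in the training-time format."""
    # Literal <TITLE> and <ABSTRACT> markers, separated by a single newline.
    return f"<TITLE> {title}\n<ABSTRACT> {abstract}"


# Placeholder strings for illustration only.
example = format_input("A short title", "A short abstract.")
print(example)
```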

	The quickest way to use this model in Python is with the following code (assuming you have the transformers library installed):

	```
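	from transformers import pipeline

	# Replace the placeholders with your preprocessed title and abstract.
	title = "{insert-processed-title-here}"
	abstract = "{insert-processed-abstract-here}"

	# top_k=10 returns the ten highest-scoring topic labels.
	classifier = pipeline(
	    model="OpenAlex/bert-base-multilingual-cased-finetuned-openalex-topic-classification-title-abstract",
	    top_k=10,
	)

	predictions = classifier(f"<TITLE> {title}\n<ABSTRACT> {abstract}")
	print(predictions)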
	```
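
The pipeline returns a list of `{"label": ..., "score": ...}` dictionaries, which may be wrapped in an extra outer list depending on the installed transformers version. The sketch below shows one way to handle both shapes and filter by score. The helper name `top_predictions` is hypothetical, and the labels and scores in the sample data are invented for illustration; they are not real model output.

```python
from typing import Any


def top_predictions(output: list[Any], min_score: float = 0.0) -> list[dict]:
    """Flatten pipeline output for one input and keep labels scoring >= min_score."""
    # Unwrap one level of nesting if the output is a list containing a list.
    if output and isinstance(output[0], list):
        output = output[0]
    kept = [p for p in output if p["score"] >= min_score]
    # Highest score first.
    return sorted(kept, key=lambda p: p["score"], reverse=True)


# Made-up data standing in for classifier(...) output.
fake_output = [[
    {"label": "topic_a", "score": 0.71},
    {"label": "topic_b", "score": 0.12},
]]
print(top_predictions(fake_output, min_score=0.5))
```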

	## Intended uses & limitations

	The model is intended to be used as part of a larger model that also incorporates journal information and citation features. However, it is also useful on its own for quickly assigning a topic from only a title and abstract.

	Because this model was fine-tuned from a BERT model, the biases of that base model will most likely appear in this model as well.

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- optimizer: {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': True, 'is_legacy_optimizer': False, 'learning_rate': {'module': 'transformers.optimization_tf', 'class_name': 'WarmUp', 'config': {'initial_learning_rate': 6e-05, 'decay_schedule_fn': {'module': 'keras.optimizers.schedules', 'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 6e-05, 'decay_steps': 335420, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'registered_name': None}, 'warmup_steps': 500, 'power': 1.0, 'name': None}, 'registered_name': 'WarmUp'}, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False}
	- training_precision: float32
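
The learning-rate entry in the optimizer config describes a warmup phase followed by polynomial decay. The sketch below recomputes that schedule in plain Python from the listed values. It assumes the usual behavior of the `WarmUp` wrapper and Keras `PolynomialDecay` (with `cycle` set to `False`) and is not taken from the training code; the function name `learning_rate` is hypothetical.

```python
# Values copied from the optimizer config.
INITIAL_LR = 6e-05    # initial_learning_rate
WARMUP_STEPS = 500    # warmup_steps
DECAY_STEPS = 335420  # decay_steps
END_LR = 0.0          # end_learning_rate
POWER = 1.0           # power (1.0 makes both phases linear)


def learning_rate(step: int) -> float:
    """Approximate learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Warmup: ramp from 0 up to the initial learning rate.
        return INITIAL_LR * (step / WARMUP_STEPS) ** POWER
    # Decay: the decay schedule is assumed to see the step count minus the warmup steps.
    decay_step = min(step - WARMUP_STEPS, DECAY_STEPS)
    return (INITIAL_LR - END_LR) * (1 - decay_step / DECAY_STEPS) ** POWER + END_LR


for step in (0, 250, 500, 168210, 335920):
    print(step, learning_rate(step))
```

With `power` equal to 1.0, the rate rises linearly over the first 500 steps and then falls linearly to zero over the following 335420 steps.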

	### Training results

	| Train Loss | Validation Loss | Train Accuracy | Epoch |
	|:----------:|:---------------:|:--------------:|:-----:|
	| 4.8075 | 3.6686 | 0.3839 | 0 |
	| 3.4867 | 3.3360 | 0.4337 | 1 |
	| 3.1865 | 3.2005 | 0.4556 | 2 |
	| 2.9969 | 3.1379 | 0.4675 | 3 |
	| 2.8489 | 3.0900 | 0.4746 | 4 |
	| 2.7212 | 3.0744 | 0.4799 | 5 |
	| 2.6035 | 3.0660 | 0.4831 | 6 |
	| 2.4942 | 3.0737 | 0.4846 | 7 |
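
One way to read the table is to look for the epoch with the lowest validation loss. The short sketch below does this with the values from the table; it is only an illustration of how the numbers can be inspected, not a statement about which checkpoint was published.

```python
# (epoch, train_loss, validation_loss, train_accuracy), copied from the table.
results = [
    (0, 4.8075, 3.6686, 0.3839),
    (1, 3.4867, 3.3360, 0.4337),
    (2, 3.1865, 3.2005, 0.4556),
    (3, 2.9969, 3.1379, 0.4675),
    (4, 2.8489, 3.0900, 0.4746),
    (5, 2.7212, 3.0744, 0.4799),
    (6, 2.6035, 3.0660, 0.4831),
    (7, 2.4942, 3.0737, 0.4846),
]

# Pick the row whose validation loss (index 2) is smallest.
best = min(results, key=lambda row: row[2])
print(f"Lowest validation loss: {best[2]:.4f} at epoch {best[0]}")
```

Train loss keeps falling through epoch 7 while validation loss stops improving after epoch 6.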


	### Framework versions

	- Transformers 4.35.2
	- TensorFlow 2.13.0
	- Datasets 2.15.0
	- Tokenizers 0.15.0