Update README.md

a0ebb6a verified 17 days ago

3.79 kB

	---
	license: gpl-3.0
	datasets:
	- Mxode/BiST
	language:
	- en
	- zh
	pipeline_tag: translation
	library_name: transformers
	---
	# NanoTranslator-XL

	English \| [简体中文](README_zh-CN.md)

	## Introduction

	This is the x-large model of the NanoTranslator, currently supported only in English to Chinese.

	The ONNX version of the model is also available in the repository.


	\| Size \| P. \| Arch. \| Act. \| V. \| H. \| I. \| L. \| A.H. \| K.H. \| Tie \|
	\| :--: \| :-----: \| :--: \| :--: \| :--: \| :-----: \| :---: \| :------: \| :----: \| :----: \| :--: \|
	\| XL \| 100 \| LLaMA \| SwiGLU \| 16K \| 768 \| 4096 \| 8 \| 24 \| 8 \| True \|
	\| L \| 78 \| LLaMA \| GeGLU \| 16K \| 768 \| 4096 \| 6 \| 24 \| 8 \| True \|
	\| M2 \| 22 \| Qwen2 \| GeGLU \| 4K \| 432 \| 2304 \| 6 \| 24 \| 8 \| True \|
	\| M \| 22 \| LLaMA \| SwiGLU \| 8K \| 256 \| 1408 \| 16 \| 16 \| 4 \| True \|
	\| S \| 9 \| LLaMA \| SwiGLU \| 4K \| 168 \| 896 \| 16 \| 12 \| 4 \| True \|
	\| XS \| 2 \| LLaMA \| SwiGLU \| 2K \| 96 \| 512 \| 12 \| 12 \| 4 \| True \|

	- P. - Parameters (in million)
	- V. - vocab size
	- H. - hidden size
	- I. - intermediate size
	- L. - num layers
	- A.H. - num attention heads
	- K.H. - num kv heads
	- Tie - tie word embeddings



	## How to use

	Prompt format as follows：

	```
	<\|im_start\|> {English Text} <\|endoftext\|>
	```

	### Directly using transformers

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_path = 'Mxode/NanoTranslator-XL'

	tokenizer = AutoTokenizer.from_pretrained(model_path)
	model = AutoModelForCausalLM.from_pretrained(model_path)

	def translate(text: str, model, **kwargs):
	generation_args = dict(
	max_new_tokens = kwargs.pop("max_new_tokens", 512),
	do_sample = kwargs.pop("do_sample", True),
	temperature = kwargs.pop("temperature", 0.55),
	top_p = kwargs.pop("top_p", 0.8),
	top_k = kwargs.pop("top_k", 40),
	**kwargs
	)

	prompt = "<\|im_start\|>" + text + "<\|endoftext\|>"
	model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

	generated_ids = model.generate(model_inputs.input_ids, **generation_args)
	generated_ids = [
	output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	return response

	text = "I love to watch my favorite TV series."

	response = translate(text, model, max_new_tokens=64, do_sample=False)
	print(response)
	```


	### ONNX

	It has been measured that reasoning with ONNX models will be 2-10 times faster than reasoning directly with transformers models.

	You should switch to [onnx branch](https://huggingface.co/Mxode/NanoTranslator-XL/tree/onnx) manually and download to local.

	reference docs:

	- [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
	- [Inference pipelines with the ONNX Runtime accelerator](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines)

	Using ORTModelForCausalLM

	```python
	from optimum.onnxruntime import ORTModelForCausalLM
	from transformers import AutoTokenizer

	model_path = "your/folder/to/onnx_model"

	ort_model = ORTModelForCausalLM.from_pretrained(model_path)
	tokenizer = AutoTokenizer.from_pretrained(model_path)

	text = "I love to watch my favorite TV series."

	response = translate(text, ort_model, max_new_tokens=64, do_sample=False)
	print(response)
	```

	Using pipeline

	```python
	from optimum.pipelines import pipeline

	model_path = "your/folder/to/onnx_model"
	pipe = pipeline("text-generation", model=model_path, accelerator="ort")

	text = "I love to watch my favorite TV series."

	response = pipe(text, max_new_tokens=64, do_sample=False)
	response
	```