---
inference: false
language:
- de
library_name: transformers
license: apache-2.0
model_creator: jphme
model_name: EM German
model_type: mistral
pipeline_tag: text-generation
prompt_template: 'Du bist ein hilfreicher Assistent. USER: Was ist 1+1? ASSISTANT:'
extra_gated_prompt: "Purchase access to this repo [HERE](https://buy.stripe.com/cN27te1dzeqfaMU8xl)"
tags:
- pytorch
- german
- deutsch
- mistral
- leolm
- function calling
---

# Function Calling Leo German Mistral 7B

- This function calling model extends the Hugging Face Leo German Mistral model with function calling capabilities.
- The model responds with a structured JSON object containing the function name and arguments.

Purchase access [here](https://buy.stripe.com/cN27te1dzeqfaMU8xl)

**Recent Updates**

- Nov 27th 2023 -> Added Leo German
- Nov 15th 2023 -> Added Yi 200k context models in 6B and 34B form.
- Nov 8th 2023 -> Added Zephyr beta, an improved version of Mistral 7B (achieved via DPO)
- November 6th 2023 -> Added Deepseek Coder 1.3B, 6.7B and 33B
- October 11th 2023 -> Added Mistral 7B with function calling
- October 11th 2023 -> New models pushed, trained on an improved underlying dataset

**Improvements with v2**

1. Shortened syntax: only function descriptions are needed for inference, and no added instruction is required.
2. Function descriptions are moved outside of the system prompt. This avoids function calling behaviour being affected by how the system prompt was trained to influence the model.

Latest Models:

- Yi-6B-200k context with function calling ([Base Model](Trelis/Yi-6B-200K-Llamafied-function-calling-v2)), ([PEFT Adapters](Trelis/Yi-6B-200K-Llamafied-function-calling-adapters-v2)) - Paid, [purchase here](https://buy.stripe.com/00gdRC7BX1Dt08gbJp)
- Yi-34B-200k context with function calling ([Base Model](Trelis/Yi-34B-200K-Llamafied-chat-SFT-function-calling-v2)), ([PEFT Adapters](Trelis/Yi-34B-200K-Llamafied-chat-SFT-function-calling-adapters-v2)), ([AWQ](Trelis/Yi-34B-200K-Llamafied-chat-SFT-function-calling-v2-AWQ)), ([GGUF - files are in the main branch of the base model]) - Paid, [purchase here](https://buy.stripe.com/fZe14Q7BX4PF7AIeVG)
- Deepseek-Coder-1.3B-Instruct with function calling ([Base Model](https://huggingface.co/Trelis/deepseek-coder-1.3b-instruct-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/deepseek-coder-1.3b-instruct-function-calling-adapters-v2/settings)) - Paid, [purchase here](https://buy.stripe.com/9AQbJubSda9Z8EM00A)
- Llama-7B-chat with function calling ([Base Model](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-adapters-v2)), ([GGUF - files are in the main branch of the base model]) - Free
- zephyr-7b-beta with function calling ([Base Model](https://huggingface.co/Trelis/zephyr-7b-beta-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/zephyr-7b-beta-function-calling-adapters-v2)), ([GGUF - files are in the main branch of the base model]) - Paid, [purchase here](https://buy.stripe.com/14k00M4pLeqf9IQbJk)
- Mistral-7B-Instruct-v0.1 with function calling ([Base Model](https://huggingface.co/Trelis/Mistral-7B-Instruct-v0.1-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/Mistral-7B-Instruct-v0.1-function-calling-adapters-v2)) - Paid, [purchase here](https://buy.stripe.com/cN2cNybSdgyncV25kQ)
- Deepseek-Coder-6.7B-Instruct with function calling ([Base Model](https://huggingface.co/Trelis/deepseek-coder-6.7b-instruct-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/deepseek-coder-6.7b-instruct-function-calling-adapters-v2/settings)) - Paid, [purchase here](https://buy.stripe.com/cN27te5tPa9Z6wEdRo)
- Deepseek-Coder-33B-Instruct with function calling ([Base Model](https://huggingface.co/Trelis/deepseek-coder-33b-instruct-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/deepseek-coder-33b-instruct-function-calling-adapters-v2/settings)) - Paid, [purchase here](https://buy.stripe.com/9AQ6pabSd81RcV25kT)
- CodeLlama-34B-Instruct with function calling ([Base Model](https://huggingface.co/Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/CodeLlama-34b-Instruct-hf-function-calling-adapters-v2)) - Paid, [purchase here](https://buy.stripe.com/cN27teg8t2Hx5sA8wM)
- Llama-70B-chat with function calling ([Base Model](https://huggingface.co/Trelis/Llama-2-70b-chat-hf-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/Llama-2-70b-chat-hf-function-calling-adapters-v2)) - Paid, [purchase here](https://buy.stripe.com/8wMdRC1dzci75sA4gy)

Other Models:

- Llama-13B-chat with function calling ([Base Model](https://huggingface.co/Trelis/Llama-2-13b-chat-hf-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/Llama-2-13b-chat-hf-function-calling-adapters-v2)) - Paid, [purchase here](https://buy.stripe.com/9AQ7te3lHdmbdZ68wz)

## Which model is best for what?

1. Larger models are better at handling function calling. The cross-entropy training losses are approximately 0.5 for 7B, 0.4 for 13B and 0.3 for 70B. The absolute numbers don't mean much on their own, but the relative values give a sense of relative performance.
2. Provide very clear function descriptions, including whether arguments are required and what their default values should be.
3. Make sure to post-process the language model's response to check that all necessary information has been provided by the user. If not, prompt the user to let them know they need to provide more info (e.g. their name, order number, etc.) - see the sketch after this list.
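As a minimal illustration of the post-processing step in point 3, the sketch below checks a parsed function call against the function's metadata and builds a follow-up message for anything still missing. The `required` flag, the helper names and the `place_order` function are hypothetical additions for illustration; they are not part of the training format.

```
# Sketch: validate a parsed function call against function metadata.
# The "required" flag and the place_order function are hypothetical - adapt to your own schema.

def find_missing_arguments(function_call: dict, function_metadata: dict) -> list:
    """Return the names of required arguments the model did not supply."""
    supplied = function_call.get("arguments", {})
    missing = []
    for arg in function_metadata.get("arguments", []):
        if arg.get("required", False) and arg["name"] not in supplied:
            missing.append(arg["name"])
    return missing

def follow_up_prompt(missing: list) -> str:
    """Build a message asking the user for the information still needed."""
    return "To proceed, please provide: " + ", ".join(missing) + "."

# Example usage
metadata = {
    "function": "place_order",
    "arguments": [
        {"name": "name", "type": "string", "required": True},
        {"name": "order_number", "type": "string", "required": True},
    ],
}
call = {"function": "place_order", "arguments": {"name": "Alice"}}

missing = find_missing_arguments(call, metadata)
if missing:
    print(follow_up_prompt(missing))  # -> To proceed, please provide: order_number.
```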
Check out this video overview of performance [here](https://www.loom.com/share/8d7467de95e04af29ff428c46286946c?sid=683c970e-6063-4f1e-b184-894cc1d96115)

Some short tips based on models as of November 2023:

- DeepSeek Coder (all sizes) = best coding model.
- Yi 34B = best for long context.
- Llama 70B = strongest overall model (4k context).
- Mistral 7B = best model if you have only 8 GB of VRAM (run with quantization). Zephyr is better than Mistral 7B but is not openly licensed for commercial use.

## Licensing

Llama-7B with function calling is licensed according to the Meta Community license.

Mistral-7B, Llama-13B, Code-llama-34b, Llama-70B and Falcon-180B with function calling require the purchase of access.

- Commercial license purchase required per user.
- Licenses are not transferable to other users/entities.

Use of all Llama models with function calling is further subject to terms in the [Meta license](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).

Yi models are subject to the Yi license, which permits commercial use as of Nov 15th 2023.

Zephyr models were generated using UltraChat, which relies on OpenAI. OpenAI does not permit the use of its models to train competitive models. This makes it unclear whether Zephyr may be used commercially. Buyers/users do so at their sole risk.
## Dataset

The dataset used for training this model can be found at [Trelis Function Calling Extended Dataset](https://huggingface.co/datasets/Trelis/function_calling_extended).

## Inference

!!! Make sure to check the prompt format below and adjust inference accordingly !!!

### Quick Start in Google Colab

Try out this notebook [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing)

### Text Generation Inference

You can use this model with [text-generation-inference](https://github.com/huggingface/text-generation-inference) and [chat-ui](https://github.com/huggingface/chat-ui).

Here is the [GitHub repo for setup](https://github.com/TrelisResearch/tgi-chat-ui-function-calling).

And here is a video showing it working with [llama-2-7b-chat-hf-function-calling-v2](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) (note that we've now moved to v2).

Note that you'll still need to code the server-side handling of making the function calls (which obviously depends on what functions you want to use); a minimal dispatch sketch is shown after the sample functionList below.

#### Runpod Quickstart

For a quickstart with Runpod, you can use this template: [here](https://runpod.io/gsc?template=edxvuji38p&ref=jmfkcdio)

Once up and running, you can make queries to:
```
https://{YOUR_POD_ID}-8080.proxy.runpod.net
```

Then, you can make queries to the API as follows:
```
curl https://{YOUR_POD_ID}-8080.proxy.runpod.net/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
```

Or use /generate_stream for streaming. You can also write Python scripts and use Python to make requests. More info is in the text-generation-inference [GitHub repo](https://github.com/huggingface/text-generation-inference/).

### Run on your laptop

Run on your laptop [video and jupyter notebook](https://youtu.be/nDJMHFsBU7M)

After running the llama.cpp server, you can call the server with this script, with thanks to @jdo300:

```
import requests
import json

# Define the roles and markers
B_FUNC, E_FUNC = "<FUNCTIONS>", "</FUNCTIONS>\n\n"
B_INST, E_INST = "[INST] ", " [/INST]"  # Llama style
# B_INST, E_INST = "\n### Instruction:\n", "\n### Response:\n"  # DeepSeek Coder style
# B_INST, E_INST = "Human: ", " Assistant: "  # Yi style

# Define the function metadata
function_metadata = {
    "function": "search_bing",
    "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}

# Define the user prompt
user_prompt = 'Search for the latest news on AI.'

# Format the function list and prompt
function_list = json.dumps(function_metadata, indent=4)
prompt = f"{B_FUNC}{function_list.strip()}{E_FUNC}{B_INST}{user_prompt.strip()}{E_INST}\n\n"

# Define the API endpoint
url = "http://localhost:8080/completion"

# Send the POST request to the API server
response = requests.post(url, json={"prompt": prompt})

# Print the response
print(response.json())
```
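The model's reply still needs to be parsed before it can be acted on. Below is a minimal parsing sketch, assuming the llama.cpp server returns the generated text in a `content` field; adjust the field name to your server and handle the failure cases noted under Function Metadata Template below.

```
import json
import re

def extract_function_call(raw_text: str):
    """Try to pull a single JSON object out of the model's raw text output.
    Returns the parsed dict, or None if no valid JSON object is found."""
    match = re.search(r"\{.*\}", raw_text, re.DOTALL)  # grab the outermost {...} span
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

# Example usage with the response from the llama.cpp server above
# (assumes the generated text is returned in a "content" field):
# raw_text = response.json().get("content", "")
raw_text = '{"function": "search_bing", "arguments": {"query": "latest news on AI"}}'
call = extract_function_call(raw_text)
if call is None:
    print("No function call found - fall back to treating the reply as plain text.")
else:
    print(call["function"], call["arguments"])
```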
## Syntax

### Prompt Templates

The function descriptions must be wrapped within a function block. You can put this function block before or after the system message block.

Example without a system message:

```
# Define the roles and markers
B_FUNC, E_FUNC = "<FUNCTIONS>", "</FUNCTIONS>\n\n"
B_INST, E_INST = "[INST] ", " [/INST]"  # Llama style
# B_INST, E_INST = "\n### Instruction:\n", "\n### Response:\n"  # DeepSeek Coder style
# B_INST, E_INST = "Human: ", " Assistant: "  # Yi style

functionList = {function_1_metadata}{function_2_metadata}...
user_prompt = '...'

# Format your prompt template
prompt = f"{B_FUNC}{functionList.strip()}{E_FUNC}{B_INST}{user_prompt.strip()}{E_INST}\n\n"
```

Example with a system message:

```
# Define the roles and markers
B_FUNC, E_FUNC = "<FUNCTIONS>", "</FUNCTIONS>\n\n"
B_INST, E_INST = "[INST] ", " [/INST]"  # Llama style
# B_INST, E_INST = "\n### Instruction:\n", "\n### Response:\n"  # DeepSeek Coder style
# B_INST, E_INST = "Human: ", " Assistant: "  # Yi style
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

# assuming functionList is defined as above
system_prompt = '...'
user_prompt = '...'

# Format your prompt template
prompt = f"{B_FUNC}{functionList.strip()}{E_FUNC}{B_INST}{B_SYS}{system_prompt.strip()}{E_SYS}{user_prompt.strip()}{E_INST}\n\n"
```

Notice that the function block is placed at the very start of the sequence, before 'B_INST'.

### Function Metadata Template

functionMetadata should be a string representation of a JSON object, like this:

```
"functionMetadata": {
    "function": "search_bing",
    "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}
```

and the language model should respond with a JSON object formatted like this:

```
{
    "function": "function_name",
    "arguments": {
        "argument1": "argument_value",
        "argument2": "argument_value"
    }
}
```

It is recommended to handle cases where:

- There is no JSON object in the response
- The response contains text in addition to the JSON response

### Sample functionList

```
{
    "function": "search_bing",
    "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}

{
    "function": "search_arxiv",
    "description": "Search for research papers on ArXiv. Make use of AND, OR and NOT operators as appropriate to join terms within the query.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}
```
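Once a function call has been parsed, the server side still has to route it to real code (as noted under Text Generation Inference above). Below is a minimal dispatch sketch for the sample functionList; the `search_bing` and `search_arxiv` handler bodies are hypothetical placeholders to be replaced with real implementations.

```
# Sketch: route a parsed function call to an actual implementation.
# The handler bodies below are placeholders - replace them with real search code.

def search_bing(query: str) -> str:
    return f"(pretend Bing results for: {query})"

def search_arxiv(query: str) -> str:
    return f"(pretend ArXiv results for: {query})"

HANDLERS = {
    "search_bing": search_bing,
    "search_arxiv": search_arxiv,
}

def dispatch(function_call: dict) -> str:
    """Execute the function named in the model's JSON response."""
    name = function_call.get("function")
    handler = HANDLERS.get(name)
    if handler is None:
        return f"Unknown function: {name}"
    return handler(**function_call.get("arguments", {}))

# Example usage
call = {"function": "search_arxiv", "arguments": {"query": "length extrapolation AND transformers"}}
print(dispatch(call))
```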
---" } } ``` ~ Below follows information on the original: ~ LeoLM Mistral is the showcase-model of the EM German model family and as of its release in our opinion the best open German LLM. **Many thanks to the [LeoLM](https://huggingface.co/LeoLM) team for the publication of a base model that has received continued pretraining with German texts, greatly improving generation capabilities.** *Please note that the Mistral architecture is very recent and still not supported by all libraries (e.g. AutoGPTQ). In case of any problems, please try a different format/base model.* # Table of Contents 1. [Introduction](#introduction) 2. [Links & Demos](#links--demos) - [Model Links](#model-links) - [Demos](#demos) 3. [Prompt Format](#prompt-format) 4. [Example Output](#example-output) 5. [Acknowledgements](#acknowledgements) 6. [Contact](#contact) 7. [Disclaimer](#disclaimer) # Introduction **EM German** is a Llama2/Mistral/LeoLM-based model family, finetuned on a large dataset of various instructions in German language. The models are optimized for German text, providing proficiency in understanding, generating, and interacting with German language content. We offer versions based on 7b, 13b and 70b Llama-2, Mistral and LeoLM (Llama-2/Mistral with continued pretraining on German texts) models. Please find all Informations, Example Outputs, the special RAG prompt format, output examples and eval results for the EM German Model family in [our Github Repository](https://github.com/jphme/EM_German). ([Deutsche Version](https://github.com/jphme/EM_German/blob/main/README_DE.md)). You will also find instructions on how to run the models with a GUI (GPT4All/LM Studio). # Links & Demos ## Model Links Should you only try one model version, I strongly recommend the **[LeoLM Mistral](https://huggingface.co/jphme/em_german_leo_mistral)** model which offers by far the best combination of performance and computing requirements! 
| Base Model | HF | GPTQ | GGUF | AWQ |
|-------|-------|-------|-------|-------|
| Llama2 7b | [Link](https://huggingface.co/jphme/em_german_7b_v01) | [Link](https://huggingface.co/TheBloke/em_german_7b_v01-GPTQ) | [Link](https://huggingface.co/TheBloke/em_german_7b_v01-GGUF) | [Link](https://huggingface.co/TheBloke/em_german_7b_v01-AWQ) |
| Llama2 13b | [Link](https://huggingface.co/jphme/em_german_13b_v01) | [Link](https://huggingface.co/TheBloke/em_german_13b_v01-GPTQ) | [Link](https://huggingface.co/TheBloke/em_german_13b_v01-GGUF) | [Link](https://huggingface.co/TheBloke/em_german_13b_v01-AWQ) |
| Llama2 70b | [Link](https://huggingface.co/jphme/em_german_70b_v01) | [Link](https://huggingface.co/TheBloke/em_german_70b_v01-GPTQ) | [Link](https://huggingface.co/TheBloke/em_german_70b_v01-GGUF) | [Link](https://huggingface.co/TheBloke/em_german_70b_v01-AWQ) |
| [Mistral 7b](https://huggingface.co/mistralai/Mistral-7B-v0.1) | [Link](https://huggingface.co/jphme/em_german_mistral_v01) | [Link](https://huggingface.co/TheBloke/em_german_mistral_v01-GPTQ) | [Link](https://huggingface.co/TheBloke/em_german_mistral_v01-GGUF) | [Link](https://huggingface.co/TheBloke/em_german_mistral_v01-AWQ) |
| [LeoLM 7b](https://huggingface.co/LeoLM/leo-hessianai-7b) | [Link](https://huggingface.co/jphme/em_german_7b_leo) | [Link](https://huggingface.co/jphme/em_german_7b_leo_gptq) | [Link](https://huggingface.co/jphme/em_german_7b_leo_gguf) | tbc |
| [LeoLM 13b](https://huggingface.co/LeoLM/leo-hessianai-13b) | soon | soon | [Link](https://huggingface.co/jphme/em_german_13b_leo_gguf) | tbc |
| [LeoLM Mistral](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b) | [Link](https://huggingface.co/jphme/em_german_leo_mistral) | [Link](https://huggingface.co/TheBloke/em_german_leo_mistral-GPTQ) | [Link](https://huggingface.co/TheBloke/em_german_leo_mistral-GGUF) | [Link](https://huggingface.co/TheBloke/em_german_leo_mistral-AWQ) |

### Notes about the different versions:

See also the [comparison of example outputs](https://github.com/jphme/EM_German/blob/main/example_output_comparison.md) for a comparison of (7b) model capabilities.

If you get unsatisfying results with one or another EM German version, please try a different (and/or larger) model or version for your use case.

## Demos:

You can use some of the models with **free** Google Colab instances (e.g. the 7b model in 8bit or the 13b model with GPTQ):

* [Example Colab Notebook for 13b with GPTQ](https://colab.research.google.com/drive/1IJfJdVwGkfe5MYOqHptystR3FBeEUdGn?usp=sharing)
* [Example Colab Notebook for 7b with 8bit-Loading](https://colab.research.google.com/drive/1bsv6vkLM4AlCpSyXA6ol9P32zxZmf7Zu?usp=sharing)
* [Example Colab Notebook for 7b Mistral GGUF with Grammar-based structured output](https://colab.research.google.com/drive/17O-a3D4BnVc_V9Of3dFfed18v1ULu1Sv?usp=sharing)

# Prompt Format

This model follows the Vicuna format without linebreaks (but should work with linebreaks as well). The format is as follows:

```
Du bist ein hilfreicher Assistent. USER: <instruction> ASSISTANT:
```

You can swap the standard system prompt for a better suited one (see below for RAG-tasks).
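As a minimal sketch, this is how the prompt above can be assembled and run with the transformers library. The model id is the LeoLM Mistral version linked above; the sampling settings are illustrative defaults, not the settings used for the published examples.

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jphme/em_german_leo_mistral"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package; use a quantized variant if VRAM is limited
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Vicuna-style prompt without linebreaks, as described above
system_prompt = "Du bist ein hilfreicher Assistent."
instruction = "Was ist 1+1?"
prompt = f"{system_prompt} USER: {instruction} ASSISTANT:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Print only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```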
# Example Output

Please find a selection of various example outputs for all 7b versions of EM German [here](https://github.com/jphme/EM_German/blob/main/example_output_comparison.md).

**Prompt:**
> Erkläre die Bedeutung von drei wesentlichen Elementen der Bilanzierung und des Accountings nach IFRS für Unternehmen, die auf internationaler Ebene tätig sind.

**Antwort EM German Leo Mistral 7b:**
```
Die International Financial Reporting Standards (IFRS) sind eine Reihe von internationalen Rechnungslegungsstandards, die von der International Accounting Standards Board (IASB) entwickelt wurden. Sie sollen dazu beitragen, dass Finanzberichte vergleichbar, zuverlässig und relevant sind.

1. Fair Value: Dieses Konzept ist ein zentraler Bestandteil der IFRS. Es besagt, dass Vermögenswerte und Schulden zum fairen Wert bilanziert werden sollten, was dem Preis entspricht, zu dem sie zwischen Marktteilnehmern gehandelt werden könnten. Dadurch wird sichergestellt, dass das Unternehmen eine genaue Darstellung seiner finanziellen Lage liefert.

2. Konsolidierung: Die IFRS verlangen von Unternehmen, ihre Finanzergebnisse auf konsolidierter Basis darzustellen. Das bedeutet, dass alle Tochtergesellschaften und verbundenen Unternehmen in die Bilanz des Mutterunternehmens aufgenommen werden. Dadurch wird ein klareres Bild von der finanziellen Situation des Konzerns geschaffen.

3. Vollständigkeitsprinzip: Die IFRS verlangen von Unternehmen, alle relevanten Informationen offenzulegen. Dies umfasst sowohl positive als auch negative Faktoren, wie zum Beispiel Gewinne und Verluste, Einnahmen und Ausgaben sowie Risiken und Unsicherheiten. Durch dieses Prinzip wird sichergestellt, dass Investoren, Kreditgeber und andere Stakeholder eine umfassende Einschätzung der finanziellen Lage des Unternehmens vornehmen können.
```

**Prompt:**
> Ein Mensch atmet durchschnittlich 16 Mal pro Minute. Wie oft atmet er an einem Tag?

**Antwort EM German Leo Mistral 7b:**
```
Zuerst berechnen wir, wie oft eine Person in einer Stunde atmet. Wir wissen, dass eine Stunde 60 Minuten hat, also:

Atemzüge pro Stunde = Atemfrequenz * Minuten pro Stunde
= 16 Atemzüge pro Minute * 60 Minuten
= 960 Atemzüge

Um herauszufinden, wie viele Atemzüge eine Person an einem Tag macht, müssen wir diese stündliche Rate mit der Anzahl der Stunden an einem Tag multiplizieren, also 24:

Atemzüge pro Tag = Atemzüge pro Stunde * Stunden pro Tag
= 960 Atemzüge * 24 Stunden
= 23.040 Atemzüge

Also macht ein durchschnittlicher Mensch etwa 23.040 Atemzüge an einem Tag.
```

(For more examples, please visit [our GitHub repository](https://github.com/jphme/EM_German).)

# Acknowledgements:

Many thanks to [winglian/caseus](https://huggingface.co/winglian) for his great work on Axolotl, which I used to train the EM models. I am also grateful to [Jon Durbin](https://huggingface.co/jondurbin) and his [Airoboros](https://huggingface.co/jondurbin/airoboros-l2-70b-2.2.1) models and code, from which I borrowed many ideas and code snippets.

Additionally, many thanks to [Björn Plüster](https://huggingface.co/bjoernp) and the LeoLM team for the outstanding pretraining work on LeoLM, and last but not least many many thanks to [TheBloke](https://huggingface.co/TheBloke) for the preparation of quantized versions in all formats under the sun.

The 70b model was trained with support of the [OVH Cloud Startup Program](https://startup.ovhcloud.com/en/).

# Contact

For detailed feedback & feature requests, please open an issue or get in contact with me via [my website](https://www.jph.me).

*PS: We are also always interested in support for our startup [ellamind](https://ellamind.com), which will offer customized models for business applications in the future (we are currently still in stealth mode). If you use our models for business applications and have advanced needs for specialized capabilities, please get in touch.*

# Disclaimer:

I am not responsible for the actions of third parties who use this model or the outputs of the model. This model should only be used for research purposes. The original base model license applies and is distributed with the model files.