
mistralai/Mistral-7B-v0.1 fine-tuned with QLoRA on Anthony Bourdain's "Kitchen Confidential", using the ChatML chat format.
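
The training script is not included in this card. Purely as an illustration, a minimal QLoRA setup for this base model might look like the sketch below: the frozen base weights are loaded in 4-bit NF4 via bitsandbytes and a LoRA adapter is attached with peft. The rank, target modules, and other hyperparameters shown are assumptions, not values taken from this model.

# Hypothetical QLoRA setup -- hyperparameters are illustrative, not the ones used for this model
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: quantize the frozen base weights to 4 bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=64,                                   # assumed rank; not stated on the card
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()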

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "models/Mistral-Bourdain"

# load the model in bfloat16 and place it on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True, legacy=False)  # fast tokenizer

# sampling parameters: llama-precise
gen_config = {
    "temperature": 0.7,
    "top_p": 0.1,
    "repetition_penalty": 1.18,
    "top_k": 40,
    "do_sample": True,
    "max_new_tokens": 300,

}

messages = [
    {"role": "user", "content": "Good morning Mr. Bourdain! Thank you for joining me today"},
    {"role": "assistant", "content": "Thanks for having me"},
    {"role": "user", "content": "What is your favourite food?"}
]
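
# Optional: render the messages with tokenize=False to inspect the ChatML-formatted
# prompt as a plain string before it is tokenized
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt_text)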

# render the messages with the ChatML template, tokenize, and move the ids to the GPU
prompt_tokenized = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True)
prompt_tokenized = torch.tensor([prompt_tokenized]).to("cuda")

output_ids = model.generate(prompt_tokenized, **gen_config)

response = tokenizer.decode(output_ids[0])
>>> print(response)
<|im_start|>user
Good morning Mr. Bourdain! Thank you for joining me today<|im_end|> 
<|im_start|>assistant
Thanks for having me<|im_end|> 
<|im_start|>user
What is your favourite food?<|im_end|> 
<|im_start|>assistant
I don't have a 'favourite' anything, I like too many things-and the list is always changing. If you asked me tomorrow, I might well give you another answer. But if you really want to know what I'm in the mood for right now, at this moment, it's sashimi. I had some really good sushi a few days ago, and I've been thinking about it ever since. I'm not even going to talk about why I like sushi so much. The less said about that, the better. Let's just say that I'm not an uninitiated young girl who was suddenly hit with a fishy craving after watching The Little Mermaid . . . and leave it at that.<|im_end|>
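
Note that decoding the full output_ids reproduces the whole ChatML transcript, prompt included. To keep only the new assistant turn, you can slice off the prompt tokens before decoding, for example as below; assuming the ChatML markers are registered as special tokens, skip_special_tokens=True strips them, otherwise remove the <|im_end|> marker manually.

# keep only the tokens generated after the prompt
new_tokens = output_ids[0][prompt_tokenized.shape[-1]:]
reply = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(reply)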