g-ronimo committed
Commit
483bd9b
1 Parent(s): 23d9b41

Update README.md

Files changed (1)
  1. README.md +39 -2
README.md CHANGED
@@ -2,12 +2,49 @@
  license: apache-2.0
  datasets:
  - OpenAssistant/oasst_top1_2023-08-25
+ pipeline_tag: text-generation
  ---
+ **this is a test, not a useful SOTA bot**
 
  * state-spaces/mamba-1.4b finetuned on Open Assistant conversations, 3 epochs
  * talks ChatML (w/o system message)
- * this is a test, not a useful SOTA bot
- * code: https://github.com/geronimi73/mamba
+ * training code: https://github.com/geronimi73/mamba
+
+ inference:
+ ```python
+ import torch
+ from transformers import AutoTokenizer
+ from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
+
+ modelpath="g-ronimo/mamba-1.4b-OA"
+
+ model = MambaLMHeadModel.from_pretrained(
+     modelpath,
+     dtype=torch.bfloat16,
+     device="cuda"
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(modelpath)
+
+ question="Why am I so tired?"
+
+ template="<|im_start|>user\n{q}\n<|im_end|>\n<|im_start|>assistant"
+ prompt=template.format(q=question)
+ prompt_tokenized=tokenizer(prompt, return_tensors="pt").to("cuda")["input_ids"]
+ output_tokenized = model.generate(
+     input_ids=prompt_tokenized,
+     max_length=len(prompt_tokenized[0])+100,
+     cg=True,
+     output_scores=True,
+     enable_timing=False,
+     temperature=0.7,
+     top_k=40,
+     top_p=0.1,
+ )
+ answer=tokenizer.decode(output_tokenized[0])
+
+ print(answer)
+ ```
 
  example conversation:
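The inference snippet in the diff covers a single user turn and decodes the full sequence, prompt included. Below is a minimal multi-turn sketch under the same ChatML template (no system message, per the card). The `chatml_prompt` helper is illustrative, not part of the repo, and the snippet assumes the `mamba-ssm` package (plus `causal-conv1d` for the fast kernel) is installed:

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

modelpath = "g-ronimo/mamba-1.4b-OA"
model = MambaLMHeadModel.from_pretrained(modelpath, dtype=torch.bfloat16, device="cuda")
tokenizer = AutoTokenizer.from_pretrained(modelpath)

def chatml_prompt(turns):
    # turns: list of (role, text) pairs; no system message, per the card.
    # Hypothetical helper that mirrors the single-turn template above.
    prompt = ""
    for role, text in turns:
        prompt += f"<|im_start|>{role}\n{text}\n<|im_end|>\n"
    return prompt + "<|im_start|>assistant"

turns = [("user", "Why am I so tired?")]
input_ids = tokenizer(chatml_prompt(turns), return_tensors="pt").input_ids.to("cuda")

out = model.generate(
    input_ids=input_ids,
    max_length=input_ids.shape[1] + 100,
    cg=True,
    temperature=0.7,
    top_k=40,
    top_p=0.1,
)

# generate() returns prompt + continuation; decode only the new tokens
reply = tokenizer.decode(out[0, input_ids.shape[1]:])
# the model closes its turn with <|im_end|>; keep only the text before it
reply = reply.split("<|im_end|>")[0].strip()
print(reply)

# to continue the conversation, append the reply and the next question:
# turns += [("assistant", reply), ("user", "What can I do about it?")]
```

Slicing off the prompt tokens before decoding keeps only the assistant's reply, and trimming at `<|im_end|>` avoids relying on the ChatML markers being registered as special tokens in the tokenizer.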