---
base_model: EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
---
16
+
17
+ [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
18
+
19
+
20
+ # QuantFactory/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO-GGUF
21
+ This is quantized version of [EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO](https://huggingface.co/EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO) created using llama.cpp
22
+
23
+ # Original Model Card

# Agent LLama

An experimental fine-tune with a DPO dataset that allows Llama 3.1 8B to act as an agentic coder. It was further fine-tuned on a code dataset for the Coder Agent role.
It has several built-in agent features:
- search
- calculator
- ReAct. [Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
- fine-tuned ReAct for better responses

Other notable features:
- Self-learning using Unsloth (in progress)
- can be used in RAG applications
- Memory. [**Please use Langchain memory, section Message persistence**](https://python.langchain.com/docs/tutorials/chatbot/); a sketch follows this list.

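As a minimal sketch of that memory setup, following the linked tutorial and assuming `llm` is any LangChain chat model wrapping this checkpoint:

```python
# Hypothetical sketch: per-thread message persistence with LangGraph's
# MemorySaver, as in the linked LangChain chatbot tutorial.
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

workflow = StateGraph(state_schema=MessagesState)

def call_model(state: MessagesState):
    # `llm` is assumed to be a LangChain chat model (e.g. ChatHuggingFace).
    return {"messages": llm.invoke(state["messages"])}

workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
app = workflow.compile(checkpointer=MemorySaver())

# Every thread_id keeps its own persistent conversation history.
config = {"configurable": {"thread_id": "demo-1"}}
app.invoke({"messages": [("user", "Hi, I'm Bob.")]}, config)
```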
It is a perfect fit for Langchain or LlamaIndex.

Context Window: 128K

### Installation
```bash
pip install --upgrade "transformers>=4.43.2" torch==2.3.1 accelerate vllm==0.5.3.post1
```

Developers can easily integrate EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K into their projects using popular libraries like Transformers and vLLM. The following sections illustrate usage with simple hands-on examples:

Optional: to use the built-in tools, add the following to the system prompt: "Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n"
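
Since the install line pins vLLM but no vLLM example follows, here is a minimal offline-inference sketch; the `max_model_len` value is an assumption to keep memory usage modest:

```python
# Hypothetical vLLM sketch: apply the chat template, then generate.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code"
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, max_model_len=8192)  # assumption: shorter than the full 128K window

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function that reverses a string."}],
    tokenize=False,
    add_generation_prompt=True,
)
outputs = llm.generate([prompt], SamplingParams(temperature=0.01, top_p=0.95, max_tokens=256))
print(outputs[0].outputs[0].text)
```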

#### ToT - Tree of Thought
- Use system prompt:
```python
"Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is..."
```
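
As an illustration, the ToT prompt can be passed as the system message (a sketch reusing the `pipeline` object built in the Conversational Use-case section below):

```python
# Hypothetical usage: supply the Tree-of-Thought instructions as the system role.
tot_prompt = (
    "Imagine three different experts are answering this question. "
    "All experts will write down 1 step of their thinking, then share it with the group. "
    "Then all experts will go on to the next step, etc. "
    "If any expert realises they're wrong at any point then they leave. "
    "The question is..."
)
messages = [
    {"role": "system", "content": tot_prompt},
    {"role": "user", "content": "Is 1023 divisible by 3?"},
]
outputs = pipeline(messages, max_new_tokens=512, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"][-1])
```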
#### ReAct
Example from the LangChain ReAct agent - [langchain React agent](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/react/agent.py)
- Use system prompt:
```python
"""
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
"""
```
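
This prompt can be wired into LangChain's `create_react_agent`; the sketch below assumes the `langchain-huggingface` wrapper and uses a toy `word_count` tool purely for illustration:

```python
# Hypothetical sketch: run the model as a LangChain ReAct agent.
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.prompts import PromptTemplate
from langchain_core.tools import Tool
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 512},
)

def word_count(text: str) -> str:
    # Toy tool so the agent has an action to take.
    return str(len(text.split()))

tools = [Tool(name="word_count", description="Counts the words in a string.", func=word_count)]

# The template is the ReAct prompt shown above, with its {tools}, {tool_names},
# {input} and {agent_scratchpad} placeholders left intact.
react_template = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""
prompt = PromptTemplate.from_template(react_template)

agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, handle_parsing_errors=True)
print(executor.invoke({"input": "How many words are in 'the quick brown fox'?"}))
```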

### Conversational Use-case
#### Use with [Transformers](https://github.com/huggingface/transformers)
##### Using the `transformers.pipeline()` API; 4-bit quantization is best for fast responses.
```python
import transformers
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)

model_id = "EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"quantization_config": quantization_config},  # for fast responses; remove for full 16-bit inference
    device_map="auto",
)
messages = [
    {"role": "system", "content": """
Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n
You are a coding assistant, expert in everything.\n
Ensure any code you provide can be executed\n
with all required imports and variables defined. List the imports. Structure your answer with a description of the code solution.\n
Write only the code; do not print anything else.\n
Debug the code if an error occurs.\n
Here is the user question: {question}
"""},
    {"role": "user", "content": "Create a bar plot showing the market capitalization of the top 7 publicly listed companies using matplotlib"},
]
outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
print(outputs[0]["generated_text"][-1])
```

# Example:
Please go to Colab for a sample of the code using Langchain: [Colab](https://colab.research.google.com/drive/129SEHVRxlr24r73yf34BKnIHOlD3as09?authuser=1)

# Unsloth Fast

```python
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install unsloth
# Get the latest Unsloth
!pip install --upgrade --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install langchain_experimental

from unsloth import FastLanguageModel
from transformers import TextStreamer
from google.colab import userdata

# 4-bit pre-quantized models we support, for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/gemma-7b-it-bnb-4bit",
]  # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code",
    max_seq_length = 128000,
    load_in_4bit = True,
    token = userdata.get('HF_TOKEN'),
)

def chatbot(query):
    messages = [
        {"role": "system", "content": """
Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n
You are a coding assistant, expert in everything.\n
Ensure any code you provide can be executed\n
with all required imports and variables defined. List the imports. Structure your answer with a description of the code solution.\n
Write only the code; do not print anything else.\n
Use ipython for the search tool.\n
Debug the code if an error occurs.\n
Here is the user question: {question}
"""},
        {"role": "user", "content": query},
    ]
    inputs = tokenizer.apply_chat_template(messages, tokenize = True, add_generation_prompt = True, return_tensors = "pt").to("cuda")

    text_streamer = TextStreamer(tokenizer)
    _ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048, use_cache = True)
```
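
The helper can then be invoked with a coding question, for example:

```python
chatbot("Write a Python function that returns the n-th Fibonacci number.")
```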

# Execute code (Make sure to use virtual environments)
```bash
python3 -m venv env
source env/bin/activate
```

## Execution code responses from Llama
#### Please use the execute-Python-code function for local runs. For Langchain, please use PythonREPL() to execute code.

Execute-code function, run locally in Python:
```python
import contextlib
import io

def execute_Python_code(code):
    # A string stream to capture the outputs of exec
    output = io.StringIO()
    try:
        # Redirect stdout to the StringIO object
        with contextlib.redirect_stdout(output):
            # Allow imports
            exec(code, globals())
    except Exception as e:
        # If an error occurs, capture it as part of the output
        print(f"Error: {e}", file=output)
    return output.getvalue()
```
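
For example, a code-only reply from the pipeline above can be run and its output captured (this assumes `outputs` comes from the Conversational Use-case example):

```python
code = outputs[0]["generated_text"][-1]["content"]  # the model's code-only reply
print(execute_Python_code(code))
```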

Langchain Python REPL
- Install:

```bash
!pip install langchain_experimental
```

Code:
```python
from langchain_core.tools import Tool
from langchain_experimental.utilities import PythonREPL

python_repl = PythonREPL()

# You can create the tool to pass to an agent
repl_tool = Tool(
    name="python_repl",
    description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
    func=python_repl.run,
)
repl_tool.invoke(outputs[0]["generated_text"][-1]["content"])
```

# Safety inputs/outputs procedures
For all inputs, please use Llama-Guard (meta-llama/Llama-Guard-3-8B) for safety classification.
Go to the model card: [Llama-Guard](https://huggingface.co/meta-llama/Llama-Guard-3-8B)
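
As a minimal sketch, a user prompt can be classified with Llama Guard before it reaches the agent (this follows the standard Transformers usage from the Llama Guard model card and assumes access to the gated checkpoint):

```python
# Hypothetical pre-check: ask Llama Guard whether the user prompt is safe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guard_id = "meta-llama/Llama-Guard-3-8B"
guard_tokenizer = AutoTokenizer.from_pretrained(guard_id)
guard_model = AutoModelForCausalLM.from_pretrained(guard_id, torch_dtype=torch.bfloat16, device_map="auto")

chat = [{"role": "user", "content": "Create a bar plot of the top 7 companies by market cap."}]
input_ids = guard_tokenizer.apply_chat_template(chat, return_tensors="pt").to(guard_model.device)
out = guard_model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=guard_tokenizer.eos_token_id)

# Llama Guard replies "safe", or "unsafe" plus the violated category codes.
print(guard_tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```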

# Uploaded model

- **Developed by:** EpistemeAI
- **License:** apache-2.0
- **Finetuned from model:** EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)