---
base_model: EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
---
16
+
17
+ [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
18
+
19
+
20
+ # QuantFactory/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO-GGUF
21
+ This is quantized version of [EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO](https://huggingface.co/EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO) created using llama.cpp
22
+
23
+ # Original Model Card

# Agent LLama

An experimental fine-tune with a DPO dataset that allows Llama 3.1 8B to act as an agentic coder. It was further fine-tuned on a code dataset for the Coder Agent role.
It has several built-in agent features:
- search
- calculator
- ReAct. [Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
- fine-tuned ReAct for better responses

Other notable features:
- Self-learning using Unsloth (in progress)
- can be used in RAG applications
- Memory. [**Please use Langchain memory, section Message persistence**](https://python.langchain.com/docs/tutorials/chatbot/); a sketch follows this list.

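As a minimal sketch of that memory setup, following the linked tutorial and assuming `llm` is any LangChain chat model wrapping this checkpoint:

```python
# Hypothetical sketch: per-thread message persistence with LangGraph's
# MemorySaver, as in the linked LangChain chatbot tutorial.
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

workflow = StateGraph(state_schema=MessagesState)

def call_model(state: MessagesState):
    # `llm` is assumed to be a LangChain chat model (e.g. ChatHuggingFace).
    return {"messages": llm.invoke(state["messages"])}

workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
app = workflow.compile(checkpointer=MemorySaver())

# Every thread_id keeps its own persistent conversation history.
config = {"configurable": {"thread_id": "demo-1"}}
app.invoke({"messages": [("user", "Hi, I'm Bob.")]}, config)
```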
It is a perfect fit for Langchain or LlamaIndex.

Context Window: 128K

### Installation
```bash
pip install --upgrade "transformers>=4.43.2" torch==2.3.1 accelerate vllm==0.5.3.post1
```

Developers can easily integrate EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K into their projects using popular libraries like Transformers and vLLM. The following sections illustrate usage with simple hands-on examples:

Optional: to use the built-in tools, add the following to the system prompt: "Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n"
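
Since the install line pins vLLM but no vLLM example follows, here is a minimal offline-inference sketch; the `max_model_len` value is an assumption to keep memory usage modest:

```python
# Hypothetical vLLM sketch: apply the chat template, then generate.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code"
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = LLM(model=model_id, max_model_len=8192)  # assumption: shorter than the full 128K window

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function that reverses a string."}],
    tokenize=False,
    add_generation_prompt=True,
)
outputs = llm.generate([prompt], SamplingParams(temperature=0.01, top_p=0.95, max_tokens=256))
print(outputs[0].outputs[0].text)
```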

#### ToT - Tree of Thought
- Use system prompt:
```python
"Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is..."
```
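
As an illustration, the ToT prompt can be passed as the system message (a sketch reusing the `pipeline` object built in the Conversational Use-case section below):

```python
# Hypothetical usage: supply the Tree-of-Thought instructions as the system role.
tot_prompt = (
    "Imagine three different experts are answering this question. "
    "All experts will write down 1 step of their thinking, then share it with the group. "
    "Then all experts will go on to the next step, etc. "
    "If any expert realises they're wrong at any point then they leave. "
    "The question is..."
)
messages = [
    {"role": "system", "content": tot_prompt},
    {"role": "user", "content": "Is 1023 divisible by 3?"},
]
outputs = pipeline(messages, max_new_tokens=512, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"][-1])
```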
#### ReAct
Example from the LangChain ReAct agent - [langchain React agent](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/react/agent.py)
- Use system prompt:
```python
"""
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}
"""
```
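
This prompt can be wired into LangChain's `create_react_agent`; the sketch below assumes the `langchain-huggingface` wrapper and uses a toy `word_count` tool purely for illustration:

```python
# Hypothetical sketch: run the model as a LangChain ReAct agent.
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.prompts import PromptTemplate
from langchain_core.tools import Tool
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 512},
)

def word_count(text: str) -> str:
    # Toy tool so the agent has an action to take.
    return str(len(text.split()))

tools = [Tool(name="word_count", description="Counts the words in a string.", func=word_count)]

# The template is the ReAct prompt shown above, with its {tools}, {tool_names},
# {input} and {agent_scratchpad} placeholders left intact.
react_template = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""
prompt = PromptTemplate.from_template(react_template)

agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, handle_parsing_errors=True)
print(executor.invoke({"input": "How many words are in 'the quick brown fox'?"}))
```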

### Conversational Use-case
#### Use with [Transformers](https://github.com/huggingface/transformers)
##### Using the `transformers.pipeline()` API; 4-bit quantization is best for fast responses.
```python
import transformers
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)

model_id = "EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"quantization_config": quantization_config},  # for fast responses; remove for full 16-bit inference
    device_map="auto",
)
messages = [
    {"role": "system", "content": """
Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n
You are a coding assistant, expert in everything.\n
Ensure any code you provide can be executed\n
with all required imports and variables defined. List the imports. Structure your answer with a description of the code solution.\n
Write only the code; do not print anything else.\n
Debug the code if an error occurs.\n
Here is the user question: {question}
"""},
    {"role": "user", "content": "Create a bar plot showing the market capitalization of the top 7 publicly listed companies using matplotlib"},
]
outputs = pipeline(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
print(outputs[0]["generated_text"][-1])
```

# Example:
Please go to Colab for a sample of the code using Langchain: [Colab](https://colab.research.google.com/drive/129SEHVRxlr24r73yf34BKnIHOlD3as09?authuser=1)

# Unsloth Fast

```python
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install unsloth
# Get the latest Unsloth
!pip install --upgrade --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install langchain_experimental

from unsloth import FastLanguageModel
from transformers import TextStreamer
from google.colab import userdata

# 4-bit pre-quantized models we support, for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/gemma-7b-it-bnb-4bit",
]  # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code",
    max_seq_length = 128000,
    load_in_4bit = True,
    token = userdata.get('HF_TOKEN'),
)

def chatbot(query):
    messages = [
        {"role": "system", "content": """
Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n
You are a coding assistant, expert in everything.\n
Ensure any code you provide can be executed\n
with all required imports and variables defined. List the imports. Structure your answer with a description of the code solution.\n
Write only the code; do not print anything else.\n
Use ipython for the search tool.\n
Debug the code if an error occurs.\n
Here is the user question: {question}
"""},
        {"role": "user", "content": query},
    ]
    inputs = tokenizer.apply_chat_template(messages, tokenize = True, add_generation_prompt = True, return_tensors = "pt").to("cuda")

    text_streamer = TextStreamer(tokenizer)
    _ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048, use_cache = True)
```
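
The helper can then be invoked with a coding question, for example:

```python
chatbot("Write a Python function that returns the n-th Fibonacci number.")
```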

# Execute code (Make sure to use virtual environments)
```bash
python3 -m venv env
source env/bin/activate
```

## Execution code responses from Llama
#### Please use the execute-Python-code function for local runs. For Langchain, please use PythonREPL() to execute code.

Execute-code function, run locally in Python:
```python
import contextlib
import io

def execute_Python_code(code):
    # A string stream to capture the outputs of exec
    output = io.StringIO()
    try:
        # Redirect stdout to the StringIO object
        with contextlib.redirect_stdout(output):
            # Allow imports
            exec(code, globals())
    except Exception as e:
        # If an error occurs, capture it as part of the output
        print(f"Error: {e}", file=output)
    return output.getvalue()
```
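
For example, a code-only reply from the pipeline above can be run and its output captured (this assumes `outputs` comes from the Conversational Use-case example):

```python
code = outputs[0]["generated_text"][-1]["content"]  # the model's code-only reply
print(execute_Python_code(code))
```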

Langchain Python REPL
- Install:

```bash
!pip install langchain_experimental
```

Code:
```python
from langchain_core.tools import Tool
from langchain_experimental.utilities import PythonREPL

python_repl = PythonREPL()

# You can create the tool to pass to an agent
repl_tool = Tool(
    name="python_repl",
    description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
    func=python_repl.run,
)
repl_tool.invoke(outputs[0]["generated_text"][-1]["content"])
```

# Safety inputs/outputs procedures
For all inputs, please use Llama-Guard (meta-llama/Llama-Guard-3-8B) for safety classification.
Go to the model card: [Llama-Guard](https://huggingface.co/meta-llama/Llama-Guard-3-8B)
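
As a minimal sketch, a user prompt can be classified with Llama Guard before it reaches the agent (this follows the standard Transformers usage from the Llama Guard model card and assumes access to the gated checkpoint):

```python
# Hypothetical pre-check: ask Llama Guard whether the user prompt is safe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guard_id = "meta-llama/Llama-Guard-3-8B"
guard_tokenizer = AutoTokenizer.from_pretrained(guard_id)
guard_model = AutoModelForCausalLM.from_pretrained(guard_id, torch_dtype=torch.bfloat16, device_map="auto")

chat = [{"role": "user", "content": "Create a bar plot of the top 7 companies by market cap."}]
input_ids = guard_tokenizer.apply_chat_template(chat, return_tensors="pt").to(guard_model.device)
out = guard_model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=guard_tokenizer.eos_token_id)

# Llama Guard replies "safe", or "unsafe" plus the violated category codes.
print(guard_tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```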

# Uploaded model

- **Developed by:** EpistemeAI
- **License:** apache-2.0
- **Finetuned from model:** EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)