
Addressing Inconsistencies in Model Outputs: Understanding and Solutions

#5
by shivammehta - opened

When experimenting with this model, I've observed occasional discrepancies in its output. Sometimes it provides the correct response and sometimes it doesn't, even when presented with the same or similar questions. I have two questions: why does this occur, and how can we address it?
The agent's output goes into an infinite loop, with the LLM not making any changes to its reasoning, as can be seen in the highlighted block. This block keeps repeating until the agent runs out of iterations, so it never arrives at a final answer.
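One common cause of run-to-run variation is sampling: with temperature > 0 the model draws each token from a probability distribution rather than always taking the argmax, so identical prompts can produce different completions. A minimal sketch of temperature scaling in plain Python (the logits here are made up for illustration):

```python
import math
import random

def sample_token(logits, temperature=1.0, seed=None):
    """Sample an index from logits after temperature scaling.

    temperature -> 0 approaches greedy (argmax) decoding;
    higher temperatures flatten the distribution and add variance.
    """
    if temperature <= 0:
        # Greedy decoding: always pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits (max-subtracted for stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting distribution.
    rng = random.Random(seed)
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

# With temperature 0 the same prompt always yields the same token,
# which is why lowering temperature reduces output inconsistency.
logits = [2.0, 1.0, 0.5]
print(sample_token(logits, temperature=0))  # always index 0
```

Setting a low temperature (as the code below does with 0.1) narrows this variance but does not eliminate it entirely.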

Code -
from huggingface_hub import hf_hub_download
from langchain.llms import LlamaCpp
from langchain.agents import create_csv_agent

MODEL_ID = "TheBloke/zephyr-7B-beta-GGUF"
MODEL_BASENAME = "zephyr-7b-beta.Q4_K_M.gguf"

CONTEXT_WINDOW_SIZE = 4096
MAX_NEW_TOKENS = 1024

# Download the quantized GGUF weights from the Hub (resumes partial downloads).
model_path = hf_hub_download(
    repo_id=MODEL_ID,
    filename=MODEL_BASENAME,
    resume_download=True,
    cache_dir="./models",
)

llm = LlamaCpp(
    model_path=model_path,
    temperature=0.1,
    n_ctx=CONTEXT_WINDOW_SIZE,
    max_tokens=MAX_NEW_TOKENS,
    n_batch=100,
    top_p=1,
    verbose=True,
    n_gpu_layers=100,
)

agent = create_csv_agent(
    llm,
    ["./Data/Employees.csv", "./Data/Verticals.csv"],
    verbose=True,
)
response = agent.run("Which vertical name has the most number of resignations")
print(response)
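The repeating block suggests the agent keeps emitting the identical Action/Action Input pair on every iteration. One way to catch this is to inspect the agent's accumulated intermediate steps for exact repeats; the helper below is a hedged sketch (the function name and the step format are illustrative, not a LangChain API):

```python
def is_looping(steps, window=3):
    """Return True if the last `window` (action, action_input) pairs are identical.

    `steps` is a list of (action, action_input) tuples, analogous to the
    intermediate steps an agent accumulates while reasoning.
    """
    if len(steps) < window:
        return False
    recent = steps[-window:]
    # All recent steps identical -> the agent is spinning without progress.
    return all(s == recent[0] for s in recent)

steps = [
    ("python_repl_ast", "df.groupby('vertical')"),
    ("python_repl_ast", "df.groupby('vertical')"),
    ("python_repl_ast", "df.groupby('vertical')"),
]
print(is_looping(steps))  # True
```

A check like this could be run between iterations to abort early instead of burning the full iteration budget on the same failing step.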

[Screenshot: the agent's highlighted reasoning block, repeating across iterations]

QUERY:

  1. How can the LLM's reasoning be corrected so that it recognizes its actions are
    being repeated without getting any closer to the right answer? Does the AgentExecutor
    need to be modified in this case? If so, what needs to be done?
