Perspectives for first-principles prompt engineering

Community Article Published August 18, 2024

To maximise the value we can extract from generative AI, we need to get good at aligning the behaviour of an LLM with our expectations. Prompt engineering plays a key role here.

What is prompt engineering?

Prompt engineering is the process of designing and optimising a prompt until the response meets the user’s expectations for relevance or quality.

What is the basic process underlying prompt engineering?

  1. Frame the problem and deepen your domain understanding. With background knowledge, it is easier to specify the response you’re looking for and to get a feeling for the context needed to inspire a good response.
  2. Write an initial version of the prompt using that context.
  3. Optimise the prompt for the model at hand, respecting the model’s strengths and weaknesses.
  4. Identify errors, find explanations for them, and optimise further.
  5. Iterate.
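
A minimal sketch of this loop in Python, where `generate`, `meets_expectations` and `revise` are hypothetical stand-ins for your model API call, your (manual or automated) quality check, and your rewriting step:

```python
# Minimal sketch of the iterate-and-evaluate loop described above.
# All three callables are placeholders you would supply yourself.
from typing import Callable

def engineer_prompt(
    draft_prompt: str,
    generate: Callable[[str], str],
    meets_expectations: Callable[[str], bool],
    revise: Callable[[str, str], str],
    max_rounds: int = 5,
) -> str:
    """Iteratively refine a prompt until the response is acceptable."""
    prompt = draft_prompt
    for _ in range(max_rounds):
        response = generate(prompt)       # step 3: run against the model at hand
        if meets_expectations(response):  # step 4: check the output for errors
            return prompt
        # steps 4-5: diagnose the failure and rewrite the prompt
        prompt = revise(prompt, response)
    return prompt
```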


General best practices:

  • Use the best model first
  • Clearly separate parts of the prompt like instructions and inserted context
  • Be specific and clear, concise and explicit
  • Describe details, elements and format of the output
  • Use few-shot prompting
  • Prime the model’s response, e.g. by providing a beginning or hints about what it should use
  • Provide a fallback option and workflow
  • Optimise the order of examples

Perspectives for theoretically informed prompt engineering

There are roughly three (not entirely distinct) frames for prompt engineering:

  • LLMs as internet.zip or interpolative stochastic parrots
  • LLMs as more flexible dynamical systems
  • LLMs as pseudocognitive artifacts

These perspectives liken LLMs to interpolative databases, to more flexible self-attention dynamics, or to human cognition, each to make prompt engineering easier to reason about.

The stochastic parrot

This perspective describes LLMs as a retrieval mechanism that basically just regurgitates the pretraining data when prompted, i.e. that tries to complete the sentence with the tokens most likely for the context in the pretraining data. Alignment then biases the model to be easier to use, but the model remains grounded in repeating back mixtures of the pretraining data. This perspective is quite powerful and concrete for prompt engineering, as the user only needs some knowledge about the general shape of the pretraining data. In the tech-bro community, LLMs are often described as a compression of the internet, or internet.zip, to assist one’s imagination. Of course, the other data sources matter too, but it is very easy to picture standard internet pages on whatever topic you’re interested in. Standard copywriting and blog layouts are then the records this ‘database’ mixes together, so you can use that structure for prompting, or for thinking about phrases that LLMs will be able to ‘understand’ well.

To steer the output, get the LLM to regurgitate the mixture of sources that most closely matches what you want: use your knowledge of the pretraining data to force your task or problem into a format the LLM works with well. In other words, craft your prompt from the patterns the LLM has memorised the most. Two practical approaches are making a base model repeat back the kinds of sources it was trained on to learn about those patterns, or analysing the pretraining data directly for frequent patterns.
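
As a toy illustration of this view, the sketch below reshapes a request into a stereotypical listicle blog layout, on the assumption that such layouts are frequent in web-derived pretraining data; the structure is a guess about common patterns, not a documented property of any model:

```python
# Sketch: force the task into a stereotypical web format the model has
# likely memorised (here, a standard listicle blog layout). The layout
# is an assumption about frequent pretraining patterns.
def as_listicle_prompt(topic: str, n_items: int = 5) -> str:
    return (
        f"# {n_items} Essential Tips for {topic}\n\n"
        f"Welcome to our complete guide on {topic}. "
        "In this post we cover:\n\n"
        "1."  # prime the numbered-list continuation
    )

print(as_listicle_prompt("indoor gardening"))
```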

LLMs as flexible dynamical systems

Another perspective recognises that LLMs can do more than repeat back the data they have seen; it reflects the role of self-attention more appropriately than the stochastic-parrot view. While grounded in the pretraining data, the LLM goes beyond merely repeating it. The mechanics of next-token prediction with self-attention (Li et al., 2024) have been described like this: for the tokens in the context window, self-attention selects important tokens based on the latest input token. These important tokens correspond to strongly connected components between tokens in the pretraining data, i.e. words that are indicative of how the (whole) context gets completed. Token by token, this abstraction of meaning then guides the completion of the context, meaning that the LLM does not merely repeat back the pretraining data but an interpretation of it, based on stereotypical associations between concepts in that data. Hence, it is not parroting but confabulating the context, which yields some extrapolation beyond the pretraining data. Because the pretraining data is very broad, this gives the LLM a broad but fragile, human-like natural language understanding: the ‘understanding’ is only supported as far as the pretraining data was ‘thick’ enough to teach a human-like interpretation of the words involved.

Another source of generalisation in the ‘intelligence’ of LLMs is, of course, the abstraction they perform as deep neural networks. Their interpretation of meaning is not limited to the currently attended words but extends to the ‘intention’ or ‘gist’ of parts of the context window and of the window as a whole. Just as image classifiers distinguish cats from dogs by abstract component features such as eye and ear shapes, LLMs predict the next word as an abstract interpretation of the original pretraining data rather than an exact copy of it, with the features being the ‘intentions’ behind the components of the training documents, as far as stereotypical associations between words could capture those. As an example, instead of merely repeating back a mixture of blog posts on SEO marketing and standard headlines from the pretraining data, the model has the ‘gist’ of a stereotypical blog post ‘in mind’ and interprets the completion of an SEO marketing blog post with that knowledge. These abstractions bias the output but also grant it freedoms beyond the literal pretraining data, which explains why high-end LLMs can learn entirely new languages in context. LLMs engage in ‘stereotypical reasoning’: they try to interpret the prompt-completion situation as an instance of a familiar kind, according to their interpretation of the pretraining data. Personally, I like to call LLM reasoning “properly executed pigeonholing”: the model somewhat forcefully stuffs the context with what seems to it like a typical completion, superhumanly stupid at times, as this ‘reasoning’ works equally well (or badly) for humanly simple and humanly complex problems alike.

With these dynamics in mind, prompting can be seen as applied control theory (Bhargava et al., 2023): there is a subset of the reachable output space that matches human intent, and a set of ‘right tokens’ that expresses that intent. The overlap between human understanding and LLM ‘understanding’ that serves the human intent correctly demarcates a directly controllable subspace of LLM reasoning (by using ML, i.e. another LLM, instead of a human, the indirectly controllable space becomes even bigger). Hence, to steer LLM pigeonholing, we can try to pigeonhole ourselves: use frequent keywords, TextRank, or an analysis of self-attention over typical output generated by the LLM to come up with prompts that align with both human and LLM ‘intent’. Prompt engineering thus ought to be supported by data-driven empathy with our limited LLM servant.
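
As a minimal stand-in for the keyword analysis mentioned above, the sketch below mines typical model output for frequent content words so their vocabulary can be echoed back in the next prompt revision. A full TextRank would rank words over a co-occurrence graph instead of counting, but the idea is the same; the stopword list and sample texts are placeholders:

```python
# Sketch: surface frequent keywords in typical model output so the
# prompt can reuse the vocabulary the model 'thinks' in. A crude
# frequency count stands in for TextRank or attention analysis.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "it", "for", "on", "with", "that", "this", "as", "are"}

def frequent_keywords(texts: list[str], top_k: int = 10) -> list[str]:
    counts: Counter[str] = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

# Feed in a handful of typical completions, then echo the extracted
# vocabulary back at the model in the next prompt revision.
samples = ["Search engine optimisation improves organic traffic ...",
           "Organic traffic grows when content targets search intent ..."]
print(frequent_keywords(samples))
```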

LLMs as pseudocognitive artifacts

I’m starting to anthropomorphise the LLM because I’m moving to the final and most intuitive perspective: treating the act of instructing the LLM like instructing another human (Janus, 2021). This perspective makes sense because of some similarities between how LLMs ‘reason’ and how humans reason, though it does not suggest true sameness in any way: our brains’ operation is far more complex than that of LLMs. Broadly pretrained LLMs can be seen as simulators of human linguistic thought and of the typical goals humans were pursuing when writing the documents in the pretraining data, using the world knowledge learned from those documents to interpret the writing task. However, such world knowledge is full of inconsistencies, and one is well advised not to overestimate the pseudocognitive abilities of the LLM: stick to thinking of it as engaging in stereotypical reasoning. Still, this perspective adds the rich theoretical lenses of cognitive science as a guideline to inspire prompt engineering, though there is of course more wiggle room here in deciding which inspirations are appropriately covered by LLM abilities and when we are ascribing too much ‘cognition’ to the LLM. The perspective is nonetheless useful for inspiring prompt writing, as the sketch after this list also shows:

  • Think of completions as coming from simulated humans in the pretraining data: try to picture a persona who would write the current completion (you can also prompt the LLM itself for an intuition)
  • With knowledge about the pretraining corpus, think about the tasks humans frequently executed when writing it, how likely your desired output is among the task outputs in the corpus, and how well your words reflect how a typical human executing the task would formulate or think about it (McCoy et al., 2023).
  • How does a person writing task relevant texts of the pretraining data ‘tick’? What are cultural word associations for those people? In what memes, frames, schemas and scripts do these people think? Try to find out how to ‘prime’ such a person to have the right associations to complete your task, what is the typical language such people use and think in?
  • Help the LLM avoid mixing its interpretation with the perspectives of several authors, as happens with a blank prompt; explicitly steer it to adopt the stereotypical person you’d imagine writing a related base-model completion (i.e. as an expert in …)
  • As it is pseudocognitive, help the LLM take on its ‘role’ through chain-of-thought prompting or by otherwise inspiring it to write for a while (this can also be relevant RAG content, rewritten so the LLM ‘gets’ it)
  • Use external knowledge resources like books / encyclopaedias of the expert knowledge you want it to use and use those as associative networks to prime it to think like the simulated author useful for your task
  • What are typical concerns the simulated people in question care about, what are their motives and what goals do they strive for?
  • Without assuming a clean science, what MBTI type is associated with their profession?
  • What is the stereotypical narrative and life story of the person to be simulated?
  • What prototypical concepts exist in the knowledge domain of the simulated person? What are otherwise frequently used concepts in their knowledge domain?
  • What propositions do simulated people usually base their thinking upon, what belief systems are typical for the LLM simulated person completing your prompt?
  • What is the typical autobiographical memory of the simulated person?
  • Which person would make the right inferences about the context window to help task performance?
  • Additionally, to align ones understanding, we can try to iteratively predict how a given person would understand and react to a paragraph or piece of text and try to learn about the right words for the task through back and forth with the LLM
  • What is the main idea, keywords, implications, unstated conclusions, value judgements the simulated person would have in mind about the task / text chunk in question?
  • How would the simulated person name the topic, what facts are important enough for it to recall, what impact / practical implications would that person consider?
  • To what would the person compare the situation, what analogies would they draw, how would they analyse the situation and evaluate the ideas that are task relevant?
  • How would the person in question understand the words in the context window?
  • What moral does the person in question interpret from the narrative?
  • What character traits would the person assign to other relevant agents?
  • What counterfactual reasoning would the simulated person engage in?
  • How would the simulated person feel about the task / context?
  • How does the simulated person draw conclusions about the task / context?
  • What tone does the simulated person recognize in the context?
  • What attitudes and aims does the simulated person infer from the prompt engineer? Can it interpret the intention correctly?
  • What heuristics guide the simulated person to solve the problem?
  • What cognitive biases does the simulated person fall victim to?
  • What are typical (i.e. professional) problem solving strategies used by the simulated person (i.e. practical theoretical frameworks like SWOT analysis)
  • What are the detailed contents and features of the schemas the simulated person in question uses? How do they differentiate?
  • What is the typical material the simulated person learns the content it uses to think from (encoding specificity)?
  • What hierarchical structure does the expert’s memory typically use to correctly recall contents from long-term working memory?
  • What framing effects does the simulated person succumb to?
  • How does the text world model of the simulated person evolve dynamically as they read the context window (potentially use a second LLM prompt to assess this)?
  • How can you apply common persuasion techniques to get the simulated person in question to comply with a request?
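
Many of the questions above can be operationalised as a persona header prepended to the task. The sketch below assembles such a header from answers to a few of them; all field names and example values are illustrative, not prescriptive:

```python
# Sketch: turn answers to the questions above into a persona header
# that steers the model toward one consistent simulated author.
def persona_prompt(role: str, concerns: str, frameworks: str,
                   task: str) -> str:
    return (
        f"You are {role}.\n"
        f"You care most about {concerns}.\n"
        f"You habitually reason with {frameworks}.\n"
        "Think step by step, in your own professional voice.\n\n"
        f"Task: {task}\n"
    )

print(persona_prompt(
    role="a senior B2B content strategist with 15 years of experience",
    concerns="conversion rates, search intent and brand voice",
    frameworks="SWOT analysis and audience personas",
    task="Outline a blog post introducing our (hypothetical) product.",
))
```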

Conclusion

Prompt engineering is a critical skill for maximising the potential of large language models (LLMs). By understanding and applying the right techniques, we can better align the output of these models with our intended goals. Through the perspectives of LLMs as "stochastic parrots," "flexible dynamical systems," or "pseudocognitive artifacts," we can tailor our prompts to harness the model’s capabilities more effectively. Each perspective offers unique insights into how LLMs process information and how we can steer their outputs. By iterating on prompts, experimenting with different approaches, and considering the underlying mechanics and potential biases of LLMs, we can achieve more reliable and relevant results. Whether we view LLMs as simple pattern matchers or as more sophisticated, albeit imperfect, simulators of human reasoning, the key to effective prompt engineering lies in understanding both the model’s strengths and its limitations. As we continue to refine our methods, we unlock new possibilities in leveraging AI to enhance creativity, decision-making, and problem-solving across various domains. (Last paragraph written by ChatGPT.)

References

Bhargava, A., Witkowski, C., Shah, M., & Thomson, M. (2023). What's the Magic Word? A Control Theory of LLM Prompting. arXiv preprint arXiv:2310.04444.

Janus (2021). Methods of Prompt Programming. https://generative.ink/posts/methods-of-prompt-programming/#the-reverse-engineered-dynamics-of-language

Li, Y., Huang, Y., Ildiz, M. E., Rawat, A. S., & Oymak, S. (2024, April). Mechanics of next token prediction with self-attention. In International Conference on Artificial Intelligence and Statistics (pp. 685-693). PMLR.

McCoy, R. T., Yao, S., Friedman, D., Hardy, M., & Griffiths, T. L. (2023). Embers of autoregression: Understanding large language models through the problem they are trained to solve. arXiv preprint arXiv:2309.13638.