Prompt Engineering Guide: Advanced Techniques for Production LLMs
Prompt engineering is one of the highest-leverage activities in LLM application development. On a given task, a well-crafted prompt can let a cheaper model match or outperform a more expensive one. This guide covers the techniques that matter in production.
Foundation: what goes in a system prompt
// Structure your system prompt in this order for best results:
const systemPrompt = `
## Role
You are a customer support assistant for Acme Corp, a B2B SaaS company.

## Task
Answer customer questions about billing, accounts, and product features.
If unsure, say so; never guess on billing questions.

## Constraints
- Keep responses under 3 sentences unless a list is more helpful.
- Never discuss competitors.
- Escalate to human if: account suspension, legal threat, data deletion.

## Format
- Use plain text unless the user asks for a list.
- Do not start with "Great question!" or similar filler phrases.
- End with a concrete next step or offer to help further.
`;
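The section order above (Role, Task, Constraints, Format) can also be enforced in code rather than by convention. The following is a minimal sketch; `PromptSections` and `buildSystemPrompt` are hypothetical names introduced for illustration, not part of any library.

```typescript
// Hypothetical shape for the four sections of a system prompt.
interface PromptSections {
  role: string;
  task: string;
  constraints: string[];
  format: string[];
}

// Renders the sections in a fixed order: Role, Task, Constraints, Format.
function buildSystemPrompt(s: PromptSections): string {
  const bullets = (items: string[]): string =>
    items.map(item => `- ${item}`).join('\n');
  return [
    `## Role\n${s.role}`,
    `## Task\n${s.task}`,
    `## Constraints\n${bullets(s.constraints)}`,
    `## Format\n${bullets(s.format)}`,
  ].join('\n\n');
}

const supportPrompt = buildSystemPrompt({
  role: 'You are a customer support assistant for Acme Corp, a B2B SaaS company.',
  task: 'Answer customer questions about billing, accounts, and product features.',
  constraints: [
    'Keep responses under 3 sentences unless a list is more helpful.',
    'Never discuss competitors.',
  ],
  format: ['Use plain text unless the user asks for a list.'],
});
```

Assembling the prompt from typed fields keeps the order stable when individual rules change, and it makes diffs between prompt versions easier to read.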
Chain-of-thought (CoT)
CoT often improves accuracy on multi-step reasoning tasks by having the model write out its intermediate steps before giving an answer:
// Zero-shot CoT: add "Think step by step" to any prompt
const messages = [{
  role: 'user',
  content: `A customer bought a 12-month plan on March 15, 2026 at $99/month.
They cancelled on June 10, 2026. How many months should we refund?
Think step by step.`,
}];
// Example output (model responses vary):
// Step 1: March 15 to June 10 is 2 months and 26 days.
// Step 2: The customer was billed for 3 months (March, April, May).
// Step 3: They used 2 full months + partial third month.
// Step 4: Refundable: remaining months = 12 - 3 = 9 months.
// Answer: Refund 9 months = $891.

// Few-shot CoT: provide worked examples before the question
// `actualQuestion` is assumed to hold the real question to answer.
const fewShotMessages = [
  {
    role: 'user',
    content: `Q: If a train travels 60 mph for 2.5 hours, how far does it go?
Think step by step.`,
  },
  {
    role: 'assistant',
    content: 'Step 1: distance = speed × time. Step 2: 60 × 2.5 = 150. Answer: 150 miles.',
  },
  {
    role: 'user',
    content: `Q: ${actualQuestion}. Think step by step.`,
  },
];
Self-consistency
Generate multiple independent answers with high temperature, then take the majority vote:
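import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Assumed helper (not defined in the original): take the text after the last
// "Answer:" marker and normalize it so equivalent answers count as one vote.
function extractFinalAnswer(text: string): string {
  const matches = [...text.matchAll(/answer:\s*(.+)/gi)];
  const last = matches[matches.length - 1];
  return (last ? last[1] : text).trim().toLowerCase();
}

async function selfConsistency(question: string, samples = 5): Promise<string> {
  const responses = await Promise.all(
    Array.from({ length: samples }, () =>
      openai.chat.completions.create({
        model: 'gpt-4o',
        messages: [{ role: 'user', content: question + '\nThink step by step.' }],
        temperature: 0.8, // diversity
      })
    )
  );
  const answers = responses.map(r =>
    extractFinalAnswer(r.choices[0].message.content ?? '')
  );
  // Return the most common answer
  const freq = answers.reduce<Record<string, number>>((acc, a) => {
    acc[a] = (acc[a] ?? 0) + 1;
    return acc;
  }, {});
  return Object.entries(freq).sort((a, b) => b[1] - a[1])[0][0];
}
Least-to-most prompting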
Break complex problems into sub-problems, solve each, then compose:
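// `ask` is assumed to send one prompt to the model and resolve to its text reply.
async function leastToMost(complexQuestion: string): Promise<string> {
  // Step 1: Decompose
  const decomposition = await ask(`
To answer: "${complexQuestion}", what simpler sub-questions must I answer first?
List them in order, one per line.`);
  // Split the reply into sub-questions, dropping list markers and blank lines
  const subQuestions = decomposition
    .split('\n')
    .map(line => line.replace(/^\s*(?:\d+[.)]|[-*])\s*/, '').trim())
    .filter(Boolean);
  // Step 2: Solve each sub-question in sequence
  let context = '';
  for (const subQ of subQuestions) {
    const answer = await ask(`Context: ${context}
Question: ${subQ}`);
    context += `Q: ${subQ}
A: ${answer}
`;
  }
  // Step 3: Synthesize
  return ask(`Based on these answers:
${context}
Answer: ${complexQuestion}`);
}
Role prompting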
// Assign a specific expert role for domain tasks
const systemPrompt = `You are a senior PostgreSQL DBA with 15 years of experience
optimizing queries for multi-terabyte OLAP workloads.
When reviewing SQL, always check for: missing indexes, N+1 patterns,
cross-join risks, and suboptimal window function usage.`;
// The role steers the model toward domain-specific knowledge and vocabulary
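A role prompt only takes effect once it is sent as the system message alongside the user's input. The sketch below shows one way to wire that up; `ChatMessage` and `buildReviewMessages` are hypothetical names defined here for illustration, and the message shape simply mirrors the `role`/`content` objects used earlier in this guide.

```typescript
// Minimal chat message shape, defined locally so the sketch is self-contained.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

const DBA_ROLE_PROMPT =
  'You are a senior PostgreSQL DBA with 15 years of experience optimizing ' +
  'queries for multi-terabyte OLAP workloads. When reviewing SQL, always check ' +
  'for: missing indexes, N+1 patterns, cross-join risks, and suboptimal window ' +
  'function usage.';

// Hypothetical helper: the role goes in the system message,
// the SQL to review goes in the user message.
function buildReviewMessages(sql: string): ChatMessage[] {
  return [
    { role: 'system', content: DBA_ROLE_PROMPT },
    { role: 'user', content: `Review this query:\n${sql}` },
  ];
}

const reviewMessages = buildReviewMessages('SELECT * FROM orders o, customers c;');
```

Keeping the role in the system message and the query in the user message means the same role prompt can be reused unchanged across requests.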
Meta-prompting
Use the LLM to generate and improve its own prompts:
// Generate a prompt for a specific task
const generatedPrompt = await ask(`
Write a system prompt for an AI assistant that:
- Helps data scientists write efficient pandas code
- Prefers vectorized operations over loops
- Includes brief explanations
- Outputs code in fenced code blocks
- Points out common pitfalls
Output only the system prompt text, no preamble.`);
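The example above covers generation; the "improve" half can be sketched as a revision request that feeds observed failures back to the model. Everything here is an assumption for illustration: `Failure`, `buildRevisionRequest`, and `improvePrompt` are hypothetical names, and `ask` is passed in as a parameter standing for whatever function sends a prompt to your model.

```typescript
// A function that sends one prompt to the model and resolves to its reply (assumed).
type Ask = (prompt: string) => Promise<string>;

// One observed failure of the current prompt.
interface Failure {
  input: string;
  expected: string;
  actual: string;
}

// Builds the text of the revision request. Pure, so it can be unit-tested offline.
function buildRevisionRequest(currentPrompt: string, failures: Failure[]): string {
  const report = failures
    .map((f, i) => `${i + 1}. Input: ${f.input}\n   Expected: ${f.expected}\n   Got: ${f.actual}`)
    .join('\n');
  return [
    'Here is a system prompt:',
    '---',
    currentPrompt,
    '---',
    'It produced these wrong outputs:',
    report,
    'Rewrite the system prompt to fix these cases without changing its intent.',
    'Output only the revised system prompt text, no preamble.',
  ].join('\n');
}

// Asks the model for a revised prompt; returns the original if nothing failed.
async function improvePrompt(ask: Ask, currentPrompt: string, failures: Failure[]): Promise<string> {
  if (failures.length === 0) return currentPrompt;
  const revised = await ask(buildRevisionRequest(currentPrompt, failures));
  return revised.trim();
}
```

A revised prompt should go back through the same test cases before it replaces the current one, since a rewrite that fixes the listed failures can break cases that previously passed.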
Instruction clarity checklist
- Be explicit about format: "Return a JSON array" not "return the data".
- Specify output length: "Summarize in exactly 2 sentences".
- State what NOT to do: "Do not include code examples".
- Use anchors for lists: number items to prevent omissions.
- One instruction per line: compound instructions are often only partially followed.
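An explicit format instruction pays off most when the calling code checks that the reply actually follows it. A minimal sketch of that pairing for the "Return a JSON array" item, where `FORMAT_INSTRUCTION` and `parseStringArray` are hypothetical names chosen for this example:

```typescript
// Explicit format instruction to append to a prompt.
const FORMAT_INSTRUCTION =
  'Return a JSON array of strings. Output only the array, with no surrounding text.';

// Returns the parsed array if the reply follows the instruction, otherwise null.
function parseStringArray(raw: string): string[] | null {
  try {
    const value: unknown = JSON.parse(raw.trim());
    if (Array.isArray(value) && value.every(v => typeof v === 'string')) {
      return value as string[];
    }
    return null; // valid JSON, but not an array of strings
  } catch {
    return null; // not valid JSON at all (for example, the reply included a preamble)
  }
}
```

A `null` result is a signal to retry or fail loudly; silently accepting a malformed reply tends to push the error further downstream.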
Prompt testing in CI
// prompt-tests/sentiment.test.ts
import { describe, it, expect } from 'vitest';
// Assumed location of the function under test; adjust to your project layout.
import { classifySentiment } from '../src/classifySentiment';

const testCases = [
  { input: 'I love this product!', expected: 'positive' },
  { input: 'This is terrible service.', expected: 'negative' },
  { input: 'The package arrived yesterday.', expected: 'neutral' },
];

describe('sentiment extraction prompt', () => {
  it('classifies correctly on test cases', async () => {
    for (const tc of testCases) {
      const result = await classifySentiment(tc.input);
      expect(result.sentiment).toBe(tc.expected);
    }
  });

  it('never returns undefined or null', async () => {
    const edge = await classifySentiment('');
    expect(edge.sentiment).toBeDefined();
    expect(edge.sentiment).not.toBeNull();
  });
});
Prompt versioning
// prompts/support-v4.ts
export const SUPPORT_SYSTEM_PROMPT = {
  version: '4.2.0',
  content: `You are a helpful support assistant...`,
  changelog: 'Added escalation rules for data deletion requests',
};

// Log the version with every call; this is critical for debugging regressions.
// `trackedCompletion` and `params` are assumed to be your own wrapper and request object.
await trackedCompletion(params, {
  promptVersion: SUPPORT_SYSTEM_PROMPT.version,
  feature: 'customer-support',
});
Temperature guide
| Task | Temperature | Why |
|---|---|---|
| Code generation | 0.0–0.2 | Deterministic, correct syntax |
| Data extraction | 0.0 | Maximum consistency |
| Question answering | 0.0–0.3 | Accuracy over variety |
| Creative writing | 0.7–1.0 | Diversity and creativity |
| Brainstorming | 0.8–1.0 | Maximize idea diversity |
| Self-consistency | 0.7–0.9 | Independent samples |
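The table can be encoded as a lookup so that call sites cannot drift outside the recommended range for their task. This is a sketch under the assumption that the ranges above are the ones you want to enforce; `Task`, `TEMPERATURE_RANGE`, and `clampTemperature` are hypothetical names.

```typescript
type Task =
  | 'code-generation'
  | 'data-extraction'
  | 'question-answering'
  | 'creative-writing'
  | 'brainstorming'
  | 'self-consistency';

// [min, max] temperature per task, copied from the table above.
const TEMPERATURE_RANGE: Record<Task, readonly [number, number]> = {
  'code-generation': [0.0, 0.2],
  'data-extraction': [0.0, 0.0],
  'question-answering': [0.0, 0.3],
  'creative-writing': [0.7, 1.0],
  'brainstorming': [0.8, 1.0],
  'self-consistency': [0.7, 0.9],
};

// Clamps a requested temperature into the recommended range for the task.
function clampTemperature(task: Task, requested: number): number {
  const [min, max] = TEMPERATURE_RANGE[task];
  return Math.min(max, Math.max(min, requested));
}
```

For example, `clampTemperature('data-extraction', 0.9)` returns `0`, while a request already inside the range is passed through unchanged.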
Takeaway
Chain-of-thought and few-shot examples give the biggest quality gains with the least effort. Test prompts automatically in CI, version them like code, and log the version with every production call. Regressions are otherwise invisible.
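The logging advice can be sketched as a thin wrapper that records the prompt version whether the call succeeds or fails. This is one possible shape, not a definitive implementation: `CallMetadata`, `CallLogEntry`, `makeLogEntry`, and `withPromptVersion` are hypothetical names, and the default logger just prints JSON to the console.

```typescript
interface CallMetadata {
  promptVersion: string;
  feature: string;
}

interface CallLogEntry extends CallMetadata {
  latencyMs: number;
  ok: boolean;
}

// Pure helper: builds the record that gets logged for one call.
function makeLogEntry(
  meta: CallMetadata,
  startedAt: number,
  finishedAt: number,
  ok: boolean,
): CallLogEntry {
  return { ...meta, latencyMs: finishedAt - startedAt, ok };
}

// Runs any async model call and logs the prompt version on success and on failure.
async function withPromptVersion<T>(
  run: () => Promise<T>,
  meta: CallMetadata,
  log: (entry: CallLogEntry) => void = entry => console.log(JSON.stringify(entry)),
): Promise<T> {
  const startedAt = Date.now();
  try {
    const result = await run();
    log(makeLogEntry(meta, startedAt, Date.now(), true));
    return result;
  } catch (err) {
    log(makeLogEntry(meta, startedAt, Date.now(), false));
    throw err; // rethrow so callers still see the failure
  }
}
```

Logging on the failure path matters as much as on success: a regression introduced by a new prompt version often shows up first as a rise in errors tagged with that version.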