โœ๏ธ
LLM

10 Prompt Engineering Tips for Production LLM Systems

Arun Kataria
๐Ÿ“… December 5, 2024โฑ 5 min read
โ† Back to all posts

Most prompt engineering guides focus on getting one good answer from an LLM. But building a production system is different โ€” you need consistent, reliable, cost-efficient outputs at scale, across thousands of different user inputs.

These are the 10 patterns I've used in real production systems at Intuit and OLX.

The 10 Tips

Tip 01
Be explicit about output format
Don't say "return a summary". Say "Return a JSON object with keys: summary (string, max 2 sentences), confidence (0โ€“1), action_required (boolean)." Ambiguous format instructions cause parsing failures in production.
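A minimal sketch of that contract in Python: the instruction string restates the schema from the tip, and `parse_reply` (a hypothetical validation helper, not from the original post) rejects replies that drift from it instead of passing them downstream.

```python
import json

# The schema mirrors the example above; enforcing it at the call site
# is an assumption about your pipeline, not a prescribed API.
FORMAT_INSTRUCTION = (
    "Return a JSON object with keys: summary (string, max 2 sentences), "
    "confidence (number between 0 and 1), action_required (boolean). "
    "Return ONLY the JSON object, with no surrounding prose."
)

def parse_reply(raw: str) -> dict:
    """Parse the model reply and fail fast on schema violations."""
    data = json.loads(raw)
    if not isinstance(data.get("summary"), str):
        raise ValueError("summary must be a string")
    if not 0 <= data.get("confidence", -1) <= 1:
        raise ValueError("confidence must be in [0, 1]")
    if not isinstance(data.get("action_required"), bool):
        raise ValueError("action_required must be a boolean")
    return data
```

Failing fast here turns a silent parsing bug into a retryable, loggable error.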
Tip 02
Use XML tags to delimit sections
Claude responds especially well to XML-like delimiters. Wrap context, instructions, and examples in tags: <context>, <instructions>, <examples>. This reduces hallucination and improves structure adherence by ~30% in our testing.
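One way to make the tagging mechanical rather than ad hoc is a small template function (a sketch; the section names are the ones from the tip):

```python
def build_prompt(context: str, instructions: str, examples: str) -> str:
    """Wrap each section in XML-like tags so the model can tell
    fixed instructions apart from retrieved context and examples."""
    return (
        f"<context>\n{context}\n</context>\n\n"
        f"<instructions>\n{instructions}\n</instructions>\n\n"
        f"<examples>\n{examples}\n</examples>"
    )
```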
Tip 03
Add an "if you don't know, say so" instruction
Explicitly tell the model: "If you cannot answer with high confidence based on the provided context, respond with {confident: false, reason: '...'}." This is critical for financial and legal applications.
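The instruction only pays off if the caller actually checks for the abstention. A sketch, assuming the JSON abstention shape from the tip (`is_abstention` is a hypothetical helper):

```python
import json

LOW_CONFIDENCE_CLAUSE = (
    "If you cannot answer with high confidence based on the provided "
    'context, respond with {"confident": false, "reason": "..."} instead.'
)

def is_abstention(raw: str) -> bool:
    """Detect the structured 'I don't know' reply so the caller can
    route to a human or a fallback instead of trusting the answer."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("confident") is False
```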
Tip 04
Include 2โ€“3 few-shot examples for edge cases
For complex classification or extraction tasks, include examples of the hardest cases โ€” not the easy ones. The model already handles easy cases. Examples teach it how to handle edge inputs.
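In practice this means hard-case examples become alternating user/assistant turns ahead of the real input. A sketch with invented labels and texts for a hypothetical refund classifier (none of these examples are from the original post):

```python
# Illustrative hard cases: ambiguous inputs the model tends to get wrong.
FEW_SHOT = [
    {"input": "The item arrived fine but I ordered it by mistake.",
     "label": "refund_buyer_error"},
    {"input": "I was charged twice but only one package arrived.",
     "label": "refund_duplicate_charge"},
]

def build_messages(user_input: str) -> list[dict]:
    """Prepend few-shot turns, then the real input as the final user turn."""
    messages = []
    for ex in FEW_SHOT:
        messages.append({"role": "user", "content": ex["input"]})
        messages.append({"role": "assistant", "content": ex["label"]})
    messages.append({"role": "user", "content": user_input})
    return messages
```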
Tip 05
Set token budgets per task type
Use max_tokens aggressively. A summarisation task shouldn't get 4,096 tokens. Tight budgets force concise outputs, reduce cost, and improve latency. We set separate budgets for: classification (150), extraction (400), generation (1,200).
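The budgets above can live in one table, with a lookup that fails loudly on unknown task types instead of silently defaulting high (the helper is a sketch; the numbers are the post's):

```python
# Per-task max_tokens budgets, as quoted above.
TOKEN_BUDGETS = {"classification": 150, "extraction": 400, "generation": 1200}

def max_tokens_for(task_type: str) -> int:
    """Refuse to guess: an unbudgeted task type is a config bug."""
    if task_type not in TOKEN_BUDGETS:
        raise ValueError(f"no token budget defined for task type: {task_type}")
    return TOKEN_BUDGETS[task_type]
```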
Tip 06
Temperature 0 for deterministic tasks, 0.3โ€“0.7 for creative
For classification, extraction, or validation โ€” use temperature: 0. For copywriting, suggestions, or explanations โ€” use 0.3โ€“0.7. Many teams forget to set this and wonder why outputs vary on identical inputs.
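One way to make "don't forget to set it" enforceable is to derive temperature from the task type rather than passing it by hand each time (a sketch; the task names follow the tip, the default creative value is an assumption):

```python
DETERMINISTIC_TASKS = {"classification", "extraction", "validation"}

def temperature_for(task_type: str, creative_default: float = 0.5) -> float:
    """Pin deterministic tasks to 0 so identical inputs give
    identical outputs; creative tasks get a mid-range value."""
    return 0.0 if task_type in DETERMINISTIC_TASKS else creative_default
```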
Tip 07
Separate system prompt from user context
Never concatenate instructions with user data into one string. Use the system role for fixed instructions and the user role for dynamic context. This improves model compliance and makes prompt versioning much easier.
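Concretely, the fixed prompt stays in a versioned constant and only the dynamic context flows in per request. The dict shape below loosely mirrors the Anthropic Messages API, where system instructions are a top-level field; the prompt text is illustrative:

```python
# Fixed, versioned instructions -- never interpolate user data into this.
SYSTEM_PROMPT = "You are a support-ticket classifier. Reply with one label."

def build_request(user_context: str) -> dict:
    """Keep instructions in the system slot, dynamic data in the user slot."""
    return {
        "system": SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": user_context}],
    }
```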
Tip 08
Log inputs and outputs โ€” always
You can't debug what you can't see. Log every prompt, response, token count, latency, and model version. We use a lightweight middleware layer that intercepts all API calls and stores them in Elasticsearch with a 30-day retention window.
Tip 09
Use claude-sonnet before reaching for claude-opus
Opus is 5ร— the cost of Sonnet. In our production systems, Sonnet handled 95% of tasks equally well. Only escalate to Opus for tasks that demonstrably need deeper reasoning. Build an automatic fallback: if Sonnet confidence < 0.7, retry with Opus.
Tip 10
Version your prompts like code
Store prompts in source control with semantic versions (v1.2.0). Never edit a production prompt in-place โ€” always increment the version and A/B test before full rollout. A single word change in a system prompt can shift output quality dramatically.
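One lightweight shape for this, sketched in Python: a registry keyed by semantic version, with rollout controlled by a pointer rather than by editing prompt text in place. The prompt names, versions, and texts below are all invented for illustration:

```python
# Prompts live in source control next to the code that uses them.
PROMPTS = {
    "summarise_ticket": {
        "v1.2.0": "Summarise the ticket in at most two sentences.",
        "v1.3.0": "Summarise the ticket in at most two sentences, "
                  "then state the customer's requested action.",
    },
}

# Rollout is a one-line pointer change, reviewable and revertible.
ACTIVE = {"summarise_ticket": "v1.2.0"}

def get_prompt(name: str) -> str:
    """Resolve a prompt by name through the active-version pointer."""
    return PROMPTS[name][ACTIVE[name]]
```

Flipping `ACTIVE` to `v1.3.0` after an A/B test is then an ordinary code review, with the old version still in place for rollback.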
โœ… BonusBuild an eval harness with 50โ€“100 golden input/output pairs. Run it against every prompt change before deploying. This is the single highest ROI investment in any LLM system.
