โœ๏ธ
LLM

10 Prompt Engineering Tips for Production LLM Systems

Arun Kataria
๐Ÿ“… December 5, 2024โฑ 5 min read
โ† Back to all posts

Most prompt engineering guides focus on getting one good answer from an LLM. But building a production system is different โ€” you need consistent, reliable, cost-efficient outputs at scale, across thousands of different user inputs.

These are the 10 patterns I've used in real production systems at Intuit and OLX.

The 10 Tips

Tip 01
Be explicit about output format
Don't say "return a summary". Say "Return a JSON object with keys: summary (string, max 2 sentences), confidence (0โ€“1), action_required (boolean)." Ambiguous format instructions cause parsing failures in production.
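A minimal sketch of that contract in Python: the instruction string restates the schema from the tip, and `parse_reply` (a hypothetical validation helper, not from the original post) rejects replies that drift from it instead of passing them downstream.

```python
import json

# The schema mirrors the example above; enforcing it at the call site
# is an assumption about your pipeline, not a prescribed API.
FORMAT_INSTRUCTION = (
    "Return a JSON object with keys: summary (string, max 2 sentences), "
    "confidence (number between 0 and 1), action_required (boolean). "
    "Return ONLY the JSON object, with no surrounding prose."
)

def parse_reply(raw: str) -> dict:
    """Parse the model reply and fail fast on schema violations."""
    data = json.loads(raw)
    if not isinstance(data.get("summary"), str):
        raise ValueError("summary must be a string")
    if not 0 <= data.get("confidence", -1) <= 1:
        raise ValueError("confidence must be in [0, 1]")
    if not isinstance(data.get("action_required"), bool):
        raise ValueError("action_required must be a boolean")
    return data
```

Failing fast here turns a silent parsing bug into a retryable, loggable error.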
Tip 02
Use XML tags to delimit sections
Claude responds especially well to XML-like delimiters. Wrap context, instructions, and examples in tags: <context>, <instructions>, <examples>. This reduces hallucination and improves structure adherence by ~30% in our testing.
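One way to make the tagging mechanical rather than ad hoc is a small template function (a sketch; the section names are the ones from the tip):

```python
def build_prompt(context: str, instructions: str, examples: str) -> str:
    """Wrap each section in XML-like tags so the model can tell
    fixed instructions apart from retrieved context and examples."""
    return (
        f"<context>\n{context}\n</context>\n\n"
        f"<instructions>\n{instructions}\n</instructions>\n\n"
        f"<examples>\n{examples}\n</examples>"
    )
```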
Tip 03
Add an "if you don't know, say so" instruction
Explicitly tell the model: "If you cannot answer with high confidence based on the provided context, respond with {confident: false, reason: '...'}." This is critical for financial and legal applications.
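The instruction only pays off if the caller actually checks for the abstention. A sketch, assuming the JSON abstention shape from the tip (`is_abstention` is a hypothetical helper):

```python
import json

LOW_CONFIDENCE_CLAUSE = (
    "If you cannot answer with high confidence based on the provided "
    'context, respond with {"confident": false, "reason": "..."} instead.'
)

def is_abstention(raw: str) -> bool:
    """Detect the structured 'I don't know' reply so the caller can
    route to a human or a fallback instead of trusting the answer."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("confident") is False
```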
Tip 04
Include 2โ€“3 few-shot examples for edge cases
For complex classification or extraction tasks, include examples of the hardest cases โ€” not the easy ones. The model already handles easy cases. Examples teach it how to handle edge inputs.
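In practice this means hard-case examples become alternating user/assistant turns ahead of the real input. A sketch with invented labels and texts for a hypothetical refund classifier (none of these examples are from the original post):

```python
# Illustrative hard cases: ambiguous inputs the model tends to get wrong.
FEW_SHOT = [
    {"input": "The item arrived fine but I ordered it by mistake.",
     "label": "refund_buyer_error"},
    {"input": "I was charged twice but only one package arrived.",
     "label": "refund_duplicate_charge"},
]

def build_messages(user_input: str) -> list[dict]:
    """Prepend few-shot turns, then the real input as the final user turn."""
    messages = []
    for ex in FEW_SHOT:
        messages.append({"role": "user", "content": ex["input"]})
        messages.append({"role": "assistant", "content": ex["label"]})
    messages.append({"role": "user", "content": user_input})
    return messages
```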
Tip 05
Set token budgets per task type
Use max_tokens aggressively. A summarisation task shouldn't get 4,096 tokens. Tight budgets force concise outputs, reduce cost, and improve latency. We set separate budgets for: classification (150), extraction (400), generation (1,200).
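The budgets above can live in one table, with a lookup that fails loudly on unknown task types instead of silently defaulting high (the helper is a sketch; the numbers are the post's):

```python
# Per-task max_tokens budgets, as quoted above.
TOKEN_BUDGETS = {"classification": 150, "extraction": 400, "generation": 1200}

def max_tokens_for(task_type: str) -> int:
    """Refuse to guess: an unbudgeted task type is a config bug."""
    if task_type not in TOKEN_BUDGETS:
        raise ValueError(f"no token budget defined for task type: {task_type}")
    return TOKEN_BUDGETS[task_type]
```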
Tip 06
Temperature 0 for deterministic tasks, 0.3โ€“0.7 for creative
For classification, extraction, or validation โ€” use temperature: 0. For copywriting, suggestions, or explanations โ€” use 0.3โ€“0.7. Many teams forget to set this and wonder why outputs vary on identical inputs.
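One way to make "don't forget to set it" enforceable is to derive temperature from the task type rather than passing it by hand each time (a sketch; the task names follow the tip, the default creative value is an assumption):

```python
DETERMINISTIC_TASKS = {"classification", "extraction", "validation"}

def temperature_for(task_type: str, creative_default: float = 0.5) -> float:
    """Pin deterministic tasks to 0 so identical inputs give
    identical outputs; creative tasks get a mid-range value."""
    return 0.0 if task_type in DETERMINISTIC_TASKS else creative_default
```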
Tip 07
Separate system prompt from user context
Never concatenate instructions with user data into one string. Use the system role for fixed instructions and the user role for dynamic context. This improves model compliance and makes prompt versioning much easier.
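Concretely, the fixed prompt stays in a versioned constant and only the dynamic context flows in per request. The dict shape below loosely mirrors the Anthropic Messages API, where system instructions are a top-level field; the prompt text is illustrative:

```python
# Fixed, versioned instructions -- never interpolate user data into this.
SYSTEM_PROMPT = "You are a support-ticket classifier. Reply with one label."

def build_request(user_context: str) -> dict:
    """Keep instructions in the system slot, dynamic data in the user slot."""
    return {
        "system": SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": user_context}],
    }
```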
Tip 08
Log inputs and outputs โ€” always
You can't debug what you can't see. Log every prompt, response, token count, latency, and model version. We use a lightweight middleware layer that intercepts all API calls and stores them in Elasticsearch with a 30-day retention window.
Tip 09
Use claude-sonnet before reaching for claude-opus
Opus is 5ร— the cost of Sonnet. In our production systems, Sonnet handled 95% of tasks equally well. Only escalate to Opus for tasks that demonstrably need deeper reasoning. Build an automatic fallback: if Sonnet confidence < 0.7, retry with Opus.
Tip 10
Version your prompts like code
Store prompts in source control with semantic versions (v1.2.0). Never edit a production prompt in-place โ€” always increment the version and A/B test before full rollout. A single word change in a system prompt can shift output quality dramatically.
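One lightweight shape for this, sketched in Python: a registry keyed by semantic version, with rollout controlled by a pointer rather than by editing prompt text in place. The prompt names, versions, and texts below are all invented for illustration:

```python
# Prompts live in source control next to the code that uses them.
PROMPTS = {
    "summarise_ticket": {
        "v1.2.0": "Summarise the ticket in at most two sentences.",
        "v1.3.0": "Summarise the ticket in at most two sentences, "
                  "then state the customer's requested action.",
    },
}

# Rollout is a one-line pointer change, reviewable and revertible.
ACTIVE = {"summarise_ticket": "v1.2.0"}

def get_prompt(name: str) -> str:
    """Resolve a prompt by name through the active-version pointer."""
    return PROMPTS[name][ACTIVE[name]]
```

Flipping `ACTIVE` to `v1.3.0` after an A/B test is then an ordinary code review, with the old version still in place for rollback.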
โœ… BonusBuild an eval harness with 50โ€“100 golden input/output pairs. Run it against every prompt change before deploying. This is the single highest ROI investment in any LLM system.
