At Intuit, I was tasked with reducing the manual effort involved in tax filing workflows. The solution? Build an agentic AI system powered by Anthropic Claude that could autonomously handle multi-step reasoning: asking questions, interpreting documents, and making filing decisions.
Here's exactly how I built it, what worked, and what I'd do differently.
A standard LLM interaction is stateless: you send a prompt, you get a response. An agentic workflow is different. The model is given tools, memory, and a goal, and it figures out the steps to achieve it autonomously.
In our case, the goal was: given a user's tax documents and financial situation, determine the optimal filing strategy and pre-fill their return.
The most critical part of an agentic system is the system prompt. Here's a simplified version of ours:
```
You are a tax filing assistant for TurboTax. Your job is to help
users complete their federal tax return accurately.

You have access to the following tools:

- get_user_documents(userId): Fetch uploaded W2, 1099, receipts
- calculate_deduction(type, amount): Compute deduction eligibility
- validate_filing_rule(rule_id, context): Check IRS compliance
- submit_draft_return(data): Save progress

Rules:
1. Always verify document data before making calculations
2. Never guess: if data is missing, call get_user_documents first
3. Explain each decision to the user in plain language
4. If confidence < 80%, ask a clarifying question before proceeding
```
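For context, here's roughly how a prompt like that, plus the tool schemas, gets sent to Claude. The sketch below builds a Messages API request using only the JDK's `HttpClient`; the model ID, the two-tool subset, and the helper names are illustrative, not our production code.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ClaudeToolSetup {

    // Tool definitions in the Messages API "tools" format. Only two of the
    // four tools are shown; the others follow the same shape.
    static String toolsJson() {
        return """
        [
          {
            "name": "get_user_documents",
            "description": "Fetch uploaded W2, 1099, receipts for a user",
            "input_schema": {
              "type": "object",
              "properties": { "userId": { "type": "string" } },
              "required": ["userId"]
            }
          },
          {
            "name": "calculate_deduction",
            "description": "Compute deduction eligibility",
            "input_schema": {
              "type": "object",
              "properties": {
                "type": { "type": "string" },
                "amount": { "type": "number" }
              },
              "required": ["type", "amount"]
            }
          }
        ]
        """;
    }

    // Assembles a Messages API request; model ID is illustrative.
    static HttpRequest buildRequest(String apiKey, String systemPrompt, String userMessage) {
        String body = """
        {
          "model": "claude-3-5-sonnet-20241022",
          "max_tokens": 2048,
          "system": %s,
          "tools": %s,
          "messages": [ { "role": "user", "content": %s } ]
        }
        """.formatted(quote(systemPrompt), toolsJson(), quote(userMessage));
        return HttpRequest.newBuilder()
                .uri(URI.create("https://api.anthropic.com/v1/messages"))
                .header("x-api-key", apiKey)
                .header("anthropic-version", "2023-06-01")
                .header("content-type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Minimal JSON string escaping, enough for this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }
}
```

In production you'd use a proper JSON library and the official SDK rather than hand-built strings, but the payload shape is the important part: the system prompt and the tool schemas travel together on every request.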
Claude's tool use feature was the core of the agentic loop. When Claude decides it needs to call a tool, it returns a structured JSON response that our orchestrator executes:
```java
// Spring Boot tool dispatcher (simplified)
public String dispatchTool(String toolName, JsonNode input) {
    return switch (toolName) {
        case "get_user_documents" -> documentService.fetch(input.get("userId").asText());
        case "calculate_deduction" -> deductionEngine.calculate(input);
        case "validate_filing_rule" -> irsRuleValidator.check(input);
        default -> throw new UnknownToolException(toolName);
    };
}
```
The loop runs until Claude either completes the filing or asks the user a question it can't answer itself.
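In sketch form, that loop looks like the following. `ModelClient`, `ModelResponse`, and `ToolCall` are hypothetical stand-ins for the Anthropic SDK types, and the stop-reason strings (`tool_use`, `end_turn`) follow the Messages API; the real orchestrator also handled retries, logging, and the confidence rule from the system prompt.

```java
import java.util.ArrayList;
import java.util.List;

public class AgentLoop {

    record ToolCall(String name, String inputJson, String id) {}
    record ModelResponse(String stopReason, String text, List<ToolCall> toolCalls) {}

    // Hypothetical wrapper around the Messages API call
    interface ModelClient {
        ModelResponse send(List<String> messages);
    }

    // The dispatchTool method shown earlier, behind an interface
    interface ToolDispatcher {
        String dispatch(String toolName, String inputJson);
    }

    static String run(ModelClient model, ToolDispatcher tools, String userGoal, int maxTurns) {
        List<String> messages = new ArrayList<>();
        messages.add("user: " + userGoal);
        for (int turn = 0; turn < maxTurns; turn++) {
            ModelResponse resp = model.send(messages);
            if (!"tool_use".equals(resp.stopReason())) {
                // Model is done, or is surfacing a clarifying question to the user
                return resp.text();
            }
            // Execute each requested tool and feed the result back as the next turn
            for (ToolCall call : resp.toolCalls()) {
                String result = tools.dispatch(call.name(), call.inputJson());
                messages.add("tool_result[" + call.id() + "]: " + result);
            }
        }
        throw new IllegalStateException("Agent exceeded " + maxTurns + " turns");
    }
}
```

The `maxTurns` cap matters in practice: without it, a confused model can loop on the same tool indefinitely, and every extra turn costs tokens.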
The before-and-after numbers:

| Metric | Before | After | Improvement |
|---|---|---|---|
| Manual review time per return | 18 min | 4 min | 78% reduction |
| Filing accuracy | 91% | 97% | +6 pts |
| User drop-off rate | 34% | 21% | 38% reduction |
| Avg tokens per session | n/a | ~4,200 | Baseline |