
Agent Patterns That Actually Work in Production

Lessons learned from building production AI agents: what works, what doesn't, and why most agent frameworks miss the mark on real-world complexity.


After building a dozen AI agent systems that handle real money, real compliance requirements, and real user frustration, I've learned that most agent frameworks solve the wrong problems.

The blog posts show autonomous agents booking flights and ordering groceries. The reality is messier: agents that need to work within existing systems, handle edge cases gracefully, and maintain audit trails that satisfy both regulators and angry users.

Here are the patterns that actually work.

The Multi-Agent Myth

Common Wisdom: Break complex tasks into multiple specialized agents that collaborate.

Reality Check: Agent-to-agent communication is where things break down.

Most production "multi-agent" systems are actually:

  • One orchestrator agent that's really just a state machine
  • Multiple specialized functions that happen to use LLM calls
  • A lot of error handling code

The Communication Tax

Every agent boundary is a point of failure. Message passing, state synchronization, and error recovery across agents add complexity faster than they add capability.

What Works Instead: The Single Agent Pattern

class ProductionAgent:
    def __init__(self):
        self.tools = {
            'document_processor': DocumentProcessor(),
            'policy_checker': PolicyEngine(),
            'report_generator': ReportGenerator()
        }
        self.state_machine = StateMachine()
    
    def execute_workflow(self, task):
        # Single agent with multiple tools
        # Clear state transitions
        # Centralized error handling
        # Audit trail in one place
        pass

Why it works:

  • Single point of failure (which you can actually debug)
  • Shared context across all operations
  • Simpler error recovery
  • Easier to audit and test
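
For concreteness, here's a minimal sketch of what that single-agent loop can look like once fleshed out. The WorkflowState enum, the tool names, and the transition table are hypothetical placeholders rather than the exact classes from my systems; the point is that dispatch, error handling, and the audit trail all live in one loop.

from enum import Enum, auto
import logging

logger = logging.getLogger("agent.audit")

class WorkflowState(Enum):
    # Hypothetical states for illustration
    INGEST = auto()
    CHECK_POLICY = auto()
    REPORT = auto()
    DONE = auto()
    FAILED = auto()

class SingleAgent:
    """One agent, many tools: dispatch, errors, and audit logging in one place."""

    def __init__(self, tools):
        self.tools = tools                      # e.g. {'document_processor': callable, ...}
        self.state = WorkflowState.INGEST

    def execute_workflow(self, task):
        # Explicit state -> (tool, next state) transitions
        transitions = {
            WorkflowState.INGEST: ('document_processor', WorkflowState.CHECK_POLICY),
            WorkflowState.CHECK_POLICY: ('policy_checker', WorkflowState.REPORT),
            WorkflowState.REPORT: ('report_generator', WorkflowState.DONE),
        }
        while self.state not in (WorkflowState.DONE, WorkflowState.FAILED):
            tool_name, next_state = transitions[self.state]
            try:
                task = self.tools[tool_name](task)          # centralized tool call
                logger.info("step=%s tool=%s ok", self.state.name, tool_name)
                self.state = next_state
            except Exception as exc:
                logger.error("step=%s tool=%s failed: %s", self.state.name, tool_name, exc)
                self.state = WorkflowState.FAILED
        return self.state, task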

The Planning Fallacy

Common Wisdom: Agents should plan multi-step workflows before execution.

Reality Check: Plans don't survive contact with production systems.

I've seen agents generate beautiful 10-step plans that fail on step 2 because:

  • External API changed response format
  • User provided incomplete information
  • Database timeout occurred
  • Regulatory requirement changed mid-process

What Works Instead: The Adaptive Execution Pattern

class AdaptiveAgent:
    def execute(self, goal):
        while not self.is_complete(goal):
            # Assess current situation
            context = self.gather_context()
            
            # Plan only the next immediate step
            next_action = self.decide_next_action(context, goal)
            
            # Execute with error handling
            result = self.execute_with_recovery(next_action)
            
            # Adapt based on result
            if result.failed:
                goal = self.adjust_goal(goal, result.error)
            
            self.update_state(result)

Key insight: Successful agents are reactive, not predictive. They respond to what actually happens rather than what should happen.

The Tool Integration Reality

Common Wisdom: Give agents access to APIs and they'll figure out how to use them.

Reality Check: Production systems have authentication, rate limits, error conditions, and undocumented behaviors that LLMs can't reason about effectively.

The Wrapper Pattern That Works

class ReliableToolWrapper:
    def __init__(self, api_client):
        self.client = api_client
        self.circuit_breaker = CircuitBreaker()
        self.retry_policy = ExponentialBackoff()
        
    def execute_with_context(self, action, context):
        """
        Handles all the production concerns:
        - Authentication refresh
        - Rate limiting
        - Error classification  
        - Retry logic
        - Fallback strategies
        - Audit logging
        """
        with self.circuit_breaker:
            return self.retry_policy.execute(
                lambda: self._execute_safely(action, context)
            )
    
    def _execute_safely(self, action, context):
        # The actual API call with all error handling
        pass

The wrapper does what LLMs can't:

  • Handle authentication token refresh
  • Implement exponential backoff
  • Classify errors (retry vs. fail fast)
  • Maintain rate limit budgets
  • Provide consistent error messages
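
As one concrete example of "retry vs. fail fast," here's a rough sketch of the kind of retry policy that lives inside such a wrapper. The status-code sets and backoff numbers are illustrative assumptions, not values from any particular API.

import random
import time

# Illustrative classification: which HTTP-style status codes are worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}   # rate limits, transient server errors
FATAL_STATUS = {400, 401, 403, 404}            # bad requests / auth problems: fail fast

class TransientError(Exception):
    """Raised for failures a retry might fix (timeouts, dropped connections)."""

def call_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Retry transient failures with exponential backoff and jitter; fail fast otherwise."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = call()
        except TransientError:
            if attempt == max_attempts:
                raise
        else:
            status = getattr(response, "status_code", 200)
            if status in FATAL_STATUS:
                raise RuntimeError(f"non-retryable error: {status}")
            if status not in RETRYABLE_STATUS:
                return response                 # success
            if attempt == max_attempts:
                raise RuntimeError(f"gave up after {max_attempts} attempts ({status})")
        # Exponential backoff with jitter so parallel agents don't retry in lockstep
        delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
        time.sleep(delay + random.uniform(0, delay / 2))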

The State Management Problem

Common Wisdom: Agents should be stateless for scalability.

Reality Check: Real workflows have state that matters: user context, partial results, approval chains, regulatory checkpoints.

The Persistent Context Pattern

class StatefulAgent:
    def __init__(self, workflow_id):
        self.context = WorkflowContext.load(workflow_id)
        self.checkpoint_manager = CheckpointManager()
    
    def execute_step(self, step):
        # Save state before risky operations
        checkpoint = self.checkpoint_manager.create(self.context)
        
        try:
            result = self.execute_with_tools(step)
            self.context.update(result)
            self.checkpoint_manager.commit(checkpoint)
            
        except RecoverableError as e:
            # Rollback to checkpoint
            self.context = self.checkpoint_manager.restore(checkpoint)
            raise RetryWithContext(e, self.context)
            
        except FatalError as e:
            # Save failure context for human review
            self.context.mark_failed(e)
            self.checkpoint_manager.save_failure_state(self.context)
            raise

Why state persistence matters:

  • User can resume interrupted workflows
  • Audit requirements need complete history
  • Error recovery can restart from checkpoints
  • Compliance reviews need full context
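
To make the checkpoint idea concrete, here's a minimal sketch of a file-backed checkpoint store. In a real system this would be a database table with retention and access controls; the JSON-on-disk version below is only an assumption for illustration.

import json
from pathlib import Path

class FileCheckpointStore:
    """Toy checkpoint store: one JSON file per workflow, so interrupted runs can resume."""

    def __init__(self, root="checkpoints"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def save(self, workflow_id, context):
        path = self.root / f"{workflow_id}.json"
        path.write_text(json.dumps(context, indent=2))   # full context, auditable
        return path

    def load(self, workflow_id):
        path = self.root / f"{workflow_id}.json"
        if not path.exists():
            return {}                                    # fresh workflow
        return json.loads(path.read_text())

# Resuming an interrupted workflow picks up exactly where the last checkpoint left off.
store = FileCheckpointStore()
store.save("wf-123", {"step": "policy_check", "approvals": ["ops"], "status": "in_progress"})
resumed = store.load("wf-123")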

The Human-in-the-Loop Reality

Common Wisdom: Agents should be fully autonomous.

Reality Check: Production systems need human oversight, approval workflows, and escalation paths.

The most successful agents I've built have clear handoff patterns:

The Escalation Pattern

class HumanIntegrationAgent:
    def __init__(self):
        self.confidence_threshold = 0.8
        self.approval_required_keywords = ['payment', 'delete', 'approve']
        
    def execute_with_oversight(self, task):
        plan = self.generate_plan(task)
        
        # Check if human approval needed
        if (plan.confidence < self.confidence_threshold or 
            self.requires_approval(plan)):
            
            approval_request = self.create_approval_request(plan)
            return self.wait_for_human_approval(approval_request)
        
        # Execute autonomously with monitoring
        return self.execute_with_monitoring(plan)
    
    def requires_approval(self, plan):
        return any(keyword in plan.description.lower() 
                  for keyword in self.approval_required_keywords)

The pattern works because:

  • Agents handle routine cases autonomously
  • Humans review edge cases and high-stakes decisions
  • Clear escalation criteria prevent both micro-management and disasters
  • Audit trail shows both agent reasoning and human oversight

The Monitoring & Observability Gap

Common Wisdom: Agent frameworks will provide built-in monitoring.

Reality Check: You need custom observability for production agent systems.

What to Monitor

class AgentObservability:
    def __init__(self):
        self.metrics = {
            'task_completion_rate': TaskCompletionMetric(),
            'error_rate_by_type': ErrorClassificationMetric(),
            'human_escalation_rate': EscalationMetric(),
            'cost_per_task': CostTrackingMetric(),
            'user_satisfaction': SatisfactionMetric()
        }
    
    def track_execution(self, agent_execution):
        with self.trace_context():
            # Trace every LLM call with cost
            # Monitor tool execution times  
            # Track state transitions
            # Log confidence scores
            # Measure end-to-end latency
            pass

Key metrics for production agents:

  • Task Success Rate: Not just "didn't crash" but "achieved user goal"
  • Error Classification: Distinguish agent errors from system errors from user errors
  • Cost per Task: LLM costs add up fast in production
  • Human Escalation Rate: Are agents handling appropriate complexity?
  • User Satisfaction: The ultimate measure of agent effectiveness
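
As a sketch of how two of those metrics might be tracked, here's a small cost-and-success tracker. The per-token prices are made-up placeholders; substitute your model's real billing rates and push the numbers to whatever metrics backend you already run.

from dataclasses import dataclass

# Assumed prices per 1K tokens for illustration; real prices vary by model and provider.
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015

@dataclass
class TaskMetrics:
    tasks_total: int = 0
    tasks_succeeded: int = 0      # "achieved user goal", not just "didn't crash"
    total_cost: float = 0.0

    def record(self, input_tokens, output_tokens, goal_achieved):
        cost = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
             + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
        self.tasks_total += 1
        self.tasks_succeeded += int(goal_achieved)
        self.total_cost += cost

    @property
    def success_rate(self):
        return self.tasks_succeeded / self.tasks_total if self.tasks_total else 0.0

    @property
    def cost_per_task(self):
        return self.total_cost / self.tasks_total if self.tasks_total else 0.0

metrics = TaskMetrics()
metrics.record(input_tokens=4200, output_tokens=900, goal_achieved=True)
metrics.record(input_tokens=3100, output_tokens=700, goal_achieved=False)
print(f"success={metrics.success_rate:.0%} cost/task=${metrics.cost_per_task:.3f}")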

The Security Model Most Miss

Common Wisdom: Run agents in sandboxed environments.

Reality Check: Agents need access to real systems with real permissions, making security complex.

The Principle of Least Privilege for Agents

class SecureAgentContext:
    def __init__(self, user_id, task_type):
        # Dynamic permission based on task and user
        self.permissions = PermissionManager.get_agent_permissions(
            user_id=user_id,
            task_type=task_type,
            time_limit=timedelta(hours=1)
        )
        
        # Audit every permission use
        self.audit_logger = AuditLogger(user_id, task_type)
    
    def execute_with_permissions(self, action):
        if not self.permissions.allows(action):
            self.audit_logger.log_denied_action(action)
            raise PermissionDenied(action)
            
        self.audit_logger.log_permitted_action(action)
        return self.execute(action)

Security principles that work:

  • Agents inherit user permissions, not system permissions
  • Time-bounded access tokens
  • Comprehensive audit logging
  • Explicit deny-by-default policies
  • Regular permission reviews
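
A minimal sketch of the deny-by-default, time-bounded grant idea follows. The permission names and the one-hour expiry are assumptions for illustration, not a prescription.

from datetime import datetime, timedelta, timezone

class TimeBoundedPermissions:
    """Deny-by-default grant: only listed actions, and only until the grant expires."""

    def __init__(self, allowed_actions, ttl=timedelta(hours=1)):
        self.allowed_actions = set(allowed_actions)
        self.expires_at = datetime.now(timezone.utc) + ttl

    def allows(self, action):
        if datetime.now(timezone.utc) >= self.expires_at:
            return False                        # expired grants deny everything
        return action in self.allowed_actions   # anything not explicitly granted is denied

# Grant scoped to one task type for one hour; everything else is denied (and audited).
perms = TimeBoundedPermissions({"read_invoice", "generate_report"})
assert perms.allows("read_invoice")
assert not perms.allows("delete_invoice")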

The Deployment Patterns

Common Wisdom: Deploy agents like any other service.

Reality Check: Agents have different failure modes and operational needs.

The Circuit Breaker Pattern for LLM Costs

class CostAwareAgent:
    def __init__(self):
        self.cost_tracker = CostTracker()
        self.circuit_breaker = CostCircuitBreaker(
            cost_threshold_per_hour=100.00,
            error_rate_threshold=0.1
        )
    
    def execute_with_cost_control(self, task):
        with self.circuit_breaker:
            estimated_cost = self.estimate_task_cost(task)
            
            if not self.cost_tracker.can_afford(estimated_cost):
                raise CostBudgetExceeded(estimated_cost)
                
            return self.execute(task)

Operational patterns that work:

  • Cost circuit breakers prevent runaway LLM bills
  • Gradual rollout with synthetic tasks
  • A/B testing between agent and human workflows
  • Detailed cost attribution per user/department
  • Automated rollback on quality degradation
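
Here's a rough sketch of what a cost circuit breaker can look like: a rolling one-hour budget that refuses new LLM calls once spend crosses a threshold. The threshold and window are assumptions; tune them to your own budget.

import time
from collections import deque

class CostBudgetExceeded(Exception):
    pass

class HourlyCostBreaker:
    """Trips when spend inside a rolling one-hour window would cross the threshold."""

    def __init__(self, threshold_per_hour=100.0):
        self.threshold = threshold_per_hour
        self.spend = deque()                    # (timestamp, cost) pairs

    def _prune(self, now):
        while self.spend and now - self.spend[0][0] > 3600:
            self.spend.popleft()

    def check(self, estimated_cost):
        """Raise before an expensive call if it would blow the hourly budget."""
        now = time.time()
        self._prune(now)
        current = sum(cost for _, cost in self.spend)
        if current + estimated_cost > self.threshold:
            raise CostBudgetExceeded(
                f"hourly spend ${current:.2f} + ${estimated_cost:.2f} exceeds ${self.threshold:.2f}"
            )

    def record(self, actual_cost):
        self.spend.append((time.time(), actual_cost))

breaker = HourlyCostBreaker(threshold_per_hour=100.00)
breaker.check(estimated_cost=0.40)     # raises CostBudgetExceeded once the budget is spent
breaker.record(actual_cost=0.37)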

What Actually Works in Production

After building agents that handle millions in transactions and pass government audits, here's what I've learned works:

  1. Single Agent with Multiple Tools beats multi-agent complexity
  2. Reactive Execution beats elaborate planning
  3. Explicit Human Handoffs beat full autonomy
  4. Custom Monitoring beats framework promises
  5. User Permission Models beat agent permission models
  6. Cost Controls beat unlimited LLM access

The most successful production agents I've built look boring: they're essentially smart state machines with LLM reasoning, robust error handling, and clear escalation paths.

They don't book your flights autonomously. They do eliminate 80% of routine work while keeping humans in the loop for everything that matters.

Building production AI agents? Focus on reliability patterns first, AI capabilities second. Your users will thank you when their agents actually work on Tuesday morning.

Want to Discuss Agent Architecture?

I'm always interested in comparing notes on production agent patterns. Reach out if you're building something real.