Agents

The Core Philosophy: Agents Are Software, Not Models

At Agent CI, we hold a fundamental belief that distinguishes us from traditional ML approaches: agents are software applications, not models. This philosophy drives every decision in our platform design and shapes how we think about agent development, deployment, and maintenance.

"Prototyping agents is relatively straightforward and you can just run them manually as you adjust your prompts to see that they are performing adequately. But as soon as you get them into an environment where they're expected to maintain a baseline and improve above that, we need ways that we can actually quantify and track that over time as multiple developers are added to the team and are making changes to different parts of the system."

From Prototype to Production: The Scaling Challenge

The journey from agent prototype to production deployment raises quality-assurance challenges that traditional software development solved long ago. Manually testing a proof-of-concept agent works well enough, but production agents require systematic quality assurance that scales with team size and system complexity.

The Exponential Complexity Problem

Manual testing doesn't scale. Exponential complexity does.

  • Prototype Stage: Single developer, simple workflows, manual testing works perfectly
  • Production Stage: Multiple team members, real users, exponential test scenarios
  • The Reality: Every prompt change interacts with every other prompt, tool, and model choice, so the scenarios that need validation grow exponentially beyond what any human team can adequately test (see the sketch below)
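
To make that growth concrete, here is a minimal back-of-the-envelope sketch; the counts are illustrative assumptions, not measurements from real agents:

# Illustrative arithmetic: why manual validation stops scaling.
# Assume an agent built from 5 prompts, each with 4 candidate wordings.
prompts = 5
variants_per_prompt = 4

# End-to-end behavior depends on the combination of all prompt versions,
# so full coverage means variants ** prompts distinct scenarios.
scenarios = variants_per_prompt ** prompts
print(scenarios)  # 1024 combinations, and that ignores tools and model versions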

Why Traditional ML Approaches Fall Short

Most platforms treat agents as machine learning models requiring:

  • Training cycles, datasets, and hyperparameters
  • Fine-tuning and optimization pipelines
  • Separate infrastructure for ML workflows
  • Data science methodologies and experiment tracking

We believe this is fundamentally wrong. Agents are software applications that happen to use AI models as components.

The Software-First Approach

Use the Tools You Already Know

Agent development should leverage the mature ecosystem of software engineering:

  • Git, not experiment tracking systems - Version control that developers understand
  • CI/CD, not model registries - Deployment pipelines that scale with teams
  • Logs and traces, not tensor analysis - Debugging approaches that work for applications
  • Pull requests, not notebook commits - Code review processes that ensure quality

Clean Architecture Without Compromise

Tests don't live in production code. Just as you learned to write unit tests in separate /tests directories, agent evaluation belongs in configuration files, not scattered throughout your agent implementation.

  • No @monitor, @track, or @observe decorators cluttering your business logic
  • Framework conventions, not manual instrumentation - We extract evaluation data automatically
  • Separation of concerns - Agent code expresses intent; infrastructure measures performance
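
As a minimal sketch of this separation (hypothetical code, not Agent CI's API), agent logic stays plain application code while measurement lives outside it:

# Hypothetical example: business logic with no observability decorators.
# Nothing here imports or references monitoring infrastructure; the
# platform derives evaluation data from framework conventions instead.

def route_request(user_message: str) -> str:
    """Pure business logic: decide which tool handles the request."""
    if "invoice" in user_message.lower():
        return "billing_tool"
    return "general_tool"

# What we avoid: wrapping the same function in instrumentation such as
#   @monitor or @track(project="agents").
# Evaluation criteria belong in separate configuration files, just as
# unit tests live in a separate /tests directory.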

Git as the Source of Truth

Version control is your time machine. Git isn't just for code backup—it's your application's memory. Every commit tells a story, every diff shows evolution.

When we make Git the source of truth for prompts and agent behavior, we're not adding a feature. We're acknowledging that version control already solved this problem decades ago.

Agent Architecture and Composition

Multi-Prompt Agent Structure

Agents are composed of multiple prompt types that work together in sophisticated workflows:

  • Router/Decision prompts - Choosing correct tools or sub-agents based on input analysis
  • Tool execution prompts - Generating valid parameters and API calls for external systems
  • Planning prompts - Logical step decomposition and task sequencing
  • Reflection prompts - Plan adherence validation and infinite loop detection
  • Response prompts - User-facing output generation and formatting

While individual prompts are versioned separately through Git, evaluations run at the agent level since agents incorporate multiple prompts as a cohesive unit. This approach evaluates complete agent behavior rather than isolated prompt performance.
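The sketch below shows how these prompt types might compose into a single unit; the structure and names are illustrative assumptions, not Agent CI conventions:

# Illustrative composition of a multi-prompt agent. Each prompt string is
# versioned separately through Git, but evaluation treats the agent as one unit.

ROUTER_PROMPT = "Decide which tool best answers the user's request."
TOOL_PROMPT = "Produce valid parameters for the selected tool."
RESPONSE_PROMPT = "Format the tool result as a helpful answer."

def run_llm(prompt: str, context: str) -> str:
    """Stand-in for a model call; a real agent would invoke an LLM here."""
    return f"[{prompt}] {context}"

def handle(user_message: str) -> str:
    tool_choice = run_llm(ROUTER_PROMPT, user_message)  # router/decision prompt
    tool_result = run_llm(TOOL_PROMPT, tool_choice)     # tool execution prompt
    return run_llm(RESPONSE_PROMPT, tool_result)        # response prompt

print(handle("Which invoices are overdue?"))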

Framework-Aware Intelligence

We support specific agent frameworks to enable automatic data extraction without manual instrumentation:

Supported Frameworks (in order of market popularity):

  • LangChain - Most popular with extensive ecosystem support
  • LangGraph - Graph-based workflows for complex multi-step reasoning
  • Pydantic AI - Type-safe framework with clean design and validation
  • LlamaIndex - RAG and data-augmented applications
  • OpenAI Agents SDK - Direct integration with OpenAI capabilities
  • Semantic Kernel - Microsoft's cross-platform enterprise framework
  • CrewAI - Multi-agent collaboration framework
  • Azure AI Agents - Cloud-native platform with Azure integration
  • Google ADK - Google's agent development toolkit

This "convention over configuration" approach provides elegant zero-setup experience for supported frameworks while maintaining a well-defined Python API for custom implementations.

Automatic Prompt Versioning

Git-native prompt tracking eliminates separate management interfaces:

# Example: Automatic prompt extraction and versioning
# ('Agent' stands in for the agent class of your chosen framework)
agent_one = Agent(
    prompt="What is the capital of France?"  # Tracked as 'agent_one'
)

GLOBAL_PROMPT = "You are a helpful assistant."
agent_two = Agent(
    prompt=(
        GLOBAL_PROMPT  # Tracked separately as 'GLOBAL_PROMPT'
        + " What is the capital of France?"  # Tracked as 'agent_two'
    )
)

Every prompt change is tracked through Git commit history, enabling:

  • Zero-config versioning through normal Git workflows
  • Native diff views showing prompt changes alongside performance impact
  • Environment mapping associating branches with deployment stages
  • Rollback simplicity using standard Git commands

Production Deployment Philosophy

Branch-Based Environment Strategy

Your Git structure defines your deployment pipeline:

  • main branch → Production environment
  • staging branch → Staging environment
  • Feature branches → Isolated development environments

No configuration files, no environment variables needed. Your existing Git workflow becomes your agent deployment strategy.

Live Branch Environments

Every branch gets its own running agent environment for:

  • Real-time testing of changes before merging
  • Stakeholder feedback on branch-specific agent instances
  • Interactive validation during development
  • Cross-environment comparison using Git commit relationships

The Foundation for Systematic Improvement

This software-first approach creates the foundation for:

  • Quantifiable metrics that replace gut feelings with data
  • Multi-developer coordination with clear performance attribution
  • Regression prevention through automated evaluation gates
  • Production confidence backed by systematic quality assurance

By treating agents as the software applications they are, we enable the same disciplined engineering practices that have built reliable software systems for decades. The result: agents that scale from prototype to production without architectural rewrites, supported by the mature tooling ecosystem that software teams already know and trust.

Next Steps

  • Evaluations - Learn how systematic evaluation enables confident agent development
  • Installation - Set up Agent CI with your existing agent codebase
  • Quick Start - Get your first evaluation running in 6 minutes