Agents
The Core Philosophy: Agents Are Software, Not Models
At Agent CI, we hold a fundamental belief that distinguishes us from traditional ML approaches: agents are software applications, not models. This philosophy drives every decision in our platform design and shapes how we think about agent development, deployment, and maintenance.
"Prototyping agents is relatively straightforward and you can just run them manually as you adjust your prompts to see that they are performing adequately. But as soon as you get them into an environment where they're expected to maintain a baseline and improve above that, we need ways that we can actually quantify and track that over time as multiple developers are added to the team and are making changes to different parts of the system."
From Prototype to Production: The Scaling Challenge
The journey from agent prototype to production deployment raises challenges that feel new, but that traditional software development solved long ago. Manually testing a proof-of-concept agent works well; production agents require systematic quality assurance that keeps pace as teams and use cases grow.
The Exponential Complexity Problem
Manual testing doesn't scale. Exponential complexity does.
- Prototype Stage: Single developer, simple workflows, manual testing works perfectly
- Production Stage: Multiple team members, real users, exponential test scenarios
- The Reality: Every prompt change creates exponential validation requirements that no human team can adequately test
Why Traditional ML Approaches Fall Short
Most platforms treat agents as machine learning models requiring:
- Training cycles, datasets, and hyperparameters
- Fine-tuning and optimization pipelines
- Separate infrastructure for ML workflows
- Data science methodologies and experiment tracking
We believe this is fundamentally wrong. Agents are software applications that happen to use AI models as components.
The Software-First Approach
Use the Tools You Already Know
Agent development should leverage the mature ecosystem of software engineering:
- Git, not experiment tracking systems - Version control that developers understand
- CI/CD, not model registries - Deployment pipelines that scale with teams
- Logs and traces, not tensor analysis - Debugging approaches that work for applications
- Pull requests, not notebook commits - Code review processes that ensure quality
Clean Architecture Without Compromise
Tests don't live in production code. Just as you learned to write unit tests in separate /tests directories, agent evaluation belongs in configuration files, not scattered throughout your agent implementation.
- No @monitor, @track, or @observe decorators cluttering your business logic
- Framework conventions, not manual instrumentation - We extract evaluation data automatically
- Separation of concerns - Agent code expresses intent; infrastructure measures performance
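To make the separation concrete, here is a minimal, purely illustrative sketch; the class, file, and prompt names are hypothetical, not Agent CI APIs. The agent module carries only business logic, while evaluation criteria would live in a separate configuration file.

# agent.py - business logic only; no @monitor/@track/@observe decorators.
# Evaluation criteria would live in a separate config file outside this module.
# All names here are illustrative, not Agent CI APIs.
from dataclasses import dataclass


@dataclass
class SupportAgent:
    """Expresses intent only; infrastructure measures performance elsewhere."""
    system_prompt: str = "You are a helpful support assistant."

    def respond(self, ticket: str) -> str:
        # In a real agent this would call an LLM; kept as a stub for clarity.
        return f"{self.system_prompt}\nUser ticket: {ticket}"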
Git as the Source of Truth
Version control is your time machine. Git isn't just for code backup—it's your application's memory. Every commit tells a story, every diff shows evolution.
When we make Git the source of truth for prompts and agent behavior, we're not adding a feature. We're acknowledging that version control already solved this problem decades ago.
Agent Architecture and Composition
Multi-Prompt Agent Structure
Agents are composed of multiple prompt types that work together in sophisticated workflows:
- Router/Decision prompts - Choosing correct tools or sub-agents based on input analysis
- Tool execution prompts - Generating valid parameters and API calls for external systems
- Planning prompts - Logical step decomposition and task sequencing
- Reflection prompts - Plan adherence validation and infinite loop detection
- Response prompts - User-facing output generation and formatting
While individual prompts are versioned separately through Git, evaluations run at the agent level since agents incorporate multiple prompts as a cohesive unit. This approach evaluates complete agent behavior rather than isolated prompt performance.
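As a rough illustration of how these prompt types combine into a single evaluable unit, the sketch below hand-rolls the composition; the class, prompt constants, and the llm callable are hypothetical and not part of Agent CI or any specific framework.

# Illustrative composition of prompt roles; names are hypothetical.
ROUTER_PROMPT = "Pick one tool for the request: search, calculator, or none."
PLANNING_PROMPT = "Break the task into ordered steps before acting."
RESPONSE_PROMPT = "Write a concise, user-facing answer from the results below."


class MultiPromptAgent:
    """Prompts are versioned individually but evaluated together as one agent."""

    def __init__(self, llm):
        self.llm = llm  # any callable: prompt string -> completion string

    def run(self, user_input: str) -> str:
        tool = self.llm(f"{ROUTER_PROMPT}\n\n{user_input}")    # router/decision
        plan = self.llm(f"{PLANNING_PROMPT}\n\n{user_input}")  # planning
        results = f"tool={tool!r}, plan={plan!r}"              # tool execution (stubbed)
        return self.llm(f"{RESPONSE_PROMPT}\n\n{results}")     # response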
Framework-Aware Intelligence
We support specific agent frameworks to enable automatic data extraction without manual instrumentation:
Supported Frameworks (in order of market popularity):
- LangChain - Most popular with extensive ecosystem support
- LangGraph - Graph-based workflows for complex multi-step reasoning
- Pydantic AI - Type-safe framework with clean design and validation
- LlamaIndex - RAG and data-augmented applications
- OpenAI Agents SDK - Direct integration with OpenAI capabilities
- Semantic Kernel - Microsoft's cross-platform enterprise framework
- CrewAI - Multi-agent collaboration framework
- Azure AI Agents - Cloud-native platform with Azure integration
- Google ADK - Google's agent development toolkit
This "convention over configuration" approach provides elegant zero-setup experience for supported frameworks while maintaining a well-defined Python API for custom implementations.
Automatic Prompt Versioning
Git-native prompt tracking eliminates separate management interfaces:
# Example: Automatic prompt extraction and versioning
agent_one = Agent(
    prompt="What is the capital of France?"  # Tracked as 'agent_one'
)

GLOBAL_PROMPT = "You are a helpful assistant."

agent_two = Agent(
    prompt=(
        GLOBAL_PROMPT  # Tracked separately as 'GLOBAL_PROMPT'
        + " What is the capital of France?"  # Tracked as 'agent_two'
    )
)
Every prompt change is tracked through Git commit history, enabling:
- Zero-config versioning through normal Git workflows
- Native diff views showing prompt changes alongside performance impact
- Environment mapping associating branches with deployment stages
- Rollback simplicity using standard Git commands
Production Deployment Philosophy
Branch-Based Environment Strategy
Your Git structure defines your deployment pipeline:
- main branch → Production environment
- staging branch → Staging environment
- Feature branches → Isolated development environments
No configuration files, no environment variables needed. Your existing Git workflow becomes your agent deployment strategy.
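As a rough sketch of the idea (not Agent CI internals), the snippet below derives a deployment environment directly from the current Git branch; the mapping and the dev- prefix are illustrative assumptions.

# Illustrative only: map the current Git branch to a deployment environment.
import subprocess

BRANCH_TO_ENV = {"main": "production", "staging": "staging"}


def current_environment() -> str:
    branch = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # Any other branch gets its own isolated development environment.
    return BRANCH_TO_ENV.get(branch, f"dev-{branch}")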
Live Branch Environments
Every branch gets its own running agent environment for:
- Real-time testing of changes before merging
- Stakeholder feedback on branch-specific agent instances
- Interactive validation during development
- Cross-environment comparison using Git commit relationships
The Foundation for Systematic Improvement
This software-first approach creates the foundation for:
- Quantifiable metrics that replace gut feelings with data
- Multi-developer coordination with clear performance attribution
- Regression prevention through automated evaluation gates
- Production confidence backed by systematic quality assurance
By treating agents as the software applications they are, we enable the same disciplined engineering practices that have built reliable software systems for decades. The result: agents that scale from prototype to production without architectural rewrites, supported by the mature tooling ecosystem that software teams already know and trust.
Next Steps
- Evaluations - Learn how systematic evaluation enables confident agent development
- Installation - Set up Agent CI with your existing agent codebase
- Quick Start - Get your first evaluation running in 6 minutes