Getting Started with Agent CI: 5 Minutes to Your First Automated Evaluation

Test Your Agents Like You Test Your Code

Your agent works great in development. But will it work in production? With Agent CI, you can find out before you deploy.

In 5 minutes, you'll have automated testing running on every pull request. Here's how:

Step 1: Connect Your Repository

Install Agent CI on Your GitHub Organization

1. Go to agent-ci.com and click "Login with GitHub"
2. Authorize Agent CI to verify your GitHub identity (you'll see a GitHub authorization page)
3. Install the Agent CI GitHub App:
   - You'll be redirected to GitHub's app installation page
   - Choose your organization from the dropdown (only organization owners can install apps)
   - Select repository access:
     - Choose "All repositories" for organization-wide access
     - Or choose "Only select repositories" and pick specific repos
   - Click "Install" to complete the GitHub App installation

Note: Only organization owners can install GitHub Apps. If you're not an owner, ask an organization owner to complete this step.

Set Up Agent CI

1. Create your organization in Agent CI (you'll be redirected back after GitHub installation)
2. Click "Create New Application" and select your repository from the dropdown
3. Confirm permissions: Agent CI will show you exactly what access it has to your selected repositories

That's it. Agent CI is now monitoring your repository for pull requests and can run evaluations.

Step 2: Write Your First Test

Create a folder called .agentci in your repository root and add this file:

.agentci/basic-test.toml

[eval]
description = "Basic agent test"
type = "accuracy"
targets.agents = ["*"]

[[eval.cases]]
prompt = "What is 2 + 2?"
output = "4"

[[eval.cases]]
prompt = "Hello!"
output = "{{*}}"  # Matches any response

Commit this file on a new branch and push it. That's your first agent evaluation!

Step 3: See It Work

Open a pull request that includes your new .agentci/basic-test.toml file. Within seconds, the evaluation result appears on the pull request:

✅ Agent CI: Evaluation complete
   └── ✅ basic-test (2/2 test cases passed)

That's it! You now have automated agent testing on every pull request.
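To make the "2/2 test cases passed" line concrete, here is a rough sketch of how an accuracy evaluation could be scored. It is an illustration only: Agent CI's actual matching rules are not described in this guide, so the exact-match comparison, the treatment of {{*}} as "any text", and the names output_matches, run_cases, and toy_agent are all assumptions.

```python
import re
from typing import Callable

WILDCARD = "{{*}}"  # placeholder from the example file; assumed to match any text


def output_matches(expected: str, actual: str) -> bool:
    """Compare an agent response to an expected output (assumed semantics)."""
    # Escape each literal piece, then join the pieces with ".*" for wildcards.
    pieces = [re.escape(piece) for piece in expected.split(WILDCARD)]
    pattern = ".*".join(pieces)
    return re.fullmatch(pattern, actual.strip(), flags=re.DOTALL) is not None


def run_cases(agent: Callable[[str], str], cases: list[dict]) -> tuple[int, int]:
    """Return (passed, total) for a list of {'prompt', 'output'} cases."""
    passed = sum(output_matches(c["output"], agent(c["prompt"])) for c in cases)
    return passed, len(cases)


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent so the sketch runs on its own."""
    return "4" if "2 + 2" in prompt else "Hi there!"


cases = [
    {"prompt": "What is 2 + 2?", "output": "4"},
    {"prompt": "Hello!", "output": "{{*}}"},
]
passed, total = run_cases(toy_agent, cases)
print(f"basic-test ({passed}/{total} test cases passed)")
```

With the stand-in agent, both cases pass: the first by exact match and the second because the wildcard accepts any response.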

Why This Works So Well

Configuration files in Git = Separation of concerns

- Your agent code stays clean
- Tests are versioned with your code
- Changes are auditable and reviewable
- Teams can collaborate on test criteria

Zero setup, maximum power

- No servers to manage
- No complex deployment configurations
- Automatic environment creation
- Native GitHub integration

Level Up: More Evaluation Types

Once you're comfortable with basic accuracy tests, try these:

Performance testing:

[eval]
type = "performance"
[eval.thresholds]
max_latency_ms = 2000
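As a rough picture of what a latency threshold checks, the sketch below times a single agent call and compares it with max_latency_ms. How Agent CI actually measures latency (per case, averaged, or by percentile) is not specified here, so the single-call timing, the inclusive comparison, and the names measure_latency_ms, within_threshold, and slow_echo are assumptions.

```python
import time
from typing import Callable

MAX_LATENCY_MS = 2000  # mirrors max_latency_ms in the config above


def measure_latency_ms(agent: Callable[[str], str], prompt: str) -> float:
    """Time one agent call in milliseconds using a monotonic clock."""
    start = time.perf_counter()
    agent(prompt)
    return (time.perf_counter() - start) * 1000.0


def within_threshold(latency_ms: float, max_latency_ms: float = MAX_LATENCY_MS) -> bool:
    """A case passes when its latency does not exceed the threshold (assumed inclusive)."""
    return latency_ms <= max_latency_ms


def slow_echo(prompt: str) -> str:
    """Stand-in agent that sleeps briefly to simulate work."""
    time.sleep(0.05)
    return prompt


latency = measure_latency_ms(slow_echo, "What is 2 + 2?")
print(f"{latency:.0f} ms, pass={within_threshold(latency)}")
```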

Safety checks:

[eval]
type = "safety"
[eval.templates]
prompt_injection = true
harmful_content = true

Consistency verification:

[eval]
type = "consistency"
iterations = 5  # Run 5 times, check variance
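One way to picture a consistency check: call the agent with the same prompt several times and measure how often it returns its most common answer. The config above does not say how variance is computed, so the agreement-rate metric and the names agreement_rate and flaky_agent are assumptions for illustration.

```python
from collections import Counter
from itertools import cycle
from typing import Callable

ITERATIONS = 5  # mirrors iterations = 5 in the config above


def agreement_rate(agent: Callable[[str], str], prompt: str, iterations: int = ITERATIONS) -> float:
    """Fraction of runs that returned the most common (normalized) response."""
    responses = [agent(prompt).strip().lower() for _ in range(iterations)]
    most_common_count = Counter(responses).most_common(1)[0][1]
    return most_common_count / iterations


# Stand-in agent: cycles through fixed answers so the sketch is reproducible.
_answers = cycle(["4", "4", "four", "4", "4"])


def flaky_agent(prompt: str) -> str:
    return next(_answers)


rate = agreement_rate(flaky_agent, "What is 2 + 2?")
print(f"agreement: {rate:.0%}")  # 4 of the 5 runs agree
```

A team might then fail the evaluation when the rate drops below a chosen cutoff. Exact-string agreement is a strict measure, so free-form responses would likely need a looser comparison.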

Next Steps

Add more tests as you need them:

- Performance thresholds
- Safety checks against prompt injection
- Consistency across multiple runs

Invite your team:

- Everyone can see evaluation results on PRs
- Changes to tests go through code review
- Full audit trail of who changed what

Scale with confidence:

- Production monitoring runs automatically
- Performance trends tracked over time
- Regressions caught before users see them

The 5-Minute Foundation

In just 5 minutes, you've built something powerful:

- Automated testing for AI agents
- Confidence in your deployments
- A foundation that scales with your team

Your agents are no longer black boxes. They're systematically tested, continuously monitored, and ready for production.

Get early access to Agent CI