OpenAI Agents SDK Testing & Evaluation

AgentCI offers testing and evaluation for developers building with the OpenAI Agents SDK, OpenAI's native agent framework. Our platform provides automated OpenAI Agents CI that discovers your Agent instances and function tools and validates them through evaluation workflows, all without touching your code.

OpenAI Agents SDK evals with zero code changes required

AgentCI automatically discovers and evaluates OpenAI Agents SDK agents. Coverage includes:

  • Agent discovery: Agent() instances with name, instructions, model, and tools parameters
  • Evaluation types: Accuracy, safety, performance, and tool execution testing
  • CI/CD integration: Automated testing on pull requests via GitHub
  • Zero code changes: No wrappers or modifications to your Python code required

Supported Agent Patterns

Agent Class

from agents import Agent

agent = Agent(
    name="Research Assistant",
    instructions="""You are a research assistant that helps find information.

Use the search tool to find relevant information.""",
    model="gpt-4o",
    tools=[search_web, get_weather],  # function tools defined in your own module
)
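Because discovery happens without touching your code, one way to read these parameters is to parse the module's source instead of importing it. The sketch below is a hypothetical illustration, not AgentCI's actual implementation: it uses Python's standard `ast` module to find `Agent(...)` calls and extract the literal `name`, `instructions`, and `model` values plus the names listed in `tools`. The helper `find_agents` and the sample source `AGENT_SOURCE` are assumptions made for this example.

```python
import ast

# Sample agent module, analyzed as text only (hypothetical example source).
AGENT_SOURCE = '''
from agents import Agent

agent = Agent(
    name="Research Assistant",
    instructions="You are a research assistant.",
    model="gpt-4o",
    tools=[search_web, get_weather],
)
'''


def find_agents(source: str) -> list[dict]:
    """Return name, instructions, model, and tool names for each Agent(...) call.

    The module is parsed, never imported or executed.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        # This sketch only matches calls spelled exactly `Agent(...)`.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "Agent":
            info = {}
            for kw in node.keywords:
                if kw.arg in ("name", "instructions", "model") and isinstance(kw.value, ast.Constant):
                    info[kw.arg] = kw.value.value
                elif kw.arg == "tools" and isinstance(kw.value, ast.List):
                    # Record tools by the identifiers used in the list.
                    info["tools"] = [elt.id for elt in kw.value.elts if isinstance(elt, ast.Name)]
            found.append(info)
    return found


print(find_agents(AGENT_SOURCE))
# [{'name': 'Research Assistant', 'instructions': 'You are a research assistant.', 'model': 'gpt-4o', 'tools': ['search_web', 'get_weather']}]
```

Parsing rather than importing means the agent module never runs, so no API keys or network access are needed to inspect it. A production scanner would also have to handle aliased imports (such as `agents.Agent`) and non-literal arguments, which this sketch skips.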

Supported Tool Patterns

@function_tool Decorator

from agents import function_tool
from typing import Dict, Any
from datetime import datetime

@function_tool
def search_web(query: str) -> Dict[str, Any]:
    """Search the web for information.

    Args:
        query: The search query

    Returns:
        Search results as a dictionary
    """
    return {
        "query": query,
        "results": ["Result 1", "Result 2", "Result 3"],
        "timestamp": datetime.now().isoformat()
    }
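The Agents SDK turns a decorated function into a tool whose name, description, and JSON parameter schema come from the function's name, docstring, and type hints. The sketch below approximates that idea with the standard library only. `describe_tool` and its type mapping are hypothetical simplifications, not the SDK's schema generator, and `search_web` is left undecorated here so the example runs without the SDK installed.

```python
import inspect
from typing import Any, Dict, get_type_hints

# Simplified annotation-to-JSON-Schema mapping (assumption for this sketch);
# unknown or missing annotations fall back to "object".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func) -> Dict[str, Any]:
    """Build a minimal tool description from a function's name, docstring, and type hints."""
    sig = inspect.signature(func)
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    properties = {
        name: {"type": _JSON_TYPES.get(hints.get(name), "object")}
        for name in sig.parameters
    }
    # Parameters without a default value are treated as required.
    required = [
        name for name, param in sig.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": func.__name__,
        # Use the first docstring line as the short description.
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


# Undecorated stand-in for the search_web tool shown above.
def search_web(query: str) -> Dict[str, Any]:
    """Search the web for information.

    Args:
        query: The search query
    """
    return {"query": query}


print(describe_tool(search_web))
```

This is also why a clear first docstring line and precise type hints matter: they are the main information a model sees when deciding whether and how to call the tool.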

Plain Functions

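from datetime import datetime


# No decorator: the docstring is what lets AgentCI discover this as a tool.
def get_current_time() -> str: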
    """Get the current time as a string."""
    return datetime.now().isoformat()
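Tool execution testing, one of the evaluation types listed above, can start with something as simple as calling a tool and checking the shape of its output. The following is a hypothetical check, not AgentCI's test format: it calls `get_current_time` and confirms the result parses as an ISO 8601 timestamp with `datetime.fromisoformat`. The name `check_returns_iso_timestamp` is invented for this sketch.

```python
from datetime import datetime


def get_current_time() -> str:
    """Get the current time as a string."""
    return datetime.now().isoformat()


def check_returns_iso_timestamp(tool) -> bool:
    """Call a zero-argument tool and confirm its output is an ISO 8601 timestamp string."""
    output = tool()
    if not isinstance(output, str):
        return False
    try:
        # fromisoformat accepts the format produced by datetime.isoformat().
        datetime.fromisoformat(output)
    except ValueError:
        return False
    return True


print(check_returns_iso_timestamp(get_current_time))  # True
```

Checking the output format rather than an exact value keeps the test stable even though the tool's result changes on every call.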

What Gets Auto-Discovered

AgentCI automatically finds:

  • Agent() instances with name, instructions, model, and tools parameters
  • Functions decorated with @function_tool
  • Plain Python functions with docstrings

No configuration is required: AgentCI detects both decorated and plain functions as tools.
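The tool rules above, decorated functions and documented plain functions, can be expressed as a short static scan. This is a hypothetical sketch rather than AgentCI's discovery code: it walks a module's top-level functions with Python's `ast` module, treats `@function_tool` (with or without arguments) as a decorated tool, and treats any other function with a docstring as a plain-function tool. The names `find_tools` and `TOOLS_SOURCE` are assumptions for the example.

```python
import ast

# Sample tools module, analyzed as text only (hypothetical example source).
TOOLS_SOURCE = '''
from agents import function_tool

@function_tool
def search_web(query: str) -> dict:
    """Search the web for information."""
    return {"query": query}

def get_current_time() -> str:
    """Get the current time as a string."""
    ...

def _helper(x):
    return x
'''


def _is_function_tool(decorator: ast.expr) -> bool:
    # Matches both `@function_tool` and `@function_tool(...)`.
    target = decorator.func if isinstance(decorator, ast.Call) else decorator
    return isinstance(target, ast.Name) and target.id == "function_tool"


def find_tools(source: str) -> dict[str, list[str]]:
    """Split top-level functions into decorated tools and documented plain functions."""
    decorated, plain = [], []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if any(_is_function_tool(d) for d in node.decorator_list):
            decorated.append(node.name)
        elif ast.get_docstring(node):
            # Plain functions qualify only if they carry a docstring.
            plain.append(node.name)
    return {"decorated": decorated, "plain": plain}


print(find_tools(TOOLS_SOURCE))
# {'decorated': ['search_web'], 'plain': ['get_current_time']}
```

In this sketch `_helper` is skipped because it has neither the decorator nor a docstring, which is one reason to document any function you intend to expose as a tool.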

Next Steps