A comprehensive PHP-based testing framework for validating OpenAI function calling capabilities and message responses. This framework allows you to test whether your AI assistant correctly calls the right functions with the right parameters or provides appropriate text responses based on user inputs.
- Function Call Testing: Validate that the AI calls the correct functions with expected parameters
- Message Response Testing: Verify that the AI provides appropriate text responses when no function calls are needed
- AI-Powered Meaning Validation: Uses another AI model to validate if response meanings match expectations
- Colorized Console Output: Easy-to-read test results with color-coded success/failure indicators
- Flexible Test Filtering: Run all tests or filter by test name patterns
- Detailed Error Reporting: Comprehensive output showing what was expected vs what was received
- Concurrent Execution: Run tests in parallel batches with the --concurrency/-c flag for faster results
- Repeat Mode: Run tests multiple times with a numeric argument to measure consistency
- Token Usage Tracking: Automatic tracking and reporting of input/output/cached token usage per test
- Timing Reports: Elapsed time per test, per batch, and overall suite execution time
- Multi-Turn Conversation Testing: Full message history with function calls and function outputs in test cases
├── test.php                  # Main test framework and runner
├── cases.json                # Test case definitions (active)
├── cases.default.json        # Default test case template
├── structure.json            # API configuration: model, tools, streaming, etc.
├── structure.default.json    # Default API configuration template
├── settings.ini.php          # API keys and configuration settings (active)
├── settings.ini.default.php  # Default settings template
├── README.md                 # This documentation
└── LICENSE                   # Project license
⚠️ Important Setup Notice
Before running the tests, you must copy the default configuration files to their active versions:
- Copy settings.ini.default.php → settings.ini.php
- Copy cases.default.json → cases.json
- Copy structure.default.json → structure.json

Then customize these files with your specific API keys and test configurations.

Note: After copying structure.default.json to structure.json, you might want to edit the file and remove the vector storage section if you don't need it for your testing purposes.
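The copy step can also be scripted. The following is a minimal sketch and not part of the framework: copyDefaults is a hypothetical helper that copies each template to its active name and skips any active file that already exists, so existing customizations are not overwritten.

```php
<?php
// Hypothetical helper (not part of test.php): copies each default template
// to its active filename unless the active file already exists.
function copyDefaults(string $dir): array
{
    $pairs = [
        'settings.ini.default.php' => 'settings.ini.php',
        'cases.default.json'       => 'cases.json',
        'structure.default.json'   => 'structure.json',
    ];
    $copied = [];
    foreach ($pairs as $template => $active) {
        $source = $dir . DIRECTORY_SEPARATOR . $template;
        $target = $dir . DIRECTORY_SEPARATOR . $active;
        // Never overwrite a file that may already have been customized.
        if (is_file($source) && !file_exists($target) && copy($source, $target)) {
            $copied[] = $active;
        }
    }
    return $copied; // names of the active files that were created
}
```

Calling copyDefaults(__DIR__) from a script in the project directory returns the list of files it created; an empty list means every active file was already in place or no template was found.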
Configure your API keys and models in settings.ini.php:
<?php
return [
    "api_key" => "your-openai-api-key",
    "model_meaning_verification" => "gpt-4.1-mini",
    "model_core" => "o3-mini",
    "system_prompt" => "Your system prompt here..."
];
?>

structure.json defines the OpenAI API request structure, including the available tools and model configuration.
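Because settings.ini.php returns a plain PHP array, it can be loaded with require and checked before any API call is made. The sketch below assumes that all four keys shown in the settings example are required; missingSettings is a hypothetical helper, not a function from test.php.

```php
<?php
// Hypothetical helper: lists the setting keys that are absent or blank.
// Assumption: all four keys from the settings example are mandatory.
function missingSettings(array $settings): array
{
    $required = ['api_key', 'model_meaning_verification', 'model_core', 'system_prompt'];
    $missing = [];
    foreach ($required as $key) {
        $value = $settings[$key] ?? null;
        if (!is_string($value) || trim($value) === '') {
            $missing[] = $key;
        }
    }
    return $missing;
}

// Usage (assumes settings.ini.php sits next to the calling script):
// $settings = require __DIR__ . '/settings.ini.php';
// $missing  = missingSettings($settings);
// if ($missing !== []) {
//     exit('Missing settings: ' . implode(', ', $missing) . PHP_EOL);
// }
```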
Test cases are defined in cases.json with the following structure. The // comments are annotations for this README only; standard JSON does not allow comments, so leave them out of the actual file.
{
  "name": "test_name",
  "messages": [
    {
      "role": "user",
      "content": "User input message"
    }
  ],
  "expected_output": {
    "type": "function_call|message",
    "tool": "function_name",        // For function_call type
    "arguments": {"key": "value"},  // Optional, for validating function arguments
    "meaning": "Expected meaning"   // For message type with AI validation
  },
  "ignore": false                   // Optional: skip this test when running all tests
}

Supported values for expected_output.type:
- "function_call": Expect the AI to call a specific tool
- "message": Expect the AI to respond with a text message (no function calls)
- ["function_call", "message"]: Accept either outcome (array form for multiple valid possibilities)
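Since the expected type may be a single string or an array of strings, a check against the actual outcome has to handle both forms. A minimal sketch, assuming the actual outcome has already been reduced to either "function_call" or "message"; typeMatches is a hypothetical name and may not reflect how test.php implements this.

```php
<?php
// Hypothetical helper: returns true when the actual output type is allowed
// by expected_output.type, which may be a string or an array of strings.
function typeMatches($expectedType, string $actualType): bool
{
    // Casting a string to array wraps it, so both forms share one code path.
    return in_array($actualType, (array) $expectedType, true);
}
```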
Test cases can include full message histories with function calls and their outputs. Use "type": "message" for user/assistant messages and "type": "function_call" / "type": "function_call_output" for tool interactions:
{
  "name": "multi_turn_example",
  "messages": [
    {
      "type": "message",
      "role": "user",
      "content": "Check my balance"
    },
    {
      "type": "function_call",
      "status": "completed",
      "call_id": "fc_001",
      "name": "check_balance",
      "arguments": "{}"
    },
    {
      "type": "function_call_output",
      "call_id": "fc_001",
      "output": "You have $50 balance."
    }
  ],
  "expected_output": {"type": "message"}
}

Here are the current test cases included in the framework:
{
  "name": "support_price",
  "messages": [
    {
      "role": "user",
      "content": "What is paid support price?"
    }
  ],
  "expected_output": {
    "type": "function_call",
    "tool": "support_price"
  }
}

What it tests: Verifies that when a user asks about the paid support price, the AI calls the support_price function.
{
  "name": "password_reminder",
  "messages": [
    {
      "role": "user",
      "content": "I forgot my password, my e-mail is remdex@gmail.com"
    }
  ],
  "expected_output": {
    "type": "function_call",
    "tool": "password_reminder",
    "arguments": {"email": "remdex@gmail.com"}
  }
}

What it tests: Ensures the AI calls the password_reminder function with the correct email argument when the user asks for a password reminder.
{
  "name": "welcome",
  "messages": [
    {
      "role": "user",
      "content": "Hi"
    }
  ],
  "expected_output": {
    "type": "message"
  }
}

What it tests: Validates that the AI responds with a text message (not a function call) for greeting inputs.
{
  "name": "unrelated",
  "messages": [
    {
      "role": "user",
      "content": "Who is the president of the USA?"
    }
  ],
  "expected_output": {
    "type": "message",
    "meaning": "Answer to user question should not be provided as it is not related to the Live Helper Chat."
  }
}

What it tests: Ensures the AI declines to answer questions unrelated to Live Helper Chat, and uses AI-powered validation to check that the response meaning matches the expectation.
{
  "name": "support_price_multi_turn",
  "messages": [
    {
      "type": "message",
      "role": "user",
      "content": "Hi"
    },
    {
      "type": "message",
      "role": "assistant",
      "content": "Hello! How can I help you?"
    },
    {
      "type": "message",
      "role": "user",
      "content": "What is paid support price?"
    }
  ],
  "expected_output": {"type": "function_call", "tool": "support_price"}
}

What it tests: Verifies the AI correctly calls a function after a multi-turn conversation history.
{
  "name": "password_reminder_or_message",
  "messages": [
    {
      "role": "user",
      "content": "Can you help me reset my password?"
    }
  ],
  "expected_output": {"type": ["function_call", "message"]}
}

What it tests: Accepts either a function call or a message response as valid, which is useful when both outcomes are acceptable.
php test.php

Tests with "ignore": true are automatically skipped when running all tests. To run an ignored test, target it by name.
php test.php "password_reminder"

This will run all tests containing "password_reminder" in their name. When targeting a specific test by name, the ignore flag is bypassed.
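The selection rules described above (skip ignored tests when running everything, bypass the ignore flag when a name pattern is given) can be sketched as follows. This is an illustration only: selectCases is a hypothetical helper, and the case-sensitive substring match is an assumption about how names are compared.

```php
<?php
// Hypothetical helper: picks the test cases to run.
// Assumption: the pattern is matched as a case-sensitive substring of the name.
function selectCases(array $cases, ?string $pattern = null): array
{
    $selected = [];
    foreach ($cases as $case) {
        if ($pattern !== null) {
            // Targeting by name bypasses the ignore flag.
            if (strpos($case['name'], $pattern) !== false) {
                $selected[] = $case;
            }
        } elseif (empty($case['ignore'])) {
            // Running everything: skip cases marked "ignore": true.
            $selected[] = $case;
        }
    }
    return $selected;
}
```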
php test.php --concurrency 5
php test.php -c 3 "support_price"

Executes tests in parallel batches. The first example runs all tests 5 at a time; the second runs matching tests 3 at a time.
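To illustrate what "N at a time" means, the sketch below splits a list of test names into batches of the requested size. makeBatches is a hypothetical helper; how test.php actually dispatches the requests inside a batch (for example with PHP's curl_multi functions) is not covered here.

```php
<?php
// Hypothetical helper: splits test cases into batches of at most
// $concurrency items; each batch would then be executed in parallel.
function makeBatches(array $cases, int $concurrency): array
{
    // array_chunk requires a size of at least 1.
    return array_chunk($cases, max(1, $concurrency));
}

$names = ['support_price', 'password_reminder', 'welcome', 'unrelated', 'support_price_multi_turn'];
foreach (makeBatches($names, 3) as $index => $batch) {
    echo 'Batch ' . ($index + 1) . ': ' . implode(', ', $batch) . PHP_EOL;
}
```

With five tests and a concurrency of 3, this yields two batches: the first with three tests and the second with the remaining two.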
php test.php 10
php test.php 10 --concurrency 5
php test.php "welcome" 3

A numeric argument repeats all (or matching) tests that many times. Useful for measuring consistency and accumulating token stats over multiple runs.
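The command forms above combine three kinds of argument: a name pattern, a numeric repeat count, and the concurrency flag. One way to tell them apart is sketched below. parseArgs is hypothetical, and the defaults (one run, one test at a time) are assumptions rather than documented behaviour of test.php.

```php
<?php
// Hypothetical parser for the argument forms shown above.
// Assumed defaults: repeat = 1, concurrency = 1, no name pattern.
function parseArgs(array $args): array
{
    $options = ['pattern' => null, 'repeat' => 1, 'concurrency' => 1];
    for ($i = 0, $n = count($args); $i < $n; $i++) {
        $arg = (string) $args[$i];
        if ($arg === '--concurrency' || $arg === '-c') {
            // The flag consumes the next argument as its value.
            $i++;
            $options['concurrency'] = max(1, (int) ($args[$i] ?? 1));
        } elseif (ctype_digit($arg)) {
            // A bare number is the repeat count.
            $options['repeat'] = max(1, (int) $arg);
        } else {
            // Anything else is treated as the test name pattern.
            $options['pattern'] = $arg;
        }
    }
    return $options;
}

// Usage: $options = parseArgs(array_slice($argv, 1));
```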
The framework provides detailed, colorized output:
╔══════════════════════════════════════════════════════════════╗
║                 OpenAI Function Calling Test                 ║
╚══════════════════════════════════════════════════════════════╝
Running test: support_price
Expected tool: support_price
Expected type: function_call
[✓] support_price - PASS <function_call> (1.23s, 450t (320i/130o))
✓ Called tools: support_price
Running test: password_reminder
Expected tool: password_reminder
Expected type: function_call
[✓] password_reminder - PASS <function_call> (980ms, 520t (380i/140o))
✓ Called tools: password_reminder
✓ Arguments matched: {"email": "remdex@gmail.com"}
╔══════════════════════════════════════════════════════════════╗
║                         Test Summary                         ║
╚══════════════════════════════════════════════════════════════╝
Total Runs: 2
Passed: 2
Failed: 0
Success Rate: 100.0%
── Timing Report ──
Total time: 2.5s
Avg per test: 1.2s
Avg per run: 2.5s
── Top Token Consumers ──
#   Test Name          Runs  Input  Output  Cached  Total
1.  password_reminder     1    380     140       0    520
2.  support_price         1    320     130       0    450
🎉 All tests passed! 🎉
- ✓/✗: Pass/Fail indicator
- <function_call>/<message>: The matched expected output type
- 1.23s: Elapsed time for the test
- 450t (320i/130o): Token usage as total (input/output). Cached tokens, if present, are shown with a c suffix
- Timing Report: Shows total, per-test, and per-run averages
- Top Token Consumers: Top 5 tests ranked by average total tokens per run
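The per-test suffix and the Top Token Consumers ranking can be reproduced with a few lines of PHP. This is a sketch under assumptions: the function names are hypothetical, the switch to milliseconds below one second is inferred from the sample output, and the cached-token suffix is left out because its exact layout is not specified.

```php
<?php
// Hypothetical formatter: sub-second durations in ms, otherwise seconds.
function formatElapsed(float $seconds): string
{
    if ($seconds < 1.0) {
        return (int) round($seconds * 1000) . 'ms';
    }
    return number_format($seconds, 2, '.', '') . 's';
}

// Hypothetical formatter for "total (input/output)", e.g. 450t (320i/130o).
function formatTokens(int $input, int $output): string
{
    return sprintf('%dt (%di/%do)', $input + $output, $input, $output);
}

// Ranks tests by average total tokens per run and keeps the top $limit.
// $stats maps test name => ['runs' => int, 'input' => int, 'output' => int],
// where input and output are sums over all runs of that test.
function topConsumers(array $stats, int $limit = 5): array
{
    $averages = [];
    foreach ($stats as $name => $s) {
        $averages[$name] = ($s['input'] + $s['output']) / max(1, $s['runs']);
    }
    arsort($averages); // highest average first, keys preserved
    return array_slice($averages, 0, $limit, true);
}
```

Ranking by the per-run average rather than the raw total keeps the comparison fair when some tests were repeated more often than others.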
Function call testing:
- Verifies the correct function is called
- Validates function arguments match expected values
- Checks that no unexpected function calls occur

Message response testing:
- Ensures AI responds with text messages when appropriate
- Validates no function calls are made when not expected
- Uses AI-powered meaning validation for semantic correctness
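As an illustration of the argument check for function calls, the sketch below compares the expected arguments from a test case with the arguments of an actual call, which the multi-turn example represents as a JSON string. argumentsMatch is hypothetical, and treating the expected arguments as a subset (extra actual arguments are tolerated) is an assumption, not confirmed behaviour of test.php.

```php
<?php
// Hypothetical helper: checks that every expected argument is present in
// the actual call with an identical value.
// Assumption: extra arguments in the actual call do not cause a failure.
function argumentsMatch(array $expected, string $actualJson): bool
{
    $actual = json_decode($actualJson, true);
    if (!is_array($actual)) {
        return false; // arguments were not a valid JSON object
    }
    foreach ($expected as $key => $value) {
        // Strict comparison: "1" and 1 are treated as different values.
        if (!array_key_exists($key, $actual) || $actual[$key] !== $value) {
            return false;
        }
    }
    return true;
}
```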
For message responses with meaning defined, the framework uses a separate AI model to validate whether the actual response semantically matches the expected meaning in the context of the original user question.
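The exact prompt sent to the verification model is not documented here. Purely as an illustration of the idea, the sketch below builds a yes/no verification prompt from the three inputs mentioned above and interprets the model's reply. Both function names and the prompt wording are hypothetical.

```php
<?php
// Hypothetical prompt builder; the wording is an assumption and does not
// reproduce the prompt used by test.php.
function buildMeaningPrompt(string $question, string $response, string $expectedMeaning): string
{
    return "User question:\n{$question}\n\n"
        . "Assistant response:\n{$response}\n\n"
        . "Expected meaning:\n{$expectedMeaning}\n\n"
        . "Does the response match the expected meaning? Answer YES or NO.";
}

// Hypothetical verdict parser: passes when the reply starts with "yes",
// ignoring case and leading whitespace.
function verdictIsPass(string $modelAnswer): bool
{
    return stripos(ltrim($modelAnswer), 'yes') === 0;
}
```

Asking for a constrained YES/NO answer keeps the verdict easy to parse, at the cost of losing the model's explanation for a failure.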
- Open cases.json
- Add a new test case following the format above
- Run the tests to validate your new case

Example of adding a new test:
{
  "name": "check_account_balance",
  "messages": [
    {
      "role": "user",
      "content": "What is my account balance?"
    }
  ],
  "expected_output": {
    "type": "function_call",
    "tool": "get_account_balance"
  }
}

| Property | Type | Description |
|---|---|---|
| name | string | Unique test name (used for filtering) |
| messages | array | Array of message objects forming the conversation |
| expected_output.type | string or string[] | "function_call", "message", or an array like ["function_call", "message"] |
| expected_output.tool | string | (function_call only) Expected function name |
| expected_output.arguments | object | (function_call only) Expected arguments to match |
| expected_output.meaning | string | (message only) Semantic meaning to validate via AI |
| ignore | boolean | If true, test is skipped when running all tests |

| Type | Description |
|---|---|
| "role": "user" / "role": "assistant" | Simple message (legacy format) |
| "type": "message", "role": "user" | User message (explicit format) |
| "type": "message", "role": "assistant" | Assistant message (explicit format) |
| "type": "function_call" | A function call made by the AI in history |
| "type": "function_call_output" | The output of a function call in history |

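The legacy and explicit message formats differ only in the presence of the type field, so a legacy history can be converted mechanically. A minimal sketch follows; normalizeMessages is a hypothetical helper, and whether test.php performs such a conversion internally is not stated.

```php
<?php
// Hypothetical helper: adds "type": "message" to legacy entries that only
// carry role and content; explicit entries are returned unchanged.
function normalizeMessages(array $messages): array
{
    return array_map(function (array $message): array {
        if (!isset($message['type']) && isset($message['role'])) {
            // Array union keeps the new "type" key first and preserves the rest.
            $message = ['type' => 'message'] + $message;
        }
        return $message;
    }, $messages);
}
```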
The framework handles and reports the following error conditions:
- API connection errors
- Invalid JSON responses
- Missing or malformed test cases
- Function call validation failures
- Meaning validation errors
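Two of the conditions above, invalid JSON and missing or malformed test cases, can be caught while loading cases.json. The sketch below is an illustration under assumptions: loadCases is a hypothetical function, and it assumes the file holds a JSON array of test case objects in which name, messages, and expected_output.type are mandatory.

```php
<?php
// Hypothetical loader: reads and validates a test case file.
// Assumption: the file contains a JSON array of test case objects.
function loadCases(string $path): array
{
    $raw = @file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read {$path}");
    }
    try {
        // JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException on bad input.
        $cases = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        throw new RuntimeException("Invalid JSON in {$path}: " . $e->getMessage());
    }
    if (!is_array($cases)) {
        throw new RuntimeException("{$path} must contain an array of test cases");
    }
    foreach ($cases as $index => $case) {
        foreach (['name', 'messages', 'expected_output'] as $key) {
            if (!is_array($case) || !array_key_exists($key, $case)) {
                throw new RuntimeException("Test case #{$index} is missing \"{$key}\"");
            }
        }
        if (!is_array($case['expected_output']) || !isset($case['expected_output']['type'])) {
            throw new RuntimeException("Test case #{$index} is missing \"expected_output.type\"");
        }
    }
    return $cases;
}
```

Failing early with a message that names the offending case avoids spending API calls on a file that cannot be run in full.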
- PHP 7.4 or higher
- cURL extension enabled
- Valid OpenAI API key
- Internet connection for API calls
- Add test cases to cases.json
- Update function definitions in structure.json if needed
- Run tests to ensure everything works
- Submit your changes

This project is licensed under the terms specified in the LICENSE file.