Feature request
I would like to propose adding a native, multi-tier dynamic model routing feature to PR-Agent based on the token size of the pull request.
Currently, PR-Agent routes all requests to a single primary model. The proposed solution would let users define a list of routing rules (token thresholds) in the configuration.toml file. Once PR-Agent has computed the token count of the PR diff, it should evaluate these rules and select the most appropriate model before sending the prompt.
Example configuration approach:
```toml
[config]
# Default/Fallback heavy model for the largest PRs
model = "anthropic/claude-3.5-sonnet"
enable_dynamic_routing = true

# A list of thresholds to route smaller PRs to cheaper/faster models
# Evaluated in ascending order of max_tokens
[[config.routing_rules]]
max_tokens = 1000
model = "gemini/gemini-2.5-flash"

[[config.routing_rules]]
max_tokens = 5000
model = "openai/gpt-4o-mini"
```
Workflow logic: If the diff is <= 1000 tokens, use gemini-2.5-flash. If it is > 1000 but <= 5000 tokens, use gpt-4o-mini. If it exceeds 5000 tokens, fall back to the main model (claude-3.5-sonnet).
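The selection step above could look roughly like the following sketch. `select_model` is a hypothetical helper, not an existing PR-Agent function, and how it would hook into PR-Agent's own token counting is left open here. A rule matches when the diff token count is <= its `max_tokens`; the smallest matching threshold wins, and if none matches the main model is used.

```python
from bisect import bisect_left


def select_model(diff_tokens: int,
                 rules: list[tuple[int, str]],
                 fallback_model: str) -> str:
    """Return the model for a PR diff of `diff_tokens` tokens.

    `rules` holds hypothetical (max_tokens, model) pairs. A rule matches when
    diff_tokens <= max_tokens; the smallest matching threshold wins.
    If no rule matches (or there are no rules), the fallback model is used.
    """
    ordered = sorted(rules, key=lambda rule: rule[0])  # ascending max_tokens
    thresholds = [max_tokens for max_tokens, _ in ordered]
    # bisect_left returns the index of the first threshold >= diff_tokens,
    # i.e. the first tier whose limit the diff does not exceed.
    i = bisect_left(thresholds, diff_tokens)
    return ordered[i][1] if i < len(ordered) else fallback_model


RULES = [(1000, "gemini/gemini-2.5-flash"), (5000, "openai/gpt-4o-mini")]
FALLBACK = "anthropic/claude-3.5-sonnet"

if __name__ == "__main__":
    for tokens in (10, 1000, 1001, 5000, 5001):
        print(tokens, "->", select_model(tokens, RULES, FALLBACK))
```

With only a handful of tiers, a plain linear scan over the sorted rules would work just as well; `bisect_left` is used here because it expresses the inclusive "<= max_tokens" boundary directly, so a diff of exactly 1000 tokens still goes to the first tier.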
Motivation
- The Cost Problem: Using heavy models (GPT-4o, Claude 3.5) for trivial 10-line PRs wastes API credits, while cheaper models often struggle with complex architectural changes.
- The Overhead: The current workaround (deploying a standalone LiteLLM Proxy) introduces unnecessary infrastructure complexity for self-hosted users.
- The Solution: Native threshold-based routing optimizes API budgets by automatically handling trivial updates with cost-effective models, reserving expensive tokens only for deep reasoning.