Description of the Need
As a BADGERS user, I want the system to intelligently manage model selection and token usage based on task complexity and cost constraints, so that I can control per-page analysis costs while maintaining quality where it matters.
Context
Different documents have wildly different complexity levels. A simple typed form doesn't need the same model or token budget as a dense handwritten ledger with marginalia. Currently, every page gets the same model and prompt treatment regardless of complexity, leading to unnecessary cost on simple pages and potentially insufficient analysis on complex ones. Customers want cost predictability (e.g. ~$0.60/page average) with quality that scales to the task.
Feature Areas
1. Automatic model routing based on task complexity
- Implement a lightweight pre-classification step that assesses page complexity (e.g. simple/medium/complex)
- Route simple pages to cheaper/faster models, complex pages to more capable models
- Classification could factor in: image resolution, detected text density, document type, handwriting presence
- Routing rules configurable per analyzer
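The pre-classification and routing described above could look roughly like the sketch below. All names, thresholds, and model identifiers are illustrative assumptions, not a committed design; the real classifier might use a small model instead of pure heuristics.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"


@dataclass
class PageSignals:
    """Cheap signals extracted from a page before full analysis."""
    resolution_mp: float   # image resolution in megapixels
    text_density: float    # detected text area / page area, 0..1
    has_handwriting: bool  # output of a lightweight handwriting detector


def classify_page(signals: PageSignals) -> Complexity:
    """Heuristic pre-classification; thresholds are illustrative, not tuned."""
    if signals.has_handwriting or signals.text_density > 0.6:
        return Complexity.COMPLEX
    if signals.text_density > 0.25 or signals.resolution_mp > 12:
        return Complexity.MEDIUM
    return Complexity.SIMPLE


# Per-analyzer routing table: complexity tier -> model id (example ids only).
DEFAULT_ROUTING = {
    Complexity.SIMPLE: "anthropic.claude-3-haiku",
    Complexity.MEDIUM: "anthropic.claude-3-sonnet",
    Complexity.COMPLEX: "anthropic.claude-3-opus",
}


def route(signals: PageSignals, rules=DEFAULT_ROUTING) -> str:
    """Pick a model for this page; `rules` would be configurable per analyzer."""
    return rules[classify_page(signals)]
```

A per-analyzer override would simply pass a different `rules` mapping, which keeps the default table as the "sensible defaults" mentioned in the acceptance criteria.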
2. Token Budgeting per Analyzer
- Define a target cost-per-page for each analyzer (e.g. $0.60 average)
- System tracks running average cost and adjusts model/prompt selection to stay within budget
- Support hard caps (never exceed $X per page) and soft targets (average over N pages)
- Surface alerts when an analyzer is trending over budget
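A minimal sketch of the budget tracking above, assuming a rolling window for the soft target; class and method names are hypothetical:

```python
from collections import deque


class CostBudget:
    """Tracks per-page cost for one analyzer against a soft target and hard cap."""

    def __init__(self, soft_target: float, hard_cap: float, window: int = 100):
        self.soft_target = soft_target      # e.g. $0.60 average per page
        self.hard_cap = hard_cap            # never exceed this on a single page
        self.recent = deque(maxlen=window)  # rolling window of the last N page costs

    def record(self, cost: float) -> None:
        self.recent.append(cost)

    @property
    def running_average(self) -> float:
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def over_budget(self) -> bool:
        """True when the rolling average trends above the soft target (alert hook)."""
        return self.running_average > self.soft_target

    def allow(self, estimated_cost: float) -> bool:
        """Hard-cap check before invoking a model on a page."""
        return estimated_cost <= self.hard_cap
```

When `over_budget()` flips to true, the system could shift routing toward cheaper models or trimmed prompts for subsequent pages until the average recovers.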
3. Prompt Cost Analysis and Optimization Recommendations
- Break down cost per invocation: input tokens, image tokens, output tokens
- Identify which prompt components contribute most to cost
- Recommend optimizations: prompt trimming, fewer examples, output format changes, model downsizing
- Provide a cost analysis report per analyzer (on-demand or periodic)
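The per-invocation breakdown could be a simple structure like the following; the token rates shown are placeholder values, not real prices:

```python
from dataclasses import dataclass


@dataclass
class InvocationCost:
    """Cost breakdown for one model invocation (rates illustrative, per 1K tokens)."""
    input_tokens: int
    image_tokens: int
    output_tokens: int
    input_rate: float = 0.003
    image_rate: float = 0.003
    output_rate: float = 0.015

    @property
    def input_cost(self) -> float:
        return self.input_tokens / 1000 * self.input_rate

    @property
    def image_cost(self) -> float:
        return self.image_tokens / 1000 * self.image_rate

    @property
    def output_cost(self) -> float:
        return self.output_tokens / 1000 * self.output_rate

    @property
    def total(self) -> float:
        return self.input_cost + self.image_cost + self.output_cost
```

Aggregating these records per analyzer gives the raw data for the optimization report: if `image_cost` dominates, recommend downscaling images; if `input_cost` dominates, recommend prompt trimming or fewer examples.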
Acceptance Criteria
- A complexity classifier can categorize pages before full analysis
- Model routing rules are configurable per analyzer with sensible defaults
- Per-analyzer cost targets can be defined in configuration
- The system tracks and reports actual cost per page against targets
- A cost breakdown is available per invocation showing input/image/output token costs
- The system can generate optimization recommendations for a given analyzer's prompt configuration
- Existing deployments without cost config continue to work unchanged
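One way to satisfy the backward-compatibility criterion is to make every cost-related key optional, with absent keys falling back to today's behavior. The key names below are hypothetical:

```python
# Hypothetical analyzer config defaults. Cost settings are optional, so an
# existing deployment with no cost config resolves to today's behavior:
# no pre-classification, no budget tracking, one default model.
DEFAULTS = {
    "routing": None,       # None => skip complexity classification
    "cost_target": None,   # None => no budget tracking or alerts
    "default_model": "anthropic.claude-3-sonnet",  # example id
}


def load_analyzer_config(user_config: dict) -> dict:
    """Merge user config over defaults; missing cost keys fall back safely."""
    merged = dict(DEFAULTS)
    merged.update(user_config)
    return merged
```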
Technical Considerations
- Complexity classification must itself be cheap; it could use a small model, heuristics, or simple image analysis
- Token counting needs to happen pre- and post-invocation for accurate tracking
- Bedrock pricing varies by model and region; need a pricing reference table (or API)
- Image token costs follow different pricing models than text tokens
- This feature depends on the pluggable LLM backends feature for model routing to work across providers
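The pricing reference mentioned above could start as a static table keyed by model and region, refreshed from a pricing source. The rates below are placeholders, and image-token pricing is omitted here since it follows a separate model:

```python
# Illustrative pricing table keyed by (model_id, region), USD per 1K tokens.
# Real Bedrock prices vary by model and region and should come from a
# maintained reference table or pricing API, not hardcoded values.
PRICE_PER_1K = {
    ("anthropic.claude-3-haiku", "us-east-1"):  {"input": 0.00025, "output": 0.00125},
    ("anthropic.claude-3-sonnet", "us-east-1"): {"input": 0.003, "output": 0.015},
}


def invocation_cost(model: str, region: str,
                    input_tokens: int, output_tokens: int) -> float:
    """Look up rates for a model/region pair and price one invocation."""
    rates = PRICE_PER_1K[(model, region)]
    return (input_tokens / 1000 * rates["input"]
            + output_tokens / 1000 * rates["output"])
```

Token counts for the `input_tokens`/`output_tokens` arguments would come from pre-invocation estimation and the provider's post-invocation usage metadata, which is why counting needs to happen at both points.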
Out of Scope (future consideration):
- Real-time cost dashboards
- Automatic prompt rewriting/compression
- Cross-analyzer budget pooling