Layer 2: Context Payloads
How to Optimize Context Payloads
Most teams inject full data objects on every request without measuring what the model actually uses. JSON records, RAG chunks, conversation history, user profiles. The model ignores most of it, but you pay for every token. Here's a repeatable process to trim it down.
Step 1: Define your success signal
Context payload optimization has two signals to measure:
1. Field utilization: which fields does the model actually reference? Run a set of real queries against the current payload. For each response, trace which injected fields influenced the output. Any field never referenced across 50+ queries is a candidate for removal.
2. Output quality: does removing fields degrade the response? Take a representative query set. Run it with the full payload, then with the trimmed payload. Compare outputs. If the answer is the same or better, the removed fields were noise.
The goal is not to remove data the model needs. It is to stop paying for data the model ignores. Most teams discover 70 to 90 percent of injected fields are never used.
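A field-utilization audit can be sketched as a pass over logged (payload, response) pairs. The `referenced` heuristic below is an assumption, a crude substring check standing in for whatever attribution method you actually use:

```python
import json
from collections import Counter

def referenced(field: str, value, response: str) -> bool:
    """Crude heuristic: did this field plausibly influence the output?"""
    if field in response:
        return True
    text = value if isinstance(value, str) else json.dumps(value)
    return text.lower() in response.lower()

def field_utilization(runs):
    """runs: list of (payload_dict, model_response). Returns hit rate per field."""
    hits = Counter()
    fields = set()
    for payload, response in runs:
        for field, value in payload.items():
            fields.add(field)
            if referenced(field, value, response):
                hits[field] += 1
    return {f: hits[f] / len(runs) for f in fields}

# Two toy runs with an assumed payload shape
runs = [
    ({"user_id": "u_42", "plan": "pro", "profile_image": "img.png"},
     "You're on the pro plan, u_42, so the limit is 500 req/min."),
    ({"user_id": "u_42", "plan": "pro", "profile_image": "img.png"},
     "Reset your password from Settings > Security, u_42."),
]
print(field_utilization(runs))
# profile_image is never referenced -> utilization 0.0, a removal candidate
```

In production you would run this over logged traffic rather than toy pairs, and swap the substring check for whatever attribution signal you trust.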
Step 2: Generate test cases
Pull real queries from production that hit this context payload. Cover the range of what users actually ask, including queries that need deep context and queries that need almost none.
Needs full context:
- "Why was I charged twice last month?" → needs billing_history
- "Which integrations do I have set up?" → needs integrations
- "Add my coworker to the team" → needs team_members

Needs minimal context:
- "How do I reset my password?" → needs user_id only
- "What's the API rate limit on my plan?" → needs plan tier only
- "Where are my notification settings?" → needs nothing from payload

Edge cases:
- "Show me everything about my account" → ambiguous scope
- "Compare my usage to last quarter" → needs historical data
- "Why is my dashboard different from my teammate's?" → needs preferences + team
Aim for 30 to 50 test queries. Weight by real-world frequency. Most support agents handle password resets and plan questions far more often than billing disputes.
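One way to encode that weighting is a small test set where each query carries its expected fields and a frequency weight. The entries and weights below are illustrative, not measured:

```python
# Hypothetical test set: `expects` names the payload fields a correct answer
# draws on; `weight` mirrors production frequency so common queries dominate.
TEST_QUERIES = [
    {"query": "How do I reset my password?",           "expects": ["user_id"],         "weight": 10},
    {"query": "What's the API rate limit on my plan?", "expects": ["plan"],            "weight": 8},
    {"query": "Why was I charged twice last month?",   "expects": ["billing_history"], "weight": 2},
    {"query": "Add my coworker to the team",           "expects": ["team_members"],    "weight": 1},
]

def weighted_score(results):
    """results: query -> passed (bool). Frequency-weighted pass rate."""
    total = sum(q["weight"] for q in TEST_QUERIES)
    passed = sum(q["weight"] for q in TEST_QUERIES if results.get(q["query"]))
    return passed / total

print(weighted_score({q["query"]: True for q in TEST_QUERIES}))  # 1.0
```

The weighting matters: failing a rare billing-dispute query costs 2/21 of the score here, while failing a password reset costs 10/21, which matches how users experience the agent.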
Step 3: Benchmark the baseline
Run every test query against the current payload. For each response, record which fields were referenced, the output quality, and the token count. This tells you what the model actually uses versus what you are sending.
Test queries: 48
Payload size: 2,847 tokens (user_context.json)

Field utilization:
  user_id           48/48 queries (100%)
  plan              31/48 queries (64.6%)
  open_tickets      18/48 queries (37.5%)
  billing_history    6/48 queries (12.5%)
  integrations       4/48 queries (8.3%)
  team_members       3/48 queries (6.3%)
  preferences        2/48 queries (4.2%)
  feature_flags      0/48 queries (0%)
  profile_image      0/48 queries (0%)
  dashboard_layout   0/48 queries (0%)

Output quality score: 82% (rubric: relevance, accuracy, completeness)
Avg tokens/call: 2,847 (context) + 1,200 (prompt) + 480 (completion)
Cost per call: $0.0136
This is your floor. Notice that feature_flags, profile_image, and dashboard_layout were never referenced. Preferences was used twice. That is over 1,800 tokens of noise on every single call.
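For reference, the cost line above is reproducible with a flat blended rate of $3 per million tokens. That rate is an assumption chosen to match the example figures; substitute your provider's actual input/output pricing:

```python
# Assumed flat blended rate of $3 per million tokens (matches the example's
# numbers); real providers price input and output tokens separately.
RATE = 3.00 / 1_000_000  # $/token

def cost_per_call(context_tok: int, prompt_tok: int, completion_tok: int) -> float:
    """Total token count times the blended per-token rate."""
    return (context_tok + prompt_tok + completion_tok) * RATE

print(round(cost_per_call(2_847, 1_200, 480), 4))  # 0.0136
```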
Step 4: Generate optimization candidates
Use agents to propose payload restructuring. The same three approaches work here:
Consensus: spawn 10 agents with the field utilization data. Each proposes a trimmed payload structure. Where they agree, that settles which fields to keep inline vs move to tool-level. Where they diverge (typically fields used 5-15% of the time), a human decides.
Debate: spawn 3 agents with opposing mandates:
- Minimalist: cut everything under 10% utilization
- Completionist: keep anything that prevents a follow-up call
- Architect: restructure into tiers (always, on-demand, never)
Run three rounds, then converge on a tiered injection strategy.
Single model: pass the payload and utilization data to one model with a direct ask. "Here's my user context object. Here's how often each field is referenced. Propose a trimmed version that keeps the fields used in >20% of queries inline, and moves the rest to tool-level injection."
The key optimization for context payloads is not just removing fields. It is tiered injection: always-present fields stay inline, sometimes-needed fields move to tool calls, never-used fields get dropped entirely.
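A minimal sketch of tiered injection, using the example's field names. The tool-handler shape here is an assumption, not a prescribed API; wire `fetch_context_field` up as a tool in whatever calling convention your model provider uses:

```python
# Tier 1 ships on every call; tier 2 is fetched via tool call; everything
# else (e.g. profile_image) simply never leaves the database.
INLINE_FIELDS = {"user_id", "plan", "open_tickets", "last_billing_issue",
                 "active_integrations"}
ON_DEMAND = {"billing_history", "team_members", "integrations_config"}

def build_context(user: dict) -> dict:
    """Tier 1: only always-used fields are injected inline."""
    return {k: v for k, v in user.items() if k in INLINE_FIELDS}

def fetch_context_field(user: dict, field: str):
    """Tier 2 tool handler: loads a field only when the model asks for it."""
    if field not in ON_DEMAND:
        raise KeyError(f"{field} is not available on demand")
    return user.get(field)

user = {"user_id": "u_42", "plan": "pro",
        "billing_history": ["inv_1001", "inv_1002"], "profile_image": "img.png"}
print(build_context(user))  # {'user_id': 'u_42', 'plan': 'pro'}
```

The point of the hard `KeyError` on tier-3 fields is that dropped data should be unreachable by the model, not merely omitted by default.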
Step 5: Test candidates against the same baseline
Run the exact same 48 test queries against the trimmed payload. Compare output quality to the baseline. If quality holds or improves, the optimization ships.
Test queries: 48
Payload size: 312 tokens (trimmed user_context.json)

Always inline (every call): user_id, plan, open_tickets, last_billing_issue, active_integrations
On-demand (tool-level, loaded when needed): billing_history, team_members, integrations config
Dropped entirely: feature_flags, profile_image, dashboard_layout, preferences.theme, preferences.notifications, preferences.dashboard_layout

Output quality score: 91% (up from 82%. Less noise, better signal.)
Queries needing tool fallback: 8/48 (16.7%, handled automatically)
Avg tokens/call: 312 (context) + 1,200 (prompt) + 420 (completion)
Cost per call: $0.0058
Quality: +9 pts (82% → 91%. Less noise, better signal.)
Tokens: -89% (2,847 → 312 per request)
Cost: -57% ($0.0136 → $0.0058 per call)
Step 6: Map to business outcomes
Context payloads are often the largest token cost per call because they repeat on every request. A 2,500 token reduction across 40,000 calls per month adds up fast.
Workflow            Calls/mo    Before    After    Savings/mo
─────────────────────────────────────────────────────────────
Support agent         42,000      $571     $244          $327
Account dashboard     28,000      $380     $162          $218
Onboarding flow       15,000      $204      $87          $117
Search assistant       8,500      $116      $49           $67
─────────────────────────────────────────────────────────────
Total monthly savings: $729
Annual savings: $8,748
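The monthly total above can be reproduced (to within rounding) from the two per-call costs and the call volumes:

```python
# Per-call costs from Steps 3 and 5; volumes from the table above.
BEFORE, AFTER = 0.0136, 0.0058
workflows = {
    "Support agent": 42_000,
    "Account dashboard": 28_000,
    "Onboarding flow": 15_000,
    "Search assistant": 8_500,
}
monthly = sum(calls * (BEFORE - AFTER) for calls in workflows.values())
print(f"Monthly savings: ${monthly:,.0f}")  # Monthly savings: $729
```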
Context payload optimization often delivers the largest absolute cost savings because the waste compounds on every call. A bloated prompt is bad. A bloated payload attached to a bloated prompt is worse.
Then do it again
Payloads drift just like prompts. New fields get added to data models, new integrations get wired up, nobody audits what is actually being sent. The loop runs continuously:
1. Define success signal (field utilization + output quality)
2. Generate test queries from production traffic
3. Benchmark field utilization, output quality, and token cost
4. Generate optimization candidates (consensus, debate, or single model)
5. Test candidates: keep if quality holds at lower token count
6. Map to business outcomes: prioritize by call volume × savings
7. Re-audit quarterly, or when data models change
The same discipline applies to every type of injected context. RAG chunks, conversation history, tool results, memory files. If it gets injected per call, it should be measured per call.