Same prompt. Same app. Two different AI configurations.
One test used strict context management rules (NEXUS framework).
The other ran vanilla, with no context guardrails.
Here's what happened.
| Metric | NEXUS | Baseline |
|---|---|---|
| Build Time | 6m 04s | 17m 14s |
| Credits Used | 13.67 | 30.21 |
| Context Window Used | 14% | ~88% |
| Tests Passing | 6 / 6 | 16 / 16 |
| Test Workarounds Needed | None | Yes (fake-indexeddb bugs) |
| TypeScript Errors | 0 | 0 |
| File Organization | Organized (components/, views/, hooks/, db/) | Flat (all files in src/ root) |
The NEXUS build finished 2.8x faster and used 2.2x fewer credits. By enforcing the 50% context rule, the AI avoided the debugging spiral that consumed roughly 60% of the baseline transcript.
NEXUS built fewer features but went deeper: real-time price simulation, relational data queries, and interactive charts. Baseline built more surface features: CRUD modals, form validation, status tracking — but no analytics or live engine.
NEXUS produced an organized folder structure (components/, views/, hooks/, db/) while baseline dumped everything into the src/ root. Context pressure likely prevented the baseline AI from investing in structural decisions.
NEXUS tests passed cleanly with no workarounds, thanks to isolated DB instances. Baseline hit fake-indexeddb bugs and spent 6+ diagnostic iterations before settling on brittle workarounds. The NEXUS AI's proactive isolation pattern avoided the problem entirely.
At ~88% context, the baseline AI was deep in debugging fake-indexeddb constraint errors — a library-level issue, not an application bug. Without context limits, nothing stopped it from sinking tokens into a problem that could have been sidestepped.
Context management rules don't just save tokens — they change the AI's decision-making. A constrained AI triages harder, avoids rabbit holes, and delivers higher-value output per token spent.
All NEXUS capabilities active: tiered model routing, specialized agent personas, context management, quality gates, and memory persistence. The full multi-model pipeline — not just one rule, but the entire orchestration layer.
Tasks are routed to local models via Ollama based on complexity triage: 1.5B models for formatting, 3B for logic, 7B+ for architecture. This makes the dev flow essentially free and independent of cloud providers, falling back to network compute only when a .env override is set.
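The triage above can be sketched as a small routing function. The override variable name (`NEXUS_FORCE_CLOUD`) and the specific Ollama model tags are assumptions for illustration, not part of the framework:

```python
import os

# Hypothetical complexity-to-model mapping; tags follow common Ollama naming.
LOCAL_MODELS = {
    "formatting": "qwen2.5:1.5b",   # trivial text shaping
    "logic": "llama3.2:3b",         # straightforward code logic
    "architecture": "qwen2.5:7b",   # design-level reasoning
}

def route_task(complexity: str) -> str:
    """Pick a local Ollama model for the task, unless a .env override
    (read here as an environment variable) forces cloud compute."""
    if os.getenv("NEXUS_FORCE_CLOUD"):  # hypothetical override name
        return "cloud"
    # Unknown complexity escalates to the largest local model.
    return LOCAL_MODELS.get(complexity, LOCAL_MODELS["architecture"])
```

In practice a `.env` loader (e.g. python-dotenv) would populate the environment before this check runs; the sketch keeps only the routing decision itself.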
The framework mandates a "Default-to-Fail" posture and requires multi-dimensional evidence before any deliverable passes. Screenshots at 3 breakpoints for UI, actual HTTP responses for APIs, deployment logs for infra. Agent assertions are not evidence — runtime proof is.
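A Default-to-Fail gate can be sketched as a required-evidence check: a deliverable passes only if every mandated runtime artifact is present, and anything unrecognized fails by default. The evidence keys and function name here are illustrative, not the framework's actual API:

```python
# Hypothetical evidence requirements per deliverable type. Agent assertions
# are deliberately absent from this vocabulary: they are not evidence.
REQUIRED_EVIDENCE = {
    "ui": {"screenshot_mobile", "screenshot_tablet", "screenshot_desktop"},
    "api": {"http_response"},
    "infra": {"deployment_log"},
}

def quality_gate(deliverable_type: str, evidence: set) -> bool:
    """Pass only when every required runtime artifact is present."""
    required = REQUIRED_EVIDENCE.get(deliverable_type)
    if required is None:          # unknown deliverable type: default to fail
        return False
    return required <= evidence   # subset check: all required proof present
```

The subset check is the key design choice: extra evidence never hurts, but any missing artifact fails the gate.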
Automatic routing across tiers: Tier 1 (Gemini Pro / Claude Opus) for deep architecture, Tier 2 (Gemini Flash) for summaries and basic UI, Tier 3 (local Ollama) for trivial tasks. Each task goes to the cheapest model that can handle it.
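The "cheapest model that can handle it" rule amounts to scanning tiers from cheapest to priciest and taking the first one whose capability meets the task's difficulty. The numeric capability scores below are invented for illustration:

```python
# Tiers ordered cheapest first; capability scores are hypothetical.
TIERS = [
    ("tier3-local-ollama", 1),   # trivial tasks
    ("tier2-gemini-flash", 2),   # summaries, basic UI
    ("tier1-pro-opus", 3),       # deep architecture
]

def cheapest_capable(difficulty: int) -> str:
    """Return the first (cheapest) tier able to handle the task."""
    for name, capability in TIERS:
        if capability >= difficulty:
            return name
    return TIERS[-1][0]  # nothing qualifies: escalate to the top tier
```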
The NEXUS run was governed by an agent.md file with strict context-management rules (a 50% compaction gate and a 75% hard stop). The baseline run had no such guardrails: the AI could use the full context window freely.
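The 50% compaction gate and 75% hard stop reduce to a simple threshold check. The thresholds come from the post; the function name and return values are illustrative:

```python
def context_action(used_tokens: int, window: int) -> str:
    """Decide what the agent must do at its current context usage."""
    usage = used_tokens / window
    if usage >= 0.75:
        return "hard_stop"   # stop work; hand off or restart fresh
    if usage >= 0.50:
        return "compact"     # summarize and prune context before continuing
    return "continue"
```

Checked before every task, a gate like this is what kept the NEXUS run at 14% context while the unguarded baseline drifted to ~88%.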