Same prompt. Same app. Three different AI configurations.
One used NEXUS + Ollama MCP for local LLM delegation.
One used NEXUS context rules only.
One ran vanilla, with no guardrails.
Here's what happened.
| Metric | LocalLLM | NEXUS | Baseline |
|---|---|---|---|
| Build Time | 4m 39s | 6m 04s | 15m 32s |
| Credits Used | 8.32 | 13.67 | 8.11 |
| Context Window Used | 10% | 14% | 6% |
| Tests Passing | 6 / 6 | 6 / 6 | 7 / 7 |
| Test Workarounds Needed | None | None | 1 seed fix |
| TypeScript Errors | 0 | 0 | 0 |
| File Organization | Flat (all in src/) | Organized (components/, views/, hooks/, db/) | Split components (Sidebar.tsx, Dashboard.tsx, Analytics.tsx) + tests/ |
| Agent Framework | NEXUS + Ollama MCP (no OnMars agent) | OnMars + NEXUS context rules | OnMars (no context rules) |
The NEXUS + Ollama MCP run was the fastest at 4m 39s — beating NEXUS-only by 1.4 minutes and baseline by over 10 minutes. Offloading mechanical tasks (commit messages, boilerplate) to local models via Ollama freed the cloud model to focus on architecture.
The baseline used the OnMars agent without context rules and took 3.3x longer than the LocalLLM run despite comparable output. NEXUS context rules cut that overhead (1.3x vs LocalLLM), but adding Ollama MCP on top eliminated it entirely — the fastest run had the most framework.
NEXUS-only used the most credits (13.67) despite being mid-speed. LocalLLM and baseline were nearly identical (~8 credits). Local model delegation appears to shift mechanical token spend off the cloud, keeping credits low while the NEXUS context rules keep quality high.
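The delegation pattern described above can be sketched in a few lines. This is an illustrative sketch, not NEXUS's actual rule set: the task categories and the `buildOllamaRequest` helper are assumptions, though the request shape follows Ollama's documented local `POST /api/generate` endpoint.

```typescript
// Mechanical tasks that a small local model can handle; the cloud
// model never sees these, so they cost zero credits. The specific
// task names here are illustrative.
const MECHANICAL_TASKS = new Set(["commit-message", "boilerplate", "rename"]);

function isLocalCandidate(task: string): boolean {
  return MECHANICAL_TASKS.has(task);
}

// Build a request for Ollama's local HTTP API. The endpoint and body
// shape match Ollama's /api/generate; the model name is an assumption.
function buildOllamaRequest(prompt: string) {
  return {
    url: "http://localhost:11434/api/generate",
    body: { model: "llama3", prompt, stream: false },
  };
}
```

Routing decisions like `isLocalCandidate` are cheap to evaluate per task, which is why the LocalLLM run's credit spend stayed near baseline despite the extra framework.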
All three runs stayed under 15% context — a dramatic improvement from earlier experiments where baseline hit ~88%. This suggests Kiro's newer CLI handles context more efficiently, reducing the need for explicit context management rules.
Every run produced a working app with price engine, analytics charts, sidebar navigation, and passing tests. TypeScript compiled clean in all three. The differences were in speed, cost, and architecture — not in whether the app worked.
For a well-scoped single prompt, local LLM delegation is the sweet spot. NEXUS context rules + Ollama MCP delivered the fastest build at a credit cost on par with the no-framework baseline (8.32 vs 8.11). The local model handled mechanical tasks at no cloud-credit cost while the cloud model focused on what matters — architecture and complex logic.
The NEXUS configuration ran with all capabilities active: tiered model routing, specialized agent personas, context management, quality gates, and memory persistence. The full multi-model pipeline — not just one rule, but the entire orchestration layer.
The framework mandates a "Default-to-Fail" posture and requires multi-dimensional evidence before any deliverable passes. Screenshots at 3 breakpoints for UI, actual HTTP responses for APIs, deployment logs for infra. Agent assertions are not evidence — runtime proof is.
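A "Default-to-Fail" gate can be sketched as a function that returns false unless runtime evidence is present. The evidence kinds come from the post (screenshots, HTTP responses, deployment logs); the types and the `uiGatePasses` function are illustrative, not the framework's actual API.

```typescript
// Evidence is runtime proof, never an agent's claim. Discriminated
// union of the evidence kinds named in the post.
type Evidence =
  | { kind: "screenshot"; breakpoint: number }
  | { kind: "http-response"; status: number }
  | { kind: "deploy-log"; lines: string[] };

// UI deliverables require screenshots at 3 distinct breakpoints.
// With no evidence, the gate fails — that is the default.
function uiGatePasses(evidence: Evidence[]): boolean {
  const breakpoints = new Set<number>();
  for (const e of evidence) {
    if (e.kind === "screenshot") breakpoints.add(e.breakpoint);
  }
  return breakpoints.size >= 3;
}
```

The key property is that the empty case fails: an agent saying "it works" contributes no `Evidence` and therefore cannot pass the gate.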
Automatic routing across tiers: Tier 1 (Gemini Pro / Claude Opus) for deep architecture, Tier 2 (Gemini Flash) for summaries and basic UI, Tier 3 (local Ollama) for trivial tasks. Each task goes to the cheapest model that can handle it.
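The routing rule above reduces to "cheapest capable model wins." A minimal sketch, assuming a three-bucket complexity score (the tier and model names come from the post; the scoring itself is an assumption):

```typescript
type Tier = "tier1" | "tier2" | "tier3";
type Complexity = "trivial" | "basic" | "deep";

// Tier-to-model mapping as described in the post.
const TIER_MODELS: Record<Tier, string[]> = {
  tier1: ["gemini-pro", "claude-opus"], // deep architecture
  tier2: ["gemini-flash"],              // summaries, basic UI
  tier3: ["ollama:local"],              // trivial tasks
};

// Route each task to the cheapest tier that can handle it.
function routeTask(complexity: Complexity): Tier {
  switch (complexity) {
    case "trivial": return "tier3";
    case "basic":   return "tier2";
    case "deep":    return "tier1";
  }
}
```

In practice the hard part is the classifier, not the switch: misjudging a "deep" task as "trivial" sends architecture work to a local model that cannot handle it.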
Where NEXUS should shine: multi-phase builds requiring planning, delegation, and quality gates. Single-prompt tasks don't need orchestration — but real-world projects with 10+ interdependent steps do.