Hardware Sniper

Same prompt. Same app. Three different AI configurations.
One used NEXUS + Ollama MCP for local LLM delegation. One used NEXUS context rules only. One ran vanilla, with no guardrails.
Here's what happened.

Build Statistics

| Metric | LocalLLM | NEXUS | Baseline |
| --- | --- | --- | --- |
| Build Time | 4m 39s | 6m 04s | 15m 32s |
| Credits Used | 8.32 | 13.67 | 8.11 |
| Context Window Used | 10% | 14% | 6% |
| Tests Passing | 6 / 6 | 6 / 6 | 7 / 7 |
| Test Workarounds Needed | None | None | 1 seed fix |
| TypeScript Errors | 0 | 0 | 0 |
| File Organization | Flat (all in src/) | Organized (components/, views/, hooks/, db/) | Split components (Sidebar, Dashboard, Analytics .tsx) + tests/ |
| Agent Framework | NEXUS + Ollama MCP (no OnMars agent) | OnMars + NEXUS context rules | OnMars (no context rules) |

Build Time Comparison: LocalLLM 4m 39s · NEXUS 6m 04s · Baseline 15m 32s

Credits Used: Baseline 8.11 · LocalLLM 8.32 · NEXUS 13.67

Context Window Usage: Baseline 6% · LocalLLM 10% · NEXUS 14%

Key Insights

Local LLM Delegation Won on Speed

The NEXUS + Ollama MCP run was the fastest at 4m 39s — beating NEXUS-only by 1.4 minutes and baseline by over 10 minutes. Offloading mechanical tasks (commit messages, boilerplate) to local models via Ollama freed the cloud model to focus on architecture.
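
The delegation wiring is just an MCP server entry pointing at a local Ollama instance. A minimal sketch, following the common `mcpServers` config shape used by MCP clients — the server package name (`ollama-mcp-server`) and the exact location of this config in Kiro are assumptions, not the precise setup used in this run:

```json
{
  "mcpServers": {
    "ollama": {
      "command": "uvx",
      "args": ["ollama-mcp-server"],
      "env": {
        "OLLAMA_HOST": "http://localhost:11434"
      }
    }
  }
}
```

Once registered, the cloud model can call the server's tools to run prompts against local models like qwen2.5-coder instead of spending cloud tokens on them.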

Agent Overhead is Real — But Ollama Offsets It

The baseline used the OnMars agent without context rules and took 3.3x longer than the LocalLLM run despite comparable output. NEXUS context rules cut that overhead (1.3x vs LocalLLM), but adding Ollama MCP on top eliminated it entirely — the fastest run had the most framework.

Credits Tell a Different Story

NEXUS-only burned the most credits (13.67) despite finishing mid-pack on speed. LocalLLM and baseline were nearly identical (~8 credits each). Local model delegation appears to shift mechanical token spend off the cloud, keeping credits low while the NEXUS context rules keep quality high.

Context Usage Was Low Across the Board

All three runs stayed under 15% context — a dramatic improvement from earlier experiments where baseline hit ~88%. This suggests Kiro's newer CLI handles context more efficiently, reducing the need for explicit context management rules.

All Three Delivered

Every run produced a working app with price engine, analytics charts, sidebar navigation, and passing tests. TypeScript compiled clean in all three. The differences were in speed, cost, and architecture — not in whether the app worked.

The Takeaway

For a well-scoped single prompt, local LLM delegation is the sweet spot. NEXUS context rules + Ollama MCP delivered the fastest build at the lowest credit cost. The local model handled mechanical tasks for free while the cloud model focused on what matters — architecture and complex logic.

Coming Soon — More NEXUS Builds

NEXUS (Network of EXperts, Unified in Strategy) is a central repository for defining multi-model agentic behaviors, personas, prompts, and orchestration tools. The context management rules tested above are just one piece. NEXUS features are being integrated incrementally into the AI-toolkit to avoid breaking changes — each test below isolates a single NEXUS capability.

NEXUS — Full Framework

Complete NEXUS Orchestration

All NEXUS capabilities active: tiered model routing, specialized agent personas, context management, quality gates, and memory persistence. The full multi-model pipeline — not just one rule, but the entire orchestration layer.

NEXUS — Quality Gates

Default-to-Fail Evidence System

The framework mandates a "Default-to-Fail" posture and requires multi-dimensional evidence before any deliverable passes. Screenshots at 3 breakpoints for UI, actual HTTP responses for APIs, deployment logs for infra. Agent assertions are not evidence — runtime proof is.
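
The posture can be sketched as a gate that starts at FAIL and only flips to PASS when every required evidence type is present. A hypothetical illustration of the policy — the evidence categories below come from the description above, but the code itself is not the NEXUS implementation:

```python
# Hypothetical sketch of a "Default-to-Fail" quality gate: a deliverable
# fails unless every required evidence type has runtime proof attached.
REQUIRED_EVIDENCE = {
    "ui": {"screenshot_mobile", "screenshot_tablet", "screenshot_desktop"},
    "api": {"http_response"},
    "infra": {"deployment_log"},
}

def gate(kind: str, evidence: set[str]) -> str:
    """Return 'PASS' only if all required evidence for this deliverable
    kind is present; anything missing (or an unknown kind) means 'FAIL'."""
    required = REQUIRED_EVIDENCE.get(kind)
    if required is None or not required <= evidence:
        return "FAIL"  # default posture: no proof, no pass
    return "PASS"

# An agent's bare claim carries no runtime proof, so it fails:
print(gate("ui", {"agent_assertion"}))                         # FAIL
print(gate("ui", {"screenshot_mobile", "screenshot_tablet",
                  "screenshot_desktop"}))                      # PASS
```

The key design choice is that the pass condition is a whitelist of proofs, never the absence of complaints — which is exactly why agent assertions don't count.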

NEXUS — Multi-Model Routing

Tiered Cloud + Local Routing

Automatic routing across tiers: Tier 1 (Gemini Pro / Claude Opus) for deep architecture, Tier 2 (Gemini Flash) for summaries and basic UI, Tier 3 (local Ollama) for trivial tasks. Each task goes to the cheapest model that can handle it.
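
Mechanically, this is a cheapest-capable-model lookup. A hypothetical sketch — the 1–10 complexity scale and the tier-to-model mapping are illustrative assumptions, not the actual NEXUS routing logic:

```python
# Hypothetical sketch of tiered routing: each task goes to the cheapest
# tier whose capability ceiling covers the task's complexity score.
TIERS = [
    # (ceiling, model) — checked cheapest-first
    (3, "ollama/qwen2.5-coder"),  # Tier 3: trivial tasks (commit messages)
    (6, "gemini-flash"),          # Tier 2: summaries, basic UI
    (10, "claude-opus"),          # Tier 1: deep architecture work
]

def route(complexity: int) -> str:
    """Pick the cheapest model whose ceiling >= task complexity (1-10)."""
    for ceiling, model in TIERS:
        if complexity <= ceiling:
            return model
    raise ValueError(f"no tier can handle complexity {complexity}")

print(route(2))   # ollama/qwen2.5-coder
print(route(5))   # gemini-flash
print(route(9))   # claude-opus
```

Because the list is ordered cheapest-first, the loop naturally implements "the cheapest model that can handle it."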

NEXUS — Multi-Step Workflows

Complex Task Orchestration

Where NEXUS should shine: multi-phase builds requiring planning, delegation, and quality gates. Single-prompt tasks don't need orchestration — but real-world projects with 10+ interdependent steps do.

Methodology

Prompt: All three tests received the identical build prompt for a React/Vite "hardware-sniper" app with Brutalist design, Dexie.js offline-first database, simulated price engine, complex UI with sidebar/dashboard/analytics, and Vitest tests.

Configurations:
LocalLLM: Kiro CLI with NEXUS context rules + Ollama MCP server for local model delegation. No OnMars agent. Mechanical tasks (commit messages, boilerplate, test scaffolds) were routed to local Ollama models (qwen2.5-coder, llama3.2) instead of the cloud model.
NEXUS: OnMars agent with strict context management rules (50% compaction gate, 75% hard stop) from the NEXUS framework.
Baseline: OnMars agent with no context management rules — full context window available, no guardrails.
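
The 50%/75% rule in the NEXUS configuration amounts to two thresholds: below 50% of the context window, keep going; between 50% and 75%, compact the conversation; at 75% or above, hard stop. A hypothetical sketch of that policy (the function and window size are illustrative, not the framework's actual code):

```python
# Hypothetical sketch of the NEXUS context-management gates:
# 50% usage triggers compaction, 75% is a hard stop.
COMPACTION_GATE = 0.50
HARD_STOP = 0.75

def context_action(tokens_used: int, window_size: int) -> str:
    """Map current context usage to the action the rules mandate."""
    usage = tokens_used / window_size
    if usage >= HARD_STOP:
        return "stop"      # refuse further work until context is reset
    if usage >= COMPACTION_GATE:
        return "compact"   # summarize history before continuing
    return "continue"

print(context_action(20_000, 200_000))   # continue (10% usage)
print(context_action(120_000, 200_000))  # compact  (60% usage)
print(context_action(160_000, 200_000))  # stop     (80% usage)
```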

Tool: All three runs used Kiro (Amazon's AI IDE) in auto mode. The NEXUS and baseline runs used the AI-toolkit's OnMars agent configuration. The LocalLLM run used Kiro CLI with the NEXUS framework and Ollama MCP but no OnMars agent.

Disclaimer

This comparison is based on three single runs and should not be taken as concrete proof of one approach over another. Different CLIs (Claude Code, Gemini CLI) were not tested. A rigorous benchmark would require multiple runs across different tools, models, and prompt variations. This is an exploratory observation, not a controlled experiment.