CTI: Map → Architect → Build, In Practice

Orbital's engagement model has three phases: Map, Architect, Build. Let's see how it actually played out for CTI — what got mapped, what got architected, what got built, in the order it actually happened.

The phases are sequential for a reason. Each one produces a specific kind of artefact, and each artefact is a precondition for the next. When the sequence is honoured, the resulting system is durable. When it's skipped, you end up with AI that sounds plausible and behaves like a tourist.


Phase 1: Map — Surfacing What Cycling Coaching Actually Runs On

The Map phase answers one question: what intelligence does this domain actually reason in?

It's not a technical question. It's an expertise question. Before any code gets written, before any architecture is drawn, the work is to surface the substrate the experts in this domain actually use to make decisions.

For cycling coaching, the substrate turned out to be a stack of three things:

Quantitative substrate. The Performance Management Chart vocabulary — TSS (Training Stress Score), CTL (Chronic Training Load), ATL (Acute Training Load), TSB (Training Stress Balance) — built on top of the Critical Power model and Bannister's impulse-response framework. Plus the rider's actual numbers: FTP, recent rides, power curve, weight.

Subjective substrate. What the metrics miss. Feeling and RPE (rate of perceived exertion). The rider's self-report of how today actually went — sleep, life stress, the hint of a flu, the nagging knee.

Procedural substrate. How a coach actually responds. The shape of "should I go hard today?" answered well. What gets escalated, what gets reassured, what gets a question asked back. The fact that a competent coach refuses to speculate about cardiac symptoms and refers them out.

What the Map phase produced for CTI:

  • A clear separation between the model the coach reasons in (TSS-based PMC) and the data the coach has access to (per-ride telemetry, profile facts, conversation history).
  • A list of question archetypes — workout generation, ride analysis, route search, coaching chat, off-topic — that became the routing classes for the architecture phase.
  • A clear grounding constraint: every recommendation has to be traceable to the rider's actual numbers, not textbook averages. This was the central design constraint everything downstream had to honour.
  • A safety boundary: medical guardrails written in plain English, before they were ever in a system prompt.

The artefact: "CTI's Fitness Metrics: TSS, CTL, ATL, TSB" — the Map output, written down. Roughly the same length as a small textbook chapter, and the substrate the rest of the system reasons over.

Why this transfers. Every domain has its TSS. The accumulated vocabulary the experts actually reason in. For an accounting firm it's the firm's tax-planning frameworks. For an environmental consultancy it's the regional methodology and assessment rubrics. For a logistics operator it's the exception-handling decision tree the best dispatcher uses without thinking about it. Surfacing that substrate before any system gets designed is the difference between AI that sounds expert and AI that is.


Phase 2: Architect — Designing the System Around That Intelligence

Architect answers a different question: how does an AI actually reason over this substrate in a way that produces output reflecting a real coach's standards, not a generic chatbot's?

This is where the Map output becomes a system design. The substrate doesn't change in this phase — the architecture is the answer to "given that substrate, how do we make it usable?"

The key architectural decisions for CTI:

Pipeline-first, not prompt-first. Routing decisions happen before the model sees the message. A Haiku classifier sorts every incoming message into one of the five archetypes from the Map phase. Off-topic messages short-circuit the entire pipeline before any expensive model call. Workout-generation messages force toolChoice: 'required'. The model never has to figure out whether it should call a tool — the pipeline decides that for it.
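The routing decision described above can be sketched as a plain function. The classifier is stubbed here (in CTI it would be the cheap Haiku call), and every name is illustrative rather than the production API:

```typescript
// The five archetypes from the Map phase become the routing classes.
type Intent = "workout" | "analysis" | "route_search" | "chat" | "off_topic";

interface RoutingDecision {
  intent: Intent;
  skipModel: boolean;               // off-topic short-circuits the pipeline
  toolChoice?: "required" | "auto"; // forced for workout generation
}

function route(intent: Intent): RoutingDecision {
  switch (intent) {
    case "off_topic":
      // No expensive model call happens at all.
      return { intent, skipModel: true };
    case "workout":
      // The model never decides whether to call a tool; the pipeline does.
      return { intent, skipModel: false, toolChoice: "required" };
    default:
      return { intent, skipModel: false, toolChoice: "auto" };
  }
}
```

The point is where the branching lives: before the main model sees anything, in code that can be unit-tested, not inside a prompt.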

Three-tier memory. Profile memory for permanent facts (FTP, goals, medical conditions). Session memory for per-ride episodic summaries. Chat-message index for full searchable history. Each tier serves a different kind of recall, because the Map phase made clear that "what does the coach remember?" doesn't have a single answer.
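As data shapes, the three tiers might look like this — a sketch of the idea, not CTI's actual schema:

```typescript
// Tier 1: permanent facts. Small, always loaded.
interface ProfileMemory {
  ftp: number;
  goals: string[];
  medicalNotes: string[];
}

// Tier 2: one episodic summary per ride. Loaded when a ride is discussed.
interface SessionMemory {
  rideId: string;
  date: string;
  summary: string;
}

// Tier 3: full message history. Never loaded wholesale; only searched.
interface ChatMessageRecord {
  messageId: string;
  role: "user" | "assistant";
  text: string;
  embedding?: number[]; // powers the semantic half of retrieval
}

const profile: ProfileMemory = { ftp: 250, goals: ["sub-5 gran fondo"], medicalNotes: [] };
```

Each tier trades size against recall style: the profile is tiny and always present, sessions are medium and fetched on demand, and the message index is large and only ever sampled via search.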

Layered system prompt. Ten distinct layers assembled in strict order. Safety first, base prompt second, skill instructions next, then the user's profile, then ride data, then the retrieved past-conversation context, then the intent hint, then the skills manifest. Every layer has one job. Adding a new context source means adding a layer, not editing a giant string.
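The assembly step is deliberately boring: an ordered array, rendered and joined. A sketch with a few representative layers (the layer contents here are invented for illustration):

```typescript
type Layer = { name: string; render: () => string };

// Order is the contract: safety always renders first, whatever else changes.
function assemblePrompt(layers: Layer[]): string {
  return layers
    .map((layer) => layer.render())
    .filter((text) => text.length > 0) // empty layers drop out cleanly
    .join("\n\n");
}

const layers: Layer[] = [
  { name: "safety",  render: () => "Never speculate about cardiac symptoms; refer out." },
  { name: "base",    render: () => "You are a cycling coach." },
  { name: "profile", render: () => "Rider FTP: 250W." },
  { name: "intent",  render: () => "Intent: workout generation." },
];

const prompt = assemblePrompt(layers);
```

Adding a new context source is one new element in the array, which is exactly the property the paragraph above is claiming.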

Hybrid retrieval. Four parallel searches — keyword and semantic, on chat messages and session memories — merged with Reciprocal Rank Fusion. RRF rewards items that appear across multiple lists, which means a message that scored in both keyword and semantic search ranks higher than one that only matched one of them. The model sees the right context at the right granularity.
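RRF itself is a few lines. A sketch over lists of result IDs, using the conventional constant k = 60 from the original RRF formulation:

```typescript
// Each item scores 1 / (k + rank) in every list it appears in; scores sum.
function rrfMerge(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// "m2" appears in both lists, so it outranks items that matched only one.
const merged = rrfMerge([
  ["m1", "m2", "m3"], // keyword search over chat messages
  ["m2", "m4"],       // semantic search over chat messages
]);
// merged[0] === "m2"
```

Because only ranks matter, RRF needs no score normalisation across the four searches, which is precisely why it suits merging keyword and semantic results.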

Composable from day one. Every library is structured around (supabase, userId), not request objects. The same code path serves the web UI and (later) the MCP server. This is an architectural decision that has nothing to do with cycling — it's about making the build phase cheap.
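The pattern is easiest to see in a signature. A sketch with a stand-in for the Supabase client (the `Db` interface and all function names here are illustrative):

```typescript
// Stand-in for a SupabaseClient: just enough surface for the sketch.
interface Db {
  query(table: string, userId: string): Promise<string[]>;
}

// Library function: takes (db, userId) directly. No request object,
// no framework types, so any caller can reach it.
async function listRecentRides(db: Db, userId: string): Promise<string[]> {
  return db.query("rides", userId); // always scoped to exactly one user
}

// A web handler is then a thin adapter over the same path...
async function webHandler(db: Db, session: { userId: string }) {
  return listRecentRides(db, session.userId);
}

// ...and an MCP tool would be another thin adapter over the same function.
```

The MCP server arriving later "for free" falls out of this: it only has to produce a `(db, userId)` pair, which every authenticated caller can do.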

What the Architect phase produced for CTI:

  • A request-flow diagram you could hand to an engineer who'd never seen the system and have them understand it.
  • A clear separation between what's loaded into context (Architect output) and what's in the substrate (Map output) — so when the substrate evolves, the architecture doesn't need to.
  • The prompt-version system that makes prompt changes versionable, testable, and rollbackable in the Build phase.

The artefact: "CTI's AI Architecture: How the Coaching Layer Works" — the system design, with diagrams.

Why this transfers. The architecture isn't cycling-specific. Pipeline routing, multi-tier memory, hybrid retrieval, layered prompts, composable libraries — that shape is the right shape for any domain-specific AI product. Swap the substrate, keep the architecture.


Phase 3: Build — Shipping, Instrumenting, Improving

Build answers the third question: how do we get this into production, prove it works, and design the loop that makes it better over time?

This is where most AI projects fall over. They ship something that demos well, then degrade in production because no one is watching, no one is measuring, and prompt changes are deploy-and-pray. The Build phase exists to prevent that.

What got built for CTI:

The foundation app, fast. 3D route viewer, FIT-file parsing, cinematic camera animation, real-time charts — production-ready in a week, on roughly $50 of model spend. The point of building fast isn't bragging rights; it's getting to a real system you can iterate on, instead of a paper design you can argue about.

The coaching pipeline, instrumented. Every conversation captured as a trace via OpenTelemetry. Every prompt change versioned with a structured changelog. Every model call tagged with the prompt version, the intent class, the tool selection, the latency, the cost. Production and observability were built in from the first commit, not bolted on later.

The evals loop. Golden fixtures (the canonical good cases) and regression fixtures (real production failures, promoted from traces). An admin triage UI that lets bad responses get tagged and turned into test cases in a few clicks. Prompt changes run against the suite before they ship. This is what turns "deploy and pray" into "deploy and prove."
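The promote-a-failure loop is mostly a data-flow, and it can be sketched in a few types — these names and the `mustInclude` assertion style are invented for illustration, not CTI's schema:

```typescript
interface Fixture {
  kind: "golden" | "regression";
  input: string;
  mustInclude: string[]; // the simplest possible pass/fail check
}

interface TraceRecord {
  input: string;
  output: string;
  promptVersion: string;
}

// Triage: a bad production trace becomes a regression fixture in one step.
function promoteToFixture(trace: TraceRecord, mustInclude: string[]): Fixture {
  return { kind: "regression", input: trace.input, mustInclude };
}

// Gate: a prompt change ships only if every fixture's checks pass.
function runSuite(fixtures: Fixture[], model: (input: string) => string): boolean {
  return fixtures.every((f) =>
    f.mustInclude.every((needle) => model(f.input).includes(needle)));
}
```

Real eval assertions are richer than substring checks, but the shape is the point: failures flow from traces into fixtures, and fixtures gate every prompt change.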

The MCP server. OAuth 2.1 with PKCE for interactive clients (Claude Desktop). Personal Access Tokens for scripts. Per-user scoping that keeps every tool call isolated to one identity. The whole thing slotted into the existing app in roughly a week — only because the Architect phase had already structured every library to make it possible.

The crucial Build decision was that instrumentation is not optional. Every conversation captured. Every prompt version tracked. Every bad response triageable into a regression fixture. That's what makes the system improve over time rather than degrade.

The artefacts: the production app, the instrumented coaching pipeline and its evals suite, and the MCP server, all described above.

Why this transfers. Ship fast, instrument from day one, design the improvement loop before you need it. That sequence is what makes a system get more valuable over time rather than degrading. It's the same pattern in any domain.


What Each Phase Delivered

Phase     | What it produced for CTI                                                                               | The transferable principle
Map       | The PMC fitness substrate, the five question archetypes, the grounding constraint, the safety boundary | Every domain has its substrate. Surface it before you build.
Architect | Pipeline-first routing, three-tier memory, hybrid retrieval, layered prompts, composable libraries     | The architecture is mostly the same across domains. The substrate is what's specific.
Build     | Production app, evals loop, MCP server — all instrumented from day one                                 | Ship fast, instrument always, design the improvement loop before you need it.

Why The Phases Are Sequential

You can't Architect well without a Map. CTI's three-tier memory was the right design because the Map phase had already made clear that coaching needs three different kinds of recall: permanent facts (FTP, goals), per-ride episodes (last Tuesday's threshold session), and full-history search (the recurring knee complaint). Designing memory before understanding what needed remembering would have produced something more general, less useful, and harder to evolve.

You can't Build well without an Architecture. The reason CTI's MCP server was a one-week addition rather than a six-week rewrite is that the Architect phase had already structured every library around (supabase, userId) rather than around request objects. The MCP server slotted in without disrupting anything. That kind of after-the-fact composability only works if the Architecture anticipated it.

And the Map output is the long-term asset. The substrate doesn't change when the AI models change. The architecture flexes. The build evolves. But the substrate — the structured representation of how the domain actually reasons — gets more valuable as everything around it moves.

That's why the Map phase stands on its own as a deliverable. Even if Architect and Build never happen, the Map output is something the business owns and can reuse.


Start With A Map

If you're considering this approach for your own business, the place to start is the same place CTI started: a Map engagement. We surface the operational intelligence in your business, identify where AI has genuine leverage, and produce a concrete plan.

You walk away with clarity, whether or not we keep working together.

Email: info@orbital.co.nz


Read the CTI case study · orbital.co.nz