CTI's Fitness Metrics: TSS, CTL, ATL, TSB — the Numbers Behind the Coach

The previous four posts in this series have been about engineering — building the viewer, wiring the coach, growing the evals loop, and exposing the MCP surface. This one is about the domain.

Strip away every line of code and what's left is a model of fitness: three numbers (Fitness, Fatigue, Form) derived from a fourth, Training Stress Score (TSS), itself derived from your power data. The model is fifty years old (Banister's impulse-response framework, 1975), but it's still the working model. Every coaching decision the CTI AI makes is grounded in these numbers. So is every colour on the weekly TSS badge, every "you should rest" suggestion, every "you're ready to push" recommendation.

This post covers what those numbers are, how CTI computes them, and how the coaching layer reasons over them.

[Figure: CTI AI Coaching Fitness Chart]

The Performance Management Chart

CTI implements the Performance Management Chart (PMC), which builds on Banister's impulse-response model, was formalised for cycling by Andrew Coggan, and was popularised by TrainingPeaks. The PMC tracks three related metrics:

Fitness (CTL) — your long-term aerobic base, exponentially weighted over ~42 days
Fatigue (ATL) — your short-term tiredness, exponentially weighted over ~7 days
Form (TSB) — fitness minus fatigue, your readiness to perform

All three are computed from daily TSS, which itself is derived from power data via Critical Power and Normalized Power. The pipeline is:

power data → NP → IF → TSS → CTL/ATL/TSB

Each step compresses the previous: thousands of power samples become one Normalized Power, NP and CP become one IF, IF and duration become one TSS, daily TSS becomes two rolling averages (CTL and ATL) and their difference (TSB). By the time the coach is reasoning about your week, it's reasoning about a handful of numbers, but those numbers are the right ones.

Critical Power: The Anchor

Everything downstream depends on CP. It's the threshold against which efforts are normalised, the dividing line between sustainable and unsustainable, and the input to the IF calculation that produces TSS.

CTI fits a Critical Power model from a rolling 16-week window of your most recent rides — capped at 60 rides, with a minimum 10-minute duration per ride. The window is relative to the ride being viewed: a ride three months ago is analysed against your fitness at that point, not today. This matters because if a rider has gained 30W of CP since January, replaying a January ride against today's CP would make every January effort look like an easy spin.
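The window rules above are simple enough to sketch in a few lines. This is an illustration only: the Ride record and select_cp_window are hypothetical names rather than CTI's code, and keeping the most recent rides when more than 60 qualify is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Ride:  # hypothetical record, not CTI's schema
    day: date
    duration_s: int


WINDOW = timedelta(weeks=16)
MAX_RIDES = 60
MIN_DURATION_S = 10 * 60


def select_cp_window(rides, viewed_day):
    """Return the rides eligible for the CP fit, relative to the ride being viewed."""
    start = viewed_day - WINDOW
    eligible = [
        r for r in rides
        if start <= r.day <= viewed_day and r.duration_s >= MIN_DURATION_S
    ]
    # Assumption: when more than 60 rides qualify, keep the most recent ones.
    eligible.sort(key=lambda r: r.day, reverse=True)
    return eligible[:MAX_RIDES]
```

Because the window is anchored on viewed_day rather than on today, replaying an old ride only ever sees the rides that preceded it.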

Why CP rather than FTP? FTP is a single test-day number — often stale, often imperfectly tested. CP is fit from your actual power data; the model finds the asymptote of your power-duration curve. As long as you're riding hard occasionally, CP self-updates without a dedicated test.
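To make "finding the asymptote" concrete, here is a minimal sketch that assumes the classic two-parameter model, P(t) = CP + W'/t. The post does not say which CP model CTI fits, so treat the model choice and the function name fit_critical_power as assumptions. Multiplying both sides by t gives work = CP × t + W', which is a straight line that ordinary least squares can fit.

```python
def fit_critical_power(efforts):
    """Fit work = CP * t + W' by least squares.

    efforts: list of (duration_s, mean_power_w) best efforts.
    Returns (cp_watts, w_prime_joules).
    """
    if len(efforts) < 2:
        raise ValueError("need at least two efforts")
    durations = [t for t, _ in efforts]
    works = [t * p for t, p in efforts]  # joules = seconds * watts
    n = len(efforts)
    mean_t = sum(durations) / n
    mean_w = sum(works) / n
    sxx = sum((t - mean_t) ** 2 for t in durations)
    if sxx == 0:
        raise ValueError("need at least two distinct durations")
    sxy = sum((t - mean_t) * (w - mean_w) for t, w in zip(durations, works))
    cp = sxy / sxx                 # slope: the power asymptote
    w_prime = mean_w - cp * mean_t  # intercept: finite work capacity above CP
    return cp, w_prime
```

The fit only moves when the window contains genuinely hard efforts at several durations, which is why CP self-updates only if you ride hard occasionally.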

Normalized Power, Intensity Factor, TSS

Normalized Power (NP) weights harder efforts more heavily than easy spinning, giving a truer read of metabolic demand than average power. A ride that alternates 400W and 100W has the same average as a ride at 250W constant, but the metabolic cost is much higher. NP captures that.
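As a rough illustration, the commonly published NP algorithm takes a 30-second rolling average of power, raises each value to the fourth power, averages, and takes the fourth root. The sketch below assumes 1 Hz samples with no gaps; whether CTI computes NP itself this way is not stated here, and normalized_power is a hypothetical name.

```python
def normalized_power(samples, window=30):
    """NP from 1 Hz power samples using a rolling mean and a fourth-power average."""
    if len(samples) < window:
        raise ValueError("need at least one full rolling window")
    running = sum(samples[:window])
    rolling = [running / window]
    for i in range(window, len(samples)):
        # Slide the window one sample forward.
        running += samples[i] - samples[i - window]
        rolling.append(running / window)
    mean_fourth = sum(r ** 4 for r in rolling) / len(rolling)
    return mean_fourth ** 0.25


steady = [250.0] * 3600
variable = ([400.0] * 300 + [100.0] * 300) * 6  # 5 min on, 5 min off, one hour

print(sum(steady) / len(steady), sum(variable) / len(variable))  # both average 250 W
print(normalized_power(steady), normalized_power(variable))
```

Both rides average 250 W, but the fourth-power weighting pushes the variable ride's NP well above 250 W while the steady ride's NP stays at 250 W.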

Intensity Factor (IF) is NP ÷ CP — how hard the session was relative to your current fitness. An IF of 1.0 is exactly threshold; 0.7 is endurance pace; 1.05 is over-threshold and unsustainable for long.

TSS combines IF and duration into a single training-load number, with one hour at CP defined as 100 TSS:

TSS = IF² × hours × 100

The squaring is the important detail — doubling intensity quadruples training stress. Two hours at IF 0.7 is 98 TSS; one hour at IF 1.0 is 100. The harder hour costs the same as the longer easy ride, which matches what your legs feel the next day.
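The arithmetic is small enough to check directly. A minimal sketch, with hypothetical function names and a CP of 250 W chosen purely for illustration:

```python
def intensity_factor(np_w, cp_w):
    """IF = NP / CP."""
    return np_w / cp_w


def training_stress_score(np_w, cp_w, duration_s):
    """TSS = IF^2 * hours * 100, so one hour at CP scores 100."""
    hours = duration_s / 3600
    return intensity_factor(np_w, cp_w) ** 2 * hours * 100


# Illustrative rider with CP = 250 W.
easy_long = training_stress_score(175, 250, 2 * 3600)  # two hours at IF 0.7
hard_short = training_stress_score(250, 250, 3600)     # one hour at IF 1.0
print(round(easy_long, 1), round(hard_short, 1))  # 98.0 100.0
```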

CTI reads TSS from the processed FIT file (fit_metadata.trainingStressScore) or, when available, from the pre-computed ride insights (insights.rideCard.tss).

IF Zones

IF Range	Workout Zone
< 0.65	Recovery / very easy spin
0.65 – 0.75	Easy endurance / active recovery
0.76 – 0.85	Moderate endurance / tempo
0.86 – 0.94	Threshold / hard sustained effort
0.95 – 1.05	Very hard / sub- or over-threshold
> 1.05	Extremely intense — VO₂max / anaerobic intervals

These zones drive the workout-builder skill (/workout) and inform the ride-analysis narrative.
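The table maps onto a simple lookup. In this sketch the function name if_zone is hypothetical, and the small gaps between rows (for example between 0.75 and 0.76) are closed by assigning them to the lower zone, which is an assumption rather than documented behaviour.

```python
def if_zone(intensity_factor):
    """Map an Intensity Factor to the workout zone from the table above."""
    if intensity_factor < 0.65:
        return "Recovery / very easy spin"
    if intensity_factor < 0.76:   # assumption: 0.75 to 0.76 counts as easy endurance
        return "Easy endurance / active recovery"
    if intensity_factor < 0.86:
        return "Moderate endurance / tempo"
    if intensity_factor < 0.95:
        return "Threshold / hard sustained effort"
    if intensity_factor <= 1.05:
        return "Very hard / sub- or over-threshold"
    return "Extremely intense: VO2max / anaerobic intervals"
```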

CTL: Fitness (42-day EWMA)

CTL — Chronic Training Load — is your long-term fitness, computed as an exponentially weighted moving average of daily TSS with a ~42-day time constant:

CTL_today = CTL_yesterday × (1 − 1/42) + TSS_today × (1/42)

The factor 1/42 means today's TSS carries a weight of about 2.4% in today's CTL. A single hard ride barely moves the number; a sustained training block moves it noticeably. That's the right behaviour: fitness isn't built in a day, and CTL shouldn't pretend it is.

CTL is computed on every request by iterating from the earliest ride date to today. Rest days contribute TSS = 0, causing natural decay. Multiple rides on the same day have their TSS summed before the EWMA update. The naive O(N) recomputation is fine because N is rarely more than a few hundred days; precomputed snapshots would be a premature optimisation.
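The recomputation described above might look something like this. It is a sketch under stated assumptions: rides arrive as (date, tss) pairs, CTL is seeded at zero on the first ride date, and ctl_series is a hypothetical name.

```python
from collections import defaultdict
from datetime import date, timedelta


def ctl_series(rides, today, tau=42):
    """Walk day by day from the earliest ride to today, returning {date: ctl}."""
    daily = defaultdict(float)
    for day, tss in rides:
        daily[day] += tss  # multiple rides on one day are summed first

    if not daily:
        return {}

    ctl = 0.0  # assumption: no fitness history before the first ride
    series = {}
    day = min(daily)
    while day <= today:
        tss = daily.get(day, 0.0)  # rest days contribute TSS = 0
        ctl = ctl * (1 - 1 / tau) + tss * (1 / tau)
        series[day] = ctl
        day += timedelta(days=1)
    return series
```

The loop is linear in the number of days, which is the O(N) cost the paragraph above accepts.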

A high CTL means you've built a large aerobic base through sustained training. It rises slowly, typically 5–8 CTL points per week during a consistent build, and falls slowly when you stop riding. The model is symmetric where physiology is not: in practice you can lose fitness faster than you build it, but the EWMA treats both directions identically, and that turns out to be a good enough approximation.

ATL: Fatigue (7-day EWMA)

ATL — Acute Training Load — captures short-term fatigue using a ~7-day time constant:

ATL_today = ATL_yesterday × (1 − 1/7) + TSS_today × (1/7)

ATL rises quickly after hard training and falls quickly during rest, making it a sensitive indicator of recent load. The 7-day constant is roughly the timescale at which acute fatigue clears: seven days on, a single hard day's contribution to ATL has decayed to about a third of its initial value ((6/7)^7 ≈ 0.34), whatever the following days contain.

The mathematical machinery is identical to CTL — same EWMA structure, just a faster decay constant. The two curves run in parallel, with CTL the slow-moving baseline and ATL the responsive overlay.

TSB: Form

TSB — Training Stress Balance — is your "form" or freshness:

TSB_today = CTL_yesterday − ATL_yesterday

Note the lag: today's TSB uses yesterday's CTL and ATL. That's deliberate. TSB reflects the state you woke up in, before today's ride. A workout you do today doesn't tell you about your readiness to do that workout — the question is what shape you were in beforehand.
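The lag is easiest to see when the three updates sit side by side. In this sketch (hypothetical function name, zero starting values assumed), TSB is read before the day's TSS is folded into CTL and ATL:

```python
def pmc(daily_tss, ctl_tau=42, atl_tau=7):
    """daily_tss: summed TSS for consecutive days (0 on rest days)."""
    ctl = atl = 0.0  # assumption: no prior training history
    rows = []
    for tss in daily_tss:
        tsb = ctl - atl  # yesterday's CTL and ATL: the state you woke up in
        ctl = ctl * (1 - 1 / ctl_tau) + tss / ctl_tau
        atl = atl * (1 - 1 / atl_tau) + tss / atl_tau
        rows.append({"ctl": ctl, "atl": atl, "tsb": tsb})
    return rows


rows = pmc([100, 0])
print(rows[0]["tsb"])            # 0.0: the hard ride has not happened yet
print(round(rows[1]["tsb"], 1))  # -11.9: yesterday's ride shows up today
```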

Joe Friel's interpretation of TSB is the canonical reference for cycling coaches, and CTI's bands follow it closely:

TSB Range	Interpretation
> +5	Fresh / tapered — peak readiness
−10 to +5	Neutral — well-recovered, normal training
−25 to −10	Productive fatigue — adapting to load
< −25	Deep fatigue — prioritise recovery

Positive TSB before a key event or race indicates good form. A sustained negative TSB through a training block is normal and expected; it only becomes a problem when combined with persistent poor subjective feeling — which is why CTI also captures Feeling and RPE.

A common misconception is that the goal is to keep TSB high. It isn't. TSB at +5 to +15 is what you want for a race. During build phases TSB should be negative; that's the signal that you're loading the body enough to provoke adaptation. Going positive every week means you're not training hard enough.
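The TSB bands in the table above translate into a small classifier. The name tsb_band is hypothetical, and how the exact boundary values (+5, −10, −25) are assigned is an assumption, since the table does not say which side they fall on.

```python
def tsb_band(tsb):
    """Map a TSB value to the interpretation bands used in the table."""
    if tsb > 5:
        return "Fresh / tapered"
    if tsb >= -10:   # assumption: -10 itself counts as neutral
        return "Neutral"
    if tsb >= -25:   # assumption: -25 itself counts as productive fatigue
        return "Productive fatigue"
    return "Deep fatigue"
```

A rider at TSB −18 in the middle of a build lands in "Productive fatigue", which is where the paragraph above says a build phase should sit.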

Feeling & RPE: Internal Load Overlay

TSS, CTL, ATL, and TSB measure external load — power times duration, deterministic, computable from the FIT file alone. They don't know whether you slept badly, whether you're fighting a virus, whether your last meal was three hours late.

For that, CTI captures internal load through two subjective signals after every ride.

Feeling — 5-point scale

Value	Score	Description
STRONG	5	Legs felt fresh and powerful
GOOD	4	Rode well, no complaints
NORMAL	3	Average day, nothing remarkable
POOR	2	Struggled, below par
WEAK	1	Flat, heavy, no power

Stored as a nullable text enum in attachments.feeling. The numeric score enables trend analysis and rolling averages.
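One way to get from the stored text value to a trend is sketched below. The enum mirrors the table; the helper names and the 7-entry window are assumptions, not CTI's implementation.

```python
from enum import Enum
from typing import Optional


class Feeling(Enum):
    STRONG = 5
    GOOD = 4
    NORMAL = 3
    POOR = 2
    WEAK = 1


def feeling_score(stored: Optional[str]) -> Optional[int]:
    """Convert the nullable text value into its numeric score."""
    return None if stored is None else Feeling[stored].value


def rolling_feeling(stored_values, window=7):
    """Mean score over the last `window` rides, skipping rides with no rating."""
    scores = [feeling_score(v) for v in stored_values[-window:]]
    rated = [s for s in scores if s is not None]
    return sum(rated) / len(rated) if rated else None
```

Skipping nulls rather than treating them as NORMAL keeps an unrated ride from dragging the average toward the middle.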

RPE — Rate of Perceived Exertion (0–10)

RPE	Description
0	Nothing at all — resting
1–3	Easy — could talk normally, breathing naturally, very comfortable
4–6	Moderate — could talk in short spurts, breathing more laboured, working
7–9	Hard — could barely talk, breathing heavily, outside comfort zone
10	Max effort — gasping for breath, at physical limit

Stored as a nullable integer (0–10) in attachments.rpe. An optional free-text ride_note captures contextual detail that a structured rating cannot express ("struggled on the steep +10% climbs", "stopped twice for traffic", "flat tyre at km 30").

Why Feeling + RPE together

The combination of external load (TSS) and internal load (Feeling, RPE) reveals whether fatigue is productive or concerning:

Negative TSB + GOOD/STRONG feeling → adapting well, the training is working
Negative TSB + POOR/WEAK feeling → non-training stressors (sleep, illness, life stress) or early overreaching

An IF of 0.85 that felt like a 9/10 RPE is a warning. The same IF that felt like a 6/10 is a confirmation that the work is landing well. Neither signal alone tells the full story.

This is also where CTI's coaching layer earns its keep over a static dashboard — the AI architecture post describes how the coach combines TSB and Feeling at prompt-construction time to recommend or veto today's session.

Weekly TSS Colour Tiers

CTI shows a weekly TSS badge in the UI, colour-coded by the ratio of weekly TSS to current CTL rather than by absolute weekly TSS:

Ratio (weekly TSS / CTL)	Tier	Colour	Meaning
< 5.5×	Recovery	Green	Light / deload week
5.5 – 7.5×	Sustainable	Yellow-green	Normal productive week
7.5 – 9.0×	Build	Yellow	Hard week, monitor fatigue
9.0 – 11.0×	Overload	Orange	Very hard, plan recovery
> 11.0×	Danger	Red	Potential overreach risk

The ratio approach matters because the same 700 TSS week is a sustainable week for a fit racer with CTL 110 (about 6.4×) and deep in the danger tier for a returning rider with CTL 50 (14×). An absolute weekly TSS scale would mislead the latter rider into thinking they were doing fine.

The 5.5–7.5× sustainable band brackets the maintenance point: because CTL is an average of daily TSS, a weekly TSS of about seven times CTL holds it steady. Ratios above that drive CTL up; lower ones let it drift down. Periodisation falls out naturally: build weeks in the 7.5–9× band, recovery weeks below 5.5×, race weeks at whatever the taper requires.
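The tier table reduces to a ratio and a few comparisons. In this sketch the function name is hypothetical, shared boundaries (7.5 and 9.0) are assigned to the higher tier, and a rider with no CTL yet gets no tier; all three are assumptions.

```python
def weekly_tier(weekly_tss, ctl):
    """Classify a week by the ratio of weekly TSS to current CTL."""
    if ctl <= 0:
        return None  # assumption: no tier until there is a CTL to compare against
    ratio = weekly_tss / ctl
    if ratio < 5.5:
        return "Recovery"
    if ratio < 7.5:
        return "Sustainable"
    if ratio < 9.0:
        return "Build"
    if ratio <= 11.0:
        return "Overload"
    return "Danger"
```

For instance, weekly_tier(640, 80) gives "Build" (a ratio of 8.0), while the same 640 TSS at CTL 120 gives "Recovery" (about 5.3).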

TSS Recovery Context

After each ride the narrative summary includes plain-language recovery guidance based on TSS alone (CTL/ATL context is layered separately):

TSS	Guidance
< 50	Light load — legs will be fresh tomorrow
50 – 99	Solid effort — one easy day, then ready for another quality session
100 – 149	Meaningful load — expect one to two days to fully absorb
150 – 249	Big day — two to three days before another hard effort
≥ 250	Exceptional load — protect the recovery that follows

These thresholds aren't physiologically precise — recovery depends on the individual, the kind of effort, and the rider's CTL — but they're the right shape. A 250 TSS day is a big day for almost everyone; 50 TSS is a light day for almost everyone. The narrative says so in plain English, then the coach can layer in personalisation based on the individual's history and TSB.
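A threshold table like this is a natural fit for a sorted lookup. The sketch below uses Python's bisect module; the function name and the (label, guidance) tuple shape are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of every band after the first, taken from the table above.
TSS_BOUNDS = [50, 100, 150, 250]
TSS_GUIDANCE = [
    ("Light load", "legs will be fresh tomorrow"),
    ("Solid effort", "one easy day, then ready for another quality session"),
    ("Meaningful load", "expect one to two days to fully absorb"),
    ("Big day", "two to three days before another hard effort"),
    ("Exceptional load", "protect the recovery that follows"),
]


def recovery_guidance(tss):
    """Return (label, guidance) for a single ride's TSS."""
    # bisect_right puts a TSS equal to a bound into the higher band.
    return TSS_GUIDANCE[bisect_right(TSS_BOUNDS, tss)]
```

A 156 TSS ride comes back as a "Big day"; the coach can then adjust that baseline using the rider's CTL and TSB.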

How the Coach Uses These Metrics

The AI architecture post covered the full message pipeline. The fitness state (current CTL, ATL, TSB, last-ride TSS, weekly trend) is loaded as part of the system prompt's context layer, surfaced via the cti://fitness/current resource (also available over MCP to external clients):

<fitness_state>
{
  "ctl": 78.4,
  "atl": 102.1,
  "tsb": -23.7,
  "lastRideTss": 156,
  "weeklyTss": 712,
  "weeklyTier": "overload"
}
</fitness_state>

The coach's prompt-level decision logic combines TSB and Feeling:

Condition	Recommendation
Positive TSB + GOOD/STRONG feeling	Quality intervals or long endurance
Mild negative TSB (−5 to −15) + NORMAL/GOOD	Standard training, monitor RPE
Negative TSB + POOR/WEAK feeling	Recovery ride or rest, regardless of plan
Very negative TSB (< −20) + persistent WEAK	Recovery block

The same logic powers the /training skill suggestions and the form_check MCP prompt. It's also the criterion the coach uses when deciding whether to push the rider toward a hard session or recommend backing off — a POOR feeling with TSB at −22 will get a "today is a recovery day" answer regardless of what the user says they had planned.
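In CTI this logic lives in the prompt, so the model applies it in natural language. As a deterministic paraphrase of the table, a sketch might look like the following. The function name, the definition of "persistent" as three WEAK ratings in a row, and the fallback for combinations the table does not list are all assumptions.

```python
GOOD_FEELINGS = {"GOOD", "STRONG"}
BAD_FEELINGS = {"POOR", "WEAK"}


def recommend(tsb, recent_feelings):
    """recent_feelings: stored Feeling values, most recent last."""
    latest = recent_feelings[-1] if recent_feelings else None
    # Assumption: "persistent" means the last three ratings were all WEAK.
    persistent_weak = (
        len(recent_feelings) >= 3
        and all(f == "WEAK" for f in recent_feelings[-3:])
    )

    if tsb < -20 and persistent_weak:
        return "Recovery block"
    if tsb < 0 and latest in BAD_FEELINGS:
        return "Recovery ride or rest, regardless of plan"
    if tsb > 0 and latest in GOOD_FEELINGS:
        return "Quality intervals or long endurance"
    # Assumption: anything the table does not cover falls back to the standard row.
    return "Standard training, monitor RPE"
```

The ordering matters: the most protective rows are checked first, so a plan for a hard session never outranks a recovery signal.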

The AI also uses the nutrition heuristic of ~10 kcal per TSS point as a rough energy-expenditure estimate on top of baseline maintenance. A 150 TSS ride works out to roughly 1,500 kcal, which is useful for fuelling guidance even though the actual figure varies with body mass and efficiency.

These metrics are also the substrate the reinforcing evals loop tests against. A regression where the model misreads CTL drift, or quotes the wrong TSB band, or mis-tiers a weekly TSS — those are exactly the cases that get promoted to golden or regression fixtures so the eval suite can guard them on the next prompt change.

Compound Score: A Future Direction

W/kg has been the canonical "how strong is this rider" metric for two decades. It's also incomplete — W/kg is most relevant on long climbs, where gravity dominates. On flat and rolling terrain, aerodynamics and absolute watts dominate, and a W/kg-only model under-rates the engine.

Compound Score is an emerging metric developed by WorldTour sports scientists Leo & Wakefield. It combines absolute power with power-to-weight by using 5-minute peak power:

Compound Score ≈ (P_5min)² / body_mass   (P_5min in watts, body_mass in kg)

Squaring 5-minute power amplifies absolute output, then dividing by mass keeps body weight in the picture. The result correlates better with race performance across mixed terrain than W/kg alone:

Compound Score	Rider level
700 – 1,000	Fit recreational rider
1,100 – 1,600	Strong club / masters racer
2,000 – 2,400	Regional / national racer
2,800+	U23 elite / WorldTour

The most useful application is as a personal tracking metric: trending upward across seasons at stable body weight indicates meaningful gains in the critical 5-minute range. Adding Compound Score to CTI is on the roadmap — the 5-minute peak is already extractable from the existing power-curve infrastructure, so it's a small calculation away. The harder question is how the coach should use it; a metric you can compute but never reference in a recommendation isn't worth the UI real estate.
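The calculation itself is short. This sketch assumes 1 Hz power samples and finds the best 5-minute mean with a sliding window; the function names are hypothetical and CTI's own power-curve code may work differently.

```python
def best_mean_power(samples, window_s=300):
    """Highest mean power over any contiguous window of window_s seconds."""
    if len(samples) < window_s:
        raise ValueError("ride is shorter than the window")
    running = sum(samples[:window_s])
    best = running
    for i in range(window_s, len(samples)):
        running += samples[i] - samples[i - window_s]
        best = max(best, running)
    return best / window_s


def compound_score(p5min_w, body_mass_kg):
    """Compound Score: 5-minute power squared, divided by body mass."""
    return p5min_w ** 2 / body_mass_kg


ride = [200] * 600 + [300] * 300 + [150] * 300  # illustrative 20-minute ride
p5 = best_mean_power(ride)
print(p5, compound_score(p5, 75))  # 300.0 1200.0
```

A 300 W five-minute peak at 75 kg scores 1,200, which sits in the "strong club / masters racer" row of the table.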

What I'd Do Differently

Heart-rate-based TSS as fallback. CTI requires power data. For rides without a power meter — a road ride on the second bike, an outdoor session before a power-meter battery swap — there's no TSS, so CTL and ATL skip a day they probably shouldn't. Heart-rate-based TSS (TRIMP-style or hrTSS) would close the gap, even if the calibration is rougher than power-based.

Time-zone-aware day boundaries. Rest days contribute TSS = 0. But what counts as a "day"? Currently it's UTC midnight, which is wrong for any rider not on UTC: a Monday morning ride in Sydney lands in Sunday's bucket. Storing each ride's local date alongside its UTC timestamp would fix this; the schema change is small, but the migration over historical rides is the kind of yak shave that gets deferred.
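A minimal sketch of the bucketing fix, assuming each ride stores its start time in UTC plus a UTC offset in minutes (a hypothetical field, not part of the current schema):

```python
from datetime import date, datetime, timedelta, timezone


def local_ride_date(start_utc: datetime, utc_offset_minutes: int) -> date:
    """Calendar date of the ride as the rider experienced it."""
    rider_tz = timezone(timedelta(minutes=utc_offset_minutes))
    return start_utc.astimezone(rider_tz).date()


# 21:00 UTC on Sunday 9 June 2024 is 07:00 on Monday 10 June in Sydney (UTC+10).
start = datetime(2024, 6, 9, 21, 0, tzinfo=timezone.utc)
print(start.date())                 # 2024-06-09, the UTC bucket
print(local_ride_date(start, 600))  # 2024-06-10, the rider's actual day
```

Storing the offset per ride, rather than one zone per user, also handles rides recorded while travelling.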

CP refit caching. CP is refit on every request from a 16-week window. At 60 rides this is fast, but the fit is recomputed even when no new ride has been added. Caching the fit per (user, window-end-date) and invalidating on new uploads would shave milliseconds off every request — small individually, meaningful at scale.
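The caching idea could be sketched as a small in-memory wrapper keyed by (user, window-end-date). Everything here is hypothetical, including the class name and the fit_fn signature; a real deployment would more likely use a shared cache than a per-process dict.

```python
class CpFitCache:
    """Memoise CP fits per (user_id, window_end) and drop them on new uploads."""

    def __init__(self, fit_fn):
        self._fit_fn = fit_fn  # hypothetical: fit_fn(user_id, window_end) -> fit
        self._cache = {}

    def get(self, user_id, window_end):
        key = (user_id, window_end)
        if key not in self._cache:
            self._cache[key] = self._fit_fn(user_id, window_end)
        return self._cache[key]

    def invalidate_user(self, user_id):
        # Conservative: a backdated upload can change any window, so clear them all.
        for key in [k for k in self._cache if k[0] == user_id]:
            del self._cache[key]
```

Calling invalidate_user from the upload handler keeps the cache correct without needing to work out which windows a new ride actually touches.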

Acknowledging the model's limits. CTL/ATL/TSB is a model — not a physiological measurement. It assumes every TSS point is created equal, which isn't true (sweet-spot TSS recovers differently from VO₂max TSS). It ignores sleep, nutrition, and life stress entirely. The coach knows this — that's what Feeling and RPE are for — but the dashboards don't say so. A small "the model assumes…" link near the PMC chart would be honest about what it isn't.

Conclusion

The interesting realisation from building this side of CTI is that the metrics aren't the product — they're the shared vocabulary the coach reasons in. A rider asking "should I go hard today?" doesn't care about the EWMA constant. They care about the answer. But the answer the coach gives is only as good as the substrate it's reasoning over: TSS that's correct, CTL that reflects sustained training, TSB that captures yesterday's state, Feeling that catches the days power data misses.

Banister's framework is fifty years old, and TSB-based form management has been the working model in cycling for two decades. The fact that no replacement has displaced them isn't because no one has tried; it's because the model is approximately right, and "approximately right" beats "physiologically precise but unimplementable" every time.

The PMC isn't perfect. It's the right level of abstraction for a coach to reason in, the right level of resolution for a rider to act on, and a stable enough vocabulary that the rest of CTI's machinery — the AI pipeline, the evals loop, the MCP surface — can build on it without worrying about the foundation shifting underneath.

References: Banister's impulse-response model and PMC implementation (Science to Sport), Joe Friel on managing training using TSB, Leo & Wakefield on Compound Score

Built with: Claude Sonnet 4.6 via Claude Code CLI

Every domain has its TSS: the accumulated vocabulary the experts actually reason in, and that an AI has to ground on if its output is going to feel like real expertise rather than a textbook summary. Surfacing that vocabulary, structuring it, and making it the substrate the model reasons over is what Orbital's Map phase produces, in cycling or anywhere else where institutional expertise needs to scale.
