
The model

references/model.md, rendered directly from the skill doctrine.

The conceptual core the rest of the skill assumes. Read it once at the start of any engagement. This file is the single home for the timeless structure: what you are actually modelling. process.md tells you what to do each phase and how each fails; contract.md is the schema that encodes this model; pricing-tco.md is the cost method; examples.md shows it run end to end. This file owns the funnel mechanics and the axes doctrine; it does not repeat per-phase how-to (that is process.md) and carries no worked examples (those are examples.md).

The spine — a funnel over the decision space

Before the tools and the axes, the shape of the whole method: it is a funnel that contracts a near-infinite solution space down to one decision. A goal admits countless solutions; each phase rules most of them out. The artifact each phase emits is the residue of that contraction, not the point — the point is the narrowing. The five phases are these five contractions — one system, not two:

| Phase | Contraction | Artifact |
|---|---|---|
| 1 Mission | frame the goal: the purpose and the use cases the system must do, tool-neutral | the mission / use-case list |
| 2 Requirements | bound the space; choosing what matters is the first hard contraction | criteria.json |
| 3 Architecture | partition the remainder into a few coherent architectures, and surface the first forks | the forks + the architecture records |
| 4 Solutions | populate those architectures with concrete tools + build + glue, scored (forks extended, more discovered) | the filled solution records |
| 5 Decisions | collapse what’s left to the final choice: close the value-forks | value-fork positions + the verdict |

This is the phase list — the funnel and the phases are the same five steps seen as contractions. The per-phase reasoning (why each exists, how it fails) is process.md; the unit each phase contracts through is the fork (below).

Tools are not solutions

The single distinction everything hangs on:

  • A tool is a buyable/rentable building block — a SaaS product, an open-source app, an API, a managed service. It has its own capabilities, its own pricing model, its own vendor risk. Tools are what the market offers.
  • A solution is what answers the customer’s requirements — a composition: the tools you choose + the custom build + the integration glue + the operating model that runs it. Solutions are what you design.

For the simplest engagement a solution is a single tool, and the two collapse — which is why the old “tool selection” framing worked for years. But the moment a real answer needs two tools plus middleware, or a bought core plus a built gap-filler, or a bespoke build, the collapse breaks and you have to model the two layers separately:

Tools feed solutions. Solutions are the unit you compare, score, and recommend.

Concretely, a single ranked list of tools cannot express “Tool A + Tool C + a routing service we build” versus “Platform B alone.” Those are two solutions composed from overlapping tools, and the customer is choosing between the solutions. Keep the layers distinct in your head and in the data: score tools on capability; score solutions on the four axes below.

Why this matters (the failure it prevents)

Flatten tools and solutions into one list and three things break. (1) Build disappears — a custom-built option has no vendor and no pricing page, so a tool-only matrix silently omits the very alternative the buy-vs-build question is about. (2) Integration cost is invisible — the glue between two tools is real effort and real fragility, but it belongs to no single tool, so it never gets counted. (3) The trade-off inverts at the wrong layer — a best-of-breed stack can out-fit any single platform and still be the wrong call once integration risk and maintenance are counted, and that only shows up when you score the composition as a whole.

Architecture is not Solution

A second distinction, one layer up, and the funnel turns on it. Between “tool” and “solution” sits the architecture — and conflating it with the solution is the failure this prevents.

  • An architecture is a tool-agnostic form: a coherent path through the decision forks — who owns the system of record, native vs. separate knowledge layer, rent vs. own. No tools yet. Enumerating architectures is Phase 3 (it partitions the space).
  • A solution is an architecture filled — concrete tools + custom build + glue, scored on the four axes under the Scenario. Producing solutions is Phase 4 (it populates the partitions).

The data may keep one evolving record (an architecture that grows tools[] and scores as research arrives becomes a solution), but the contract names both layers so you never call an empty architecture a “solution,” and the file name must track the dominant layer (the architectures*.json → solutions*.json rule below). The failure if you blur them: you either skip enumerating the forms and leap straight to scoring tools (and miss the build/own architectures that have no vendor), or you score before the forms are even on the table.

Three levels, three conformance roles

The architecture/solution/tool split is also a conformance split — each level relates to the Gate · Fit · Cost · Risk axes differently:

| Level | EN / DE | Definition | Conformance |
|---|---|---|---|
| Architecture (syn. solution family) | Architecture / Architektur | Tool-agnostic form: a coherent path through the forks; a class of solutions | Not scored directly. Gets qualitative verdicts (landscape findings) and may be represented by its best-known member (“representative contender”), explicitly provisional while exploration is open |
| Solution | Solution / Lösung | An architecture filled with specifics: concrete product(s) + build + glue; the unit that is ranked. A single product can be a solution (trivial composition) | Rolled-up Gate · Fit · Cost · Risk under the Scenario |
| Tool | Tool / Baustein (product) | A buyable/rentable building block | Per-criterion scores: inputs to the solution roll-up, never the ranked unit |

The maturity / file-naming rule

The file name tracks the layer it holds. architectures*.json holds architecture records; solutions*.json holds filled/scored solutions. The “one evolving record” above is still fine — an architecture record grows into a scored solution — but its file name must track the dominant layer: an architectures file that Phase 4 populates into scored solutions is renamed at that roll-up, never carried as architecture records under a solutions name. (Motivating failure: an engagement’s solutions.json that holds only architectures — exemplar tools[], no scores — is exactly the “calling an empty architecture a solution” this warns against.) The field shapes for both files are in contract.md.

The representative contender

An architecture may render through its best-known member’s conformance as a family-typical line — explicitly labeled provisional, hardened per-product in the deep dive. Two ways to pick that member:

  • Authored — when the architecture’s members aren’t all scored yet, name a best-known exemplar by hand (provisional).
  • Computed — where every member of the architecture is scored, the representative is the top-ranked member at the current weighting, not authored — it moves as the weights move.
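The two ways of picking a representative can be sketched in a few lines. This is a minimal Python illustration, assuming a hypothetical member record with a `scores` dict (or `None` while unscored) and assuming higher scores are better on every key; neither the record shape nor the function names come from contract.md.

```python
def weighted(scores, weights):
    # Weighted sum over the keys named in the active weighting.
    return sum(weights[k] * scores[k] for k in weights)

def representative(members, weights, authored=None):
    """Return (name, mode) for an architecture's representative contender.

    members: [{"name": str, "scores": dict or None}]  (hypothetical shape)
    """
    if members and all(m["scores"] is not None for m in members):
        top = max(members, key=lambda m: weighted(m["scores"], weights))
        return top["name"], "computed"
    # Not every member is scored yet: fall back to the hand-named exemplar.
    return authored, "authored"

members = [
    {"name": "Product P", "scores": {"fit": 4, "risk": 2}},
    {"name": "Product Q", "scores": {"fit": 3, "risk": 4}},
]
print(representative(members, {"fit": 3, "risk": 1}))  # ('Product P', 'computed')
print(representative(members, {"fit": 1, "risk": 3}))  # ('Product Q', 'computed')
```

The second call shows the computed representative moving as the weights move; an authored one stays put until a human changes it.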

The four axes (single-placement rule — DOCTRINE)

Every criterion lives on exactly one of four axes. The discipline is the single-placement rule: each requirement is answered by exactly one question, so nothing is counted twice.

| Axis | The question it answers | Type | Effect |
|---|---|---|---|
| Gate | “Is it even allowed / possible?” | Binary pass/fail | Eliminates. Never scored. |
| Fit | “Can it do the work?” | Scored 1–5, weighted | Capability only. |
| Cost | “What does it cost?” | TCO (€), 4 buckets | A band, separate from fit. |
| Risk | “What can go wrong, and who carries it?” | Scored, by dimension | Downside / uncertainty. |

The rule’s whole job is to break double-counting. The classic trap: a property that is both a capability and a downside — “graceful degradation when the API is down,” “SLA uptime,” “data-control depth above the legal minimum.” Score it as Fit and mention it as Risk and you’ve counted it twice. Decide with one question:

  • Does it eliminate? → Gate. (Phrase it as a loose, literal minimum bar, escape hatches included — see contract.md on gates. Where degree matters, pair a loose gate with a scored depth criterion on Fit or Risk.)
  • Is it a capability the use cases need? → Fit.
  • Is it money? → Cost — and only Cost. Never bolt an affordability clause onto another gate (“compliant on an affordable plan” smuggles budget into a compliance test). Cost must never contaminate the Fit score — pricing/TCO runs as its own pass after scoring (pricing-tco.md). Early on, cost may act as a soft, reversible research screen (skip the obviously out-of-range), never as a hard filter.
  • Is it a failure mode, dependency, or uncertainty? → Risk.
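Whichever question places a criterion, the result can be checked mechanically: every criterion id should appear on exactly one known axis. A minimal Python sketch of such a check, assuming placements arrive as hypothetical `(criterion_id, axis)` pairs (the real criteria.json shape is defined in contract.md and may differ):

```python
from collections import defaultdict

AXES = {"gate", "fit", "cost", "risk"}

def placement_violations(placements):
    """Map each offending criterion id to the axes it was placed on.

    A criterion offends if it sits on more than one axis or on an unknown one.
    """
    seen = defaultdict(set)
    for crit_id, axis in placements:
        seen[crit_id].add(axis)
    return {
        crit_id: sorted(axes)
        for crit_id, axes in seen.items()
        if len(axes) != 1 or not axes <= AXES
    }

placements = [
    ("gdpr_hosting", "gate"),
    ("ticket_routing", "fit"),
    ("sla_uptime", "fit"),
    ("sla_uptime", "risk"),  # the classic double-count
]
print(placement_violations(placements))  # {'sla_uptime': ['fit', 'risk']}
```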

Mapping the canonical requirements vocabulary. Engineering-speak names four kinds of requirement — functional, non-functional, risks, and constraints — and the axes are how each lands:

  • Functional requirement (“the system must do X”) → Fit, or Gate if it’s a binary minimum bar.
  • Non-functional requirement (“the system must be fast / secure / portable”) → Fit if it’s a capability the thing has on day one (e.g. “responds in <200 ms”); Risk if it’s a downside that emerges over time (e.g. “stays effective as load grows”, “degrades gracefully when the API is down”).
  • Risk (“can fail in way X”, “depends on Y”) → Risk, one of its dimensions.
  • Constraint (“the system must not Z”, “we must avoid Y”) → Gate, binary, eliminates.

The mapping is one-way and decisive — you do not need a separate “non-functional requirements” or “risks” list alongside criteria.json. That’s what the single-placement rule replaces: every requirement, of every flavour, lives once, on exactly one axis.

The Fit↔Risk test. This is where requirements drift most often. Ask: is the thing a capability on day one, or a downside that emerges over time? “Has a configurable router” — capability → Fit. “Survives a cat figuring out how to defeat it” — downside that emerges → Risk. The same property can hide on either side (“uptime” reads as a capability but lands on Risk because it’s about the failure mode of being down); when in doubt, default to Risk and pair it with a tight Gate (“minimum 99% SLA”) if a hard floor exists.

Risk is a first-class scored axis

Risk is scored, not narrated — break it into dimensions and score each, so two solutions are comparable on risk the way they are on fit. A representative default set (the Skischule dimensions):

  • Datensicherheit — data control, breach resilience, regulatory exposure.
  • Resilienz — uptime, peak-load behaviour, graceful degradation.
  • Abhängigkeit — lock-in, bus-factor, vendor longevity, vendor-count (more tools and more glue = more dependency).
  • Termin-Sicherheit — deadline risk; turn-key buy is low, bespoke build is high.

Some dimensions are computed from criteria scores; some are hand-scored per solution (contract.md shows both riskScores shapes). Pick the dimensions that fit the engagement — but keep risk scored.

Two elimination causes — a gate, or a decision (DOCTRINE)

A solution leaves the running for exactly one of two reasons, and the difference is load-bearing:

  1. Gate-fail — self-elimination. A candidate that fails a Gate is recorded in the data as a hard-filter 0 with its reason, never removed by row-deletion. This is a fact about the candidate — it cannot clear a written minimum bar — and it is irreversible short of the candidate itself changing. A struck candidate is data, not absence; eliminations get revisited when criteria shift, so the first place a later round looks is the list of what fell and on which filter.
  2. Decision — a resolved fork rules it out. A made decision (a resolved fork — § The fork is the decision — whose options map to architectures) rules out the architectures, and thus the solutions, that its unchosen options carried. This elimination is reversible: change the position and those solutions re-enable. The board names the causing decision, not a gate.

Everything else stays live. A solution that merely scores worse — passes its gates, not excluded by any decision — is not eliminated. It stays in the running and ranks low, or is hidden by a display setting (§ One solution per viable spine), never struck. “Worse” is a position on the ranking, not a cause of removal.

Never fabricate a gate-0 to make a decision-driven elimination render as “out.” A decision-elimination is not a gate failure and must not be recorded as one: doing so makes a reversible choice look like an irreversible fact about the candidate and corrupts the re-entry the never-deleted rule protects. The two causes carry different data and different reversibility — keep them distinct. (The matrix hard-filter field and the status derivation that distinguishes a gate-raus from a decision-raus are in contract.md; the re-entrant discipline that revisits eliminations is process.md § Regression.)
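The difference between the two causes is easiest to see as a status derivation. The following Python sketch is illustrative only: the record shapes (`gate_fails`, a fork option mapped to the set of architectures it keeps) are assumptions made for the example, not the contract.md schema.

```python
def solution_status(solution, forks, positions):
    """Derive ('gate-out' | 'decision-out' | 'live', cause) for one solution.

    solution:  {"architecture": str, "gate_fails": [reason, ...]}  (assumed shape)
    forks:     {fork_id: {option_key: set of architectures that option keeps}}
    positions: {fork_id: chosen option_key}
    """
    if solution["gate_fails"]:
        # A fact about the candidate; stays recorded with its reason.
        return "gate-out", solution["gate_fails"][0]
    for fork_id, option_key in positions.items():
        if solution["architecture"] not in forks[fork_id][option_key]:
            # Reversible: the cause is the decision, not the candidate.
            return "decision-out", fork_id
    return "live", None

forks = {"own_vs_rent": {"own": {"self_hosted"}, "rent": {"saas_platform"}}}
platform = {"architecture": "saas_platform", "gate_fails": []}

print(solution_status(platform, forks, {"own_vs_rent": "own"}))   # ('decision-out', 'own_vs_rent')
print(solution_status(platform, forks, {"own_vs_rent": "rent"}))  # ('live', None)
```

Changing the position re-enables the solution without touching its record, which is exactly what a fabricated gate-0 would make impossible.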

The shared Scenario

Cost and load are not properties of a tool — they are what happens when this org’s demand meets a tool’s pricing and a solution’s architecture. So pull the demand out into one shared Scenario: the demand vector every solution is evaluated against. Which knobs the Scenario carries depends on the cost shape of the candidate solutions — a seat-priced SaaS stack wants seats and seasonality; a hardware-plus-refills decision wants units and refill cadence. Sketch the Scenario as prose + amounts in Phase 2 (a Mengengerüst, no fields yet) and pin its fields in the Phase-4 pricing pass (pricing-tco.md), once you know what cost shapes you are pricing.

The Scenario is the forcing function. Because every solution is costed under the same demand vector, a SaaS subscription, a composed stack, and a custom build land in one comparable table instead of three incompatible pricing pages. Turn a Scenario knob (seats 4 → 6, peak 5× → 10×) and every solution’s Cost — and any load-sensitive Fit/Risk — recomputes together, so the ranking’s sensitivity becomes visible, not just its point value.
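A toy version of that recompute, in Python. Everything here is invented for illustration: the two pricing functions, their numbers, and the `seats` / `peak` knob names are assumptions, and real costs are bands across four buckets rather than single figures (pricing-tco.md).

```python
def cost_table(pricing, scenario):
    """Price every solution under the same demand vector, cheapest first."""
    rows = [(name, price(scenario)) for name, price in pricing.items()]
    return sorted(rows, key=lambda row: row[1])

# Hypothetical annual cost functions; each reads only the shared Scenario.
pricing = {
    "seat_saas": lambda s: 12 * 50 * s["seats"],            # per-seat subscription
    "custom_build": lambda s: 2000 + 12 * 20 * s["peak"],   # fixed build + load-driven hosting
}

print(cost_table(pricing, {"seats": 4, "peak": 5}))
# [('seat_saas', 2400), ('custom_build', 3200)]
print(cost_table(pricing, {"seats": 6, "peak": 5}))
# [('custom_build', 3200), ('seat_saas', 3600)]
```

Turning one knob (seats 4 → 6) flips the order, which is the sensitivity the shared Scenario is meant to expose.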

Composition

A solution is built by composition:

solution = chosen tools  +  custom build  +  integration glue  +  operating model
  • Chosen tools — zero or more catalog tools, each contributing capability (→ Fit) and license/usage cost (→ Cost buckets ③④).
  • Custom build — the bespoke pieces, sized in work packages (effort bands: optimistic / expected / pessimistic). Build contributes capability and project effort (→ Cost bucket ①) and Termin-Sicherheit risk.
  • Integration glue — the effort to make the tools talk (→ Cost bucket ①) and the fragility it introduces (→ Risk · Resilienz / Abhängigkeit).
  • Operating model — who runs and maintains it (→ Cost bucket ② maintenance).

Two disciplines make composition honest. Count each work package once per solution, never sum it across solutions — shared setup (mailbox wiring, KB seeding) is identical across paths; only build-specific packages differ. And size build in bands, not point estimates — the optimistic↔pessimistic spread is the Termin-Sicherheit signal, the same way a usage-priced tool’s cost band is its cost-risk signal. (The 4-bucket TCO rollup these feed is the cost method in pricing-tco.md; the field shapes — costModel, workPackages — are in contract.md.)
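Both disciplines fit in a short Python sketch. The work-package shape (`id` plus a three-value `band`) and the unit (say, person-days) are assumptions for the example; the actual workPackages fields are in contract.md.

```python
def effort_band(work_packages):
    """Sum (optimistic, expected, pessimistic) for one solution.

    Packages are keyed by id first, so a package listed twice counts once.
    """
    unique = {wp["id"]: wp for wp in work_packages}
    return tuple(sum(wp["band"][i] for wp in unique.values()) for i in range(3))

def spread(band):
    """Pessimistic minus optimistic: the Termin-Sicherheit signal."""
    optimistic, _expected, pessimistic = band
    return pessimistic - optimistic

packages = [
    {"id": "mailbox_wiring", "band": (2, 3, 5)},
    {"id": "kb_seeding", "band": (1, 2, 4)},
    {"id": "routing_service", "band": (5, 10, 20)},
    {"id": "mailbox_wiring", "band": (2, 3, 5)},  # duplicate, must not double-count
]
band = effort_band(packages)
print(band, spread(band))  # (8, 15, 29) 21
```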

Coverage — how each use case is met

Composition answers, per use case: does the core deliver it, or is there a gap, and if so, how is the gap filled? That per-use-case answer is a solution’s coverage (contract.md → coverage), and it is the substance the Composition view renders (cockpit.md): for one chosen solution, what the core does natively and where you are augmenting, building, or accepting a partial. Five codes:

  • in — the core delivers it natively. No gap.
  • part (or part:<fillerId>) — the core does most; a remainder is filled.
  • aug:<fillerId> — a gap filled by a named filler: a bought sidecar, an API, or a separate tool docked to the core.
  • build — delivered by the custom build.
  • proj — a one-time setup task at implementation (e.g. a data migration), not an ongoing capability.

Fillers are factored into a small library (fillers.json, schema in contract.md) so the same gap-filler is referenced by every solution that needs it, each with its buy-vs-build options and effort band. A solution that needs ≥1 filler also pulls in the integration glue automatically — which is exactly how the single-core-plus-sidecars vs. fragmented best-of-breed difference becomes visible: more fillers, more glue, more dependency (→ Risk · Abhängigkeit), more integration effort (→ Cost ①).

Coverage and Fit are two fields, side by side: the coverage→Fit bridge. Coverage is the qualitative shape of how a use case is met; the Fit score (matrix → scores[critId]) is the graded quality of that meeting. Neither derives from the other: aug/build/proj have no score equivalent, and a use case can be in yet score a mediocre 3. Render them together so a mismatch (in but Fit 2; build claimed at Fit 5 with no work package) is visible and gets questioned.
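The five codes and the two mismatches named above can be checked with a small parser. This is a hedged Python sketch: it keys Fit scores and work packages by use case for brevity (the real matrix keys scores by criterion id), and the "Fit 2 or lower" threshold is an assumption read off the example, not a documented rule.

```python
CODES = {"in", "part", "aug", "build", "proj"}

def parse_coverage(code):
    """Split a coverage code into (kind, filler_id or None)."""
    kind, _, filler = code.partition(":")
    if kind not in CODES:
        raise ValueError(f"unknown coverage code: {code!r}")
    if kind == "aug" and not filler:
        raise ValueError("aug needs a filler id")
    if kind in {"in", "build", "proj"} and filler:
        raise ValueError(f"{kind} takes no filler id")
    return kind, filler or None

def coverage_mismatches(coverage, fit_scores, work_packages):
    """Flag use cases whose coverage code and Fit score disagree."""
    flags = []
    for use_case, code in coverage.items():
        kind, _ = parse_coverage(code)
        fit = fit_scores.get(use_case)
        if kind == "in" and fit is not None and fit <= 2:
            flags.append((use_case, "native coverage but low Fit"))
        if kind == "build" and not work_packages.get(use_case):
            flags.append((use_case, "build claimed without a work package"))
    return flags

coverage = {"intake": "in", "routing": "build", "kb_search": "aug:kb_sidecar", "migration": "proj"}
fit_scores = {"intake": 2, "routing": 5, "kb_search": 4}
print(coverage_mismatches(coverage, fit_scores, work_packages={}))
# [('intake', 'native coverage but low Fit'), ('routing', 'build claimed without a work package')]
```

The check only surfaces the mismatch; whether the coverage code or the score is wrong remains a question for the analyst.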

The fork is the decision

The unit the funnel converges through is the fork — a decision. A fork is a degree of freedom (own vs. rent; AVB-knowledge in scope or not), and its options map to architectures: choosing a position narrows the live architecture set by set intersection (the kernel’s architectureFromForks / liveArchitectures — step-3 compute). Get the fork model right and four loosely-related notions become one object: the architecture forks, the architectures they generate, the client’s value calls, and the scoring presets.

A fork carries two independent coordinates — keep them apart:

  • Discovery: when the decision surfaced. Any step; often late. You go looking for forks deliberately once the space is open (Phase 3 on), but you keep discovering them later and place each at the phase that resolves it. Discovery ≠ placement.
  • Resolution: when/how it closes. A router to the phase that can answer it:
    • 🔍 fact → a Requirements / Scenario datum settles it (Phase 2). No judgement.
    • 📊 rating → the Solutions scoring decides (Phase 4) — a tradeoff we compute for them.
    • 🎯 value → the client closes it at Decisions (Phase 5), because it’s a long-term, hard-to-reverse, project-shaping commitment (own vs. rent; maintenance appetite) that we have no standing to make for them.

And it sits on two independent axes — collapsing them is the blunt-instrument trap:

  • Leverage — how much the decision contracts the space. Orders what you surface first; the craft is spending the client’s attention on the high-leverage forks, not the trivia.
  • Ownership weight — how strategic / irreversible it is. Routes who closes it, along a continuous spectrum that 🔍/📊/🎯 only band: soft end = absorb it, never bother the client; hard end = the client must own it. The discriminator is reversibility and long-term lock-in — not cost magnitude (a cheap call can be deeply strategic; an expensive one can be a pure rating).

The axes correlate but don’t collapse: a fork can be high-leverage / low-ownership (a rating that strongly narrows — just decide it) or low-leverage / high-ownership (a values call that barely moves the space but the client must still own). The headline forks score high on both — they narrow hard and only the client can make them.

Narrowing lives on the fork itself: a fork’s options each declare prune:{set,keep} (the field shape is in contract.md), and choosing options sets positions = { forkId → optionKey } that the kernel intersects. The fork records live in decisions.json → forks[]. (Plan 15 MR-1 retired the old solution-side forkPositions stamp / architecture-side forks{} map; narrowing has a single home now: the fork’s prune set; see contract.md D6.)
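Narrowing by set intersection can be sketched as follows. This Python version is not the kernel's liveArchitectures; it assumes a simplified shape in which each fork option maps directly to the set of architectures it keeps, whereas the real prune:{set,keep} field is specified in contract.md.

```python
def live_architectures(all_architectures, forks, positions):
    """Intersect the keep-sets of every chosen option.

    forks:     {fork_id: {option_key: set of architectures kept}}  (assumed shape)
    positions: {fork_id: option_key}; unresolved forks are simply absent.
    """
    live = set(all_architectures)
    for fork_id, option_key in positions.items():
        live &= forks[fork_id][option_key]
    return live

architectures = {"a", "b", "c"}
forks = {
    "own_vs_rent": {"own": {"a", "b"}, "rent": {"c"}},
    "knowledge_layer": {"native": {"a", "c"}, "separate": {"b"}},
}
print(sorted(live_architectures(architectures, forks, {})))                       # ['a', 'b', 'c']
print(sorted(live_architectures(architectures, forks, {"own_vs_rent": "own"})))   # ['a', 'b']
print(sorted(live_architectures(
    architectures, forks, {"own_vs_rent": "own", "knowledge_layer": "native"})))  # ['a']
```

Because the result is recomputed from positions each time, clearing a position widens the set again; nothing is deleted.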

Narrowing — a palette of moves, not a pipeline

Phase 4 turns the long list into a short list — but not by a fixed sequence. After the landscape survey yields the long list, you hold a palette of investigative moves, each sharpening a different facet:

  • specify / drill — make a candidate concrete: its composition (coverage) — what the core covers vs. what you’d fill, and how;
  • estimate cost — price it under the Scenario (the 4-bucket TCO);
  • estimate effort — size the build + integration work packages (the band is Termin-risk);
  • assess risk — score the risk dimensions;
  • assess fit — score capability against the Fit criteria.

You choose, at each step, the move with the most leverage right now — the one that most reduces your uncertainty about which solutions are real. A move sharpens the candidates and surfaces findings; some then fall away — eliminated by a failed gate or a consultant judgment call. Iterate; the picture tightens; a short list of ~2–5 emerges. There is no canonical order — cost-first is not a rule; sometimes one risk finding kills three candidates before any costing is worth doing. The app supports each move (the Composition view, cost on the cockpit, the effort bands, the Risk axis, the Fit scores — cockpit.md) but never sequences them for you. (The per-stage how-to — when to drill, what each loop stage asks — is process.md § Phase 4.)

The consultant decides; the agent presents. This is the verdict rule (below) applied to every elimination along the way: the agent surfaces the cost/effort/risk/fit picture and the defects it found, but it does not eliminate a candidate or crown a winner on its own. Each cut is the consultant’s call, recorded with its reason (a struck candidate stays in the data as a hard-filter 0, never deleted). An agent that quietly drops a candidate it judged weak has overstepped.

The knowledge grid — the gaps are the map

“Most leverage right now” is not a vibe; it is read off a grid. Build the long list per architecture and initialise every solution empty; then each candidate carries, beside its scores, a knowledge-state over the five facets (spezif · kosten · effort · risiko · fit; Plan 15 MR-2 renamed build → effort), each one offen | geschätzt | bekannt, that is, open | estimated | known (contract.md → knowledge). This is confidence, not value, and distinct from the score itself. The score is the answer (“Fit 4”); the state is how far you trust it (“geschätzt”). A fresh candidate is all-offen; a move on a facet walks that cell toward bekannt. The open cells are the map: what is not yet known is what is left to do.

Which open cell to close next is the leverage question, and it has an answer: the one that most separates the live candidates. gain = w_open × separation × leverage — an offen cell (a bekannt one can’t light up, nothing left to learn) on the facet that most tightly splits the current front-runners, weighted by that facet’s share of the active preset. A facet nobody’s front-runner cares about scores low however open it is. This is what makes the palette operational: init empty → fill the highest-gain open cell → re-rank → collapse (the loop is process.md § Phase 4; the infoGain compute lives template-level for now and graduates to the kernel when it generalizes across engagements — cockpit.md).
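The gain formula can be made concrete with a small Python sketch. Several parts are assumptions: the 0.5 weight for geschätzt (the text only fixes that offen can gain and bekannt cannot), the per-facet `separation` numbers taken as given inputs, and the function names; the template-level infoGain compute may differ.

```python
# w_open per knowledge state; the value for "geschätzt" is an assumption.
W_OPEN = {"offen": 1.0, "geschätzt": 0.5, "bekannt": 0.0}

def info_gain(state, separation, preset_share):
    """gain = w_open * separation * leverage, with leverage = facet share of the preset."""
    return W_OPEN[state] * separation * preset_share

def next_cell(grid, separation, preset):
    """Return the (candidate, facet) cell with the highest gain, or None if nothing is open."""
    best, best_gain = None, 0.0
    for candidate, facets in grid.items():
        for facet, state in facets.items():
            gain = info_gain(state, separation[facet], preset[facet])
            if gain > best_gain:
                best, best_gain = (candidate, facet), gain
    return best

grid = {
    "A": {"kosten": "offen", "fit": "bekannt"},
    "B": {"kosten": "geschätzt", "fit": "offen"},
}
separation = {"kosten": 0.8, "fit": 0.3}  # how strongly each facet splits the front-runners
preset = {"kosten": 0.5, "fit": 0.5}      # facet shares of the active weighting preset
print(next_cell(grid, separation, preset))  # ('A', 'kosten')
```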

When the highest-gain open cell hides a value-fork, it is not a research move at all — it surfaces as a 🎯 value-move routed to the client, the same fork-discovery feedback as below. You author it as an ordinary decisions.json fork whose options[].prune targets the affected solutions/fillers (Plan 15 MR-3 retired the dedicated facetForks link — a gap that hides a decision is just a fork). The gap that most separates the field can be one only the client can close, not one you research away.

Forks are discovered here, not only in Phase 3

The most valuable thing a move can surface is a decision you can’t make for the client. When drilling a candidate reveals a degree of freedom that’s strategic or hard to reverse — buy the knowledge layer or build it? own the spine or rent the platform? — that is a fork, and minting it is how the narrowing loop feeds back into the architecture model (Phase 3’s machinery, reused, not a new mechanism). Mint it in place: author a decisions.json fork (the schema is contract.md) whose options carry the prune:{set,keep} that narrows the surviving candidates. Place it by its resolution mode — a 📊 rating fork the scoring closes, a 🎯 value fork the client closes at Phase 5. Discovery is continuous; placement is by resolution (§ The fork is the decision).

Collapsing to the brief

The endpoint is not one ranked answer (see the verdict section). It is a short set the client can decide between: ideally 2–3 solutions + 2–3 open decisions, the preferred one highlighted by the consultant’s authored verdict. The lever that gets you there is the fork positions — once the survivors differ only on a handful of forks, they collapse into “one solution family + N open decisions,” which is the briefing shape. If they still differ on too many axes to present cleanly, that’s the signal to run another move (or another research round) — not to force a pick.

One solution per viable spine — visibility is a display choice

When the landscape survey finds a tool viable as a spine, that yields a constructed solution — an architecture filled with that spine — and it goes on the board, not into an early-disqualification bin. The board can carry many cards; that is fine. What controls what you see is not deletion but three orthogonal mechanisms:

  • Gates remove a candidate (hard-filter 0) — a fact about it (§ Two elimination causes).
  • Decisions rule out the cards an unchosen option carried — reversibly (ditto).
  • Display settings — a pure view filter (hide-eliminated · top-N · show-all) — change what’s shown without touching the data or the ranking. A low-ranked, gate-passing, decision-included solution is hidden by a display choice, never struck.

So “merely worse” never disqualifies a candidate early; it lands it low on a possibly-filtered board. Disqualification is a gate or a decision; everything else is ranking and display. (The board, the eliminated-card rendering, and the display filter are cockpit.md.)

The verdict: instrument vs. judgment

The scored comparison is an instrument, not an oracle. It produces a ranking at a given weighting of the axes, and named weighting presets (e.g. “balanced,” “security-first,” “cheapest,” “my recommendation”) let you — and the client — turn the knobs and watch the order move. That transparency is the point: the client sees how the weighting drives the result instead of being handed a number to trust.

Gate-eliminated solutions are out, not low. They never appear on the ranking, no matter the weighting — the instrument must not let an eliminated candidate outrank a live one by happening to score well on the other axes. The Gate axis carries through to the verdict the same way it does to the matrix: pass or fail, no second chances at the weighting screen.
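A minimal Python sketch of a turnable ranking that honours this rule. The per-axis values are assumed to be pre-normalised so that higher is better on every axis (including cost and risk), and the "fit-heavy" preset is invented for the example; how the real kernel normalises axes is not specified here.

```python
def rank(solutions, preset):
    """Order gate-passing solutions by weighted axis score; gate-fails never appear."""
    live = [s for s in solutions if not s["gate_failed"]]

    def total(solution):
        return sum(preset[axis] * solution["axes"][axis] for axis in preset)

    return [s["name"] for s in sorted(live, key=total, reverse=True)]

solutions = [
    {"name": "Platform B", "gate_failed": False, "axes": {"fit": 0.6, "cost": 0.9, "risk": 0.7}},
    {"name": "A + C + router", "gate_failed": False, "axes": {"fit": 0.9, "cost": 0.5, "risk": 0.5}},
    {"name": "Tool X", "gate_failed": True, "axes": {"fit": 1.0, "cost": 1.0, "risk": 1.0}},
]
presets = {
    "balanced": {"fit": 1, "cost": 1, "risk": 1},
    "fit-heavy": {"fit": 4, "cost": 1, "risk": 1},  # hypothetical preset
}
for name, preset in presets.items():
    print(name, rank(solutions, preset))
# balanced ['Platform B', 'A + C + router']
# fit-heavy ['A + C + router', 'Platform B']
```

Tool X scores perfectly on every axis and still never ranks, while the two live solutions swap places as the preset changes.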

But the verdict is authored by the consultant. It is informed by the instrument and free to diverge from whatever currently sits on top of the ranking — when it does, that divergence is stated plainly, with the reason. The instrument shows the trade-off; the consultant’s judgment, the client’s stated values, and the things that don’t reduce to a score (which UI feels trustworthy, whether the team will actually adopt it) decide it. The consultant authors the verdict; the agent presents it, and never decides or eliminates.

This is the true form of the old “never blend everything into one score” rule. The danger was never weighting — it was letting an opaque blend pose as a recommendation. A transparent, turnable weighting plus an explicit, authored verdict is the cure: the math is visible and the judgment is owned. Never present the instrument’s top row as “the answer” without your verdict on top of it.

The judgement hierarchy — evidence flows up, verdicts stay authored

The funnel has four layers, and a research insight bubbles up them: an insight about a tool changes the solutions that use it; a change in a solution changes the feasibility of the architecture it fills; changed architecture standing + solution rankings change the overall comparison.

 tool gate ──▶ solution viability ──▶ architecture feasibility ──▶ overall comparison
 (authored,     (computed score +       (AUTHORED verdict,            (kernel ranking +
  eliminatory)   authored verdict)       now TRACKED)                  authored recommend)

Each layer carries a judgement, and the architecture layer now carries a standing, tracked one — parallel to the solution layer’s per-facet knowledge-state (confidence), but here it is the feasibility verdict itself. Two doctrines govern it:

  • The verdict’s value stays authored (P1). An architecture’s feasibility is often a structural call no score captures — the kind of “this whole shape is dead because licensable ⇒ closed, open ⇒ pool-gated” finding that no Fit average would surface. A mechanical roll-up of its solutions’ scores would silently overwrite exactly the judgement the consultant is paid to make. So the architecture verdict is never the mean of its solutions; it is authored, like the comparison verdict above it.
  • Detection is computed; judgement is authored. “Tracked” does not mean auto-recomputed. The architecture (and comparison) verdict records what it rests on (verdict.basis[] — finding ids, solution keys, gate refs) and when it was last authored (verdict.asOf, a round marker). The children carry a revisedIn (the round they last materially moved). A pure kernel function, staleness(), flags a verdict stale when any basis[] child’s revisedIn is newer than the verdict’s asOf — and surfaces a “⚠ neu zu prüfen” badge. It never rewrites the verdict. A human re-examines and re-stamps asOf. This is the safe, P1-consistent choice: auto-recompute was rejected because it would mask the structural shifts that most deserve a human look.

So evidence flows up mechanically (the badge), but the re-derivation at each level is deliberate. “Materially changed” = a gate flip (pass↔fail), an elimination, a score crossing a tier boundary, or a new finding tagged to a dependency — cosmetic edits must not trip it (bump revisedIn only on a real change). The schema is in contract.md (verdict.asOf/basis[] on architectures + comparison; revisedIn on solutions and rounds[].findings[]); the stamp-and-flag protocol is in process.md (§ Phase 4, § Regression); the detector is engine/kernel.js staleness(). The hard-escalation rule for a finding that invalidates a Phase-1/2 requirement is distinct and unchanged (process.md § Regression): that still STOPS and escalates — staleness is the softer, in-phase signal, not a replacement for escalation.
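The detection rule is small enough to sketch. This Python version illustrates the comparison only and is not the code in engine/kernel.js; it assumes round markers are comparable integers and that the children's revisedIn values are available as a lookup by basis id.

```python
def staleness(verdict, revised_in):
    """Return the basis ids that moved after the verdict was last authored.

    verdict:    {"asOf": int, "basis": [id, ...]}
    revised_in: {id: round in which the child last materially changed}
    An empty result means the verdict is not stale. The verdict is never modified.
    """
    return [ref for ref in verdict["basis"] if revised_in.get(ref, 0) > verdict["asOf"]]

verdict = {"asOf": 3, "basis": ["sol:platform_b", "finding:f12", "gate:g2"]}
revised_in = {"sol:platform_b": 4, "finding:f12": 2}

print(staleness(verdict, revised_in))  # ['sol:platform_b']  -> show the badge

# A human re-examines and re-stamps; the function only detects.
restamped = {**verdict, "asOf": 4}
print(staleness(restamped, revised_in))  # []
```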

The sign-off bookend, and write-back discipline

A dynamic research workflow — a fan-out of briefs, a deep-dive sweep, a red-team round — is bookended by two sign-offs, because the machine proposes and the consultant disposes (this is the verdict rule applied to the running of research, not just its reading):

  • Before running — present the workflow plan: its structure and how many agents it fans out to. The consultant approves the shape before any agent runs. (Run research in deliberate, reviewable waves, not one uncontrolled blast — research-briefs.md.)
  • Before write-back — present the results for approval before they touch the data. A returned report is biased input, not ground truth (research-briefs.md § Ingesting); it is read, audited, and signed off, then written.

Write-back discipline. When approved research is applied, the machine may touch scores, knowledge-states, cost, and findings — but never gates and never authored narrative / verdict prose. A gate flip and a requirement change escalate to the human (process.md § Regression); the authored verdict stays authored (above). And a tool that research found weaker becomes a red-team caveat and a lower score, never a fabricated gate-0 — the never-fabricate-a-gate rule (§ Two elimination causes) applies on write-back exactly as it does everywhere else.
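One way to enforce this at the point of write-back is to split an approved patch into what the machine may apply and what must go to a human. A minimal Python sketch; the top-level key names are hypothetical stand-ins for the fields listed above, not the contract.md schema.

```python
# Fields the machine may touch on write-back (names are illustrative).
WRITABLE = {"scores", "knowledge", "cost", "findings"}

def split_write_back(patch):
    """Return (apply, escalate): writable fields vs. everything else."""
    apply = {key: value for key, value in patch.items() if key in WRITABLE}
    escalate = {key: value for key, value in patch.items() if key not in WRITABLE}
    return apply, escalate

patch = {
    "scores": {"routing": 3},
    "findings": ["API rate limit lower than documented"],
    "gates": {"eu_hosting": "fail"},   # a gate flip: never written by the machine
    "verdict": "Drop this candidate",  # authored prose: never written by the machine
}
apply, escalate = split_write_back(patch)
print(sorted(apply))     # ['findings', 'scores']
print(sorted(escalate))  # ['gates', 'verdict']
```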

Three registers — model · rationale · client copy

The data and the prose tangle unless you name three registers and keep them apart:

  1. Model (structured) — ids, axes, tiers, forks, positions, options→architectures, scores, costModel, weights. The kernel computes on this; the explorer renders it generically.
  2. Rationale (analytic prose) — criterion description, architecture summary, fork question, expectedFailure, gewinnt/gibtAuf. Authored for the analyst; lives as labeled fields on the model; internal voice is fine (a 3rd-person “Christian”, “Spine/Glue/RAG” jargon — it’s a working note, not client-facing).
  3. Client copy (presentation prose) — the polished, client-addressed strings the customer reads. Not in the data — a shell layer (an override map keyed by id, falling back to register 2 when unmapped).

The rule: the data file carries 1 + 2; the shell carries 3. The kernel and explorer never read register 3. The failure this prevents runs both ways — internal jargon and 3rd-person analysis leaking onto the client’s screen, or the data file polished into client copy until the analyst’s blunt reasoning is gone. Keep the analysis honest in the data; keep the polish in the shell. (Where the override map lives and how it falls back is cockpit.md.)

The verdict is register-1/2 data, never register 3. Which option is favoured — the recommended flag and its verdictNote (contract.md solution/shape schema) — and the authored verdict above are model + rationale, carried in the data file. The shell (register 3) polishes copy only and MUST NOT encode the favourite. Changing the recommendation must never require a shell or code edit — flip recommended in the data and the cockpit re-foregrounds. (Motivating failure: the Ehimare cockpit leaked the favourite into a POS_COPY.favorite flag in app.js — a register-3 edit doing register-1 work; it was moved into shapes.json/solutions.json recommended.)

Override keys are a closed set over the data. A register-3 override map keys by data id (fork id, option key, shape/position id) and falls back silently to register 2 on any mismatch. That silent fallback is a footgun: rename an id in the data and a now-orphaned override key simply stops applying, with no error. So the rule is override keys MUST be a subset of the data’s ids/option-keys; the shell emits a dev-mode warning for any orphaned key (guarded by a dev flag, silent in production — cockpit.md § override contract).
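The subset rule and the fallback can be sketched together. This Python illustration uses made-up ids and function names; where the override map actually lives and how the dev flag is wired is described in cockpit.md.

```python
def orphaned_override_keys(override_map, data_ids):
    """Override keys that no longer match any id in the data."""
    return sorted(set(override_map) - set(data_ids))

def client_copy(key, override_map, rationale):
    """Register 3 if mapped, otherwise fall back to register 2."""
    return override_map[key] if key in override_map else rationale[key]

rationale = {"fork.own_vs_rent": "Spine owned vs. rented", "opt.rent": "Rent the platform"}
overrides = {
    "fork.own_vs_rent": "Do you want to own your system, or rent it?",
    "opt.lease": "Rent it",  # the id was renamed to opt.rent in the data
}

print(orphaned_override_keys(overrides, rationale))  # ['opt.lease']
print(client_copy("opt.rent", overrides, rationale)) # Rent the platform
```

The second line shows the footgun: the renamed option silently renders its internal rationale text, so the orphan list is the only signal that an override stopped applying.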

Two registers, one model

The cockpit holds the same scored model in two registers:

  • Die Reise — a curated story you narrate: a linear, paced walkthrough (Auftrag → Anforderungen → Architektur), one idea per step, the presenter controls. It is told, in order.
  • Die Exploration — a non-linear analyst deck: the board, the verdict, the sharpening matrix, the cockpit, the matrix, the tools landscape, the report. You jump between these to work the solution space. They are instruments, not chapters.

The seam is narration vs. exploration, not client vs. consultant — the whole cockpit is consultant-guided either way. So this is one model, one URL space, one data spine: every register reads the same data and the same scores; you build one model and skin it twice. There is no second URL space and no audience mode split — the two registers only name the spine’s two halves so the nav can show them (the brand two-tier docnav). Drill-downs (a solution’s Composition from a board card, the Schärfe-Matrix) live inside Exploration, not as free-floating peer apps. (The shell + template that realize both registers, and the nav that separates them, are in cockpit.md; the worked instances are in examples.md.)
