Two planes, one node
A concrete, high-level picture of how the system is deployed and stays interconnected for GEB: the two agent planes on one node, the seam between them, the outside interfaces, and how skills and measurement plug in. It is the base for the implementation.
v1 modelled the two platforms as an A/B pair joined by a platform-abstraction layer, both interchangeable per company. The operating manual reframes them as a stack, not alternatives: Paperclip governs, Fusion executes, joined by a shared standard — the seam. v2 pins the concrete deployment: Paperclip and Fusion on one cloud VM hosted on northcheck, skills consumed from the seam (synced from Penelope, not duplicated), external interfaces via webhooks, operator access through the dashboards, and a Langfuse measurement rail. The A/B data already gathered stays as evidence; it is no longer the architecture.
Two planes, joined by a seam
Paperclip governs, Fusion executes. The seam between them is a handover with a gate on each side — not a live socket. The approved Paperclip spec becomes Fusion's PROMPT.md; nothing crosses until governance passes it.
Fusion (execution plane): PROMPT.md spec, git worktrees, per-step review, pre/post-merge gates, delivery. It decides how the work gets built and ships it.
An internal, self-improving agentic system running GEB and Octaflow through supervised agents and a shared skill ecosystem — to cut production cost while preserving quality, provable internally first.
Runtime & the seam, inside the VM
Both planes run on one cloud VM hosted on the northcheck platform (internal-first). Paperclip and the execution plane never call each other over a socket — they exchange an approved spec and a work product through the shared repo (the seam). Skills live in the seam and both planes read them. Fusion exports OpenTelemetry to Langfuse, which returns a quality signal to the governance gate.
The two apps share the repo and filesystem, so the seam is local — no network coupling between them. Fusion can later move to a dedicated execution node / build farm on northcheck without changing the seam contract.
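As a rough illustration of the handover, the sketch below writes an approved spec into the shared repo as PROMPT.md and refuses anything that has not passed the governance gate. The Spec dataclass, its fields, and cross_seam are hypothetical names; how Paperclip actually records approval is not specified in this proposal.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Spec:
    """Hypothetical shape of a Paperclip spec; field names are assumptions."""
    issue_id: str
    body: str
    approved: bool


class GateClosed(Exception):
    """Raised when a spec reaches the seam without governance approval."""


def cross_seam(spec: Spec, seam_repo: Path) -> Path:
    """Write the approved spec into the shared repo as PROMPT.md."""
    if not spec.approved:
        raise GateClosed(f"spec {spec.issue_id} has not passed the approval gate")
    target = seam_repo / "PROMPT.md"
    # Write to a temp file and rename so a reader never sees a partial spec.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(spec.body, encoding="utf-8")
    tmp.replace(target)
    return target
```

Because the seam is a file in a shared repo rather than a socket, the same function would keep working if Fusion later moved to a separate node that mounts or clones the repo.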
Webhooks out, skills in, dashboards for the team
The node reaches the outside only through webhooks and outbound calls: GEB Shopify (webhook in / out), the Penelope repo (skill-sync pull), model providers (OpenRouter / HF), and the operator dashboards over a Tailscale tunnel.
The Paperclip dashboard (:3100) and the Fusion board (:4040) are the human interface for manual task creation, approvals, and monitoring. They are exposed to Aman, Rajan and devs over a Tailscale tunnel (internal-first, no public surface); a Cloudflare Tunnel + Access can add a public URL later. Multi-user access is handled by Paperclip's members / roles / invites / permissions. Never expose the dashboard or API without auth.
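On the webhook-in side, Shopify signs each webhook with an HMAC-SHA256 of the raw request body, base64-encoded in the X-Shopify-Hmac-Sha256 header. A minimal sketch of the check a gateway could run before a questionnaire payload reaches Paperclip follows; the function name is hypothetical, and the gateway itself is not described in this proposal.

```python
import base64
import hashlib
import hmac


def is_valid_shopify_webhook(raw_body: bytes, header_hmac: str, secret: str) -> bool:
    """Check the X-Shopify-Hmac-Sha256 header value against the raw body."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, header_hmac.encode("utf-8"))
```

The check has to run on the raw bytes as received; re-serialising parsed JSON first would change the body and break the signature.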
The seam crossing, end to end
A shopper's questionnaire fires a pipeline that collects context, retrieves and analyses evidence, applies rules, scores risk, and produces a decision package for human sign-off before delivery.
- Trigger. A shopper submits the GEB Shopify questionnaire → a webhook hits the node's gateway.
- Govern. Paperclip creates the issue (goal ancestry, budget); the CEO / approval gate approves the spec.
- Cross the seam. The approved spec becomes a PROMPT.md in the shared repo, via the shared package, not the experimental live plugin.
- Execute. Fusion runs it as one Stepwise workflow in a git worktree; each step is plan → execute → review; the pre/post-merge gates carry the GEB gate criteria.
- Measure. Fusion exports OpenTelemetry to Langfuse; each step's output is scored against the stored evals; the quality signal feeds the governance gate.
- Return. The merged result returns to Paperclip as an audited work product.
- Egress. On approval, the action goes back to Shopify by webhook; the run (inputs, steps, output, cost, quality) is recorded.
The seam test asks three questions: did the specification survive the crossing, did each gate fire, and did the measured quality hold? Humans can also create and approve work directly in the Paperclip dashboard, a second entry path alongside the webhook.
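The three checks on a run (spec survived, gates fired, quality held) can be sketched as a single read over a run record. Everything below is illustrative: the RunRecord fields, the gate names, and the quality threshold are assumptions, not the recorded format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical run record; every field name here is an assumption."""
    approved_spec: str            # spec text as approved in Paperclip
    executed_prompt: str          # PROMPT.md text Fusion actually ran
    gates_fired: frozenset        # gate names observed during the run
    quality_score: float          # score returned by the measurement rail


# Assumed gate names, taken loosely from the steps above.
REQUIRED_GATES = frozenset({"approval", "pre-merge", "post-merge"})


def read_seam_test(run: RunRecord, min_quality: float) -> dict:
    """Answer the three seam-test questions for one run."""
    return {
        "spec_survived": run.approved_spec == run.executed_prompt,
        "gates_fired": REQUIRED_GATES <= run.gates_fired,
        "quality_held": run.quality_score >= min_quality,
    }
```

Returning all three answers together, rather than failing on the first, matches the intent of reading the run as one thing.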
Skills from Penelope, quality from Langfuse
Skills live once in the seam and are read by both planes, with no per-repo hand-copy. Measurement, which neither Paperclip nor Fusion does on its own, is added by Langfuse over OpenTelemetry.
Skills — consumed from the seam
- A skill-sync plugin pulls skills from the Penelope repo (source of truth) into the seam: scheduled + on-demand, pinned to a ref (skills.lock), a read-only mirror.
- Every synced skill is scanned with agentshield before it goes live (a shared package is a shared supply-chain surface).
- Interim mechanism while there is no export path beyond manual copying; the lightweight form of a companies.sh catalogue, swappable later.
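The pin-then-scan idea in the list above can be sketched as a check over the mirror: a skill goes live only if its content matches the lock and it passes the scan. The lock format shown is an assumption (the actual skills.lock layout is not given here), and scan is a stand-in callable rather than the real agentshield interface.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def verify_mirror(
    skills_dir: Path,
    lock_path: Path,
    scan: Callable[[Path], bool],
) -> List[str]:
    """Return the skills that match the lock and pass the scan.

    Assumed lock format (not from the source):
    {"ref": "<commit>", "skills": {"<relative path>": "<sha256 hex>"}}
    """
    lock = json.loads(lock_path.read_text(encoding="utf-8"))
    live: List[str] = []
    for rel_path, pinned_sha in sorted(lock["skills"].items()):
        path = skills_dir / rel_path
        if not path.is_file():
            continue  # listed in the lock but missing from the mirror
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != pinned_sha:
            continue  # drifted from the pinned ref; do not serve it
        if scan(path):  # stand-in for the agentshield scan
            live.append(rel_path)
    return live
```

Keeping the scanner behind a plain callable means the same check would still apply if the sync plugin were later replaced by a catalogue or registry.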
Measurement — Langfuse over OpenTelemetry
- Fusion already exports OpenTelemetry; Langfuse ingests at /api/public/otel, so no custom bridge is needed.
- The gate criteria applied by hand (rubric, business rules, red-flag list) become stored, versioned evals; every run is scored automatically.
- Instrument against OpenTelemetry so the tool stays swappable (LangSmith / Braintrust / Phoenix all read it).
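Pointing an OTLP exporter at Langfuse is mostly configuration. The sketch below builds the standard OpenTelemetry exporter environment variables for the /api/public/otel endpoint, using Basic auth made from a Langfuse public/secret key pair. Whether Fusion reads these variables directly is an assumption, and the helper name is hypothetical.

```python
import base64
from typing import Dict


def langfuse_otlp_env(base_url: str, public_key: str, secret_key: str) -> Dict[str, str]:
    """Build standard OTLP exporter settings that point at Langfuse."""
    pair = f"{public_key}:{secret_key}".encode("utf-8")
    token = base64.b64encode(pair).decode("ascii")
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": base_url.rstrip("/") + "/api/public/otel",
        # The ingest endpoint is HTTP-based; confirm supported encodings in the Langfuse docs.
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        # Some SDKs expect the space percent-encoded (Basic%20...); check the exporter in use.
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic {token}",
    }
```

Because only the standard OTLP variables are set, swapping Langfuse for another OpenTelemetry-compatible backend would mean changing the endpoint and headers, not the instrumentation.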
The order the pieces come together
- Provision one cloud VM on northcheck (Node / Docker, internal network); run Paperclip (:3100) + Fusion (:4040).
- Make the seam repo the shared Agent Companies package (COMPANY / AGENTS / PROJECT / TASK / SKILL.md); fn init for Fusion, provider OAuth complete.
- Stand up the skill-sync plugin (Penelope → seam skills/, scheduled + on-demand, agentshield-gated).
- Run the stack: govern in Paperclip → cross the seam as PROMPT.md → execute in Fusion as one Stepwise workflow with the gate criteria in pre/post-merge steps → return to Paperclip.
- Wire the GEB Shopify webhooks (in: questionnaire → Paperclip; out: approved action → Shopify).
- Route models per layer through OpenRouter (strong for governance/review, cheap for execution); meter cost.
- Stand up Langfuse; point Fusion's OTel export at it; turn the gate criteria into stored evals.
- Expose the dashboards over Tailscale; set members/roles for Aman / Rajan / devs.
- Run the single seam test end-to-end (GEB anchor) and read it as one thing.
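For the per-layer model routing step in the list above, a minimal sketch of a routing table with a cost meter might look like this. The model ids and prices are placeholders, not real OpenRouter identifiers or rates, and CostMeter is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict

# Placeholder model ids and prices; none of these come from the source.
ROUTES = {
    "governance": {"model": "strong-model", "usd_per_1k_tokens": 0.010},
    "review": {"model": "strong-model", "usd_per_1k_tokens": 0.010},
    "execution": {"model": "cheap-model", "usd_per_1k_tokens": 0.001},
}


@dataclass
class CostMeter:
    """Accumulates estimated spend per layer."""
    spend_usd: Dict[str, float] = field(default_factory=dict)

    def record(self, layer: str, tokens: int) -> str:
        """Add a call's cost to the layer total and return the model to use."""
        route = ROUTES[layer]  # KeyError for an unknown layer is intentional
        cost = tokens / 1000 * route["usd_per_1k_tokens"]
        self.spend_usd[layer] = self.spend_usd.get(layer, 0.0) + cost
        return route["model"]
```

Keying the table by layer rather than by agent keeps the "strong for governance/review, cheap for execution" rule in one place, so it can be retuned without touching either plane.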
Prerequisites:
- One cloud VM on northcheck.
- Paperclip + Fusion installed and authenticated.
- The seam repo as a companies.sh package.
- Read access (deploy key) to Penelope's skills repo + agentshield.
- OpenRouter (+ HF) keys.
- Shopify webhook/API access.
- Langfuse (docker compose) + Fusion OTel export.
- Tailscale (or Cloudflare Tunnel + Access) for the dashboards.
- Paperclip member accounts/roles.
- Human approvers for the gates.
What is still to settle
Three items the proposal leaves to the runbook and to running conditions.
- The skill-sync plugin is the interim mechanism while there is no export path beyond manual copying. A real companies.sh catalogue / registry is the target; same shape, swappable.
- The exact wiring of Shopify → Paperclip ingress and the Fusion trigger is fixed in the runbook, not in this architecture proposal.
- Whether Fusion stays on the same VM or moves to a dedicated execution node / build farm on northcheck as scale grows; the seam contract does not change either way.