RelayForge™ · Architecture · v1.0 · May 2026

How We Build Lobsters

The architecture behind RelayForge's persistent AI companions. No marketing fluff. If something isn't built yet, we say so.


The Problem We're Solving

Today's AI assistants have a dirty secret: they forget everything the moment you close the window. Every conversation starts from zero. The model doesn't know what worked last time, what failed, what you prefer, or what it promised to follow up on. That's not an assistant. That's a stranger with amnesia.

The research is unambiguous. Benchmarks like LongMemEval, AMA-Bench, and ASTRA-bench consistently show that dialogue history alone is not enough for real persistence. Agents need temporal recall, evolving knowledge, environment-specific experience, and personal context linked to tools and tasks.

RelayForge lobsters are designed from the ground up to be different. They persist, they learn, and they earn trust through architecture — not promises.

The Six-Layer Architecture

The strongest finding from current research — Anthropic, LangGraph, Letta, Mem0, cognitive architecture literature — is that persistent agents converge on the same shape: multiple distinct memory layers, each optimized for a different job. We didn't invent this pattern. We implemented it.

Session Ledger

An append-only event log of every message, tool call, approval, and state transition. When something goes wrong, we can reconstruct exactly what happened — no guessing, no lossy summaries.
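As a rough illustration of the append-only idea, here is a minimal in-memory sketch. The names LedgerEvent, SessionLedger, and replay are assumptions made for this example, not RelayForge's actual API, and a production ledger would presumably be durable rather than held in a list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LedgerEvent:
    seq: int                  # position in the log, assigned on append
    kind: str                 # "message", "tool_call", "approval", "state_transition"
    payload: Dict[str, Any]
    at: str                   # ISO-8601 UTC timestamp


class SessionLedger:
    """Hypothetical append-only log: events can be added and read, never edited."""

    def __init__(self) -> None:
        self._events: List[LedgerEvent] = []

    def append(self, kind: str, payload: Dict[str, Any]) -> LedgerEvent:
        event = LedgerEvent(
            seq=len(self._events),
            kind=kind,
            payload=dict(payload),  # shallow copy of the caller's dict
            at=datetime.now(timezone.utc).isoformat(),
        )
        self._events.append(event)
        return event

    def replay(self, kind: Optional[str] = None) -> Tuple[LedgerEvent, ...]:
        """Return events in their original order, optionally filtered by kind."""
        return tuple(e for e in self._events if kind is None or e.kind == kind)
```

Because the class exposes no update or delete method, reconstructing a session amounts to calling replay() and reading the events in sequence order.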

Working Memory

A small, pinned state that holds current goals, active constraints, and temporary evidence. This is what keeps reasoning fast and context windows lean. It is aggressively compacted after every session.
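One way to picture end-of-session compaction is the sketch below. The WorkingMemory class, the max_goals cap, and the compact rule are all assumptions chosen for illustration; the real compaction policy is not described on this page.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class WorkingMemory:
    goals: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)  # temporary, session-scoped
    max_goals: int = 5                                  # hypothetical cap

    def compact(self, completed_goals: Set[str]) -> None:
        """Drop finished goals, cap the rest, and clear temporary evidence."""
        remaining = [g for g in self.goals if g not in completed_goals]
        self.goals = remaining[: self.max_goals]
        self.evidence.clear()
```

Constraints survive compaction in this sketch because they are described as active rather than temporary.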

Episodic Memory

Structured records of what happened in prior sessions: outcomes, failures, user reactions, and reflection summaries. Your lobster learns from experience the same way you do — by reviewing what worked and what didn't.
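A structured episode record might look something like the following. The field names and the lessons_for helper are hypothetical and only meant to show how reviewing past outcomes could work.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Episode:
    session_id: str
    task: str
    outcome: str        # "success" or "failure"
    user_reaction: str
    reflection: str     # short summary written after the session


def lessons_for(task: str, episodes: List[Episode]) -> List[str]:
    """Collect reflections from past episodes on the same task, failures first."""
    related = [e for e in episodes if e.task == task]
    # False sorts before True, so failures come first; the sort is stable.
    related.sort(key=lambda e: e.outcome != "failure")
    return [e.reflection for e in related]
```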

Semantic Memory

Facts, preferences, entities, and relationships — with timestamps, confidence scores, and provenance. Every fact tracks whether it came from you, was inferred by the lobster, or was returned by a tool. Nothing is permanent without evidence.
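To make the provenance idea concrete, here is a small sketch of a fact record. The Fact fields, the evidence_ref pointer, and the 0.8 threshold in is_durable are assumptions for illustration, not the documented schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Source(Enum):
    USER = "user"          # stated by the user
    INFERRED = "inferred"  # inferred by the lobster
    TOOL = "tool"          # returned by a tool


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str
    source: Source
    confidence: float      # 0.0 to 1.0
    evidence_ref: str      # hypothetical pointer to the ledger event behind the fact
    recorded_at: datetime


def is_durable(fact: Fact, min_confidence: float = 0.8) -> bool:
    """Facts without evidence, or weak inferences, stay provisional."""
    if not fact.evidence_ref:
        return False
    if fact.source is Source.INFERRED:
        return fact.confidence >= min_confidence
    return True
```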

Procedural Memory

When a lobster figures out a good way to handle something, that knowledge gets stored as a versioned, benchmarked skill. Next time, it doesn't start from scratch — it reuses what proved reliable.
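The sketch below shows one possible shape for a versioned, benchmarked skill store. SkillLibrary, publish, best, and the 0.9 success threshold are invented for this example; the real library's schema and selection rules may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Skill:
    name: str
    version: int
    success_rate: float      # from a benchmark run, 0.0 to 1.0
    steps: Tuple[str, ...]


class SkillLibrary:
    def __init__(self, min_success_rate: float = 0.9) -> None:
        self._min = min_success_rate
        self._versions: Dict[str, List[Skill]] = {}

    def publish(self, name: str, steps: Sequence[str], success_rate: float) -> Skill:
        """Store a new version; earlier versions are kept, never overwritten."""
        history = self._versions.setdefault(name, [])
        skill = Skill(name, len(history) + 1, success_rate, tuple(steps))
        history.append(skill)
        return skill

    def best(self, name: str) -> Optional[Skill]:
        """Highest-scoring version that clears the threshold; ties go to the newest."""
        eligible = [s for s in self._versions.get(name, []) if s.success_rate >= self._min]
        return max(eligible, key=lambda s: (s.success_rate, s.version), default=None)
```

Returning None when no version clears the threshold means the caller falls back to solving the task from scratch rather than reusing an unreliable procedure.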

Trust Layer (Carapace)

Every hatch attempts ARIA registration with cryptographic identity, capability attestation, and scoped permissions. Registration failures are visible and retryable. High-impact actions require approval, and runtime activity is logged for review.

Design principle: Treat the context window as working memory, not as the source of truth. Treat memory writes as first-class operations with provenance, confidence, scope, and auditability.

Context Engineering, Not Context Stuffing

Many platforms solve the memory problem by cramming everything into the context window. More tokens, bigger window, problem solved. The research says this is a trap.

39% performance improvement from structured memory tools + context editing (Anthropic)

84% token reduction on 100-turn workflows vs. raw history injection (Anthropic)

When you talk to your lobster, it doesn't load your entire history into the conversation. It retrieves what's relevant — the right facts, the right episodes, the right skills — and leaves everything else in structured storage where it belongs.
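A toy version of retrieval under a budget is sketched below. It scores memories by plain word overlap and counts words instead of tokens, both simplifications chosen for this example; the actual retrieval method is not specified here, and select_context is a hypothetical name.

```python
from typing import List, Tuple


def select_context(query: str, memories: List[str], budget_words: int) -> List[str]:
    """Pick the most relevant memories that fit within a word budget."""
    query_terms = set(query.lower().split())
    scored: List[Tuple[int, str]] = []
    for memory in memories:
        overlap = len(query_terms & set(memory.lower().split()))
        if overlap:  # memories with no shared words are never loaded
            scored.append((overlap, memory))
    # Stable sort: ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    picked: List[str] = []
    used = 0
    for _, memory in scored:
        cost = len(memory.split())
        if used + cost <= budget_words:
            picked.append(memory)
            used += cost
    return picked
```

Anything that does not make the cut stays in storage and can still be retrieved by a later, more specific query.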

Trust Is Infrastructure, Not a Feature

Trust in an AI agent is not a toggle you switch on. It's an infrastructure problem that touches identity, permissions, tracing, approvals, and rollback. We treat it that way.

Carapace — Cryptographic Agent Identity

RelayForge hatches create a Carapace identity card when ARIA is reachable, with an Ed25519 signature, capability attestation, and registry record. Registration status is visible in the dashboard, and failures are surfaced as retryable rather than reported as silent successes.
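The "visible and retryable" behavior can be sketched as a small status record. RegistrationStatus and attempt_registration are hypothetical names, the register callable stands in for whatever client talks to ARIA, and Ed25519 signing is left out because it needs a library outside the Python standard library.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RegistrationStatus:
    state: str = "pending"            # "pending", "registered", or "failed"
    attempts: int = 0
    last_error: Optional[str] = None  # kept so a dashboard can show it


def attempt_registration(
    status: RegistrationStatus, register: Callable[[], None]
) -> RegistrationStatus:
    """Try once; record a failure instead of hiding it, so a retry can follow."""
    status.attempts += 1
    try:
        register()
    except ConnectionError as exc:    # e.g. ARIA is unreachable
        status.state = "failed"
        status.last_error = str(exc)
    else:
        status.state = "registered"
        status.last_error = None
    return status
```

A failed attempt leaves the status in a state that another call can pick up, which is the property the paragraph above describes.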

Clawmark — Tool Verification

CoreClaw tools are registered and reviewed against the six-gate Clawmark rubric before we advertise them as verified. This covers security scanning, capability scoping, MCP-discoverable manifests, reliability scoring, and registry visibility.

Scoped Execution

Tool calls declare intent, scope, and risk level before execution where the connector supports it. High-impact actions require explicit user approval, and runtime activity is logged so tool calls, approvals, memory changes, and outcomes can be reviewed.
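A declared tool call and an approval gate could look roughly like this. ToolCall, Risk, authorize, and the scope strings are assumptions for illustration, not the connector interface itself.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class ToolCall:
    tool: str
    intent: str              # human-readable reason for the call
    scope: FrozenSet[str]    # permissions the call needs
    risk: Risk


def authorize(call: ToolCall, granted: FrozenSet[str], approved_by_user: bool) -> str:
    """Decide before execution: 'denied', 'needs_approval', or 'allowed'."""
    if not call.scope <= granted:   # the call asks for a scope it was never given
        return "denied"
    if call.risk >= Risk.HIGH and not approved_by_user:
        return "needs_approval"
    return "allowed"
```

Checking scope before risk means an out-of-scope call is refused outright, so the user is never asked to approve something the lobster has no permission to do.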

What Makes a Lobster Different

We reviewed more than 45 research papers, open-source projects, and production systems to understand the state of the art. The comparison below summarizes where lobsters differ from common practice.

Capability	Industry Standard	RelayForge Lobsters
Memory Provenance	Most systems store opaque vectors or lossy summaries	Versioned, inspectable, reversible memory with evidence links and timestamps
User Understanding	Preferences stored as flat, static facts	Bounded, reviewable user model tracking communication style, behavior patterns, and relationship depth
Temporal Reasoning	Chunk retrieval by similarity only	Query by entity, event, time, and causality — knows what was true and when
Cross-Session Learning	Remembers facts but not improved procedures	Turns successful outcomes into reusable, benchmarked procedural skills
Trust Surface	Basic tool use plus logs	Cryptographic identity, policy-aware execution, approvals, trace grading, and rollback
Evaluation Rigor	Cherry-picked demos	Continuous benchmark portfolio covering memory, tool use, security, and observability

Industrial Grade: DAWES

RelayForge also builds for industrial environments — refineries, manufacturing plants, and process control networks. Our DAWES benchmark (Domain Anchored Workplace Expertise Standard) is the industry's first rigorous test of whether an AI model is actually ready for unassisted industrial deployment.

The honest answer today: no model passes. That's the point. DAWES credibility comes from rigorous methodology and published failure data, not from inflated scores. The consumer lobster platform funds and informs the industrial track — every architecture improvement that makes a personal lobster more reliable also makes an industrial agent safer.

What We Haven't Built Yet

Transparency is a RelayForge value. Here is what's in progress and not yet shipped.

—Temporal graph memory (Graphiti-style entity/time queries) — designed, implementation in progress

—Procedural skill library with automated benchmarking — schema defined, routing not yet live

—Full user-model layer with communication style tracking — Phase 3 of our build plan

—Background memory consolidation at scale — working for small lobster pools, not yet stress-tested for fleet operations

—Enterprise policy engine with per-tenant memory partitions — Phase 4

We'd rather tell you what's real than pretend everything is finished. The memory and trust substrate is live. The layers above it are being built in the open.

Research Foundations

Our architecture is informed by peer-reviewed research and production systems. Key influences include:

◆Anthropic's managed-agent architecture and context engineering work (2024–2026)

◆LangGraph's checkpointed state and long-term memory model

◆Letta/MemGPT's stateful agent runtime and self-editing memory

◆Mem0's memory extraction and consolidation pipeline

◆Graphiti/Zep's temporal graph memory for evolving truths

◆Generative Agents and Reflexion for post-session reflection and learning

◆Voyager's skill library model for compounding agent capability

◆LongMemEval, AMA-Bench, and ASTRA-bench for memory evaluation methodology

◆MCP specification for tool ecosystem and authorization standards

◆40 years of cognitive architecture research on working, episodic, semantic, and procedural memory

