My Journey Understanding OpenClaw's Architecture

Ram Sathyavageeswaran

Introduction: Three Days in the TypeScript Trenches

I’m not a TypeScript wizard. Far from it, actually. But I spent the last 3-4 days diving deep into the OpenClaw codebase, trying to understand how this multi-channel AI assistant actually works under the hood. This post is my attempt at documenting what I learned - mostly for my own understanding, but hopefully it helps you too if you’re exploring the project.

Why did I get curious? Well, I was fascinated by the idea of having a single AI assistant that works across WhatsApp, Telegram, Discord, Slack, and 30+ other messaging platforms - all running on my own hardware. No cloud service, no data leaving my machine unless I want it to. That’s pretty cool.

What is OpenClaw, Anyway?

Think of OpenClaw as a bridge between your messaging apps and AI. Here’s the simple version:

  1. You send a message on WhatsApp (or Telegram, Discord, etc.)
  2. OpenClaw receives it and talks to an AI model (like Claude or GPT)
  3. The AI responds back to you in the same app

The genius part? It’s all self-hosted. It runs on your own computer or server - not some company’s cloud. You control the data, you control the setup, and it’s all open source (MIT licensed).

Why this matters:

  • Privacy: Your conversations stay on your hardware
  • Convenience: One AI assistant everywhere you chat
  • Flexibility: Open source means you can see and change how it works
  • Multi-channel: 30+ messaging platforms through one system

Let’s dig into how they built this thing.

The Big Picture: How It All Fits Together

After days of reading code, here’s what I figured out: OpenClaw has a surprisingly elegant architecture built around one central idea - the Gateway.

[Diagram: OpenClaw main architecture]

Key Ideas

One Gateway to rule them all: There’s a single program running (the Gateway) that handles everything. It’s like a traffic controller at an airport - coordinating all the planes (messages) coming and going.

Everything connects to the Gateway: Your phone, your terminal, your web browser - they all talk to this one Gateway via WebSocket (think of it like a phone call that stays open).

Mobile devices are helpers: Your iPhone or Android can be a “node” that adds extra capabilities to the system - like taking photos, recording your screen, or showing a canvas that the AI can draw on.

Plugins make it extensible: Want to add support for a new messaging app? Just write a plugin. The core system doesn’t need to change.

Core Component #1: The Gateway (The Heart of Everything)

The Gateway is the central hub - it runs a WebSocket server (port 18789) that manages all connections, routes messages to the right AI agent, and tracks conversation sessions. Every incoming message is validated against TypeBox schemas, so malformed or unexpected payloads are rejected before they ever reach an agent. Think of it as the traffic controller coordinating all incoming and outgoing messages across your messaging apps.
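
To make the TypeBox part concrete, here's a minimal sketch of what schema-validating an incoming frame looks like. The frame shape and field names are my invention - the real schemas live in the repo:

import { Type, type Static } from "@sinclair/typebox";
import { Value } from "@sinclair/typebox/value";

// Hypothetical frame shape - not OpenClaw's actual protocol.
const InboundFrame = Type.Object({
  type: Type.String(),        // e.g. "message", "event"
  channel: Type.String(),
  sessionKey: Type.String(),
  payload: Type.Unknown(),
});
type InboundFrame = Static<typeof InboundFrame>;

function parseFrame(raw: string): InboundFrame | null {
  const candidate = JSON.parse(raw);
  // Anything that doesn't match the schema is dropped before routing.
  return Value.Check(InboundFrame, candidate) ? candidate : null;
}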

Core Component #2: The Channel System (Talking to All Those Apps)

OpenClaw supports over 30 messaging apps through a unified channel model. Built-in channels include WhatsApp (Baileys), Telegram (grammY), Discord, Slack, Signal, and iMessage. Additional platforms are added via plugins with a manifest file, adapter functions, and capability declarations. The clever part: all channels work the same way internally - whether it's WhatsApp or Discord, they all have messages, users, and send/receive operations. This unified approach makes adding new channels straightforward.
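
Here's roughly what that unified model implies, written as a hypothetical adapter interface (these are not OpenClaw's actual types, just the shape of the idea):

// Every platform hides behind the same minimal surface.
interface InboundMessage {
  channel: string;      // "whatsapp", "discord", ...
  sender: string;
  threadId: string;     // DM, group, or channel identifier
  text: string;
  media?: string[];     // local paths to downloaded attachments
}

interface ChannelAdapter {
  id: string;
  start(onMessage: (msg: InboundMessage) => Promise<void>): Promise<void>;
  send(threadId: string, text: string): Promise<void>;
}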

Core Component #3: The Router (Who Gets Which AI?)

When a message arrives, OpenClaw needs to decide which AI agent handles it. The routing system uses a priority hierarchy: exact person match → thread/conversation → Discord role-based → server/workspace → channel account → channel type → default. The system picks the most specific match, so a "specific person" rule beats a "Discord Server" rule. OpenClaw tracks conversations with session keys (e.g., agent:my-agent:whatsapp:group:123) so each chat stays separate - your work Discord won't leak into personal WhatsApp.
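
A most-specific-match cascade like that fits in a few lines. This is my simplification (reusing the InboundMessage type from the adapter sketch above), not the real resolveAgentRoute():

// Lower tier = more specific: 0 person, 1 thread, 2 role, 3 server, ...
type Rule = {
  tier: number;
  matches: (msg: InboundMessage) => boolean;
  agentId: string;
};

function resolveAgentRoute(msg: InboundMessage, rules: Rule[]): string {
  const hits = rules.filter((r) => r.matches(msg));
  hits.sort((a, b) => a.tier - b.tier);
  return hits[0]?.agentId ?? "default";  // most specific match, else default
}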

Core Component #4: The AI Agent Brain (Where the Magic Happens)

Found in: src/agents/

Okay, so here’s where it gets really interesting. OpenClaw uses something called “Pi” (pi-mono) as its AI agent runtime. Think of Pi as the actual AI brain that thinks, responds, and uses tools.

The Workspace: Where the AI Lives

Each agent has a folder (workspace) where it keeps its stuff. When you first set up an agent, it looks for these markdown files:

  • AGENTS.md - Operating instructions (“here’s how you should behave”)
  • SOUL.md - Personality and boundaries (“be helpful but not pushy”)
  • TOOLS.md - Notes about tools (“use the browser tool when you need to check a website”)
  • IDENTITY.md - Who the agent thinks it is
  • USER.md - Info about you (so it can personalize responses)
  • HEARTBEAT.md - Periodic check-in instructions
  • BOOTSTRAP.md - First-time setup ritual (deleted after first run)

These get loaded when a conversation starts, so the AI knows its identity and how to act. It’s like giving the AI its “instruction manual” every time.
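
A minimal sketch of what loading that "instruction manual" could look like - the file list is from above, but the loader itself is my guess:

import { readFile } from "node:fs/promises";
import { join } from "node:path";

const INSTRUCTION_FILES = ["AGENTS.md", "SOUL.md", "TOOLS.md", "IDENTITY.md", "USER.md"];

// Concatenate whichever instruction files exist into the agent's context.
async function loadWorkspaceContext(workspace: string): Promise<string> {
  const parts: string[] = [];
  for (const name of INSTRUCTION_FILES) {
    try {
      parts.push(await readFile(join(workspace, name), "utf8"));
    } catch {
      // Every file is optional - skip missing ones.
    }
  }
  return parts.join("\n\n");
}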

The Conversation Logs

Every chat is saved as a .jsonl file (JSON lines format) at:

~/.openclaw/agents/<agent-name>/sessions/<session-id>.jsonl

Each line is a turn in the conversation. This is how the AI remembers what you talked about!
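
JSON Lines means one JSON object per line, so recording a turn is just a file append. Something like this (the record fields are illustrative, not the real log format):

import { appendFile } from "node:fs/promises";

const turn = { role: "user", content: "What's the weather?", ts: Date.now() };
await appendFile(
  `${process.env.HOME}/.openclaw/agents/my-agent/sessions/abc123.jsonl`,
  JSON.stringify(turn) + "\n",
);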

Tools: What Can the AI Actually DO?

This blew my mind - the AI can do way more than just chat. It has tools for:

File operations:

  • read, write, edit - Work with files
  • apply_patch - Apply code patches

Running code:

  • exec, bash - Run commands on your computer
  • process - Manage background processes

Web stuff:

  • web_search - Search the internet
  • web_fetch - Grab content from URLs
  • browser - Control a real browser (click buttons, fill forms!)

Messaging:

  • message - Send messages to other channels
  • sessions_* - Manage multiple conversation sessions

Your devices:

  • canvas - Show visual content on your phone/tablet
  • camera - Take photos with your phone
  • nodes - Control connected devices

Automation:

  • cron - Schedule recurring tasks
  • gateway - Control the Gateway itself

Tool Profiles: Controlling What’s Allowed

You can configure “tool profiles” to limit what an agent can do:

  • minimal - Just chat, no tools
  • messaging - Can send messages
  • coding - Can read/write files and run code
  • full - Everything is allowed

Here’s a config example:

{
  tools: {
    profile: "coding",  // Allow coding tools by default
    deny: ["exec"]      // But don't let it run commands
  }
}

You can even set different tool policies for different AI providers! Like, maybe Google’s models get fewer tools than Claude.
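
Conceptually, resolving the effective tool set is just "profile minus deny list". A toy version (the profile contents here are made up):

const PROFILES: Record<string, string[]> = {
  minimal: [],
  messaging: ["message"],
  coding: ["read", "write", "edit", "exec", "bash"],
};

function effectiveTools(profile: string, deny: string[] = []): string[] {
  return (PROFILES[profile] ?? []).filter((tool) => !deny.includes(tool));
}

effectiveTools("coding", ["exec"]);  // -> ["read", "write", "edit", "bash"]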

Files I Looked At

  • src/agents/ - How the Pi runtime integrates
  • src/config/ - Configuration loading and validation

Core Component #5: Media Pipeline (Images, Audio, Video)

Location: src/media/

One thing I didn’t expect: OpenClaw has a full media pipeline. You can send images, audio, video - and the AI can process them.

Inbound media flow:

  1. You send an image to WhatsApp
  2. Gateway saves it: ~/.openclaw/media/inbound/
  3. If it’s audio, it can transcribe it
  4. If it’s an image, it can analyze it with vision models

Outbound media flow:

  1. AI wants to send an image
  2. It outputs: MEDIA:/path/to/image.jpg or MEDIA:https://url-to-image
  3. Gateway parses it with splitMediaFromOutput() (sketched below)
  4. Sends via the appropriate channel adapter
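
I haven't traced splitMediaFromOutput() line by line, but behaviorally it has to do something like this - my reconstruction, not the actual code:

// Pull MEDIA: lines out of the agent's reply; return clean text + attachments.
function splitMediaFromOutput(output: string): { text: string; media: string[] } {
  const media: string[] = [];
  const kept: string[] = [];
  for (const line of output.split("\n")) {
    const match = line.match(/^MEDIA:(\S+)/);
    if (match) media.push(match[1]);
    else kept.push(line);
  }
  return { text: kept.join("\n"), media };
}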

Image operations they support:

  • Resize images (resizeToJpeg())
  • Get metadata like dimensions
  • Temporary hosting with TTL cleanup

Pretty comprehensive for a self-hosted system!

Multi-Platform: From Raspberry Pi to iPhone

One thing that impressed me: OpenClaw runs on everything.

Where Can You Run the Gateway?

Desktop/Server:

  • macOS - LaunchAgent (background service) - Can be Gateway & Node
  • Linux - systemd (system service) - Can be Gateway & Node
  • Windows - Via WSL2 + systemd - Can be Gateway only
  • Raspberry Pi - systemd - Can be Gateway & Node

Mobile:

  • iOS - Foreground app - Can be Node only
  • Android - Foreground service - Can be Node only

The key insight: iOS and Android never run the Gateway. They’re “nodes” - they connect to a Gateway running somewhere else (your Mac, Linux server, or Raspberry Pi).

What's a "Node"?

A node is a device that extends the Gateway’s capabilities. Your iPhone can be a node that provides:

  • Camera (take photos on command)
  • Screen recording
  • Canvas display (AI can show you visual content)
  • Location services
  • Voice wake (“Hey AI, what’s the weather?”)

Nodes connect with role: "node" and declare their capabilities. The Gateway then knows: “Oh, this iPhone can take photos and show a canvas. Cool!”
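
Something like this is what I picture a node sending on connect - the field names are my guesses, not the real protocol:

import WebSocket from "ws";

const ws = new WebSocket("ws://gateway.local:18789");  // the Gateway's WebSocket port
ws.on("open", () => {
  ws.send(JSON.stringify({
    role: "node",                                      // as opposed to a control client
    device: "rams-iphone",
    capabilities: ["camera", "canvas", "location", "voice_wake"],
  }));
});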

Security & Device Discovery

Device Pairing: All clients (including nodes) must pair with the Gateway. Local connections are auto-approved, while remote connections require challenge/response authentication and explicit approval. Once approved, devices get a token for future connections. Approve pending devices with: openclaw nodes approve <device-id>
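
The challenge/response step presumably works along these lines - HMAC here is my stand-in for whatever primitive OpenClaw actually uses:

import { createHmac, randomBytes } from "node:crypto";

const deviceToken = "token-issued-at-pairing";        // known to both sides
const challenge = randomBytes(32).toString("hex");    // Gateway -> device
const proof = createHmac("sha256", deviceToken)       // device -> Gateway
  .update(challenge)
  .digest("hex");
// The Gateway recomputes the HMAC with its stored token and compares digests.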

Discovery Methods: Devices can find the Gateway via Bonjour/mDNS (local network), Tailscale (VPN for cross-network), or manual host/port entry.

The Plugin System: Extending Everything

Location: src/plugins/ and extensions/

This part took me a while to understand, but it’s actually really well designed.

How Plugins Work

Plugins are TypeScript modules that get loaded at runtime. They run in-process with the Gateway (not sandboxed), so they’re treated as trusted code.

Discovery order (first match wins):

  1. Paths in config: plugins.load.paths
  2. Workspace extensions: <workspace>/.openclaw/extensions/
  3. Global extensions: ~/.openclaw/extensions/
  4. Bundled extensions: <openclaw>/extensions/
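
Since the first match wins, the lookup over those four locations is just an ordered scan - roughly:

import { existsSync } from "node:fs";
import { join } from "node:path";

// searchPaths: config paths, then workspace, global, and bundled extensions.
function resolvePluginDir(id: string, searchPaths: string[]): string | null {
  for (const base of searchPaths) {
    const candidate = join(base, id);
    if (existsSync(candidate)) return candidate;  // first match shadows the rest
  }
  return null;
}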

The Plugin API

When your plugin loads, it gets an api object with methods like:

export function myPlugin(api) {
  // Register a channel
  api.registerChannel({ ... });

  // Register a tool
  api.registerTool({ ... });

  // Register a provider (LLM auth)
  api.registerProvider({ ... });

  // Add HTTP routes
  api.registerHttpRoute({ ... });

  // Hook into events
  api.registerHook("before_agent_start", async (ctx) => {
    // Your code here
  });
}

Plugin Manifest

Every plugin needs an openclaw.plugin.json:

{
  "id": "my-plugin",
  "channels": ["my-channel"],
  "configSchema": {
    "type": "object",
    "properties": {
      "apiKey": { "type": "string" }
    }
  }
}

The manifest is validated before the plugin code runs. This is smart - it means config errors are caught early, before any plugin code executes.

Types of Plugins

  • Channel plugins - Add messaging platform support
  • Provider plugins - Add LLM/auth integrations
  • Memory plugins - Add memory storage backends
  • Skill plugins - Add specialized capabilities

The “Slots” System

Some plugin types are exclusive. Like memory plugins - you can only have one active at a time. You configure which one with:

{
  plugins: {
    slots: {
      memory: "memory-lancedb"  // Use LanceDB for memory
    }
  }
}

Pretty clever way to handle mutually exclusive functionality!

The Data Flow: From Your Message to AI Response

Let me walk through what happens when you send “What’s the weather?” on WhatsApp:

[Diagram: data flow from inbound message to delivered reply]

Step by Step

  1. Inbound: WhatsApp monitor receives your message
  2. Access Control: Gateway checks if you’re allowed (allowlist, pairing status)
  3. Group Gating: If it’s a group, check if the bot was mentioned
  4. Routing: resolveAgentRoute() picks which agent handles this
  5. Context: Add previous messages, inject template variables like {{MediaPath}}
  6. Agent Run: Pi agent processes your message, maybe uses tools
  7. Outbound: Response gets normalized (split if too long, formatted for the channel)
  8. Delivery: Channel-specific sender delivers it back to you
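
Condensed into code, with steps 3 and 5 omitted and every helper stubbed out by me (InboundMessage, Rule, and resolveAgentRoute() are reused from the sketches earlier in this post), the flow looks roughly like:

// Stand-in implementations so the pipeline reads end to end.
const allow = new Set(["+15551234567"]);
const isAllowed = (sender: string) => allow.has(sender);
const runAgent = async (_agentId: string, prompt: string) => `echo: ${prompt}`;
const sendVia = async (channel: string, threadId: string, text: string) =>
  console.log(`[${channel}/${threadId}] ${text}`);
const splitLong = (text: string, max = 2000) =>
  text.match(new RegExp(`.{1,${max}}`, "gs")) ?? [text];

async function onInbound(msg: InboundMessage, rules: Rule[]) {
  if (!isAllowed(msg.sender)) return;                 // 2. access control
  const agentId = resolveAgentRoute(msg, rules);      // 4. routing
  const reply = await runAgent(agentId, msg.text);    // 6. agent run
  for (const chunk of splitLong(reply)) {             // 7. split long replies
    await sendVia(msg.channel, msg.threadId, chunk);  // 8. delivery
  }
}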

The Context Templating

Before the agent sees your message, OpenClaw can inject context variables:

  • {{MediaPath}} - Paths to any images/files you sent
  • {{Transcript}} - Audio transcriptions
  • User info from USER.md
  • Previous conversation history

This is how the AI gets all the context it needs!
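
The {{...}} substitution itself is mundane - a minimal stand-in:

function renderContext(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) => vars[key] ?? "");
}

renderContext("The user sent an image at {{MediaPath}}.", {
  MediaPath: "~/.openclaw/media/inbound/photo.jpg",
});
// -> "The user sent an image at ~/.openclaw/media/inbound/photo.jpg."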

Design Patterns I Noticed

1. Dependency Injection

Instead of hardcoding channel send functions, they use dependency injection:

// In src/cli/deps.ts (signatures simplified here)
function createDefaultDeps() {
  return {
    sendWhatsApp: async (message: string) => { /* deliver via the WhatsApp adapter */ },
    sendTelegram: async (message: string) => { /* deliver via the Telegram adapter */ },
    // ...one sender per channel
  };
}

// Commands receive the deps as a parameter instead of importing senders directly
type Deps = ReturnType<typeof createDefaultDeps>;
async function handleCommand(deps: Deps, message: string) {
  await deps.sendWhatsApp(message);
}

This makes testing way easier - you can inject mock functions!

2. Configuration Hot Reload

The Gateway can reload most config changes without restarting:

{
  gateway: {
    reload: {
      mode: "hybrid"  // or "hot", "restart", "off"
    }
  }
}

Super useful for development!
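
Mechanically, hot reload boils down to watching the config file and only swapping in a copy that parses (and, in the real system, validates). A bare-bones sketch with node:fs - not how OpenClaw actually implements it:

import { watch, readFileSync } from "node:fs";

const configPath = `${process.env.HOME}/.openclaw/openclaw.json`;
let activeConfig: unknown = JSON.parse(readFileSync(configPath, "utf8"));

watch(configPath, () => {
  try {
    activeConfig = JSON.parse(readFileSync(configPath, "utf8"));  // re-read + re-parse
  } catch {
    console.warn("config invalid - keeping the previous version");
  }
});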

3. Protocol Type Safety

They generate types across languages:

  1. Write TypeBox schemas (TypeScript)
  2. Generate JSON Schema
  3. Generate Swift models for iOS/macOS (Android gets its models from the same JSON Schema)

Scripts: scripts/protocol-gen.ts, scripts/protocol-gen-swift.ts

This means the protocol is type-safe on both sides!
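
The nice property here is that TypeBox schemas already are JSON Schema objects, so step 2 is close to plain serialization. A tiny illustration (the Ping schema is made up):

import { Type } from "@sinclair/typebox";

const Ping = Type.Object({
  type: Type.Literal("ping"),
  ts: Type.Number(),
});

// A TypeBox schema is a plain JSON Schema object - serialize it and feed it
// to a Swift (or any other language's) model generator.
console.log(JSON.stringify(Ping, null, 2));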

4. Module Organization

  • Tests live next to source: foo.ts and foo.test.ts in same folder
  • E2E tests: foo.e2e.test.ts
  • Keep files under 500-700 lines (guideline, not hard rule)
  • ESM-only, no CommonJS

What Makes This Architecture Special?

After spending days in the code, here’s what I think they got really right:

1. Gateway-Centric Design

Having one Gateway that manages everything simplifies deployment massively. You don’t need to coordinate multiple services - it’s just one process.

2. Unified Channel Model

Treating all messaging platforms the same internally is genius. Whether it’s WhatsApp or Discord, they all have the same concepts: messages, users, groups. The channel-specific stuff is isolated in adapters.

3. Role-Based Device Multiplexing

The same WebSocket protocol works for control clients (your laptop) and nodes (your phone). They just declare different capabilities. Elegant!

4. Plugin Runtime Injection

Plugins don’t import from core code. Instead, they get an api.runtime object that exposes what they need. This keeps plugins from depending on internal implementation details.

5. Hierarchical Routing

The routing priority system (peer → guild → team → channel → default) is deterministic and intuitive. You can predict which agent will handle a message.

6. Tool Profiles

Being able to configure tool access per agent and per provider is really flexible. You might want your personal agent to have full file access, but limit your work agent.

What I’m Still Figuring Out

Even after 3-4 days, there are parts I haven’t fully grasped:

  • How exactly does the session pruning work?
  • The memory plugin system seems complex - need to dig deeper
  • The Canvas A2UI stuff is interesting but I only scratched the surface
  • Haven’t fully explored the webhook/cron automation features

That’s the thing about a mature codebase - there’s always more to learn!

Deployment Options

You can run OpenClaw in several ways:

Simple:

npm install -g openclaw@latest
openclaw onboard --install-daemon

Docker: Multi-stage builds, PNPM-based

Nix: Declarative configuration for the Nix folks

Cloud Platforms:

  • Fly.io
  • Railway
  • GCP
  • Hetzner
  • Oracle Cloud (free tier works!)

Raspberry Pi: Perfect for an always-on gateway. A Pi 4 with 4GB RAM works great!

Where Everything Lives

~/.openclaw/
├── openclaw.json           # Main config
├── credentials/            # Channel auth tokens
├── agents/
│   └── <agent-id>/
│       └── sessions/       # Conversation logs
├── workspace/              # Agent working directory
│   ├── AGENTS.md
│   ├── SOUL.md
│   └── TOOLS.md
└── media/
    └── inbound/            # Received images/audio

Security Model

Direct Message Access

By default, OpenClaw is locked down:

  • Unknown senders get a pairing code
  • You must explicitly approve them: openclaw pairing approve <code>
  • Only then can they talk to your AI

To open it up (not recommended for public bots):

{
  channels: {
    whatsapp: {
      dmPolicy: "open",
      allowFrom: ["*"]
    }
  }
}

Tool Security

You control what tools each agent can use:

{
  tools: {
    profile: "coding",  // Default: coding tools
    deny: ["exec"]      // But no command execution
  },
  agents: {
    list: [{
      id: "personal",
      tools: {
        profile: "full"  // Override: this agent gets everything
      }
    }]
  }
}

Gateway Authentication

  • Token or password required for non-local connections
  • Device pairing for nodes
  • Challenge/response for remote connects
  • Optional TLS + certificate pinning

Conclusion: What I Learned

After spending 3-4 days deep in this codebase, I'm impressed by how well-architected OpenClaw is. Here's what stands out:

It's genuinely local-first: Not "local-first with cloud backup" - actually local-first. Your gateway runs on your hardware, your data stays on your hardware (unless you explicitly send it to an LLM provider).

It's built for extension: The plugin system isn't an afterthought. Channels, tools, providers, memory backends - all pluggable.

It handles complexity well: Supporting 30+ messaging platforms, multiple AI providers, mobile nodes, tool security - and it still feels manageable.

Mobile devices are first-class: iOS and Android aren't just "also supported" - they're designed as nodes from the ground up.

The code is readable: Despite being a complex system, the code is organized logically. Tests are colocated, files are reasonably sized, and there's good separation of concerns.

What's Unique Here?

Compared to other AI assistant projects I've looked at:

  • Most are single-channel (just Discord or just Telegram)
  • Most require cloud deployment
  • Most don't support multiple agent routing
  • Most don't have a node system for extending with your devices
  • Most don't have this level of plugin extensibility

OpenClaw does all of these things.

Things I'd Love to Explore More

  • The memory system with LanceDB integration
  • How the browser automation tool works
  • The Canvas A2UI rendering system
  • Cron jobs and webhook automation
  • Voice wake on macOS/iOS

Final Thoughts

I started this journey not knowing much TypeScript and definitely not understanding how multi-channel AI assistants work. Three days later, I have a much better grasp of both.

What impresses me most is that OpenClaw solves a real problem - having your AI assistant everywhere you communicate - without compromising on privacy or control. And the architecture makes it possible to extend the system without needing to understand the entire codebase.

If you're building something similar, or just want to understand how complex TypeScript projects are structured, I'd recommend exploring the OpenClaw codebase. It's a great example of:

  • Clean separation of concerns
  • Plugin architecture done right
  • Protocol-based communication
  • Multi-platform support
  • Security-first design

Now I need to actually use it instead of just reading the code. But hey, understanding the architecture first isn't a bad approach!


What's Next for Me?

After understanding how OpenClaw works, I'm planning to build my own super lightweight version using a tech stack I'm more comfortable with. Not to replace OpenClaw - it's way more mature and feature-rich than what I could build - but as a learning exercise to really internalize these architectural concepts.

I'm thinking:

  • Maybe Python instead of TypeScript (I know it better)
  • Start with just one messaging channel instead of 30+
  • Simpler routing (no multi-agent at first)
  • Basic tool system (just file operations and web search)
  • No mobile nodes initially - just get the core Gateway → Agent flow working

The goal isn't to build something production-ready. It's to learn by doing. Sometimes the best way to understand an architecture is to try implementing a simplified version yourself. I'll probably hit all the edge cases and "gotchas" that the OpenClaw team already solved, but that's part of the learning process!


💬 Reach out on LinkedIn or subscribe on Substack for more deep dives into AI systems and architecture.