aitm 1.0: a terminal where the AI is a participant, not the driver

I was doing AI-assisted coding inside a terminal session. The AI kept modifying files, but I had no way to view those changes in the same window. I had to switch apps, switch context, and come back. Every pass through that cycle was an interruption.

What I wanted was simple: the terminal and the AI and the files, all in the same place, without the context-switching. So I built it.

That's the origin of aitm. Version 1.0 is that thing, shipped.

The design choice: participant, not driver

Most AI terminals are built around a single model: you describe intent, the AI executes. It's efficient when it works and a bad afternoon when it doesn't.

aitm draws a different line. The AI can see your environment, read your files, and call tools — but execution is always gated by you. The invariant is:

AI suggests → you decide → it happens.

In practice: the AI calls list_files, read_file, get_terminal_history, and search_history automatically. These are read-only. You see the results in the conversation as they come in. But run_command — anything that changes state — stops the loop and waits for your approval.
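A minimal sketch of that read/approve split in Rust, assuming the model hands back a batch of tool calls per turn. The tool names come from the post; the payload fields, `Step`, and `plan` are hypothetical. Stopping at the first state-changing call is also an assumption about how a batch is handled, not aitm's documented behavior.

```rust
/// The tools named in the post. Payload fields are hypothetical.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum ToolCall {
    ListFiles { dir: String },
    ReadFile { path: String },
    GetTerminalHistory,
    SearchHistory { query: String },
    RunCommand { command: String },
}

impl ToolCall {
    /// Read-only tools run without asking; anything that changes state needs approval.
    fn requires_approval(&self) -> bool {
        matches!(self, ToolCall::RunCommand { .. })
    }
}

/// What the loop does with each call in a batch returned by the model.
#[derive(Debug, PartialEq)]
enum Step {
    /// Executed automatically; the result is shown in the conversation.
    AutoRun(usize),
    /// The loop pauses here until the user approves or declines.
    AwaitApproval(usize),
}

/// Auto-run read-only calls in order and stop at the first state-changing one.
fn plan(calls: &[ToolCall]) -> Vec<Step> {
    let mut steps = Vec::new();
    for (i, call) in calls.iter().enumerate() {
        if call.requires_approval() {
            steps.push(Step::AwaitApproval(i));
            break; // nothing after this runs until the user decides
        }
        steps.push(Step::AutoRun(i));
    }
    steps
}
```

Keeping the classification in one small function makes the read-only/state-changing line easy to audit.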

That distinction sounds obvious in retrospect. It wasn't obvious at design time. The first version had a "trust mode" that auto-approved low-risk commands. I removed it. The UX was slightly smoother; the feeling of being in control was not worth trading away.

Why Tauri and Rust

Electron was the obvious first option to evaluate. At idle it sat at ~150 MB of RAM, and the window took several seconds to appear. Fine for a prototype, not acceptable for something that sits open all day. So Electron was ruled out early.

The choice was Tauri 2 + Rust from the start. Two months to get to something usable. The numbers:

5.3 MB binary (vs 150+ MB Electron)
3–5 ms cold start
~30 MB RAM at idle

The React 19 frontend handles the UI. The Rust layer handles the PTY, IPC, tool execution, and security. This matters: the security gates run in Rust, which means the JavaScript/React layer cannot bypass them. The AI layer sends requests over Tauri IPC; the Rust handler is the one that decides whether a command actually runs.

The four-layer security model

Every run_command call goes through four sequential gates before it reaches your shell.

L1 — Blocklist regex. Hard-coded patterns that are always rejected: rm -rf /, the fork bomb :(){ :|:& };:, dd if=/dev/zero, and about 50 others. For these commands even user confirmation is not enough; the blocklist exists precisely to be unconditional.
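A dependency-free sketch of an unconditional blocklist. aitm uses regexes; to keep this example std-only it swaps in two simpler matchers (a contiguous token sequence and a whitespace-insensitive substring) and includes only the three patterns the post names. `Pattern`, `BLOCKLIST`, and `is_blocked` are hypothetical names.

```rust
/// Hypothetical matcher kinds standing in for regexes.
enum Pattern {
    /// A contiguous run of whitespace-separated tokens, e.g. `rm -rf /`.
    Tokens(&'static [&'static str]),
    /// A substring checked after all whitespace is removed, so spacing tricks don't help.
    Compact(&'static str),
}

/// Three of the roughly fifty patterns; matching any one rejects the command.
const BLOCKLIST: &[Pattern] = &[
    Pattern::Tokens(&["rm", "-rf", "/"]),
    Pattern::Compact(":(){:|:&};:"), // the classic fork bomb
    Pattern::Tokens(&["dd", "if=/dev/zero"]),
];

/// Unconditional check: no flag or confirmation overrides a hit.
fn is_blocked(cmd: &str) -> bool {
    let tokens: Vec<&str> = cmd.split_whitespace().collect();
    let compact: String = cmd.chars().filter(|c| !c.is_whitespace()).collect();
    BLOCKLIST.iter().any(|pattern| match pattern {
        // Slide a window over the tokens and compare it to the pattern sequence.
        Pattern::Tokens(seq) => tokens.windows(seq.len()).any(|w| w == *seq),
        Pattern::Compact(s) => compact.contains(*s),
    })
}
```

Token matching is what keeps `rm -rf /tmp/build` from tripping the `rm -rf /` rule. Real coverage needs many more variants (`rm -fr /`, `rm -rf /*`, and so on), which is where regexes earn their keep.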

L2 — Heuristic risk scoring. The command string is scored against a set of signals: does it touch /, use redirection (>), pipe to sh, reference system directories? The output is DESTRUCTIVE, HIGH, or LOW. This label shows up in the confirmation dialog so you can see why something got flagged.
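A sketch of the scoring step, checking the four signals listed above and returning the reasons alongside the label so a dialog can display them. The weights, thresholds, and `SYSTEM_DIRS` entries are invented for illustration; aitm's actual values aren't given in this post.

```rust
/// The three labels from the post.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Risk {
    Low,
    High,
    Destructive,
}

// Hypothetical list of "system directories"; the real set isn't published.
const SYSTEM_DIRS: &[&str] = &["/etc", "/usr", "/bin", "/sbin", "/System", "/Library"];

/// Score a command against four signals. Weights and thresholds are illustrative.
fn score(cmd: &str) -> (Risk, Vec<&'static str>) {
    let tokens: Vec<&str> = cmd.split_whitespace().collect();
    let mut points = 0;
    let mut reasons = Vec::new();

    // Signal 1: an argument that is the filesystem root itself.
    if tokens.iter().any(|t| *t == "/") {
        points += 3;
        reasons.push("targets /");
    }
    // Signal 2: output redirection (matches both > and >>).
    if cmd.contains('>') {
        points += 1;
        reasons.push("redirects output");
    }
    // Signal 3: a pipeline stage whose program is a shell.
    let pipes_to_shell = cmd.split('|').skip(1).any(|stage| {
        matches!(stage.split_whitespace().next(), Some("sh" | "bash" | "zsh"))
    });
    if pipes_to_shell {
        points += 2;
        reasons.push("pipes into a shell");
    }
    // Signal 4: any argument under a system directory.
    if tokens.iter().any(|t| SYSTEM_DIRS.iter().any(|d| t.starts_with(*d))) {
        points += 2;
        reasons.push("references a system directory");
    }

    let risk = match points {
        0..=1 => Risk::Low,
        2 => Risk::High,
        _ => Risk::Destructive,
    };
    (risk, reasons)
}
```

With these weights a plain redirect like `echo hi > notes.txt` stays LOW but still records "redirects output" as a reason, while redirecting into `/etc` crosses into DESTRUCTIVE.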

L3 — Project scope allowlist. Each session has a configured project directory. You can define a globset — paths and patterns the AI is allowed to operate on. Anything outside scope is flagged before reaching L4. This is opt-in, but it's what makes "AI working on this project" distinct from "AI with access to your whole machine."
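aitm implements this check with the globset crate. The sketch below swaps in plain directory-prefix matching from `std::path` so it runs without dependencies. It resolves each path-like argument lexically against the project root (no filesystem access, so symlinks are not followed) and reports the ones outside every allowed prefix. The rule for spotting path-like arguments and the function names are hypothetical.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `arg` against `root`: `.` is dropped and `..` pops a component.
/// No filesystem access, so symlinks are not followed.
fn resolve(root: &Path, arg: &str) -> PathBuf {
    let mut out = PathBuf::new();
    // Path::join replaces `root` entirely when `arg` is absolute.
    for comp in root.join(arg).components() {
        match comp {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Return the resolved path-like arguments of `cmd` that fall outside every allowed
/// prefix. Prefix matching is a simplified stand-in for globset patterns.
fn out_of_scope(root: &Path, allowed: &[&str], cmd: &str) -> Vec<PathBuf> {
    let allowed: Vec<PathBuf> = allowed.iter().map(|a| resolve(root, a)).collect();
    cmd.split_whitespace()
        // Hypothetical rule: skip flags; treat anything containing '/' or starting with ".." as a path.
        .filter(|t| !t.starts_with('-') && (t.contains('/') || t.starts_with("..")))
        .map(|t| resolve(root, t))
        // Path::starts_with compares whole components, so /proj-old is not inside /proj.
        .filter(|p| !allowed.iter().any(|a| p.starts_with(a)))
        .collect()
}
```

An empty result means every path argument stayed inside the project; a non-empty one becomes the "out of scope" flag carried into the confirmation step.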

L4 — Explicit user confirmation. Every run_command produces a modal: full command text, risk level, scope check result. There is no auto-approve mode and no way to configure one.

L1 and L2 run synchronously in Rust with no async overhead. L3 uses the globset crate. L4 is a hard gate in the IPC handler — no call path in the AI layer can reach the shell without passing through it.

run_command request
    │
    ├─ L1: blocklist regex ────────────► REJECT immediately
    │
    ├─ L2: heuristic scoring
    │       DESTRUCTIVE / HIGH / LOW
    │
    ├─ L3: project scope check ────────► flag if out of scope
    │
    └─ L4: confirmation modal ─────────► shell only if approved
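The diagram above can be sketched as a single function. To keep it standalone, each gate is passed in as a closure; `Outcome`, `ConfirmationRequest`, and `gate` are hypothetical names, not aitm's actual IPC handler. It shows the ordering the post describes: L1 returns early, L2 and L3 only annotate the request, and the only value carrying a runnable command is `Outcome::Approved`, produced solely after the user says yes.

```rust
/// Risk labels produced by L2.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Risk {
    Low,
    High,
    Destructive,
}

/// What the L4 modal shows: full command text, risk level, scope check result.
#[derive(Debug)]
struct ConfirmationRequest<'a> {
    command: &'a str,
    risk: Risk,
    in_scope: bool,
}

/// Result of running one command through the gates.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// L1 hit: rejected without ever showing a dialog.
    Rejected,
    /// The user declined at L4.
    Declined,
    /// The only variant that carries a command the shell may run.
    Approved(String),
}

/// Run the four gates in order for a single run_command request.
fn gate(
    cmd: &str,
    is_blocked: impl Fn(&str) -> bool,
    score: impl Fn(&str) -> Risk,
    in_scope: impl Fn(&str) -> bool,
    confirm: impl FnOnce(&ConfirmationRequest<'_>) -> bool,
) -> Outcome {
    // L1: unconditional, no dialog and no override.
    if is_blocked(cmd) {
        return Outcome::Rejected;
    }
    // L2 + L3: label and flag the command; neither rejects on its own.
    let request = ConfirmationRequest {
        command: cmd,
        risk: score(cmd),
        in_scope: in_scope(cmd),
    };
    // L4: every command that survives L1 goes to the user, whatever its label.
    if confirm(&request) {
        Outcome::Approved(request.command.to_string())
    } else {
        Outcome::Declined
    }
}
```

In a real handler, the code that actually spawns the command would sit behind the `Approved` branch only, so there is no second path to the shell to forget about.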

What else shipped in 1.0

The tool loop and security model are the headline. Everything else in 1.0 was stuff I'd been deferring:

Project scope + SQLite persistence. Sessions now have a project directory. Conversation history, session state, and config all live in a local SQLite database. Nothing leaves your machine, no account required.

Six LLM providers. OpenAI, Anthropic, DeepSeek, Qwen (Alibaba DashScope), Zhipu, and Moonshot (Kimi). You can switch providers per session. The six were chosen for coverage: Western API providers plus the major Chinese providers, for users who want lower-latency access from mainland China.

Eight themes, English and Chinese UI. The themes are opinionated. There's a dark ink-wash one that I find easier on the eyes during long sessions.

Split-pane CodeMirror editor. A file editor built into the window, for quick edits without losing terminal context.

macOS Developer ID notarization. The .dmg is signed and notarized. No Gatekeeper warning on first launch.

What's not in 1.0

Windows support exists — CI builds it, I've run it — but macOS is the platform I use daily and where the edge cases are best covered. Windows testing is less thorough.

There's no streaming AI response in the tool-calling loop. The AI responds after all tool calls complete. In practice the wait is usually under two seconds, but it's a noticeable gap when a sequence involves several reads. I'll revisit this in 1.1.

A plugin system and user-defined tools are on the roadmap, not in this release.

Download

Binaries are attached to the GitHub release:

macOS Apple Silicon
Windows x86_64
Windows ARM64

Source on GitHub under Apache 2.0. Issues and discussions are open. If you want to talk about the security model specifically, that's where to do it.