Lean system prompts Cost-optimized architecture Zero telemetry Office integration Specialist delegation Cross-session memory Native multi-protocol

Precision context.
Minimal cost.

OmniContext CLI is a terminal-native coding assistant that treats context as a first-class resource. Lean system prompts keep overhead low. Specialist delegation routes grunt work to cheaper models while keeping your main context clean. Zero telemetry means your code never leaves your machine. And it extends into VS Code, Office, the browser, and Figma.

$ npm install -g omni-context-cli && omx
Read the Docs
Terminal
╔═╗┌┬┐┌┐┌┬  ╔═╗┌─┐┌┐┌┌┬┐┌─┐─┐ ┬┌┬┐  ╔═╗╦  ╦
║ ║│││││││  ║  │ ││││ │ ├┤ ┌┴┬┘ │   ║  ║  ║
╚═╝┴ ┴┘└┘┴  ╚═╝└─┘┘└┘ ┴ └─┘┴ └─ ┴   ╚═╝╩═╝╩
▸ Version: 0.0.70 ▸ Project: my-project ▸ Session: 1771152411043-cipsnmcqu

OmniContext CLI. Tell Omx what you want to do.

Anthropic: Claude Opus 4.6 (Thinking) | 0.0% (⇈ 0 ⇊ 0 ↺ 0) (Press ESC to enter the menu)
Type your message...
10 Agentic Tools
5 Workflow Presets
4 Supported API Protocols
0 Telemetry

Specialist delegation:
fewer rounds, lower cost

Traditional assistants call basic tools one at a time, resending your entire context with every round. OmniContext CLI delegates multi-step operations to agentic sub-agents running on a cheaper model -- your expensive model stays focused on reasoning, not file I/O.

Task: "Find the definition of handleAuth"
Traditional
R1 glob("src/**/*.ts")
43 files returned
R2 grep("handleAuth", ...)
7 matches in 4 files
R3 read("src/middleware/auth.ts")
186 lines -- wrong file
R4 read("src/routes/login.ts")
124 lines -- still looking
R5 read("src/services/auth.ts", 40-90)
Found it -- 50 more lines
5 rounds, ~12K context added, all on main model
Specialist Mode
R1 pluck("handleAuth definition")
Sub-agent (cheap model):
glob grep read locate extract
auth.ts:42-78 -- full function body
1 round, ~1K context added, grunt work on cheap model

10 tools that think for themselves

Each tool runs as an autonomous sub-agent on a cheaper model. It handles file I/O, error recovery, and retries internally -- keeping intermediate output out of your main context and your token bill down. Tip: start with glance and slice when exploring a codebase -- they're faster than hunting file by file.

explore

Survey project architecture. Understands directory layout, key files, and how the codebase is organized.

spark

Run shell commands with automatic error detection and retry. Handles build failures and install issues.

sculpt

Edit files with surgical precision. Finds the right location, makes the change, and validates the result.

weave

Write entire files from scratch with auto-validation. Handles formatting and structure automatically.

sweep

Find files matching complex criteria. Searches by name, content, or structure across your project.

pluck

Extract specific code segments from any file. Pulls functions, classes, or blocks you need.

ripple

Trace symbol references across your codebase. Finds every usage of a function, variable, or type.

slice

Answer targeted code questions. Reads only the relevant parts to give you focused answers.

quest

Research any topic via web search. Finds documentation, examples, and solutions from across the internet.

glance

Preview multiple files at once with brief summaries. Quickly understand what you are working with.

One assistant, many modes

Switch how OmniContext CLI behaves with a single command. Each preset changes the tools available, the system prompt, and the response style.

Specialist

Default

Your main model reasons, a cheaper agent model executes. Agentic tools keep the cheap model out of decisions. Fewer rounds, cleaner context, lower cost.

Explorer

Research-first mode. Launches multiple web searches before answering. Great for current events, docs, and fact-checking.

Artist

Visual-first responses. Prioritizes image generation when the model supports it. Ideal for design exploration and mockups.

Assistant

Personal assistant for app integrations. Controls browser tabs, Office documents, and Figma designs through natural language.

Normal

Basic tools with manual orchestration. Direct read, write, edit, and bash access. Full control, no abstraction.

Four API protocols,
zero format conversion

Most tools funnel everything through a single API format and hope for the best. OmniContext CLI has a dedicated request builder and stream handler for each protocol. Prompt caching, extended thinking, and provider-specific features work exactly as the vendor intended -- no lossy translation layer in between.

Anthropic
Native Messages API with prompt caching, extended thinking, and streaming. Token-level cache control via custom TTL.
Claude Opus 4.6 / Claude Sonnet 4.6 / DeepSeek V3.2
OpenAI
Native Chat Completions API. Compatible with any endpoint that speaks the OpenAI format -- Zhipu, MiniMax, local models, and more.
GLM-5 / MiniMax 2.5
Gemini / Vertex
Native generateContent API with Gemini-specific streaming. No OpenAI shim -- tools and function calling use Gemini's own schema.
Gemini 3 Pro / Gemini 3 Pro Image
Responses API
OpenAI's newer Responses API with built-in tool orchestration. Separate path from Chat Completions, not a compatibility wrapper.
GPT-5.2 / GPT-5.2-Codex

Specialist mode saves real money

Every API call resends your full conversation history. Fewer rounds means fewer cache reads. Cleaner context means fewer tokens written. Specialist mode cuts both -- and offloads the grunt work to a cheaper model.

Fewer API rounds

Traditional tools need 5 rounds to find a function definition. Specialist mode does it in 1. That is 4 fewer full-context resends -- saving cache read costs on every skipped round.

Smaller context growth

Basic tools dump ~10KB of intermediate output into your conversation. Agentic tools return only the final result. Context editing automatically trims old tool payloads and thinking blocks, keeping growth in check even over long sessions.

Cheap model for execution

Sub-agents run on a low-cost model (e.g. GLM-5) while your main model (e.g. Claude Opus 4.6) handles only planning and decisions. The expensive model never does file I/O.

1-hour cache for deep work

The default 5-minute prompt cache expires if you pause to think. Switch to 1-hour in preferences for debugging, refactoring, or research -- it eliminates repeated cache rebuilds across a session.

Simulated cost comparison: "Find the definition of handleAuth"
                          Traditional        Specialist         Saved
API rounds                5                  1                  -4 rounds
Cache read per round      ~20K tokens x 5    ~20K tokens x 1    -80K tokens
New context added         ~10KB              ~3KB               -70%
Cache write (new tokens)  ~2.5K tokens       ~1K tokens         -60%
Execution model           Opus 4.6 only      Opus 4.6 + GLM-5   ~30% cheaper

Based on a 20K-token conversation finding a function across a TypeScript project. Actual savings depend on project size and model pricing.

One command to add
all your models

OmniContext CLI ships with built-in provider presets. Pick one, paste your API key, and every model from that service is ready to use.

Zenmux
DeepSeek
OpenRouter
Zhipu (GLM)
MiniMax
Quick setup with Zenmux
# List available providers
$ omx --list-providers

# Add all Zenmux models in one go
$ omx --add-provider zenmux --api-key zmx-...
Added: Zenmux Anthropic (Claude Sonnet 4)
Added: Zenmux Anthropic (Claude Haiku)
Added: Zenmux Gemini (Gemini 2.5 Flash)
Added: Zenmux OpenAI (GPT-4o)
...

# Remove a provider just as easily
$ omx --remove-provider zenmux

It learns as you work

OmniContext CLI remembers your coding style, project patterns, and past mistakes across sessions. Key points are scored over time -- helpful insights stick around, irrelevant ones decay.

Agentic Context Engineering Extracts key points from every conversation and injects them into future sessions
Scored Memory Helpful points gain score (+1), harmful ones drop fast (-3), unused ones decay naturally
Per-Project Storage Each project has its own memory file. Edit it directly if you want full control.
memory.json +3
"This project uses TypeScript strict mode with path aliases configured in tsconfig"
memory.json +2
"API routes follow REST conventions in src/routes/ with Zod validation"
memory.json -4
"Uses Webpack for bundling" Decaying -- will be removed at -5

Goes everywhere your work does

Terminal is home base, but OmniContext CLI reaches into every tool you use. One AI, consistent context, zero context switching.

VS Code Extension

Full IDE integration with file context, diagnostics, and diff views. OmniContext CLI sees what you see in the editor.

Active file awareness Selection context Error diagnostics Inline diffs

Desktop App

GUI for the CLI. Acts as the local hub connecting Office, browser, and Figma extensions.

Full CLI features Model management Serve mode Connection hub

Chrome Extension

Sidebar on any webpage. Summarize, extract data, run scripts, and automate browser tasks.

Office Add-in

AI panel inside Word, Excel, and PowerPoint. Create budgets, format docs, and design slides.

Figma Plugin

Inspect layouts, create shapes, modify nodes, and export assets through the chat panel.

Zed Editor

Works as an external agent via Agent Client Protocol. Full tool access inside Zed's agent panel.

Web Client

Browser UI with LaTeX, Mermaid diagrams, file attachments, and drag-and-drop support.

Mobile Access

Run omx --serve and connect from your phone. Code reviews from the couch.

Build on top of OmniContext CLI

Custom agents, skills, slash commands, and MCP servers. Everything is a markdown file or JSON config.

Custom SubAgents

Write a markdown file with a prompt template and tool permissions. It becomes a new agentic tool instantly. Add OMX-AGENTS.md for global agent instructions.

~/.omx/agents/review.md
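A sub-agent definition might look like the sketch below. The frontmatter keys (name, description, tools) and the overall layout are assumptions for illustration, not the documented schema:

```markdown
---
name: review
description: Review a diff or file for bugs and style issues
tools: read, grep, bash
---

You are a code reviewer. Given a diff or file path:
- check for logic errors and unhandled edge cases
- flag violations of the conventions in OMX.md
Report findings as a numbered list with file:line references.
```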

Custom Skills

Teach OmniContext CLI domain-specific knowledge and workflows. Skills inject instructions into the current conversation.

~/.omx/skills/code-style/SKILL.md
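A SKILL.md could be as simple as a markdown document of instructions that gets injected into the conversation. This layout is a guess at the format, shown only to give a feel for what a skill contains:

```markdown
---
name: code-style
description: House style for TypeScript in this project
---

When writing TypeScript here:
- prefer named exports over default exports
- validate all external input with Zod schemas
- keep functions under 40 lines
```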

Slash Commands

Create shortcuts for common prompts. Type /review and your custom prompt fires with Handlebars templating.

~/.omx/slash/review.md
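Because slash commands support Handlebars templating, a /review shortcut might render whatever follows the command into a prompt like this. The template variable name is an assumption; check the docs for the variables OmniContext CLI actually exposes:

```markdown
Review the following code for bugs, security issues, and style problems:

{{args}}

Respond with a prioritized list of findings.
```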

MCP Servers

Connect external tools and data sources via Model Context Protocol. Stdio and HTTP transports supported.

~/.omx/mcp.json
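Since both stdio and HTTP transports are supported, a minimal mcp.json might register one server of each kind. The key names below are assumptions modeled on common MCP client configs, and the server package and URL are placeholders:

```json
{
  "mcpServers": {
    "filesystem": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    },
    "internal-docs": {
      "transport": "http",
      "url": "https://mcp.example.com/endpoint"
    }
  }
}
```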

The details matter

Lean System Prompts

Minimal, focused instructions and concise tool descriptions. Your tokens go toward actual work, not bloated framework overhead.

Zero Telemetry

No usage tracking, no analytics, no data collection. Your code and conversations never leave your machine.

Context Editing

Automatically trims old tool call payloads and thinking blocks from your conversation history. Keeps token usage lean in long sessions.

Extended Thinking

Enable deeper reasoning for complex tasks. The model thinks step by step before responding, with configurable budget limits.

CLAUDE.md Compatible

Already have a CLAUDE.md in your repo? OmniContext CLI reads it automatically, right alongside OMX.md. Zero-friction migration.

Auto-Compaction

When context hits 80% capacity, the conversation is compacted, key memories are extracted, and a fresh session picks up where you left off.

Native Prompt Caching

Automatic cache control for Anthropic and Gemini. Custom TTL settings (5 min or 1 hour) keep frequently used context cached and costs down.

Project Instructions

Drop an OMX.md in your repo root. Everyone on the team gets the same conventions and context. Also reads CLAUDE.md for easy migration.

Start building with OmniContext CLI

One command. Zero config. Bring your own API key.

$ npm install -g omni-context-cli && omx