
Kimi K2.5 โ€” Agent Swarm: Scaling Out, Not Just Up

agents · architecture · open-source · multi-modal · reinforcement-learning

What is Kimi K2.5?

Kimi K2.5 is an open-source multimodal AI model from Moonshot AI that rethinks how AI agents execute complex tasks. Instead of making the model bigger or letting it think longer, K2.5 introduces Agent Swarm: a paradigm in which up to 100 sub-agents work simultaneously on parallelizable subtasks.

Built on Kimi K2 with ~15T mixed visual and text tokens of continued pretraining, K2.5 packs 1 trillion total parameters with only 32B activated through a Mixture-of-Experts architecture (384 experts, 8 selected per token). The result: 80% runtime reduction on complex tasks while matching GPT-5.2 and Claude Opus 4.5 on benchmarks.
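
To make "1 trillion total, 32B active" concrete, here is a toy sketch of top-k expert routing in a sparse Mixture-of-Experts layer. The expert count and top-k match the numbers above; the hidden size, expert shape, and everything else are made up for illustration and are not K2.5's actual implementation.

```python
import torch
import torch.nn.functional as F

# Expert count and active-expert count match the description above;
# the hidden size and expert shape are made up for this toy sketch.
NUM_EXPERTS = 384   # experts per MoE layer
TOP_K = 8           # experts activated per token
D_MODEL = 64        # toy hidden size (K2.5's real dimensions are much larger)

class SparseMoELayer(torch.nn.Module):
    """Toy mixture-of-experts layer: each token runs only TOP_K of NUM_EXPERTS."""
    def __init__(self):
        super().__init__()
        self.router = torch.nn.Linear(D_MODEL, NUM_EXPERTS)
        self.experts = torch.nn.ModuleList(
            torch.nn.Sequential(
                torch.nn.Linear(D_MODEL, 4 * D_MODEL),
                torch.nn.GELU(),
                torch.nn.Linear(4 * D_MODEL, D_MODEL),
            )
            for _ in range(NUM_EXPERTS)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, D_MODEL)
        logits = self.router(x)                           # (tokens, NUM_EXPERTS)
        weights, idx = logits.topk(TOP_K, dim=-1)         # choose 8 experts per token
        weights = F.softmax(weights, dim=-1)              # renormalize their weights
        out = torch.zeros_like(x)
        for t in range(x.shape[0]):                       # most parameters stay idle:
            for k in range(TOP_K):                        # only the chosen experts run
                out[t] += weights[t, k] * self.experts[idx[t, k]](x[t])
        return out

layer = SparseMoELayer()
print(layer(torch.randn(4, D_MODEL)).shape)               # torch.Size([4, 64])
```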

🌙 Moonshot AI — Chinese AI Startup

Founded 2023, focused on long-context and agentic AI systems. Kimi.com serves millions of users across Asia.

How Agent Swarm Works

At the heart of K2.5 is Parallel-Agent Reinforcement Learning (PARL) — a training method that teaches the model to decompose complex tasks into parallelizable subtasks. Unlike frameworks like CrewAI or LangGraph that require manual workflow definition, K2.5 learns the decomposition itself.

🧠 Orchestrator Agent

Trainable component that analyzes incoming tasks and decomposes them into independent subtasks. No predefined roles — figures out the optimal decomposition through RL.

⚡ Frozen Sub-agents

Dynamically instantiated, specialized agents that execute individual subtasks. Each gets its own context and tools. Up to 100 can run simultaneously.

🎯 Critical Steps Metric

Instead of measuring total computation, K2.5 tracks the critical path — the longest chain of dependent steps. This forces genuine parallelism and prevents the model from just creating busywork. Result: 3-4.5x wall-clock speedup vs single-agent execution.
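
The critical-path idea is easy to state in code: wall-clock time is bounded by the longest chain of dependent steps, not by the total step count. A toy sketch, using a hypothetical dependency map for the paper-research example worked through later in this post:

```python
# Toy illustration of the "critical steps" metric: wall-clock time is bounded by
# the longest chain of *dependent* steps, not by the total number of steps.
from functools import lru_cache

# Hypothetical dependency map for "summarize 5 papers, then compare them".
deps = {
    "search":   [],
    "sum_1": ["search"], "sum_2": ["search"], "sum_3": ["search"],
    "sum_4": ["search"], "sum_5": ["search"],
    "analysis": ["sum_1", "sum_2", "sum_3", "sum_4", "sum_5"],
}

@lru_cache(maxsize=None)
def critical_steps(step: str) -> int:
    """Length of the longest dependency chain ending at `step`."""
    if not deps[step]:
        return 1
    return 1 + max(critical_steps(d) for d in deps[step])

print(len(deps))                   # 7 total steps
print(critical_steps("analysis"))  # 3 critical steps: the swarm's lower bound
```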

Key Concepts

Why Agent Swarm Matters

Kimi K2.5 represents a fundamental shift from "scale up" (bigger models, longer thinking) to "scale out" (parallel agents). This matters because single-agent execution hits hard walls.

Sequential tool calls are slow. Context windows fill up. Latency compounds with each step. Agent Swarm shows that the orchestration itself can be learned through RL rather than hand-coded as in traditional multi-agent frameworks.

🚀 Performance Breakthrough

BrowseComp: 78.4 with swarm vs 60.6 single. WideSearch: 79.0 vs 72.7. Competitive with GPT-5.2 and Claude Opus 4.5 on real benchmarks.

🔓 Open Source at Scale

1T parameters open-sourced — pushes the boundary of what's publicly available. Challenges the proprietary model hegemony.

Deep Dive: The Agent Swarm Revolution

🐌 The Problem with Sequential Agents

Traditional AI agents execute tasks sequentially — one tool call after another. This creates fundamental bottlenecks:

⏱️ Latency Compounds

Each API call adds 1-3 seconds. A 10-step task takes 10-30 seconds, even if many steps could run simultaneously.

🧠 Context Bloat

Long task histories fill the context window. The agent spends tokens on irrelevant intermediate steps instead of the core task.

⚡ No Parallelism

Researching three topics? Must do them one by one, even though they're independent. Wastes both time and computational resources.

Agent Swarm solves this by identifying which subtasks are truly independent and executing them simultaneously. The orchestrator learns to find the optimal decomposition through reinforcement learning.

🎓 How PARL Training Works

Parallel-Agent Reinforcement Learning (PARL) is the secret sauce. The training process has three key components:

1. Orchestrator Training

The orchestrator agent learns to decompose complex tasks into parallelizable subtasks. It's trained end-to-end with the frozen sub-agents to optimize for the total reward.

2. Three-Part Reward Function

r_parallel: Prevents "serial collapse" — rewards genuine parallelism over sequential execution

r_finish: Prevents spurious parallelism — penalizes unnecessary sub-agents that don't contribute

r_perf: Task success — did we actually solve the original problem correctly?

3. Staged Reward Annealing

The λ hyperparameters for auxiliary rewards (r_parallel, r_finish) are gradually annealed to zero during training. This ensures the model learns proper decomposition early, then focuses purely on task performance.
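
Putting the three parts together, here is a schematic of how the composite reward and staged annealing could be wired up. The reward shapes, the idle-agent penalty, and the linear λ schedule are illustrative placeholders, not the published PARL formulation:

```python
from dataclasses import dataclass

@dataclass
class EpisodeStats:
    solved: bool          # did the final answer pass the task check?
    total_steps: int      # steps executed across all agents
    critical_steps: int   # longest chain of dependent steps
    idle_subagents: int   # sub-agents spawned but never used in the answer

def parl_reward(ep: EpisodeStats, train_step: int, total_train_steps: int) -> float:
    """Schematic three-part PARL reward with staged annealing.
    The reward shapes and the lambda schedule are illustrative guesses."""
    r_perf = 1.0 if ep.solved else 0.0
    # r_parallel: reward genuine parallelism (short critical path vs total work).
    r_parallel = 1.0 - ep.critical_steps / max(ep.total_steps, 1)
    # r_finish: penalize spurious sub-agents that don't contribute to the answer.
    r_finish = -0.1 * ep.idle_subagents
    # Staged annealing: auxiliary weights decay to zero as training progresses,
    # so the model ends up optimizing purely for task success.
    lam = max(0.0, 1.0 - train_step / total_train_steps)
    return r_perf + lam * (r_parallel + r_finish)

# Early in training (lam near 1) parallelism is rewarded; late in training only r_perf counts.
print(parl_reward(EpisodeStats(True, 7, 3, 0), train_step=100, total_train_steps=10_000))
```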

🎯 Key Insight

Unlike hand-coded multi-agent systems, the decomposition strategy emerges from the training data and reward signal. The orchestrator discovers which types of tasks parallelize well and which don't.

⚡ Agent Swarm in Action

Let's trace through a complex task to see how Agent Swarm transforms execution. Consider: "Research the top 5 AI papers this month, summarize each, and create a comparative analysis."

🐌 Single-Agent Approach (Old)

1. Search for AI papers this month → 3 seconds

2. Read paper 1, summarize → 15 seconds

3. Read paper 2, summarize → 15 seconds

4. Read paper 3, summarize → 15 seconds

5. Read paper 4, summarize → 15 seconds

6. Read paper 5, summarize → 15 seconds

7. Write comparative analysis → 10 seconds

Total: ~93 seconds, 7 sequential steps

🐝 Agent Swarm Approach (New)

1. Orchestrator: Search + identify top 5 papers → 3 seconds

2a. Sub-agent 1: Summarize paper 1 → 15s (parallel)

2b. Sub-agent 2: Summarize paper 2 → 15s (parallel)

2c. Sub-agent 3: Summarize paper 3 → 15s (parallel)

2d. Sub-agent 4: Summarize paper 4 → 15s (parallel)

2e. Sub-agent 5: Summarize paper 5 → 15s (parallel)

3. Orchestrator: Write comparative analysis → 10 seconds

Total: ~28 seconds, 3 critical steps instead of 7 (~70% reduction!)
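
The swarm trace above is a plain fan-out/fan-in pattern. A minimal asyncio sketch of the same shape, with the sub-agent calls stubbed out (no real K2.5 API involved):

```python
import asyncio

async def summarize(paper: str) -> str:
    """Stand-in for a sub-agent call: imagine an LLM request with its own context."""
    await asyncio.sleep(0.15)        # pretend this is ~15s of tool/LLM latency
    return f"summary of {paper}"

async def run_swarm(papers: list[str]) -> str:
    # Step 1 (orchestrator): the paper list has already been identified.
    # Step 2 (sub-agents): the five independent summaries fan out in parallel.
    summaries = await asyncio.gather(*(summarize(p) for p in papers))
    # Step 3 (orchestrator): fan back in and write the comparative analysis.
    return "comparative analysis of: " + "; ".join(summaries)

# Step 2 costs one summary's latency regardless of how many papers there are.
print(asyncio.run(run_swarm([f"paper {i}" for i in range(1, 6)])))
```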

📊 Real Benchmark Results

BrowseComp: 78.4 (swarm) vs 60.6 (single) — complex web research tasks

WideSearch: 79.0 (swarm) vs 72.7 (single) — broad information synthesis

SWE-Bench: 47.2 (swarm) vs 41.8 (single) — software engineering tasks

🎨 Vision + Code: A New Paradigm

K2.5's native multimodality unlocks powerful new workflows. Unlike vision adapters bolted onto language models, MoonViT is trained jointly with the language components from the start. This enables true cross-modal reasoning.

🖼️ UI-to-Code Generation

Upload a design mockup or screenshot. K2.5 analyzes the visual layout, identifies components, and generates React/Vue/HTML code that matches the design. Works across multiple UI frameworks.

🎥 Video Workflow Analysis

Record a video of yourself using an app or performing a task. K2.5 analyzes the workflow frame-by-frame and generates automation scripts (Playwright, Selenium, etc.) to replicate the process.

🐛 Visual Debugging

When code doesn't work as expected, K2.5 can analyze screenshots of the actual output vs the intended design. It identifies visual discrepancies and suggests specific CSS/styling fixes.

The Agent Swarm architecture amplifies these capabilities. One sub-agent can analyze the visual design while another researches best practices for the target framework, and a third generates accessibility annotations — all in parallel.
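
As a sketch of what a UI-to-code call might look like: this assumes an OpenAI-compatible chat endpoint and uses placeholder values for the base URL, model id, and file name, so check the platform.moonshot.ai documentation for the real ones.

```python
# Hypothetical UI-to-code request. Assumes an OpenAI-compatible chat endpoint;
# the base_url, model id, and file name are placeholders. Verify against the
# platform.moonshot.ai documentation before using.
import base64
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.moonshot.ai/v1")

with open("mockup.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="kimi-k2.5",  # placeholder model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Generate a React component that matches this mockup."},
        ],
    }],
)
print(resp.choices[0].message.content)
```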

📊 Benchmark Deep Dive

K2.5's performance across diverse benchmarks reveals where Agent Swarm excels and where traditional approaches still compete:

🚀 K2.5 Strengths

Agentic Tasks

BrowseComp: 78.4 | WebNav: 82.1 | MultiStep: 76.8

Vision + Reasoning

MathVista: 68.3 | MMMU: 73.2 | ChartQA: 81.4

Coding (Complex)

SWE-Bench: 47.2 | LiveCodeBench: 52.8

⚖️ Competitive Areas

Pure Reasoning

MMLU: 84.7 | HellaSwag: 89.2 (vs GPT-5.2: 86.1, 91.4)

Language Understanding

SuperGLUE: 87.3 | WinoGrande: 84.6

Simple Coding

HumanEval: 78.4 | MBPP: 82.1 (parallelism less useful)

🧠 Key Insight

Agent Swarm provides the biggest advantage on tasks that naturally decompose into parallel subtasks. Pure reasoning or simple coding problems don't benefit as much, but complex multi-step workflows see dramatic speedups.

How K2.5 Compares

vs GPT-5.2

Kimi K2.5

Competitive benchmarks across reasoning, coding, vision. Open-source with native swarm capability built-in.

GPT-5.2

Proprietary, excellent quality but lacks native parallel execution. Single-agent bottlenecks on complex tasks.

vs Claude Opus 4.5

Kimi K2.5

Similar coding performance, K2.5 stronger on vision/search benchmarks with swarm parallelization.

Claude Opus 4.5

Excellent reasoning and safety, but sequential execution model. No built-in parallel decomposition.

vs CrewAI / LangGraph

Kimi K2.5

Learns task decomposition through RL — no manual workflow definition. Self-optimizing orchestration.

CrewAI / LangGraph

Requires manual workflow and role definition. Hand-crafted agent interactions, not learned.

vs DeepSeek V3.2

Kimi K2.5

Stronger across most benchmarks, especially agentic tasks. Native multimodality and vision capabilities.

DeepSeek V3.2

Excellent at reasoning and math, but primarily text-focused. No parallel agent architecture.

Get Started with Kimi K2.5

1

Experience Agent Swarm through the web interface

# Try online at kimi.com (Agent Swarm mode for paid users)
2

Install the Kimi Code terminal tool with vision support

# See platform.moonshot.ai for the official Kimi Code install command
3

Get API keys for programmatic access to K2.5 and Agent Swarm

# API access via platform.moonshot.ai
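
For step 3, a minimal programmatic call under the same assumptions as the vision sketch earlier (OpenAI-compatible endpoint, placeholder model id):

```python
# Minimal text-only call; the endpoint and model id are placeholders,
# so confirm both on platform.moonshot.ai before using.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.moonshot.ai/v1")
resp = client.chat.completions.create(
    model="kimi-k2.5",  # placeholder model id
    messages=[{"role": "user",
               "content": "Research the top 5 AI papers this month and compare them."}],
)
print(resp.choices[0].message.content)
```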