Claude Managed Agents: Dreaming, Outcomes & Multi-Agent Orchestration

🎯 Key Takeaways

1. Dreaming: Let Agents Learn Through "Sleep"

2. Outcomes: Independent Evaluator Mechanism

3. Multi-Agent: Lead-Sub Architecture

I. Technical Background & Problem Definition

1.1 Claude Managed Agents Positioning

Claude Managed Agents is a managed agent platform launched by Anthropic on April 8, 2026, designed to help developers quickly build and deploy AI agents. Its core value proposition is "10x acceleration": modular agent templates, integrated memory stores, and an enhanced orchestration framework mean developers do not have to build their own infrastructure.

Dimension      | Traditional           | Managed Agents
Dev Cycle      | Weeks                 | Minutes
Infrastructure | Self-built            | Managed service
Memory         | Manual implementation | Memory Store

1.2 Three Core Pain Points

Pain Point 1: Cross-session Memory Decay

Any long-running agent eventually runs into context bloat and context window limits.

Pain Point 2: Unstable Output Quality

LLM agents have a natural bias: they tend to believe they are done. In-session self-assessment is essentially "letting the defendant be the judge."

Pain Point 3: Complex Tasks Exceed Single Agent Capability

The limits of a single agent become obvious when a task spans multiple domains or requires parallel processing.

II. Dreaming: Offline Self-Improvement Mechanism

2.1 Core Principles

Dreaming is inspired by human REM sleep. During the day, the brain takes in raw information and stores it as short-term memory; during REM sleep at night, it replays the day's experiences, strengthens valuable connections, discards useless information, and consolidates the rest into long-term memory.

Dreaming surfaces patterns that a single agent can't see on its own, including recurring mistakes, workflows that agents converge on, and preferences shared across a team.

2.2 Workflow

  1. Trigger: on a schedule or when a session-count threshold is reached
  2. Read: the existing Memory Store plus up to 100 past sessions
  3. Execute:
    • Merge duplicates
    • Replace stale entries
    • Discover hidden patterns
  4. Output: a new Memory Store (the original is left unchanged)
  5. Review: developer review or auto-apply
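
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's implementation: the `MemoryEntry` record, its `topic`, `text`, and `updated_at` fields, and the `consolidate` function are all hypothetical names, and the real Dreaming process presumably relies on a model to find patterns rather than simple key matching. The sketch covers only the merge, replace, and output steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    """Hypothetical memory record; field names are assumptions for illustration."""
    topic: str        # what the memory is about
    text: str         # the remembered content
    updated_at: int   # logical timestamp; larger means newer


def consolidate(store: list[MemoryEntry]) -> list[MemoryEntry]:
    """Return a new store with one entry per topic, keeping the newest text."""
    newest: dict[str, MemoryEntry] = {}
    for entry in store:
        current = newest.get(entry.topic)
        # Exact duplicates collapse, and an older entry is replaced by a newer one.
        if current is None or entry.updated_at > current.updated_at:
            newest[entry.topic] = entry
    # Build a fresh list so the input store is never modified.
    return sorted(newest.values(), key=lambda e: e.topic)


original = [
    MemoryEntry("deploy", "Use script v1", 1),
    MemoryEntry("deploy", "Use script v2", 5),
    MemoryEntry("style", "Prefer short replies", 2),
    MemoryEntry("style", "Prefer short replies", 2),
]
dreamed = consolidate(original)
print(len(original), len(dreamed))
```

Returning a new list instead of editing in place mirrors the "original unchanged" rule in step 4: the caller still holds the untouched input and can compare the two before deciding what to keep.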

2.3 Technical Implementation

import time

from anthropic import Anthropic

client = Anthropic()

# Trigger a Dreaming run
dream = client.beta.dreams.create(
    model="claude-sonnet-4-6",  # or claude-opus-4-7
    memory_store="ms_abc123",
    instructions="Focus on patterns related to tool calls",
)

# Poll until the run finishes, pausing between requests instead of busy-looping
while dream.status in ("pending", "running"):
    time.sleep(5)
    dream = client.beta.dreams.retrieve(dream.id)

# Read the ID of the new Memory Store produced by the run
new_store_id = dream.outputs[0].id

2.4 Key Design Principles

1. Original Data Never Modified

Dreaming never modifies the input memory store. It generates a new Memory Store instance; developers can preview the differences between old and new and decide whether to apply or discard it.
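
One way to picture the preview step is a plain dictionary diff between the old and new stores. The `diff_stores` helper and the key/value shape of a store are assumptions made for illustration; the source does not describe how the platform computes or renders its diff.

```python
def diff_stores(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify keys as added, removed, or changed between two stores."""
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
    }


# Hypothetical store contents, keyed by topic.
old_store = {"deploy": "Use script v1", "style": "Prefer short replies"}
new_store = {"deploy": "Use script v2", "tools": "Retry failed tool calls once"}

preview = diff_stores(old_store, new_store)
print(preview)
```

Because both stores exist side by side, discarding is simply a matter of continuing to use `old_store`.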

2. Real-time Monitoring

Once a Dream task enters the running state, developers can subscribe to its event stream and see in real time which memory is being read and which new entries are being written. If something looks wrong, they can "wake it up" (cancel) at any time.
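
The watch-and-cancel pattern can be illustrated with a simulated stream. Everything here is hypothetical: the event types, the event fields, and the `watch` helper are invented for the sketch, and a generator stands in for whatever streaming interface the platform actually exposes.

```python
from typing import Iterator


def fake_dream_events() -> Iterator[dict]:
    """Stand-in for a live event stream; event types and fields are invented."""
    yield {"type": "read", "entry": "deploy"}
    yield {"type": "write", "entry": "tools", "text": "Retry failed tool calls once"}
    yield {"type": "write", "entry": "secrets", "text": "Store credentials in memory"}
    yield {"type": "write", "entry": "style", "text": "Prefer short replies"}


def watch(events: Iterator[dict], blocked: set[str]) -> tuple[list[str], bool]:
    """Consume events in order; stop ("wake up") on a write to a blocked entry."""
    seen: list[str] = []
    for event in events:
        if event["type"] == "write" and event["entry"] in blocked:
            return seen, True   # cancelled before the run finished
        seen.append(f'{event["type"]}:{event["entry"]}')
    return seen, False          # stream ended normally


log, cancelled = watch(fake_dream_events(), blocked={"secrets"})
print(log, cancelled)
```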

"You're leaning over the AI's bedside, watching it dream."

2.5 Real Case: Moon Landing Mission

Initial: 6 candidate landing sites; 2 crashed in the first run

Trigger Dreaming: Select the Opus 4.7 model and click "Start dreaming"

Result:

Day 2 Verification: Both previously failed sites were fixed, and the overall safety score improved from 67% to 100%
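
The reported scores are consistent with a simple reading in which the safety score is the share of sites that land safely. That definition is an assumption; the source does not say how the score is computed. Under it, the arithmetic works out as follows:

```python
def safety_score(total_sites: int, crashed: int) -> int:
    """Percentage of sites landed safely, rounded to the nearest whole percent."""
    return round(100 * (total_sites - crashed) / total_sites)


first_run = safety_score(total_sites=6, crashed=2)   # 4 of 6 sites safe
second_run = safety_score(total_sites=6, crashed=0)  # all 6 sites safe
print(first_run, second_run)
```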

III. Outcomes: Independent Evaluator Mechanism

3.1 Core Problem: Self-Evaluation Bias

LLM agents have a natural bias: they tend to believe they are done. Traditional evaluation is limited because the agent assesses its work within its own context, constrained by its own reasoning.

In essence, this lets the defendant be the judge.

3.2 Solution: Independent Evaluator

The Outcomes solution: assign an independent Evaluator Agent to act as the judge.

  1. The developer defines a scoring rubric
  2. The system assigns an independent evaluator that works in its own context
  3. The evaluator scores item by item and returns failed items for revision
  4. Iteration continues until the output passes or the maximum number of iterations is reached (default 3, maximum 20)
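
The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the platform's code: `run_with_outcomes`, `worker`, and `evaluator` are hypothetical names, and both agents are modeled as plain functions. Only the iteration limits (default 3, maximum 20) come from the text.

```python
from typing import Callable

DEFAULT_MAX_ITERATIONS = 3   # default stated in the text
HARD_MAX_ITERATIONS = 20     # upper bound stated in the text


def run_with_outcomes(
    worker: Callable[[list[str]], str],
    evaluator: Callable[[str], list[str]],
    max_iterations: int = DEFAULT_MAX_ITERATIONS,
) -> tuple[str, bool, int]:
    """Revise until the evaluator reports no failed rubric items or the cap is hit."""
    if not 1 <= max_iterations <= HARD_MAX_ITERATIONS:
        raise ValueError("max_iterations must be between 1 and 20")
    feedback: list[str] = []
    draft = ""
    for attempt in range(1, max_iterations + 1):
        draft = worker(feedback)        # worker receives the failed items, if any
        feedback = evaluator(draft)     # evaluator sees only the draft
        if not feedback:
            return draft, True, attempt
    return draft, False, max_iterations


# Toy stand-ins: the worker adds a price column only after being told to.
def toy_worker(feedback: list[str]) -> str:
    return "name,price" if feedback else "name"


def toy_evaluator(draft: str) -> list[str]:
    return [] if "price" in draft.split(",") else ["CSV must contain a price column"]


result = run_with_outcomes(toy_worker, toy_evaluator)
print(result)
```

The evaluator function receives only the draft, never the worker's reasoning, which is the point of the design: the judge is not influenced by how the defendant argued its case.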

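3.3 Rubric Example

# Document Generation Rubric

## Functional Completeness
- [ ] CSV file contains a price column
- [ ] price column values are numeric
- [ ] At least 100 data rows

## Format
- [ ] File encoding is UTF-8
- [ ] Column header names are correct
- [ ] No blank lines

## Quality
- [ ] No duplicate rows
- [ ] No missing values
- [ ] Values fall within a reasonable range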
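
Several of these rubric items are mechanical enough to check in code, which helps show what "item by item" scoring means. The sketch below uses Python's standard `csv` module; the `check_rubric` function, the `is_number` helper, and the `min_rows` parameter are illustrative names. In the product as described, the evaluator is an agent reading the rubric, not a script like this.

```python
import csv
import io
from typing import Optional


def is_number(value: Optional[str]) -> bool:
    """True when the cell holds text that parses as a float."""
    try:
        float(value)  # raises on None, "" or non-numeric text
        return True
    except (TypeError, ValueError):
        return False


def check_rubric(csv_text: str, min_rows: int = 100) -> dict[str, bool]:
    """Evaluate a subset of the rubric and report pass/fail per item."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    # Stringify cells so every row can be hashed for duplicate detection.
    row_keys = [tuple(str(v) for v in row.values()) for row in rows]
    return {
        "has_price_column": "price" in fields,
        "price_is_numeric": bool(rows) and all(is_number(r.get("price")) for r in rows),
        "enough_rows": len(rows) >= min_rows,
        "no_duplicates": len(set(row_keys)) == len(row_keys),
        "no_missing_values": all(v not in (None, "") for r in rows for v in r.values()),
    }


# Small sample with one duplicate row and one missing price.
sample = "name,price\napple,1.50\npear,2.00\npear,2.00\nplum,\n"
report = check_rubric(sample, min_rows=3)
print(report)
```

Each failed key maps to one rubric line that would be returned to the agent for revision.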

3.4 Impact Data

Metric                            | Improvement
Overall task success rate         | +10%
docx document generation accuracy | +8.4%
pptx slide generation accuracy    | +10.1%

The harder the problem, the greater the improvement: complex tasks are more prone to omissions and errors, so the evaluator has more to catch.

IV. Multi-Agent Orchestration: Multi-Agent Collaboration

4.1 Architecture Design

Multi-Agent orchestration uses a lead-sub architecture:

Lead Agent
├── Task decomposition
├── Sub-agent dispatch
├── Context integration
└── Final delivery

Sub-Agents
├── Specialist A (domain A)
├── Specialist B (domain B)
└── ...
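
The four lead-agent responsibilities in the diagram can be sketched with Python's standard thread pool. This is a toy model under stated assumptions: the specialists are plain functions with made-up names, whereas real sub-agents would be separate model-backed agents, and `lead_agent` is a hypothetical helper, not a platform API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# Hypothetical specialists; in practice each would be a separate agent.
def legal_specialist(task: str) -> str:
    return f"legal review of {task}"


def finance_specialist(task: str) -> str:
    return f"financial review of {task}"


def lead_agent(task: str, specialists: dict[str, Callable[[str], str]]) -> str:
    """Fan the task out to every specialist in parallel and merge the results."""
    # Task decomposition: in this toy version every specialist gets the same task text.
    with ThreadPoolExecutor(max_workers=len(specialists)) as pool:
        # Sub-agent dispatch: submit one job per specialist.
        futures = {name: pool.submit(fn, task) for name, fn in specialists.items()}
        # Context integration: wait for and collect every result.
        results = {name: future.result() for name, future in futures.items()}
    # Final delivery: one combined report in a stable, sorted order.
    return "\n".join(f"[{name}] {text}" for name, text in sorted(results.items()))


report = lead_agent(
    "the supplier contract",
    {"legal": legal_specialist, "finance": finance_specialist},
)
print(report)
```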

Each Sub-Agent:

4.2 Real Cases

Harvey Legal AI: 6x improvement in task completion rate

Netflix Log Analysis: Processes hundreds of build sources in parallel and surfaces only the patterns that recur across multiple applications

Spiral Writing Agent (by Every):

V. Combined Impact of the Three Features

5.1 Complementary Relationship

Feature     | Problem Solved             | Status
Dreaming    | Cross-session memory decay | Research Preview
Outcomes    | Unstable output quality    | Public Beta
Multi-Agent | Complex task handling      | Public Beta

5.2 Value Evolution

Phase   | Capability
Phase 1 | Runtime hosting
Phase 2 | Memory Store
Phase 3 | Three features: self-improvement, quality assurance, complex task handling

VI. Summary

Key Points

Anthropic has given AI its own REM sleep, letting agents organize memories, discover patterns, and improve themselves during "off" time. Combined with independent evaluators and multi-agent collaboration, this moves AI agents from "it runs" to "it's usable."

Implications for KanBao AI
