SkVM: A Deep Dive into the Skills Virtual Machine

📚 Learning Source

Type	Research Paper + Open Source Project
ArXiv	2604.03088
GitHub	SJTU-IPADS/SkVM
Authors	Le Chen, Erhu Feng, Yubin Xia, Haibo Chen
Institution	Shanghai Jiao Tong University, IPADS Lab

🎯 Key Takeaways

1. Deep Problem Analysis

The same skill performs very differently across models and execution frameworks. For 15% of tasks, performance drops after a skill is applied, and for 87% of tasks at least one model gains nothing from the skill.

2. Core Design Philosophy

SkVM treats skills as "code" and LLMs as "heterogeneous processors", and applies compiler techniques to the problems of skill portability and efficiency. The goal is "write once, run efficiently everywhere".

3. Systematic Solution

SkVM builds a complete compilation and runtime system from three parts: capability profiling (26 primitive capabilities), AOT compilation (capability adaptation, environment binding, concurrency extraction), and JIT optimization (code solidification, adaptive recompilation).

4. Significant Results

Task completion rate improves by 15.3% on average, token consumption drops by up to 40%, parallel execution reaches a 3.2× speedup, and code solidification cuts latency by 19-50×.

5. Strong Ecosystem Compatibility

SkVM supports mainstream frameworks such as OpenClaw, Hermes, OpenCode, and pi Agent, and ships with pre-built capability profiles.

📖 Main Content

1. Background and Problems

1.1 Rapid Development of the Skills Ecosystem

Since 2025, the AI Agent ecosystem has been growing explosively. Frameworks such as OpenClaw, Hermes, and Claude Code package domain knowledge into modular Skills, which has produced a thriving skills marketplace. The two major platforms, clawhub.ai and skills.sh, list more than 118,000 skill packages covering data analysis, financial analysis, office automation, software development, and almost every other common scenario.

1.2 The Skill "Acclimatization" Problem

The IPADS lab at Shanghai Jiao Tong University systematically analyzed 118,000 skills and found a troubling fact: the same skill performs vastly differently depending on the model or framework it runs on, much like a traveler who fails to acclimatize to a new place.

15%	Tasks whose performance degrades
17%	Tasks whose score is unchanged
87%	Tasks where at least one model does not benefit

1.3 Three Major "Mismatch" Challenges

🔴 Model Mismatch

AI models differ enormously in capability. Skill packages are usually written as if the model follows instructions perfectly, so weaker models fail to understand or execute them correctly.

🟡 Harness Mismatch

Agents run inside an "Agent Harness". Frameworks differ in their tool-call interfaces, context management, and error handling, and these differences directly affect how well a skill executes.

🟠 Environment Mismatch

A skill package may declare that it "requires Python 3.9+" or "depends on numpy", yet the user's machine may not have these dependencies installed at all. The model then retries blindly and wastes tokens.

2. SkVM's Core Design Philosophy

2.1 Inspiration: Compiler Concepts

The research team's core insight is that, in the Agent era, Skills are "code" and different LLMs are "heterogeneous processors".

Traditional Computing	Agent Era
C source code	Skills (natural-language code)
Processors	LLM models
Compiler + VM	SkVM

2.2 SkVM Architecture

SkVM borrows three execution strategies from traditional compiler technology:

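Interpreted execution: the current mainstream approach, which passes the skill text directly to the model
Ahead-of-Time (AOT) compilation: compiles and optimizes the skill in advance, when it is installed
Just-in-Time (JIT) optimization: optimizes the skill dynamically while it runs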

AOT Compilation (at install time)

Capability-based compilation
Environment binding
Concurrency extraction

→

JIT Optimization (at runtime)

Code solidification
Adaptive recompilation

3. Three Steps of AOT Compilation

3.1 Step 1: Capability-Based Compilation

The researchers distilled 26 primitive capabilities across four categories and defined several proficiency levels for each:

Category	Example capability	Proficiency levels
Code Generation	gen.code.shell	L1: basic commands / L2: pipes and redirection / L3: complex scripts
Tool Usage	tool.api.call	L1-L5, varying complexity
Reasoning	reason.planning	L1-L3, varying depth
Instruction Following	follow.json.format	L1-L3, varying strictness
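To make the idea of capability-based compilation concrete, the sketch below compares the levels a skill needs against a model's profile and reports the gaps a compiler would have to bridge. It is an illustration under stated assumptions, not SkVM's implementation: the `Requirement` class, the `find_gaps` function, and all level values are hypothetical, and only the capability IDs come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One capability a skill relies on, with the minimum level it needs."""
    capability: str
    level: int


def find_gaps(requirements, profile):
    """Return (capability, needed, supported) for every unmet requirement.

    `profile` maps a capability ID to the highest level the model handles
    reliably. A capability missing from the profile counts as level 0.
    """
    gaps = []
    for req in requirements:
        supported = profile.get(req.capability, 0)
        if supported < req.level:
            gaps.append((req.capability, req.level, supported))
    return gaps


# Hypothetical skill requirements and model profile (illustrative values).
skill_requirements = [
    Requirement("gen.code.shell", 3),
    Requirement("follow.json.format", 2),
    Requirement("reason.planning", 1),
]
weak_model = {"gen.code.shell": 1, "follow.json.format": 2, "reason.planning": 2}

for capability, needed, supported in find_gaps(skill_requirements, weak_model):
    print(f"{capability}: skill needs L{needed}, model supports L{supported}")
```

In this sketch only `gen.code.shell` is flagged, which is the kind of signal a compiler could use to rewrite a complex script step into simpler commands for a weaker model.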

3.2 Step 2: Environment Binding

The compiler extracts every dependency from the skill and generates install and check scripts. The environment can then be set up in one step before the skill runs, so the model does not retry blindly.

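# Environment binding example: install pandas only if it cannot be imported
if ! python -c "import pandas" 2>/dev/null; then
    # Use "python -m pip" so the package goes to the same interpreter that was checked
    python -m pip install pandas
fi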
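The same check can be generalized to a whole dependency list. The following is a minimal sketch, assuming a skill declares top-level Python package names and command-line tool names; the `check_environment` function and the placeholder dependency names are hypothetical and are not part of SkVM.

```python
import importlib.util
import shutil


def check_environment(python_packages, cli_tools):
    """Return the declared dependencies that are missing on this machine.

    Package names must be top-level module names; `find_spec` returns None
    when such a module cannot be found without importing it.
    """
    missing_packages = [
        name for name in python_packages
        if importlib.util.find_spec(name) is None
    ]
    # shutil.which returns None when the executable is not on PATH.
    missing_tools = [name for name in cli_tools if shutil.which(name) is None]
    return {"packages": missing_packages, "tools": missing_tools}


# Placeholder names chosen so that the result is predictable.
report = check_environment(
    python_packages=["json", "surely_not_installed_pkg_xyz"],
    cli_tools=["surely-not-a-real-tool-xyz"],
)
print(report)
```

Reporting everything that is missing in one pass, before the skill runs, is what avoids the retry loop described above.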

3.3 Step 3: Concurrency Extraction

AOT compilation uncovers parallelism at several granularities in a skill's execution:

DLP (Data-Level Parallelism)

One instruction applied to multiple data items (e.g., batch-processing several files)

ILP (Instruction-Level Parallelism)

Independent instructions issued in parallel (e.g., independent API calls)

TLP (Thread-Level Parallelism)

Multiple independent sub-agents, each completing a different subtask
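As a rough illustration of the instruction-level case, the sketch below groups the steps of a skill into "waves" whose members have no dependencies on each other, then runs each wave in parallel. It assumes the dependencies between steps are already known; the step names and the `plan_waves` and `run_waves` helpers are hypothetical, and this is not how SkVM itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def plan_waves(dependencies):
    """Group steps into waves; steps in one wave do not depend on each other.

    `dependencies` maps each step to the set of steps it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        waves.append(ready)
        sorter.done(*ready)
    return waves


def run_waves(waves, run_step):
    """Run each wave's steps concurrently, waiting between waves."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for wave in waves:
            for step, result in zip(wave, pool.map(run_step, wave)):
                results[step] = result
    return results


# Hypothetical skill: the two fetches are independent, the rest are sequential.
dependencies = {
    "fetch_prices": set(),
    "fetch_news": set(),
    "merge": {"fetch_prices", "fetch_news"},
    "report": {"merge"},
}
waves = plan_waves(dependencies)
print(waves)
results = run_waves(waves, str.upper)  # str.upper stands in for real work
```

Here the two fetch steps land in the first wave and can run at the same time, while `merge` and `report` each wait for the wave before them.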

4. JIT Runtime Optimization

4.1 Code Solidification

Scripts defined in a skill are often code templates with variable parameters. Without optimization, the LLM regenerates them on every run, which wastes tokens. Code solidification works in two stages:

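AOT stage: generate the code fingerprint, the template, and the parameter list
Runtime stage: on a fingerprint match, execute the solidified code directly and skip LLM regeneration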
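The two stages can be sketched as a small template cache. This is only an illustration: the source does not say how SkVM computes fingerprints, so hashing the task kind together with the parameter names is an assumption, and `SolidifiedCache` and the `csv_head` task are hypothetical names.

```python
import hashlib
from string import Template


class SolidifiedCache:
    """Stores code templates by fingerprint so they can be reused without an LLM call."""

    def __init__(self):
        self._templates = {}

    @staticmethod
    def fingerprint(task_kind, param_names):
        # Assumed scheme: hash the task kind plus the sorted parameter names.
        key = task_kind + "|" + ",".join(sorted(param_names))
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def register(self, task_kind, param_names, template_source):
        """AOT stage: record the template under its fingerprint."""
        fp = self.fingerprint(task_kind, param_names)
        self._templates[fp] = Template(template_source)

    def render(self, task_kind, params):
        """Runtime stage: fill in the template on a match, else return None.

        None signals the caller to fall back to LLM generation.
        """
        template = self._templates.get(self.fingerprint(task_kind, params.keys()))
        if template is None:
            return None
        return template.substitute(params)


cache = SolidifiedCache()
cache.register("csv_head", ["path", "rows"], "head -n $rows $path")

print(cache.render("csv_head", {"path": "data.csv", "rows": 5}))
print(cache.render("csv_tail", {"path": "data.csv"}))
```

The first call is a cache hit and returns the filled-in command without any model call. The second has no matching fingerprint and returns `None`, which is where a real system would hand the task back to the LLM.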

4.2 Adaptive Recompilation

When errors or retries occur at runtime, the system collects the error logs and feeds them back to the compiler, which automatically re-optimizes the skill so that the same class of error does not recur.
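One simple way to picture this feedback loop is a counter that flags a skill for recompilation once the same kind of error has repeated. The threshold policy, the `RecompileTrigger` class, and the skill name below are assumptions made for illustration; the source does not describe SkVM's actual trigger condition.

```python
from collections import Counter


class RecompileTrigger:
    """Counts failures per (skill, error kind) and keeps their logs as compiler feedback."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # hypothetical policy: recompile after N repeats
        self.failures = Counter()
        self.logs = {}

    def record_failure(self, skill, error_kind, message):
        """Record one failure; return True once the skill should be recompiled."""
        key = (skill, error_kind)
        self.failures[key] += 1
        self.logs.setdefault(key, []).append(message)
        return self.failures[key] >= self.threshold

    def feedback_for(self, skill, error_kind):
        """Return the collected error messages to hand to the compiler."""
        return list(self.logs.get((skill, error_kind), []))


trigger = RecompileTrigger(threshold=2)
for attempt in range(2):
    should_recompile = trigger.record_failure(
        "report-skill",
        "ModuleNotFoundError",
        f"attempt {attempt}: No module named 'pandas'",
    )

if should_recompile:
    print(trigger.feedback_for("report-skill", "ModuleNotFoundError"))
```

Grouping by error kind rather than by exact message is a design choice in this sketch: it lets repeated failures with slightly different text still count toward the same recompilation decision.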

5. Experimental Results

+15.3%	Task completion rate (average improvement)
-40%	Token consumption (maximum reduction)
3.2×	Parallel speedup
19-50×	Latency reduction from code solidification

Supported Frameworks

OpenClaw, Hermes, OpenCode, pi Agent, BareAgent

6. SkVM Installation and Usage Guide

6.1 Install SkVM

# One-line install with curl (macOS / Linux)
curl -fsSL https://skillvm.ai/install.sh | sh

# Or install via npm
npm i -g @ipads-skvm/skvm

6.2 Quick Start

# 1. Configure SkVM
skvm config init

# 2. Profile the model's capabilities (about 20 minutes)
skvm profile \
  --adapter=bare-agent \
  --model=anthropic/claude-sonnet-4.6

# 3. Compile the skill
skvm aot-compile \
  --skill=path/to/skill-dir \
  --model=anthropic/claude-sonnet-4.6 \
  --adapter=bare-agent \
  --pass=1

# 4. Run JIT optimization
skvm jit-optimize \
  --skill=path/to/skill-dir \
  --task-source=synthetic

7. Implications for Knowledge Base Development

📋 Skill Metadata Standardization

State the model capability levels the skill requires
Declare all dependencies (Python packages, tools, system services)
Provide hints about which steps can run concurrently
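The three recommendations above could be captured in a small metadata record. The sketch below is one possible shape, not a format defined by SkVM: the `SkillMetadata` class, every field name, and the example skill are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SkillMetadata:
    """Hypothetical metadata record covering capabilities, dependencies, and concurrency."""
    name: str
    required_capabilities: dict  # capability ID -> minimum level (1 or higher)
    python_packages: list = field(default_factory=list)
    cli_tools: list = field(default_factory=list)
    system_services: list = field(default_factory=list)
    parallel_groups: list = field(default_factory=list)  # lists of steps safe to run together

    def validate(self):
        """Return a list of human-readable problems; empty means the record looks sane."""
        problems = []
        if not self.required_capabilities:
            problems.append("no capability levels declared")
        for capability, level in self.required_capabilities.items():
            if not isinstance(level, int) or level < 1:
                problems.append(f"invalid level for {capability}: {level!r}")
        return problems


meta = SkillMetadata(
    name="csv-report",
    required_capabilities={"gen.code.shell": 2, "follow.json.format": 0},
    python_packages=["pandas"],
    parallel_groups=[["load_sales", "load_costs"]],
)
print(meta.validate())
```

Validating the record when the skill is published, rather than when it runs, catches mistakes such as the level 0 entry above before any model sees the skill.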

🧪 Skill Testing System

Cross-model testing (at least 2-3 mainstream models)
Cross-framework testing (OpenClaw, Hermes, etc.)
Environment dependency testing
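Cross-model and cross-framework testing amounts to running the same skill test over every (model, framework) pair. A minimal sketch follows, assuming some `run_case` callable that executes the skill and reports success; the `fake_run_case` stand-in and the model names are hypothetical, while the framework names come from the list above.

```python
from itertools import product


def run_matrix(models, harnesses, run_case):
    """Run one skill test per (model, harness) pair and record pass or fail."""
    return {
        (model, harness): bool(run_case(model, harness))
        for model, harness in product(models, harnesses)
    }


def failing_pairs(results):
    """Return the (model, harness) pairs whose test failed, in sorted order."""
    return sorted(pair for pair, passed in results.items() if not passed)


def fake_run_case(model, harness):
    # Stand-in for a real skill run: pretend one combination fails.
    return not (model == "model-b" and harness == "Hermes")


results = run_matrix(["model-a", "model-b"], ["OpenClaw", "Hermes"], fake_run_case)
print(failing_pairs(results))
```

Reporting failures per pair, rather than a single overall pass rate, shows whether a skill breaks on a particular model, a particular framework, or only on their combination.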

📝 Skill Writing Standards

Avoid complex relative paths
Avoid overly complex shell pipelines
Provide fallback options

8. Summary and Outlook

SkVM points to an important direction for Agent skill systems: moving from skills as "natural-language code" toward a skill file system that can be compiled and optimized.

Core Value

Improved stability: capability adaptation and environment binding address the instability of skills across environments
Higher efficiency: concurrency extraction and code solidification substantially reduce token consumption and latency
A healthier ecosystem: unified capability profiles and a common compilation interface make skills easier to share across platforms

Future Directions

Richer primitive capabilities (extending the current 26)
Smarter compilation strategies (incorporating machine learning)
Stronger security mechanisms
Broader ecosystem support

💭 Reflections and Practice

Implications for Agent Development

SkVM's success suggests that the future of Agent systems lies in engineering rather than in "magic prompts".

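Abandon the "universal prompt" illusion: accept that models and frameworks differ in capability
Embrace compiler thinking: write skills the way you write code
Focus on efficiency: token consumption and latency are key metrics in production
Build a testing system: a skill is not something you write once and forget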

Personal Reflection

What strikes me most about SkVM is how it takes a seemingly chaotic problem (skills behaving inconsistently across environments) and, through systematic analysis (the three mismatches) and a classic design pattern (the compiler), turns it into a problem that engineering can solve.

This reminds me of Lei Jun's "seven-character mantra" (seven characters in the original Chinese): focus, perfection, word of mouth, speed.

Focus: SkVM concentrates on one problem, the portability and efficiency of skills
Perfection: 26 primitive capabilities plus two-layer AOT + JIT optimization push the design as far as it can go
Word of mouth: evaluated on 8 models and 3 frameworks, letting the data speak
Speed: 19-50× lower latency and a 3.2× speedup, which is genuinely fast

This is the power of engineering thinking: instead of complaining about a problem, solve it systematically.
