1. What Is Affective Computing?
Affective computing, a term introduced by Professor Rosalind Picard of the MIT Media Lab in a 1995 technical report and developed in her 1997 book of the same name, is the study and development of intelligent systems that can recognize, understand, and process human emotions. The field spans computer science, psychology, cognitive science, and neuroscience, and aims to bridge the gap between human emotional experience and machine interaction. Its core goal is to give computers the ability to perceive, recognize, and understand human emotional states so that they can respond more naturally and empathetically.
Driven by rapid advances in deep learning, affective computing has made substantial progress in recent years, moving from early rule-based facial expression recognition to systems that fuse multimodal biological signals. A 2024 review by a Technical University of Munich team in the journal Intelligent Computing argues that deep learning has significantly improved emotion recognition through transfer learning, self-supervised learning, and transformer architectures, but that over-reliance on it may hinder progress by overlooking other emerging trends in artificial intelligence. The authors advocate a more diverse set of AI methodologies to address the field's persistent challenges.
2. Core Technical Architecture of Affective Computing
2.1 Multimodal Affective Perception Systems
Human emotional expression is inherently multimodal: facial expressions, tone of voice, body language, and textual content together form the full picture. Multimodal affective perception systems integrate information from these channels to estimate emotional state. In 2025, a Tsinghua University team released a millimeter-wave radar affective computing system that senses breathing, body movement, and micro-expressions without contact from 3 meters away, reporting 89.6% sensitivity in detecting emotional suppression among hospitalized elderly patients. Tencent's "Invisible Caretaker" combines 4K infrared imaging, far-field microphone arrays, and environmental sound spectrum analysis to time-align three modalities (speech, micro-expressions, and body movement) at the millisecond level, with a latency of only 12 milliseconds.
For speech and prosody modeling, iFLYTEK's 2024 iFLYTEK-EmoVoice model reached 87.3% accuracy on a Chinese depression speech recognition task by jointly analyzing speech rate, fundamental frequency, and pause duration, a reported 21.5% improvement over single-modality approaches. The ST-MEAN network proposed by Shanghai Jiao Tong University in 2025 models the dynamics of AU12 (lip corner puller) and recognizes positive emotions lasting longer than 0.3 seconds with 91.8% accuracy and a false positive rate of only 4.2%. These results suggest that multimodal fusion can substantially improve the accuracy and reliability of emotion recognition.
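One simple way to combine modalities is decision-level (late) fusion, where each modality produces its own class probabilities and the system takes a weighted average. The sketch below illustrates that idea only; the modality names, weights, and scores are made-up values, and none of the systems named above is known to work this way.

```python
from typing import Dict

Probs = Dict[str, float]


def late_fusion(per_modality: Dict[str, Probs], weights: Dict[str, float]) -> Probs:
    """Weighted average of per-modality class probabilities, renormalized to sum to 1."""
    labels = sorted({label for probs in per_modality.values() for label in probs})
    total_weight = sum(weights[m] for m in per_modality)
    fused = {
        label: sum(weights[m] * per_modality[m].get(label, 0.0) for m in per_modality)
        / total_weight
        for label in labels
    }
    norm = sum(fused.values())
    return {label: p / norm for label, p in fused.items()}


def predict(fused: Probs) -> str:
    """Return the label with the highest fused probability."""
    return max(fused, key=fused.get)


# Hypothetical per-modality outputs for a single time window.
example_scores = {
    "speech": {"neutral": 0.5, "sad": 0.4, "happy": 0.1},
    "face": {"neutral": 0.3, "sad": 0.6, "happy": 0.1},
    "motion": {"neutral": 0.4, "sad": 0.5, "happy": 0.1},
}
# Hypothetical trust weights per modality.
example_weights = {"speech": 0.4, "face": 0.4, "motion": 0.2}

fused = late_fusion(example_scores, example_weights)
print(predict(fused), fused)
```

In this toy input the speech channel alone would say "neutral", while the fused result favors "sad" because the other two channels agree with each other.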
2.2 Two Paradigms of Emotion Quantification
Emotion quantification models fall into two paradigms: discrete and continuous. Discrete models draw on psychological theory and assign emotions to categorical labels. The psychologist Paul Ekman proposed that all cultures share a set of universal emotional states, namely six basic emotions: anger, disgust, fear, happiness, sadness, and surprise. Human emotion is far richer than six categories, however, and restricting the emotion space to basic categories inevitably loses subtle distinctions.
Dimensional representations instead model an emotion as a point in a continuous multidimensional space, which allows fine-grained distinctions. For example, the PAD model of Mehrabian and Russell uses three dimensions: Pleasure (often called valence), which describes how positive the emotion is; Arousal, which describes the level of activation; and Dominance, which describes how much control the person feels. This continuous representation captures subtler variation, but it is more abstract than discrete labels and less intuitive to people, which has limited its use in downstream tasks. In 2024, Colombetti and colleagues discussed the relationship between valence and arousal, and Smith and colleagues reviewed the physiological origins of arousal and questioned its validity as a quantification dimension.
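To make the link between the two paradigms concrete, a dimensional estimate can be mapped back to a discrete label by finding the nearest labelled anchor in PAD space. The following is a minimal sketch under that assumption; the anchor coordinates are illustrative placeholders chosen for this example, not published PAD values.

```python
import math
from typing import Dict, Tuple

PAD = Tuple[float, float, float]  # (pleasure, arousal, dominance), each in [-1, 1]

# Illustrative anchor points only; these coordinates are assumptions.
ILLUSTRATIVE_ANCHORS: Dict[str, PAD] = {
    "happiness": (0.8, 0.5, 0.4),
    "anger": (-0.6, 0.7, 0.4),
    "fear": (-0.7, 0.7, -0.6),
    "sadness": (-0.7, -0.4, -0.4),
    "calm": (0.5, -0.6, 0.2),
}


def nearest_label(point: PAD, anchors: Dict[str, PAD] = ILLUSTRATIVE_ANCHORS) -> str:
    """Return the anchor label with the smallest Euclidean distance to the point."""
    return min(anchors, key=lambda name: math.dist(point, anchors[name]))


# Same pleasure and arousal, different dominance: the third axis separates the two.
print(nearest_label((-0.65, 0.7, 0.3)))
print(nearest_label((-0.65, 0.7, -0.5)))
```

The two example points differ only in dominance, which is exactly the case where a two-dimensional valence and arousal plane would not be able to tell the labels apart.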
2.3 Deep Learning Fusion and Next-Generation Neural Networks
Deep learning provides powerful feature extraction and pattern recognition for affective computing. Conventional approaches include hybrid CNN-RNN architectures. The 2024 Emohaa model achieved a mean absolute error (MAE) of 1.03 on a joint PHQ-9 and GAD-7 prediction task while covering seven emotion categories, including complex ones such as shame and hope. Deep learning nevertheless faces challenges: poor generalization, limited cultural adaptability, and insufficient interpretability.
Next-generation neural networks go beyond conventional deep learning models to address limitations in capturing complex data structures, spatial relationships, and energy efficiency. Capsule networks extend convolutional networks by preserving spatial hierarchies, which improves the modeling of complex entities such as human body parts and matters in healthcare and emotion recognition. Geometric deep learning extends deep learning to non-Euclidean structures, which helps model complex data interactions and is particularly useful in sentiment analysis and facial analysis. Spiking neural networks mimic the threshold-based firing of biological neurons and offer a more energy-efficient alternative for real-time applications in resource-constrained environments.
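The threshold-based firing mentioned above can be illustrated with a leaky integrate-and-fire neuron, one of the simplest spiking models. This is a sketch with assumed parameter values (threshold, leak factor, reset level), not the neuron model of any particular framework.

```python
from typing import List


def simulate_lif(
    input_current: List[float],
    threshold: float = 1.0,
    leak: float = 0.9,
    reset: float = 0.0,
) -> List[int]:
    """Simulate a discrete-time leaky integrate-and-fire neuron.

    At each step the membrane potential decays by `leak`, the input is added,
    and the neuron emits a spike (1) and resets once the threshold is reached.
    """
    potential = reset
    spikes = []
    for current in input_current:
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = reset
        else:
            spikes.append(0)
    return spikes


# A weak constant input has to accumulate over several steps before a spike occurs.
print(simulate_lif([0.5, 0.5, 0.5, 0.5, 0.5]))
```

Because the neuron stays silent unless enough input accumulates, computation and communication happen only at spikes, which is the source of the energy savings on suitable hardware.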
3. Application Scenarios of Affective Computing
3.1 Healthcare
Affective computing is playing a growing role in healthcare. For patients with depression or other mental health conditions, emotion recognition can help clinicians assess changes in emotional state more accurately and adjust treatment in time. In elderly care, affective robots can help caregivers better understand the needs and feelings of the people they look after and provide more humane care. In 2025, the "Bai Ze" robot was reported to build a per-user emotional baseline from 7 days of continuous monitoring and, on the third day after an elderly patient's admission, to flag abnormal emotional signals when the patient mentioned their spouse: a 57% drop in speech energy and a 22% increase in periocular muscle tension.
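The baseline idea can be sketched as follows: average each feature over a monitoring window, then flag readings whose relative change exceeds a per-feature limit. The feature names, sample values, and limits here are assumptions made for illustration; the source does not describe how the robot actually computes its baseline.

```python
from statistics import mean
from typing import Dict, List

Reading = Dict[str, float]


def build_baseline(daily_readings: List[Reading]) -> Reading:
    """Average each feature over the monitoring window (for example, 7 days)."""
    features = daily_readings[0].keys()
    return {f: mean(day[f] for day in daily_readings) for f in features}


def relative_change(baseline: Reading, reading: Reading) -> Reading:
    """Fractional change of each feature versus baseline (assumes nonzero baselines)."""
    return {f: (reading[f] - baseline[f]) / baseline[f] for f in baseline}


def flag_anomalies(changes: Reading, limits: Reading) -> List[str]:
    """Return features whose absolute relative change reaches the given limit."""
    return [f for f, change in changes.items() if abs(change) >= limits[f]]


# Hypothetical week of readings in arbitrary units.
week = [{"speech_energy": 1.0, "eye_tension": 0.5} for _ in range(7)]
baseline = build_baseline(week)

# A reading that mirrors the magnitudes quoted above: -57% energy, +22% tension.
today = {"speech_energy": 0.43, "eye_tension": 0.61}
changes = relative_change(baseline, today)
flags = flag_anomalies(changes, {"speech_energy": 0.30, "eye_tension": 0.15})
print(changes, flags)
```

Comparing against a personal baseline rather than a population norm is what lets the same absolute reading count as normal for one person and anomalous for another.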
In March 2025, the Cyberspace Administration of China issued the "Guidelines for Ethical Review of Affective Computing Services," which explicitly prohibit using affective computing to induce consumption or steer political leanings. On the technology side, Character.AI's emotional character engine lets users set an "empathy intensity parameter" and reached 88.4% emotional consistency (human evaluation) in simulated counseling dialogues. Together, regulation and technical progress of this kind support the sound development of affective computing in healthcare.
3.2 Intelligent Education
Online education platforms can monitor students' learning states in real time. When a student looks confused or frustrated, the system can automatically adjust the pace of teaching or offer extra help. Virtual teachers can likewise recognize students' emotional states and adapt teaching content and methods on the fly. For example, when a student shows confusion or fatigue, a virtual teacher can add interactive activities or schedule a break to restore interest. The effect is like giving every student an attentive teacher who is available around the clock, which makes education genuinely personalized.
3.3 Intelligent Customer Service and Human-Computer Interaction
In customer service, multimodal emotion recognition analyzes a user's tone of voice, facial expressions, and wording to track emotional changes in real time, either supporting a human agent's decisions or directly driving a dialogue system's response strategy. Traditional quality inspection relies on keywords and call duration, but a phrase like "Okay, I understand" may signal either resignation or relief. Emotion recognition turns emotion into a quantifiable quality inspection dimension. An analysis of complaint tickets at one insurance company showed that an intelligent customer service system with integrated emotion recognition raised customer satisfaction by more than 25% and cut average handling time by 30%.
3.4 Family Companion Robots
The singles economy and population aging are driving demand for emotional robots in home companionship. An elderly person living alone can confide in a robot, which can detect loneliness and respond, for example by playing old songs. In April 2026, Xinyi Technology released the "XINYI·Kangkang" home emotional companion robot, built on its in-house BSLA psychological model, with a complete affective computing pipeline of "recognition → response → explanation." The robot uses a very narrow 25 cm chassis so it can move through the whole home, and an edge computing and federated learning architecture keeps emotion recognition on the device and protects privacy end to end.
4. Challenges and Future Trends of Affective Computing
4.1 Major Current Challenges
Despite significant progress, affective computing still faces many challenges. The first is cross-cultural bias: a 2024 MIT survey found that mainstream affective computing models misrecognized the "restrained smile" common among East Asian populations 38.7% of the time, classifying it as neutral. Huawei's East-Emo dataset, released in 2025, reduced this error rate to 12.4%, which still leaves room for improvement. The second is the difficulty of separating subtle emotions: anger and disgust are 82% similar in the facial AU4+AU15 combination. A micro-motion compensation algorithm developed in 2024 by the Institute of Automation of the Chinese Academy of Sciences raised discrimination accuracy from 63.5% to 87.9%.
Privacy protection is another major challenge. In 2024, the micro-expression database of an elderly care platform was breached, exposing the facial geometry of 21,000 people. In 2025, the Ministry of Industry and Information Technology required affective computing devices to process data locally and prohibited uploading raw video to the cloud. The "Regulations on Artificial Intelligence of Shenzhen Special Economic Zone" stated for the first time that "emotional data is an extension of personality rights" and that users may withdraw authorization at any time. Algorithmic bias also widens health inequality: a 2024 Peking University study found that existing affective computing models were 29.6% less sensitive in detecting depression among rural elderly people than among urban ones.
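A sensitivity gap like the one described above can be audited by computing sensitivity (true positive rate) separately for each group. The sketch below uses fabricated toy records purely to show the calculation; the group names and counts are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (group, truly_depressed, predicted_depressed)
Record = Tuple[str, bool, bool]


def sensitivity_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Sensitivity per group: detected positives divided by all true positives."""
    true_positives: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, actual, predicted in records:
        if actual:
            positives[group] += 1
            if predicted:
                true_positives[group] += 1
    return {group: true_positives[group] / positives[group] for group in positives}


# Toy data: 10 true cases per group, 8 detected for urban and 5 for rural.
toy_records = (
    [("urban", True, True)] * 8
    + [("urban", True, False)] * 2
    + [("rural", True, True)] * 5
    + [("rural", True, False)] * 5
    + [("urban", False, False)] * 10  # true negatives do not affect sensitivity
)

by_group = sensitivity_by_group(toy_records)
gap = by_group["urban"] - by_group["rural"]
print(by_group, gap)
```

Reporting the metric per group, rather than one pooled number, is what makes this kind of disparity visible in the first place.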
4.2 Future Trends
Looking ahead, affective computing is likely to develop along several key directions. The first is neuro-symbolic hybrid modeling: in 2025, a Tsinghua University team released Neuro-Symbolic EmoLogic, which feeds CNN-extracted features into a logic rule engine, reaching 91.4% interpretability in depression diagnosis and winning the AAAI 2025 Best Applications Award. This approach combines the pattern recognition of deep learning with the symbolic reasoning of classical AI, which improves both interpretability and robustness.
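The general pattern of feeding learned features into explicit rules might look like the sketch below. It is not the EmoLogic system: the feature names, thresholds, and rules are hypothetical, a plain dictionary stands in for the CNN output, and nothing here is clinically meaningful.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]  # stand-in for scores produced by a neural feature extractor


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Features], bool]
    explanation: str


# Hypothetical rules with invented thresholds, for illustration only.
ILLUSTRATIVE_RULES: Sequence[Rule] = (
    Rule("flat_affect", lambda f: f["smile_intensity"] < 0.2,
         "very little smiling detected"),
    Rule("slow_speech", lambda f: f["speech_rate"] < 0.4,
         "speech rate well below the assumed norm"),
    Rule("long_pauses", lambda f: f["pause_ratio"] > 0.5,
         "pauses take up more than half of speaking time"),
)


def screen(
    features: Features,
    rules: Sequence[Rule] = ILLUSTRATIVE_RULES,
    min_fired: int = 2,
) -> Tuple[bool, List[str]]:
    """Flag a case when at least `min_fired` rules fire, and say which ones did."""
    fired = [rule for rule in rules if rule.condition(features)]
    reasons = [f"{rule.name}: {rule.explanation}" for rule in fired]
    return len(fired) >= min_fired, reasons


flagged, reasons = screen(
    {"smile_intensity": 0.1, "speech_rate": 0.3, "pause_ratio": 0.2}
)
print(flagged, reasons)
```

The returned list of fired rules is the interpretability benefit: every decision comes with the human-readable conditions that produced it.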
A second direction is personalization combined with federated learning: user data stays on the local device and only encrypted gradient updates are uploaded, which allows personalized emotion analysis while protecting privacy. A third is real-time analysis at the edge: Huawei's Ascend Atlas 500 runs a lightweight affective computing model that performs on-device micro-expression analysis in the "Xiaokang" robot at only 8.3 W, with 16 hours of battery life. A fourth is more human-sounding emotion synthesis: synthesized emotional speech currently scores only 68.3 out of 100 for naturalness on the "sadness" dimension, far below the 92.1 of human speech, but advances in generative AI such as diffusion models are gradually narrowing the gap.
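The federated learning idea can be sketched with federated averaging (FedAvg) on a toy linear model. This is a simplified illustration under stated assumptions: clients send updated weights rather than gradients, the encryption and secure aggregation mentioned above are omitted, and the model, data, and learning rate are made up.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair that never leaves the client


def local_update(
    weights: Sequence[float], data: Sequence[Sample], lr: float = 0.1
) -> List[float]:
    """One gradient-descent step on y = w * x + b with mean squared error, run on-device."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(
    client_weights: Sequence[Sequence[float]], client_sizes: Sequence[int]
) -> List[float]:
    """Average client weights, weighting each client by its number of samples."""
    total = sum(client_sizes)
    dims = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dims)
    ]


def run_round(
    global_weights: Sequence[float], client_datasets: Sequence[Sequence[Sample]]
) -> List[float]:
    """One round: each client trains locally, the server only sees the resulting weights."""
    updates = [local_update(global_weights, data) for data in client_datasets]
    sizes = [len(data) for data in client_datasets]
    return federated_average(updates, sizes)


clients = [[(1.0, 2.0)], [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]]
weights = [0.0, 0.0]
for _ in range(3):
    weights = run_round(weights, clients)
print(weights)
```

Only the two model parameters cross the network in each round; the raw samples stay in each client's list, which is the privacy property the approach relies on.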
5. Summary and Reflections
Affective computing, an important research direction in artificial intelligence, is moving from the laboratory into practical use. From healthcare to intelligent education, and from customer service to home companionship, it is changing how people and machines interact. That progress also brings new challenges in privacy protection and ethical governance. Going forward, more interdisciplinary collaboration is needed to bring insights from psychology and ethics into technology design, so that affective computing benefits society while respecting human dignity and privacy.
As multimodal large models and edge computing mature, emotional intelligence is expected to become an important capability of AI systems. Systems that understand human emotion will be valuable in education, human-computer interaction, and medical assistance. Research from a Tsinghua University team shows that combining discriminative and generative learning lets AI systems recognize human emotions more accurately and stably. This is both a technical advance and an important step toward more intelligent and more humane AI systems.
Key Points at a Glance
- Definition of Affective Computing: Introduced by Rosalind Picard (1995 technical report, 1997 book); aims to give computers the ability to perceive, recognize, and understand human emotional states
- Two Paradigms of Emotion Quantification: Discrete models (such as Ekman's six basic emotions) and dimensional models (such as the three-dimensional PAD model)
- Multimodal Fusion Strategies: Feature-level fusion, decision-level fusion, and model-level fusion, with model-level fusion a research trend as of 2024
- Core Technical Architecture: Multimodal perception layer, emotion feature extraction, deep learning analysis, personalized response generation
- Main Application Scenarios: Healthcare, intelligent education, intelligent customer service, family companion robots
- Future Directions: Neuro-symbolic hybrid modeling, personalized federated learning, real-time analysis at the edge, generative emotion synthesis