While state-of-the-art spoken dialog systems can understand the linguistic content of a user's input speech, much of the subtle emotional and paralinguistic information, such as the user's intention, attitude, and affective state, is largely neglected. Such information, termed deep information related to communicative intentions in this project, plays a very important role in spoken language communication among humans in daily social interactions. People express themselves not only through the audio channel (prosody and voice quality) but also through the visual channel (facial expressions, head movements, and even body gestures); deep information is therefore conveyed in both audio and visual modalities. This project aims to develop methods for the perception and generation of deep information related to communicative intentions in both audio and visual modalities, so as to provide more natural human-computer spoken dialog interaction. The project intends: 1) to systematically analyze the correlations between deep information and the semantic content of the current spoken dialog context, as well as the audio and visual expressions of both interacting speakers; 2) to propose a method for deep information perception (cognitive appraisal), such that communicative intentions can be recognized from the user's input by combining the current dialog context with audio and visual features; 3) to build a model for deep information prediction (response prediction), which predicts the communicative intentions of the system's response based on the understanding of the communicative intentions in the user's input; 4) to establish an algorithm for deep information expression (expression control), which generates appropriate audio and visual speech outputs according to the desired communicative intentions of the system's response; and 5) to propose a framework for deep information processing that integrates the above three components (cognitive appraisal, response prediction, and expression control) into a closed loop for human-computer spoken dialog interaction. The findings of this project are expected to enrich the understanding of the relationship between dialog context and audio-visual expressions in human-computer speech interaction, and to find applications in natural human-computer interaction, virtual reality, and intelligent spoken dialog agents.
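As a rough illustration of how the three components named above could fit together, the following Python sketch wires cognitive appraisal, response prediction, and expression control into one turn of a dialog loop. All class names, method signatures, and placeholder logic here are hypothetical illustrations of the described framework, not the project's actual implementation.

```python
# A minimal sketch of the proposed deep-information processing loop, assuming a
# turn-based dialog system. Every name below (DialogTurn, CognitiveAppraisal,
# ResponsePredictor, ExpressionController) is a hypothetical placeholder.
from dataclasses import dataclass, field


@dataclass
class DialogTurn:
    text: str                  # recognized transcript of the user's utterance
    audio_features: list       # e.g. prosody / voice-quality descriptors
    visual_features: list      # e.g. facial-expression / head-movement descriptors
    context: list = field(default_factory=list)   # preceding dialog turns


class CognitiveAppraisal:
    """Perceive deep information (communicative intention) in the user's input."""
    def perceive(self, turn: DialogTurn) -> str:
        # Placeholder: a real model would fuse dialog context, audio, and visual
        # features and map them to an intention label such as "emphasis" or "doubt".
        return "neutral"


class ResponsePredictor:
    """Predict the communicative intention the system's response should carry."""
    def predict(self, user_intention: str, context: list) -> str:
        # Placeholder: a real model would condition on the dialog context.
        return "acknowledge" if user_intention == "neutral" else "empathize"


class ExpressionController:
    """Render the response intention as audio-visual speech output."""
    def express(self, response_text: str, intention: str) -> dict:
        # Placeholder: drive speech synthesis and talking-head animation so that
        # the output realizes the desired intention.
        return {"speech": response_text, "intention": intention}


def dialog_step(turn: DialogTurn, response_text: str) -> dict:
    """One pass through appraisal -> prediction -> expression."""
    user_intention = CognitiveAppraisal().perceive(turn)
    response_intention = ResponsePredictor().predict(user_intention, turn.context)
    return ExpressionController().express(response_text, response_intention)
```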
Existing spoken dialog systems, when processing information, ignore the "deep information" that reflects dialog intentions, such as the focus of intention and emotional attitude, conveyed through audio and visual channels. Lacking the ability to perceive and express this information, their outputs are flat and dull and fall short of natural spoken dialog. This project plans to systematically analyze natural spoken dialogs between people; to study the relationship between deep information, the dialog situation, and audio-visual expression; to propose a cognitive appraisal algorithm for user input and build a deep-information perception model that fuses dialog context with audio and visual features; to propose a response prediction algorithm and build a deep-information response prediction model for the system; to propose an expression control algorithm for the system's output and realize audio-visual expressive generation of deep information; and, across the speech and visual channels, to construct a deep-information perception and expression method for natural spoken dialog (comprising cognitive appraisal, response prediction, and expression control), realizing a natural spoken dialog system capable of understanding and expressing dialog intentions. The results will deepen the understanding of the relationship between dialog context and audio-visual expression in spoken interaction, provide the necessary theoretical foundation for more effective audio-visual perception and generation in human-computer interaction, and accumulate the corresponding key technologies. The research has broad application prospects.
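The perception model described above fuses dialog context with audio and visual features. The sketch below shows one plausible late-fusion formulation in PyTorch; the feature dimensions, number of intention classes, and layer sizes are assumed values chosen only for illustration, not figures taken from the project.

```python
import torch
import torch.nn as nn


class DeepInfoPerception(nn.Module):
    """Late-fusion classifier: dialog-context, audio, and visual feature vectors
    are projected, concatenated, and mapped to communicative-intention classes.
    All dimensions below are hypothetical."""
    def __init__(self, ctx_dim=256, audio_dim=88, visual_dim=64, num_classes=6):
        super().__init__()
        self.ctx_proj = nn.Linear(ctx_dim, 128)
        self.audio_proj = nn.Linear(audio_dim, 128)
        self.visual_proj = nn.Linear(visual_dim, 128)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * 128, num_classes),
        )

    def forward(self, ctx, audio, visual):
        # Concatenate the three projected modality streams and classify.
        fused = torch.cat(
            [self.ctx_proj(ctx), self.audio_proj(audio), self.visual_proj(visual)],
            dim=-1,
        )
        return self.classifier(fused)   # unnormalized intention logits


# Example forward pass with a batch of 4 hypothetical feature vectors.
logits = DeepInfoPerception()(torch.randn(4, 256), torch.randn(4, 88), torch.randn(4, 64))
print(logits.shape)   # torch.Size([4, 6])
```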
Existing spoken dialog systems ignore the "deep information" reflecting communicative intentions that is conveyed through audio and visual channels, and they lack the ability to perceive and express it, falling short of natural spoken dialog. Starting from dialog focus, this project systematically analyzed what is conveyed in natural spoken dialog, studied models for understanding dialog intention under the constraint of dialog focus and for understanding and rendering the multimodal expression of dialog intention, and investigated new methods for human-machine dialog.

Around these goals, the main research progress and results are as follows. In dialog focus detection, a multimodal method for perceiving and predicting focus in spoken dialog was proposed, detecting from the user's input whether a focus is present. In dialog intention understanding, a user intention understanding model based on multi-task deep learning was proposed, word embedding models were applied to intention classification in dialog systems, and the speaker's intention is accurately understood from multimodal information such as text and speech. In dialog modeling and management, a speech-and-image dialog management model was built, performing deep multimodal fusion for content understanding and answer feedback oriented to the user's teaching intention. In visual speech synthesis capable of expressing communicative intentions, a focal-accent generation method for dialog interaction was proposed; a bidirectional long short-term memory (BLSTM) network was used to build an audio-visual parameter mapping model, generating talking-avatar facial and head-motion animation that meets the requirements of focal-accent expression. In system prototyping, a chatbot for the user's teaching intention based on a self-dialog mechanism was constructed, and a spoken dialog demonstration system was developed, realizing automatic detection of textual focus and speech accent, intention understanding fusing text and visual speech, generation of speech accent highlighting the focus intention, and talking avatar generation.

The project published 46 papers in major academic journals and conferences at home and abroad, including 4 indexed by SCI, 34 indexed by EI, 6 journal papers, and 3 papers at CCF A-class top conferences; it won a Second Prize of the Science and Technology Progress Award of the Ministry of Education, a conference best paper award, and first place in the "AI Voice Imitation and Verification Attack-Defense Challenge" of a global geek competition; it trained 4 PhD graduates and 12 master's graduates; it filed 1 national invention patent application; and its research results were transferred for 930,000 RMB.

This research deepens the understanding of the relationship between utterance intention and audio-visual expression in spoken interaction, and has accumulated key technologies for multimodal intention perception and understanding and for intention-highlighting visual speech generation in human-computer interaction. With the development of artificial intelligence, the project's results can be applied to intelligent voice assistants, smart speakers, chatbots, and virtual reality, and have broad application prospects.
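The report mentions a bidirectional LSTM that maps acoustic features to visual parameters for the talking avatar. A minimal PyTorch sketch of such a frame-level mapping model is given below; the feature dimensions, layer sizes, and the example input are hypothetical and only indicate the general technique (sequence-to-sequence regression with a BLSTM), not the project's actual configuration.

```python
import torch
import torch.nn as nn


class AudioToVisualBLSTM(nn.Module):
    """Bidirectional LSTM mapping a frame-level acoustic feature sequence to
    visual parameters (e.g. face and head-motion parameters) for a talking avatar.
    All dimensions are assumed values for illustration."""
    def __init__(self, acoustic_dim=80, visual_dim=37, hidden=256, layers=2):
        super().__init__()
        self.blstm = nn.LSTM(acoustic_dim, hidden, num_layers=layers,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, visual_dim)

    def forward(self, acoustic_seq):
        # acoustic_seq: (batch, frames, acoustic_dim)
        hidden_seq, _ = self.blstm(acoustic_seq)
        return self.out(hidden_seq)     # (batch, frames, visual_dim)


# Example: map a hypothetical 2-second clip of 80-dim features at 100 frames/s.
model = AudioToVisualBLSTM()
visual = model(torch.randn(1, 200, 80))
print(visual.shape)   # torch.Size([1, 200, 37])
```

In practice such a model would be trained by minimizing a regression loss (for example, mean squared error) between the predicted and recorded visual parameter trajectories.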
Research on rice root system modeling methods based on fractal L-systems
An equilibrium traffic assignment model for congested road networks
Research on crime prediction algorithms based on multimodal information feature fusion
An overview and outlook of research on health system resilience
Task scheduling methods for cloud workflow security
Research on user emotion recognition for spoken dialog systems
Research on dialog-management-centered bidirectional multimodal spoken human-computer interaction
Perceptual characteristics and neural mechanisms of spoken Chinese in young children
EEG spatial analysis and deep information extraction