Northwestern Polytechnical
Audio Speech & Language Processing Group
Digital Signal Processing



Dr. Xavier Anguera of Telefonica Research, Spain, Visits the Lab

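      At the invitation of Prof. Lei Xie, Dr. Xavier Anguera, a researcher at Telefonica Research in Spain, visited the School of Computer Science at Northwestern Polytechnical University and the Shaanxi Provincial Key Laboratory of Speech and Image Processing from September 8 to 12, 2013. Dr. Anguera has long worked on speech processing and multimedia content analysis. At 10:30 a.m. on September 9, he gave an academic talk titled "Selected Topics in Multimedia Analytics" in Lecture Hall 105 of the School of Computer Science, covering three research topics: multimodal video-copy detection, speaker recognition, and spoken web search. Dr. Anguera is one of the organizers of the Spoken Web Search task in the MediaEval 2013 international evaluation, and he gave a dedicated introduction to the history of that evaluation and its research progress. After the talk, he held a lively discussion with faculty and students. During his visit, Dr. Anguera also toured the laboratory and had in-depth discussions with Prof. Lei Xie, Prof. Zhonghua Fu, and some of the faculty and students on research tasks including speaker recognition and zero/low-resource spoken web search.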

      Talk title: Selected Topics in Multimedia Analytics
      Abstract: In this talk I will cover three topics that I have been working on during the last three years. First, I will talk about multimodal video-copy detection, which focuses on determining whether a given video contains any modified excerpts taken from an original video. For this we implemented a novel binary audio fingerprint that we call MASK, which I will describe. Next, I will talk about speaker recognition, in which we want to determine whether an audio recording of a speaker matches the speaker's claimed identity. For this task we have developed a novel binary speaker representation and modeling technique. Last, I will speak about spoken web search, in which a given audio query is searched for inside a large audio database. For this we have proposed a novel algorithm based on Dynamic Time Warping (DTW) that allows the audio database to be pre-indexed for faster retrieval of matches and uses very little memory compared with standard DTW techniques.

About Dr. Xavier Anguera:
      Xavier Anguera: Ing. [MS] 2001 UPC University (Barcelona, Spain), [MS] 2001 European Masters in Language and Speech, Ph.D. 2006 UPC University, with a thesis on speaker diarization for multi-microphone meeting recordings. From 2001 to 2003 he worked for Panasonic Speech Technology Lab in Santa Barbara, CA, on text-to-speech for several languages. From 2004 to 2006 he was a visiting researcher at the International Computer Science Institute (ICSI) in Berkeley, CA. Since 2007 he has been a research scientist at Telefonica Research in Barcelona. His research interests cover speech processing (both speaker and content-based) and multimodal multimedia processing. He has published over 60 peer-reviewed papers and has several accepted or pending patents. He is an active member of the IEEE and ACM, and has served on the organizing and program committees of several multimedia and speech conferences.


