Northwestern Polytechnical
Audio Speech & Language Processing Group
Digital Signal Processing




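      At the invitation of Professor Lei Xie, Associate Professor Longbiao Wang of Nagaoka University of Technology, Japan, made an academic visit from September 11 to 15, 2013 to the School of Computer Science at Northwestern Polytechnical University and the Shaanxi Provincial Key Laboratory of Speech and Image Information Processing. Dr. Wang has long worked on speech recognition and speaker recognition. At 10:30 a.m. on September 12, he gave a talk in Lecture Hall 105 of the School of Computer Science titled “Speaker recognition by combining MFCC and phase information in noisy conditions”, explaining in depth how combining MFCC features with phase information can improve speaker recognition performance in noisy conditions. After the talk, Dr. Wang held a lively discussion with faculty and students. During his visit, he toured the laboratory's research results in speech and audio processing and in image and video processing, and held in-depth discussions with Professors Lei Xie, Dongmei Jiang, and Zhonghua Fu.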

      Talk title: Speaker recognition by combining MFCC and phase information in noisy conditions

      Speaker biography: Longbiao Wang received his Dr. Eng. degree from Toyohashi University of Technology, Japan, in 2008. He was an assistant professor in the Faculty of Engineering at Shizuoka University, Japan, from April 2008 to September 2012. Since October 2013 he has been an associate professor at Nagaoka University of Technology, Japan. His research interests include robust speech recognition and speaker recognition. He received the “Chinese Government Award for Outstanding Self-financed Students Abroad” in 2008. He is a member of IEEE, ISCA, APSIPA, the Institute of Electronics, Information and Communication Engineers (IEICE), and the Acoustical Society of Japan (ASJ).

      Abstract: In this talk, we investigate the effectiveness of phase for speaker recognition in noisy conditions and combine the phase information with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods are based on MFCCs, even in noisy conditions. MFCCs, which predominantly capture vocal tract information, use only the magnitude of the Fourier transform of time-domain speech frames and ignore the phase. Because the phase carries rich voice source information, it is expected to complement MFCCs well. Furthermore, some studies have reported that phase-based features are robust to noise. We propose a phase information extraction method that normalizes the variation in the phase caused by the clipping position of the input speech, and we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. MFCCs outperformed the phase information for clean speech. For noisy speech, on the other hand, the degradation of the phase information was significantly smaller than that of MFCCs. With models trained on clean speech, the phase information alone was even better than MFCCs in many cases. Integrating the phase information with MFCCs reduced the speaker identification error by about 30% to 60% compared with the standard MFCC-based method.
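The abstract does not spell out the normalization, so the following is only a minimal Python sketch of one way the idea could work, not the speaker's implementation. It assumes the phase of the lowest non-DC frequency bin serves as the reference and that moving the clipping position acts like a time shift, which adds a phase term proportional to frequency. The function name `normalized_phase_features` and the parameter `n_bins` are hypothetical, and the demo uses a circular shift as a stand-in for a changed clipping position.

```python
import numpy as np

def normalized_phase_features(frame, n_bins=12):
    """Return cos/sin of phases normalized against the phase of bin 1.

    Assumption for illustration: bin 1 is the reference frequency.
    """
    phase = np.angle(np.fft.rfft(frame))
    bins = np.arange(2, n_bins + 2)  # skip DC and the reference bin
    # A time shift adds a phase term proportional to the bin index, so
    # subtracting the reference phase scaled by the bin index cancels it.
    normalized = phase[bins] - bins * phase[1]
    # cos/sin removes the 2*pi wrapping ambiguity of the raw angle.
    return np.concatenate([np.cos(normalized), np.sin(normalized)])

# Demo: the same frame clipped at a different (circularly shifted) position.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
shifted = np.roll(frame, 37)

features_a = normalized_phase_features(frame)
features_b = normalized_phase_features(shifted)
shift_invariant = np.allclose(features_a, features_b, atol=1e-6)
```

In this sketch the raw phases of `frame` and `shifted` differ, while the normalized features agree up to floating-point error. A feature vector like this could then be modeled separately from the MFCCs and the two scores combined, as the abstract describes.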



