This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland, representing one of the world’s most extensive conferences on the research and technology of spoken language understanding and processing. Experts in speech-related research fields gather to take part in oral presentations and poster sessions and to build collaborations across the globe.
We’re excited to be a Platinum Sponsor of INTERSPEECH 2023, where we will be showcasing more than 20 research publications and supporting a number of workshops and special sessions. We welcome in-person attendees to drop by the Google Research booth to meet our researchers and participate in Q&As and demonstrations of some of our latest speech technologies, which help improve accessibility and provide convenience in communication for billions of users. In addition, online attendees are encouraged to visit our virtual booth in Topia, where you can get up-to-date information on research and opportunities at Google. Visit the @GoogleAI Twitter account to find out about Google booth activities (e.g., demos and Q&A sessions). You can also learn more about the Google research being presented at INTERSPEECH 2023 below (Google affiliations in bold).
Board and Organizing Committee
ISCA Board, Technical Committee Chair: Bhuvana Ramabhadran
Area Chairs include:
Analysis of Speech and Audio Signals: Richard Rose
Speech Synthesis and Spoken Language Generation: Rob Clark
Special Areas: Tara Sainath
Satellite events
Keynote talk – ISCA Medalist
Survey Talk
Speech Compression in the AI Era
Speaker: Jan Skoglund
Special session papers
Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech
Richard Rose, Oscar Chang, Olivier Siohan
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Hakan Erdogan, Scott Wisdom, Xuankai Chang*, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey
Papers
DeePMOS: Deep Posterior Mean-Opinion-Score of Speech
Xinyu Liang, Fredrik Cumlin, Christian Schüldt, Saikat Chatterjee
O-1: Self-Training with Oracle and 1-Best Hypothesis
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi
Re-investigating the Efficient Transfer Learning of Speech Foundation Model Using Feature Fusion Methods
Zhouyuan Huo, Khe Chai Sim, Dongseong Hwang, Tsendsuren Munkhdalai, Tara N. Sainath, Pedro Moreno
MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors
Joshua Camp, Tom Kenter, Lev Finkelstein, Rob Clark
LanSER: Language-Model Supported Speech Emotion Recognition
Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou
Modular Domain Adaptation for Conformer-Based Streaming ASR
Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar
On Training a Neural Residual Acoustic Echo Suppressor for Improved ASR
Sankaran Panchapagesan, Turaj Zakizadeh Shabestary, Arun Narayanan
MD3: The Multi-dialect Dataset of Dialogues
Jacob Eisenstein, Vinodkumar Prabhakaran, Clara Rivera, Dorottya Demszky, Devyani Sharma
Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR
Zelin Wu, Tsendsuren Munkhdalai, Pat Rondon, Golan Pundak, Khe Chai Sim, Christopher Li
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran
How to Estimate Model Transferability of Pre-trained Speech Models?
Zih-Ching Chen, Chao-Han Huck Yang*, Bo Li, Yu Zhang, Nanxin Chen, Shuo-yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath
Improving Joint Speech-Text Representations Without Alignment
Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath
Streaming Parrotron for On-Device Speech-to-Speech Conversion
Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal
Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR
W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath
Universal Automatic Phonetic Transcription into the International Phonetic Alphabet
Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang
Mixture-of-Expert Conformer for Streaming Multilingual ASR
Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays
Real Time Spectrogram Inversion on Mobile Phone
Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy
2-Bit Conformer Quantization for Automatic Speech Recognition
Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He
LibriTTS-R: A Restored Multi-speaker Text-to-Speech Corpus
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna
PronScribe: Highly Accurate Multimodal Phonemic Transcription from Speech and Text
Yang Yu, Matthew Perez*, Ankur Bapna, Fadi Haik, Siamak Tazari, Yu Zhang
Label Aware Speech Representation Learning for Language Identification
Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar
* Work done while at Google