Boris Ginsburg
2020 – today
- 2024
- [c54]Maxime Burchi, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg, Radu Timofte:
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer. ICASSP 2024: 10211-10215 - [c53]Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg:
A Chat about Boring Problems: Studying GPT-Based Text Normalization. ICASSP 2024: 10921-10925 - [c52]Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg:
Transducers with Pronunciation-Aware Embeddings for Automatic Speech Recognition. ICASSP 2024: 12026-12030 - [c51]Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg:
Stateful Conformer with Cache-Based Inference for Streaming Automatic Speech Recognition. ICASSP 2024: 12041-12045 - [c50]Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg:
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition. ICASSP 2024: 12111-12115 - [c49]Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg:
Investigating End-to-End ASR Architectures for Long Form Audio Transcription. ICASSP 2024: 13366-13370 - [c48]Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg:
SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation. ICASSP 2024: 13521-13525 - [c47]Paarth Neekhara, Shehzeen Samarah Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian J. McAuley:
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations. ICML 2024 - [i79]Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg:
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition. CoRR abs/2404.04295 (2024) - [i78]Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, Boris Ginsburg:
RULER: What's the Real Context Size of Your Long-Context Language Models? CoRR abs/2404.06654 (2024) - [i77]Maxime Burchi, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg, Radu Timofte:
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer. CoRR abs/2405.12983 (2024) - [i76]Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg:
Label-Looping: Highly Efficient Decoding for Transducers. CoRR abs/2406.06220 (2024) - [i75]Andrei Andrusenko, Aleksandr Laptev, Vladimir Bataev, Vitaly Lavrukhin, Boris Ginsburg:
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter. CoRR abs/2406.07096 (2024) - [i74]Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan M. Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, Robert Hero, Jining Huang, Vibhu Jawa, Joseph Jennings, Aastha Jhunjhunwala, John Kamalu, Sadaf Khan, Oleksii Kuchaiev, Patrick LeGresley, Hui Li, Jiwei Liu, Zihan Liu, Eileen Long, Ameya Sunil Mahabaleshwarkar, Somshubra Majumdar, James Maki, Miguel Martinez, Maer Rodrigues de Melo, Ivan Moshkov, Deepak Narayanan, Sean Narenthiran, Jesus Navarro, Phong Nguyen, Osvald Nitski, Vahid Noroozi, Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, Wei Ping, Shrimai Prabhumoye, Rajarshi Roy, Trisha Saar, Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, Jason Sewall, Pavel Shamis, Gerald Shen, Mohammad Shoeybi, Dave Sizer, Misha Smelyanskiy, Felipe Soares, Makesh Narsimhan Sreedhar, Dan Su, Sandeep Subramanian, Shengyang Sun, Shubham Toshniwal, Hao Wang, Zhilin Wang, Jiaxuan You, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Vivienne Zhang, Yian Zhang, Chen Zhu:
Nemotron-4 340B Technical Report. CoRR abs/2406.11704 (2024) - [i73]Vahid Noroozi, Zhehuai Chen, Somshubra Majumdar, Steve Huang, Jagadeesh Balam, Boris Ginsburg:
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models. CoRR abs/2406.12946 (2024) - [i72]Paarth Neekhara, Shehzeen Hussain, Subhankar Ghosh, Jason Li, Rafael Valle, Rohan Badlani, Boris Ginsburg:
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment. CoRR abs/2406.17957 (2024) - [i71]Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee:
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment. CoRR abs/2406.18871 (2024) - [i70]Krishna C. Puvvada, Piotr Zelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg:
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data. CoRR abs/2406.19674 (2024) - [i69]Zhehuai Chen, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, Piotr Zelasko, Jagadeesh Balam, Boris Ginsburg:
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5. CoRR abs/2406.19954 (2024) - [i68]Kunal Dhawan, Nithin Rao Koluguri, Ante Jukic, Ryan Langman, Jagadeesh Balam, Boris Ginsburg:
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations. CoRR abs/2407.03495 (2024) - [i67]Wen Ding, Fei Jia, Hainan Xu, Yu Xi, Junjie Lai, Boris Ginsburg:
Romanization Encoding For Multilingual ASR. CoRR abs/2407.04368 (2024) - [i66]Somshubra Majumdar, Vahid Noroozi, Sean Narenthiran, Aleksander Ficek, Jagadeesh Balam, Boris Ginsburg:
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models. CoRR abs/2407.21077 (2024) - [i65]He Huang, Taejin Park, Kunal Dhawan, Ivan Medennikov, Krishna C. Puvvada, Nithin Rao Koluguri, Weiqing Wang, Jagadeesh Balam, Boris Ginsburg:
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks. CoRR abs/2408.13106 (2024) - [i64]Weiqing Wang, Kunal Dhawan, Taejin Park, Krishna C. Puvvada, Ivan Medennikov, Somshubra Majumdar, He Huang, Jagadeesh Balam, Boris Ginsburg:
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR. CoRR abs/2409.01438 (2024) - [i63]Nithin Rao Koluguri, Travis M. Bartley, Hainan Xu, Oleksii Hrinchuk, Jagadeesh Balam, Boris Ginsburg, Georg Kucsko:
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation. CoRR abs/2409.05601 (2024) - [i62]Taejin Park, Ivan Medennikov, Kunal Dhawan, Weiqing Wang, He Huang, Nithin Rao Koluguri, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg:
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens. CoRR abs/2409.06656 (2024) - [i61]Chao-Han Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Zelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Sabato Marco Siniscalchi, Eng Siong Chng, Peter Bell, Catherine Lai, Shinji Watanabe, Andreas Stolcke:
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition. CoRR abs/2409.09785 (2024) - [i60]Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, Piotr Zelasko, Oleksii Hrinchuk, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg:
Chain-of-Thought Prompting for Speech Translation. CoRR abs/2409.11538 (2024) - [i59]Jinhan Wang, Weiqing Wang, Kunal Dhawan, Taejin Park, Myungjong Kim, Ivan Medennikov, He Huang, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR. CoRR abs/2409.12352 (2024) - [i58]Piotr Zelasko, Zhehuai Chen, Mengru Wang, Daniel Galvez, Oleksii Hrinchuk, Shuoyang Ding, Ke Hu, Jagadeesh Balam, Vitaly Lavrukhin, Boris Ginsburg:
EMMeTT: Efficient Multimodal Machine Translation Training. CoRR abs/2409.13523 (2024) - [i57]Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee:
Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data. CoRR abs/2409.20007 (2024) - [i56]Ilya Loshchilov, Cheng-Ping Hsieh, Simeng Sun, Boris Ginsburg:
nGPT: Normalized Transformer with Representation Learning on the Hypersphere. CoRR abs/2410.01131 (2024) - [i55]Hainan Xu, Travis M. Bartley, Vladimir Bataev, Boris Ginsburg:
Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR. CoRR abs/2410.02597 (2024) - 2023
- [c46]Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg:
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models. ASRU 2023: 1-7 - [c45]Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg:
Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition. ASRU 2023: 1-8 - [c44]Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro:
Vani: Very-Lightweight Accent-Controllable TTS for Native And Non-Native Speakers With Identity Preservation. ICASSP 2023: 1-2 - [c43]Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg:
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models. ICASSP 2023: 1-5 - [c42]Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg:
ACE-VC: Adaptive and Controllable Voice Conversion Using Explicitly Disentangled Self-Supervised Speech Representations. ICASSP 2023: 1-5 - [c41]Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg:
Powerful and Extensible WFST Framework for Rnn-Transducer Losses. ICASSP 2023: 1-5 - [c40]Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg:
Multi-Blank Transducers for Speech Recognition. ICASSP 2023: 1-5 - [c39]Yang Zhang, Krishna C. Puvvada, Vitaly Lavrukhin, Boris Ginsburg:
Conformer-Based Target-Speaker Automatic Speech Recognition For Single-Channel Audio. ICASSP 2023: 1-5 - [c38]Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon:
BigVGAN: A Universal Neural Vocoder with Large-Scale Training. ICLR 2023 - [c37]Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg:
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. ICML 2023: 38462-38484 - [c36]Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg:
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings. INTERSPEECH 2023: 1404-1408 - [c35]Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg:
Confidence-based Ensembles of End-to-End Speech Recognition Models. INTERSPEECH 2023: 1414-1418 - [c34]Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg:
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator. INTERSPEECH 2023: 2928-2932 - [c33]He Huang, Jagadeesh Balam, Boris Ginsburg:
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling. INTERSPEECH 2023: 2933-2937 - [c32]Cheng-Ping Hsieh, Subhankar Ghosh, Boris Ginsburg:
Adapter-Based Extension of Multi-Speaker Text-To-Speech Model for New Speakers. INTERSPEECH 2023: 3028-3032 - [c31]Elena Rastorgueva, Vitaly Lavrukhin, Boris Ginsburg:
NeMo Forced Aligner and its application to word alignment for subtitle generation. INTERSPEECH 2023: 5257-5258 - [c30]Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification. INTERSPEECH 2023: 5321-5325 - [c29]Oleksii Hrinchuk, Vladimir Bataev, Evelina Bakhturina, Boris Ginsburg:
NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2023. IWSLT@ACL 2023: 442-448 - [c28]Ante Jukic, Jagadeesh Balam, Boris Ginsburg:
Flexible Multichannel Speech Enhancement for Noise-Robust Frontend. WASPAA 2023: 1-5 - [i54]Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg:
ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations. CoRR abs/2302.08137 (2023) - [i53]Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg:
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator. CoRR abs/2302.14036 (2023) - [i52]Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro:
VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation. CoRR abs/2303.07578 (2023) - [i51]Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg:
Powerful and Extensible WFST Framework for RNN-Transducer Losses. CoRR abs/2303.10384 (2023) - [i50]Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg:
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. CoRR abs/2304.06795 (2023) - [i49]Dima Rekesh, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Ankur Kumar, Boris Ginsburg:
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition. CoRR abs/2305.05084 (2023) - [i48]Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg:
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings. CoRR abs/2306.02317 (2023) - [i47]Kunal Dhawan, Dima Rekesh, Boris Ginsburg:
Towards training Bilingual and Code-Switched Speech Recognition models from Monolingual data sources. CoRR abs/2306.08753 (2023) - [i46]Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg:
Confidence-based Ensembles of End-to-End Speech Recognition Models. CoRR abs/2306.15824 (2023) - [i45]He Huang, Jagadeesh Balam, Boris Ginsburg:
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling. CoRR abs/2307.07057 (2023) - [i44]Yang Zhang, Krishna C. Puvvada, Vitaly Lavrukhin, Boris Ginsburg:
Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio. CoRR abs/2308.05218 (2023) - [i43]Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg:
Investigating End-to-End ASR Architectures for Long Form Audio Transcription. CoRR abs/2309.09950 (2023) - [i42]Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg:
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition. CoRR abs/2309.10922 (2023) - [i41]Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg:
A Chat About Boring Problems: Studying GPT-based text normalization. CoRR abs/2309.13426 (2023) - [i40]Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg:
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models. CoRR abs/2310.02943 (2023) - [i39]Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg:
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation. CoRR abs/2310.09424 (2023) - [i38]Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian J. McAuley:
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations. CoRR abs/2310.09653 (2023) - [i37]Taejin Park, He Huang, Coleman Hooper, Nithin Rao Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg:
Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation. CoRR abs/2310.12371 (2023) - [i36]Taejin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Rao Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg:
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System. CoRR abs/2310.12378 (2023) - [i35]Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg:
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition. CoRR abs/2312.17279 (2023) - 2022
- [c27]Oktai Tatanov, Stanislav Beliaev, Boris Ginsburg:
Mixer-TTS: Non-Autoregressive, Fast and Compact Text-to-Speech Model Conditioned on Language Model Embeddings. ICASSP 2022: 7482-7486 - [c26]Nithin Rao Koluguri, Taejin Park, Boris Ginsburg:
TitaNet: Neural Model for Speaker Representation with 1D Depth-Wise Separable Convolutions and Global Context. ICASSP 2022: 8102-8106 - [c25]Evelina Bakhturina, Yang Zhang, Boris Ginsburg:
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization. INTERSPEECH 2022: 491-495 - [c24]Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg:
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization. INTERSPEECH 2022: 550-554 - [c23]Taejin Park, Nithin Rao Koluguri, Fei Jia, Jagadeesh Balam, Boris Ginsburg:
NeMo Open Source Speaker Diarization System. INTERSPEECH 2022: 853-854 - [c22]Aleksandr Laptev, Somshubra Majumdar, Boris Ginsburg:
CTC Variations Through New WFST Topologies. INTERSPEECH 2022: 1041-1045 - [c21]Taejin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
Multi-scale Speaker Diarization with Dynamic Scale Weighting. INTERSPEECH 2022: 5080-5084 - [c20]Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg:
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition. SLT 2022: 130-135 - [c19]Aleksandr Laptev, Boris Ginsburg:
Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition. SLT 2022: 152-159 - [i34]Evelina Bakhturina, Yang Zhang, Boris Ginsburg:
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization. CoRR abs/2203.15917 (2022) - [i33]Taejin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
Multi-scale Speaker Diarization with Dynamic Scale Weighting. CoRR abs/2203.15974 (2022) - [i32]Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon:
BigVGAN: A Universal Neural Vocoder with Large-Scale Training. CoRR abs/2206.04658 (2022) - [i31]Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg:
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization. CoRR abs/2208.00064 (2022) - [i30]Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg:
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition. CoRR abs/2210.03255 (2022) - [i29]Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
AmberNet: A Compact End-to-End Model for Spoken Language Identification. CoRR abs/2210.15781 (2022) - [i28]Cheng-Ping Hsieh, Subhankar Ghosh, Boris Ginsburg:
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers. CoRR abs/2211.00585 (2022) - [i27]Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg:
Multi-blank Transducers for Speech Recognition. CoRR abs/2211.03541 (2022) - [i26]Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg:
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models. CoRR abs/2211.05103 (2022) - [i25]Aleksandr Laptev, Boris Ginsburg:
Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition. CoRR abs/2212.08703 (2022) - 2021
- [j2]Maria Korshunova, Boris Ginsburg, Alexander Tropsha, Olexandr Isayev:
OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design. J. Chem. Inf. Model. 61(1): 7-13 (2021) - [c18]Fei Jia, Somshubra Majumdar, Boris Ginsburg:
MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection. ICASSP 2021: 6818-6822 - [c17]Jian Luo, Jianzong Wang, Ning Cheng, Edward Xiao, Jing Xiao, Georg Kucsko, Patrick K. O'Neill, Jagadeesh Balam, Slyne Deng, Adriana Flores, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Jason Li:
Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition. ICME 2021: 1-6 - [c16]Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko:
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition. Interspeech 2021: 1434-1438 - [c15]Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg, Yang Zhang:
Hi-Fi Multi-Speaker English TTS Dataset. Interspeech 2021: 2776-2780 - [c14]Stanislav Beliaev, Boris Ginsburg:
TalkNet: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis. Interspeech 2021: 3760-3764 - [c13]Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg:
NeMo Inverse Text Normalization: From Development to Production. Interspeech 2021: 4468-4472 - [c12]Yang Zhang, Evelina Bakhturina, Boris Ginsburg:
NeMo (Inverse) Text Normalization: From Development to Production. Interspeech 2021: 4857-4859 - [c11]Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg:
A Toolbox for Construction and Analysis of Speech Datasets. NeurIPS Datasets and Benchmarks 2021 - [i24]Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko:
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition. CoRR abs/2104.02014 (2021) - [i23]Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg:
NeMo Toolbox for Speech Dataset Construction. CoRR abs/2104.04896 (2021) - [i22]Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg:
NeMo Inverse Text Normalization: From Development To Production. CoRR abs/2104.05055 (2021) - [i21]Stanislav Beliaev, Boris Ginsburg:
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction. CoRR abs/2104.08189 (2021) - [i20]Yang Zhang, Vahid Noroozi, Evelina Bakhturina, Boris Ginsburg:
SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services. CoRR abs/2105.08049 (2021) - [i19]Aleksei Kalinov, Somshubra Majumdar, Jagadeesh Balam, Boris Ginsburg:
CarneliNet: Neural Mixture Model for Automatic Speech Recognition. CoRR abs/2107.10708 (2021) - [i18]Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji:
A Unified Transformer-based Framework for Duplex Text Normalization. CoRR abs/2108.09889 (2021) - [i17]Aleksandr Laptev, Somshubra Majumdar, Boris Ginsburg:
CTC Variations Through New WFST Topologies. CoRR abs/2110.03098 (2021) - [i16]Nithin Rao Koluguri, Taejin Park, Boris Ginsburg:
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context. CoRR abs/2110.04410 (2021) - [i15]Paarth Neekhara, Jason Li, Boris Ginsburg:
Adapting TTS models For New Speakers using Transfer Learning. CoRR abs/2110.05798 (2021) - 2020
- [c10]Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang:
Quartznet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions. ICASSP 2020: 6124-6128 - [c9]Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg:
Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model. ICASSP 2020: 7074-7078 - [c8]Somshubra Majumdar, Boris Ginsburg:
MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition. INTERSPEECH 2020: 3356-3360 - [i14]Boris Ginsburg:
On regularization of gradient descent, layer imbalance and flat minima. CoRR abs/2007.09286 (2020) - [i13]Fei Jia, Somshubra Majumdar, Boris Ginsburg:
MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection. CoRR abs/2010.13886 (2020)
2010 – 2019
- 2019
- [c7]Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde:
Jasper: An End-to-End Convolutional Neural Acoustic Model. INTERSPEECH 2019: 71-75 - [i12]Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde:
Jasper: An End-to-End Convolutional Neural Acoustic Model. CoRR abs/1904.03288 (2019) - [i11]Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Jonathan M. Cohen:
Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks. CoRR abs/1905.11286 (2019) - [i10]Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen:
NeMo: a toolkit for building AI applications using Neural Modules. CoRR abs/1909.09577 (2019) - [i9]Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg:
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model. CoRR abs/1910.10697 (2019) - 2018
- [j1]Anastasia Dubrovina, Pavel Kisilev, Boris Ginsburg, Sharbell Y. Hashoul, Ron Kimmel:
Computational mammography using deep neural networks. Comput. methods Biomech. Biomed. Eng. Imaging Vis. 6(3): 243-247 (2018) - [c6]Peter H. Jin, Boris Ginsburg, Kurt Keutzer:
Spatially Parallel Convolutions. ICLR (Workshop) 2018 - [c5]Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory F. Diamos, Erich Elsen, David García, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu:
Mixed Precision Training. ICLR (Poster) 2018 - [i8]Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Carl Case, Paulius Micikevicius:
OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models. CoRR abs/1805.10387 (2018) - [i7]Jason Li, Ravi Gadde, Boris Ginsburg, Vitaly Lavrukhin:
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation. CoRR abs/1811.00707 (2018) - 2017
- [c4]Oleksii Kuchaiev, Boris Ginsburg:
Factorization tricks for LSTM networks. ICLR (Workshop) 2017 - [c3]Kevin Vincent, Kevin Stephano, Michael A. Frumkin, Boris Ginsburg, Julien Demouth:
On Improving the Numerical Stability of Winograd Convolutions. ICLR (Workshop) 2017 - [i6]Oleksii Kuchaiev, Boris Ginsburg:
Factorization tricks for LSTM networks. CoRR abs/1703.10722 (2017) - [i5]Oleksii Kuchaiev, Boris Ginsburg:
Training Deep AutoEncoders for Collaborative Filtering. CoRR abs/1708.01715 (2017) - [i4]Yang You, Igor Gitman, Boris Ginsburg:
Scaling SGD Batch Size to 32K for ImageNet Training. CoRR abs/1708.03888 (2017) - [i3]Igor Gitman, Boris Ginsburg:
Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification. CoRR abs/1709.08145 (2017) - [i2]Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory F. Diamos, Erich Elsen, David García, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu:
Mixed Precision Training. CoRR abs/1710.03740 (2017) - 2016
- [c2]Elad Richardson, Rom Herskovitz, Boris Ginsburg, Michael Zibulevsky:
SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques. NIPS 2016: 1534-1542 - [i1]Elad Richardson, Rom Herskovitz, Boris Ginsburg, Michael Zibulevsky:
SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques. CoRR abs/1609.00629 (2016)
2000 – 2009
- 2002
- [c1]Roy Armoni, Limor Fix, Alon Flaisher, Rob Gerth, Boris Ginsburg, Tomer Kanza, Avner Landver, Sela Mador-Haim, Eli Singerman, Andreas Tiemeyer, Moshe Y. Vardi, Yael Zbar:
The ForSpec Temporal Logic: A New Temporal Property-Specification Language. TACAS 2002: 296-311
last updated on 2024-11-08 20:27 CET by the dblp team
all metadata released as open data under CC0 1.0 license