Kokoro 82M is an 82-million-parameter text-to-speech model that beats many TTS APIs while running locally on CPUs, including ...
Mistral AI is expanding its Voxtral model family with its first text-to-speech model. The launch comes amid intensifying competition in the fast-growing AI voice market, with Voxtral TTS pitched as an ...
French AI company Mistral released a new open source text-to-speech model on Thursday that can be used by voice AI assistants or in enterprise use cases like customer support. The model, which lets ...
This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time.
WhisperS2T is an optimized lightning-fast open-sourced Speech-to-Text (ASR) pipeline. It is tailored for the whisper model to provide faster whisper transcription. It's designed to be exceptionally ...
Abstract: While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance in many datasets, spectrogram-based SE tends to show ...
VoiceCraft is a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results