LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

LauraGPT accepts both audio and text as input and can generate output in either modality. It performs a broad spectrum of content-oriented and speech-signal-related tasks, including automatic speech recognition, speech-to-text translation, text-to-speech synthesis, machine translation, speech enhancement, automated audio captioning, speech emotion recognition, and spoken language understanding.
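As a rough illustration of how these tasks divide across modalities, the sketch below maps each task to its input and output modality. The mapping follows the usual definition of each task rather than anything stated on this page. The names `Modality`, `TASKS`, and `tasks_for` are hypothetical, and treating the SER and SLU outputs as text is an assumption.

```python
from enum import Enum


class Modality(Enum):
    AUDIO = "audio"
    TEXT = "text"


# (input modality, output modality) per task, based on the usual task definitions.
# Assumption: SER labels and SLU intents/slots are treated as text output.
TASKS = {
    "ASR": (Modality.AUDIO, Modality.TEXT),   # automatic speech recognition
    "S2TT": (Modality.AUDIO, Modality.TEXT),  # speech-to-text translation
    "TTS": (Modality.TEXT, Modality.AUDIO),   # text-to-speech synthesis
    "MT": (Modality.TEXT, Modality.TEXT),     # machine translation
    "SE": (Modality.AUDIO, Modality.AUDIO),   # speech enhancement
    "AAC": (Modality.AUDIO, Modality.TEXT),   # automated audio captioning
    "SER": (Modality.AUDIO, Modality.TEXT),   # speech emotion recognition
    "SLU": (Modality.AUDIO, Modality.TEXT),   # spoken language understanding
}


def tasks_for(inp: Modality, out: Modality) -> list:
    """Return the sorted task names with the given input and output modality."""
    return sorted(name for name, (i, o) in TASKS.items() if i is inp and o is out)


audio_generating = tasks_for(Modality.TEXT, Modality.AUDIO)
audio_to_audio = tasks_for(Modality.AUDIO, Modality.AUDIO)
```

Grouping the tasks this way shows that only TTS and SE produce audio directly; the remaining tasks produce text.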

ASR

LauraGPT can perform automatic speech recognition (ASR), which transcribes speech into text.

TTS

LauraGPT can perform text-to-speech synthesis (TTS), which generates speech that matches the given text.

S2ST

LauraGPT can perform speech-to-speech translation (S2ST) by combining its S2TT and TTS tasks: the source speech is first translated into text in the target language, and that text is then synthesized as speech.
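The combination of S2TT and TTS can be sketched as a simple two-stage cascade. This is a minimal illustration under stated assumptions, not LauraGPT's actual interface: the type aliases, `make_s2st`, and the stand-in stages `fake_s2tt` and `fake_tts` are all hypothetical, and audio is represented as raw bytes for simplicity.

```python
from typing import Callable

# Hypothetical stage signatures; the real model interface is not described here.
SpeechToText = Callable[[bytes, str], str]  # (source audio, target language) -> translated text
TextToSpeech = Callable[[str], bytes]       # (text) -> synthesized audio


def make_s2st(s2tt: SpeechToText, tts: TextToSpeech) -> Callable[[bytes, str], bytes]:
    """Compose an S2TT stage and a TTS stage into a single S2ST function."""

    def s2st(audio: bytes, target_lang: str) -> bytes:
        translated_text = s2tt(audio, target_lang)  # stage 1: speech -> target-language text
        return tts(translated_text)                 # stage 2: text -> speech

    return s2st


# Stand-in stages so the sketch runs without a model.
def fake_s2tt(audio: bytes, target_lang: str) -> str:
    return f"[{target_lang}] text for {len(audio)} bytes of audio"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


s2st = make_s2st(fake_s2tt, fake_tts)
result = s2st(b"\x00\x01", "zh")
```

Because the stages are passed in as callables, either one can be swapped for a real model call without changing the composition.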

S2TT

LauraGPT can perform speech-to-text translation (S2TT), which directly translates speech into text in the target language.

SE

LauraGPT can perform speech enhancement (SE), which aims to improve speech quality through noise suppression and echo cancellation.

AAC

LauraGPT can perform automated audio captioning (AAC), which aims to generate a natural-language sentence describing the content of general audio.

SLU

LauraGPT can perform spoken language understanding (SLU), which aims to identify the user's intent and the relevant entity slots that fill the intent.
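To make the intent-and-slots idea concrete, here is a minimal sketch of how an SLU result might be represented. The `SLUResult` class, the `set_alarm` intent, and the slot names are invented for illustration and are not taken from LauraGPT or its training data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SLUResult:
    """Illustrative container: one intent plus the entity slots that fill it."""
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)

    def missing_slots(self, required: List[str]) -> List[str]:
        """Return the required slot names that were not filled."""
        return [name for name in required if name not in self.slots]


# Hypothetical result for an utterance such as "wake me up at seven tomorrow".
parsed = SLUResult(intent="set_alarm", slots={"time": "seven", "date": "tomorrow"})
unfilled = parsed.missing_slots(["time", "date", "label"])
```

A downstream dialogue system could use a check like `missing_slots` to decide whether to ask the user a follow-up question.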

SER

LauraGPT can perform speech emotion recognition (SER), which classifies emotions from speech input.

Demos

1. Automatic speech recognition (ASR) samples


Original speech | Transcribed text

and yesterday things went on just as usual

is it fair that he should do so or not

so then they waited for a steamboat

what difference is there between a bottle and a flagon

该 歌 曲 今 年 一 经 在 各 大 音 乐 网 站 上 线 便 收 获 无 数 好 评

在 确 保 系 统 顺 利 运 行 的 情 况 下

2. Text-to-speech synthesis (TTS) samples


2.1 LibriTTS Zero Shot TTS

Audio columns for each sample: Prompt wav (16k) | Ground-truth (16k) | VALL-E Phone | VALL-E Token | LauraGPT

[Prompt: 1995_1837_000020_000000] Up in the sick room Zora lay on the little white bed. [Continuation: 1995_1836_000003_000002] At last the Cotton Combine was to all appearances an assured fact and he was slated for the Senate.

[Prompt: 2830_3980_000018_000001] Humble man that he was, he will not now take a back seat. [Continuation: 2830_3980_000018_000000] Against these boasting, false apostles, Paul boldly defends his apostolic authority and ministry.

[Prompt: 6829_68771_000046_000000] A sudden wave of scarlet swept over Eliza's face. [Continuation: 6829_68769_000030_000000] Then he deliberately locked Kenneth and Beth in with the forger, and retreated along the passage.

[Prompt: 8230_279154_000004_000008] To deal with this problem, we must have a theory of memory. [Continuation: 8230_279154_000019_000000] The first of our vague but indubitable data is that there is knowledge of the past.

2.2 AISHELL Zero Shot TTS

Audio columns for each sample: Prompt wav (16k) | Ground-truth (16k) | VALL-E Phone | VALL-E Token | LauraGPT

[Prompt: S0764_BAC009S0764W0169] 实施较大幅度的补贴政策 [Continuation: S0764_BAC009S0764W0285] 两家公司是联网汽车的主要芯片供应商。

[Prompt: S0906_BAC009S0906W0202] 加强农牧互补牧养结合 [Continuation: S0906_BAC009S0906W0181] 月度市场成交量开始出现环比回升。

[Prompt: S0766_BAC009S0766W0321] 新能源汽车市场在逐步启动 [Continuation: S0766_BAC009S0766W0182] 转型后的今久整合营销集团。

[Prompt: S0908_BAC009S0908W0473] 参考消息网七月八日报道 [Continuation: S0908_BAC009S0908W0361] 帮助仰泳运动员改善自己的出发技术。

3. Zero-shot speech-to-speech translation (S2ST) samples


3.1 English-to-Chinese Translation using CoVOST2 dataset
English Text | English Speech | LauraGPT Translated Speech
two workers in orange vests perform their job.
two boys are playing soccer in the water at the beach.
many programming languages are named after real people.

3.2 Chinese-to-English Translation using BSTC dataset
Chinese Text | Chinese Speech | LauraGPT Translated Speech
但不是这种所有的可能性都可以在市场上成功的。
要知道每个人都是怕输的,对吗?

4. Speech-to-text translation (S2TT) samples


Original speech | Translated text

But not all of them can be successful in the market.

Welcome to the advanced course on the UNIT dialogue system.

Then, upload the AR content package to the AR content platform.

但他似乎没有意识到任何危险。

我想预订一间靠近我们公寓的酒吧。

主要是那些努力实现自己命运的人的心。

5. Speech enhancement (SE) samples


Noisy | LauraGPT-Enhanced | Clean

Results

Results on All Tasks
