Note: Comparison samples generated by other neural vocoders are in preparation and will be released later.
timedomAIn is a music technology company founded in 2019. We are dedicated to exploring the use of AI to empower non-professionals to create original music and express themselves.
ACE Virtual Singer is our main product, in which users write songs that are performed in highly realistic voices by AI-powered virtual singers.
speaker | predicted | ground-truth |
---|---|---|
Female 1 | | |
Female 2 | | |
Female 3 | | |
Male 1 | | |
Male 2 | | |
speaker | predicted | ground-truth |
---|---|---|
Female 1 | | |
Female 1 | | |
Female 2 | | |
Female 2 | | |
Male 1 | | |
speaker | predicted | ground-truth | file name |
---|---|---|---|
Female 1 (Tohoku Kiritan) | | | 24.wav |
Female 1 (Tohoku Kiritan) | | | 36.wav |
https://www.data-baker.com/data/index/source/
speaker | predicted | ground-truth | file name |
---|---|---|---|
Female 1 | | | 001279.wav |
Female 1 | | | 005116.wav |
Female 1 | | | 006757.wav |
Female 1 | | | 007658.wav |
https://openslr.org/109/
Bakhturina, E., Lavrukhin, V., Ginsburg, B., & Zhang, Y. (2021). Hi-Fi Multi-Speaker English TTS Dataset. arXiv preprint arXiv:2104.01497.
speaker | predicted | ground-truth | file name |
---|---|---|---|
Male 1 | | | antoinetteromances4_01_dumas_0061.flac |
Female 1 | | | dayoffate_33_roe_0129.flac |
Male 2 | | | roots_24_morris_0068.flac |
Male 3 | | | shadesofwilderness_13_altsheler_0097.flac |
https://gitlab.com/nicolasobin/att-hack
Le Moine, C., & Obin, N. (2020). Att-HACK: An Expressive Speech Database with Social Attitudes. arXiv preprint arXiv:2004.04410.
speaker | predicted | ground-truth | file name |
---|---|---|---|
Female 1 | | | F03_a1_s079_v05.wav |
Female 2 | | | F15_a3_s100_v02.wav |
Male 1 | | | M07_a4_s060_v05.wav |
Male 2 | | | M17_a1_s079_v01.wav |
https://sites.google.com/site/shinnosuketakamichi/publication/jsut
Sonobe, R., Takamichi, S., & Saruwatari, H. (2017). JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis. arXiv preprint arXiv:1711.00354.
speaker | predicted | ground-truth | file name |
---|---|---|---|
Female | | | ONOMATOPEE300_107.wav |
Female | | | REPEAT500_set4_060.wav |
Female | | | TRAVEL1000_0670.wav |
Female | | | UT-PARAPHRASE-sent124-phrase2.wav |
Sodimana, K., Pipatsrisawat, K., Ha, L., Jansche, M., Kjartansson, O., De Silva, P., & Sarin, S. (2018). A step-by-step process for building TTS voices using open source data and framework for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese.
speaker | language | predicted | ground-truth | file name |
---|---|---|---|---|
Female 1 | Javanese | | | jvf_08305_00814037052.wav |
Male 1 | Javanese | | | jvm_03424_01297287738.wav |
Female 1 | Khmer | | | khm_3154_0157853181.wav |
Female 1 | Khmer | | | khm_6753_3404534535.wav |
Female 1 | Nepali | | | nep_0546_7054581764.wav |
Female 1 | Nepali | | | nep_3614_7960099494.wav |
Female 1 | Sundanese | | | suf_02395_01693235787.wav |
Male 1 | Sundanese | | | sum_05186_00415408849.wav |