Citation Export
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Cho, Deok Hyeon | - |
| dc.contributor.author | Oh, Hyung Seok | - |
| dc.contributor.author | Kim, Seung Bin | - |
| dc.contributor.author | Lee, Sang Hoon | - |
| dc.contributor.author | Lee, Seong Whan | - |
| dc.date.issued | 2024-01-01 | - |
| dc.identifier.issn | 1990-9772 | - |
| dc.identifier.uri | https://aurora.ajou.ac.kr/handle/2018.oak/38119 | - |
| dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85206434807&origin=inward | - |
| dc.description.abstract | Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressive emotional speech by using a spherical emotion vector to control the emotional style and intensity of the synthetic speech. Without any human annotation, we use the arousal, valence, and dominance pseudo-labels to model the complex nature of emotion via a Cartesian-spherical transformation. Furthermore, we propose a dual conditional adversarial network to improve the quality of generated speech by reflecting the multi-aspect characteristics. The experimental results demonstrate the model's ability to control emotional style and intensity with high-quality expressive speech. | - |
| dc.description.sponsorship | This work was partly supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program (Korea University), No. 2021-0-02068, Artificial Intelligence Innovation Hub, and AI Technology for Interactive Communication of Language Impaired Individuals). | - |
| dc.language.iso | eng | - |
| dc.publisher | International Speech Communication Association | - |
| dc.subject.mesh | Complex nature | - |
| dc.subject.mesh | Emotional speech | - |
| dc.subject.mesh | Emotional speech synthesis | - |
| dc.subject.mesh | Emotional style and intensity control | - |
| dc.subject.mesh | Expressive emotional speech synthesis | - |
| dc.subject.mesh | Human annotations | - |
| dc.subject.mesh | Intensity models | - |
| dc.subject.mesh | Speech emotions | - |
| dc.subject.mesh | Synthetic speech | - |
| dc.subject.mesh | Text to speech | - |
| dc.title | EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech | - |
| dc.type | Conference | - |
| dc.citation.conferenceDate | 2024.09.01.~2024.09.05. | - |
| dc.citation.conferenceName | 25th Interspeech Conference 2024 | - |
| dc.citation.edition | Interspeech 2024 | - |
| dc.citation.endPage | 1814 | - |
| dc.citation.startPage | 1810 | - |
| dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
| dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp.1810-1814 | - |
| dc.identifier.doi | 10.21437/interspeech.2024-398 | - |
| dc.identifier.scopusid | 2-s2.0-85206434807 | - |
| dc.identifier.url | https://www.isca-speech.org/iscaweb/index.php/online-archive | - |
| dc.subject.keyword | emotional style and intensity control | - |
| dc.subject.keyword | expressive emotional speech synthesis | - |
| dc.subject.keyword | Text-to-speech | - |
| dc.type.other | Conference Paper | - |
| dc.identifier.pissn | 2308-457X | - |
| dc.description.isoa | true | - |
| dc.subject.subarea | Language and Linguistics | - |
| dc.subject.subarea | Human-Computer Interaction | - |
| dc.subject.subarea | Signal Processing | - |
| dc.subject.subarea | Software | - |
| dc.subject.subarea | Modeling and Simulation | - |
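
The abstract above describes modeling emotion by applying a Cartesian-spherical transformation to arousal, valence, and dominance (AVD) pseudo-labels, so that the radial distance captures emotion intensity and the angles capture emotion style. The snippet below is a minimal sketch of such a transformation, not the paper's exact formulation; the neutral-centered reference point, the value ranges, and the function name `avd_to_spherical` are illustrative assumptions.

```python
import numpy as np

def avd_to_spherical(avd, neutral_center):
    """Map an (arousal, valence, dominance) pseudo-label to a spherical
    emotion vector (r, theta, phi).

    r            : distance from the neutral reference point (intensity)
    theta, phi   : polar and azimuthal angles (emotion style)
    """
    # Shift the Cartesian AVD point so the neutral reference sits at the
    # origin (centering on a neutral point is an assumption of this sketch).
    x, y, z = np.asarray(avd, dtype=float) - np.asarray(neutral_center, dtype=float)

    r = np.sqrt(x**2 + y**2 + z**2)              # radial distance -> intensity
    theta = np.arccos(z / r) if r > 0 else 0.0   # polar angle
    phi = np.arctan2(y, x)                       # azimuthal angle
    return r, theta, phi

# Example with a made-up AVD pseudo-label in [0, 1], neutral assumed at 0.5.
r, theta, phi = avd_to_spherical(avd=(0.81, 0.32, 0.65),
                                 neutral_center=(0.5, 0.5, 0.5))
print(f"intensity r={r:.3f}, theta={theta:.3f} rad, phi={phi:.3f} rad")
```

Scaling `r` up or down would correspond to strengthening or weakening the emotion while holding the angular (style) components fixed, which is the kind of intensity control the abstract refers to.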