Show simple item record
dc.contributor.author | Piqueras Gozalbes, Santiago Romualdo | es_ES |
dc.contributor.author | Del Agua Teba, Miguel Angel | es_ES |
dc.contributor.author | Giménez Pastor, Adrián | es_ES |
dc.contributor.author | Civera Saiz, Jorge | es_ES |
dc.contributor.author | Juan Císcar, Alfonso | es_ES |
dc.date.accessioned | 2015-05-19T09:44:47Z | |
dc.date.available | 2015-05-19T09:44:47Z | |
dc.date.issued | 2014 | |
dc.identifier.isbn | 978-3-319-13622-6 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.uri | http://hdl.handle.net/10251/50446 | |
dc.description | The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-13623-3_5 | es_ES |
dc.description.abstract | Online multimedia repositories are growing rapidly. However, language barriers are often difficult to overcome for many of the current and potential users. In this paper we describe a TTS Spanish system and we apply it to the synthesis of transcribed and translated video lectures. A statistical parametric speech synthesis system, in which the acoustic mapping is performed with either HMM-based or DNN-based acoustic models, has been developed. To the best of our knowledge, this is the first time that a DNN-based TTS system has been implemented for the synthesis of Spanish. A comparative objective evaluation between both models has been carried out. Our results show that DNN-based systems can reconstruct speech waveforms more accurately. | es_ES |
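For context on the DNN-based acoustic mapping mentioned in the abstract, the sketch below illustrates the general idea behind such systems: a feed-forward network regresses frame-level linguistic features onto acoustic parameters that a vocoder later turns into a waveform. This is a minimal illustrative sketch, not the authors' system; the layer sizes, feature dimensions, training loop, and random placeholder data are all assumptions made for the example.

# Minimal sketch (assumptions throughout, NOT the paper's implementation): a one-hidden-layer
# feed-forward DNN mapping linguistic feature vectors to acoustic parameters (e.g. mel-cepstra
# plus log-F0), trained by minimising mean squared error, in the spirit of DNN-based
# statistical parametric speech synthesis.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder corpus: 1000 frames, 300-dim linguistic inputs, 41-dim acoustic targets.
X = rng.standard_normal((1000, 300))
Y = rng.standard_normal((1000, 41))

# One hidden layer with tanh units; real systems use several larger layers.
W1 = rng.standard_normal((300, 256)) * 0.01
b1 = np.zeros(256)
W2 = rng.standard_normal((256, 41)) * 0.01
b2 = np.zeros(41)

lr = 1e-3
for epoch in range(50):
    H = np.tanh(X @ W1 + b1)      # hidden activations
    P = H @ W2 + b2               # predicted acoustic parameters
    err = P - Y
    loss = np.mean(err ** 2)      # MSE training criterion

    # Backpropagation of the MSE loss.
    dP = 2 * err / err.size
    dW2 = H.T @ dP
    db2 = dP.sum(axis=0)
    dH = (dP @ W2.T) * (1 - H ** 2)
    dW1 = X.T @ dH
    db1 = dH.sum(axis=0)

    # Plain gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# At synthesis time the predicted acoustic parameters would be passed to a vocoder
# (the paper's references mention Ahocoder) to reconstruct the speech waveform.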
dc.description.sponsorship | The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 287755 (transLectures) and ICT Policy Support Programme (ICT PSP/2007-2013) as part of the Competitiveness and Innovation Framework Programme (CIP) under grant agreement no 621030 (EMMA), and the Spanish MINECO Active2Trans (TIN2012-31723) research project. | es_ES |
dc.language | English | es_ES |
dc.publisher | Springer International Publishing | es_ES |
dc.relation.ispartof | Advances in Speech and Language Technologies for Iberian Languages: Second International Conference, IberSPEECH 2014, Las Palmas de Gran Canaria, Spain, November 19-21, 2014. Proceedings | es_ES |
dc.relation.ispartofseries | Lecture Notes in Computer Science;8854 | |
dc.rights | All rights reserved | es_ES |
dc.subject | Video lectures | es_ES |
dc.subject | Text-to-speech synthesis | es_ES |
dc.subject | Accessibility | es_ES |
dc.subject.classification | LENGUAJES Y SISTEMAS INFORMATICOS | es_ES |
dc.title | Statistical text-to-speech synthesis of Spanish subtitles | es_ES |
dc.type | Book chapter | es_ES |
dc.identifier.doi | 10.1007/978-3-319-13623-3_5 | |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/FP7/287755/EU/Transcription and Translation of Video Lectures/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/CIP/621030/EU/European Multiple MOOC Aggregator/EMMA/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2012-31723/ES/INTERACCION ACTIVA PARA TRANSCRIPCION DE HABLA Y TRADUCCION/ | es_ES |
dc.rights.accessRights | Open access | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.description.bibliographicCitation | Piqueras Gozalbes, SR.; Del Agua Teba, MA.; Giménez Pastor, A.; Civera Saiz, J.; Juan Císcar, A. (2014). Statistical text-to-speech synthesis of Spanish subtitles. In Advances in Speech and Language Technologies for Iberian Languages: Second International Conference, IberSPEECH 2014, Las Palmas de Gran Canaria, Spain, November 19-21, 2014. Proceedings. Springer International Publishing. 40-48. https://doi.org/10.1007/978-3-319-13623-3_5 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | http://link.springer.com/chapter/10.1007/978-3-319-13623-3_5 | es_ES |
dc.description.upvformatpinicio | 40 | es_ES |
dc.description.upvformatpfin | 48 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.relation.senia | 280326 | |
dc.contributor.funder | European Commission | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
dc.description.references | Ahocoder, http://aholab.ehu.es/ahocoder | es_ES |
dc.description.references | Coursera, http://www.coursera.org | es_ES |
dc.description.references | HMM-Based Speech Synthesis System (HTS), http://hts.sp.nitech.ac.jp | es_ES |
dc.description.references | Khan Academy, http://www.khanacademy.org | es_ES |
dc.description.references | Axelrod, A., He, X., Gao, J.: Domain adaptation via pseudo in-domain data selection. In: Proc. of EMNLP, pp. 355–362 (2011) | es_ES |
dc.description.references | Bottou, L.: Stochastic gradient learning in neural networks. In: Proceedings of Neuro-Nîmes 1991. EC2, Nîmes, France (1991) | es_ES |
dc.description.references | Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 30–42 (2012) | es_ES |
dc.description.references | Erro, D., Sainz, I., Navas, E., Hernaez, I.: Harmonics plus noise model based vocoder for statistical parametric speech synthesis. IEEE Journal of Selected Topics in Signal Processing 8(2), 184–194 (2014) | es_ES |
dc.description.references | Fan, Y., Qian, Y., Xie, F., Soong, F.: TTS synthesis with bidirectional LSTM based recurrent neural networks. In: Proc. of Interspeech (submitted 2014) | es_ES |
dc.description.references | Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29(6), 82–97 (2012) | es_ES |
dc.description.references | Hunt, A.J., Black, A.W.: Unit selection in a concatenative speech synthesis system using a large speech database. In: Proc. of ICASSP, vol. 1, pp. 373–376 (1996) | es_ES |
dc.description.references | King, S.: Measuring a decade of progress in text-to-speech. Loquens 1(1), e006 (2014) | es_ES |
dc.description.references | Koehn, P.: Statistical Machine Translation. Cambridge University Press (2010) | es_ES |
dc.description.references | Kominek, J., Schultz, T., Black, A.W.: Synthesizer voice quality of new languages calibrated with mean mel cepstral distortion. In: Proc. of SLTU, pp. 63–68 (2008) | es_ES |
dc.description.references | Lopez, A.: Statistical machine translation. ACM Computing Surveys 40(3), 8:1–8:49 (2008) | es_ES |
dc.description.references | poliMedia: The polimedia video-lecture repository (2007), http://media.upv.es | es_ES |
dc.description.references | Sainz, I., Erro, D., Navas, E., Hernáez, I., Sánchez, J., Saratxaga, I.: Aholab speech synthesizer for Albayzin 2012 speech synthesis evaluation. In: Proc. of IberSPEECH, pp. 645–652 (2012) | es_ES |
dc.description.references | Seide, F., Li, G., Chen, X., Yu, D.: Feature engineering in context-dependent DNN for conversational speech transcription. In: Proc. of ASRU, pp. 24–29 (2011) | es_ES |
dc.description.references | Shinoda, K., Watanabe, T.: MDL-based context-dependent subword modeling for speech recognition. Journal of the Acoustical Society of Japan 21(2), 79–86 (2000) | es_ES |
dc.description.references | Silvestre-Cerdà, J.A., et al.: Translectures. In: Proc. of IberSPEECH, pp. 345–351 (2012) | es_ES |
dc.description.references | TED Ideas worth spreading, http://www.ted.com | es_ES |
dc.description.references | The transLectures-UPV Team.: The transLectures-UPV toolkit (TLK), http://translectures.eu/tlk | es_ES |
dc.description.references | Toda, T., Black, A.W., Tokuda, K.: Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis. In: Proc. of ISCA Speech Synthesis Workshop (2004) | es_ES |
dc.description.references | Tokuda, K., Kobayashi, T., Imai, S.: Speech parameter generation from HMM using dynamic features. In: Proc. of ICASSP, vol. 1, pp. 660–663 (1995) | es_ES |
dc.description.references | Tokuda, K., Masuko, T., Miyazaki, N., Kobayashi, T.: Multi-space probability distribution HMM. IEICE Transactions on Information and Systems 85(3), 455–464 (2002) | es_ES |
dc.description.references | transLectures: D3.1.2: Second report on massive adaptation, http://www.translectures.eu/wp-content/uploads/2014/01/transLectures-D3.1.2-15Nov2013.pdf | es_ES |
dc.description.references | Turró, C., Ferrando, M., Busquets, J., Cañero, A.: Polimedia: a system for successful video e-learning. In: Proc. of EUNIS (2009) | es_ES |
dc.description.references | Videolectures.NET: Exchange ideas and share knowledge, http://www.videolectures.net | es_ES |
dc.description.references | Wu, Y.J., King, S., Tokuda, K.: Cross-lingual speaker adaptation for HMM-based speech synthesis. In: Proc. of ISCSLP, pp. 1–4 (2008) | es_ES |
dc.description.references | Yamagishi, J.: An introduction to HMM-based speech synthesis. Tech. rep. Centre for Speech Technology Research (2006), https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/TrajectoryModelling/HTS-Introduction.pdf | es_ES |
dc.description.references | Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T.: Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In: Proc. of Eurospeech, pp. 2347–2350 (1999) | es_ES |
dc.description.references | Zen, H., Senior, A.: Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis. In: Proc. of ICASSP, pp. 3872–3876 (2014) | es_ES |
dc.description.references | Zen, H., Senior, A., Schuster, M.: Statistical parametric speech synthesis using deep neural networks. In: Proc. of ICASSP, pp. 7962–7966 (2013) | es_ES |
dc.description.references | Zen, H., Tokuda, K., Black, A.W.: Statistical parametric speech synthesis. Speech Communication 51(11), 1039–1064 (2009) | es_ES |