Query-by-Example Spoken Term Detection ALBAYZIN 2012 evaluation: overview, systems, results and discussion

Tejedor, Javier; Toledano, Doroteo T.; Anguera, Xavier; Varona, Amparo; Hurtado Oliver, Lluis Felip; Miguel, Antonio; Colás, José

doi:10.1186/1687-4722-2013-23

Files in this item

Name: Tejedor;Toledano; ...

Size: 810.6Kb

Format: PDF

Description: Publisher's version

dc.contributor.author	Tejedor, Javier	es_ES
dc.contributor.author	Toledano, Doroteo T.	es_ES
dc.contributor.author	Anguera, Xavier	es_ES
dc.contributor.author	Varona, Amparo	es_ES
dc.contributor.author	Hurtado Oliver, Lluis Felip	es_ES
dc.contributor.author	Miguel, Antonio	es_ES
dc.contributor.author	Colás, José	es_ES
dc.date.accessioned	2014-09-29T11:42:50Z
dc.date.available	2014-09-29T11:42:50Z
dc.date.issued	2013-09-17
dc.identifier.issn	1687-4722
dc.identifier.uri	http://hdl.handle.net/10251/40402
dc.description	The final publication is available at Springer via http://dx.doi.org/10.1186/1687-4722-2013-23	es_ES
dc.description.abstract	Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. It has recently received much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) and from keyword spotting (KWS)/spoken term detection (STD): ASR is interested in all the terms/words that appear in the speech signal, and KWS/STD relies on a textual transcription of the search term to retrieve the speech data. This paper presents the systems submitted to the ALBAYZIN 2012 QbE STD evaluation, held as part of the ALBAYZIN 2012 evaluation campaign within the context of the IberSPEECH 2012 Conference. The evaluation consists of retrieving the speech files that contain the input queries, indicating their start and end timestamps within the appropriate speech file. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from MAVIR workshops, which amount to about 7 h of speech in total. We present the database, the metric, and the systems submitted, along with all results and some discussion. Four different research groups took part in the evaluation. Evaluation results show the difficulty of this task, and the limited performance indicates there is still a lot of room for improvement. The best result is achieved by a dynamic time warping-based search over Gaussian posteriorgrams/posterior phoneme probabilities. This paper also compares the systems, aiming to establish the best technique for this difficult task and to define promising directions for this relatively novel task.	es_ES
dc.language	Spanish	es_ES
dc.publisher	SpringerOpen	es_ES
dc.relation.ispartof	EURASIP Journal on Audio, Speech, and Music Processing	es_ES
dc.rights	Attribution (by)	es_ES
dc.subject	Query-by-example	es_ES
dc.subject	Spoken term detection	es_ES
dc.subject	International evaluation	es_ES
dc.subject	Search on spontaneous speech	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	Query-by-Example Spoken Term Detection ALBAYZIN 2012 evaluation: overview, systems, results and discussion	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1186/1687-4722-2013-23
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Tejedor, J.; Toledano, DT.; Anguera, X.; Varona, A.; Hurtado Oliver, LF.; Miguel, A.; Colás, J. (2013). Query-by-Example Spoken Term Detection ALBAYZIN 2012 evaluation: overview, systems, results and discussion. EURASIP Journal on Audio, Speech, and Music Processing. (23):1-17. doi:10.1186/1687-4722-2013-23	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://link.springer.com/article/10.1186/1687-4722-2013-23	es_ES
dc.description.upvformatpinicio	1	es_ES
dc.description.upvformatpfin	17	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.issue	23	es_ES
dc.relation.senia	261848

This item appears in the following collection(s)

Articles, conference papers, monographs [48360]

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia