
Query-by-Example Spoken Term Detection ALBAYZIN 2012 evaluation: overview, systems, results and discussion

RiuNet: Institutional repository of the Polytechnic University of Valencia


Tejedor, J.; Toledano, DT.; Anguera, X.; Varona, A.; Hurtado Oliver, LF.; Miguel, A.; Colás, J. (2013). Query-by-Example Spoken Term Detection ALBAYZIN 2012 evaluation: overview, systems, results and discussion. EURASIP Journal on Audio, Speech, and Music Processing. (23):1-17. doi:10.1186/1687-4722-2013-23

Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/40402


Item Metadata

Title: Query-by-Example Spoken Term Detection ALBAYZIN 2012 evaluation: overview, systems, results and discussion
Author: Tejedor, J.; Toledano, DT.; Anguera, X.; Varona, A.; Hurtado Oliver, LF.; Miguel, A.; Colás, J.
UPV Unit: Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
Issued date: 2013
Abstract:
Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. Nowadays, it has been receiving much interest due ...
Subjects: Query-by-example, Spoken term detection, International evaluation, Search on spontaneous speech
Copyrights: Attribution (by)
Source: EURASIP Journal on Audio, Speech, and Music Processing (ISSN: 1687-4722)
DOI: 10.1186/1687-4722-2013-23
Publisher: SpringerOpen
Publisher version: http://link.springer.com/article/10.1186/1687-4722-2013-23
Description: The final publication is available at Springer via http://dx.doi.org/10.1186/1687-4722-2013-23
Type: Article

