Querying out-of-vocabulary words in lexicon-based keyword spotting

RiuNet: Institutional repository of the Polytechnic University of Valencia

dc.contributor.author Puigcerver, Joan es_ES
dc.contributor.author Toselli, Alejandro Héctor es_ES
dc.contributor.author Vidal, Enrique es_ES
dc.date.accessioned 2017-06-09T10:47:46Z
dc.date.available 2017-06-09T10:47:46Z
dc.date.issued 2016-02
dc.identifier.issn 0941-0643
dc.identifier.uri http://hdl.handle.net/10251/82643
dc.description The final publication is available at Springer via http://dx.doi.org/10.1007/s00521-016-2197-8 es_ES
dc.description.abstract [EN] Lexicon-based handwritten text keyword spotting (KWS) has proven to be a faster and more accurate alternative to lexicon-free methods. Nevertheless, since lexicon-based KWS relies on a predefined vocabulary, fixed in the training phase, it does not support queries involving out-of-vocabulary (OOV) keywords. In this paper, we outline previous work aimed at solving this problem and present a new approach based on smoothing the (null) scores of OOV keywords by means of the information provided by "similar" in-vocabulary words. Good results achieved using this approach are compared with previously published alternatives on different data sets. es_ES
dc.description.sponsorship This work was partially supported by the Spanish MEC under FPU Grant FPU13/06281, by the Generalitat Valenciana under the Prometeo/2009/014 Project Grant ALMA-MATER, and through the EU Projects: HIMANIS (JPICH programme, Spanish grant Ref. PCIN-2015-068) and READ (Horizon-2020 programme, grant Ref. 674943). en_EN
dc.language English es_ES
dc.publisher Springer Verlag (Germany) es_ES
dc.relation MICINN/PCIN-2015-068 es_ES
dc.relation MEC/FPU13/06281 es_ES
dc.relation GV/PROMETEO/2009/014 es_ES
dc.relation.ispartof Neural Computing and Applications es_ES
dc.rights All rights reserved es_ES
dc.subject Keyword spotting es_ES
dc.subject Lexicon-based es_ES
dc.subject Smoothing es_ES
dc.subject Out-of-vocabulary es_ES
dc.subject Handwritten text recognition es_ES
dc.subject.classification Computer Languages and Systems es_ES
dc.title Querying out-of-vocabulary words in lexicon-based keyword spotting es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s00521-016-2197-8
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/674943/EU es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Puigcerver, J.; Toselli, AH.; Vidal, E. (2016). Querying out-of-vocabulary words in lexicon-based keyword spotting. Neural Computing and Applications. 1-10. doi:10.1007/s00521-016-2197-8 es_ES
dc.description.accrualMethod Senia es_ES
dc.relation.publisherversion https://link.springer.com/article/10.1007/s00521-016-2197-8 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 10 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.senia 303254 es_ES
dc.contributor.funder European Commission
dc.contributor.funder Ministerio de Educación y Ciencia (MEC)
dc.contributor.funder Generalitat Valenciana (GV)
dc.contributor.funder Ministerio de Ciencia e Innovación (MICINN)