Probabilistic multi-word spotting in handwritten text images

RiuNet: Institutional Repository of the Universitat Politècnica de València

dc.contributor.author Toselli, Alejandro Héctor es_ES
dc.contributor.author Vidal, Enrique es_ES
dc.contributor.author Puigcerver, Joan es_ES
dc.contributor.author Noya-García, Ernesto es_ES
dc.date.accessioned 2020-01-09T21:00:51Z
dc.date.available 2020-01-09T21:00:51Z
dc.date.issued 2019-05-02 es_ES
dc.identifier.issn 1433-7541 es_ES
dc.identifier.uri http://hdl.handle.net/10251/134140
dc.description.abstract [EN] Keyword spotting techniques are becoming cost-effective solutions for information retrieval in handwritten documents. We explore the extension of the single-word, line-level probabilistic indexing approach described in our previous work to allow for page-level search of queries consisting of Boolean combinations of several single keywords. We propose heuristic rules to combine the single-word relevance probabilities into probabilistically consistent confidence scores of the multi-word Boolean combinations. An empirical study, also presented in this paper, evaluates the search performance of word-pair queries involving AND and OR Boolean operations. Results of this study support the proposed approach and clearly show its effectiveness. Finally, a web-based demonstration system based on the proposed methods is presented. es_ES
dc.description.sponsorship This work was partially supported by the Generalitat Valenciana under the Prometeo/2009/014 Project Grant ALMAMATER, Spanish MEC under Grant FPU13/06281, and through the EU projects: HIMANIS (JPICH programme, Spanish grant Ref. PCIN-2015-068) and READ (Horizon-2020 programme, Grant Ref. 674943). es_ES
dc.language English es_ES
dc.publisher Springer-Verlag es_ES
dc.relation.ispartof Pattern Analysis and Applications es_ES
dc.rights All rights reserved es_ES
dc.subject Handwritten text processing es_ES
dc.subject Keyword spotting es_ES
dc.subject Multi-word Boolean queries es_ES
dc.subject Image processing es_ES
dc.subject Pattern recognition es_ES
dc.subject.classification LANGUAGES AND COMPUTER SYSTEMS es_ES
dc.subject.classification STATISTICS AND OPERATIONS RESEARCH es_ES
dc.title Probabilistic multi-word spotting in handwritten text images es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s10044-018-0742-z es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//PCIN-2015-068/ES/INDEXACION DE MANUSCRITOS HISTORICOS PARA BUSQUEDAS CONTROLADAS POR EL USUARIO/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/674943/EU/Recognition and Enrichment of Archival Documents/
dc.relation.projectID info:eu-repo/grantAgreement/Generalitat Valenciana//PROMETEO09%2F2009%2F014/ES/Adaptive learning and multimodality in pattern recognition (Almapater)/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MECD//FPU13%2F06281/ES/FPU13%2F06281/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Toselli, AH.; Vidal, E.; Puigcerver, J.; Noya-García, E. (2019). Probabilistic multi-word spotting in handwritten text images. Pattern Analysis and Applications. 22(1):23-32. https://doi.org/10.1007/s10044-018-0742-z es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1007/s10044-018-0742-z es_ES
dc.description.upvformatpinicio 23 es_ES
dc.description.upvformatpfin 32 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 22 es_ES
dc.description.issue 1 es_ES
dc.relation.pasarela S\372706 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Ministerio de Educación es_ES
dc.contributor.funder Ministerio de Economía y Empresa es_ES
dc.contributor.funder European Commission es_ES
dc.description.references Andreu Sanchez J, Romero V, Toselli A, Vidal E (2014) ICFHR2014 competition on handwritten text recognition on transcriptorium datasets (HTRtS). In: 14th International conference on frontiers in handwriting recognition (ICFHR), 2014, pp 785–790 es_ES
dc.description.references Bazzi I, Schwartz R, Makhoul J (1999) An omnifont open-vocabulary OCR system for English and Arabic. IEEE Trans Pattern Anal Mach Intell 21(6):495–504 es_ES
dc.description.references Bluche T, Hamel S, Kermorvant C, Puigcerver J, Stutzmann D, Toselli AH, Vidal E (2017) Preparatory KWS experiments for large-scale indexing of a vast medieval manuscript collection in the HIMANIS project. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol. 01, pp 311–316. https://doi.org/10.1109/ICDAR.2017.59 es_ES
dc.description.references Boole G (1854) An investigation of the laws of thought on which are founded the mathematical theories of logic and probabilities. Macmillan, New York es_ES
dc.description.references Causer T, Wallace V (2012) Building a volunteer community: results and findings from Transcribe Bentham. Digital Humanities Quarterly 6 es_ES
dc.description.references España-Boquera S, Castro-Bleda MJ, Gorbe-Moya J, Zamora-Martinez F (2011) Improving offline handwritten text recognition with hybrid HMM/ANN models. IEEE Trans Pattern Anal Mach Intell 33(4):767–779. https://doi.org/10.1109/TPAMI.2010.141 es_ES
dc.description.references Fischer A, Wüthrich M, Liwicki M, Frinken V, Bunke H, Viehhauser G, Stolz M (2009) Automatic transcription of handwritten medieval documents. In: 15th International conference on virtual systems and multimedia, 2009. VSMM ’09, pp 137–142. https://doi.org/10.1109/VSMM.2009.26 es_ES
dc.description.references Fréchet M (1935) Généralisations du théorème des probabilités totales. Seminarjum Matematyczne es_ES
dc.description.references Fréchet M (1951) Sur les tableaux de corrélation dont les marges sont données. Ann Univ Lyon 3e ser Sci Sect A 14:53–77 es_ES
dc.description.references Graves A, Liwicki M, Fernández S, Bertolami R, Bunke H, Schmidhuber J (2009) A novel connectionist system for unconstrained handwriting recognition. IEEE Trans Pattern Anal Mach Intell 31(5):855–868 es_ES
dc.description.references Jelinek F (1998) Statistical methods for speech recognition. MIT Press, Cambridge es_ES
dc.description.references Kneser R, Ney H (1995) Improved backing-off for N-gram language modeling. In: International conference on acoustics, speech and signal processing (ICASSP ’95), IEEE Computer Society, Los Alamitos, vol. 1, pp 181–184. https://doi.org/10.1109/ICASSP.1995.479394 es_ES
dc.description.references Kozielski M, Forster J, Ney H (2012) Moment-based image normalization for handwritten text recognition. In: Proceedings of the 2012 international conference on frontiers in handwriting recognition, ICFHR ’12, pp 256–261. IEEE Computer Society, Washington. https://doi.org/10.1109/ICFHR.2012.236 es_ES
dc.description.references Lavrenko V, Rath TM, Manmatha R (2004) Holistic word recognition for handwritten historical documents. In: Proceedings of the first international workshop on document image analysis for libraries, 2004, pp 278–287. https://doi.org/10.1109/DIAL.2004.1263256 es_ES
dc.description.references Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York es_ES
dc.description.references Marti UV, Bunke H (2002) The IAM-database: an English sentence database for offline handwriting recognition. Int J Doc Anal Recogn 5:39–46. https://doi.org/10.1007/s100320200071 es_ES
dc.description.references Noya-García E, Toselli AH, Vidal E (2017) Simple and effective multi-word query spotting in handwritten text images, pp 76–84. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-58838-4_9 es_ES
dc.description.references Pratikakis I, Zagoris K, Gatos B, Louloudis G, Stamatopoulos N (2014) ICFHR 2014 competition on handwritten keyword spotting (H-KWS 2014). In: 14th International conference on frontiers in handwriting recognition (ICFHR), 2014, pp 814–819 es_ES
dc.description.references Puigcerver J, Toselli AH, Vidal E (2015) ICDAR2015 competition on keyword spotting for handwritten documents. In: 13th international conference on document analysis and recognition (ICDAR), 2015, pp 1176–1180 es_ES
dc.description.references Riba P, Almazán J, Fornés A, Fernández-Mota D, Valveny E, Lladós J (2014) e-crowds: a mobile platform for browsing and searching in historical demography-related manuscripts. In: 14th International conference on frontiers in handwriting recognition (ICFHR), 2014, pp 228–233. https://doi.org/10.1109/ICFHR.2014.46 es_ES
dc.description.references Robertson S (2008) A new interpretation of average precision. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval (SIGIR ’08), pp 689–690. ACM, New York. https://doi.org/10.1145/1390334.1390453 es_ES
dc.description.references Romero V, Toselli AH, Vidal E (2012) Multimodal interactive handwritten text transcription. Series in machine perception and artificial intelligence (MPAI). World Scientific Publishing, Singapore es_ES
dc.description.references Sánchez JA, Romero V, Toselli AH, Vidal E (2016) ICFHR2016 competition on handwritten text recognition on the READ dataset. In: 15th International conference on frontiers in handwriting recognition (ICFHR’16), pp 630–635. https://doi.org/10.1109/ICFHR.2016.0120 es_ES
dc.description.references Toselli A, Vidal E (2015) Handwritten text recognition results on the Bentham collection with improved classical N-Gram-HMM methods. In: 3rd International workshop on historical document imaging and processing (HIP15), pp 15–22 es_ES
dc.description.references Toselli AH, Juan A, Keysers D, González J, Salvador I, Ney H, Vidal E, Casacuberta F (2004) Integrated Handwriting Recognition and Interpretation using Finite-State Models. Int J Pattern Recogn Artif Intell 18(4):519–539 es_ES
dc.description.references Toselli AH, Vidal E, Romero V, Frinken V (2016) HMM word graph based keyword spotting in handwritten document images. Inf Sci 370(C):497–518. https://doi.org/10.1016/j.ins.2016.07.063 es_ES
dc.description.references Vidal E, Toselli AH, Puigcerver J (2015) High performance query-by-example keyword spotting using query-by-string techniques. In: Proceedings of 13th ICDAR, pp 741–745 es_ES
dc.description.references Vidal E, Toselli AH, Puigcerver J (2017) Lexicon-based probabilistic keyword spotting in handwritten text images (to be published) es_ES
dc.description.references Vinciarelli A, Bengio S, Bunke H (2004) Off-line recognition of unconstrained handwritten texts using HMMs and statistical language models. IEEE Trans Pattern Anal Mach Intell 26(6):709–720 es_ES
dc.description.references Young S, Evermann G, Gales M, Hain T, Kershaw D (2009) The HTK book: hidden markov models toolkit V3.4. Microsoft Corporation and Cambridge Research Laboratory Ltd, Cambridge es_ES
dc.description.references Young S, Odell J, Ollason D, Valtchev V, Woodland P (1997) The HTK book: hidden markov models toolkit V2.1. Cambridge Research Laboratory Ltd, Cambridge es_ES
dc.description.references Zhu M (2004) Recall, precision and average precision. Working paper 2004-09 Department of Statistics and Actuarial Science–University of Waterloo es_ES

