MSIR@FIRE: A Comprehensive Report from 2013 to 2016

Banerjee, Somnath; Choudhury, Monojit; Chakma, Kunal; Kumar Naskar, Sudip; Das, Amitava; Bandyopadhyay, Sivaji; Rosso, Paolo

doi:10.1007/s42979-019-0058-0

Files in this item

Name: BanerjeeChoudhury ...

Size: 409.0 KB

Format: PDF

Description: Author's version.

Name: FIRE_Special_Issu ...

Size: 1.068 MB

Format: PDF

Description: Publisher's version.

dc.contributor.author	Banerjee, Somnath	es_ES
dc.contributor.author	Choudhury, Monojit	es_ES
dc.contributor.author	Chakma, Kunal	es_ES
dc.contributor.author	Kumar Naskar, Sudip	es_ES
dc.contributor.author	Das, Amitava	es_ES
dc.contributor.author	Bandyopadhyay, Sivaji	es_ES
dc.contributor.author	Rosso, Paolo	es_ES
dc.date.accessioned	2021-11-05T14:10:50Z
dc.date.available	2021-11-05T14:10:50Z
dc.date.issued	2020	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/176432
dc.description.abstract	[EN] India is a nation of geographical and cultural diversity where people speak over 1600 dialects. With technological advancement, growing internet penetration, and cheaper access to mobile data, India has recently seen a sudden growth in internet users. These users generate content either in English or in vernacular Indian languages. To develop technological solutions for content generated by Indian users in Indian languages, the Forum for Information Retrieval Evaluation (FIRE) was established and held for the first time in 2008. Although Indian languages are written in indigenous scripts, websites and user-generated content (such as tweets and blogs) in these languages are often written in Roman script for various socio-cultural and technological reasons. A challenge that search engines face when processing transliterated queries and documents is extensive spelling variation. The MSIR track was first introduced at FIRE in 2013. Its aim was to systematically formalize several research problems that must be solved to tackle code mixing in Web search for users of many languages around the world, to develop related data sets and test benches and, most importantly, to build a research community focused on this important problem, which has received very little attention. This document is a comprehensive report on the four years of the MSIR track evaluated at FIRE between 2013 and 2016.	es_ES
dc.description.sponsorship	Somnath Banerjee and Sudip Kumar Naskar are supported by Media Lab Asia, MeitY, Government of India, under the Visvesvaraya PhD Scheme for Electronics & IT. The work of Paolo Rosso was partially supported by the MISMIS research project PGC2018-096212-B-C31 funded by the Spanish MICINN.	es_ES
dc.language	English	es_ES
dc.publisher	Springer	es_ES
dc.relation.ispartof	SN Computer Science	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Information retrieval	es_ES
dc.subject	Indian languages	es_ES
dc.subject	Social media	es_ES
dc.subject	Transliterated search	es_ES
dc.subject	Code-mixed QA	es_ES
dc.subject.classification	LENGUAJES Y SISTEMAS INFORMATICOS	es_ES
dc.title	MSIR@FIRE: A Comprehensive Report from 2013 to 2016	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1007/s42979-019-0058-0	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PGC2018-096212-B-C31/ES/DESINFORMACION Y AGRESIVIDAD EN SOCIAL MEDIA: AGREGANDO INFORMACION Y ANALIZANDO EL LENGUAJE/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Banerjee, S.; Choudhury, M.; Chakma, K.; Kumar Naskar, S.; Das, A.; Bandyopadhyay, S.; Rosso, P. (2020). MSIR@FIRE: A Comprehensive Report from 2013 to 2016. SN Computer Science. 1(55):1-15. https://doi.org/10.1007/s42979-019-0058-0	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1007/s42979-019-0058-0	es_ES
dc.description.upvformatpinicio	1	es_ES
dc.description.upvformatpfin	15	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	1	es_ES
dc.description.issue	55	es_ES
dc.identifier.eissn	2661-8907	es_ES
dc.relation.pasarela	S\434253	es_ES
dc.contributor.funder	AGENCIA ESTATAL DE INVESTIGACION	es_ES

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
