A hybrid approach for transliterated word-level language identification: CRF with post processing heuristics

Banerjee, Somnath; Kuila, Alapan; Roy, Aniruddha; Naskar, Sudip Kumar; Rosso, Paolo; Bandyopadhyay, Sivaji

doi:10.1145/2824864.2824876

Banerjee, S.; Kuila, A.; Roy, A.; Naskar, SK.; Rosso, P.; Bandyopadhyay, S. (2014). A hybrid approach for transliterated word-level language identification: CRF with post processing heuristics. In FIRE '14 Proceedings of the Forum for Information Retrieval Evaluation. ACM. 170-173. https://doi.org/10.1145/2824864.2824876

Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/66381

Files in this item

Name: Banerjee_TS_EtAl_ ...

Size: 190.3Kb

Format: PDF

Description: Author's version

Name: p54-banerjee.pdf

Size: 217.5Kb

Format: PDF

Description: Publisher's version

Item metadata

Title:

A hybrid approach for transliterated word-level language identification: CRF with post processing heuristics

Authors:

Banerjee, Somnath; Kuila, Alapan; Roy, Aniruddha; Naskar, Sudip Kumar; Rosso, Paolo; Bandyopadhyay, Sivaji

UPV entity:

Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació

Release date:

2014-12-05

Abstract:

[EN] In this paper, we describe a hybrid approach for word-level language (WLL) identification of Bangla words written in Roman script and mixed with English words as part of our participation in the shared task on ...

Keywords:

Code switch, Transliteration, Word-Level Language Identification

Usage rights:

All rights reserved

ISBN:

978-1-4503-3755-7

Source:

FIRE '14 Proceedings of the Forum for Information Retrieval Evaluation.

DOI:

10.1145/2824864.2824876

Publisher:

ACM

Publisher's version:

http://dx.doi.org/10.1145/2824864.2824876

Conference title:

6th Forum for Information Retrieval Evaluation (FIRE 2014)

Conference location:

Bangalore, India

Conference date:

December 5-7, 2014

Project code:

info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02-01/ES/DIANA-APPLICATIONS: FINDING HIDDEN KNOWLEDGE IN TEXTS: APPLICATIONS/
info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/

Description:

© {Owner/Author | ACM} {Year}. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in FIRE '14 Proceedings of the Forum for Information Retrieval Evaluation, http://dx.doi.org/10.1145/2824864.2824876

Acknowledgements:

We acknowledge the support of the Department of Electronics and Information Technology (DeitY), Government of India, through the project “CLIA System Phase II”. The research work of the last author was carried out in the ...

Type:

Book chapter; Conference paper

