Language variety identification using distributed representations of words and documents

Franco Salvador, Marc; Rangel, Francisco; Rosso, Paolo; Taulé, Mariona; Martí, M. Antònia

doi:10.1007/978-3-319-24027-5_3

Files in this item

Name: LanguageVarietyId ...

Size: 411.7Kb

Format: PDF

Description: Author's version.

Name: LanguageVarietyId ...

Size: 371.8Kb

Format: PDF

Description: Publisher's version.

dc.contributor.author	Franco Salvador, Marc	es_ES
dc.contributor.author	Rangel, Francisco	es_ES
dc.contributor.author	Rosso, Paolo	es_ES
dc.contributor.author	Taulé, Mariona	es_ES
dc.contributor.author	Martí, M. Antònia	es_ES
dc.date.accessioned	2016-05-19T09:41:48Z
dc.date.available	2016-05-19T09:41:48Z
dc.date.issued	2015-11-20
dc.identifier.isbn	978-3-319-24026-8
dc.identifier.issn	0302-9743
dc.identifier.uri	http://hdl.handle.net/10251/64372
dc.description	The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-24027-5_3	es_ES
dc.description.abstract	In this work we focus on the use of distributed representations of words and documents using the continuous Skip-gram model. We compare this model with three recent approaches: Information Gain Word-Patterns, TF-IDF graphs and Emotion-labeled Graphs, in addition to several baselines. We evaluate the models introducing the Hispablogs dataset, a new collection of Spanish blogs from five different countries: Argentina, Chile, Mexico, Peru and Spain. Experimental results show state-of-the-art performance in language variety identification.	es_ES
dc.description.sponsorship	This research has been carried out within the framework of the European Commission WIQ-EI IRSES (no. 269180) and DIANA - Finding Hidden Knowledge in Texts (TIN2012-38603-C02) projects. The work of the second author was partially funded by Autoritas Consulting SA and by the Spanish Ministry of Economics by means of an ECOPORTUNITY IPT-2012-1220-430000 grant.	es_ES
dc.language	English	es_ES
dc.publisher	Springer International Publishing	es_ES
dc.relation.ispartof	Experimental IR Meets Multilinguality, Multimodality, and Interaction: 6th International Conference of the CLEF Association, CLEF'15, Toulouse, France, September 8-11, 2015, Proceedings	es_ES
dc.relation.ispartofseries	Lecture Notes in Computer Science;9283
dc.rights	All rights reserved	es_ES
dc.subject	Author profiling	es_ES
dc.subject	Language variety identification	es_ES
dc.subject	Distributed representations	es_ES
dc.subject	Information Gain Word-Patterns	es_ES
dc.subject	TF-IDF graphs	es_ES
dc.subject	Emotion-labeled Graphs	es_ES
dc.subject.classification	LENGUAJES Y SISTEMAS INFORMATICOS	es_ES
dc.title	Language variety identification using distributed representations of words and documents	es_ES
dc.type	Book chapter	es_ES
dc.identifier.doi	10.1007/978-3-319-24027-5_3
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//IPT-2012-1220-430000/ES/ECOPORTUNITY/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Franco Salvador, M.; Rangel, F.; Rosso, P.; Taulé, M.; Martí, MA. (2015). Language variety identification using distributed representations of words and documents. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 6th International Conference of the CLEF Association, CLEF'15, Toulouse, France, September 8-11, 2015, Proceedings. Springer International Publishing. 28-40. https://doi.org/10.1007/978-3-319-24027-5_3	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://link.springer.com/chapter/10.1007/978-3-319-24027-5_3	es_ES
dc.description.upvformatpinicio	28	es_ES
dc.description.upvformatpfin	40	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.relation.senia	303139	es_ES
dc.contributor.funder	European Commission	es_ES
dc.contributor.funder	Ministerio de Educación y Ciencia	es_ES
dc.contributor.funder	Autoritas Consulting, S.A.	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.description.references	Barto, A.G.: Reinforcement learning: An introduction. MIT press (1998)	es_ES
dc.description.references	Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. The Journal of Machine Learning Research 3, 1137–1155 (2003)	es_ES
dc.description.references	Dumais, S.T.: Latent semantic analysis. Annual Review of Information Science and Technology 38(1), 188–230 (2004)	es_ES
dc.description.references	Gutmann, M.U., Hyvärinen, A.: Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. The Journal of Machine Learning Research 13(1), 307–361 (2012)	es_ES
dc.description.references	Hinton, G.E., McClelland, J.L., Rumelhart, D.E.: Distributed representations. In: Rumelhart, D.E., McClelland, J.L., (eds.) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press (1986)	es_ES
dc.description.references	Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the International Conference on Empirical Methods in Natural Language Processing (2014)	es_ES
dc.description.references	Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning (2014)	es_ES
dc.description.references	Levin, B.: English verb classes and alternations. University of Chicago Press, Chicago (1993)	es_ES
dc.description.references	Maier, W., Gómez-Rodríguez, C.: Language variety identification in Spanish tweets. In: Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants, pp. 25–35. Association for Computational Linguistics, Doha, Qatar, October 2014. http://emnlp2014.org/workshops/LT4CloseLang/call.html	es_ES
dc.description.references	Martí, M.A., Bertran, M., Taulé, M., Salamó, M.: Distributional approach based on syntactic dependencies for discovering constructions. Computational Linguistics (2015, under review)	es_ES
dc.description.references	Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of Workshop at International Conference on Learning Representations (2013)	es_ES
dc.description.references	Mikolov, T., Karafiát, M., Burget, L., Černocký, J., Khudanpur, S.: Recurrent neural network based language model. In: INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, pp. 1045–1048, September 26–30, 2010	es_ES
dc.description.references	Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, vol. 26, pp. 3111–3119 (2013)	es_ES
dc.description.references	Mnih, A., Teh, Y.W.: A fast and simple algorithm for training neural probabilistic language models. arXiv preprint arXiv:1206.6426 (2012)	es_ES
dc.description.references	Mohammad, S.M., Yang, T.: Tracking sentiment in mail: how gender differ on emotional axes. In: Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (2011)	es_ES
dc.description.references	Morin, F., Bengio, Y.: Hierarchical probabilistic neural network language model. In: Proceedings of the International Workshop on Artificial Intelligence and Statistics, pp. 246–252. Citeseer (2005)	es_ES
dc.description.references	Pennebaker, J.W.: The secret life of pronouns: What our words say about us. Bloomsbury Press (2011)	es_ES
dc.description.references	Rangel, F., Rosso, P.: On the impact of emotions on author profiling. Information Processing & Management, Special Issue on Emotion and Sentiment in Social and Expressive Media (2015, in press)	es_ES
dc.description.references	Rangel, F., Rosso, P., Chugur, I., Potthast, M., Trenkmann, M., Stein, B., Verhoeven, B., Daelemans, W.: Overview of the 2nd author profiling task at pan 2014. In: Cappellato, L., Ferro, N., Halvey, M., Kraaij, W. (eds.) CLEF 2014 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1180 (2014)	es_ES
dc.description.references	Rangel, F., Rosso, P., Koppel, M., Stamatatos, E., Inches, G.: Overview of the author profiling task at pan 2013. In: Forner P., Navigli R., Tufis, D. (eds.) Notebook Papers of CLEF 2013 LABs and Workshops. CEUR-WS.org, vol. 1179 (2013)	es_ES
dc.description.references	Sadat, F., Kazemi, F., Farzindar, A.: Automatic identification of arabic language varieties and dialects in social media. In: Proceeding of the 1st International Workshop on Social Media Retrieval and Analysis SoMeRa (2014)	es_ES
dc.description.references	Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18(11), 613–620 (1975)	es_ES
dc.description.references	Sidorov, G., Miranda-Jiménez, S., Viveros-Jiménez, F., Gelbukh, F., Castro-Sánchez, N., Velásquez, F., Díaz-Rangel, I., Suárez-Guerra, S., Treviño, A., Gordon-Miranda, J.: Empirical study of opinion mining in Spanish tweets. In: 11th Mexican International Conference on Artificial Intelligence, MICAI, pp. 1–4 (2012)	es_ES
dc.description.references	Zampieri, M., Gebrekidan-Gebre, B.: Automatic identification of language varieties: the case of portuguese. In: Proceedings of the Conference on Natural Language Processing (2012)	es_ES

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia