Language variety identification using distributed representations of words and documents

RiuNet: Institutional Repository of the Universitat Politècnica de València

dc.contributor.author Franco Salvador, Marc es_ES
dc.contributor.author Rangel, Francisco es_ES
dc.contributor.author Rosso, Paolo es_ES
dc.contributor.author Taulé, Mariona es_ES
dc.contributor.author Martí, M. Antònia es_ES
dc.date.accessioned 2016-05-19T09:41:48Z
dc.date.available 2016-05-19T09:41:48Z
dc.date.issued 2015-11-20
dc.identifier.isbn 978-3-319-24026-8
dc.identifier.issn 0302-9743
dc.identifier.uri http://hdl.handle.net/10251/64372
dc.description The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-24027-5_3 es_ES
dc.description.abstract In this work we focus on the use of distributed representations of words and documents using the continuous Skip-gram model. We compare this model with three recent approaches: Information Gain Word-Patterns, TF-IDF graphs and Emotion-labeled Graphs, in addition to several baselines. We evaluate the models introducing the Hispablogs dataset, a new collection of Spanish blogs from five different countries: Argentina, Chile, Mexico, Peru and Spain. Experimental results show state-of-the-art performance in language variety identification. es_ES
dc.description.sponsorship This research has been carried out within the framework of the European Commission WIQ-EI IRSES (no. 269180) and DIANA - Finding Hidden Knowledge in Texts (TIN2012-38603-C02) projects. The work of the second author was partially funded by Autoritas Consulting SA and by the Spanish Ministry of Economics by means of an ECOPORTUNITY IPT-2012-1220-430000 grant. es_ES
dc.language English es_ES
dc.publisher Springer International Publishing es_ES
dc.relation.ispartof Experimental IR Meets Multilinguality, Multimodality, and Interaction: 6th International Conference of the CLEF Association, CLEF'15, Toulouse, France, September 8-11, 2015, Proceedings es_ES
dc.relation.ispartofseries Lecture Notes in Computer Science;9283
dc.rights All rights reserved es_ES
dc.subject Author profiling es_ES
dc.subject Language variety identification es_ES
dc.subject Distributed representations es_ES
dc.subject Information Gain Word-Patterns es_ES
dc.subject TF-IDF graphs es_ES
dc.subject Emotion-labeled Graphs es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Language variety identification using distributed representations of words and documents es_ES
dc.type Book chapter es_ES
dc.identifier.doi 10.1007/978-3-319-24027-5_3
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//IPT-2012-1220-430000/ES/ECOPORTUNITY/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Franco Salvador, M.; Rangel, F.; Rosso, P.; Taulé, M.; Martí, MA. (2015). Language variety identification using distributed representations of words and documents. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 6th International Conference of the CLEF Association, CLEF'15, Toulouse, France, September 8-11, 2015, Proceedings. Springer International Publishing. 28-40. https://doi.org/10.1007/978-3-319-24027-5_3 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://link.springer.com/chapter/10.1007/978-3-319-24027-5_3 es_ES
dc.description.upvformatpinicio 28 es_ES
dc.description.upvformatpfin 40 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.senia 303139 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder Ministerio de Educación y Ciencia es_ES
dc.contributor.funder Autoritas Consulting, S.A. es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.description.references Barto, A.G.: Reinforcement learning: An introduction. MIT press (1998) es_ES
dc.description.references Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. The Journal of Machine Learning Research 3, 1137–1155 (2003) es_ES
dc.description.references Dumais, S.T.: Latent semantic analysis. Annual Review of Information Science and Technology 38(1), 188–230 (2004) es_ES
dc.description.references Gutmann, M.U., Hyvärinen, A.: Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. The Journal of Machine Learning Research 13(1), 307–361 (2012) es_ES
dc.description.references Hinton, G.E., McClelland, J.L., Rumelhart, D.E.: Distributed representations. In: Rumelhart, D.E., McClelland, J.L., (eds.) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press (1986) es_ES
dc.description.references Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the International Conference on Empirical Methods in Natural Language Processing (2014) es_ES
dc.description.references Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning (2014) es_ES
dc.description.references Levin, B.: English verb classes and alternations. University of Chicago Press, Chicago (1993) es_ES
dc.description.references Maier, W., Gómez-Rodríguez, C.: Language variety identification in Spanish tweets. In: Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants, pp. 25–35. Association for Computational Linguistics, Doha, Qatar, October 2014. http://emnlp2014.org/workshops/LT4CloseLang/call.html es_ES
dc.description.references Martí, M.A., Bertran, M., Taulé, M., Salamó, M.: Distributional approach based on syntactic dependencies for discovering constructions. Computational Linguistics (2015, under review) es_ES
dc.description.references Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of Workshop at International Conference on Learning Representations (2013) es_ES
dc.description.references Mikolov, T., Karafiát, M., Burget, L., Černocký, J., Khudanpur, S.: Recurrent neural network based language model. In: INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, pp. 1045–1048, September 26–30, 2010 es_ES
dc.description.references Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, vol. 26, pp. 3111–3119 (2013) es_ES
dc.description.references Mnih, A., Teh, Y.W.: A fast and simple algorithm for training neural probabilistic language models. arXiv preprint arXiv:1206.6426 (2012) es_ES
dc.description.references Mohammad, S.M., Yang, T.: Tracking sentiment in mail: how genders differ on emotional axes. In: Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (2011) es_ES
dc.description.references Morin, F., Bengio, Y.: Hierarchical probabilistic neural network language model. In: Proceedings of the International Workshop on Artificial Intelligence and Statistics, pp. 246–252. Citeseer (2005) es_ES
dc.description.references Pennebaker, J.W.: The secret life of pronouns: What our words say about us. Bloomsbury Press (2011) es_ES
dc.description.references Rangel, F., Rosso, P.: On the impact of emotions on author profiling. Information Processing & Management, Special Issue on Emotion and Sentiment in Social and Expressive Media (2015, in press) es_ES
dc.description.references Rangel, F., Rosso, P., Chugur, I., Potthast, M., Trenkmann, M., Stein, B., Verhoeven, B., Daelemans, W.: Overview of the 2nd author profiling task at pan 2014. In: Cappellato, L., Ferro, N., Halvey, M., Kraaij, W. (eds.) CLEF 2014 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1180 (2014) es_ES
dc.description.references Rangel, F., Rosso, P., Koppel, M., Stamatatos, E., Inches, G.: Overview of the author profiling task at pan 2013. In: Forner P., Navigli R., Tufis, D. (eds.) Notebook Papers of CLEF 2013 LABs and Workshops. CEUR-WS.org, vol. 1179 (2013) es_ES
dc.description.references Sadat, F., Kazemi, F., Farzindar, A.: Automatic identification of arabic language varieties and dialects in social media. In: Proceeding of the 1st International Workshop on Social Media Retrieval and Analysis SoMeRa (2014) es_ES
dc.description.references Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18(11), 613–620 (1975) es_ES
dc.description.references Sidorov, G., Miranda-Jiménez, S., Viveros-Jiménez, F., Gelbukh, F., Castro-Sánchez, N., Velásquez, F., Díaz-Rangel, I., Suárez-Guerra, S., Treviño, A., Gordon-Miranda, J.: Empirical study of opinion mining in spanish tweets. In: 11th Mexican International Conference on Artificial Intelligence, MICAI, pp. 1–4 (2012) es_ES
dc.description.references Zampieri, M., Gebrekidan-Gebre, B.: Automatic identification of language varieties: the case of portuguese. In: Proceedings of the Conference on Natural Language Processing (2012) es_ES

