[EN] In recent years, interest has grown in classifying the stance that users assume within online debates. Stance has usually been addressed by considering users' posts in isolation, while social studies highlight ...
Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Machine Translation and Cross-Language Information Retrieval. Articles written in different languages on the same topic ...
Franco Salvador, Marc (Universitat Politècnica de València, 2017-07-03)
Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human languages. One of its most challenging ...
[EN] Digital text forensics aims at examining the originality and credibility of information in electronic documents and, in this regard, at extracting and analyzing information about the authors of these documents. The research ...
This paper presents a methodology to address lexical disambiguation in a standard phrase-based statistical machine translation system. Similarity among source contexts is used to select appropriate translation units. The ...
[EN] In this paper, we describe a hybrid approach for word-level language (WLL) identification of Bangla words written in Roman script and mixed with English words as part of our participation in the shared task on ...
[EN] In this work, we propose a variant of a well-known instance-based algorithm: WKNN. Our idea is to exploit task-dependent features in order to calculate the weight of the instances according to a novel paradigm: the ...
Rangel-Pardo, Francisco Manuel; Franco-Salvador, Marc; Rosso, Paolo (Springer-Verlag, 2018)
[EN] Language variety identification aims at labelling texts in a native language (e.g. Spanish, Portuguese, English) with its specific variation (e.g. Argentina, Chile, Mexico, Peru, Spain; Brazil, Portugal; UK, US). In ...
Irony is a pervasive aspect of many online texts, one made all the more difficult by the absence of face-to-face contact and vocal intonation. As our media increasingly become more social, the problem of irony detection ...
[EN] A commendable amount of work has been carried out in the field of Sentiment Analysis or Opinion Mining from natural language texts and Twitter texts. One of the main goals in such tasks is to assign polarities (positive ...
The present paper introduces the first corpus for the evaluation of Arabic intrinsic plagiarism detection. The corpus consists of 1024 artificial suspicious documents in which 2833 plagiarism cases have been inserted ...
Aguilera, Juan; González, Luis C.; Montes-y-Gómez, Manuel; Rosso, Paolo (Springer-Verlag, 2019)
[EN] The kNN algorithm has three main advantages that make it appealing to the community: it is easy to understand, it regularly offers competitive performance and its structure can be easily tuned to adapt to the ...
Short-text clustering is currently an important research area because of its applicability to web information retrieval, text summarization and text mining. These texts are often available in different languages and ...
[EN] Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed ...
Pinto, David; Rosso, Paolo; Jiménez-Salazar, Héctor (Oxford University Press (OUP): Policy A - Oxford Open Option A, 2011)
Clustering narrow domain short texts is considered to be a complex task because of the intrinsic features of the corpus to be clustered: (i) the low frequencies of vocabulary terms in short texts, and (ii) the high vocabulary ...
[EN] The possibility of knowing people's traits on the basis of what they write is a field of growing interest known as author profiling. To infer a user's gender, age, native language, language variety, or even when the user ...
Franco-Salvador, Marc; Rosso, Paolo; Montes Gomez, Manuel (Elsevier, 2016-07)
Cross-language plagiarism detection aims to detect plagiarised fragments of text among documents in different languages. In this paper, we perform a systematic examination of Cross-language Knowledge Graph Analysis; an ...
[EN] We present a corpus of Spanish tweets from 15 Twitter accounts of politicians from the five main parties (PSOE, PP, Cs, UP and VOX), covering the campaign for the Spanish election of 10th November 2019 (10N Spanish Election). ...
Alonso Nanclares, Jesús Alberto (Universitat Politècnica de València, 2016-03-08)
[EN] Social circles arose out of a need to organize the contacts in personal networks within current social networking services. The automatic detection of these social circles still remains an understudied problem, ...
[EN] Today's Internet is evolving toward an open society of humans and computational entities, where intelligent agent systems increasingly support the interaction between users and computational components. In this scenario, ...