Scalable and Language-Independent Embedding-based Approach for Plagiarism Detection Considering Obfuscation Type: No Training Phase

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

Gharavi, E.; Veisi, H.; Rosso, P. (2020). Scalable and Language-Independent Embedding-based Approach for Plagiarism Detection Considering Obfuscation Type: No Training Phase. Neural Computing and Applications. 32(14):10593-10607. https://doi.org/10.1007/s00521-019-04594-y

Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/159837

Item metadata

Title: Scalable and Language-Independent Embedding-based Approach for Plagiarism Detection Considering Obfuscation Type: No Training Phase
Authors: Gharavi, Erfaneh; Veisi, Hadi; Rosso, Paolo
UPV entity: Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
Release date:
Abstract: The efficiency and scalability of plagiarism detection systems have become a major challenge due to the vast amount of available textual data in several languages over the Internet. Plagiarism occurs in different ...
Keywords: Text alignment, Language-independent plagiarism detection, Word embedding, Text representation, Obfuscation type
Usage rights: All rights reserved
Source: Neural Computing and Applications (ISSN: 0941-0643)
DOI: 10.1007/s00521-019-04594-y
Publisher: Springer-Verlag
Publisher's version: https://doi.org/10.1007/s00521-019-04594-y
Project code: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PGC2018-096212-B-C31/ES/DESINFORMACION Y AGRESIVIDAD EN SOCIAL MEDIA: AGREGANDO INFORMACION Y ANALIZANDO EL LENGUAJE/
Acknowledgements: The work of Paolo Rosso was partially funded by the Spanish MICINN under the research project MISMIS-FAKEn-HATE on Misinformation and Miscommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31).
Type: Article
