On Methods of Data Standardization of German Social Media Comments

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Melnyk, Lidiia es_ES
dc.contributor.author Feld, Linda es_ES
dc.date.accessioned 2024-01-02T11:03:06Z
dc.date.available 2024-01-02T11:03:06Z
dc.date.issued 2023-12-12
dc.identifier.uri http://hdl.handle.net/10251/201316
dc.description.abstract [EN] This article is part of a larger project aiming at identifying discursive strategies in social media discourses revolving around the topic of gender diversity, for which roughly 350,000 comments were scraped from the comments sections below YouTube videos relating to the topic in question. This article focuses on different methods of standardizing social media data in order to enhance further processing. More specifically, the data are corrected in terms of casing, spelling, and punctuation. Different tools and models (LanguageTool, T5, seq2seq, GPT-2) were tested. The best outcome was achieved by the German GPT-2 model: it scored highest on all of the applied metrics (ROUGE, GLEU, BLEU), making it the best of the tested models for the task of Grammatical Error Correction in German social media data. es_ES
dc.language English es_ES
dc.publisher Universitat Politècnica de València es_ES
dc.relation.ispartof Journal of Computer-Assisted Linguistic Research es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Grammatical Error Correction es_ES
dc.subject LanguageTool es_ES
dc.subject Data augmentation es_ES
dc.subject Seq2seq es_ES
dc.subject T5 es_ES
dc.subject GPT-2 es_ES
dc.title On Methods of Data Standardization of German Social Media Comments es_ES
dc.type Article es_ES
dc.identifier.doi 10.4995/jclr.2023.19907
dc.rights.accessRights Open es_ES
dc.description.bibliographicCitation Melnyk, L.; Feld, L. (2023). On Methods of Data Standardization of German Social Media Comments. Journal of Computer-Assisted Linguistic Research. 7:22-42. https://doi.org/10.4995/jclr.2023.19907 es_ES
dc.description.accrualMethod OJS es_ES
dc.relation.publisherversion https://doi.org/10.4995/jclr.2023.19907 es_ES
dc.description.upvformatpinicio 22 es_ES
dc.description.upvformatpfin 42 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 7 es_ES
dc.identifier.eissn 2530-9455
dc.relation.pasarela OJS\19907 es_ES

