
A machine learning approach to Czech readability

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia



dc.contributor.author Williams, Peter es_ES
dc.contributor.author Reynolds, Robert es_ES
dc.date.accessioned 2024-07-24T11:19:44Z
dc.date.available 2024-07-24T11:19:44Z
dc.date.issued 2024-02-12
dc.identifier.isbn 9788413961316
dc.identifier.uri http://hdl.handle.net/10251/206583
dc.description.abstract [EN] We present a new corpus of Czech texts labeled for second-language readability, and show results of experiments to train machine-learning classifiers to automatically label new texts according to reading level. We report results comparing the performance of traditional machine-learning models (including Random Forest, XGBoost, Linear Discriminant Analysis, and XGBoost Random Forest) and a neural network (XLM-RoBERTa). The results of our research can be implemented in tools to support learning Czech, a less commonly taught language. We extract 46 linguistic features in various categories for use with traditional machine-learning algorithms. We train models on these features and evaluate their performance with recursive feature elimination to determine how informative each feature is for each model. We then compare those results to those of a transformer trained for the same task on the same corpus. XGBoost achieves the highest accuracy at 0.81, suggesting that these traditional models can still perform as well as, or better than, newer models on this task. Notably, the transformer has the lowest mean F1 at 0.74. Code is available at https://github.com/peterjwms/czech-readability. es_ES
dc.format.extent 12 es_ES
dc.language English es_ES
dc.publisher Editorial Universitat Politècnica de València es_ES
dc.relation.ispartof EuroCALL 2023. CALL for all Languages - Short Papers
dc.rights Attribution - NonCommercial - ShareAlike (by-nc-sa) es_ES
dc.subject Readability es_ES
dc.subject Machine learning es_ES
dc.subject Czech es_ES
dc.subject Transformer es_ES
dc.subject Corpus es_ES
dc.title A machine learning approach to Czech readability es_ES
dc.type Book chapter es_ES
dc.type Conference paper es_ES
dc.identifier.doi 10.4995/EuroCALL2023.2023.16991
dc.rights.accessRights Open es_ES
dc.description.bibliographicCitation Williams, P.; Reynolds, R. (2024). A machine learning approach to Czech readability. Editorial Universitat Politècnica de València. https://doi.org/10.4995/EuroCALL2023.2023.16991 es_ES
dc.description.accrualMethod OCS es_ES
dc.relation.conferencename EuroCALL 2023: CALL for all Languages es_ES
dc.relation.conferencedate August 15-18, 2023 es_ES
dc.relation.conferenceplace Reykjavik, Iceland es_ES
dc.relation.publisherversion http://ocs.editorial.upv.es/index.php/EuroCALL/EuroCALL2023/paper/view/16991 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.pasarela OCS\16991 es_ES
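
The abstract describes evaluating models with recursive feature elimination: fit a model, rank the features, drop the least informative one, and repeat. The following is a minimal sketch of that loop and not the authors' implementation (their code is at the repository linked in the abstract). It assumes synthetic two-class data and swaps in a simple nearest-centroid classifier for the paper's models; every function name here is hypothetical, and the importance score (the gap between class centroids) is only a stand-in for model-specific importances.

```python
import random
import statistics


def make_toy_data(n_per_class=100, n_features=6, n_informative=2, seed=0):
    """Synthetic two-class data; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([
                rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                for j in range(n_features)
            ])
            y.append(label)
    return X, y


def centroids(X, y, features):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    return {
        label: [statistics.mean(row[j] for row, lab in zip(X, y) if lab == label)
                for j in features]
        for label in sorted(set(y))
    }


def predict(model, row, features):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(center):
        return sum((row[j] - c) ** 2 for j, c in zip(features, center))
    return min(model, key=lambda label: dist(model[label]))


def accuracy(model, X, y, features):
    """Fraction of rows whose predicted class matches the true label."""
    hits = sum(predict(model, row, features) == lab for row, lab in zip(X, y))
    return hits / len(y)


def importance(model, features):
    """Toy importance: absolute gap between the two class centroids per feature."""
    return {j: abs(a - b) for j, a, b in zip(features, model[0], model[1])}


def recursive_feature_elimination(X_train, y_train, X_test, y_test, keep=1):
    """Refit, evaluate, and drop the weakest feature until `keep` features remain."""
    features = list(range(len(X_train[0])))
    history = []
    while True:
        model = centroids(X_train, y_train, features)
        history.append((list(features), accuracy(model, X_test, y_test, features)))
        if len(features) <= keep:
            break
        scores = importance(model, features)
        features.remove(min(features, key=scores.get))
    return history


X, y = make_toy_data()
# Alternate rows between train and test so both classes appear in each split.
X_train, y_train = X[::2], y[::2]
X_test, y_test = X[1::2], y[1::2]
history = recursive_feature_elimination(X_train, y_train, X_test, y_test)
for feats, acc in history:
    print(len(feats), feats, round(acc, 2))
```

Each printed row shows the surviving feature indices and the held-out accuracy at that step, which is the kind of trace one would inspect to judge how informative each feature is. In this toy the importance score is univariate, so the ranking does not change between rounds; with tree-based or linear models the importances are recomputed from each refit and can shift as features are removed.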

