dc.contributor.author | Williams, Peter | es_ES |
dc.contributor.author | Reynolds, Robert | es_ES |
dc.date.accessioned | 2024-07-24T11:19:44Z | |
dc.date.available | 2024-07-24T11:19:44Z | |
dc.date.issued | 2024-02-12 | |
dc.identifier.isbn | 9788413961316 | |
dc.identifier.uri | http://hdl.handle.net/10251/206583 | |
dc.description.abstract | [EN] We present a new corpus of Czech texts labeled for second-language readability, and show results of experiments to train machine-learning classifiers to automatically label new texts according to reading level. We report results comparing the performance of traditional machine-learning models (including Random Forest, XGBoost, Linear Discriminant Analysis, and XGBoost Random Forest) and a neural network (XLM-RoBERTa). The results of our research can be implemented in tools to support learning Czech, a less commonly taught language. We extract 46 linguistic features in various categories for use with traditional machine-learning algorithms. We train models on these features and evaluate their performance with recursive feature elimination to determine how informative each feature is for each model. We then compare those results to those of a transformer trained for the same task on the same corpus. XGBoost achieves the highest accuracy at 0.81, suggesting that these traditional models can still perform as well as, or better than, newer models on this task. Notably, the transformer has the lowest mean F1 at 0.74. Code available at https://github.com/peterjwms/czech-readability. | es_ES |
dc.format.extent | 12 | es_ES |
dc.language | English | es_ES |
dc.publisher | Editorial Universitat Politècnica de València | es_ES |
dc.relation.ispartof | EuroCALL 2023. CALL for all Languages - Short Papers | |
dc.rights | Attribution - NonCommercial - ShareAlike (by-nc-sa) | es_ES |
dc.subject | Readability | es_ES |
dc.subject | Machine learning | es_ES |
dc.subject | Czech | es_ES |
dc.subject | Transformer | es_ES |
dc.subject | Corpus | es_ES |
dc.title | A machine learning approach to Czech readability | es_ES |
dc.type | Book chapter | es_ES |
dc.type | Conference paper | es_ES |
dc.identifier.doi | 10.4995/EuroCALL2023.2023.16991 | |
dc.rights.accessRights | Open | es_ES |
dc.description.bibliographicCitation | Williams, P.; Reynolds, R. (2024). A machine learning approach to Czech readability. Editorial Universitat Politècnica de València. https://doi.org/10.4995/EuroCALL2023.2023.16991 | es_ES |
dc.description.accrualMethod | OCS | es_ES |
dc.relation.conferencename | EuroCALL 2023: CALL for all Languages | es_ES |
dc.relation.conferencedate | August 15-18, 2023 | es_ES |
dc.relation.conferenceplace | Reykjavik, Iceland | es_ES |
dc.relation.publisherversion | http://ocs.editorial.upv.es/index.php/EuroCALL/EuroCALL2023/paper/view/16991 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.relation.pasarela | OCS\16991 | es_ES |
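
The abstract reports both accuracy and mean F1, and the two can rank models differently. The following is a minimal sketch of how the two metrics are computed, not the authors' evaluation code. It assumes that "mean F1" is the unweighted (macro) average of per-class F1 scores, which the abstract does not state. The reading-level labels and predictions are hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted level matches the gold level."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed meaning of 'mean F1')."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical gold and predicted reading levels for six texts
y_true = ["A1", "A1", "A2", "A2", "B1", "B1"]
y_pred = ["A1", "A1", "A2", "B1", "B1", "B1"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Because macro F1 weights every class equally, a model that does poorly on one reading level can score lower on mean F1 than its accuracy would suggest.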