Pruned Wasserstein Index Generation Model and wigpy Package

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


dc.contributor.author Xie, Fangzhou es_ES
dc.date.accessioned 2020-09-08T11:37:28Z
dc.date.available 2020-09-08T11:37:28Z
dc.date.issued 2020-05-08
dc.identifier.isbn 9788490488324
dc.identifier.uri http://hdl.handle.net/10251/149600
dc.description.abstract [EN] The recently proposed Wasserstein Index Generation (WIG) model has shown a new direction for automatically generating indices. However, it is challenging in practice to fit it to large datasets, for two reasons. First, the Sinkhorn distance is notoriously expensive to compute and suffers severely from high dimensionality. Second, it requires a full N × N matrix to fit into memory, where N is the size of the vocabulary. When the dimensionality is too large, the computation becomes impossible. I propose a Lasso-based shrinkage method that reduces the dimensionality of the vocabulary as a pre-processing step prior to fitting the WIG model. After obtaining word embeddings from a Word2Vec model, we can cluster these high-dimensional vectors by k-means and pick the most frequent tokens within each cluster to form the “base vocabulary”. Non-base tokens are then regressed on the vectors of the base tokens to obtain transformation weights, so that the whole vocabulary can be represented by the “base tokens” alone. This variant, called pruned WIG (pWIG), allows us to shrink the vocabulary dimension at will while still achieving high accuracy. I also provide a Python module, wigpy, to carry out computation in both flavors. An application to the Economic Policy Uncertainty (EPU) index is showcased as a comparison with existing methods of generating time-series sentiment indices. es_ES
dc.language English es_ES
dc.publisher Editorial Universitat Politècnica de València es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Web data es_ES
dc.subject Internet data es_ES
dc.subject Big data es_ES
dc.subject QCA es_ES
dc.subject PLS es_ES
dc.subject SEM es_ES
dc.subject Conference es_ES
dc.subject Wasserstein Index Generation Model (WIG) es_ES
dc.subject Lasso Regression es_ES
dc.subject Pruned Wasserstein Index Generation (pWIG) es_ES
dc.subject Economic Policy Uncertainty Index (EPU) es_ES
dc.title Pruned Wasserstein Index Generation Model and wigpy Package es_ES
dc.type Book chapter es_ES
dc.type Conference paper es_ES
dc.identifier.doi 10.4995/CARMA2020.2020.11557
dc.rights.accessRights Open es_ES
dc.description.bibliographicCitation Xie, F. (2020). Pruned Wasserstein Index Generation Model and wigpy Package. Editorial Universitat Politècnica de València. 69-76. https://doi.org/10.4995/CARMA2020.2020.11557 es_ES
dc.description.accrualMethod OCS es_ES
dc.relation.conferencename CARMA 2020 - 3rd International Conference on Advanced Research Methods and Analytics es_ES
dc.relation.conferencedate July 08-09, 2020 es_ES
dc.relation.conferenceplace Valencia, Spain es_ES
dc.relation.publisherversion http://ocs.editorial.upv.es/index.php/CARMA/CARMA2020/paper/view/11557 es_ES
dc.description.upvformatpinicio 69 es_ES
dc.description.upvformatpfin 76 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.pasarela OCS\11557 es_ES
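
The pruning step summarized in the abstract (cluster the embeddings, keep the most frequent token per cluster as a base token, then Lasso-regress every other token on the base vectors) can be illustrated with a small sketch. This is not the wigpy API, which this record does not show. The tokens, vectors, and counts below are made up and stand in for Word2Vec output. Keeping exactly one base token per cluster, the farthest-first k-means initialisation, and the unscaled Lasso objective are all simplifying assumptions.

```python
# Toy illustration of the pruning idea; pure Python, no wigpy calls.
# All data below is hypothetical, not taken from the paper.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def soft_threshold(x, t):
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def lasso(columns, y, alpha=0.01, sweeps=100):
    """Coordinate descent for 0.5*||y - sum_j w[j]*columns[j]||^2 + alpha*||w||_1."""
    w = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual of y with coordinate j left out of the fit.
            resid = [yi - sum(w[l] * columns[l][i]
                              for l in range(len(columns)) if l != j)
                     for i, yi in enumerate(y)]
            rho = sum(c * r for c, r in zip(col, resid))
            z = sum(c * c for c in col)  # assumes a non-zero base vector
            w[j] = soft_threshold(rho, alpha) / z
    return w

def prune(embeddings, counts, k):
    """Return (base tokens, Lasso weights of each non-base token on the base)."""
    tokens = list(embeddings)
    labels = kmeans([embeddings[t] for t in tokens], k)
    base = []
    for j in range(k):
        members = [t for t, lab in zip(tokens, labels) if lab == j]
        if members:
            base.append(max(members, key=counts.get))  # most frequent in cluster
    columns = [embeddings[b] for b in base]
    weights = {t: lasso(columns, embeddings[t]) for t in tokens if t not in base}
    return base, weights

# Hypothetical 3-dimensional "embeddings" and corpus counts.
embeddings = {
    "economy":   [1.0, 0.0, 0.1],
    "market":    [0.9, 0.1, 0.0],
    "trade":     [0.8, 0.0, 0.2],
    "uncertain": [0.0, 1.0, 0.1],
    "risk":      [0.1, 0.9, 0.0],
    "doubt":     [0.0, 0.8, 0.2],
}
counts = {"economy": 50, "market": 30, "trade": 10,
          "uncertain": 40, "risk": 25, "doubt": 5}

base, weights = prune(embeddings, counts, k=2)
print(base)
for token, w in weights.items():
    print(token, [round(x, 3) for x in w])
```

In this toy setting each non-base token ends up expressed mostly through the base token of its own cluster, so a downstream model would only need a distance matrix over the k base tokens rather than the full vocabulary.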

