Site-Level Web Template Extraction Based on DOM Analysis

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

dc.contributor.author Alarte-Aleixandre, Julián es_ES
dc.contributor.author Insa Cabrera, David es_ES
dc.contributor.author Silva, Josep es_ES
dc.contributor.author Tamarit Muñoz, Salvador es_ES
dc.date.accessioned 2017-05-30T09:57:32Z
dc.date.available 2017-05-30T09:57:32Z
dc.date.issued 2016-06
dc.identifier.issn 0302-9743
dc.identifier.uri http://hdl.handle.net/10251/82004
dc.description The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-41579-6_4 es_ES
dc.description.abstract Web templates are one of the main development resources for website engineers. Templates allow them to increase productivity by plugging content into already formatted and prepared pagelets. Templates are also useful for the final user, because they provide uniformity and a common look and feel across all webpages. However, from the point of view of crawlers and indexers, templates are a significant problem, because they usually contain irrelevant information such as advertisements, menus, and banners. Processing and storing this information wastes resources (storage space, bandwidth, etc.). It has been measured that templates represent between 40% and 50% of the data on the Web. Therefore, identifying templates is essential for indexing tasks. In this work we propose a novel method for automatic web template extraction based on similarity analysis between the DOM trees of a collection of webpages that are detected using a hyperlink analysis. Our implementation and experiments demonstrate the usefulness of the technique. es_ES
dc.description.sponsorship This work has been partially supported by the EU (FEDER) and the Spanish Ministerio de Economía y Competitividad (Secretaría de Estado de Investigación, Desarrollo e Innovación) under grant TIN2013-44742-C4-1-R and by the Generalitat Valenciana under grant PROMETEOII/2015/013. David Insa was partially supported by the Spanish Ministerio de Educación under FPU grant AP2010-4415.
dc.format.extent 14 es_ES
dc.language English es_ES
dc.publisher Springer Verlag (Germany) es_ES
dc.relation.ispartof Lecture Notes in Computer Science es_ES
dc.rights All rights reserved es_ES
dc.subject Information retrieval es_ES
dc.subject Content extraction es_ES
dc.subject Template extraction es_ES
dc.subject.classification LIBRARY AND INFORMATION SCIENCE es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Site-Level Web Template Extraction Based on DOM Analysis es_ES
dc.type Article es_ES
dc.type Conference paper es_ES
dc.identifier.doi 10.1007/978-3-319-41579-6
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2013-44742-C4-1-R/ES/VALIDACION ASISTIDA DE PROGRAMAS MEDIANTE METODOS PRECISOS Y RIGUROSOS PARA UNA INGENIERIA DEL SOFTWARE ROBUSTA/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2015%2F013/ES/SmartLogic: Logic Technologies for Software Security and Performance/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/ME//AP2010-4415/ES/AP2010-4415/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Alarte-Aleixandre, J.; Insa Cabrera, D.; Silva, J.; Tamarit Muñoz, S. (2016). Site-Level Web Template Extraction Based on DOM Analysis. Lecture Notes in Computer Science. 9609:36-49. https://doi.org/10.1007/978-3-319-41579-6 es_ES
dc.description.accrualMethod S es_ES
dc.relation.conferencename 10th International Andrei Ershov Informatics Conference in Memory of Helmut Veith (PSI) es_ES
dc.relation.conferencedate Aug 24-27, 2015 es_ES
dc.relation.conferenceplace Russia es_ES
dc.relation.publisherversion https://link.springer.com/chapter/10.1007/978-3-319-41579-6_4 es_ES
dc.description.upvformatpinicio 36 es_ES
dc.description.upvformatpfin 49 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 9609 es_ES
dc.relation.senia 320288 es_ES
dc.contributor.funder Ministerio de Economía y Competitividad
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Ministerio de Educación es_ES

