Web template extraction based on hyperlink analysis

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Alarte, Julián es_ES
dc.contributor.author Insa Cabrera, David es_ES
dc.contributor.author Silva Galiana, Josep Francesc es_ES
dc.contributor.author Tamarit Muñoz, Salvador es_ES
dc.date.accessioned 2015-05-18T14:44:09Z
dc.date.available 2015-05-18T14:44:09Z
dc.date.issued 2015-01
dc.identifier.issn 2075-2180
dc.identifier.uri http://hdl.handle.net/10251/50403
dc.description.abstract [EN] Web templates are one of the main development resources for website engineers. Templates allow them to increase productivity by plugging content into already formatted and prepared pagelets. Templates are also useful for end users, because they provide uniformity and a common look and feel across all webpages. However, from the point of view of crawlers and indexers, templates are an important problem, because they usually contain irrelevant information such as advertisements, menus, and banners. Processing and storing this information is likely to waste resources (storage space, bandwidth, etc.). It has been measured that templates represent between 40% and 50% of data on the Web. Therefore, identifying templates is essential for indexing tasks. In this work we propose a novel method for automatic template extraction that is based on similarity analysis between the DOM trees of a collection of webpages that are detected using menu information. Our implementation and experiments demonstrate the usefulness of the technique. es_ES
dc.description.sponsorship This work has been partially supported by the EU (FEDER) and the Spanish Ministerio de Economia y Competitividad (Secretaria de Estado de Investigacion, Desarrollo e Innovacion) under Grant TIN2013-44742-C4-1-R and by the Generalitat Valenciana under Grant PROMETEO/2011/052. David Insa was partially supported by the Spanish Ministerio de Educacion under FPU Grant AP2010-4415. Salvador Tamarit was partially supported by research project POLCA, Programming Large Scale Heterogeneous Infrastructures (610686), funded by the European Union, STREP FP7. en_EN
dc.language English es_ES
dc.relation.ispartof Electronic Proceedings in Theoretical Computer Science es_ES
dc.rights Attribution (by) es_ES
dc.subject Information Retrieval es_ES
dc.subject Template Extraction es_ES
dc.subject Content Extraction es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Web template extraction based on hyperlink analysis es_ES
dc.type Article es_ES
dc.identifier.doi 10.4204/EPTCS.173.2
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/610686/EU/Programming Large Scale Heterogeneous Infrastructures/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2013-44742-C4-1-R/ES/VALIDACION ASISTIDA DE PROGRAMAS MEDIANTE METODOS PRECISOS Y RIGUROSOS PARA UNA INGENIERIA DEL SOFTWARE ROBUSTA/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEO%2F2011%2F052/ES/LOGICEXTREME: TECNOLOGIA LOGICA Y SOFTWARE SEGURO/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Alarte, J.; Insa Cabrera, D.; Silva Galiana, JF.; Tamarit Muñoz, S. (2015). Web template extraction based on hyperlink analysis. Electronic Proceedings in Theoretical Computer Science. 173:16-26. https://doi.org/10.4204/EPTCS.173.2 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://dx.doi.org/10.4204/EPTCS.173.2 es_ES
dc.description.upvformatpinicio 16 es_ES
dc.description.upvformatpfin 26 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 173 es_ES
dc.relation.senia 280504
dc.contributor.funder European Commission
dc.contributor.funder Ministerio de Economía y Competitividad
dc.contributor.funder Generalitat Valenciana

