- -

TeMex: The Web Template Extractor

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

Compartir/Enviar a

Citas

Estadísticas

  • Estadisticas de Uso

TeMex: The Web Template Extractor

Mostrar el registro completo del ítem

Alarte, J.; Insa Cabrera, D.; Silva Galiana, JF.; Tamarit Muñoz, S. (2015). TeMex: The Web Template Extractor. ACM. https://doi.org/10.1145/2740908.2742835

Por favor, use este identificador para citar o enlazar este ítem: http://hdl.handle.net/10251/66952

Ficheros en el ítem

Metadatos del ítem

Título: TeMex: The Web Template Extractor
Autor: Alarte, julián Insa Cabrera, David Silva Galiana, Josep Francesc Tamarit Muñoz, Salvador
Entidad UPV: Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
Fecha difusión:
Resumen:
This paper presents and describes TeMex, a site-level web template extractor. TeMex is fully automatic, and it can work with online webpages without any preprocessing stage (no information about the template or the ...[+]
Derechos de uso: Reserva de todos los derechos
ISBN: 978-1-4503-3473-0
DOI: 10.1145/2740908.2742835
Editorial:
ACM
Versión del editor: http://dx.doi.org/10.1145/2740908.2742835
Título del congreso: 24th International World Wide Web Conference (WWW 2015)
Lugar del congreso: Florence, Italy
Fecha congreso: May 18-22, 2015
Código del Proyecto:
info:eu-repo/grantAgreement/MINECO//TIN2013-44742-C4-1-R/ES/VALIDACION ASISTIDA DE PROGRAMAS MEDIANTE METODOS PRECISOS Y RIGUROSOS PARA UNA INGENIERIA DEL SOFTWARE ROBUSTA/
info:eu-repo/grantAgreement/EC/FP7/610686/EU/Programming Large Scale Heterogeneous Infrastructures/
info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2015%2F013/ES/SmartLogic: Logic Technologies for Software Security and Performance/
info:eu-repo/grantAgreement/MECD//AP2010-4415/ES/AP2010-4415/
Descripción: "© ACM} 2015. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM, In Proceedings of the 24th International Conference on World Wide Web (pp. 155-158), http://dx.doi.org/10.1145/2740908.2742835
Agradecimientos:
This work has been partially supported by the EU (FEDER) and the Spanish Ministerio de Economía y Competitividad (Secretaría de Estado de Investigación, Desarrollo e Innovación) under Grant TIN2013-44742-C4-1-R and by the ...[+]
Tipo: Comunicación en congreso

References

Overlay extension. Available from URL: https://developer.mozilla.org/en-US/Add-ons/Overlay_Extensions, 2005.

J. Alarte, D. Insa, J. Silva, and S. Tamarit. Automatic Detection of Webpages that Share the Same Web Template. In M. H. ter Beek and A. Ravara, editors, Proceedings of the 10th International Workshop on Automated Specification and Verification of Web Systems (WWV 14), volume 163 of Electronic Proceedings in Theoretical Computer Science, pages 2--15. Open Publishing Association, July 2014.

J. Alarte, D. Insa, J. Silva, and S. Tamarit. A Benchmark Suite for Template Detection and Content Extraction. CoRR, abs/1409.6182, 2014. [+]
Overlay extension. Available from URL: https://developer.mozilla.org/en-US/Add-ons/Overlay_Extensions, 2005.

J. Alarte, D. Insa, J. Silva, and S. Tamarit. Automatic Detection of Webpages that Share the Same Web Template. In M. H. ter Beek and A. Ravara, editors, Proceedings of the 10th International Workshop on Automated Specification and Verification of Web Systems (WWV 14), volume 163 of Electronic Proceedings in Theoretical Computer Science, pages 2--15. Open Publishing Association, July 2014.

J. Alarte, D. Insa, J. Silva, and S. Tamarit. A Benchmark Suite for Template Detection and Content Extraction. CoRR, abs/1409.6182, 2014.

Z. Bar-Yossef and S. Rajagopalan. Template detection via data mining and its applications. In Proceedings of the 11th International Conference on World Wide Web (WWW'02), pages 580--591, New York, NY, USA, 2002. ACM.

M. Baroni, F. Chantree, A. Kilgarriff, and S. Sharoff. Cleaneval: a Competition for Cleaning Web Pages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC'08), pages 638--643. European Language Resources Association, may 2008.

D. Gibson, K. Punera, and A. Tomkins. The volume and evolution of web page templates. In A. Ellis and T. Hagino, editors, Proceedings of the 14th International Conference on World Wide Web (WWW'05), pages 830--839. ACM, may 2005.

T. Gottron. Evaluating content extraction on HTML documents. In V. Grout, D. Oram, and R. Picking, editors, Proceedings of the 2nd International Conference on Internet Technologies and Applications (ITA'07), pages 123--132. National Assembly for Wales, sep 2007.

D. d. C. Reis, P. B. Golgher, A. S. Silva, and A. H. F. Laender. Automatic web news extraction using tree edit distance. In Proceedings of the 13th International Conference on World Wide Web (WWW'04), pages 502--511, New York, NY, USA, 2004. ACM.

K. Vieira, A. L. da Costa Carvalho, K. Berlt, E. S. de Moura, A. S. da Silva, and J. Freire. On finding templates on web collections. World Wide Web, 12(2):171--211, 2009.

K. Vieira, A. S. da Silva, N. Pinto, E. S. de Moura, J. a. M. B. Cavalcanti, and J. Freire. A fast and robust method for web page template detection and removal. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM'06), pages 258--267, New York, NY, USA, 2006. ACM.

T. Weninger, W. Henry Hsu, and J. Han. CETR: Content Extraction via Tag Ratios. In M. Rappa, P. Jones, J. Freire, and S. Chakrabarti, editors, Proceedings of the 19th International Conference on World Wide Web (WWW'10), pages 971--980. ACM, apr 2010.

L. Yi, B. Liu, and X. Li. Eliminating noisy information in web pages for data mining. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data mining (KDD'03), pages 296--305, New York, NY, USA, 2003. ACM.

[-]

recommendations

 

Este ítem aparece en la(s) siguiente(s) colección(ones)

Mostrar el registro completo del ítem