López Romero, Sergio; Silva Galiana, Josep Francesc; Insa Cabrera, David (IPN, Centro de Innovación y Desarrollo Tecnológico en Cómputo, 2012-06)
This article introduces a new approach for content extraction that exploits the hierarchical inter-relations of the elements in a webpage. Content extraction is a technique used to extract from a webpage the main textual ...
Alarte Aleixandre, Julián (Universitat Politècnica de València, 2023-09-14)
[ES] For several years now, the amount of information available on the web has been growing exponentially. Every day a large amount of information is generated and becomes available on the web almost immediately. The ...
Alarte, Julián; Silva, Josep (Association for Computing Machinery, 2021-12)
[EN] The main content of a webpage is often surrounded by other boilerplate elements related to the template, such as menus, advertisements, copyright notices, and comments. For crawlers and indexers, isolating the main ...
The main content in a webpage is usually centered and visible without the need to scroll. It is often surrounded by the navigation menus of the website and it can include advertisements, panels, banners, and other not ...
[EN] A Web template is a resource that implements the structure and format of a website, making it ready for plugging content into already formatted and prepared pages. For this reason, templates are one of the main ...