López Romero, Sergio; Silva Galiana, Josep Francesc; Insa Cabrera, David (IPN, Centro de Innovación y Desarrollo Tecnológico en Cómputo, 2012-06)
This article introduces a new approach for content
extraction that exploits the hierarchical inter-relations of the
elements in a webpage. Content extraction is a technique used
to extract from a webpage the main textual ...
Adelantado Romero, Luis (Universitat Politècnica de València, 2018-10-17)
[ES] Web content extraction comprises a set of techniques that allow a
program to locate the various components of a webpage and extract those that
may be useful or hide those that ...
Alarte Aleixandre, Julián (Universitat Politècnica de València, 2023-09-14)
[ES] For several years, the amount of information available on the web has been growing exponentially. Every day a large amount of information is generated and becomes available on the web almost immediately. The ...
Alarte, Julián; Silva, Josep (Association for Computing Machinery, 2021-12)
[EN] The main content of a webpage is often surrounded by other boilerplate elements related to the template, such as menus, advertisements, copyright notices, and comments. For crawlers and indexers, isolating the main ...
One of the main development resources for website engineers
is Web templates. Templates allow them to increase productivity by
plugging content into already formatted and prepared pagelets. For the
final user, templates ...
The main content in a webpage is usually centered and visible without the need to scroll. It
is often surrounded by the navigation menus of the website and it can include advertisements,
panels, banners, and other not ...