López Romero, Sergio; Silva Galiana, Josep Francesc; Insa Cabrera, David (IPN, Centro de Innovación y Desarrollo Tecnológico en Cómputo, 2012-06)
This article introduces a new approach for content extraction that exploits the hierarchical inter-relations of the elements in a webpage. Content extraction is a technique used to extract from a webpage the main textual ...
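The abstract is truncated, so the article's own algorithm is not reproduced here. As a rough illustration of what exploiting the hierarchy of a webpage can mean, the following Python sketch uses a generic heuristic that is an assumption, not the authors' method. It builds the element tree, ignores text inside links, scripts and styles (typical of menus), and descends from the root into whichever child holds most of the remaining text, stopping when no single child dominates. The names `extract_main_text`, `main_content_node` and the `dominance=0.6` threshold are hypothetical.

```python
from html.parser import HTMLParser

# Elements whose text is never page content, and elements with no closing tag.
SKIP_TAGS = {"script", "style"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """A DOM element; children holds Node objects and text strings in order."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIP_TAGS:
            self.current.children.append(data)


def content_chars(node):
    """Count trimmed text characters, excluding text inside links, scripts, styles."""
    if node.tag in SKIP_TAGS or node.tag == "a":
        return 0
    total = 0
    for child in node.children:
        total += len(child.strip()) if isinstance(child, str) else content_chars(child)
    return total


def main_content_node(root, dominance=0.6):
    """Descend while a single child holds at least `dominance` of the text."""
    node = root
    while True:
        total = content_chars(node)
        elements = [c for c in node.children if isinstance(c, Node)]
        best = max(elements, key=content_chars, default=None)
        if total == 0 or best is None or content_chars(best) < dominance * total:
            return node
        node = best


def node_text(node):
    """Concatenate the text of a subtree in document order."""
    pieces = []
    for child in node.children:
        if isinstance(child, str):
            pieces.append(child.strip())
        elif child.tag not in SKIP_TAGS:
            pieces.append(node_text(child))
    return " ".join(p for p in pieces if p)


def extract_main_text(html, dominance=0.6):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return node_text(main_content_node(builder.root, dominance))
```

On a page with a link-only menu, a content `<div>` holding a heading and two paragraphs, and a short ad block, the descent stops at the content `<div>`: it holds almost all non-link text, while none of its paragraphs holds 60% of it on its own.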
The main content of a webpage is usually centered and visible without the need to scroll. It is often surrounded by the navigation menus of the website, and it can include advertisements, panels, banners, and other not ...
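The layout cues above (centered, visible without scrolling) can be turned into a simple score. The sketch below is hypothetical and not taken from the source: it assumes each candidate block's bounding box is already known (for example, measured in a browser with `getBoundingClientRect()`), assumes a 1280 x 800 initial viewport, and ranks blocks by their area inside that viewport multiplied by how close their horizontal center is to the viewport's center. The `Block` type and the scoring formula are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A rendered page region; coordinates in CSS pixels from the page's top-left."""
    name: str
    x: float
    y: float
    width: float
    height: float


def visible_area(block, vw, vh):
    """Area of the block that lies inside the initial viewport [0, vw] x [0, vh]."""
    w = max(0.0, min(block.x + block.width, vw) - max(block.x, 0.0))
    h = max(0.0, min(block.y + block.height, vh) - max(block.y, 0.0))
    return w * h


def centeredness(block, vw):
    """1.0 when the block's horizontal center matches the viewport's, falling to 0 at the edges."""
    cx = block.x + block.width / 2
    return max(0.0, 1.0 - abs(cx - vw / 2) / (vw / 2))


def likely_main_block(blocks, vw=1280, vh=800):
    """Pick the block that is both most visible without scrolling and most centered."""
    return max(blocks, key=lambda b: visible_area(b, vw, vh) * centeredness(b, vw))
```

With a full-width header, a narrow left menu, a wide centered content column, a narrow right-hand ad and a footer below the fold, the centered column wins: side panels score low on centeredness, and the footer has no visible area at all.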