Alarte, J., Insa, D., Silva, J., & Tamarit, S. (2015). TeMex. Proceedings of the 24th International Conference on World Wide Web - WWW ’15 Companion. doi:10.1145/2740908.2742835
Alarte, J., Insa, D., Silva, J., & Tamarit, S. (2016). Site-Level Web Template Extraction Based on DOM Analysis. Springer International Publishing, Cham, 36-49.
Alassi, D., & Alhajj, R. (2013). Effectiveness of template detection on noise reduction and websites summarization. Information Sciences, 219, 41-72. doi:10.1016/j.ins.2012.07.022
Bar-Yossef, Z., & Rajagopalan, S. (2002). Template detection via data mining and its applications. Proceedings of the eleventh international conference on World Wide Web - WWW ’02. doi:10.1145/511446.511522
Chakrabarti, D., Kumar, R., & Punera, K. (2007). Page-level template detection via isotonic smoothing. Proceedings of the 16th international conference on World Wide Web - WWW ’07. doi:10.1145/1242572.1242582
Chen, L., Ye, S., & Li, X. (2006). Template detection for large scale search engines. Proceedings of the 2006 ACM symposium on Applied computing - SAC ’06. doi:10.1145/1141277.1141534
Gibson, D., Punera, K., & Tomkins, A. (2005). The volume and evolution of web page templates. Special interest tracks and posters of the 14th international conference on World Wide Web - WWW ’05. doi:10.1145/1062745.1062763
Kim, C., & Shim, K. (2011). TEXT: Automatic Template Extraction from Heterogeneous Web Pages. IEEE Transactions on Knowledge and Data Engineering, 23(4), 612-626. doi:10.1109/tkde.2010.140
Kitchenham, B. A., Budgen, D., & Brereton, P. (2015). Evidence-Based Software Engineering and Systematic Reviews. Chapman & Hall/CRC.
Kołcz, A., & Yih, W. (n.d.). Site-Independent Template-Block Detection. Lecture Notes in Computer Science, 152-163. doi:10.1007/978-3-540-74976-9_17
Kohlschütter, C. (2009). A densitometric analysis of web template content. Proceedings of the 18th international conference on World wide web - WWW ’09. doi:10.1145/1526709.1526909
Li, J., & Ezeife, C. I. (2006). Cleaning web pages for effective web content mining. In S. Bressan, J. Küng, & R. Wagner (Eds.), Database and Expert Systems Applications (pp. 560-571). Springer, Berlin. doi:10.1007/11827405_55
Liu, B. (2006). Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications). Springer-Verlag New York, Inc., Secaucus, NJ.
Liu, L., Han, W., Buttler, D., Pu, C., & Tang, W. (1999). An XML-based wrapper generator for Web information extraction. Proceedings of the 1999 ACM SIGMOD international conference on Management of data - SIGMOD ’99. doi:10.1145/304182.304570
Ma, L., Goharian, N., Chowdhury, A., & Chung, M. (2003). Extracting unstructured data from template generated web documents. Proceedings of the twelfth international conference on Information and knowledge management - CIKM ’03. doi:10.1145/956863.956961
Manjula, R., & Chilambuchelvan, A. (2013). Extracting templates from Web pages. 2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE). doi:10.1109/icgce.2013.6823541
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press, New York, NY.
Meng, X., Hu, D., & Li, C. (2003). Schema-guided wrapper maintenance for web-data extraction. Proceedings of the fifth ACM international workshop on Web information and data management - WIDM ’03. doi:10.1145/956699.956701
Nguyen, D. Q., Nguyen, D. Q., Pham, S. B., & Bui, T. D. (2009). A Fast Template-Based Approach to Automatically Identify Primary Text Content of a Web Page. 2009 International Conference on Knowledge and Systems Engineering. doi:10.1109/kse.2009.39
Schäfer, R. (2016). Accurate and efficient general-purpose boilerplate detection for crawled web corpora. Language Resources and Evaluation, 51(3), 873-889. doi:10.1007/s10579-016-9359-2
Sivakumar, P. (2015). Effectual Web Content Mining using Noise Removal from Web Pages. Wireless Personal Communications, 84(1), 99-121. doi:10.1007/s11277-015-2596-7
Song, D., Sun, F., & Liao, L. (2013). A hybrid approach for content extraction with text density and visual importance of DOM nodes. Knowledge and Information Systems, 42(1), 75-96. doi:10.1007/s10115-013-0687-x
Uma, R., & Latha, B. (2018). Noise elimination from web pages for efficacious information retrieval. Cluster Computing (Mar. 2018). https://link.springer.com/article/10.1007/s10586-018-2366-x#citeas
Uzun, E., Agun, H. V., & Yerlikaya, T. (2013). A hybrid approach for extracting informative content from web pages. Information Processing & Management, 49(4), 928-944. doi:10.1016/j.ipm.2013.02.005
Vieira, K., da Costa Carvalho, A. L., Berlt, K., de Moura, E. S., da Silva, A. S., & Freire, J. (2009). On Finding Templates on Web Collections. World Wide Web, 12(2), 171-211. doi:10.1007/s11280-009-0059-3
Vieira, K., da Silva, A. S., Pinto, N., de Moura, E. S., Cavalcanti, J. M. B., & Freire, J. (2006). A fast and robust method for web page template detection and removal. Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM ’06. doi:10.1145/1183614.1183654
Vogels, T., Ganea, O.-E., & Eickhoff, C. (2018). Web2Text: Deep structured boilerplate removal. CoRR, abs/1801.02607. Retrieved from http://arxiv.org/abs/1801.02607
Wang, Y., Fang, B., Cheng, X., Guo, L., & Xu, H. (2008). Incremental web page template detection. Proceeding of the 17th international conference on World Wide Web - WWW ’08. doi:10.1145/1367497.1367749
Yi, L., Liu, B., & Li, X. (2003). Eliminating noisy information in Web pages for data mining. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’03. doi:10.1145/956750.956785
Zheng, S., Song, R., Wen, J.-R., & Giles, C. L. (2009). Efficient record-level wrapper induction. Proceeding of the 18th ACM conference on Information and knowledge management - CIKM ’09. doi:10.1145/1645953.1645962
Zheng, S., Song, R., Wen, J.-R., & Wu, D. (2007). Joint optimization of wrapper generation and template detection. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’07. doi:10.1145/1281192.1281287