
Data Mining Paradigm in the Study of Air Quality

RiuNet: Institutional repository of the Polytechnic University of Valencia


Represa, NS.; Fernández-Sarría, A.; Porta, A.; Palomar-Vázquez, J. (2020). Data Mining Paradigm in the Study of Air Quality. Environmental Processes. 7(1):1-21. https://doi.org/10.1007/s40710-019-00407-5

Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/160828


Item Metadata

Title: Data Mining Paradigm in the Study of Air Quality
Author: Represa, Natacha Soledad; Fernández-Sarría, Alfonso; Porta, Andrés; Palomar-Vázquez, Jesús
UPV Unit: Universitat Politècnica de València. Departamento de Ingeniería Cartográfica Geodesia y Fotogrametría - Departament d'Enginyeria Cartogràfica, Geodèsia i Fotogrametria
Issued date: 2020
Abstract: Air pollution is a serious global problem that threatens human life and health, as well as the environment. The most important aspect of a successful air quality management strategy is the measurement analysis, air ...
Subjects: Air quality, Environmental management, Air pollution, Data mining
Copyrights: Closed
Source: Environmental Processes (ISSN: 2198-7491)
DOI: 10.1007/s40710-019-00407-5
Publisher version: https://doi.org/10.1007/s40710-019-00407-5
Project ID:
info:eu-repo/grantAgreement/ANPCyT//PICT-2015-0618/AR/Estudio de potenciales emergencias químicas en escenarios urbanos y suburbanos con modelos simples y complejos/
Type: Article





