We propose a new data quality assessment framework and use it to evaluate the data quality of Continuous Emissions Monitoring Systems (CEMS) in China. First, we show that the quality of both waste-gas and waste-water monitoring data improved from January 1, 2016 to June 30, 2017. Second, our results indicate that state-controlled factories, large-scale factories, and factories with foreign investment or with funds from Hong Kong, Macao and Taiwan achieve better data quality scores, whereas private enterprises perform poorly and call for more attention from the authorities. Third, we find that CEMS data quality is worse in underdeveloped regions of China that suffer from serious air pollution, a situation that may hamper further environmental improvement. To sustain improvement in CEMS data quality, our results suggest a need for regionally differentiated CEMS-related policies. Fourth, we compare the factories with poor CEMS data quality against the list of factories that received serious administrative penalties in 2016-2017 and find a striking overlap. To detect factories with suspicious pollution activities, we use clustering analysis and a logistic regression model. Furthermore, we show that these two approaches offer a potential basis for a new quantitative strategy for monitoring polluting factories.
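Xiangyu Chang (常象宇) received his Ph.D. through a joint training program between the Department of Mathematics at Xi'an Jiaotong University and the Department of Statistics at the University of California, Berkeley. He is currently an associate professor in the School of Management at Xi'an Jiaotong University, a member of the Shaanxi Province Outstanding Young Scholars Support Program, and co-director of the Center for Data Science and Information Quality at Xi'an Jiaotong University. His academic research focuses on statistics, machine learning, and related applications in data science. He has published more than thirty papers in statistics journals such as AOS, SS, and EJS, and in machine learning and artificial intelligence journals such as JMLR, TNNLS, TC, and TSP. He has also served on the program committees of well-known conferences in artificial intelligence and data mining, including AAAI, IJCAI, SDM, ICDM, and ICDS. His industry-facing research focuses on the big data and artificial intelligence industries. He is an editorial board member of the data science organization "Capital of Statistics" (统计之都) and a co-founder of the data science public account "狗熊会".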