Estimating the gaps in air pollution data

21 July 2017

Statistics doctoral student Fangyao Li

Air pollution is dangerous for human health: the fine particles generated by motor vehicles, planes, residential and forest fires, volcanic eruptions and dust storms are easily inhaled. The most damaging particles are smaller than 2.5 micrometers in diameter (PM2.5) – just 3% of the diameter of a human hair, and invisible to the naked eye. You wouldn’t know if you were breathing them in.

Unsurprisingly, many countries have strict testing regimes to measure the concentration of PM2.5. However, the instruments to collect such high-dimensional time series are expensive, and statisticians are interested in ways to best estimate concentrations of fine particulate where complete data is not available.

That’s where doctoral student Fangyao Li comes in. She is using National Institute of Water and Atmospheric Research (NIWA) data taken hourly between 2008 and 2014 at four sites in Auckland – Patumahoe, Penrose, Takapuna and Whangaparaoa – and is assessing which measurements to employ in order to get estimates that are as accurate as possible.

Fangyao is using the amusingly-named “greedy algorithms” to do this. She explains that a greedy algorithm makes the best available choice at each step, in the hope that these choices together will lead to the best overall solution. Greedy algorithms aren’t new – they’ve been in existence for decades – but they have been re-discovered recently in the context of big data. Essentially, Fangyao’s research aims to find a good trade-off between algorithmic complexity and estimation accuracy.
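For readers who like to see the idea in code, here is a minimal sketch of one common greedy approach, forward selection with least squares. It is an illustration only and not necessarily the method used in Fangyao’s research: the function name `greedy_forward_selection`, the `max_features` parameter and the data are all made up for the example, and the numbers are synthetic, not NIWA measurements.

```python
import numpy as np


def greedy_forward_selection(X, y, max_features):
    """Pick columns of X one at a time, each time adding the single column
    that most reduces the least-squares error when estimating y."""
    n_features = X.shape[1]
    selected = []                       # columns chosen so far
    remaining = list(range(n_features))
    errors = []                         # squared error after each step

    for _ in range(min(max_features, n_features)):
        best_col, best_err = None, np.inf
        for col in remaining:
            cols = selected + [col]
            # Fit y using the already-chosen columns plus this candidate.
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            err = float(np.sum((y - X[:, cols] @ coef) ** 2))
            if err < best_err:
                best_col, best_err = col, err
        # Greedy step: keep the best candidate and never revisit the choice.
        selected.append(best_col)
        remaining.remove(best_col)
        errors.append(best_err)

    return selected, errors


# Synthetic example (hypothetical data): six candidate measurement series,
# of which only columns 1 and 4 actually drive the target series.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)

selected, errors = greedy_forward_selection(X, y, max_features=2)
print("selected columns:", selected)
print("squared error after each step:", errors)
```

The `max_features` setting is where the trade-off shows up in this sketch: allowing more steps costs more computation but can only lower the estimation error on the data used for fitting.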

Research is something of a treasure hunt for Fangyao. “Just like mining, the most beautiful and valuable things are always hidden deep beneath the earth,” she says. “The deeper you dig, the more you find.”

The NIWA data has also underlined to Fangyao how vulnerable we all are to the effects of air pollution. “For instance, according to the NIWA air pollution data I have, there was an extremely high concentration of PM2.5 in Auckland in September 2009. The reason was a dust storm in the Australian desert; the dust was blown over the Tasman Sea to the North Island. Air pollution isn’t an issue for a specific city or a specific country; all human beings should be concerned about it.”

Fangyao comes from Xinxiang in northern China, and gained a bachelor’s degree in mathematics at Xinjiang University. She then decided to pursue study overseas: “There were many definitions, theorems, proofs and propositions involved in my undergraduate study, and undeniably, they are interesting, lovely and beautiful. However, when I decided to further my education, I wanted to switch modes and try something more practical, looking at how to solve real-world problems by using the knowledge I have learned.”

New Zealand won, “just because [of] the reputation of the Department of Statistics and because this is the birthplace of [the well-known statistics software] R.” A Postgraduate Diploma in Science specialising in Statistics and a master’s degree followed, and Fangyao started her doctoral studies last year.