Department of Statistics


Merging Modal Clusters via Significance Assessment

Speaker: Yong Wang

Affiliation: The University of Auckland

When: Thursday, 19 August 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

In this talk, I will describe a new procedure that merges modal clusters step by step and produces a hierarchical clustering tree. This is useful for dealing with superfluous clusters and for reducing the number of clusters, as is often desired in practice. Based on some new properties we establish for Morse functions, the procedure merges clusters in a sequential manner without causing unnecessary density distortion. Each cluster is evaluated for its significance relative to the other clusters, using the Kullback-Leibler divergence or its log-likelihood approximation, by truncating the density for the cluster at an appropriate level. The least significant cluster is then merged into one of its adjacent clusters, using the novel concept of cluster adjacency we define. The resulting hierarchical clustering tree is useful for determining the number of clusters, as may be preferred by a specific user or in a general, meaningful manner. Numerical studies show that the new procedure handles difficult clustering problems well and often produces intuitively appealing and numerically more accurate clustering results, compared with several other popular clustering methods in the literature.

Generally-altered, -inflated and -truncated regression, with application to heaped and seeped counts

Speaker: Thomas Yee

Affiliation: University of Auckland

When: Thursday, 22 July 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

A very common aberration in retrospective self-reported survey data is digit preference (heaping) whereby multiples of 10 or 5 upon rounding are measured in excess, creating spikes in spikeplots. Handling this problem requires great flexibility. To this end, and for seeped data also, we propose GAIT regression to unify truncation, alteration and inflation simultaneously, e.g., to general sets rather than {0}. Models such as the zero-inflated and zero-altered Poisson are special cases. Parametric and nonparametric alteration and inflation mean our combo model has five types of 'special' values. Consequently it spawns a novel method for overcoming underdispersion through general truncation by expanding out the support. Full estimation details involving Fisher scoring/iteratively reweighted least squares are presented as well as working implementations for three 1-parameter distributions: Poisson, logarithmic and zeta. Previous methods to date for heaped data have been found wanting; however, GAIT regression holds great promise by allowing the joint flexible modelling of counts having absences, deficiencies and excesses at arbitrary multiple special values. Now it is possible to analyze the joint effects of alteration, inflation and truncation on under- and over-dispersion. The methodology is now implemented in the VGAM R package available on CRAN.
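
For readers unfamiliar with the two special cases named in the abstract, the following is a minimal sketch of the zero-inflated and zero-altered Poisson probability functions, where the only special value is 0. It is an illustration only and not the VGAM implementation; the function names and the parameter names `lam`, `pi` and `omega` are assumptions made here.

```python
import math


def poisson_pmf(y, lam):
    """Ordinary Poisson probability P(Y = y) with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zero_inflated_poisson_pmf(y, lam, pi):
    """Mixture: a structural zero with probability pi, otherwise Poisson(lam).

    Zeros can come from either component, so P(Y = 0) is inflated.
    """
    base = (1 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


def zero_altered_poisson_pmf(y, lam, omega):
    """Hurdle form: P(Y = 0) is set to omega directly, and the positive
    counts follow a zero-truncated Poisson(lam) scaled by (1 - omega).
    """
    if y == 0:
        return omega
    return (1 - omega) * poisson_pmf(y, lam) / (1 - math.exp(-lam))
```

In the altered form `omega` may be smaller than the Poisson zero probability, so it can describe a deficit of zeros as well as an excess, whereas the inflated form can only add zeros. The abstract describes extending this idea from {0} to general sets of special values.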

Does It Add Up? Hierarchical Bayesian Analysis of Compositional Data

Speaker: Em Rushworth

Affiliation: University of Auckland

When: Thursday, 22 July 2021, 2:00 pm to 3:00 pm

Where: 303-B05

Compositional data is everywhere - in mineral analysis, demographics, and species abundance for example. However, despite the well-known difficulties when analysing such data, research into expanding the existing methods or exploring alternative approaches has been limited. The past five years have seen a resurgence of interest in the field popularised by the publication of Aitchison (1982), which uses a family of log-ratio transformations with traditional statistical methodology to analyse compositional data. Most recent publications using this methodology focus solely on application, despite the innate limitations of log-ratios preventing wider adoption. This research aims to fill in many of the blanks, including studying the approaches outside the log-ratio transformation family, by proposing a consistent definition of compositional data regardless of approach, methodological developments to the consideration of zeroes and sparse data, and demonstrations of applications across multiple domains.

Bayesian hierarchical models are prominent in many of the domains considered in this research, such as ecology and movement studies, and provide a useful framework for considering compositional data. Despite consistent mentions in both Bayesian and compositional data modelling papers of the crossover between these two fields, there is very little literature and it remains largely unexplored. Leininger et al. (2013) successfully used a Bayesian hierarchical framework to model the presence of zeroes as a separate hierarchical level, but there has not been any research outside of the log-ratio transformation. This research will seek to present a Bayesian approach to compositional data analysis using hierarchical models and, hopefully, help to make the field more accessible for future researchers.
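
As background to the log-ratio family mentioned in the abstract, here is a minimal sketch of one commonly used member, the centred log-ratio (clr) transformation. It is not taken from the talk; the function names are chosen here for illustration. It also shows why zeroes are a difficulty: the logarithm of a zero part is undefined.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part relative to the geometric mean.

    Raises ValueError for zero or negative parts, where the log is undefined.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("clr is undefined for zero or negative parts")
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]
```

The transformed values always sum to zero and do not depend on the scale of the input, so `clr([1, 2, 7])` and `clr([10, 20, 70])` agree up to rounding error.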

Stationary distribution approximations for two-island and seed bank models

Speaker: Han Liang Gan

Affiliation: The University of Waikato

When: Tuesday, 29 June 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

In this talk we will discuss two-island Wright-Fisher models which are used to model genetic frequencies and variability for subdivided populations. One of the key components of the model is the level of migration between the two islands. We show that as the population size increases, the appropriate approximation and limit for the stationary distribution of a two-island Wright-Fisher Markov chain depends on the level of migration. In a related seed bank model, individuals in one of the islands stay dormant rather than reproduce. We give analogous results for the seed bank model, compare and contrast the differences and examine the effect the seed bank has on genetic variability. Our results are derived from a new development of Stein's method for the two-island diffusion model and existing results for Stein's method for the Dirichlet distribution.
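
To make the setting concrete, the following is a rough simulation sketch of a two-island Wright-Fisher chain for a single two-allele locus. It is a generic textbook-style variant and not necessarily the exact model analysed in the talk: equal island sizes `n`, symmetric migration at rate `m`, and mutation rates `u` and `v` are all assumptions made here, as are the function names.

```python
import random


def step(counts, n, m, u, v, rng):
    """Advance both islands by one generation.

    counts holds the number of type-A individuals on each island.
    """
    freqs = [c / n for c in counts]
    new_counts = []
    for i in (0, 1):
        # Migration mixes the allele frequencies of the two islands.
        mixed = (1 - m) * freqs[i] + m * freqs[1 - i]
        # Mutation: A is lost at rate u and gained at rate v (assumed).
        p = mixed * (1 - u) + (1 - mixed) * v
        # Binomial resampling of n offspring, drawn as n Bernoulli trials.
        new_counts.append(sum(rng.random() < p for _ in range(n)))
    return new_counts


def simulate(n=100, m=0.05, u=0.01, v=0.01, generations=200,
             seed=1, start=(50, 50)):
    """Return the path of type-A counts on both islands over time."""
    rng = random.Random(seed)
    counts = list(start)
    path = [tuple(counts)]
    for _ in range(generations):
        counts = step(counts, n, m, u, v, rng)
        path.append(tuple(counts))
    return path
```

Running `simulate` for many generations and tabulating the visited states gives an empirical picture of the stationary distribution, which can be compared across different migration levels `m`.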

iNZight, Surveys, and the IDI

Speaker: Tom Elliott

Affiliation: Victoria University of Wellington

When: Tuesday, 1 June 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

iNZight was originally designed to teach students core data analysis skills without the need for coding. However, it is also a powerful research development tool, allowing researchers low on time, money, or both to quickly obtain simple (or advanced) statistics without having to learn to code or pay an expensive programmer/statistician to do the work for them. iNZight now handles survey designs natively (without even needing to specify the design!?) across all graphs, summaries, data wrangling, and modelling. iNZight also now features an add-on system, providing a simple way of extending the existing UI to unique problems, for example Bayesian small area demography. In this talk, I'll be discussing recent modifications and additions to iNZight, plus some other work I've been doing as a member of Te Rourou Tātaritanga, an MBIE-funded data science research group aiming to improve New Zealand's data infrastructure.

War Stories

Speaker: Peter Mullins


When: Tuesday, 25 May 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

A trip through 50 years of consulting, with brief “views” into a variety of consulting tasks I’ve been involved in.

Estimating Power Spectral Density Parameters of Stochastic Gravitational Wave Background for LISA

Speaker: Petra Tang


When: Monday, 24 May 2021, 1:00 pm to 2:00 pm

Where: 303-155

Complementary to electromagnetic waves, the detection of gravitational waves (GWs) can help astrophysics dive deeper into the understanding of our Universe. Only in the last decade has the detection of GWs become possible. As we expand our search, we encounter more challenges. One of these challenges is how to resolve the stochastic gravitational wave background (SGWB). My research uses Bayesian parametric algorithms to unfold the properties of GW signals. More specifically, my research estimates the power spectral density of mock SGWB signals for the Laser Interferometer Space Antenna (LISA) in the millihertz frequency band. In this talk I will discuss my computational models and some current results from Bayesian parametric models fitted with the Python package PyMC3, and end with further research ideas.

Statistical Modelling and Machine Learning, and their conflicting philosophical bases

Speaker: Murray Aitkin

Affiliation: University of Melbourne

When: Friday, 21 May 2021, 2:00 pm to 3:00 pm

Where: MLT1/303-G23

The Data Science/Big Data/Machine Learning era is upon us. Some Statistics departments are morphing into Data Science departments. The new era is focussed on flexibility and innovation, not on models and likelihood. In this process the history of these developments has become obscure. This talk traces these developments back to the arguments between Fisher and Neyman over the roles of models and likelihood in statistical inference. For many flexible model-free analyses there is a model-based analysis in the background. We illustrate with examples of the bootstrap and smoothing.

Strengthening the evidence base for assuring trustworthiness of government

Speaker: Len Cook


When: Tuesday, 18 May 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

A long-standing trust in public services is being challenged for a mix of reasons. This has brought to light a variety of long-standing tensions that exist in the scope, development, evaluation and publication of evidence about public services in New Zealand. These include:

  • A mantra of evidence-based policy which overshadows the critical importance of evidence-based process and practice.
  • The nature of risks from relying on anecdote and rare events to influence policy change and practice that the paring back of evidence in the public domain helps create.
  • A focus on institutional measures ignores the changing societal context and dynamics of population groups that is essential to assess generational change in well-being.
  • Ignorance of the judicial and societal dimensions of proportionality
  • The role of independent well-resourced third parties (Ombudsman, Judiciary, Auditor-General) for providing public confidence in trustworthiness of public services.
  • The importance of evidence from social sciences, official statistics, operational research and continuous improvement in enabling sector change involving diverse autonomous agencies.
  • The necessity of government-wide principles that provide common assurance of the integrity of research selection, methods, quality and release.

The presentation will extend the evidence framework developed by Superu by drawing on an analysis of the five reviews of Oranga Tamariki, as well as experiences in official statistics. It will include examples from the presenter’s study of the justice system.

A Platform for Large-scale Statistical Modelling using R

Speaker: Jason Cairns


When: Tuesday, 18 May 2021, 11:00 am to 12:00 pm

Where: 303S-G75

The growing sizes of data sets make it increasingly challenging to fit models and perform analytics on a single computer. Distributed computing infrastructure and projects like Hadoop or Spark make it possible to leverage a large number of compute nodes, but they offer only a limited set of tools and algorithms, or their extendibility is limited by performance or programming requirements. R, on the other hand, provides a vast variety of efficient tools for statistical computing, but it is typically limited to a single machine. The aim of this project is to leverage the versatility and power of R in a distributed environment like Hadoop by allowing R users to define and run complex distributed algorithms using R. This will allow statisticians to vastly expand the space of models available for large-scale data modelling. As part of the project we will illustrate the use of such methodology by implementing distributed iterative models in R and applying them to real-world tasks on large-scale problems.

Marine spatial planning to conserve deep sea corals in the South Pacific High Seas

Speaker: Carolyn Lundquist

Affiliation: University of Auckland

When: Tuesday, 11 May 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

Decision support tools have been developed to facilitate spatial management planning, utilising various computational methods to select representative sets of priority areas to conserve biodiversity over extensive geographic areas, while at the same time minimising the cost to existing users. Here, I will discuss the use of the decision support tool Zonation to support a stakeholder process to design revised spatial management of the South Pacific high seas for a process initiated by the South Pacific Regional Fisheries Management Organisation (SPRFMO). The objective of the process was to reduce the impact of fisheries (primarily orange roughy) on deep sea corals and other vulnerable deep sea invertebrates. Through a series of workshops, stakeholders directly contributed to the decisions required to parameterise the tool, including the choice of datasets to represent the range of priorities of the stakeholders. Industry, environmental stakeholders and government representatives determined which taxa to include to represent Vulnerable Marine Ecosystems using predictive habitat suitability layers, the weighting of uncertainty in these layers, and other biodiversity layers reflecting 'rarity and uniqueness'. Industry provided layers to represent an index of value to the fishery that included a 'buffer zone' to allow for logistics of deploying gear. A 'naturalness' layer was developed to incorporate prior disturbance history and likelihood of recovery. Scenarios using Zonation allowed stakeholders to visualise implications of decisions, and calculate relative cost to industry and protection of corals for each spatial management option. Following a series of iterations, a final spatial management proposal was agreed by the stakeholder working group, and boundaries were adjusted slightly for practicality. The proposed spatial management plan was adopted by the SPRFMO Commission in January 2019, and resulted in closures of >2 million km2 of high seas to bottom trawling. 
Ongoing iterations with stakeholders as part of the annual SPRFMO work plan have assessed the fishery closures with respect to additional data and improvements in species models.


Carolyn Lundquist, NIWA/UoA Joint Graduate School in Coastal and Marine Science

Associate Professor, School of Environment, University of Auckland

Principal Scientist, National Institute of Water & Atmospheric Research, Hamilton

Carolyn Lundquist moved to New Zealand in 2000, after obtaining a PhD in Ecology at the University of California, Davis, and a BSc in Marine Biology from UCLA. She holds a joint position as Principal Scientist in Marine Ecology at the National Institute of Water and Atmospheric Research (NIWA) in Hamilton and as Associate Professor at the School of Environment at the University of Auckland. She is an applied marine ecologist, providing scientific and social-scientific input to inform decision-making for coastal and ocean management at local, national, regional and international scales. She leads two projects for the Sustainable Seas National Science Challenge that are developing marine spatial planning tools to improve management of cumulative impacts in New Zealand’s ocean ecosystems. Other recent projects include management of mangroves and other coastal wetland habitats, reviewing impacts of climate change on the seafood sector, and the development of global biodiversity scenarios for IPBES (the biodiversity equivalent of IPCC).

