Department of Statistics


Estimating the approximation error for the saddlepoint maximum likelihood estimate

Speaker: Godrick Maradona Oketch

Affiliation: The University of Auckland

When: Wednesday, 8 December 2021, 2:00 pm to 3:00 pm

Where: MLT3/303-101

Saddlepoint approximation to a density function is increasingly used, primarily because of its remarkable accuracy. A common application of this approximation is to interpret it as a likelihood function, especially when the true likelihood function does not exist in closed form or is intractable; parameter estimates are then obtained by maximising the saddlepoint-based likelihood. This study examines the likelihood functions based on first- and second-order saddlepoint approximations in order to estimate the difference between the true but unknown maximum likelihood estimates (MLEs) and the saddlepoint-based MLEs. We propose an expression for this difference (error) obtained by computing the gradient of the term neglected in the first-order saddlepoint approximation. Using common distributions whose true likelihood functions are known to run confirmatory tests on the proposed error expression, we show that the results are consistent with the difference between the true MLEs and the saddlepoint MLEs. These tests indicate that the proposed formula could complement the simulation studies that have been widely used to justify the accuracy of saddlepoint MLEs.
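For readers unfamiliar with the idea, a minimal Python sketch (my own illustration, not the speaker's code) shows the first-order saddlepoint approximation for the density of a sum of n Exp(1) variables, i.e. a Gamma(n, 1), where the true density is known so the approximation error is visible:

```python
import numpy as np
from math import factorial

# Saddlepoint approximation for X = sum of n iid Exp(1) variables,
# whose cumulant generating function is K(t) = -n*log(1 - t), t < 1.
n = 10
K = lambda t: -n * np.log(1 - t)
d2K = lambda t: n / (1 - t) ** 2

def saddlepoint_density(x):
    t_hat = 1 - n / x                     # solves K'(t) = n/(1-t) = x
    return np.exp(K(t_hat) - t_hat * x) / np.sqrt(2 * np.pi * d2K(t_hat))

def true_density(x):
    # Gamma(n, 1) density, known exactly for this confirmatory test.
    return x ** (n - 1) * np.exp(-x) / factorial(n - 1)

x = np.linspace(2.0, 30.0, 200)
rel_err = saddlepoint_density(x) / true_density(x) - 1
```

For this example the relative error is under one percent and constant in x, so renormalising makes the approximation exact, which illustrates the accuracy the abstract refers to.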

Applications of scoring rules

Speaker: Matthew Parry

Affiliation: University of Otago

When: Thursday, 21 October 2021, 3:00 pm to 4:00 pm


Suppose you publicly express your uncertainty about an unobserved quantity by quoting a distribution for it. A scoring rule is a special kind of loss function intended to measure the quality of your quoted distribution when an outcome is actually observed. In statistical decision theory, you seek to minimise your expected loss. A scoring rule is said to be proper if the expected loss under your quoted distribution is minimised by quoting that distribution. In other words, you cannot game the system!
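To make the properness property concrete, here is a small self-contained numerical check (my own example, not taken from the talk) using the Brier score, a classic proper scoring rule:

```python
import numpy as np

def brier_score(q, outcome):
    """Brier score for quoted probability vector q when class `outcome`
    occurs: sum_k (q_k - 1{k == outcome})^2. Lower is better."""
    e = np.zeros_like(q)
    e[outcome] = 1.0
    return float(np.sum((q - e) ** 2))

def expected_loss(p, q):
    """Expected Brier score under true distribution p when quoting q."""
    return sum(p[k] * brier_score(q, k) for k in range(len(p)))

p = np.array([0.7, 0.2, 0.1])                     # your actual beliefs
honest = expected_loss(p, p)                      # quote what you believe
hedged = expected_loss(p, np.array([0.5, 0.3, 0.2]))
extreme = expected_loss(p, np.array([1.0, 0.0, 0.0]))
```

Whatever alternative you try, `honest` comes out smallest: quoting your true beliefs minimises your expected loss, so you cannot game the system.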

In addition to having a rich theoretical structure – for example, associated with every scoring rule is an entropy and a divergence function – scoring rules can be tailored to the problem at hand and consequently have a wide range of application. They are used in statistical inference, for evaluating and ranking forecasters, for assessing the quality of predictive distributions, and in exams.

I will talk about a range of scoring rules and discuss their application in areas such as classification and time series. In addition to so-called local scoring rules that do not depend on the normalisation of the quoted distribution, I will also discuss recently discovered connections between scoring rules and the Whittle likelihood.

Current state and prospects of R-package for the design of experiments

Speaker: Emi Tanaka

Affiliation: Monash University

When: Thursday, 14 October 2021, 3:00 pm to 4:00 pm


The critical role of data collection is well captured in the expression "garbage in, garbage out" -- in other words, if the collected data are rubbish, then no analysis, however complex it may be, can make something out of them. The gold standard for data collection is a well-designed experiment. Re-running an experiment is generally expensive, whereas re-doing a statistical analysis is generally cheap, so the stakes of getting an experimental design wrong are higher. But how do we design experiments in R? In this talk, I will review the current state of R packages for the design of experiments and present my prototype R package {edibble}, which implements a framework that I call the "grammar of experimental design".


Dr. Emi Tanaka is a lecturer in statistics at Monash University whose primary interest is to develop impactful statistical methods and tools that can readily be used by practitioners. Her research areas include data visualisation, mixed models and experimental design, motivated primarily by problems in bioinformatics and agricultural science. She is currently the President of the Statistical Society of Australia Victorian Branch and the recipient of the Distinguished Presenter's Award from the Statistical Society of Australia for her delivery of a wide range of R workshops.

Highly comparative time-series analysis

Speaker: Ben Fulcher

Affiliation: School of Physics, The University of Sydney

When: Thursday, 7 October 2021, 3:00 pm to 4:00 pm


Over decades, an interdisciplinary scientific literature has contributed myriad methods for quantifying patterns in time series. These methods can be encoded as features that summarize different types of time-series structure as interpretable real numbers (e.g., the shape of peaks in the Fourier power spectrum, or the estimated dimension of a time-delay reconstructed attractor). In this talk, I will show how large libraries of time-series features (>7k, implemented in the hctsa package) and time series (>30k, in the CompEngine database) enable new ways of analyzing time-series datasets, and of assessing the novelty and usefulness of time-series analysis methods. I will highlight new open tools that we've developed to enable these analyses, and discuss specific applications to neural time series.
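The flavour of the feature-based approach can be sketched in a few lines of Python (an illustrative toy of my own, not hctsa itself, which implements thousands of such features in Matlab):

```python
import numpy as np

def ts_features(x):
    """Summarise a time series as a few interpretable real numbers,
    in the spirit of feature-based time-series analysis."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    # Lag-1 autocorrelation: local temporal dependence.
    ac1 = float(np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc))
    # Location of the dominant peak in the Fourier power spectrum.
    power = np.abs(np.fft.rfft(xc)) ** 2
    peak_freq = float(np.fft.rfftfreq(len(x))[np.argmax(power)])
    return {"ac1": ac1, "spectral_peak": peak_freq}

# A noisy sine wave: strongly autocorrelated, spectral peak near 0.05.
t = np.arange(1000)
x = np.sin(2 * np.pi * 0.05 * t) \
    + 0.1 * np.random.default_rng(0).normal(size=1000)
feats = ts_features(x)
```

Stacking many such numbers for many time series gives the feature matrix on which the highly comparative analyses operate.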

Merging Modal Clusters via Significance Assessment

Speaker: Yong Wang

Affiliation: The University of Auckland

When: Thursday, 19 August 2021, 3:00 pm to 4:00 pm


In this talk, I will describe a new procedure that merges modal clusters step by step and produces a hierarchical clustering tree. This is useful for dealing with superfluous clusters and for reducing the number of clusters, as is often desired in practice. Based on some new properties we establish for Morse functions, the procedure merges clusters in a sequential manner without causing unnecessary density distortion. Each cluster is evaluated for its significance relative to the other clusters, using the Kullback-Leibler divergence or its log-likelihood approximation, by truncating the density for the cluster at an appropriate level. The least significant cluster is then merged into one of its adjacent clusters, using a novel concept of cluster adjacency that we define. The resulting hierarchical clustering tree is useful for determining the number of clusters, whether as preferred by a specific user or in a general, meaningful manner. Numerical studies show that the new procedure handles difficult clustering problems well and often produces intuitively appealing and numerically more accurate clustering results compared with several other popular clustering methods in the literature.

Generally-altered, -inflated and -truncated regression, with application to heaped and seeped counts

Speaker: Thomas Yee

Affiliation: University of Auckland

When: Thursday, 22 July 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

A very common aberration in retrospective self-reported survey data is digit preference (heaping), whereby multiples of 10 or 5 are reported in excess upon rounding, creating spikes in spikeplots. Handling this problem, and seeped data too, requires great flexibility. To this end, we propose GAIT regression, which unifies truncation, alteration and inflation simultaneously, extending them to general sets rather than just {0}. Models such as the zero-inflated and zero-altered Poisson are special cases. With parametric and nonparametric alteration and inflation, our combo model accommodates five types of 'special' values. Consequently it spawns a novel method for overcoming underdispersion: general truncation that expands the support. Full estimation details involving Fisher scoring/iteratively reweighted least squares are presented, as well as working implementations for three one-parameter distributions: Poisson, logarithmic and zeta. Previous methods for heaped data have been found wanting; GAIT regression, however, holds great promise by allowing the joint flexible modelling of counts having absences, deficiencies and excesses at arbitrary multiple special values. It is now possible to analyse the joint effects of alteration, inflation and truncation on under- and over-dispersion. The methodology is implemented in the VGAM R package available on CRAN.
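For example, the zero-inflated Poisson, one of the special cases mentioned, simply mixes a point mass onto the count y = 0. A quick Python sketch of that special case (my own illustration, not the VGAM implementation):

```python
from math import exp, factorial

def pois_pmf(y, lam):
    """Poisson probability mass function."""
    return lam ** y * exp(-lam) / factorial(y)

def zip_pmf(y, lam, pi0):
    """Zero-inflated Poisson: with probability pi0 the special value 0
    occurs outright; otherwise an ordinary Poisson count is drawn.
    GAIT regression generalises this idea to alteration, inflation and
    truncation at arbitrary sets of special values, not just {0}."""
    return pi0 * (y == 0) + (1 - pi0) * pois_pmf(y, lam)

# The probabilities still sum to one; the spike sits at the special value 0.
probs = [zip_pmf(y, lam=2.0, pi0=0.3) for y in range(60)]
```

Heaped survey data need spikes at many values (10, 20, 50, ...) at once, which is exactly the generalisation the talk describes.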

Does It Add Up? Hierarchical Bayesian Analysis of Compositional Data

Speaker: Em Rushworth

Affiliation: University of Auckland

When: Thursday, 22 July 2021, 2:00 pm to 3:00 pm

Where: 303-B05

Compositional data is everywhere - in mineral analysis, demographics, and species abundance, for example. However, despite the well-known difficulties in analysing such data, research into expanding the existing methods or exploring alternative approaches has been limited. The past five years have seen a resurgence of interest in the field popularised by Aitchison (1982), which uses a family of log-ratio transformations together with traditional statistical methodology to analyse compositional data. Most recent publications using this methodology focus solely on application, despite the innate limitations of log-ratios preventing wider adoption. This research aims to fill in many of the blanks, including studying approaches outside the log-ratio transformation family, by proposing a consistent definition of compositional data regardless of approach, methodological developments in the treatment of zeroes and sparse data, and demonstrations of applications across multiple domains.
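The centred log-ratio (clr) transform at the heart of the Aitchison approach is easy to state; a minimal Python sketch of the standard definition (my own illustration):

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of a composition: divide each part by
    the geometric mean and take logs, mapping the simplex into real space
    where conventional statistical methods can be applied."""
    x = np.asarray(x, dtype=float)
    g = np.exp(np.mean(np.log(x)))      # geometric mean of the parts
    return np.log(x / g)

comp = np.array([0.6, 0.3, 0.1])        # parts sum to 1
z = clr(comp)                           # clr coordinates sum to 0
```

Note the transform already shows the limitations the abstract alludes to: it is undefined whenever any part is exactly zero, which is why zeroes and sparse data need separate methodological treatment.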

Bayesian hierarchical models are prominent in many of the domains considered in this research, such as ecology and movement studies, and provide a useful framework for considering compositional data. Despite consistent mentions, in both Bayesian and compositional data modelling papers, of the crossover between these two fields, there is very little literature and the crossover remains largely unexplored. Leininger et al. (2013) successfully used a Bayesian hierarchical framework to model the presence of zeroes as a separate hierarchical level, but there has not been any such research outside of the log-ratio transformation. This research will seek to present a Bayesian approach to compositional data analysis using hierarchical models and, hopefully, help make the field more accessible for future researchers.

Stationary distribution approximations for two-island and seed bank models

Speaker: Han Liang Gan

Affiliation: The University of Waikato

When: Tuesday, 29 June 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

In this talk we will discuss two-island Wright-Fisher models which are used to model genetic frequencies and variability for subdivided populations. One of the key components of the model is the level of migration between the two islands. We show that as the population size increases, the appropriate approximation and limit for the stationary distribution of a two-island Wright-Fisher Markov chain depends on the level of migration. In a related seed bank model, individuals in one of the islands stay dormant rather than reproduce. We give analogous results for the seed bank model, compare and contrast the differences and examine the effect the seed bank has on genetic variability. Our results are derived from a new development of Stein's method for the two-island diffusion model and existing results for Stein's method for the Dirichlet distribution.
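As rough intuition for why the migration level matters, here is a toy simulation (my own simplified scheme with symmetric migration and mutation, not the speaker's exact model or Stein's-method analysis):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(N, m, u, generations=5000):
    """Toy two-island Wright-Fisher chain: N haploid individuals per
    island, migration fraction m, symmetric mutation rate u. Returns
    post-burn-in allele-1 frequencies on each island."""
    p1 = p2 = 0.5
    traj = []
    for _ in range(generations):
        # Migration mixes the two islands' parental pools...
        q1 = (1 - m) * p1 + m * p2
        q2 = (1 - m) * p2 + m * p1
        # ...mutation nudges frequencies toward the interior...
        q1 = q1 * (1 - u) + (1 - q1) * u
        q2 = q2 * (1 - u) + (1 - q2) * u
        # ...and binomial resampling supplies the genetic drift.
        p1 = rng.binomial(N, q1) / N
        p2 = rng.binomial(N, q2) / N
        traj.append((p1, p2))
    a = np.array(traj[generations // 2:])
    return a[:, 0], a[:, 1]

# Strong migration locks the islands' frequencies together; weak
# migration lets them drift apart, so the stationary behaviour differs.
p1_hi, p2_hi = simulate(N=100, m=0.2, u=0.002)
p1_lo, p2_lo = simulate(N=100, m=0.001, u=0.002)
gap_hi = float(np.mean(np.abs(p1_hi - p2_hi)))
gap_lo = float(np.mean(np.abs(p1_lo - p2_lo)))
```

The qualitative contrast between the two regimes is what the talk makes precise: the appropriate stationary-distribution approximation depends on how the migration level scales with population size.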

iNZight, Surveys, and the IDI

Speaker: Tom Elliott

Affiliation: Victoria University of Wellington

When: Tuesday, 1 June 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

iNZight was originally designed to teach students core data analysis skills without the need for coding. However, it is also a powerful research tool, allowing researchers short on time, money, or both to quickly obtain simple (or advanced) statistics without having to learn to code or pay an expensive programmer/statistician to do the work for them. iNZight now handles survey designs natively, incorporating the design (without users even needing to respecify it) into all graphs, summaries, data wrangling, and modelling. iNZight also now features an add-on system, providing a simple way of extending the existing UI to unique problems, for example Bayesian small area demography. In this talk, I'll be discussing recent modifications and additions to iNZight, plus some other work I've been doing as a member of Te Rourou Tātaritanga, an MBIE-funded data science research group aiming to improve New Zealand's data infrastructure.

War Stories

Speaker: Peter Mullins


When: Tuesday, 25 May 2021, 3:00 pm to 4:00 pm

Where: MLT3/303-101

A trip through 50 years of consulting, with brief “views” into a variety of consulting tasks I’ve been involved in.

Estimating Power Spectral Density Parameters of Stochastic Gravitational Wave Background for LISA

Speaker: Petra Tang


When: Monday, 24 May 2021, 1:00 pm to 2:00 pm

Where: 303-155

Complementary to electromagnetic waves, the detection of gravitational waves (GWs) allows astrophysics to dive deeper into the understanding of our Universe. Only in the last decade have detections of GWs become possible, and as we expand our search we take on more challenges. One of these challenges is how to resolve the stochastic gravitational wave background (SGWB). My research uses Bayesian parametric algorithms to unfold the properties of GW signals. More specifically, it estimates the power spectral density of mock SGWB signals for the Laser Interferometer Space Antenna (LISA) in the millihertz frequency band. In this talk I will discuss my computational models and some current results from Bayesian parametric models implemented with the Python package PyMC3, and end with further research ideas.
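The spectral-slope estimation step can be sketched in plain NumPy (a frequentist toy of my own with an assumed power-law PSD, standing in for the Bayesian PyMC3 machinery the talk describes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a signal with power-law PSD S(f) ∝ f^alpha by shaping complex
# white noise in the frequency domain (a toy stand-in for a mock SGWB).
n, alpha_true = 4096, -2.0
freqs = np.fft.rfftfreq(n, d=1.0)[1:]            # drop the f = 0 bin
amp = freqs ** (alpha_true / 2)                  # amplitude ∝ sqrt(PSD)
phases = rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size)
x = np.fft.irfft(np.concatenate([[0], amp * phases]), n=n)

# Periodogram estimate of the PSD, then a least-squares fit of the
# spectral slope on log-log axes. (A Bayesian treatment instead places
# a prior on alpha and uses the Whittle likelihood for the periodogram.)
pgram = np.abs(np.fft.rfft(x)[1:]) ** 2
alpha_hat, _ = np.polyfit(np.log(freqs), np.log(pgram), 1)
```

With thousands of frequency bins the fitted slope lands close to the true exponent; the Bayesian version additionally delivers full posterior uncertainty on the PSD parameters.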

