Department of Statistics



Past seminars

Adversarial Risk Analysis for Bi-Agent Influence Diagrams: An Algorithmic Approach

Speaker: Javier Cano


When: Wednesday, 2 November 2022, 2:00 pm to 3:00 pm

Where: 303-310

Authors: Jorge González-Ortega, David Ríos Insua, Javier Cano

Abstract: We describe how to support a decision maker who faces an adversary. To that end, we consider general interactions entailing sequences of both agents' decisions, some of them possibly simultaneous or repeated across time. We model their joint problem as a bi-agent influence diagram. Unlike previous solutions framed under a standard game-theoretic perspective, we provide a decision-analytic methodology to support the decision maker based on the adversarial risk analysis paradigm. This allows us to avoid the unrealistic strong common knowledge assumptions typical of non-cooperative game theory, as well as to better apportion the sources of uncertainty. We illustrate the methodology with a schematic critical infrastructure protection problem.
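The adversarial risk analysis loop can be sketched for the simplest defend-attack template: the defender's uncertainty about the attacker is encoded as a distribution over his utilities, his best response is simulated, and the defender then maximises her own expected utility. The following is a minimal Monte Carlo sketch under those assumptions; the function names and payoffs are hypothetical and not taken from the talk.

```python
import random

def ara_defend_attack(defender_utility, sample_attacker_utility, n_samples=5000):
    """Monte Carlo ARA for a sequential defend-attack problem.

    defender_utility[d][a] is the defender's utility when she chooses
    defence d and the attacker responds with attack a.
    sample_attacker_utility() draws one attacker utility matrix from the
    defender's (subjective) distribution over the attacker's preferences.
    """
    n_d = len(defender_utility)
    n_a = len(defender_utility[0])
    best_d, best_eu = None, float("-inf")
    for d in range(n_d):
        # Estimate p(a | d): how often each attack is the sampled
        # attacker's best response to defence d.
        counts = [0] * n_a
        for _ in range(n_samples):
            u_att = sample_attacker_utility()
            a_star = max(range(n_a), key=lambda a: u_att[d][a])
            counts[a_star] += 1
        eu = sum(defender_utility[d][a] * counts[a] / n_samples
                 for a in range(n_a))
        if eu > best_eu:
            best_d, best_eu = d, eu
    return best_d, best_eu
```

Because the attacker's problem is solved inside the simulation, no common-knowledge assumption about his utilities is needed; only the defender's beliefs about them.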


Generally-altered, -inflated, -truncated and deflated concrete regression

Speaker: Willow Shi

Affiliation: The University of Auckland

When: Wednesday, 31 August 2022, 4:00 pm to 5:00 pm

Where: 303-G14

Zero-altered, -inflated and -truncated count regression is now well established in modern statistical modelling, especially with Poisson and binomial parents. Recently Yee and Ma (2022) have proposed generally-altered, -inflated, -truncated and deflated (GAITD) regression, which extends these models in an almost maximal way. The main idea is to combine the four operators into a single supermodel. GAITD regression includes parametric and nonparametric forms, the latter based on the multinomial logit model (MLM). A special application is heaped measurement error data, e.g., spikes at multiples of 10 or 5 upon rounding. The VGAM R package currently implements three GAITD distributions. The next stage of this work is to develop GAITD continuous distributions. However, when handling spikes and seeping (dips), these width-zero support values must be treated as in a discrete distribution. We thus call the distribution 'concrete': both CONtinuous and disCRETE. The resulting GAITD Mix-MLM combo model has thirteen special value types. Some potential applications of GAITD concrete distributions will be presented, as well as some preliminary results.
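As background, the familiar zero-inflated Poisson is the simplest special case of the inflation operator; a generally-inflated version simply spreads extra probability mass over an arbitrary set of special values. The sketch below is illustrative only (it is not the VGAM implementation, and the parametrisation is an assumption for exposition):

```python
from math import exp, factorial

def pois_pmf(y, lam):
    """Poisson parent probability mass function."""
    return exp(-lam) * lam ** y / factorial(y)

def gi_pois_pmf(y, lam, inflated):
    """Generally-inflated Poisson pmf: `inflated` maps each special value
    to its extra probability mass (summing to less than 1); the remaining
    mass follows the Poisson(lam) parent."""
    extra = sum(inflated.values())
    return inflated.get(y, 0.0) + (1 - extra) * pois_pmf(y, lam)
```

With `inflated={0: 0.1, 10: 0.05}` this reproduces the heaping pattern described above: spikes at 0 and at the round number 10 sitting on top of a Poisson parent.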

Networks of Accumulating Priority Queues

Speaker: Yan Chen

Affiliation: University of Auckland

When: Friday, 29 July 2022, 2:00 pm to 3:00 pm

Where: 303-310

Priority queueing systems have applications in many different fields.

In accumulating priority queues (APQs), the priority of a new arrival increases with time spent in the system, at a rate that depends on the class of the arrival. Thus, an arrival from a lower priority class who has been waiting sufficiently long may be seen before a more recent arrival from a higher class.

Most of the research on APQs has been for single queues, while many processes function like a network of queues, particularly in a medical setting. We will apply coupling methods to explore the sample path behaviour and performance of queueing networks under the accumulating priority discipline. Natural performance measures to consider here are waiting and departure times. This talk will describe initial results for sequential queues, as well as discussing proposed further work.
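The service-selection rule itself is compact; a one-line sketch (my own illustration, not the talk's model) makes the overtaking behaviour concrete:

```python
def next_to_serve(waiting, now):
    """Accumulating priority rule: each waiting customer is an
    (arrival_time, rate) pair whose priority at time `now` is
    rate * (now - arrival_time); serve the customer with the largest."""
    return max(waiting, key=lambda c: c[1] * (now - c[0]))
```

For example, a low-class customer with rate 1 who arrived at time 0 has priority 10 at time 10, overtaking a high-class customer with rate 3 who arrived at time 8 (priority 6), exactly the behaviour described above.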

On computationally efficient methods for testing multivariate distributions with unknown parameters

Speaker: Sara Algeri

Affiliation: School of Statistics, University of Minnesota

When: Wednesday, 20 July 2022, 11:00 am to 12:00 pm

Where: 303-310

Despite the popularity of classical goodness-of-fit tests such as Pearson's chi-squared and Kolmogorov-Smirnov, their applicability often faces serious challenges in practical applications. For instance, in a binned data regime, low counts may affect the validity of the asymptotic results. Excessively large bins, on the other hand, may lead to loss of power. In the unbinned data regime, tests such as Kolmogorov-Smirnov and Cramer-von Mises do not enjoy distribution-freeness if the models under study are multivariate and/or involve unknown parameters. As a result, one needs to simulate the distribution of the test statistic on a case-by-case basis. In this talk, I will discuss a testing strategy that overcomes these shortcomings and equips experimentalists with a novel tool to perform goodness-of-fit tests while substantially reducing the computational costs.
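The case-by-case simulation that the talk seeks to avoid can be sketched for a univariate normal model with unknown parameters: because the KS statistic loses distribution-freeness once the parameters are estimated, its null distribution is simulated by parametric bootstrap, re-estimating the parameters on every resample (Lilliefors-style). This is a baseline sketch, not the speaker's method:

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fit_normal(x):
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / (n - 1))
    return mu, sigma

def ks_stat_normal(x, mu, sigma):
    """Kolmogorov-Smirnov statistic of the sample against a normal cdf."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = norm_cdf((v - mu) / sigma)
        d = max(d, abs(f - i / n), abs(f - (i + 1) / n))
    return d

def ks_parametric_bootstrap(x, n_boot=300, seed=0):
    """Monte Carlo p-value for the KS test with estimated normal
    parameters; the parameters are re-fitted on each resample."""
    rng = random.Random(seed)
    mu, sigma = fit_normal(x)
    d_obs = ks_stat_normal(x, mu, sigma)
    exceed = 0
    for _ in range(n_boot):
        xb = [rng.gauss(mu, sigma) for _ in range(len(x))]
        exceed += ks_stat_normal(xb, *fit_normal(xb)) >= d_obs
    return d_obs, (1 + exceed) / (n_boot + 1)
```

The nested simulation loop is what makes this approach computationally costly, particularly in multivariate settings, which motivates the strategy presented in the talk.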

Joint Modelling of Medical Cost and Survival in Complex Sample Surveys

Speaker: Seong Hoon Yoon

Affiliation: The University of Auckland

When: Wednesday, 25 May 2022, 2:00 pm to 3:00 pm

Where: Zoom

Joint modelling of longitudinal and time-to-event data is a method that recognises the dependency between the two data types, and combines the two outcomes into a single model, which leads to more efficient estimates. These models are applicable when individuals are followed over a period of time, generally to monitor the progression of a disease or a medical condition, and also when longitudinal covariates related to the time-to-event variable are available. The present project consists of developing and applying joint models of medical cost and survival. However, medical cost datasets are usually obtained using a complex sampling design rather than simple random sampling, and this design needs to be considered in the statistical analysis. This project aims to develop a novel approach to joint modelling of complex data by combining survey calibration with standard joint modelling, which will be achieved by incorporating a new set of equations to calibrate the sampling weights. The newly developed methods will be applied to jointly model survival and cost data taken from the 'Analyzing clustered epidemiological studies from complex surveys: Using big data to estimate dementia prevalence in New Zealand' data from the Integrated Data Infrastructure (IDI).
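The calibration idea can be illustrated with its simplest special case, post-stratification: design weights are rescaled within groups so that weighted counts match known population totals. The project's calibration equations are more general than this sketch, which is my own illustration:

```python
def calibrate_weights(weights, groups, pop_totals):
    """Post-stratification, the simplest survey-calibration estimator:
    rescale the design weights within each group so that the weighted
    group counts match the known population totals."""
    wsum = {}
    for w, g in zip(weights, groups):
        wsum[g] = wsum.get(g, 0.0) + w
    return [w * pop_totals[g] / wsum[g] for w, g in zip(weights, groups)]
```

After calibration, weighted totals in each group equal the population totals by construction, which is the constraint the proposed joint-model estimating equations would extend.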

Mixed Proportional Hazard Models with Complex Samples

Speaker: Brad Drayton

Affiliation: The University of Auckland

When: Friday, 25 February 2022, 2:00 pm to 3:00 pm


Large amounts of data are collected by the administrative parts of government. These data can potentially be useful for research, but we need valid analysis methods that can account for their complex characteristics. This project focuses on two characteristics of time-to-event data – cluster correlation and complex samples. The use of auxiliary data to improve estimates will also be investigated. Time-to-event data are traditionally modelled with the famous Cox proportional hazards model (PHM). This has been extended to account for cluster correlation – the mixed-effect PHM – and to work with complex samples. A current gap in both theory and software is handling both cluster correlation and complex sampling at once. In this talk I will review key parts of the theory around Cox models, the mixed-effect and complex-sample extensions, and propose plans to integrate these theories. I will also review the use of auxiliary information, and how our new theory may be extended to incorporate it.
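The building block behind all of these extensions is the Cox partial likelihood. A minimal single-covariate sketch (assuming no tied event times; my own illustration, not the project's estimator):

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for a one-covariate Cox model (no ties):
    each observed event contributes its hazard relative to everyone still
    at risk; censored subjects enter only through the risk sets."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observation: no direct contribution
        risk = sum(math.exp(beta * x[j])
                   for j, t_j in enumerate(times) if t_j >= t_i)
        ll += beta * x[i] - math.log(risk)
    return ll
```

The mixed-effect and complex-sample extensions modify this objective, by adding random effects to the linear predictor and by weighting the contributions with sampling weights respectively; the project's goal is a likelihood that does both at once.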

Learning symmetries of regression functions

Speaker: Louis Christie

Affiliation: University of Cambridge

When: Tuesday, 22 February 2022, 2:00 pm to 3:00 pm


Models that incorporate the symmetry of the object to be estimated (in this talk, non-parametric regression functions) perform better. One way to build symmetry into a model is feature averaging: taking an average of the model's output over the orbit of the input under some group action G. This averaging operator is a projection of L^2 functions onto the space of invariant functions, and so the generalisation error under L^2 loss of the averaged model is guaranteed to be small if the regression function f is invariant to the action. These symmetrised models are increasingly being used in practice through invariant and equivariant neural networks. In this talk we present a consistent hypothesis test for H_0: f is G-invariant against H_1: f is not G-invariant, to give confidence to the use of such models when the symmetry is not known a priori. This is based on a family of test statistics derived from the test errors of the symmetrised and unsymmetrised models. Further, we present a method for estimating the maximal symmetry of a regression function from within a specified class of symmetries.
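For a finite group, feature averaging is a few lines, and the averaged function is exactly G-invariant by construction. A minimal sketch of the averaging operator described above (my own illustration):

```python
def symmetrize(f, group):
    """Feature averaging: average f over the orbit of the input under a
    finite group of transformations. The result is exactly invariant
    under every g in the group, and corresponds to the L^2 projection of
    f onto the space of invariant functions."""
    def f_bar(x):
        return sum(f(g(x)) for g in group) / len(group)
    return f_bar
```

For example, averaging f(x) = x + x^2 over the sign-flip group {x -> x, x -> -x} removes the odd part and leaves x^2; comparing the test errors of f and f_bar is the idea behind the family of test statistics mentioned above.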

Adaptive features in platform trials

Speaker: Chuyao Xu

Affiliation: The University of Auckland

When: Monday, 21 February 2022, 2:00 pm to 3:00 pm

Where: Zoom

As a new type of Randomised Controlled Trial (RCT), adaptive platform trials compare multiple interventions to a shared control arm and are usually endowed with processes to remove and add treatment arms and/or change the control treatment during conduct. This approach aims to find effective treatments for a disease instead of focusing on any specific experimental therapy. A popular adaptive feature in platform trials is response-adaptive randomisation (RAR). This method tends to allocate fewer trial participants to an inferior treatment, especially when the difference between treatments is larger, but requires greater total sample sizes than equal allocation. Supporters of response-adaptive randomisation have argued that this provides an ethical argument for using adaptive randomisation. I studied this claim considering also the impact on patients outside the trial, who benefit from trial results becoming available earlier, as they would under a traditional RCT.
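One common RAR rule (not necessarily the one studied in the talk) is Thompson-style allocation: each arm's success probability gets a Beta posterior, and the next patient is assigned to the arm with the largest posterior draw. A minimal sketch under those assumptions:

```python
import random

def rar_allocate(successes, failures, rng):
    """Thompson-style response-adaptive allocation: sample each arm's
    Beta(1 + successes, 1 + failures) posterior and assign the next
    patient to the arm with the largest draw, so better-performing arms
    are favoured as evidence accumulates."""
    draws = [rng.betavariate(1 + s, 1 + f)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

def simulate_trial(true_probs, n_patients, seed=0):
    """Simulate a binary-outcome trial under the RAR rule above."""
    rng = random.Random(seed)
    succ = [0] * len(true_probs)
    fail = [0] * len(true_probs)
    for _ in range(n_patients):
        arm = rar_allocate(succ, fail, rng)
        if rng.random() < true_probs[arm]:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail
```

Running the simulation with one clearly superior arm shows the behaviour described above: the better arm receives the majority of allocations, at the cost of a larger total sample size than equal allocation for the same power.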


Dealing with the badness of goodness-of-fit

Speaker: Rishika Chopara

Affiliation: The University of Auckland

When: Thursday, 17 February 2022, 2:00 pm to 3:00 pm


Goodness-of-fit (GOF) testing is vital to statistical analysis, as it allows us to validate the reliability of any statistical inference we make.

In many models, the deviance is used to assess GOF by comparing it against a chi-squared distribution. However, in some situations (e.g. when dealing with sparse counts) the deviance does not have a chi-squared distribution, even approximately, rendering such tests unusable. For example, this problem often arises in capture-recapture studies, for which various methods have been proposed to avoid the assumption of a chi-squared distribution; often these resort to simulation. In principle, the true distribution of the deviance should be computable; in practice, however, this would involve inverting the moment-generating function. We show that this step can be achieved using the saddlepoint method, which allows us to calculate a close approximation to the true underlying distribution of the deviance. Using this approximation, we no longer need to worry about the suitability of the chi-squared distribution, nor do we need to turn to simulation, enhancing the usability and power of GOF tests involving the deviance.
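The saddlepoint idea can be illustrated on a case with a known answer. Given a cumulant generating function K, the first-order saddlepoint density approximation is exp(K(s) - s*x) / sqrt(2*pi*K''(s)), where the saddlepoint s solves K'(s) = x. The sketch below is a generic textbook illustration (here applied to a Gamma, where the exact density is available for comparison), not the talk's deviance application:

```python
import math

def saddlepoint_density(K, K1, K2, x, s_lo=-50.0, s_hi=0.999999, tol=1e-12):
    """First-order saddlepoint density approximation
        f(x) ~= exp(K(s) - s*x) / sqrt(2*pi*K''(s)),
    where s solves the saddlepoint equation K'(s) = x.  K' is strictly
    increasing for a CGF, so bisection suffices to find s."""
    lo, hi = s_lo, s_hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if K1(mid) < x:
            lo = mid
        else:
            hi = mid
    s = (lo + hi) / 2
    return math.exp(K(s) - s * x) / math.sqrt(2 * math.pi * K2(s))
```

For a sum of five unit exponentials, K(s) = -5 log(1 - s), and the approximation tracks the exact Gamma(5, 1) density to within about 2% without any simulation, which is the kind of accuracy that makes the approach attractive for the deviance.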

Another approach to GOF is to regard it as part of the model selection process. Model selection can be cumbersome when dealing with models that are challenging or time-consuming to fit. Score tests are a promising option in this scenario because they do not require all the models under comparison to be fitted. However, the challenge of computing the derivative of the log-likelihood for complex models means the score test approach has been underutilised. In addition, score tests are more powerful if the expected (rather than the observed) information matrix is used, but this can be difficult to compute. We demonstrate that these issues can be resolved by using automatic differentiation, as implemented in the R package TMB. This approach greatly increases the scope, accessibility, and stability of the score test framework.
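A classical example of the appeal of score tests is van den Broek's (1995) test for zero inflation in a Poisson model: only the null Poisson model needs fitting, and the score and information have closed forms. The sketch below is my own illustration of that published test, not the TMB-based approach of the talk (whose point is precisely to avoid such hand derivations):

```python
import math

def zip_score_test(y):
    """Score test for zero inflation against a constant-mean Poisson
    null (van den Broek, 1995).  Only the null fit (lam = sample mean)
    is needed; the statistic is asymptotically chi-squared(1)."""
    n = len(y)
    lam = sum(y) / n
    p0 = math.exp(-lam)                   # null probability of a zero
    n0 = sum(v == 0 for v in y)           # observed zeros
    u = n0 / p0 - n                       # score for inflation at zero
    info = n * (1 - p0) / p0 - n * lam    # efficient information
    return u * u / info
```

Comparing the statistic against the 3.84 chi-squared critical value flags excess zeros without ever fitting the zero-inflated alternative; automatic differentiation extends the same recipe to models where `u` and `info` are intractable by hand.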

Investigating linkage bias in the Integrated Data Infrastructure

Speaker: Eileen Li

Affiliation: The University of Auckland

When: Tuesday, 14 December 2021, 2:00 pm to 3:00 pm


Linked administrative data can provide rich information on a wide range of outcomes, and their usage is on the rise both in New Zealand and internationally. The Integrated Data Infrastructure (IDI) is a database maintained by Statistics New Zealand (Stats NZ) that contains linked administrative data at the individual level. In the absence of a unique personal identifier, probabilistic record linkage is performed, which unavoidably introduces linkage errors. However, the majority of IDI analyses are completed without understanding, measuring or correcting for potential linkage bias. We aim to quantify linkage errors in the IDI and provide feasible approaches to adjusting for linkage biases in IDI analyses. In this talk, I will briefly explain how linkage errors (false links and missed links) may occur in the IDI, followed by approaches to identifying false links and missed links. Some key limitations will also be addressed.

