Department of Statistics



Past seminars

Joint Modelling of Medical Cost and Survival in Complex Sample Surveys

Speaker: Seong Hoon Yoon

Affiliation: The University of Auckland

When: Wednesday, 25 May 2022, 2:00 pm to 3:00 pm

Where: Zoom

Joint modelling of longitudinal and time-to-event data is a method that recognises the dependency between the two data types and combines the two outcomes into a single model, which leads to more efficient estimates. These models are applicable when individuals are followed over a period of time, generally to monitor the progression of a disease or a medical condition, and when longitudinal covariates related to the time-to-event variable are available. The present project consists of developing and applying joint models of medical cost and survival. However, medical cost datasets are usually obtained using a complex sampling design rather than simple random sampling, and this design needs to be accounted for in the statistical analysis. This project aims to develop a novel approach to joint modelling of complex data by combining survey calibration with standard joint modelling, which will be achieved by incorporating a new set of equations to calibrate the sampling weights. The newly developed methods will be applied to jointly model survival and cost data taken from the 'Analyzing clustered epidemiological studies from complex surveys: Using big data to estimate dementia prevalence in New Zealand' data from the Integrated Data Infrastructure (IDI).
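Survey calibration, the ingredient the project combines with standard joint modelling, can be sketched in a few lines. The following is a minimal illustration of linear (GREG) calibration, not the project's code, and the weights and totals are made-up numbers: design weights d are adjusted so that the weighted totals of auxiliary variables X match known population totals t.

```python
import numpy as np

def calibrate(d, X, t):
    """Linear calibration: w = d * (1 + X @ lam), with lam chosen so X' w = t."""
    lam = np.linalg.solve(X.T @ (d[:, None] * X), t - X.T @ d)
    return d * (1 + X @ lam)

d = np.full(5, 10.0)                                      # design weights (each unit represents 10)
X = np.array([[1, 0.0], [1, 1], [1, 2], [1, 3], [1, 4]])  # intercept + one auxiliary covariate
t = np.array([60.0, 130.0])                               # known population totals
w = calibrate(d, X, t)
print(X.T @ w)  # matches t exactly: [ 60. 130.]
```

The calibrated weights stay close to the design weights while reproducing the auxiliary totals, which is the property the project's new calibration equations generalise to the joint-model setting.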

Mixed Proportional Hazard Models with Complex Samples

Speaker: Brad Drayton

Affiliation: The University of Auckland

When: Friday, 25 February 2022, 2:00 pm to 3:00 pm


Large amounts of data are collected by the administrative parts of government. These data can potentially be useful for research, but we need valid analysis methods that can account for their complex characteristics. This project focuses on two characteristics of time-to-event data: cluster correlation and complex samples. The use of auxiliary data to improve estimates will also be investigated. Time-to-event data are traditionally modelled with the famous Cox proportional hazards model (PHM). This has been extended to account for cluster correlation (the mixed-effect PHM) and to work with complex samples. A current gap in both theory and software is handling both cluster correlation and complex sampling at once. In this talk I will review key parts of the theory around Cox models, the mixed-effect and complex-sample extensions, and propose plans to integrate these theories. I will also review the use of auxiliary information, and how our new theory may be extended to incorporate it.
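The Cox partial log-likelihood that both extensions build on is compact enough to sketch. The code below is an illustration rather than the project's implementation, assumes no tied event times, and uses simulated data with a made-up effect size.

```python
import numpy as np

def cox_partial_loglik(beta, times, events, X):
    """Cox partial log-likelihood: each event contributes its linear predictor
    minus the log-sum-exp over the risk set (subjects still under observation)."""
    order = np.argsort(times)
    times, events, X = times[order], events[order], X[order]
    eta = X @ beta
    ll = 0.0
    for i in range(len(times)):
        if events[i]:
            ll += eta[i] - np.log(np.sum(np.exp(eta[i:])))  # eta[i:] = risk set
    return ll

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 1))
times = rng.exponential(scale=np.exp(-0.5 * X[:, 0]))  # hazard rises with the covariate
events = np.ones(n, dtype=bool)
print(cox_partial_loglik(np.array([0.5]), times, events, X))
```

The mixed-effect extension adds a random effect to eta and integrates it out; the complex-sample extension weights each contribution by its sampling weight — doing both at once is the gap this project targets.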

Learning symmetries of regression functions

Speaker: Louis Christie

Affiliation: University of Cambridge

When: Tuesday, 22 February 2022, 2:00 pm to 3:00 pm


A model that incorporates the symmetry of the object to be estimated (in this talk, a non-parametric regression function) performs better. One way to build symmetry into a model is feature averaging: taking an average of the model's output over the orbit of the input under some group action G. This averaging operator is a projection of L^2 functions onto the space of invariant functions, and so the generalisation error under L^2 loss of the averaged model is guaranteed to be small if the regression function f is invariant to the action. These symmetrised models are increasingly being used in practice through invariant and equivariant neural networks. In this talk we present a consistent hypothesis test for H_0: f is G-invariant against H_1: f is not G-invariant, to give confidence to the use of such models when the symmetry is not known a priori. This is based on a family of test statistics derived from the test errors of the symmetrised and unsymmetrised models. Further, we present a method for estimating the maximal symmetry of a regression function from within a specified class of symmetries.
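Feature averaging is simple to state in code. The sketch below is our illustration rather than the speaker's: it averages a model over the two-element sign-flip group, so the resulting function is exactly invariant under x -> -x.

```python
import numpy as np

def symmetrise(model, group_actions):
    """Project a model onto G-invariant functions by averaging over the orbit."""
    def averaged(x):
        return np.mean([model(g(x)) for g in group_actions], axis=0)
    return averaged

model = lambda x: x**3 + x**2           # not invariant under x -> -x
G = [lambda x: x, lambda x: -x]         # the sign-flip group acting on the input
f_bar = symmetrise(model, G)

print(f_bar(2.0), f_bar(-2.0))          # equal: the averaged model is invariant
```

The test statistics in the talk compare the held-out error of `model` with that of `f_bar`: if f really is G-invariant, averaging cannot hurt, so a large gap in the other direction is evidence against invariance.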

Adaptive features in platform trials

Speaker: Chuyao Xu

Affiliation: The University of Auckland

When: Monday, 21 February 2022, 2:00 pm to 3:00 pm

Where: Zoom

As a new type of Randomised Controlled Trial (RCT), adaptive platform trials compare multiple interventions to a shared control arm and are usually endowed with processes to remove and add treatment arms and/or change the control treatment during their conduct. This approach aims to find effective treatments for a disease instead of focusing on any specific experimental therapy. A popular adaptive feature of platform trials is response-adaptive randomisation (RAR). This method tends to allocate fewer trial participants to an inferior treatment, especially when the difference between treatments is large, but requires a greater total sample size than equal allocation. Supporters of response-adaptive randomisation have argued that this provides an ethical case for using it. I studied this claim, considering also the impact on patients outside the trial, who benefit from the trial results being available earlier in a traditional RCT.
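One common way to implement response-adaptive randomisation for binary outcomes is Thompson sampling with Beta priors. The simulation below is an illustrative sketch, not the speaker's code; the arm names and success probabilities are made up. It shows the mechanism the abstract describes: allocation drifts towards the better arm as evidence accumulates.

```python
import numpy as np

rng = np.random.default_rng(1)
p_true = {"A": 0.3, "B": 0.5}          # hypothetical true success probabilities
successes = {"A": 1, "B": 1}           # Beta(1, 1) priors on each arm
failures = {"A": 1, "B": 1}

alloc = {"A": 0, "B": 0}
for _ in range(500):
    # Thompson sampling: allocate to the arm with the larger sampled rate
    draws = {arm: rng.beta(successes[arm], failures[arm]) for arm in p_true}
    arm = max(draws, key=draws.get)
    alloc[arm] += 1
    if rng.random() < p_true[arm]:     # observe this participant's outcome
        successes[arm] += 1
    else:
        failures[arm] += 1

print(alloc)  # the better arm ("B") receives most participants
```

The ethical trade-off studied in the talk is visible here: fewer in-trial participants receive the inferior arm A, but the uneven allocation means more total participants are needed to reach a given precision, delaying results for patients outside the trial.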


Dealing with the badness of goodness-of-fit

Speaker: Rishika Chopara

Affiliation: The University of Auckland

When: Thursday, 17 February 2022, 2:00 pm to 3:00 pm


Goodness-of-fit (GOF) testing is vital to statistical analysis, as it allows us to validate the reliability of any statistical inference we make.

In many models, the deviance is used to assess GOF by comparing it against a Chi-squared distribution. However, in some situations (e.g. when dealing with sparse counts) the deviance does not have a Chi-squared distribution, even approximately, rendering such tests unusable. For example, this problem often arises in capture-recapture studies, for which various methods have been proposed to avoid the assumption of a Chi-squared distribution; often these resort to simulation. In principle, the true distribution of the deviance should be computable; in practice, however, this would involve inverting the moment-generating function. We show that this step can be achieved using the saddlepoint method, which allows us to calculate a close approximation to the true underlying distribution of the deviance. Using this approximation, we no longer need to worry about the suitability of the Chi-squared distribution, nor do we need to turn to simulation, enhancing the usability and power of GOF tests involving the deviance.
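To show the accuracy of the saddlepoint method in a case where the truth is known, here is a sketch (ours, not the authors') of the first-order saddlepoint approximation to the Poisson pmf, built from its cumulant generating function K(s) = lam*(exp(s) - 1).

```python
import math

def saddlepoint_poisson(x, lam):
    """First-order saddlepoint approximation to the Poisson(lam) pmf at x > 0."""
    s_hat = math.log(x / lam)                  # solves the saddlepoint equation K'(s) = x
    K = lam * (math.exp(s_hat) - 1)            # K(s_hat) = x - lam
    K2 = lam * math.exp(s_hat)                 # K''(s_hat) = x
    return math.exp(K - s_hat * x) / math.sqrt(2 * math.pi * K2)

def true_poisson(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

for x in (5, 10, 20):
    print(x, true_poisson(x, 10), saddlepoint_poisson(x, 10))
# the approximation tracks the true pmf closely across the range
```

The deviance application in the talk works the same way, except the moment-generating function belongs to the deviance statistic itself rather than to a standard distribution.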

Another approach to GOF is to regard it as part of the model selection process. Model selection can be cumbersome when dealing with models that are challenging or time-consuming to fit. Score tests are a promising option in this scenario because they do not require all the models under comparison to be fitted. However, the challenge of computing the derivative of the log-likelihood for complex models means the score test approach has been underutilised. In addition, score tests are more powerful if the expected (rather than the observed) information matrix is used, but this can be difficult to compute. We demonstrate that these issues can be resolved by using automatic differentiation, as implemented in the R package TMB. This approach greatly increases the scope, accessibility, and stability of the score test framework.
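The logic of a score test can be shown in a toy one-parameter case. The sketch below tests a Poisson mean and is our illustration, not the authors' code: TMB obtains the derivatives by automatic differentiation, whereas here the score and expected information are written analytically. Note that only the null value is needed; no alternative model is fitted.

```python
import numpy as np

def score_test_poisson(x, mu0):
    """Score statistic for H0: mu = mu0 under an iid Poisson model."""
    n = len(x)
    score = np.sum(x / mu0 - 1)        # derivative of the log-likelihood at mu0
    info = n / mu0                     # expected Fisher information at mu0
    return score**2 / info             # approx. chi-squared(1) under H0

rng = np.random.default_rng(0)
x = rng.poisson(3.0, size=200)
stat = score_test_poisson(x, mu0=3.0)
print(stat)  # compare with the chi-squared(1) 95% critical value, 3.84
```

For complex models the score and expected information are exactly the quantities that are painful to derive by hand, which is where automatic differentiation earns its keep.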

Investigating linkage bias in the Integrated Data Infrastructure

Speaker: Eileen Li

Affiliation: The University of Auckland

When: Tuesday, 14 December 2021, 2:00 pm to 3:00 pm


Linked administrative data can provide rich information on a wide range of outcomes, and their use is on the rise both in New Zealand and internationally. The Integrated Data Infrastructure (IDI) is a database maintained by Statistics New Zealand (Stats NZ) that contains linked administrative data at the individual level. In the absence of a unique personal identifier, probabilistic record linkage is performed, which unavoidably introduces linkage errors. However, the majority of IDI analyses are completed without understanding, measuring or correcting for potential linkage bias. We aim to quantify linkage errors in the IDI and provide feasible approaches to adjust for linkage biases in IDI analysis. In this talk, I will briefly explain how linkage errors (false links and missed links) may occur in the IDI, followed by approaches for identifying false links and missed links. Some key limitations will also be addressed.

Estimating the approximation error for the saddlepoint maximum likelihood estimate

Speaker: Godrick Maradona Oketch

Affiliation: The University of Auckland

When: Wednesday, 8 December 2021, 2:00 pm to 3:00 pm


The saddlepoint approximation to a density function is increasingly used, primarily because of its remarkable accuracy. A common application of this approximation is to interpret it as a likelihood function, especially when the true likelihood function does not exist or is intractable. The aim is then to obtain parameter estimates by maximising the likelihood function based on the saddlepoint approximation. This study examines the likelihood function (based on first- and second-order saddlepoint approximations) to estimate the difference between the true but unknown maximum likelihood estimates (MLEs) and the saddlepoint-based MLEs. We propose an expression to estimate this difference (error) by computing the gradient of the neglected term in the first-order saddlepoint approximation. Then, using common distributions whose true likelihood functions are known to perform confirmatory tests on the proposed error expression, we show that the results are consistent with the difference between the true MLEs and the saddlepoint MLEs. These tests indicate that the proposed formula could complement the simulation studies which have been widely used to justify the accuracy of such saddlepoint MLEs.

Disease risk prediction using deep neural networks

Speaker: Xiaowen Li

Affiliation: The University of Auckland

When: Thursday, 2 December 2021, 2:00 pm to 3:00 pm


Accurate disease risk prediction is an essential step towards precision medicine, an emerging model of healthcare that tailors treatment strategies to individuals' profiles. The recent abundance of genome-wide data provides unprecedented opportunities to systematically investigate complex human diseases. However, the ultra-high dimensionality and complex relationships between biomarkers and outcomes have brought tremendous analytical challenges. Hence, dimension reduction is crucial for analysing high-dimensional genomic data. Deep learning models are promising approaches for modelling features of high complexity, and thus they have the potential to offer a unified approach to efficiently modelling diseases with different underlying genetic architectures. The overall objective of this project is to develop a hybrid deep neural network incorporating the multi-kernel Hilbert-Schmidt Independence Criterion Lasso (MK-HSIC-lasso) to efficiently select important predictors from ultra-high-dimensional genomic data and model their complex relationships, for risk prediction analysis of high-dimensional genomic data.
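HSIC, the dependence measure at the core of MK-HSIC-lasso, can be computed in a few lines. The sketch below is our illustration with a single Gaussian kernel and simulated data; the multi-kernel and lasso machinery of the project are omitted. It shows the property that makes HSIC useful for screening predictors: it scores nonlinear dependence without assuming a model.

```python
import numpy as np

def gaussian_kernel(a, sigma=1.0):
    d = a[:, None] - a[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def hsic(x, y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2, with centring matrix H."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = gaussian_kernel(x), gaussian_kernel(y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y_dep = x + 0.1 * rng.normal(size=100)   # strongly dependent on x
y_ind = rng.normal(size=100)             # independent of x
print(hsic(x, y_dep), hsic(x, y_ind))    # the dependent pair scores higher
```

In HSIC-lasso, one such score per feature enters an l1-penalised objective, so features with weak dependence on the outcome are dropped before the neural network is fitted.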

Accessing 'grid' from 'ggplot2'

Speaker: Paul Murrell

Affiliation: The University of Auckland

When: Thursday, 18 November 2021, 3:00 pm to 4:00 pm


The 'ggplot2' package for R is a very popular package for producing statistical plots (in R). 'ggplot2' provides a high-level interface that makes it easy to produce complex images from small amounts of R code. The 'grid' package for R is an unpopular package for producing arbitrary images (in R). 'grid' provides a low-level interface that requires a lot of work to produce complex images. However, 'grid' provides complete control over the fine details of an image. 'ggplot2' uses the low-level package 'grid' to do its drawing so, in theory, users should be able to get the best of both worlds. This talk will discuss the surprising fact that 'ggplot2' users cannot easily get the best of both worlds and it will introduce the 'gggrid' package, which is here to save the day (and both worlds).

Applications of scoring rules

Speaker: Matthew Parry

Affiliation: University of Otago

When: Thursday, 21 October 2021, 3:00 pm to 4:00 pm


Suppose you publicly express your uncertainty about an unobserved quantity by quoting a distribution for it. A scoring rule is a special kind of loss function intended to measure the quality of your quoted distribution when an outcome is actually observed. In statistical decision theory, you seek to minimise your expected loss. A scoring rule is said to be proper if the expected loss under your quoted distribution is minimised by quoting that distribution. In other words, you cannot game the system!
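Propriety is easy to verify numerically for the Brier (quadratic) score on a binary outcome. The sketch below is our illustration, not the speaker's: with a true probability of 0.7, the expected score over all possible quotes is minimised exactly at 0.7, so an honest quote cannot be beaten.

```python
import numpy as np

def brier(q, outcome):
    """Quadratic score for quoting probability q when outcome in {0, 1} occurs."""
    return (q - outcome) ** 2

def expected_brier(q, p):
    """Expected score when the outcome is truly Bernoulli(p)."""
    return p * brier(q, 1) + (1 - p) * brier(q, 0)

p_true = 0.7
quotes = np.linspace(0, 1, 101)
scores = [expected_brier(q, p_true) for q in quotes]
best = quotes[int(np.argmin(scores))]
print(round(best, 2))  # 0.7: honesty is the optimal strategy
```

Running the same scan with a misreporting-friendly score (say, the linear score |q - outcome|) would place the minimum at 0 or 1 rather than at p, which is precisely what "you cannot game the system" rules out for proper scores.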

In addition to having a rich theoretical structure – for example, associated with every scoring rule is an entropy and a divergence function – scoring rules can be tailored to the problem at hand and consequently have a wide range of applications. They are used in statistical inference, for evaluating and ranking forecasters, for assessing the quality of predictive distributions, and in exams.

I will talk about a range of scoring rules and discuss their application in areas such as classification and time series. In addition to so-called local scoring rules that do not depend on the normalisation of the quoted distribution, I will also discuss recently discovered connections between scoring rules and the Whittle likelihood.

