Department of Statistics
Postgraduate research topics
Postgraduate students at the Department of Statistics have many research areas to choose from.
| Topic | Supervisor |
|---|---|
| Generalised Estimating Equations (GEEs) in the Multivariate Omnibus Test. We will be exploring (by simulation) the performance of GEEs for the analysis of unbalanced multivariate ANOVA designs. | Brian McArdle b.mcardle@auckland.ac.nz |
| Function Discontinuity Detection and Location. Graphics systems do not generally provide primitives for drawing general smooth curves. The most commonly used technique is to approximate the curve by a connected piecewise straight-line "polyline". In R, the curve defined by the function f over the interval [a,b] could be drawn as follows: x = seq(a, b, length = 1000); plot(x, f(x), type = "l"). If the function f is discontinuous this approach fails (e.g. consider the function f(x) = 1/x on the interval [-1,1]). The purpose of this topic is to explore techniques for detecting and locating such discontinuities. The project could be taken further by developing methods for detecting and locating sharp "corners" in a function's graph. This amounts to finding the discontinuities in the derivative f'(x). To take on this project you will need to be a competent R programmer (i.e. have completed or be doing STATS 782) and be familiar with notions like "discontinuity", "derivative", "smooth" etc. | Ross Ihaka r.ihaka@auckland.ac.nz |
| Implementing Computational Methods for Frequency Domain Time Series Analysis. The purpose of this project is to build an R library of functions for carrying out frequency-domain analysis of time series. This includes filtering, spectrum and cross-spectrum analysis. The basis of such … To take on this project you will need to be a competent R programmer (i.e. have completed or be doing STATS 782) and be familiar with the ideas of frequency-domain time series analysis. | Ross Ihaka r.ihaka@auckland.ac.nz |
| A diagnostic for the Gaussian copula. The Gaussian copula is a multivariate model that says each variable can be independently transformed so that the distribution is multivariate Normal. This model is popular in finance and in bioinformatics. The Gaussian copula model implies that transforming each variable to Normal will increase the correlation, and the actual change in correlation provides a possible diagnostic. The project would involve both simulations and data analysis, and would require good computing and graphics skills. | Thomas Lumley t.lumley@auckland.ac.nz |
| Analysis of “deal or no deal”. We will look at variants of the game show “deal or no deal”. The project will involve probability and simulation, and perhaps some statistics. Required background: STATS 320 or 325 and a third-year pure mathematics course. | Mark Holmes m.holmes@auckland.ac.nz |
| Urn models and clinical trials | Mark Holmes m.holmes@auckland.ac.nz |
| Meta-analysis of diverse longitudinal survey data on children. Meta-analysis seeks to combine the results of several studies related to a particular research question in order to obtain better estimates than could be obtained from a single study. Typically a common measure of effect size is used and a weighted average is calculated. The weighting takes account of differences between the various studies, for example in sample size and demographic composition. Using data from several New Zealand longitudinal studies on children, the student will investigate and carry out the meta-analysis of selected health, educational and behavioural outcomes. | Peter Davis pb.davis@auckland.ac.nz Alan Lee aj.lee@auckland.ac.nz |
| Bayesian estimation of the coancestry coefficient. Balding (Likelihood-based inference for genetic correlation coefficients. Theoretical Population Biology, 2003, 63: 221-230) proposed a Bayesian method for estimating the coancestry coefficient. The coancestry coefficient can be regarded as a measure of low-level relatedness between individuals within a population, or alternatively, as a genetic distance between populations. Balding, Ayres and others encapsulated this method in a C programme called BayesFst (http://www.reading.ac.uk/Statistics/genetics/software.html). Part of this project will involve taking Professor Balding’s code and writing an R package to call it and simplify its use. The other part of this project will involve understanding the population genetics behind the programme and applying it to some data. One goal is to use multicore technology to run multiple chains of the MCMC sampler simultaneously. This project is suitable for students with very good programming skills and preferably with knowledge of C or a similar language. | James Curran j.curran@auckland.ac.nz |
| Automated identification and classification of bird calls in long-duration recordings | Louis Ranjard l.ranjard@auckland.ac.nz James Russell j.russell@auckland.ac.nz |
| Population modelling interactions between introduced and threatened species for conservation management | James Russell j.russell@auckland.ac.nz |
| Model selection in complex survey data. Model selection for prediction is fairly well understood in independently-sampled data, with the Akaike Information Criterion (AIC) giving the best predictive model and the Schwarz Information Criterion (BIC) choosing the true model given enough data. Alastair Scott and I have ideas for how to extend these model-selection criteria to complex samples. The project would involve mostly simulations, but with some example analyses using real survey data. Requirements: good computing skills and STATS 330; students should plan to take STATS 740. | Thomas Lumley t.lumley@auckland.ac.nz |
| An R package for the analysis of designed experiments. R has become the software of choice for many statistical applications. Yet, surprisingly, one important area of statistics that lacks well-developed R functions is the analysis of data from designed experiments. While the lm() and aov() functions in R, for example, can be used for this purpose, both have their limitations, particularly if the experimental design has a lot of structure. The goal of this project is to write a suite of R functions that enable users to define, in a straightforward manner, the structure of even the most complex experiments and which generate meaningful results for the purposes of designed experiments. | Katya Ruggiero k.ruggiero@auckland.ac.nz Chris Triggs cm.triggs@auckland.ac.nz James Curran j.curran@auckland.ac.nz |
| Meta-analysis in tests for rare genetic variants. It is now possible to measure the entire DNA sequence of a gene in a large number of individuals, and tests have been developed for assessing the importance of rare genetic variants. In some DNA resequencing studies that involve collaboration between multiple cohorts it is necessary to do the analysis separately within each cohort and combine the results. We need to know how much information is lost by this separate analysis and pooling. The project would mostly involve simulations, but there may be a real data example available towards the end. Needs good computing skills, STATS 310. Knowing what a gene is would be helpful but not completely necessary. | Thomas Lumley t.lumley@auckland.ac.nz |
| Producing HTML Tables with the xtable Package. The R package xtable is capable of producing elaborately structured tables in LaTeX format, but only limited arrangements in HTML format. This project has the aim of allowing the production of far more sophisticated tables in HTML format using xtable. The student will also investigate other R packages. | David Scott d.scott@auckland.ac.nz |
| The Generalized Lambda Distribution. There are a number of packages dealing with the generalized lambda distribution on CRAN (and one which is no longer on CRAN). This project will use existing work to bring some of this disparate code into a coherent form using the same structure as my other packages for distributions. | David Scott d.scott@auckland.ac.nz |
| Fitting financial data using heavy-tailed distributions and copulas. Financial data are typically heavy-tailed and skewed. This project will examine the fitting of multivariate data using heavy-tailed marginals and copulas. | David Scott d.scott@auckland.ac.nz |
| How can length of hospital stay be reliably analysed? Length of hospital stay is measured as the number of days from admission to discharge. As a data variable, it has particular characteristics: it is a count rather than continuous, and it has an atypical distribution skewed towards many short stays, with a long tail containing outliers with very long stays. Using New Zealand hospital admissions data, the student will investigate and carry out analyses of the determinants of length of stay, assessing the strengths and weaknesses of various methods and comparing their results. | Peter Davis Alastair Scott |
| Winter growth in electricity demand. Each year in April / May / June, the electricity industry discovers the amount by which the winter demand for electricity has increased since the previous winter. Much of this increase is not predictable in advance, because it arises from new installations of equipment (e.g. for heating) that is used only in the winter months. It is of interest to estimate the size of the increase as early as possible in the winter, using data available at that time. Complications include weather (e.g. this winter may be starting out colder than last winter did) and public holidays (particularly Easter, the date of which varies from year to year). | Geoffrey Pritchard g.pritchard@auckland.ac.nz |
| Web-based interactive graphics | Paul Murrell p.murrell@auckland.ac.nz |
| Visualizing hypergraphs | Paul Murrell p.murrell@auckland.ac.nz |
| Raster images in statistical graphics | Paul Murrell p.murrell@auckland.ac.nz |
| Processing and analyses of some data sets. Project only available for Semester 1. | Thomas Yee t.yee@auckland.ac.nz |
| Parameter estimation for selected univariate distributions. Project only available for Semester 1. | Thomas Yee t.yee@auckland.ac.nz |
| Comparison of risk scoring versus propensity scoring approaches for control of multiple case-mix factors in hospital performance comparisons. This might involve simulation studies and practical examples of the performance of different approaches to dealing with multiple case-mix factors (e.g. ~30). Prognostic scores could be considered too. | Thomas Lumley Barry Milne |
| Investigation of Bayesian additive regression trees (BART) for estimation of propensity scores. On the face of it, BART should be a useful tool for estimating propensity scores. However, initial experimentation with this approach suggested that it was inferior to logistic regression approaches in achieving covariate balance. Could further experimentation with settings within the BART method (e.g. the number or depth of trees fitted) improve BART's performance, or is there something else going on here? Could it be that sparseness in the data upsets BART, or does the fact that covariate balance rather than prediction, per se, is the aim of propensity score estimation mean that BART is less well suited to propensity score estimation than to pure prediction problems? | Thomas Lumley Peter Davis |
| 2-stage dependent percolation models. Percolation describes a class of mathematical models for porous media. These models have been used to analyze such things as fractures in sea ice. Typical models are defined in terms of a single parameter p, and exhibit a "phase transition" (as p increases from 0 to 1) at some critical value p_c. | Mark Holmes m.holmes@auckland.ac.nz |
| Accessible graphics for time-series data. This project will look at ways in which people graph time series data with a view to identifying ways of displaying the features in one or more time series so that they are easy to understand. The point of this project is to inform scope and design decisions for a time-series module for the iNZight package (http://www.stat.auckland.ac.nz/~wild/iNZight/) and build prototypes. In addition to decisions about what gets displayed and how, thinking is needed about input-data formats that should be catered for. Requirements: STATS 326 and STATS 380. | Chris Wild c.wild@auckland.ac.nz |
| Accessible graphics for data on maps. This project will look at ways in which people display data about what is happening at different geographical locations. The point of this project is to inform scope and design decisions for a data-displayed-on-maps module for the iNZight package (http://www.stat.auckland.ac.nz/~wild/iNZight/) and build some prototypes. In addition to decisions about what gets displayed and how, thinking is needed about input-data formats that should be catered for. | Chris Wild c.wild@auckland.ac.nz |
| Accessible graphics for multiple-binary-response data. Many simple-questionnaire items produce sets of binary responses on similar topics (e.g. in Census At School, “Do you have: a smart phone? TV in bedroom? …”). This project will look at ways of analysing and displaying the features of such data so that they are easy to understand. The point of this project is to inform scope and design decisions for a multiple-binary-response module for the iNZight package (http://www.stat.auckland.ac.nz/~wild/iNZight/) and build some prototypes. In addition to decisions about what gets displayed and how, thinking is needed about input-data formats that should be catered for. Requirements: STATS 310 and STATS 380. | Chris Wild c.wild@auckland.ac.nz |
| Statistical versus paleontological evidence for estimating the timing of species divergence. In 1967, New-Zealand-born Allan Wilson was the first to use molecular sequences to date the divergence between humans and apes. Since then, the statistical methods for estimating these dates have improved considerably. The amount of genetic data available has also increased dramatically. Despite this, timing the origins of animals is still subject to debate. In particular, dates estimated from molecular data are generally twice as old as those derived from paleontology, suggesting that animals first occurred before the Cambrian explosion. This project will aim at testing new Bayesian techniques for estimating divergence dates from molecular data. In particular, parametric bootstrap approaches will be used to better understand the discrepancy between methods for inferring the timing of speciation in a phylogenetic framework. | Stephane Guindon s.guindon@auckland.ac.nz |
| Non-Sampling Error in the 2011 New Zealand Election Survey. The New Zealand Election Survey was conducted over the summer period, December 2011 to February 2012. It involved sending questionnaires to several thousand individuals identified from the electoral roll. Two features are worthy of further analysis from the point of view of possible non-sampling error and bias: the low response rate (43%) and the fact that a small sub-sample chose to submit electronically rather than using a paper form. The project is designed to assess the degree to which non-responders to the survey might have been systematically different from those completing a questionnaire, whether those submitting electronically were also a select group, and whether their response pattern was different (e.g. item non-response). | Alastair Scott Peter Davis |
| Improving models of protein evolution. The accuracy with which phylogenetic relationships between species are estimated largely depends on models describing how substitutions accumulate among protein sequences over the course of evolution. | Stephane Guindon s.guindon@auckland.ac.nz |
| Topic | Supervisor |
|---|---|
| Modelling cricket batting scores | Chris Triggs |
| Species detection modelling for conservation | James Russell j.russell@auckland.ac.nz |
| Student attitudes towards introductory statistics courses | Stephanie Budgett/ Maxine Pfannkuch |
| Statistics education research and its implications for the teaching and learning of statistics | Stephanie Budgett/ Maxine Pfannkuch |
| Parameter estimation for selected univariate distributions | Thomas Yee t.yee@auckland.ac.nz |
| Topics in statistical computing | Thomas Yee t.yee@auckland.ac.nz |
| Using principal components for the evaluation of likelihood ratios for forensic trace evidence | James Curran j.curran@auckland.ac.nz |
| Assessing the ecosystem effects of fishing on shallow reefs using information derived from satellite and aerial imagery | Nick Shears n.shears@auckland.ac.nz |
| Bayesian unit root tests in stochastic volatility models | Renate Meyer |
| Using Copulas to model dependence for risk assessment | Renate Meyer renate.meyer@auckland.ac.nz |
| Modelling the size-selectivity of fishing gear | Russell Millar r.millar@auckland.ac.nz |
| Standardized weights for survey regression | Thomas Lumley |
| Subsample design and survey calibration for subsampling in genetic association studies | Thomas Lumley t.lumley@auckland.ac.nz |
| Software for teaching (and doing) regression | Thomas Lumley t.lumley@auckland.ac.nz |
| Analysis of supersaturated designs | Arden Miller a.miller@auckland.ac.nz |
| Modelling the order book in a stock market | Geoff Pritchard g.pritchard@auckland.ac.nz |
| Tools for reproducible research using R | David Scott d.scott@auckland.ac.nz |
| The multivariate generalised hyperbolic distribution | David Scott d.scott@auckland.ac.nz |
| Testing methods for automated detection of footprints from conservation species | James Russell j.russell@auckland.ac.nz |
| R-package for fertilization models | Russell Millar r.millar@auckland.ac.nz |
| R-package for nonlinear multivariate canonical models | Russell Millar r.millar@auckland.ac.nz |
| Can a mother's diet during pregnancy affect the long-term health of her daughters and granddaughters? | Chris Triggs cm.triggs@auckland.ac.nz |
| Finding gene sets | Chris Triggs cm.triggs@auckland.ac.nz |
| Signature for a disease | Chris Triggs cm.triggs@auckland.ac.nz |
| What foods are safe to eat? | Chris Triggs cm.triggs@auckland.ac.nz |
| What information do different distance measures and transformations emphasise? | Brian McArdle b.mcardle@auckland.ac.nz |
| Analysis of hospital performance indicators | Alastair Scott/ Peter Davis a.scott@auckland.ac.nz pb.davis@auckland.ac.nz |
| Capacity allocation and optimization at Auckland City Hospital | Ilze Ziedins i.ziedins@auckland.ac.nz |
| Time-varying Markov decision processes | Ilze Ziedins i.ziedins@auckland.ac.nz |
| Estimating equations for regression | Chris Wild c.wild@auckland.ac.nz |
| Visualising software for statistics education | Chris Wild c.wild@auckland.ac.nz |
| Analysis of a case-cohort data set using SAS | Patricia Metcalf p.metcalf@auckland.ac.nz Thomas Lumley t.lumley@auckland.ac.nz |
| Bayesian medical statistics | Peter Davis pb.davis@auckland.ac.nz Alastair Scott a.scott@auckland.ac.nz Thomas Lumley t.lumley@auckland.ac.nz Wayne Stewart w.stewart@auckland.ac.nz |
| Modeling the relationships between meteorological variables and sediment loads in the coastal marine environment. | Nick Shears n.shears@auckland.ac.nz |
| Is our climate changing and how does this affect rocky reef communities? Analysis of long-term meteorological and biological monitoring data from the Leigh Marine Laboratory | Nick Shears n.shears@auckland.ac.nz |
| An improved R mosaic plot | Ross Ihaka r.ihaka@auckland.ac.nz |
| Statistical methods to model rugby union scores and results | Alan Lee aj.lee@auckland.ac.nz |
| Modelling hydroelectricity in New Zealand | Geoff Pritchard g.pritchard@auckland.ac.nz |
| Tracking changes in primary medical care | Alastair Scott a.scott@auckland.ac.nz |
| Estimating advertising decay through Adstock modelling | Andrew Balemi a.balemi@auckland.ac.nz |
| The use of Bayes count to assess the need for zero inflation in ecological data | Brian McArdle b.mcardle@auckland.ac.nz |
| Simulation based confidence intervals for parameters of spatial point process models. For non-Poisson point process models, the distribution of parameter estimates is analytically intractable, so inference must be conducted via simulation methods. A fairly "obvious" method of constructing a … Depending on time constraints and the student's inclinations we might also look for ways of improving the "obvious" method. For instance we might investigate whether the Efron and Tibshirani "accelerated bias-corrected method" of forming bootstrap confidence intervals can be applied in this context. The only prerequisites for this project are a basic understanding of confidence intervals and a good facility for programming in the R language. | Rolf Turner r.turner@auckland.ac.nz |
For newly available PhD topics, please contact our PhD enrolment officers. Supervisors are willing to consider different topics to suit PhD applicants:
Rachel Fewster
Email: r.fewster@auckland.ac.nz
Renate Meyer
Email: renate.meyer@auckland.ac.nz
Geoffrey Pritchard
Email: g.pritchard@auckland.ac.nz
Thomas Yee
Email: t.yee@auckland.ac.nz
Below are some examples of PhD research topics that have been or are still being offered.
| Topic | Supervisor |
|---|---|
| Exponentially weighted moving average control charts | Arden Miller a.miller@auckland.ac.nz |
| Modeling and optimal control of queueing networks | Ilze Ziedins i.ziedins@auckland.ac.nz |
| Monte Carlo methods for LISA data analysis | Renate Meyer renate.meyer@auckland.ac.nz |
| Making the link between investigative questions and conclusions in the statistical enquiry cycle | Maxine Pfannkuch m.pfannkuch@auckland.ac.nz |
| Multivariate predictors of diabetes and CVD | Patricia Metcalf p.metcalf@auckland.ac.nz |
| Mixture density estimation under shape restrictions | Yong Wang yong.wang@auckland.ac.nz |
| Efficient analysis with biased samples | Chris Wild c.wild@auckland.ac.nz |
| Quantitative models for ecological data | Russell Millar r.millar@auckland.ac.nz |
| Some problems concerning the generalized hyperbolic distribution | David Scott d.scott@auckland.ac.nz |
| New methods for estimating effective population size | Rachel Fewster r.fewster@auckland.ac.nz |
| A statistical computation environment built on common lisp | Ross Ihaka r.ihaka@auckland.ac.nz |
| Wind power in the New Zealand electricity market | Geoffrey Pritchard g.pritchard@auckland.ac.nz |
| Statistics applied to statistical biology, multivariate modeling and inference | Thomas Yee t.yee@auckland.ac.nz |
| Analysis and design of outcome-dependent sampling studies using semiparametric maximum likelihood | Alan Lee aj.lee@auckland.ac.nz |
| Assessing biological and statistical violations in the analysis of conservation datasets | James Russell j.russell@auckland.ac.nz |
Below are some examples of our Statistics PhD student posters that were presented.



