## Department of Statistics

# Seminars

**Dirichlet and Poisson-Dirichlet approximations**

Speaker: Han Liang Gan

Affiliation: Northwestern University

When: Thursday, 26 April 2018, 1:00 pm to 2:00 pm

Where: 303-257

The Dirichlet and Poisson-Dirichlet distributions are multi-dimensional distributions that can be used to model proportions. In this talk, we will give explicit error bounds when applying Dirichlet and Poisson-Dirichlet approximations in a variety of applications that include urn models and stationary distributions of genetic drift models. The results are derived using new developments in Stein's method.

This is joint work with Adrian Rollin (National University of Singapore) and Nathan Ross (University of Melbourne).

**Bayesian nonparametric analysis of multivariate time series**

Speaker: Alexander Meier

Affiliation: Otto-von-Guericke University Magdeburg

When: Wednesday, 9 May 2018, 3:00 pm to 4:00 pm

Where: 303-310

While there is an increasing amount of literature about Bayesian time series analysis, only a few nonparametric approaches to multivariate time series exist. Many methods rely on Whittle's likelihood, involving the second-order structure of a stationary time series by means of its spectral density matrix f. The latter is often modeled in terms of the Cholesky decomposition to ensure positive definiteness. However, asymptotic properties under these priors, such as posterior consistency or posterior contraction rates, are not known.

A different idea is to model f by means of random measures. This is in line with (1), who model the normalized spectral density of a univariate time series with a Dirichlet process mixture of beta densities. We use a similar approach, with matrix-valued mixture weights induced by a completely random matrix-valued measure (2,3). We use a class of infinitely divisible matrix Gamma distributions (4) for this purpose. While the procedure performs well in practice, we also establish posterior consistency and derive posterior contraction rates.

(1) N. Choudhuri, S. Ghosal and A. Roy (2004). Bayesian estimation of the spectral density of a time series. Journal of the American Statistical Association 99(468), 1050–1059

(2) A. Lijoi and I. Pruenster (2010). Models beyond the Dirichlet process. Bayesian nonparametrics, 28:80

(3) J. B. Robertson, M. Rosenberg, et al. (1968). The decomposition of matrix-valued measures. The Michigan Mathematical Journal, 15(3), 353-368

(4) V. Perez-Abreu and R. Stelzer (2014). Infinitely divisible multivariate and matrix Gamma distributions. Journal of Multivariate Analysis, 130, 155–175

Authors:

Alexander Meier, Otto-von-Guericke University Magdeburg

Claudia Kirch, Otto-von-Guericke University Magdeburg

Renate Meyer, The University of Auckland

**An asymmetric measure of population differentiation based on the saddlepoint approximation method**

Speaker: Louise McMillan

Affiliation: The University of Auckland

When: Wednesday, 16 May 2018, 11:00 am to 12:00 pm

Where: 303-310

In the field of population genetics there are many measures of genetic diversity and population differentiation. The best known is Wright's Fst, later expanded by Cockerham and Weir, which is very widely used as a measure of separation between populations. More recently a multitude of other measures have been developed, from Gst to D, all with different features and disadvantages. One thing these measures all have in common is that they are symmetric, which is to say that the Fst between population A and population B is the same as that between population B and population A. Following my work on GenePlot, a visualization tool for genetic assignment, I am now working on the development of an asymmetric measure, where the fit of A into B may not be the same as the fit of B into A. This measure will enable the detection of scenarios such as "subsetting", the relationship between a large, diverse population A and a smaller population B that has experienced genetic drift since being separated from A. The measure has several features that distinguish it from existing measures. It is constructed using the same saddlepoint approximation method that underlies GenePlot, which is used to approximate the multi-locus genetic distributions of populations.

**Weakly informative prior for mixture models**

Speaker: Kate Lee

Affiliation: Auckland University of Technology

When: Thursday, 19 April 2018, 1:00 pm to 2:00 pm

Where: 303-310

A mixture model is a probability model for representing the presence of subpopulations within an overall population. It comprises a finite or infinite number of components, possibly of different distributions, that can describe different features of data. Mixture models thus facilitate much more careful description of complex systems, and they have been adopted in diverse areas. While mixture models have been studied for more than a century, the construction of a reference Bayesian analysis of those models remains unsolved due to the ill-posed nature of such statistical objects. In this talk, a new parameterisation centred on the mean and possibly the variance of the mixture distribution is suggested and, based on this reparameterisation, a weakly informative prior for a wide class of mixtures is proposed. I will demonstrate that, under some generous conditions, the resulting posterior distributions are proper, and I will illustrate MCMC implementations.

**Statistics of forensic lineage DNA markers**

Speaker: Mikkel Meyer Andersen

Affiliation: Associate professor, Department of Mathematical Sciences, Aalborg University, Denmark

When: Wednesday, 18 April 2018, 11:00 am to 12:00 pm

Where: 303-310

Genetic information from biological material is often used in forensic casework such as in criminal cases. The biological material collected from the crime scene (assumed to originate from the culprit) is used to obtain a so-called DNA profile.

When a suspect is detained, a reference DNA profile can be taken from the suspect and compared to the crime scene profile. If the profiles do not match, the suspect can be released. If the profiles match, the evidential value of this match must be assessed because a DNA profile is only a part of the entire genome. There are many different kinds of DNA profiles and calculation of an evidential value depends on the type of DNA profile. The most common kind of DNA profile is the autosomal DNA profile (based on the non-sex chromosomes), and there is a wide consensus on how to calculate the evidential value of matching autosomal DNA profiles (I will not spend much time on these).

Another type of DNA profile uses lineage DNA markers, so called because they are based on the paternal lineage, via the paternally inherited Y-chromosome, and the maternal lineage, via the maternally inherited mitochondrial DNA. Y-chromosome profiles are valuable when there is a mixture of male-source and female-source DNA, and interest centres on the identity of the male source of the DNA. This happens, for example, in sexual assault cases. Mitochondrial profiles are used, for example, when the biological material obtained from the crime scene is heavily degraded (e.g. by weather or time). Traditional DNA profiles are based on the nuclear DNA, and if the nuclei of the cells are too damaged then such profiles cannot be obtained. Instead, DNA profiles based on the mitochondria are made because mitochondria are more robust than the nuclei of the cells and are often present even in heavily degraded samples.

DNA profiles based on lineage markers pose a challenging statistical problem because, unlike the markers used in autosomal DNA profiles, the markers are not statistically independent. Thus, the joint probability distribution must be modelled instead of just the marginal distributions.

In this talk, I will discuss methods for calculating the evidential value of lineage DNA markers. This includes both a statistical model based on a finite mixture of generalised linear models (GLMs) and a simulation model. I will discuss computational aspects of both these models.

**Visual trumpery: How charts lie**

Speaker: Alberto Cairo

Affiliation: University of Miami

When: Wednesday, 21 March 2018, 6:30 pm to 7:30 pm

Where: 6.30pm, Large Chemistry Lecture Theatre, Ground Floor, Building 301, 23 Symonds Street, City Campus, Auckland Central.

In our final 2018 Ihaka lecture, Alberto Cairo (Knight Chair in Visual Journalism at the University of Miami) will deliver the following:

Visual trumpery: How charts lie -- and how they make us smarter

With facts and truth increasingly under assault, many interest groups have enlisted charts -- graphs, maps, diagrams, etc. -- to support all manner of spin. Because digital images are inherently shareable and can quickly amplify messages, sifting through the visual information and misinformation is an important skill for any citizen.

The use of graphs, charts, maps and infographics to explore data and communicate science to the public has become more and more popular. However, this rise in popularity has not been accompanied by an increasing awareness of the rules that should guide the design of these visualisations.

This talk teaches ordinary citizens principles for becoming more critical and better-informed readers of charts.

Lecture commences at 6.30pm, Large Chemistry Lecture Theatre, Ground Floor, Building 301, 23 Symonds Street, City Campus, Auckland Central.

Please join us for refreshments from 6pm in the foyer area outside the lecture theatre.

Biography

Alberto Cairo is the Knight Chair in Visual Journalism at the University of Miami. He's also the director of the visualisation programme at UM's Center for Computational Science. Cairo has been a director of infographics and multimedia at news publications in Spain (El Mundo, 2000-2005) and Brazil (Editora Globo, 2010-2012), and a professor at the University of North Carolina-Chapel Hill. Besides teaching at UM, he works as a freelancer and consultant for companies such as Google and Microsoft. He's the author of the books The Functional Art: An Introduction to Information Graphics and Visualization (2012) and The Truthful Art: Data, Charts, and Maps for Communication (2016).

Ihaka series link: https://www.stat.auckland.ac.nz/ihaka-lectures

Map: https://goo.gl/maps/fNuHvmNWPru


**On the distribution of the ratio for the components of a bivariate normal random vector**

Speaker: Francois Perron

Affiliation: University of Montreal

When: Wednesday, 21 March 2018, 11:00 am to 12:00 pm

Where: 303-310

Let X be a bivariate normal random vector and R=X_1/X_2. We show that the distribution of R can be represented as a Poisson mixture of some new distributions extending the family of Student distributions. We give some of the properties of this new family. We also show that the distribution of R can be approximated by a normal distribution, and we quantify how good these approximations are.

**Making colour accessible**

Speaker: Paul Murrell

Affiliation: The University of Auckland

When: Wednesday, 14 March 2018, 6:30 pm to 7:30 pm

Where: 6.30pm, Large Chemistry Lecture Theatre (LgeChem/301-G050), Ground Floor, UoA Building 301 at 23 Symonds Street, City Campus, Auckland Central.

In the second of the 2018 Ihaka lecture series, Associate Professor Paul Murrell (The University of Auckland) will deliver the following lecture:

Making colour accessible

The 'BrailleR' package for R generates text descriptions of R plots.

When combined with screen reader software, this provides information for blind and visually-impaired R users about the contents of an R plot. A minor difficulty that arises in the generation of these text descriptions involves the information about colours within a plot. As far as R is concerned, colours are described as six-digit hexadecimal strings, e.g. "#123456", but that is not very helpful for a human audience. It would be more useful to report colour names like "red" or "blue".

This talk will make a mountain out of that molehill and embark on a daring Statistical Graphics journey featuring colour spaces, high-performance computing, Te Reo, and XKCD. The only disappointment will be the ending.
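As a rough illustration of the colour-naming problem described in this abstract, the sketch below maps a six-digit hexadecimal string to the nearest entry in a small palette. It is only a minimal sketch under stated assumptions: the palette `NAMED_COLOURS` and the function names are hypothetical, plain squared distance in RGB is used purely for simplicity, and none of this is taken from the BrailleR package or from the approach presented in the lecture.

```python
# Hypothetical palette for illustration only; real colour-name lists are far larger.
NAMED_COLOURS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "cyan": (0, 255, 255),
    "magenta": (255, 0, 255),
    "grey": (128, 128, 128),
}


def hex_to_rgb(hex_colour):
    """Convert a string such as "#123456" to an (r, g, b) tuple of ints."""
    digits = hex_colour.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a six-digit hexadecimal colour")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def nearest_colour_name(hex_colour):
    """Return the palette name with the smallest squared RGB distance."""
    rgb = hex_to_rgb(hex_colour)

    def squared_distance(name):
        return sum((c - t) ** 2 for c, t in zip(rgb, NAMED_COLOURS[name]))

    return min(NAMED_COLOURS, key=squared_distance)


print(nearest_colour_name("#123456"))
```

Distances in RGB do not match perceived colour differences well, which is one reason a perceptual colour space may be a better choice in practice.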

Lecture commences at 6.30pm, Large Chemistry Lecture Theatre, Ground Floor, Building 301, 23 Symonds Street, City Campus, Auckland Central.

Please join us for refreshments from 6pm in the foyer area outside the lecture theatre.

Biography

Paul Murrell is an Associate Professor in the Department of Statistics at The University of Auckland. He is a member of the core development team for R, with primary responsibility for the graphics system.

https://www.stat.auckland.ac.nz/ihaka-lectures

**Myth busting and apophenia in data visualisation: is what you see really there?**

Speaker: Dianne Cook

Affiliation: Monash University

When: Wednesday, 7 March 2018, 6:30 pm to 7:30 pm

Where: 6.30pm, Large Chemistry Lecture Theatre (LgeChem/301-G050), Ground Floor, UoA Building 301 at 23 Symonds Street, City Campus, Auckland Central.


Launching our 2018 Ihaka Lecture Series, Professor Dianne Cook (Monash University) will deliver the following lecture:

Myth busting and apophenia in data visualisation: Is what you see really there?

Ihaka series link: https://www.stat.auckland.ac.nz/ihaka-lectures

Map: https://goo.gl/maps/fNuHvmNWPru

In data science, plots of data become important tools for observing patterns, discovering relationships, busting myths, making decisions, and communicating findings. But plots of data can be viewed differently by different observers, and it is easy to imagine patterns that may not exist.

This talk will describe some simple tools for helping to decide if patterns are really there, in the larger context of the problem. We will talk about two protocols, the Rorschach, which can help insulate the mind from spurious structure, and the lineup, which places the data plot in the context of nothing happening. There will be an opportunity for the audience to try out these protocols in examining data from current affairs.

Lecture commences at 6.30pm, Large Chemistry Lecture Theatre, Ground Floor, Building 301, 23 Symonds Street, City Campus, Auckland Central.

Please join us for refreshments from 6pm in the foyer area outside the lecture theatre.

Biography

Dianne Cook is a Fellow of the American Statistical Association, an elected Ordinary Member of the R Foundation, and Editor of the Journal of Computational and Graphical Statistics. Her research is in statistical graphics and exploratory data analysis. She has contributed to the development of several visualisation systems, including XGobi, GGobi and numerous R packages, and has explored the use of virtual environments, eye trackers, and crowd-sourcing for the purposes of visualising data.


**Queueing Models for Healthcare Capacity Planning**

Speaker: Peter T. Vanberkel

Affiliation: Department of Industrial Engineering, Dalhousie University

When: Wednesday, 7 March 2018, 3:00 pm to 4:00 pm

Where: 303-310

In this seminar I will present two studies of capacity planning problems which we investigate using queueing theory. To foster collaboration, I will emphasize and discuss extensions and next steps.

In the first study, we develop queueing network models to determine the appropriate number of patients to be managed by a single oncologist. This is often referred to as a physician’s panel size. The key features that distinguish our study of oncology practices from other panel size models are high patient turnover rates, multiple patient and appointment types, and follow-up care. The paper develops stationary and non-stationary queueing network models corresponding to stabilized and developing practices, respectively. These models are used to determine new patient arrival rates that ensure practices operate within certain performance thresholds. Extensions to this work are needed to account for collaborative practices where patients with co-morbidities are followed by multiple care providers.

In the second study, we investigate a novel Emergency Department (ED) replacement found in rural communities in Nova Scotia, Canada. Staffed by a paramedic and a registered nurse, and overseen by a physician via telephone, Collaborative Emergency Centres (CECs) have replaced traditional physician-led EDs overnight. To determine if CECs are suitable in larger communities, we model the flow of patients and analyze the resulting performance with Lindley’s recursion. The analysis, done with simulation, shows that a CEC’s success depends on the relationship between the demand for primary care appointments and the supply of primary care appointments. Furthermore, we show that larger communities can successfully use CECs but that there are diminishing returns. I’m interested in extending this work such that the analysis of Lindley’s recursion can be completed without simulation.
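For readers unfamiliar with Lindley’s recursion, the sketch below shows its textbook form for a single-server first-come-first-served queue: the waiting time of customer n+1 is max(0, W_n + S_n - A_{n+1}), where S_n is the service time of customer n and A_{n+1} is the time between the arrivals of customers n and n+1. This is only a generic illustration evaluated by simulation; the function names, the exponential arrival and service assumptions, and the rates are hypothetical and are not the CEC model from the talk.

```python
import random


def lindley_waits(interarrival_times, service_times):
    """Waiting times in queue via Lindley's recursion.

    interarrival_times[n] is the gap between the arrivals of customers n and n+1;
    service_times[n] is the service time of customer n. The first customer
    arrives to an empty system, so its wait is zero.
    """
    waits = [0.0]
    for gap, service in zip(interarrival_times, service_times):
        # The next customer waits for whatever work is left when it arrives.
        waits.append(max(0.0, waits[-1] + service - gap))
    return waits


def simulate_mean_wait(arrival_rate, service_rate, n_customers, seed=0):
    """Estimate the mean wait assuming exponential gaps and service times."""
    rng = random.Random(seed)
    gaps = [rng.expovariate(arrival_rate) for _ in range(n_customers)]
    services = [rng.expovariate(service_rate) for _ in range(n_customers)]
    waits = lindley_waits(gaps, services)
    return sum(waits) / len(waits)


print(lindley_waits([1.0, 1.0, 5.0], [2.0, 2.0, 1.0]))
print(simulate_mean_wait(arrival_rate=0.5, service_rate=1.0, n_customers=200_000))
```

Under these assumed rates the queue is an M/M/1 system, for which the long-run mean wait in queue is 1, so the simulated estimate can be checked against a known value.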

**A causal assessment of the validity of surrogate endpoints in randomised intervention studies**

Speaker: Jeremy Taylor

Affiliation: Department of Biostatistics, University of Michigan

When: Wednesday, 28 February 2018, 11:00 am to 12:00 pm

Where: 303-310

In randomized clinical trials, a surrogate outcome variable (S) can be measured before the true outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Most previous methods for surrogate validation rely on models for the conditional distribution of T given Z and S. However, S is a post-randomization variable, and unobserved, simultaneous predictors of S and T may exist. When such confounders exist, these methods do not have a causal interpretation. Using the potential outcomes framework introduced by Frangakis and Rubin (2002), we propose a Bayesian estimation strategy for surrogate validation when the joint distribution of potential surrogate and outcome measures is multivariate normal, and extend it using copulas to the case of an ordered categorical surrogate and a censored event time true endpoint. We model the joint conditional distribution of the potential outcomes of T, given the potential outcomes of S, and propose surrogacy validation measures from this model. By conditioning on principal strata of S, the resulting estimates are causal. As the model is not fully identifiable from the data, we propose some reasonable prior distributions and assumptions that can be placed on weakly identified parameters to aid in estimation. The method is applied to data from advanced colorectal cancer clinical trials.

**On non-equilibria threshold strategies in ticket queues**

Speaker: Dr. Yoav Kerner

Affiliation: Industrial Engineering and Management, Ben Gurion University of the Negev

When: Wednesday, 31 January 2018, 3:00 pm to 4:00 pm

Where: 303-310

In many real-life queueing systems, customers balk from the queue, but the environment becomes aware of this only when they are due to be served. Naturally, the balking is an outcome of the queue length, and the decision is based on a threshold. Yet the inspected queue length contains customers who have balked. In this work, we consider a Markovian queue with infinite capacity and customers who are homogeneous with respect to their cost-reward functions. We show that no threshold strategy is a symmetric Nash equilibrium strategy. Furthermore, we show that for any threshold strategy adopted by all, the individual's best response is a double threshold strategy. That is, join if and only if one of the following holds: (i) the inspected queue length is smaller than one threshold, or (ii) the inspected queue length is larger than a second threshold. We discuss the validity of the result when the response time for an absence of customers is positive. We also show that in the case of a finite capacity queue a threshold strategy can be an equilibrium, but this result depends on the model's parameters (and the capacity).

**Fitting Markov chains to sampled aggregate data: modelling tree fern gametophyte growth under different conditions**

Speaker: Louise McMillan

Affiliation: The University of Auckland

When: Wednesday, 31 January 2018, 11:00 am to 12:00 pm

Where: 303-310

The alternate life-stage (gametophyte) of plants has received relatively little academic attention, and so the environmental effects on growth are not well understood. This talk focuses on a recent experiment by James Brock of the School of Biological Sciences in which he grew tree fern gametophytes in vitro, in different levels of light and phosphorus, and monitored their growth stages. I then worked on fitting Markov chains to the growth stage data, a task made more difficult by the fact that the data were incomplete aggregate data rather than records following particular spores through each stage of their growth. This talk will cover the fitting methods used, and the results of the analysis.