**2018 Seminars**

## Department of Statistics

# 2018 Seminars


**A hidden Markov open population model for spatial capture-recapture surveys**

Speaker: Prof David Borchers

Affiliation: University of St Andrews

When: Wednesday, 12 December 2018, 11:00 am to 12:00 pm

Where: 303-310

Open population capture-recapture models are widely used to estimate population demographics and abundance over time. The only published open population methods for spatial capture-recapture surveys use Markov chain Monte Carlo methods for inference and have relatively high computational cost. We formulate open population spatial capture-recapture surveys as a hidden Markov model, allowing inference by maximum likelihood for both Cormack-Jolly-Seber and Jolly-Seber models, with and without movement of animal activity centres. The method is applied to a twelve-year survey of male jaguars (Panthera onca) in the Cockscomb Basin Wildlife Sanctuary, Belize, to estimate survival probability and population abundance over time. A simulation study shows that, for this application, inference is biased when activity centres are assumed to be fixed over time, whereas including a model for activity centre movement gives negligible bias and nominal confidence interval coverage. The hidden Markov model approach is compared with a Bayesian method and with a series of closed population models applied to the same data. It is much more efficient than the Bayesian approach and gives a lower root-mean-square error in predicting population density than the series of closed population models.
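The key computational step in a hidden Markov formulation is that the likelihood of an observed capture history can be evaluated by the forward algorithm, which sums over latent state sequences recursively instead of sampling them by MCMC. A generic, minimal sketch follows. It assumes a two-state (alive, dead) chain with a hypothetical survival probability `phi` and detection probability `p`, and it omits the spatial and activity-centre components of the model described in the talk.

```python
import numpy as np

def hmm_log_likelihood(delta, Gamma, emission_probs):
    """Scaled forward algorithm for a discrete-state hidden Markov model.

    delta          : initial state distribution, shape (m,)
    Gamma          : transition matrix, shape (m, m), rows sum to 1
    emission_probs : emission_probs[t, j] = P(observation at occasion t | state j)
    Returns the log-likelihood of the whole observation sequence.
    """
    alpha = delta * emission_probs[0]
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale  # rescale at each step to avoid numerical underflow
    for t in range(1, len(emission_probs)):
        alpha = (alpha @ Gamma) * emission_probs[t]
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

# Hypothetical two-state (alive, dead) chain for an animal first caught at
# occasion 1 with capture history (1, 0, 1).
phi, p = 0.8, 0.5
Gamma = np.array([[phi, 1 - phi],
                  [0.0, 1.0]])
delta = np.array([1.0, 0.0])             # alive at first capture (conditioned on)
emission_probs = np.array([[1.0, 0.0],   # occasion 1: first capture, conditioned on
                           [1 - p, 1.0], # occasion 2: not detected
                           [p, 0.0]])    # occasion 3: detected, so must be alive
lik = np.exp(hmm_log_likelihood(delta, Gamma, emission_probs))
# lik is approximately 0.16 = (phi * (1 - p)) * (phi * p)
```

The scaling step is what keeps the recursion stable over long histories; in a spatial model the state space would also index activity-centre locations, which is where the computational cost moves.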

**Visualization and directional measures of population differentiation based on the saddlepoint approximation method**

Speaker: Louise McMillan

Affiliation: The University of Auckland

When: Wednesday, 5 December 2018, 3:00 pm to 4:00 pm

Where: 303-310

We propose a method for visualizing genetic assignment data by characterizing the distribution of genetic profiles for each candidate source population (McMillan & Fewster, 2017). This method enhances the assignment method of Rannala & Mountain (1997) by calculating appropriate graph positions for individuals for which some genetic data are missing. An individual with missing data is positioned in the distributions of genetic profiles for a population according to its estimated quantile based on its available data. The quantiles of the genetic profile distribution for each population are calculated by approximating the cumulative distribution function (CDF) using the saddlepoint method, and then inverting the CDF to get the quantile function. The saddlepoint method also provides a way to visualize assignment results calculated using the leave-one-out procedure. We call the resulting plots GenePlots.
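The saddlepoint step (approximating a CDF from a cumulant generating function, then inverting it numerically to obtain quantiles) can be illustrated on a toy distribution whose exact CDF is known. This sketch uses the Lugannani-Rice formula for a sum of n independent Exp(1) variables, i.e. a Gamma(n, 1) variable. The choice of distribution and all function names are assumptions for illustration; they are not the genetic-profile distributions used in GenePlots.

```python
import math

def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def saddlepoint_cdf_gamma(x, n):
    """Lugannani-Rice approximation to P(S <= x), S = sum of n iid Exp(1) variables.

    CGF: K(t) = -n log(1 - t) for t < 1, K'(t) = n / (1 - t), K''(t) = n / (1 - t)**2.
    """
    t = 1.0 - n / x                        # saddlepoint: solves K'(t) = x
    if abs(t) < 1e-8:                      # formula is 0/0 at the mean; use its limit
        return 0.5 + 1.0 / (3.0 * math.sqrt(2 * math.pi * n))
    K = -n * math.log(1.0 - t)
    w = math.copysign(math.sqrt(2.0 * (t * x - K)), t)
    u = t * math.sqrt(n / (1.0 - t) ** 2)
    return _norm_cdf(w) + _norm_pdf(w) * (1.0 / w - 1.0 / u)

def saddlepoint_quantile_gamma(prob, n, tol=1e-10):
    """Approximate quantile: invert the saddlepoint CDF by bisection."""
    lo, hi = 1e-9, n + 50.0 * math.sqrt(n)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if saddlepoint_cdf_gamma(mid, n) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q90 = saddlepoint_quantile_gamma(0.9, 5)   # approximate 90% quantile of Gamma(5, 1)
```

Bisection is used for the inversion because it only needs the approximate CDF to be monotone, not differentiable.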

This new method improves on assignment software such as GeneClass2, which provides no visualization method, and is biologically more interpretable than the bar charts produced by the widely used genetic software STRUCTURE.

We show results from simulated data and apply the methods to microsatellite genotype data from ship rats (Rattus rattus) captured on the Great Barrier Island archipelago, New Zealand. The visualization method makes it straightforward to detect features of population structure and to judge the discriminative power of the genetic data for assigning individuals to source populations.

We then extend these techniques by proposing methods for quantifying population genetic structure, together with associated significance tests. The proposed measures are closely related to GenePlots and allow visual features that are obvious from the plots to be expressed more formally. One measure is the interloper detection probability: given two random genotypes, one arising from population A and one from population B, it is the probability that the genotype from A fits A better, so that the genotype from B would be correctly identified as the 'interloper' in A. Another measure is the correct assignment probability: the probability that a random genotype arising from A would be correctly assigned to A rather than B.

Using permutation tests, we can test two populations for significant population structure. These permutation tests are sensitive to subtle population structure, and are particularly useful for eliciting asymmetric features of the populations being studied, e.g. where one population has undergone extensive genetic drift but the other population has remained large enough to retain greater genetic diversity.
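Permutation tests like these follow a generic recipe: compute a statistic on the two observed samples, recompute it after repeatedly shuffling the pooled observations between the groups, and report the proportion of shuffles at least as extreme as the observed value. The sketch below assumes a hypothetical per-individual score and uses a one-sided difference in means as the statistic; the talk's tests are built on the interloper detection and correct assignment measures, which are not implemented here.

```python
import random

def mean_difference(a, b):
    """Hypothetical test statistic: mean of group a minus mean of group b."""
    return sum(a) / len(a) - sum(b) / len(b)

def permutation_test(x, y, statistic, n_perm=9999, seed=1):
    """One-sided two-sample permutation test.

    Returns (observed statistic, p-value), where the p-value counts the observed
    labelling as one of the permutations so that it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # random relabelling of individuals
        if statistic(pooled[:n_x], pooled[n_x:]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

obs, p_value = permutation_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], mean_difference)
```

Because the statistic is one-sided, it can pick up asymmetries between the two populations that a symmetric distance measure would average away.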

McMillan, L. F. and Fewster, R. M. (2017). Visualizations for genetic assignment analyses using the saddlepoint approximation method. Biometrics, 73:1029–1041. doi:10.1111/biom.12667

**The mixture of Markov jump processes: distributional properties and statistical estimation**

Speaker: Budhi Surya

Affiliation: Victoria University of Wellington

When: Wednesday, 21 November 2018, 3:00 pm to 4:00 pm

Where: 303-310

In this talk, I will discuss a tractable development and statistical estimation of the finite mixture of right-continuous Markov jump processes moving at different speeds on the same finite state space, where the speed regimes are assumed to be unobservable. Unlike the underlying processes, the mixture itself does not have the Markov property. The mixture model was first proposed by Frydman (JASA, 2005) as a generalization of the mover-stayer model of Blumen et al. (Cornell Stud. Ind. Labor Relat., 1955), and was recently generalized by Surya (Stochastic Systems, 2018; Surya, 2018), where further distributional properties and explicit identities of the process are given. Statistical estimation is performed under both complete and incomplete observation of the sample paths of the mixture process. Under complete observation, maximum likelihood estimates are given in explicit form in terms of sufficient statistics of the process. Under incomplete observation, estimation is performed using the EM algorithm. The estimates completely characterize the mixture process in terms of its initial probability, the intensity matrices of the underlying Markov processes, and the switching probability matrix. The new method generalizes existing statistical inference for the Markov model (Albert, Ann. Math. Statist., 1961), the mover-stayer model (Frydman, JASA, 1984), and the Markov mixture model (Frydman, JASA, 2005). Numerical examples are given to illustrate the results.
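For a single (non-mixture) Markov jump process observed continuously, the explicit maximum likelihood estimator is classical (Albert, 1961): the estimated intensity from state i to state j is the number of observed i-to-j jumps divided by the total time spent in state i. The sketch below implements only this single-chain case, as the baseline that the mixture estimators generalize. The path encoding (a list of `(state, holding_time)` segments, the last of which may be censored) and the function name are assumptions made for illustration.

```python
import numpy as np

def markov_jump_mle(paths, n_states):
    """MLE of the intensity matrix Q from fully observed sample paths.

    Each path is a list of (state, holding_time) segments in time order;
    consecutive segments are assumed to be in different states.
    q_ij = N_ij / T_i for i != j, and q_ii = -sum_{j != i} q_ij.
    """
    N = np.zeros((n_states, n_states))   # observed transition counts i -> j
    T = np.zeros(n_states)               # total time spent in each state
    for path in paths:
        for k, (state, duration) in enumerate(path):
            T[state] += duration
            if k + 1 < len(path):        # the final segment has no observed exit
                N[state, path[k + 1][0]] += 1
    Q = np.zeros((n_states, n_states))
    visited = T > 0                      # states never visited keep zero rows
    Q[visited] = N[visited] / T[visited][:, None]
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of an intensity matrix sum to zero
    return Q

paths = [[(0, 2.0), (1, 1.0), (0, 3.0)], [(0, 1.0), (2, 4.0)]]
Q_hat = markov_jump_mle(paths, n_states=3)
```

The counts N and occupation times T are exactly the sufficient statistics; the mixture case needs them per speed regime, which is why EM is required when regimes are unobserved.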

**Estimation of inbreeding, kinship and population structure parameters**

Speaker: Professor Bruce Weir

Affiliation: Department of Biostatistics, University of Washington, Seattle, USA

When: Friday, 16 November 2018, 1:00 pm to 2:00 pm

Where: 303-G15

Knowledge of inbreeding, kinship and population structure is important for many aspects of population, quantitative, ecological, human and forensic genetics. These quantities can all be framed in terms of probabilities of pairs of alleles being identical by descent, within and between individuals, or within and between populations.

Values of the various parameters can be predicted if the individual or population pedigrees are known. With information about only the observed individuals or populations, however, the parameters can be difficult to estimate. Many current estimators use functions involving squares and products of sample allele frequencies, and this can produce rankings of kinship estimates, for example, that do not reflect the rankings of the pedigree values for a set of individuals.

Professor Weir and his colleagues are having success with approaches based on allele matching proportions, without explicit use of sample allele frequencies. In this seminar, he will present results from recent papers in Genetics and Molecular Ecology and some unpublished work with simulated, human and natural population data.

**The split-and-drift random graph, a null model for speciation**

Speaker: Francois Bienvenu

Affiliation: Sorbonne Université, Paris

When: Wednesday, 14 November 2018, 3:00 pm to 4:00 pm

Where: 303-310

We introduce a new random graph model motivated by biological questions relating to speciation. This random graph is defined as the stationary distribution of a Markov chain on the space of graphs on {1, ..., n}. The dynamics of this Markov chain are governed by two types of events: vertex duplication, where at constant rate a pair of vertices is sampled uniformly and one of these vertices loses its incident edges and is rewired to the other vertex and its neighbors; and edge removal, where each edge disappears at constant rate. Besides the number of vertices n, the model has a single parameter r_n.
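The two event types can be simulated directly with a continuous-time (Gillespie-style) algorithm; the graph at a long time horizon is then an approximate draw from the stationary distribution. This is a sketch under assumed rates: the normalisation (each unordered pair triggers a duplication at rate 1, each edge disappears at rate `r`) is a hypothetical choice and may not match the scaling of r_n used in the talk, and the function name is illustrative.

```python
import random

def simulate_split_and_drift(n, r, t_max, seed=0):
    """Gillespie simulation of a split-and-drift-style random graph on 0..n-1.

    Assumed (hypothetical) rates: each unordered pair of vertices triggers a
    duplication at rate 1; each edge is removed at rate r. Returns the edge
    set at time t_max as (smaller, larger) tuples, starting from no edges.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    edges = set()
    dup_rate = n * (n - 1) / 2            # total duplication rate over all pairs
    t = 0.0
    while True:
        total_rate = dup_rate + r * len(edges)
        t += rng.expovariate(total_rate)  # exponential holding time
        if t > t_max:
            return edges
        if rng.random() < dup_rate / total_rate:
            # Vertex duplication: j loses its edges, then is linked to i
            # and to every neighbour of i.
            i, j = rng.sample(range(n), 2)
            for k in adj[j]:
                adj[k].discard(j)
                edges.discard((min(j, k), max(j, k)))
            adj[j].clear()
            for k in adj[i] | {i}:
                adj[j].add(k)
                adj[k].add(j)
                edges.add((min(j, k), max(j, k)))
        else:
            # Edge removal: one existing edge, chosen uniformly at random.
            a, b = rng.choice(sorted(edges))
            edges.remove((a, b))
            adj[a].discard(b)
            adj[b].discard(a)

edges = simulate_split_and_drift(n=8, r=3.0, t_max=20.0, seed=1)
```

Returning the state at a fixed time, rather than after a fixed number of jumps, matters here: the embedded jump chain has a different stationary distribution from the continuous-time chain.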

Using a coalescent approach, we obtain explicit formulas for the first moments of several graph invariants such as the number of edges or the number of complete subgraphs of order k. These are then used to identify five non-trivial regimes depending on the asymptotics of the parameter r_n. We derive an explicit expression for the degree distribution, and show that under appropriate rescaling it converges to classical distributions when the number of vertices goes to infinity. Finally, we give asymptotic bounds for the number of connected components, and show that in the sparse regime the number of edges is Poissonian.

**All New Zealand Acute Coronary Syndrome - Quality Improvement (ANZACS-QI) Programme: National Coverage and Applications**

Speakers: Yannan Jiang and Rachel Chen

Affiliation: Statistical Consulting Centre, Department of Statistics, The University of Auckland

When: Wednesday, 14 November 2018, 11:00 am to 12:00 pm

Where: 303-310

The ANZACS-QI programme is governed on behalf of the NZ branch of the Cardiac Society of Australia and New Zealand (CSANZ) and the NZ Cardiac Network, and is funded by the NZ Ministry of Health. Its goal is to support appropriate, evidence-based management of acute coronary syndrome (ACS) and to drive improvements in the quality of cardiac care delivered in New Zealand’s hospitals. It also aims to reduce regional variation in the assessment, investigation and treatment of patients with ACS.

ANZACS-QI uses two main sources of anonymised patient data related to heart disease: the ANZACS-QI web-based registry, deployed nationally to 41 public hospitals and 6 private hospitals, and the Ministry of Health (MOH) national hospitalisation and mortality datasets. This talk will provide an overview of the ANZACS-QI programme and its data collections, discuss its coverage and agreement with the national administrative datasets, and present some applications in the national healthcare system and in clinical research studies.

**Mixture-based Nonparametric Density Estimation**

Speaker: Yong Wang

Affiliation: The University of Auckland

When: Wednesday, 7 November 2018, 11:00 am to 12:00 pm

Where: 303-310

In this talk, I will describe a general framework for nonparametric density estimation that uses nonparametric or semiparametric mixture distributions. As in kernel-based estimation, the proposed approach uses a bandwidth to control the smoothness of the density, but for a fixed bandwidth each density estimate is determined by likelihood maximization, and bandwidth selection is carried out as model selection. This leads to much simpler models than kernel estimators, yet with higher accuracy. Results from simulation studies and real-world data, in both the univariate and multivariate settings, will be given; all suggest that these mixture-based estimators outperform kernel-based ones.
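A simple instance of the idea (a normal mixture whose common standard deviation plays the role of the bandwidth, with mixing weights fitted by maximum likelihood) can be sketched as follows. This is an illustration under assumptions, not the speaker's algorithm: component means are fixed on an equally spaced grid, the weights are updated by plain EM, and bandwidth selection (done as model selection in the talk) is not shown. All function names are hypothetical.

```python
import numpy as np

def _normal_pdf(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)

def fit_fixed_bandwidth_mixture(x, h, n_grid=60, n_iter=500):
    """ML mixing weights for a normal mixture with common s.d. h and
    component means fixed on a grid over the data range (EM on weights only)."""
    x = np.asarray(x, dtype=float)
    grid = np.linspace(x.min(), x.max(), n_grid)
    comp = _normal_pdf((x[:, None] - grid[None, :]) / h) / h   # (n, n_grid)
    w = np.full(n_grid, 1.0 / n_grid)
    for _ in range(n_iter):
        resp = comp * w                                        # E-step
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)                                  # M-step
    return grid, w

def mixture_density(t, grid, w, h):
    """Evaluate the fitted mixture density at the points t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return (_normal_pdf((t[:, None] - grid[None, :]) / h) / h) @ w

def log_likelihood(x, grid, w, h):
    return float(np.sum(np.log(mixture_density(x, grid, w, h))))
```

Unlike a kernel estimate, which puts equal weight on a component at every data point, the fitted weights typically concentrate on relatively few grid points, which is one way to see why such models can be much simpler.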

**Version control: An introduction to git and GitHub using RStudio**

Speaker: Blake Seers

Affiliation: Data Analyst and Consultant, Statistical Consulting Centre (SCC), Department of Statistics, The University of Auckland

When: Wednesday, 10 October 2018, 11:00 am to 12:00 pm

Where: 303-310

Reproducibility is the hallmark of good science. Git is becoming increasingly popular with researchers because it can facilitate greater reproducibility and transparency in science [1]. In this talk I will introduce the git version control system and the GitHub hosting service, and show how to get started using git for version control in RStudio.

[1]: Ram, K. (2013). Git can facilitate greater reproducibility and increased transparency in science. Source Code for Biology and Medicine, 8:7. https://scfbm.biomedcentral.com/articles/10.1186/1751-0473-8-7

**Maximum likelihood estimation for latent count models**

Speaker: Wei Zhang

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 26 September 2018, 11:00 am to 12:00 pm

Where: 303-310

Latent count models constitute an important modeling class in which a latent vector of counts, z, is summarized or corrupted for reporting, yielding observed data y=Tz where T is a known but non-invertible matrix. The observed vector y generally follows an unknown multivariate distribution with a complicated dependence structure. Latent count models arise in diverse fields, such as estimation of population size from capture-recapture studies; inference on multi-way contingency tables summarized by marginal totals; or analysis of route flows in networks based on traffic counts at a subset of nodes. Currently, inference under these models relies primarily on stochastic algorithms for sampling the latent vector z, typically in a Bayesian data-augmentation framework. These schemes involve long computation times and can be difficult to implement. Here, we present a novel maximum-likelihood approach using likelihoods constructed by the saddlepoint approximation. We show how the saddlepoint likelihood may be maximized efficiently, yielding fast inference even for large problems. For the case where z has a multinomial distribution, we validate the approximation by applying it to a specific model for which an exact likelihood is available. We implement the method for several models of interest, and evaluate its performance empirically and by comparison with other estimation approaches. The saddlepoint method consistently gives fast and accurate inference, even when y is dominated by small counts.

**Incorporating spatial processes into stock assessment models, with application to the New Zealand surfclam and snapper fisheries**

Speaker: Christopher Nottingham

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 19 September 2018, 11:00 am to 12:00 pm

Where: 303-310

Fisheries stock assessments use mathematical and statistical descriptions of population dynamics to make quantitative predictions about how a stock will respond to alternative management choices. For high-value species this commonly involves a simulation procedure known as management strategy evaluation (MSE). MSE involves developing a number of stock assessment models and testing how well they perform under various states of nature (commonly known as operating models) and under a harvest allocation rule that determines management actions (e.g., setting catch limits for the next year) from the results of the stock assessment models. In the proposed PhD, MSE is used to further develop and improve the management of the surfclam and snapper fisheries in New Zealand. A large part of the proposed work involves developing novel population dynamics models. These models account for continuous spatial processes that are correlated with time and with the sizes or ages of individuals. The processes are represented as Gaussian Markov random fields, approximated using the stochastic partial differential equation (SPDE) approach.

**Linear mixed model for multi-level omics data**

Speaker: Yang Hai

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 15 August 2018, 11:00 am to 12:00 pm

Where: 303-310

Accurate disease prediction, combining emerging genetic findings with other existing knowledge, is expected to facilitate precision medicine (Ashley, 2015). Genomic best linear unbiased prediction (gBLUP) models and their extensions are widely used for genetic risk prediction, with genetic effects assumed to follow various distributions. However, the true effect size distribution for any given outcome is usually unknown (Chatterjee et al., 2016). Although multi-omic information and family structure could further improve predictive accuracy, few analytical approaches that use them have been developed.

For my thesis, I will first develop a hybrid model that can approximate the shape of the true effect size distribution. Second, I propose a Bayesian framework for family-based genetic risk prediction, which improves prediction performance by using the within-family correlation to capture unmeasured polygenic and shared environmental variation. Third, I propose to develop a risk prediction method that can efficiently integrate information from multi-level omics data.

Ashley, E. A. (2015). The precision medicine initiative: a new national effort. JAMA, 313(21), 2119–2120.

Chatterjee, N., Shi, J., & Garcia-Closas, M. (2016). Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nature Reviews Genetics, 17(7), 392.
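The standard gBLUP predictor that these extensions start from can be written in a few lines. This is a minimal sketch assuming known variance components and a relationship matrix built from standardised genotypes (one common construction among several). It is not the hybrid, family-based or multi-omics model proposed in the talk, and the function names are hypothetical.

```python
import numpy as np

def standardised_grm(genotypes):
    """Genomic relationship matrix from an n x p matrix of 0/1/2 genotype codes.

    Each column is centred and scaled to unit variance (columns must not be
    constant), then G = X X^T / p.
    """
    Z = np.asarray(genotypes, dtype=float)
    X = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    return X @ X.T / Z.shape[1]

def gblup(y, G, sigma_g2, sigma_e2):
    """BLUP of genetic values in y = mu + g + e,
    g ~ N(0, sigma_g2 * G), e ~ N(0, sigma_e2 * I).

    Variance components are treated as known; in practice they are usually
    estimated first (e.g. by REML).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    ones = np.ones(n)
    V = sigma_g2 * G + sigma_e2 * np.eye(n)        # marginal covariance of y
    mu_hat = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))  # GLS mean
    g_hat = sigma_g2 * G @ np.linalg.solve(V, y - mu_hat)
    return mu_hat, g_hat
```

With G equal to the identity and equal variance components, each prediction is simply half the individual's deviation from the mean; the relationship matrix is what lets related individuals borrow strength from one another.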

**Modelling career trajectories of cricket players using Gaussian processes**

Speaker: Oliver Stevenson

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 8 August 2018, 11:00 am to 12:00 pm

Where: 303-310

In the sport of cricket, variation in a player’s batting ability can usually be measured on one of two scales: short-term changes in ability observed during a single innings, which can span multiple days, and long-term changes witnessed over an entire playing career, which can span decades. To measure short-term, within-innings variation, a Bayesian survival analysis method is derived and used to fit a model that predicts how the batting abilities of professional cricketers change during an innings. A second model, which employs a Gaussian process, is then fitted to measure and predict between-innings variation in ability. Given the high dimensionality of the Gaussian process model, and for ease of model comparison, models are fitted using nested sampling. Generally speaking, the results support an anecdotal description of a typical sporting career. Young players tend to begin their careers with raw but undeveloped ability, which improves over time as they gain experience and participate in specialised training and coaching regimes. Eventually players reach the peak of their careers, after which ability tends to decline. However, continual fluctuations in ability are commonly observed for many players, likely due to external factors such as injury and form, which can lead to multiple peaks during a long career. The results quantify a player’s batting ability at any given point in their career more accurately than traditional cricketing metrics. This has practical implications for talent identification, player comparison and team selection policy.
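The between-innings component can be illustrated with generic Gaussian process regression. In this sketch the squared-exponential kernel, its hyperparameter values and the synthetic data are all assumptions; the talk's model is fitted by nested sampling, whereas here the hyperparameters are simply fixed and only the posterior mean and covariance are computed.

```python
import numpy as np

def squared_exponential(a, b, length_scale, variance):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, length_scale, variance, noise_var):
    """Posterior mean and covariance of a zero-mean GP with Gaussian noise."""
    K = squared_exponential(x_train, x_train, length_scale, variance)
    K += noise_var * np.eye(len(x_train))
    K_s = squared_exponential(x_test, x_train, length_scale, variance)
    K_ss = squared_exponential(x_test, x_test, length_scale, variance)
    L = np.linalg.cholesky(K)                          # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    return mean, cov

# Synthetic illustration only: noisy "ability" scores over 40 innings.
rng = np.random.default_rng(1)
innings = np.arange(1.0, 41.0)
scores = np.sin(innings / 8.0) + rng.normal(0, 0.3, innings.size)
mean, cov = gp_posterior(innings, scores, np.linspace(1, 40, 200),
                         length_scale=6.0, variance=1.0, noise_var=0.09)
```

The length scale controls how quickly ability is allowed to change between innings, which is the quantity that distinguishes gradual career development from short-lived fluctuations in form.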

**The C3 depletion design: Population estimation and gear calibration using Catches from Concentric Circles.**

Speaker: Liese Carleton

Affiliation: Virginia Institute of Marine Science, College of William & Mary

When: Monday, 28 May 2018, 1:00 pm to 2:00 pm

Where: 303-310

Depletion studies are often used in closed systems to estimate population size and the catchability coefficient. Applying depletion methods to open water systems is hindered because fish are attracted into the study area from outside, which makes the size of the sampled domain uncertain. In this novel design, the study area comprises two concentric circles. The diameter of the outer circle is specified by the length of a bottom longline, which is set repeatedly in a star pattern to serially deplete the circle. Catches are recorded as falling either within the smaller inner circle or in the outer ring. This design allows an immigration component to be included in the depletion model, so that initial abundance, catchability and net movement can be estimated. Gear efficiency can be derived from the estimated catchability and could then be used to convert a survey index of abundance into an estimate of absolute population size. The method is illustrated with bottom longline sets for Atlantic sharpnose shark in the Gulf of Mexico.
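The closed-system depletion idea that the C3 design extends can be sketched with the classical Leslie estimator: catch per unit effort falls linearly with cumulative catch, so a straight-line fit yields catchability and initial abundance. This sketch assumes a closed population and does not include the inner-circle/outer-ring immigration component of the C3 model; the function name is hypothetical.

```python
import numpy as np

def leslie_depletion(catch, effort):
    """Classical Leslie depletion estimator for a closed population.

    Model: CPUE_t = q * N0 - q * K_t, where K_t is the cumulative catch taken
    before set t. Regressing CPUE on K_t gives slope -q and intercept q * N0.
    Returns (N0_hat, q_hat).
    """
    catch = np.asarray(catch, dtype=float)
    effort = np.asarray(effort, dtype=float)
    cpue = catch / effort
    cum_catch_before = np.concatenate(([0.0], np.cumsum(catch)[:-1]))
    slope, intercept = np.polyfit(cum_catch_before, cpue, 1)
    q_hat = -slope
    return intercept / q_hat, q_hat
```

In an open system, immigration keeps CPUE from falling as fast as removals alone predict, which biases this estimator; recording catches separately for the inner circle and outer ring is what lets the C3 model estimate that inflow.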

**Combinatorial Inference**

Speaker: Junwei Lu

Affiliation: Department of Operations Research and Financial Engineering, Princeton University

When: Monday, 14 May 2018, 1:00 pm to 2:00 pm

Where: 303-310

We propose combinatorial inference to explore the topological structures of graphical models. Combinatorial inference can conduct hypothesis tests on many graph properties, including connectivity, hub detection and perfect matching. We also develop a generic minimax lower bound, which shows that the proposed method is optimal for a large family of graph properties. Our methods are applied to neuroscience, where they discover hub voxels contributing to visual memories.

**Bayesian nonparametric analysis of multivariate time series**

Speaker: Alexander Meier

Affiliation: Otto-von-Guericke University Magdeburg

When: Wednesday, 9 May 2018, 3:00 pm to 4:00 pm

Where: 303-310

While there is a growing literature on Bayesian time series analysis, only a few nonparametric approaches to multivariate time series exist. Many methods rely on Whittle's likelihood, which involves the second-order structure of a stationary time series through its spectral density matrix f. The latter is often modeled in terms of its Cholesky decomposition to ensure positive definiteness. However, asymptotic properties under these priors, such as posterior consistency or posterior contraction rates, are not known.
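For a univariate series, Whittle's likelihood replaces the exact Gaussian likelihood by a sum over Fourier frequencies involving the periodogram and the spectral density. A minimal sketch, assuming a parametric AR(1) spectral density purely for illustration (the talk instead places a nonparametric prior on a multivariate spectral density matrix); the function names are hypothetical.

```python
import numpy as np

def periodogram(x):
    """Periodogram at Fourier frequencies 2*pi*j/n, j = 1, ..., floor((n-1)/2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    j = np.arange(1, (n - 1) // 2 + 1)
    dft = np.fft.fft(x - x.mean())
    return 2 * np.pi * j / n, np.abs(dft[j]) ** 2 / (2 * np.pi * n)

def ar1_spectral_density(freqs, phi, sigma2):
    """Spectral density of a stationary AR(1) process with innovation variance sigma2."""
    return sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * freqs)) ** 2)

def whittle_loglik(x, phi, sigma2):
    """Whittle log-likelihood (up to an additive constant)."""
    freqs, I = periodogram(x)
    f = ar1_spectral_density(freqs, phi, sigma2)
    return -np.sum(np.log(f) + I / f)
```

Because periodogram ordinates at distinct Fourier frequencies are approximately independent, the likelihood becomes a simple sum; in the multivariate case f and the periodogram become Hermitian matrices and the terms involve a log-determinant and a trace.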

A different idea is to model f by means of random measures. This is in line with (1), who model the normalized spectral density of a univariate time series with a Dirichlet process mixture of beta densities. We use a similar approach, with matrix-valued mixture weights induced by a completely random matrix-valued measure (2,3). We use a class of infinitely divisible matrix Gamma distributions (4) for this purpose. While the procedure performs well in practice, we also establish posterior consistency and derive posterior contraction rates.

(1) N. Choudhuri, S. Ghosal and A. Roy (2004). Bayesian estimation of the spectral density of a time series. Journal of the American Statistical Association 99(468), 1050–1059

(2) A. Lijoi and I. Pruenster (2010). Models beyond the Dirichlet process. Bayesian nonparametrics, 28:80

(3) J. B. Robertson, M. Rosenberg, et al. (1968). The decomposition of matrix-valued measures. The Michigan Mathematical Journal, 15(3), 353-368

(4) V. Perez-Abreu and R. Stelzer (2014). Infinitely divisible multivariate and matrix Gamma distributions. Journal of Multivariate Analysis, 130, 155–175

Authors:

Alexander Meier, Otto-von-Guericke University Magdeburg

Claudia Kirch, Otto-von-Guericke University Magdeburg

Renate Meyer, The University of Auckland

**Modelling spatial-temporal processes with applications to hydrology and wildfires**

Speaker: Professor Valerie Isham

Affiliation: University College London

When: Friday, 4 May 2018, 11:00 am to 12:00 pm

Where: 303-610

Mechanistic stochastic models aim to represent an underlying physical process (albeit in highly idealised form, and using stochastic components to reflect uncertainty) via analytically tractable models, in which interpretable parameters relate directly to physical phenomena. Such models can be used to gain understanding of the process dynamics and thereby to develop control strategies.

In this talk, I will review some stochastic point process-based models constructed in continuous time and continuous space, using spatial-temporal examples from hydrology such as rainfall (where flood control is a particular application) and soil moisture. By working in continuous space and time, consistent properties can be obtained analytically at any spatial and temporal resolution, as required for fitting and applications. I will briefly cover basic model components and properties, and then discuss model construction, fitting and validation, including ways to incorporate nonstationarity and climate change scenarios. I will also share some thoughts on using similar models for wildfires.

**Dirichlet and Poisson-Dirichlet approximations**

Speaker: Han Liang Gan

Affiliation: Northwestern University

When: Thursday, 26 April 2018, 1:00 pm to 2:00 pm

Where: 303-257

The Dirichlet and Poisson-Dirichlet distributions are multi-dimensional distributions that can be used to model proportions. In this talk, we will give explicit error bounds when applying Dirichlet and Poisson-Dirichlet approximations in a variety of applications that include urn models and stationary distributions of genetic drift models. The results are derived using new developments in Stein's method.

This is joint work with Adrian Rollin (National University of Singapore) and Nathan Ross (University of Melbourne).

**Weakly informative prior for mixture models**

Speaker: Kate Lee

Affiliation: Auckland University of Technology

When: Thursday, 19 April 2018, 1:00 pm to 2:00 pm

Where: 303-310

A mixture model is a probability model for representing the presence of subpopulations within an overall population. It comprises a finite or infinite number of components, possibly of different distributions, that can describe different features of the data. Mixture models thus allow much more careful description of complex systems, and they have been adopted in diverse areas. While mixture models have been studied for more than a century, the construction of a reference Bayesian analysis of these models remains unsolved because of their ill-posed nature. In this talk, a new parameterisation centred on the mean and possibly the variance of the mixture distribution is suggested, and, based on this reparameterisation, a weakly informative prior for a wide class of mixtures is proposed. I will demonstrate that, under some generous conditions, the resulting posterior distributions are proper, and illustrate MCMC implementations.

**Statistics of forensic lineage DNA markers**

Speaker: Mikkel Meyer Andersen

Affiliation: Associate professor, Department of Mathematical Sciences, Aalborg University, Denmark

When: Wednesday, 18 April 2018, 11:00 am to 12:00 pm

Where: 303-310

Genetic information from biological material is often used in forensic casework such as in criminal cases. The biological material collected from the crime scene (assumed to originate from the culprit) is used to obtain a so-called DNA profile.

When a suspect is detained, a reference DNA profile can be taken from the suspect and compared with the crime scene profile. If the profiles do not match, the suspect can be released. If the profiles match, the evidential value of this match must be assessed, because a DNA profile covers only part of the entire genome. There are many different kinds of DNA profile, and the calculation of an evidential value depends on the type of profile. The most common kind is the autosomal DNA profile, based on the non-sex chromosomes, and there is wide consensus on how to calculate the evidential value of matching autosomal DNA profiles (I will not spend much time on these).

Another type of DNA profile is based on lineage markers, which follow the paternal lineage (via the paternally inherited Y chromosome) or the maternal lineage (via the maternally inherited mitochondrial DNA). Y-chromosome profiles are valuable when there is a mixture of male-source and female-source DNA and interest centres on the identity of the male source, as happens, for example, in sexual assault cases. Mitochondrial profiles are used, for example, when the biological material obtained from the crime scene is heavily degraded (e.g. by weather or time). Traditional DNA profiles are based on nuclear DNA, and if the nuclei of the cells are too damaged, such profiles cannot be obtained. Instead, DNA profiles based on the mitochondria are made, because mitochondria are more robust than the cell nuclei and are often present even in heavily degraded samples.

DNA profiles based on lineage markers pose a challenging statistical problem as the markers are not statistically independent (as markers used in autosomal DNA profiles are). Thus, the joint probability distribution must be modelled instead of just the marginal distributions.

In this talk, I will discuss methods for calculating the evidential value of lineage DNA markers. This includes both a statistical model based on a finite mixture of generalised linear models (GLMs) and a simulation model. I will discuss computational aspects of both these models.

**Visual trumpery: How charts lie**

Speaker: Alberto Cairo

Affiliation: University of Miami

When: Wednesday, 21 March 2018, 6:30 pm to 7:30 pm

Where: 6.30pm, Large Chemistry Lecture Theatre, Ground Floor, Building 301, 23 Symonds Street, City Campus, Auckland Central.

In our final 2018 Ihaka lecture, Alberto Cairo (Knight Chair in Visual Journalism at the University of Miami) will deliver the following:

Visual trumpery: How charts lie -- and how they make us smarter

With facts and truth increasingly under assault, many interest groups have enlisted charts -- graphs, maps, diagrams, etc. -- to support all manner of spin. Because digital images are inherently shareable and can quickly amplify messages, sifting through the visual information and misinformation is an important skill for any citizen.

The use of graphs, charts, maps and infographics to explore data and communicate science to the public has become more and more popular. However, this rise in popularity has not been accompanied by an increasing awareness of the rules that should guide the design of these visualisations.

This talk teaches ordinary citizens principles for becoming more critical and better-informed readers of charts.

Lecture commences at 6.30pm, Large Chemistry Lecture Theatre, Ground Floor, Building 301, 23 Symonds Street, City Campus, Auckland Central.

Please join us for refreshments from 6pm in the foyer area outside the lecture theatre.

Biography

Alberto Cairo is the Knight Chair in Visual Journalism at the University of Miami. He's also the director of the visualisation programme at UM's Center for Computational Science. Cairo has been a director of infographics and multimedia at news publications in Spain (El Mundo, 2000-2005) and Brazil (Editora Globo, 2010-2012), and a professor at the University of North Carolina-Chapel Hill. Besides teaching at UM, he works as a freelancer and consultant for companies such as Google and Microsoft. He's the author of the books The Functional Art: An Introduction to Information Graphics and Visualization (2012) and The Truthful Art: Data, Charts, and Maps for Communication (2016).

Ihaka lecture series: https://www.stat.auckland.ac.nz/ihaka-lectures

Map: https://goo.gl/maps/fNuHvmNWPru

**On the distribution of the ratio for the components of a bivariate normal random vector**

Speaker: Francois Perron

Affiliation: University of Montreal

When: Wednesday, 21 March 2018, 11:00 am to 12:00 pm

Where: 303-310

Let X be a bivariate normal random vector and R = X_1/X_2. We show that the distribution of R can be represented as a Poisson mixture of some new distributions that extend the family of Student distributions, and we give some properties of this new family. We also show that the distribution of R can be approximated by a normal distribution, and we quantify how good these approximations are.

**Making colour accessible**

Speaker: Paul Murrell

Affiliation: The University of Auckland

When: Wednesday, 14 March 2018, 6:30 pm to 7:30 pm

Where: Large Chemistry Lecture Theatre (LgeChem/301-G050), Ground Floor, UoA Building 301, 23 Symonds Street, City Campus, Auckland Central

In the second of the 2018 Ihaka Lectures, Associate Professor Paul Murrell (The University of Auckland) will deliver the following lecture:

Making colour accessible

The 'BrailleR' package for R generates text descriptions of R plots.

When combined with screen reader software, this provides information for blind and visually-impaired R users about the contents of an R plot. A minor difficulty that arises in the generation of these text descriptions involves the information about colours within a plot. As far as R is concerned, colours are described as six-digit hexadecimal strings, e.g. "#123456", but that is not very helpful for a human audience. It would be more useful to report colour names like "red" or "blue".
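One way to turn a hex string into a name is to convert both the colour and a list of named colours into a perceptual colour space and report the nearest name. The sketch below assumes the CIELAB space (D65 white point) with plain Euclidean distance (CIE76 ΔE) and a tiny, hypothetical palette; it is not the BrailleR implementation, and a real tool would use a far larger list of names.

```python
import math

def hex_to_rgb(hex_str):
    """'#RRGGBB' -> (r, g, b) as 0-255 integers."""
    s = hex_str.lstrip("#")
    return tuple(int(s[i:i + 2], 16) for i in (0, 2, 4))

def srgb_to_lab(rgb):
    """Convert 8-bit sRGB to CIELAB (D65 white point)."""
    def linear(c):
        c = c / 255.0  # undo the sRGB gamma curve
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linear(c) for c in rgb)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883
    def f(t):
        d = 6 / 29
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

# Hypothetical palette for illustration only.
PALETTE = {
    "black": "#000000", "white": "#FFFFFF", "grey": "#808080",
    "red": "#FF0000", "green": "#00FF00", "blue": "#0000FF",
    "yellow": "#FFFF00", "orange": "#FFA500", "purple": "#800080",
    "brown": "#A52A2A", "pink": "#FFC0CB", "navy": "#000080",
}

def nearest_colour_name(hex_str, palette=PALETTE):
    """Name in `palette` whose CIELAB coordinates are closest to hex_str."""
    lab = srgb_to_lab(hex_to_rgb(hex_str))
    return min(palette,
               key=lambda name: math.dist(lab, srgb_to_lab(hex_to_rgb(palette[name]))))
```

Working in CIELAB rather than raw RGB is the design choice here: distances in CIELAB track perceived colour differences far better than distances between hex triples.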

This talk will make a mountain out of that molehill and embark on a daring Statistical Graphics journey featuring colour spaces, high-performance computing, Te Reo, and XKCD. The only disappointment will be the ending.

Lecture commences at 6.30pm, Large Chemistry Lecture Theatre, Ground Floor, Building 301, 23 Symonds Street, City Campus, Auckland Central.

Please join us for refreshments from 6pm in the foyer area outside the lecture theatre.

Biography

Paul Murrell is an Associate Professor in the Department of Statistics at The University of Auckland. He is a member of the core development team for R, with primary responsibility for the graphics system.

https://www.stat.auckland.ac.nz/ihaka-lectures

**Myth busting and apophenia in data visualisation: is what you see really there?**

Speaker: Dianne Cook

Affiliation: Monash University

When: Wednesday, 7 March 2018, 6:30 pm to 7:30 pm

Where: Large Chemistry Lecture Theatre (LgeChem/301-G050), Ground Floor, UoA Building 301, 23 Symonds Street, City Campus, Auckland Central

Ihaka lectures 2018: Myth busting and apophenia in data visualisation: is what you see really there?

Launching our 2018 Ihaka Lecture Series, Professor Dianne Cook (Monash University) will deliver the following lecture:

Myth busting and apophenia in data visualisation: Is what you see really there?

Map: https://goo.gl/maps/fNuHvmNWPru

In data science, plots of data become important tools for observing patterns, discovering relationships, busting myths, making decisions, and communicating findings. But different observers can read the same plot differently, and it is easy to imagine patterns that may not exist.

This talk will describe some simple tools for helping to decide whether patterns are really there, in the larger context of the problem. We will talk about two protocols: the Rorschach, which can help insulate the mind from spurious structure, and the lineup, which places the data plot among plots in which nothing is happening. There will be an opportunity for the audience to try out these protocols on data from current affairs.
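The lineup protocol can be sketched as follows: hide the real data panel at a random position among m panels, with the remaining panels drawn from a null model. Here the null is produced by permuting y, which breaks any x-y association while keeping both marginal distributions; that choice, and the function names, are assumptions for illustration. If k of K independent observers pick the real panel, a simple visual p-value is the binomial tail probability with success probability 1/m, which assumes each observer picks uniformly at random when there is no real structure.

```python
import math
import random

def make_lineup(x, y, m=20, seed=None):
    """Return (panels, position): m (x, y) datasets, one of them the real data."""
    rng = random.Random(seed)
    position = rng.randrange(m)  # where the real data are hidden
    panels = []
    for i in range(m):
        if i == position:
            panels.append((list(x), list(y)))
        else:
            y_null = list(y)
            rng.shuffle(y_null)  # null panel: permuted y, no association with x
            panels.append((list(x), y_null))
    return panels, position

def lineup_p_value(k, n_observers, m=20):
    """P(at least k of n_observers pick the data panel purely by chance)."""
    p = 1 / m
    return sum(math.comb(n_observers, j) * p**j * (1 - p) ** (n_observers - j)
               for j in range(k, n_observers + 1))
```

Each panel would then be drawn with the same plot code; if a single observer picks the real panel out of 20, the p-value is 1/20 = 0.05. A Rorschach set corresponds to showing only null panels.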

Lecture commences at 6.30pm, Large Chemistry Lecture Theatre, Ground Floor, Building 301, 23 Symonds Street, City Campus, Auckland Central.

Please join us for refreshments from 6pm in the foyer area outside the lecture theatre.

Biography

Dianne Cook is a Fellow of the American Statistical Association, an elected ordinary member of the R Foundation, and Editor of the Journal of Computational and Graphical Statistics. Her research is in statistical graphics and exploratory data analysis. She has contributed to the development of several visualisation systems, including XGobi, GGobi and numerous R packages, and has explored the use of virtual environments, eye trackers and crowd-sourcing for visualising data.

https://www.stat.auckland.ac.nz/ihaka-lectures

**Queueing Models for Healthcare Capacity Planning**

Speaker: Peter T. Vanberkel

Affiliation: Department of Industrial Engineering, Dalhousie University

When: Wednesday, 7 March 2018, 3:00 pm to 4:00 pm

Where: 303-310

In this seminar I will present two studies of capacity planning problems which we investigate using queueing theory. To foster collaboration, I will emphasize and discuss extensions and next steps.

In the first study, we develop queueing network models to determine the appropriate number of patients to be managed by a single oncologist, often referred to as the physician’s panel size. The key features that distinguish our study of oncology practices from other panel size models are high patient turnover rates, multiple patient and appointment types, and follow-up care. The paper develops stationary and non-stationary queueing network models corresponding to stabilized and developing practices, respectively. These models are used to determine new-patient arrival rates that keep practices within given performance thresholds. Extensions to this work are needed to account for collaborative practices, where patients with co-morbidities are followed by multiple care providers.

In the second study, we investigate a novel Emergency Department (ED) replacement found in rural communities in Nova Scotia, Canada. Staffed by a paramedic and a registered nurse, and overseen by a physician via telephone, Collaborative Emergency Centres (CECs) have replaced traditional physician-led EDs overnight. To determine whether CECs are suitable for larger communities, we model the flow of patients and analyze the resulting performance with Lindley’s recursion. The analysis, done with simulation, shows that a CEC’s success depends on the relationship between the demand for and the supply of primary care appointments. Furthermore, we show that larger communities can successfully use CECs, but with diminishing returns. I’m interested in extending this work so that Lindley’s recursion can be analyzed without simulation.
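Lindley’s recursion gives successive waiting times in a single-server, first-come-first-served queue: W_{n+1} = max(0, W_n + S_n - A_{n+1}), where S_n is customer n's service time and A_{n+1} is the time between the arrivals of customers n and n+1. The sketch below applies it to a plain M/M/1 queue, an assumption chosen only because its exact mean wait λ/(μ(μ - λ)) is known; it does not reproduce the CEC patient-flow model, and the function names are hypothetical.

```python
import random

def lindley_waits(services, interarrivals):
    """Waiting times via W_{n+1} = max(0, W_n + S_n - A_{n+1}).

    services[n] is customer n's service time; interarrivals[n] is the gap
    between the arrivals of customers n and n+1. Customer 0 waits 0.
    """
    waits = [0.0]
    for s, a in zip(services, interarrivals):
        waits.append(max(0.0, waits[-1] + s - a))
    return waits

def simulate_mm1_mean_wait(lam, mu, n, seed=0):
    """Average wait in queue over n customers of an M/M/1 queue."""
    rng = random.Random(seed)
    services = [rng.expovariate(mu) for _ in range(n - 1)]
    gaps = [rng.expovariate(lam) for _ in range(n - 1)]
    return sum(lindley_waits(services, gaps)) / n
```

For example, with λ = 0.5 and μ = 1 the exact mean wait is 0.5/(1 × 0.5) = 1, and a long simulated run should land close to that value. For queues without such closed forms, simulating the recursion in this way is the fallback.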

**A causal assessment of the validity of surrogate endpoints in randomised intervention studies**

Speaker: Jeremy Taylor

Affiliation: Department of Biostatistics, University of Michigan

When: Wednesday, 28 February 2018, 11:00 am to 12:00 pm

Where: 303-310

In randomized clinical trials, a surrogate outcome variable (S) can be measured before the true outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Most previous methods for surrogate validation rely on models for the conditional distribution of T given Z and S. However, S is a post-randomization variable, and unobserved, simultaneous predictors of S and T may exist. When such confounders exist, these methods do not have a causal interpretation. Using the potential outcomes framework introduced by Frangakis and Rubin (2002), we propose a Bayesian estimation strategy for surrogate validation when the joint distribution of potential surrogate and outcome measures is multivariate normal, and extend it using copulas to the case of an ordered categorical surrogate and a censored event time true endpoint. We model the joint conditional distribution of the potential outcomes of T, given the potential outcomes of S and propose surrogacy validation measures from this model. By conditioning on principal strata of S, the resulting estimates are causal. As the model is not fully identifiable from the data, we propose some reasonable prior distributions and assumptions that can be placed on weakly identified parameters to aid in estimation. The method is applied to data from advanced colorectal cancer clinical trials.

**On non-equilibria threshold strategies in ticket queues**

Speaker: Dr. Yoav Kerner

Affiliation: Industrial Engineering and Management, Ben Gurion University of the Negev

When: Wednesday, 31 January 2018, 3:00 pm to 4:00 pm

Where: 303-310

In many real-life queueing systems, customers balk from the queue, but the system becomes aware of this only when their turn to be served arrives. Naturally, the decision to balk depends on the queue length and is based on a threshold. Yet the inspected queue length includes customers who have already balked. In this work, we consider a Markovian queue with infinite capacity and customers who are homogeneous with respect to their cost-reward functions. We show that no threshold strategy is a symmetric Nash equilibrium strategy. Furthermore, we show that when all others adopt a threshold strategy, the individual's best response is a double-threshold strategy: join if and only if either (i) the inspected queue length is smaller than one threshold, or (ii) the inspected queue length is larger than a second threshold. We discuss the validity of this result when the server's response time to an absent customer is positive. We also show that, in a finite-capacity queue, a threshold strategy can be an equilibrium, but this depends on the model's parameters (including the capacity).
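The gap between the inspected and the actual queue can be illustrated with a small event-driven simulation. This is a hypothetical sketch, not the model analysed in the talk. It assumes Poisson arrivals at rate `lam`, exponential service at rate `mu`, and that every customer applies the same naive single-threshold rule to the number of outstanding tickets. Tickets of absent customers are skipped in zero time when called (the zero-response-time case). The sketch only measures how far the inspected length overstates the real queue; it does not compute best responses or equilibria.

```python
import random
from collections import deque

def simulate_ticket_queue(lam, mu, threshold, n_events, seed=0):
    """Return time-averaged (inspected, actual) queue lengths.

    Each arrival takes a ticket and inspects the number of outstanding tickets
    (including the one in service). It joins iff that number is below
    `threshold`; otherwise it balks, but its ticket stays in the queue.
    """
    rng = random.Random(seed)
    tickets = deque()  # True = customer present, False = customer balked
    present = 0
    t = area_inspected = area_actual = 0.0
    for _ in range(n_events):
        # Invariant: if any tickets remain, the head ticket is a present customer,
        # so the server is busy exactly when present > 0.
        rate = lam + (mu if present else 0.0)
        dt = rng.expovariate(rate)
        t += dt
        area_inspected += len(tickets) * dt
        area_actual += present * dt
        if rng.random() * rate < lam:  # next event is an arrival
            joins = len(tickets) < threshold
            tickets.append(joins)
            present += joins
        else:  # service completion for the head ticket
            tickets.popleft()
            present -= 1
            while tickets and not tickets[0]:
                tickets.popleft()  # skip tickets of customers who balked
    return area_inspected / t, area_actual / t
```

With a finite threshold, the first returned value exceeds the second, because balked tickets keep inflating what later arrivals see. With an effectively infinite threshold nobody balks and the two averages coincide.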

**Fitting Markov chains to sampled aggregate data: modelling tree fern gametophyte growth under different conditions**

Speaker: Louise McMillan

Affiliation: The University of Auckland

When: Wednesday, 31 January 2018, 11:00 am to 12:00 pm

Where: 303-310

The alternate life stage of plants (the gametophyte) has received relatively little academic attention, so the environmental effects on its growth are not well understood. This talk focuses on a recent experiment by James Brock of the School of Biological Sciences, in which he grew tree fern gametophytes in vitro under different levels of light and phosphorus and monitored their growth stages. I then worked on fitting Markov chains to the growth-stage data, a task made more difficult because the data were incomplete aggregate data rather than records following individual spores through each stage of their growth. This talk will cover the fitting methods used and the results of the analysis.
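The abstract does not spell out the model, so the following is only a hypothetical sketch of fitting a Markov chain to aggregate data. It assumes a progressive three-stage chain in which, at each time step, an individual either stays in its stage or advances one stage. The chain is fitted by maximum likelihood to stage counts, treating each time point as an independent multinomial sample because different individuals are counted at each time. The stage structure, the names `propagate` and `fit_two_rates`, and the grid search are all assumptions for illustration, not the speaker's method.

```python
import math

def propagate(q, steps):
    """Expected stage proportions at times 0..steps for a progressive chain.

    Everyone starts in stage 0; q[k] is the per-step probability of moving
    from stage k to stage k+1; the last stage is absorbing.
    """
    n_stages = len(q) + 1
    p = [1.0] + [0.0] * (n_stages - 1)
    history = [p]
    for _ in range(steps):
        new = [0.0] * n_stages
        for k, mass in enumerate(p):
            if k < len(q):
                new[k] += mass * (1 - q[k])
                new[k + 1] += mass * q[k]
            else:
                new[k] += mass
        p = new
        history.append(p)
    return history

def log_likelihood(q, counts):
    """Sum of multinomial log-likelihoods, one independent sample per time point."""
    ll = 0.0
    for n_t, p_t in zip(counts, propagate(q, len(counts) - 1)):
        for n, p in zip(n_t, p_t):
            if n == 0:
                continue  # a zero count contributes nothing
            if p <= 0.0:
                return -math.inf
            ll += n * math.log(p)
    return ll

def fit_two_rates(counts, step=0.01):
    """Grid-search MLE of (q1, q2) for a three-stage chain."""
    grid = [i * step for i in range(1, round(1 / step))]
    return max(((a, b) for a in grid for b in grid),
               key=lambda q: log_likelihood(q, counts))
```

Here `counts[t]` holds the number of individuals observed in each stage at time t. A numerical optimiser and a more general transition structure (for example, allowing stage skipping or treatment-dependent rates) would replace the coarse grid in a real analysis.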

**Strategic Behavior in Transportation Systems**

Speaker: Dr. Athanasia Manou

Affiliation: Department of Industrial Engineering, Koc University

When: Wednesday, 24 January 2018, 3:00 pm to 4:00 pm

Where: 303-310

In this work, we study the behavior of strategic customers in transportation systems. We consider two different models. In the first model, arriving customers decide whether to join the station or balk, based on a natural reward-cost structure. Solving the game among customers, we determine their strategic behavior and explore the effect of key service parameters on customer behavior. In the second model, arriving customers decide whether to join the station or balk, and the administrator sets the fee. In this case, a two-stage game between the customers and the administrator takes place. Moreover, we consider three cases distinguished by the level of delay information provided to customers at their arrival instants. We explore how system parameters affect customer behavior and the fee imposed by the administrator. We then compare the three cases and show that customers almost always prefer to know their exact waiting times, whereas the administrator prefers to provide either no information or the exact waiting time, depending on the system parameters.