## Department of Statistics

# Seminars

**The mixture of Markov jump processes: distributional properties and statistical estimation**

Speaker: Budhi Surya

Affiliation: Victoria University of Wellington

When: Wednesday, 21 November 2018, 11:00 am to 12:00 pm

Where: 303-310

**Critical issues in recent guidelines**

Speaker: Prof Markus Neuhaeuser

Affiliation: Dept. of Mathematics and Technology, Koblenz University of Applied Sciences, Remagen, Germany

When: Tuesday, 12 February 2019, 11:00 am to 12:00 pm

Where: 303-310

To increase rigor and reproducibility, some medical journals provide detailed guidelines for experimental design and statistical analysis. Although this development is positive, quite a few of the recommendations are problematic because they reduce statistical power or are indefensible from a statistical point of view. This is shown using two current examples: the checklist published by the journal Circulation Research in 2017 [Circulation Research. 2017; 121:472-9] and the guideline published by the British Journal of Pharmacology in 2018 [British Journal of Pharmacology. 2018; 175:987-93]. Topics discussed include analysis of variance under heteroscedasticity, whether sample sizes should be balanced, power calculations (including so-called post-hoc power analyses), minimum group sizes, and the t test for small samples.

https://www.hs-koblenz.de/en/profilepages/profil/neuhaeuser/
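The abstract mentions analysis of variance under heteroscedasticity and the t test for small samples. As a minimal illustration of one standard option in the two-group case, Welch's unequal-variance t test (chosen here for illustration only, and not necessarily what the talk recommends), the statistic and its approximate degrees of freedom can be computed as below. The function name `welch_t` and the sample data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike the pooled two-sample t test, this does not assume that the
    two groups share a common variance.
    """
    # Per-group squared standard errors: sample variance / sample size
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Two small hypothetical groups whose sample variances differ by a factor of 4
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.3f}, df = {df:.2f}")
```

The degrees of freedom are generally non-integer and fall below the pooled value of n1 + n2 - 2 when the variances differ.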

**Version control: An introduction to git and GitHub using RStudio**

Speaker: Blake Seers

Affiliation: Data Analyst, Department of Statistics; Consultant, Statistical Consulting Centre (SCC), University of Auckland

When: Wednesday, 10 October 2018, 11:00 am to 12:00 pm

Where: 303-310

Reproducibility is the hallmark of good science. Git is becoming increasingly popular with researchers because it can facilitate greater reproducibility and transparency in science [1]. In this talk I will introduce the git version control system and the GitHub hosting service, and show how to get started using git for version control in RStudio.

[1]: K. Ram (2013). Git can facilitate greater reproducibility and increased transparency in science. Source Code for Biology and Medicine, 8:7. https://scfbm.biomedcentral.com/articles/10.1186/1751-0473-8-7

**Maximum likelihood estimation for latent count models**

Speaker: Wei Zhang

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 26 September 2018, 11:00 am to 12:00 pm

Where: 303-310

Latent count models constitute an important modeling class in which a latent vector of counts, z, is summarized or corrupted for reporting, yielding observed data y=Tz where T is a known but non-invertible matrix. The observed vector y generally follows an unknown multivariate distribution with a complicated dependence structure. Latent count models arise in diverse fields, such as estimation of population size from capture-recapture studies; inference on multi-way contingency tables summarized by marginal totals; or analysis of route flows in networks based on traffic counts at a subset of nodes. Currently, inference under these models relies primarily on stochastic algorithms for sampling the latent vector z, typically in a Bayesian data-augmentation framework. These schemes involve long computation times and can be difficult to implement. Here, we present a novel maximum-likelihood approach using likelihoods constructed by the saddlepoint approximation. We show how the saddlepoint likelihood may be maximized efficiently, yielding fast inference even for large problems. For the case where z has a multinomial distribution, we validate the approximation by applying it to a specific model for which an exact likelihood is available. We implement the method for several models of interest, and evaluate its performance empirically and by comparison with other estimation approaches. The saddlepoint method consistently gives fast and accurate inference, even when y is dominated by small counts.
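To make the structure y = Tz concrete: in the contingency-table example from the abstract, z holds the cell counts and T sums them into marginal totals. The toy sketch below (a hypothetical 2x2 table in plain Python) shows why T is non-invertible: two different latent tables produce the same observed y, so y constrains z but does not determine it. It illustrates the setup only, not the saddlepoint method.

```python
# Latent cell counts z of a 2x2 contingency table, flattened as
# [z11, z12, z21, z22]. T maps them to row totals, then column totals.
T = [
    [1, 1, 0, 0],  # row 1 total
    [0, 0, 1, 1],  # row 2 total
    [1, 0, 1, 0],  # column 1 total
    [0, 1, 0, 1],  # column 2 total
]


def observe(T, z):
    """Return the observed summary y = T z (matrix-vector product)."""
    return [sum(t_ij * z_j for t_ij, z_j in zip(row, z)) for row in T]


z_a = [3, 1, 2, 4]
z_b = [4, 0, 1, 5]  # a different latent table...

y_a = observe(T, z_a)
y_b = observe(T, z_b)
print(y_a, y_b)  # ...with identical margins, so z cannot be recovered from y
```

Any likelihood for y must therefore sum over all latent tables consistent with the observed margins, which is what makes exact inference expensive.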

**Incorporating spatial processes into stock assessment models, with application to the New Zealand surfclam and snapper fisheries**

Speaker: Christopher Nottingham

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 19 September 2018, 11:00 am to 12:00 pm

Where: 303-310

Fisheries stock assessments use mathematical and statistical descriptions of population dynamics to make quantitative predictions about how a stock will respond to alternative management choices. For high-value species this commonly involves carrying out a simulation procedure known as management strategy evaluation (MSE). This procedure involves developing a number of stock assessment models and testing how well they perform under various states of nature (commonly known as operating models) together with a harvest allocation rule that determines management actions (e.g., setting catch limits for the next year) based on the results of the stock assessment models. In the proposed PhD research, MSE will be used to further develop and improve the management of the surfclam and snapper fisheries in New Zealand. A large part of the proposed work will involve developing novel population dynamics models. The proposed models account for continuous spatial processes that are correlated in time and with the sizes or ages of individuals. These processes are represented as Gaussian random fields approximated by Gaussian Markov random fields using the stochastic partial differential equation (SPDE) approach.

**Linear mixed model for multi-level omics data**

Speaker: Yang Hai

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 15 August 2018, 11:00 am to 12:00 pm

Where: 303-310

Accurate disease prediction, informed by emerging genetic findings and other existing knowledge, is expected to facilitate precision medicine (Ashley, 2015). Genomic best linear unbiased prediction (gBLUP) models and their extensions are widely used for genetic risk prediction, with genetic effects assumed to follow different distributions. However, the true effect size distribution for any given outcome is usually unknown (Chatterjee et al., 2016). Although multi-omic information and family structure can further improve predictive accuracy, few analytical approaches have been developed to exploit them.

For my thesis, I will first develop a hybrid model that can approximate the shape of the true effect size distribution. Second, I propose a Bayesian framework for family-based genetic risk prediction, which improves prediction performance by exploiting unmeasured polygenic and shared environmental variation captured by within-family correlation. Third, I propose to develop a risk prediction method that can efficiently integrate information from multi-level omics data.

E. A. Ashley (2015). The precision medicine initiative: a new national effort. JAMA, 313(21), 2119–2120.

N. Chatterjee, J. Shi and M. Garcia-Closas (2016). Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nature Reviews Genetics, 17(7), 392.

**Modelling career trajectories of cricket players using Gaussian processes**

Speaker: Oliver Stevenson

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 8 August 2018, 11:00 am to 12:00 pm

Where: 303-310

In the sport of cricket, variations in a player’s batting ability can usually be measured on one of two scales: short-term changes observed during a single innings, which can span multiple days, and long-term changes observed over entire playing careers, which can span decades. To measure short-term, within-innings variation, a Bayesian survival analysis method is derived and used to fit a model that predicts how the batting abilities of professional cricketers change during an innings. A second model, which employs a Gaussian process, is then fitted to measure and predict between-innings variation in ability. Given the high dimensionality of the Gaussian process model, and for ease of model comparison, models are fitted using nested sampling. Generally speaking, the results support an anecdotal description of a typical sporting career. Young players tend to begin their careers with some raw but undeveloped ability, which improves over time as they gain experience and participate in specialised training and coaching regimes. Eventually players reach the peak of their careers, after which ability tends to decline. However, continual fluctuations in ability are commonly observed for many players, likely due to external factors such as injury and player form, which can lead to multiple peaks during a long career. Compared with traditional cricketing metrics, the results provide more accurate quantifications of a player’s batting ability at any given point in their career. This has practical implications for talent identification, player comparison and team selection policy.

**The C3 depletion design: Population estimation and gear calibration using Catches from Concentric Circles**

Speaker: Liese Carleton

Affiliation: Virginia Institute of Marine Science, College of William & Mary

When: Monday, 28 May 2018, 1:00 pm to 2:00 pm

Where: 303-310

Depletion studies are often used in closed systems to estimate population size and the catchability coefficient. Applying depletion methods to open-water systems is difficult because fish are attracted into the study area from outside, which makes the size of the sampled domain uncertain. In this novel design, the study area consists of two concentric circles. The diameter of the outer circle is set by the length of a bottom longline, which is deployed repeatedly in a star pattern to serially deplete the circle. Each catch is recorded as falling either within the smaller inner circle or in the outer ring. This design allows us to include an immigration component in the depletion model, so that initial abundance, catchability, and net movement can all be estimated. Gear efficiency can be derived from the estimated catchability and could then be used to convert a survey index of abundance into an estimate of absolute population size. The method is illustrated with bottom longline sets for Atlantic sharpnose shark in the Gulf of Mexico.
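A classical example of the closed-system depletion methods mentioned in the abstract is the Leslie model, in which catch per unit effort (CPUE) declines linearly with the cumulative catch already removed. The sketch below fits that closed-population model only; it has no immigration term and is not the C3 model from the talk. The function name `leslie_depletion` and the synthetic data are hypothetical.

```python
def leslie_depletion(catches, efforts):
    """Classical Leslie depletion estimator for a closed population.

    Model: C_t / f_t = q * N0 - q * K_t, where K_t is the cumulative catch
    taken before set t. An ordinary least-squares fit of CPUE on K_t gives
    slope -q and intercept q * N0.
    """
    cpue = [c / f for c, f in zip(catches, efforts)]
    # Cumulative catch removed before each set
    k, cum = [], 0.0
    for c in catches:
        k.append(cum)
        cum += c
    n = len(cpue)
    k_bar = sum(k) / n
    y_bar = sum(cpue) / n
    sxy = sum((ki - k_bar) * (yi - y_bar) for ki, yi in zip(k, cpue))
    sxx = sum((ki - k_bar) ** 2 for ki in k)
    slope = sxy / sxx
    intercept = y_bar - slope * k_bar
    q = -slope
    return q, intercept / q  # (catchability q, initial abundance N0)


# Noise-free synthetic data: N0 = 1000, q = 0.1, one unit of effort per set
q_hat, n0_hat = leslie_depletion([100, 90, 81, 72.9], [1, 1, 1, 1])
print(round(q_hat, 3), round(n0_hat, 1))
```

If fish move into the study area between sets, later catches no longer reflect only the remaining initial population; that violation of the closed-population assumption is what the concentric-circle design is meant to handle.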

**Combinatorial Inference**

Speaker: Junwei Lu

Affiliation: Department of Operations Research and Financial Engineering, Princeton University

When: Monday, 14 May 2018, 1:00 pm to 2:00 pm

Where: 303-310

We propose combinatorial inference to explore the topological structures of graphical models. Combinatorial inference can conduct hypothesis tests on many graph properties, including connectivity, hub detection and perfect matching. We also develop a generic minimax lower bound which shows the optimality of the proposed method for a large family of graph properties. Our methods are applied to neuroscience to discover hub voxels contributing to visual memories.

**Bayesian nonparametric analysis of multivariate time series**

Speaker: Alexander Meier

Affiliation: Otto-von-Guericke University Magdeburg

When: Wednesday, 9 May 2018, 3:00 pm to 4:00 pm

Where: 303-310

While there is a growing literature on Bayesian time series analysis, only a few nonparametric approaches to multivariate time series exist. Many methods rely on Whittle's likelihood, which involves the second-order structure of a stationary time series through its spectral density matrix f. The latter is often modeled via its Cholesky decomposition to ensure positive definiteness. However, asymptotic properties under these priors, such as posterior consistency or posterior contraction rates, are not known.

A different idea is to model f by means of random measures. This is in line with (1), who model the normalized spectral density of a univariate time series with a Dirichlet process mixture of beta densities. We use a similar approach, with matrix-valued mixture weights induced by a completely random matrix-valued measure (2,3), and use a class of infinitely divisible matrix Gamma distributions (4) for this purpose. The procedure performs well in practice, and we also establish posterior consistency and derive posterior contraction rates.

(1) N. Choudhuri, S. Ghosal and A. Roy (2004). Bayesian estimation of the spectral density of a time series. Journal of the American Statistical Association 99(468), 1050–1059

(2) A. Lijoi and I. Pruenster (2010). Models beyond the Dirichlet process. Bayesian nonparametrics, 28:80

(3) J. B. Robertson, M. Rosenberg, et al. (1968). The decomposition of matrix-valued measures. The Michigan Mathematical Journal, 15(3), 353-368

(4) V. Perez-Abreu and R. Stelzer (2014). Infinitely divisible multivariate and matrix Gamma distributions. Journal of Multivariate Analysis, 130, 155–175

Authors:

Alexander Meier, Otto-von-Guericke University Magdeburg

Claudia Kirch, Otto-von-Guericke University Magdeburg

Renate Meyer, The University of Auckland

**Modelling spatial-temporal processes with applications to hydrology and wildfires**

Speaker: Professor Valerie Isham

Affiliation: University College London

When: Friday, 4 May 2018, 11:00 am to 12:00 pm

Where: 303-610

Mechanistic stochastic models aim to represent an underlying physical process (albeit in highly idealised form, and using stochastic components to reflect uncertainty) via analytically tractable models, in which interpretable parameters relate directly to physical phenomena. Such models can be used to gain understanding of the process dynamics and thereby to develop control strategies.

In this talk, I will review some stochastic point-process-based models constructed in continuous time and continuous space, using spatial-temporal examples from hydrology such as rainfall (where flood control is a particular application) and soil moisture. Because the models are formulated in continuous space and time, consistent properties can be obtained analytically at any spatial and temporal resolution, as required for fitting and applications. I will briefly cover basic model components and properties, and then discuss model construction, fitting and validation, including ways to incorporate nonstationarity and climate change scenarios. I will also share some thoughts on using similar models for wildfires.

**Dirichlet and Poisson-Dirichlet approximations**

Speaker: Han Liang Gan

Affiliation: Northwestern University

When: Thursday, 26 April 2018, 1:00 pm to 2:00 pm

Where: 303-257

The Dirichlet and Poisson-Dirichlet distributions are multi-dimensional distributions that can be used to model proportions. In this talk, we will give explicit error bounds when applying Dirichlet and Poisson-Dirichlet approximations in a variety of applications that include urn models and stationary distributions of genetic drift models. The results are derived using new developments in Stein's method.

This is joint work with Adrian Rollin (National University of Singapore) and Nathan Ross (University of Melbourne).
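As background on the first of these distributions: a standard way to draw from a Dirichlet(α1, ..., αk) distribution is to draw independent Gamma(αi, 1) variables and divide each by their sum, which yields a random vector of proportions. The sketch below uses only Python's standard library; the function name `sample_dirichlet` and the parameter values are illustrative and not taken from the talk.

```python
import random


def sample_dirichlet(alphas, rng=random):
    """Draw one Dirichlet(alphas) vector by normalising independent gammas.

    If G_i ~ Gamma(alpha_i, 1) independently, then
    (G_1, ..., G_k) / sum(G) ~ Dirichlet(alpha_1, ..., alpha_k).
    """
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(2018)
draws = [sample_dirichlet([2.0, 3.0, 5.0], rng) for _ in range(20000)]
# The mean of component i is alpha_i / sum(alphas): here 0.2, 0.3 and 0.5
means = [sum(d[i] for d in draws) / len(draws) for i in range(3)]
print([round(m, 2) for m in means])
```

Each draw is a vector of positive proportions summing to one, which is why the family is a natural model for proportions.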